Mark Payne created NIFI-15669:
---------------------------------

             Summary: Refactor ConsumeKinesis to avoid dependency on KCL
                 Key: NIFI-15669
                 URL: https://issues.apache.org/jira/browse/NIFI-15669
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
            Reporter: Mark Payne
            Assignee: Mark Payne


The existing ConsumeKinesis Processor depends on the Kinesis Client Library, 
which is a reasonable choice, in general. However, the Kinesis Client Library 
is very heavy-weight, brings in a lot of transitive dependencies and was 
designed with some assumptions in mind that do not align well with use in NiFi.

Specifically, startup time is very long - typically 10-15 minutes for the first 
iteration as it waits for all consumers to join before allowing any consumer to 
pull data. While this makes sense for a big, long-running job that you might 
run from Spark or Flink, it is too long to be reasonable for NiFi and often 
leads to users thinking it's not working.

More importantly, the KCL library, when using Enhanced Fan-Out Consumer, 
buffers up to 11 Kinesis Events per shard. This limit of 11 events is hard 
coded and cannot be configured. An Event is effectively a batch of records, 
which typically amounts to 1-2 MB. For a few shards and/or where NiFi is 
keeping up without issue, this works well. However, if backpressure gets 
applied we end up buffering 2 MB * 11 = 22 MB or so per shard. Many users have 
Kinesis Streams with hundreds of shards, some even with 1,000+ shards. With 
1,000 shards that's 22 GB of data buffered in heap. And this is considering 
only the 'payload'. There is additional overhead from metadata and Java objects.

We need a faster, more stable solution. The new solution can still make use of 
DynamoDB for checkpointing and support more or less the same options and 
configuration in a backward compatible manner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to