dariuszseweryn commented on PR #10053:
URL: https://github.com/apache/nifi/pull/10053#issuecomment-3079256710

   > The ConsumeKafka Processor produces different output FlowFiles when records 
have different associated Schemas. However, instead of checking whether a 
Record Schema is compatible with the Schema of the first Record, it groups output 
based on Record Schema equality. It still sends invalid records to the 
parse.failure Relationship, but it provides a logical grouping based on common 
Record Schema references. This fits the scenario of embedded schema references, 
and can also work for scenarios where schema inference produces different 
Record Schema results for different Records. This approach would also avoid 
some of the performance concerns related to evaluating a KinesisRecord multiple 
times, and would avoid the need for changes to Record Writers.
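
A minimal sketch of the grouping described in the quoted comment: records are bucketed by schema *equality* (not compatibility), and records that fail to parse are collected separately, standing in for the `parse.failure` Relationship. This does not use the NiFi API. `Schema`, `Field`, `Parsed`, `Grouped`, and `group` are hypothetical stand-ins, and a real implementation would presumably key on NiFi's `RecordSchema` instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaGrouping {

    // Hypothetical stand-in for a Record Schema: Java records give structural
    // equality, so two schemas with the same fields in the same order are equal.
    record Field(String name, String type) {}
    record Schema(List<Field> fields) {}

    // Hypothetical parse result: a schema plus values, or a raw payload that failed to parse.
    record Parsed(Schema schema, Map<String, Object> values, byte[] raw) {
        boolean failed() { return schema == null; }
    }

    // One bucket per distinct schema (one output FlowFile each), plus the parse failures.
    record Grouped(Map<Schema, List<Map<String, Object>>> bySchema, List<byte[]> parseFailures) {}

    static Grouped group(List<Parsed> batch) {
        // LinkedHashMap keeps buckets in first-seen order of their schema.
        Map<Schema, List<Map<String, Object>>> bySchema = new LinkedHashMap<>();
        List<byte[]> failures = new ArrayList<>();
        for (Parsed p : batch) {
            if (p.failed()) {
                failures.add(p.raw()); // would be routed to parse.failure
            } else {
                bySchema.computeIfAbsent(p.schema(), s -> new ArrayList<>()).add(p.values());
            }
        }
        return new Grouped(bySchema, failures);
    }

    public static void main(String[] args) {
        Schema a = new Schema(List.of(new Field("id", "int")));
        Schema b = new Schema(List.of(new Field("id", "int"), new Field("name", "string")));
        List<Parsed> batch = List.of(
            new Parsed(a, Map.of("id", 1), null),
            new Parsed(b, Map.of("id", 2, "name", "x"), null),
            new Parsed(null, null, "not json".getBytes()),
            // A separately built but equal schema lands in the same bucket as `a`.
            new Parsed(new Schema(List.of(new Field("id", "int"))), Map.of("id", 3), null));
        Grouped g = group(batch);
        System.out.println(g.bySchema().size());      // prints 2
        System.out.println(g.parseFailures().size()); // prints 1
    }
}
```

Because the key is equality rather than compatibility, no schema ever has to be merged or widened, which matches the point that Record Writers would not need to change.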
   
   The downside of such an approach is that offset tracking becomes impossible 
without per-record sequence/sub-sequence numbers. Was this considered in 
`ConsumeKafka`?
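
One possible reading of the offset-tracking concern, sketched under the assumption that each output FlowFile describes its contents only by the first and last (sequence, sub-sequence) position it holds. When records from one aggregated Kinesis record alternate between schemas, the per-schema groups interleave, so their first/last ranges overlap and no longer say which records each FlowFile contains. `Position`, `Range`, and `rangesBySchema` are hypothetical names, and the schema ids stand in for schema equality.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OffsetRanges {

    // Hypothetical position of one record: the sequence number of the enclosing
    // Kinesis record plus a sub-sequence number within an aggregated record.
    record Position(BigInteger sequence, long subSequence) implements Comparable<Position> {
        static final Comparator<Position> ORDER =
            Comparator.comparing(Position::sequence).thenComparingLong(Position::subSequence);
        public int compareTo(Position o) { return ORDER.compare(this, o); }
    }

    // Hypothetical per-FlowFile summary: only the first and last position it contains.
    record Range(Position first, Position last) {
        boolean overlaps(Range o) {
            return first.compareTo(o.last) <= 0 && o.first.compareTo(last) <= 0;
        }
    }

    // Groups positions by schema id and summarizes each group as a first/last range.
    static Map<String, Range> rangesBySchema(List<String> schemaIds, List<Position> positions) {
        Map<String, List<Position>> groups = new LinkedHashMap<>();
        for (int i = 0; i < positions.size(); i++) {
            groups.computeIfAbsent(schemaIds.get(i), k -> new ArrayList<>()).add(positions.get(i));
        }
        Map<String, Range> ranges = new LinkedHashMap<>();
        groups.forEach((schema, ps) -> ranges.put(schema, new Range(
            ps.stream().min(Position.ORDER).orElseThrow(),
            ps.stream().max(Position.ORDER).orElseThrow())));
        return ranges;
    }

    public static void main(String[] args) {
        BigInteger seq = new BigInteger("1000"); // arbitrary illustrative sequence number
        // Four records from one aggregated record, alternating between schemas A and B.
        List<Position> positions = List.of(
            new Position(seq, 0), new Position(seq, 1), new Position(seq, 2), new Position(seq, 3));
        Map<String, Range> ranges = rangesBySchema(List.of("A", "B", "A", "B"), positions);
        // A spans sub-sequences 0..2 and B spans 1..3, so the two ranges interleave.
        System.out.println(ranges.get("A").overlaps(ranges.get("B"))); // prints true
    }
}
```

If the records were contiguous per schema (A, A, B, B), the ranges would not overlap; only interleaving forces tracking each record's own sequence/sub-sequence pair.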


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.