[ https://issues.apache.org/jira/browse/NIFI-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280906#comment-16280906 ]
ASF GitHub Bot commented on NIFI-4675:
--------------------------------------
Github user joewitt commented on the issue:
https://github.com/apache/nifi/pull/2326
Given your description of the user understanding the decision they're
making, I think I agree with you.
@markap14 what do you think?
@jasper-k we'll want to make sure the Kafka 0.10, 0.11, and 1.0 processors
are all updated for this, but let's see what Mark thinks too.
Thanks for contributing and sharing your findings.
> PublishKafka_0_10 can't use demarcator and kafka key at the same time
> ---------------------------------------------------------------------
>
> Key: NIFI-4675
> URL: https://issues.apache.org/jira/browse/NIFI-4675
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.2.0
> Reporter: Jasper Knulst
> Labels: performance
>
> At the moment you can't split up a flowfile using a demarcator AND set the
> Kafka key (kafka.key) attribute for all resulting Kafka records at the same
> time. The code explicitly prevents this.
> Still, it would be a valuable performance booster to be able to use both at
> the same time whenever one flowfile contains many individual Kafka records.
> Flowfiles would then not have to be pre-split (an explosion of NiFi
> overhead) just to set the key.
> Note:
> Using a demarcator and a Kafka key at the same time will normally cause
> every Kafka record produced from one incoming flowfile to have the same
> Kafka key (see REMARK).
> I know of a live NiFi deployment where this fix/feature (provided as a
> custom fix) led to a 500-600% increase in throughput. Others could and
> should benefit as well.
> REMARK
> The argument against this feature has been that it is not a good idea to
> intentionally generate many duplicate Kafka keys. I would argue that this
> is up to the user to decide. Most use Kafka as a pure distributed log
> system, where key uniqueness is not important. The Kafka key can be a
> really valuable grouping placeholder, though. The only case where this
> gets problematic is compaction of Kafka topics, where records are
> deduplicated by key. But once we put sufficient warnings and disclaimers
> about this risk in the tooltips, it is up to the user to decide whether to
> use the performance booster.
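For illustration, the behavior requested above amounts to splitting the flowfile payload on the demarcator and attaching the flowfile's single kafka.key value to every resulting record. This is a minimal Python sketch with made-up names, not NiFi's actual implementation:

```python
def split_with_key(flowfile_content: bytes, demarcator: bytes, key: bytes):
    """Split a flowfile's content on the demarcator and pair every
    resulting record with the same Kafka key.

    Hypothetical sketch of the requested PublishKafka behavior; the
    function name and signature are illustrative, not NiFi's API.
    """
    # One flowfile body becomes many record values.
    records = [chunk for chunk in flowfile_content.split(demarcator) if chunk]
    # Every record produced from this flowfile shares the one kafka.key value,
    # which is exactly the duplicate-key situation the REMARK discusses.
    return [(key, record) for record in records]

# One flowfile holding three newline-demarcated records:
pairs = split_with_key(b"rec1\nrec2\nrec3", b"\n", b"device-42")
# -> [(b"device-42", b"rec1"), (b"device-42", b"rec2"), (b"device-42", b"rec3")]
```

Compared with pre-splitting into three flowfiles upstream, this keeps one flowfile (one provenance event, one repository entry) while still producing per-record keys on the Kafka side, which is where the throughput gain comes from.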
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)