[ https://issues.apache.org/jira/browse/NIFI-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281240#comment-16281240 ]

ASF GitHub Bot commented on NIFI-4675:
--------------------------------------

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/2326
  
    @jasper-k the Travis CI builds show there are some contrib-check failures.
Run the check locally so you can see them more easily: "mvn clean
install -Pcontrib-check". Please rebase onto latest master, resolve the
contrib-check failures, and force push.



> PublishKafka_0_10 can't use demarcator and kafka key at the same time
> ---------------------------------------------------------------------
>
>                 Key: NIFI-4675
>                 URL: https://issues.apache.org/jira/browse/NIFI-4675
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.2.0
>            Reporter: Jasper Knulst
>              Labels: performance
>
> At the moment you can't split up a flowfile using a demarcator AND set the 
> Kafka key (kafka.key) attribute for all resulting Kafka records at the same 
> time. The code explicitly prevents this.
> Still, it would be a valuable performance boost to be able to use both at 
> the same time whenever one flowfile contains many individual Kafka records. 
> Flowfiles would not have to be pre-split (an explosion of NiFi overhead) 
> just to set the key.
> Note:
> Using the demarcator and Kafka key at the same time will normally make every 
> resulting Kafka record from one incoming flowfile carry the same Kafka key 
> (see REMARK).
> I know of a live NiFi deployment where this fix/feature (provided as a 
> custom fix) led to a 500-600% increase in throughput. Others could and 
> should benefit as well.
> REMARK
> The argument against this feature has been that it is not a good idea to 
> intentionally generate many duplicate Kafka keys. I would argue that this is 
> up to the user to decide. Most use Kafka as a pure distributed log system 
> where key uniqueness is not important. The Kafka key can be a really 
> valuable grouping placeholder, though. The only case where this would get 
> problematic is on compaction of Kafka topics, when records with duplicate 
> keys are deduplicated. But after we put sufficient warnings and disclaimers 
> about this risk in the tooltips, it is up to the user to decide whether to 
> use the performance booster.
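The requested behavior can be sketched in plain Java, outside of NiFi: split the flowfile content on the demarcator and pair every resulting record with the same kafka.key value. The class and method names here are illustrative assumptions, not actual PublishKafka_0_10 code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DemarcatedKeyedRecords {

    /**
     * Split the flowfile content on the literal demarcator and attach the
     * same key to every resulting (key, value) record, mirroring the
     * proposed demarcator-plus-kafka.key behavior (illustrative only).
     */
    static List<Map.Entry<String, String>> toRecords(
            String content, String demarcator, String kafkaKey) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        // Pattern.quote so the demarcator is treated literally, not as regex.
        for (String value : content.split(Pattern.quote(demarcator))) {
            records.add(new SimpleEntry<>(kafkaKey, value));
        }
        return records;
    }

    public static void main(String[] args) {
        // One flowfile with three demarcated records, all sharing one key.
        List<Map.Entry<String, String>> recs =
                toRecords("a\nb\nc", "\n", "order-42");
        for (Map.Entry<String, String> r : recs) {
            System.out.println(r.getKey() + " -> " + r.getValue());
        }
    }
}
```

Compaction would collapse these three records down to one, since they share a key; that is the trade-off the REMARK above leaves to the user.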



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
