[
https://issues.apache.org/jira/browse/NIFI-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680705#comment-16680705
]
ASF GitHub Bot commented on NIFI-5805:
--------------------------------------
Github user ijokarumawak commented on a diff in the pull request:
https://github.com/apache/nifi/pull/3160#discussion_r232117307
--- Diff:
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroRecordSetWriter.java
---
@@ -68,16 +76,40 @@
.required(true)
.build();
+ static final PropertyDescriptor ENCODER_POOL_SIZE = new Builder()
+ .name("encoder-pool-size")
+ .displayName("Encoder Pool Size")
+ .description("Avro Writers require the use of an Encoder. Creation of Encoders is expensive, but once created, they can be reused. This property controls the maximum number of Encoders that" +
+ " can be pooled and reused. Setting this value too small can result in degraded performance, but setting it higher can result in more heap being used.")
--- End diff ---
Just for clarification, I'd suggest adding a note mentioning that this
property doesn't have any effect with the 'Embed Avro Schema' strategy.
> Avro Record Writer service creates byte buffer for every Writer created
> -----------------------------------------------------------------------
>
> Key: NIFI-5805
> URL: https://issues.apache.org/jira/browse/NIFI-5805
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
>
> When we use the Avro RecordSet Writer, and do not embed the schema, the
> Writer uses the Avro BinaryEncoder object to serialize the data. This object
> can be reused, but instead we create a new one for each writer. This
> results in creating a new 64 KB byte[] each time. When we are writing many
> records to a given FlowFile, this is not a big deal. However, when used in
> PublishKafkaRecord or similar processors, where a new writer must be created
> for every Record, this can have a very significant performance impact.
> An improvement would be to have the user configure the maximum number of
> BinaryEncoder objects to pool and then use a simple pooling mechanism to
> reuse these objects.
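The pooling mechanism proposed above could be sketched roughly as follows. This is a hypothetical illustration, not NiFi's actual implementation: the `ObjectPool` class and its method names are invented here, and a plain `StringBuilder` stands in for Avro's `BinaryEncoder`. A bounded `LinkedBlockingQueue` keeps the pool from ever holding more than the user-configured maximum:

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical bounded, thread-safe pool for expensive-to-create objects.
class ObjectPool<T> {
    private final LinkedBlockingQueue<T> pool;

    ObjectPool(int maxSize) {
        // Bounded queue: the pool can never hold more than maxSize objects.
        this.pool = new LinkedBlockingQueue<>(maxSize);
    }

    // Take a pooled instance if one is available; returns null when the pool
    // is empty, signalling the caller to create a fresh instance instead.
    T borrow() {
        return pool.poll();
    }

    // Return an instance to the pool. When the pool is already full, offer()
    // fails and the instance is simply dropped, bounding heap usage.
    void release(T obj) {
        pool.offer(obj);
    }
}
```

A writer would borrow an encoder on creation (falling back to constructing a new one when `borrow()` returns null) and release it on close; returns to a full pool are silently discarded, which is what keeps the memory cost capped at the configured pool size.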
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)