[ https://issues.apache.org/jira/browse/NIFI-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341181#comment-16341181 ]
ASF GitHub Bot commented on NIFI-4794:
--------------------------------------
GitHub user markap14 opened a pull request:
https://github.com/apache/nifi/pull/2437
NIFI-4794: Updated event writers to avoid creating a lot of byte[] by
reusing buffers. Also removed synchronization on EventWriter when rolling
over the writer and just moved the writing of the header to happen before
making the writer available to any other threads. This reduces thread
contention during rollover.
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number
you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [ ] Has your PR been rebased against the latest commit within the target
branch (typically master)?
- [ ] Is your initial contribution a single, squashed commit?
### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to
.name (programmatic access) for each of the new properties?
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in
which it is rendered?
### Note:
Please ensure that once the PR is submitted, you check travis-ci for build
issues and submit an update to your PR as soon as possible.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/markap14/nifi NIFI-4794
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/2437.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2437
----
commit 3bbd64bfebe81cb099a4b8017d839e591b3d9bc7
Author: Mark Payne <markap14@...>
Date: 2018-01-25T17:16:56Z
NIFI-4794: Updated event writers to avoid creating a lot of byte[] by
reusing buffers. Also removed synchronization on EventWriter when rolling over
the writer and just moved the writing of the header to happen before making the
writer available to any other threads. This reduces thread contention during
rollover.
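The rollover change described above can be illustrated with a minimal sketch. It is not the code from the pull request: RolloverSketch, SimpleWriter, and the "HEADER" content are hypothetical names chosen for illustration. The idea shown is that the header is written while the new writer is still visible only to the thread performing the rollover, so that step needs no lock; the writer is published to other threads only afterwards.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

class RolloverSketch {
    /** Hypothetical stand-in for an event writer; not NiFi's actual interface. */
    static final class SimpleWriter {
        private final OutputStream out;

        SimpleWriter(OutputStream out) {
            this.out = out;
        }

        // Not synchronized: called only before the writer is shared.
        void writeHeader() throws IOException {
            out.write("HEADER\n".getBytes(StandardCharsets.UTF_8));
        }

        // Events may arrive from many threads, so each write is serialized.
        synchronized void writeEvent(String event) throws IOException {
            out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    private final AtomicReference<SimpleWriter> current = new AtomicReference<>();

    /** Prepares a new writer completely, then publishes it. */
    void rollover(OutputStream newDestination) throws IOException {
        SimpleWriter next = new SimpleWriter(newDestination);
        next.writeHeader();   // no other thread can see 'next' yet
        current.set(next);    // publication; the header is already written
    }

    /** Assumes rollover(...) has been called at least once. */
    void write(String event) throws IOException {
        current.get().writeEvent(event);
    }
}
```

Because AtomicReference.set establishes a happens-before relationship with a later get, any thread that obtains the new writer also observes the completed header write.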
----
> Improve Garbage Collection required by Provenance Repository
> ------------------------------------------------------------
>
> Key: NIFI-4794
> URL: https://issues.apache.org/jira/browse/NIFI-4794
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
>
> The EventIdFirstSchemaRecordWriter that is used by the provenance repository
> has a writeRecord(ProvenanceEventRecord) method. Within this method, it
> serializes the given record into a byte array by serializing to a
> ByteArrayOutputStream (after wrapping the BAOS in a DataOutputStream). Once
> this is done, it calls toByteArray() on that BAOS so that it can write the
> byte[] directly to another OutputStream.
> This can create a rather large amount of garbage to be collected. We can
> improve this significantly:
> # Instead of creating a new ByteArrayOutputStream each time, create a pool
> of them. This avoids constantly having to garbage collect them.
> # If a BAOS grows beyond a certain size, we should not return it to the
> pool, because we don't want an oversized buffer to keep occupying the heap.
> # Instead of wrapping the BAOS in a new DataOutputStream, the
> DataOutputStream should be pooled/recycled as well. Since it must create an
> internal byte[] for the writeUTF method, this can save a significant amount
> of garbage.
> # Avoid calling ByteArrayOutputStream.toByteArray(). We can instead just use
> ByteArrayOutputStream.writeTo(OutputStream). This avoids allocating a new
> array and copying the data into it, as well as the resulting GC overhead.
>
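The four points in the description can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the implementation in the pull request: the class names PooledRecordSerializer and SizedBuffer, the pool size, and the 1 MB cap are all hypothetical, and a plain String stands in for a ProvenanceEventRecord.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PooledRecordSerializer {
    // Hypothetical limits, chosen only for illustration.
    private static final int MAX_POOLED_BUFFER_BYTES = 1024 * 1024;
    private static final int POOL_SIZE = 8;

    /** A BAOS that exposes its backing-array capacity and carries its own DataOutputStream. */
    static final class SizedBuffer extends ByteArrayOutputStream {
        // Recycled together with the buffer, so no new wrapper is created per record.
        final DataOutputStream data = new DataOutputStream(this);

        int capacity() {
            return buf.length; // 'buf' is the protected backing array of ByteArrayOutputStream
        }
    }

    private final BlockingQueue<SizedBuffer> pool = new LinkedBlockingQueue<>(POOL_SIZE);

    /** Serializes the payload into a pooled buffer and copies it to the destination. */
    void write(String payload, OutputStream destination) throws IOException {
        SizedBuffer buffer = pool.poll();
        if (buffer == null) {
            buffer = new SizedBuffer(); // pool empty: allocate a fresh buffer
        }
        try {
            buffer.data.writeUTF(payload);
            buffer.data.flush();
            buffer.writeTo(destination); // no toByteArray() copy
        } finally {
            buffer.reset(); // keeps the backing array, discards the content
            if (buffer.capacity() <= MAX_POOLED_BUFFER_BYTES) {
                pool.offer(buffer); // silently dropped if the pool is already full
            }
            // Oversized buffers are not returned and become eligible for GC.
        }
    }
}
```

A bounded queue keeps the number of retained buffers fixed, and the capacity check prevents a single unusually large record from pinning a large array on the heap.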