GitHub user markap14 opened a pull request:
https://github.com/apache/nifi/pull/1493
NIFI-3356: Initial implementation of writeahead provenance repository
- The idea behind NIFI-3356 was to improve the efficiency and throughput of
the Provenance Repository, as it is often the bottleneck. While testing the
newly designed repository, a handful of other, fairly minor changes were made
to improve efficiency as well, as they came to light during that testing:
- Use a BufferedOutputStream within StandardProcessSession (via a
ClaimCache abstraction) in order to avoid continually writing to
FileOutputStream when writing many small FlowFiles
- Updated threading model of MinimalLockingWriteAheadLog - now performs
serialization outside of lock and writes to a 'synchronized' OutputStream
- Change minimum scheduling period for components from 30 microseconds to 1
nanosecond. ScheduledExecutor is very inconsistent with timing of task
scheduling. The 30-microsecond minimum was originally put in place to keep
processors that had no work from dominating the CPU. With the
bored.yield.duration now present, processors yield when they have no work, so
the minimum no longer serves that purpose and only slows down processors that
are able to perform work.
- Allow nifi.properties to specify multiple directories for FlowFile
Repository
- If backpressure is engaged while running a batch of sessions, then stop
batch processing earlier. This helps FlowFiles move through the system much
more smoothly instead of the herky-jerky queuing that we previously saw at
very high rates of FlowFiles.
- Added NiFi PID to log message when starting NiFi. This was simply an
update to the log message that provides helpful information.
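
To illustrate the MinimalLockingWriteAheadLog item above, here is a minimal
sketch of the general pattern of serializing outside the lock and holding the
lock only for the write. It is not the code in this PR: the class name
SerializeOutsideLockLog, the append method, and the length-prefixed record
format are all hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, NOT NiFi's MinimalLockingWriteAheadLog.
 * Each caller serializes its record into a private buffer without holding
 * any lock, then takes the lock only long enough to copy the bytes out.
 */
final class SerializeOutsideLockLog {
    private final OutputStream out;               // shared destination stream
    private final Object writeLock = new Object();

    SerializeOutsideLockLog(OutputStream out) {
        this.out = out;
    }

    /** Appends one record as a 4-byte length followed by UTF-8 bytes (assumed format). */
    void append(String record) {
        try {
            // Step 1 (no lock held): serialize into a buffer private to this call.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream data = new DataOutputStream(buffer)) {
                byte[] payload = record.getBytes(StandardCharsets.UTF_8);
                data.writeInt(payload.length);
                data.write(payload);
            }
            byte[] serialized = buffer.toByteArray();

            // Step 2 (lock held): a single write, so records from different
            // threads cannot interleave in the shared stream.
            synchronized (writeLock) {
                out.write(serialized);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, the comparatively expensive serialization work runs in
parallel across threads, and contention is limited to the byte copy inside
the synchronized block.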
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/markap14/nifi NIFI-3356
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/1493.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1493
----
commit babbe4e42a6cc2b0d124691bb398f1ec33d3b8c8
Author: Mark Payne
Date: 2016-12-09T15:52:33Z
NIFI-3356: Initial implementation of writeahead provenance repository
----