[ https://issues.apache.org/jira/browse/NIFI-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878836#comment-15878836 ]
ASF subversion and git services commented on NIFI-3356:
-------------------------------------------------------
Commit 96ed405d708894ee5400ebbdbf335325219faa09 in nifi's branch
refs/heads/master from [~markap14]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=96ed405 ]
NIFI-3356: Initial implementation of write-ahead provenance repository
- The idea behind NIFI-3356 was to improve the efficiency and throughput of the
Provenance Repository, as it is often the bottleneck. While testing the newly
designed repository, a handful of other, fairly minor efficiency improvements
came to light and were made as well:
- Use a BufferedOutputStream within StandardProcessSession (via a ClaimCache
abstraction) in order to avoid continually writing to FileOutputStream when
writing many small FlowFiles
- Updated the threading model of MinimalLockingWriteAheadLog: it now performs
serialization outside of the lock and writes to a 'synchronized' OutputStream
- Change minimum scheduling period for components from 30 microseconds to 1
nanosecond. ScheduledExecutor is very inconsistent in the timing of task
scheduling. The 30-microsecond minimum was originally set to keep processors
that had no work from dominating the CPU. With the bored.yield.duration now
present, we yield when processors have no work, so the minimum is no longer
needed and only slows down processors that are able to perform work.
- Allow nifi.properties to specify multiple directories for FlowFile Repository
- If backpressure is engaged while running a batch of sessions, then stop
processing the batch early. This helps FlowFiles move through the system much
more smoothly, instead of the herky-jerky queuing that we previously saw at
very high rates of FlowFiles.
- Added NiFi PID to log message when starting NiFi. This was simply an update
to the log message that provides helpful information.
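The ClaimCache bullet above can be illustrated with a minimal Java sketch, assuming a simplified model in which each content claim maps to one file. `ClaimWriteCacheSketch`, `streamFor`, and `flush` are hypothetical names, not NiFi's actual ContentClaimWriteCache API; the sketch only shows the idea of keeping one BufferedOutputStream open per claim so that many small FlowFile writes turn into a few larger writes to the underlying file.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-claim write cache; NOT NiFi's ContentClaimWriteCache.
// Assumption: one claim == one file, and callers are single-threaded (one session).
class ClaimWriteCacheSketch implements AutoCloseable {
    private final Map<Path, OutputStream> streams = new HashMap<>();
    private final int bufferSize;

    ClaimWriteCacheSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Returns a buffered append stream for the claim's file, opening it only once. */
    OutputStream streamFor(Path claimFile) throws IOException {
        OutputStream existing = streams.get(claimFile);
        if (existing != null) {
            return existing; // reuse: small writes keep landing in the same buffer
        }
        OutputStream created = new BufferedOutputStream(
                Files.newOutputStream(claimFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND),
                bufferSize);
        streams.put(claimFile, created);
        return created;
    }

    /** Pushes all buffered bytes to their files, e.g. before a session commit. */
    void flush() throws IOException {
        for (OutputStream os : streams.values()) {
            os.flush();
        }
    }

    @Override
    public void close() throws IOException {
        for (OutputStream os : streams.values()) {
            os.close(); // close() flushes any remaining buffered bytes
        }
        streams.clear();
    }
}
```

Until `flush()` or `close()` is called, writes smaller than the buffer never reach the FileOutputStream, which is the effect the bullet describes for workloads with many small FlowFiles.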
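The threading change to MinimalLockingWriteAheadLog can be sketched as follows, assuming a simple record format of a `long` id followed by a UTF string. `SerializeOutsideLockLog` is a hypothetical class, not NiFi's implementation: each caller encodes its record into a private byte array with no lock held, and only the append of the finished bytes to the shared, buffered stream is synchronized, which keeps the critical section short.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch; NOT NiFi's MinimalLockingWriteAheadLog.
// Assumed record format: writeLong(id) followed by writeUTF(payload).
class SerializeOutsideLockLog {
    private final OutputStream out;              // shared, buffered sink
    private final Object writeLock = new Object();
    private long recordsWritten = 0;

    SerializeOutsideLockLog(OutputStream sink) {
        this.out = new BufferedOutputStream(sink, 64 * 1024);
    }

    void append(long id, String payload) throws IOException {
        // Step 1: serialize with no lock held, so many threads can encode in parallel.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buffer)) {
            dos.writeLong(id);
            dos.writeUTF(payload);
        }
        byte[] record = buffer.toByteArray();

        // Step 2: a short critical section that only copies the finished bytes.
        synchronized (writeLock) {
            out.write(record);
            recordsWritten++;
        }
    }

    long recordsWritten() {
        synchronized (writeLock) {
            return recordsWritten;
        }
    }

    void flush() throws IOException {
        synchronized (writeLock) {
            out.flush();
        }
    }
}
```

Because each record is appended with a single `write` call while the lock is held, records from different threads never interleave, even though their serialization runs concurrently.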
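The early exit on backpressure can be sketched as a batch loop that checks the downstream queue before moving each item. `BackpressureAwareBatch.runBatch` and its `backpressureThreshold` parameter are illustrative assumptions standing in for NiFi's connection backpressure settings; the sketch stops the batch as soon as the output queue reaches the threshold instead of pushing the whole batch through.

```java
import java.util.Queue;

// Hypothetical sketch; names and parameters are illustrative, not NiFi APIs.
class BackpressureAwareBatch {
    /**
     * Moves up to maxBatch items from input to output, but ends the batch as
     * soon as output reaches backpressureThreshold. Returns the number moved.
     */
    static int runBatch(Queue<String> input, Queue<String> output,
                        int maxBatch, int backpressureThreshold) {
        int moved = 0;
        while (moved < maxBatch && !input.isEmpty()) {
            if (output.size() >= backpressureThreshold) {
                break; // backpressure engaged: stop now rather than finish the batch
            }
            output.add(input.poll());
            moved++;
        }
        return moved;
    }
}
```

For example, with 10 queued inputs, a batch size of 10, and a threshold of 4, the loop moves 4 items and leaves the remaining 6 for a later run, rather than overfilling the downstream queue in one burst.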
NIFI-3356: Fixed a bug in ContentClaimWriteCache that resulted in data
corruption, and a bug in RepositoryConfiguration that threw an exception if the
cache warm duration was set to an empty string
NIFI-3356: Fixed NPE
NIFI-3356: Added debug-level performance monitoring
NIFI-3356: Updates to unit tests that failed after rebasing against master
NIFI-3356: Incorporated PR review feedback
NIFI-3356: Fixed bug where we would delete index directories that are still in
use; also added additional debug logging and a simple util class that can be
used to textualize provenance event files - useful in debugging
This closes #1493
> Provide a newly refactored provenance repository
> ------------------------------------------------
>
> Key: NIFI-3356
> URL: https://issues.apache.org/jira/browse/NIFI-3356
> Project: Apache NiFi
> Issue Type: Task
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 1.2.0
>
>
> The Persistent Provenance Repository has been redesigned a few different
> times over several years. The original design for the repository was to
> provide storage of events and sequential iteration over those events via a
> Reporting Task. After that, we added the ability to compress the data so that
> it could be held longer. We then introduced the notion of indexing and
> searching via Lucene. We've since made several more modifications to try to
> boost performance.
> At this point, however, the repository is still the bottleneck for many flows
> that handle large volumes of small FlowFiles. We need a new implementation
> that is based around the current goals for the repository and that can
> provide better throughput.