[ https://issues.apache.org/jira/browse/NIFI-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706829#comment-14706829 ]
Mark Payne commented on NIFI-744:
---------------------------------
[~joewitt] Thanks for the thorough review! This one certainly needs a good
review, as it is pretty important to get right and far from a trivial change.
I have updated documentation to reflect the concerns/questions that you have
and attached a new patch that includes those modifications. I attached it as a
separate patch so that you can easily tell what was added.
I believe it addresses all of your concerns stated above except for the
StandardProcessSession. I can see the benefit, certainly, of writing to the
same claim sequentially when we have multiple FlowFiles in the same session.
However, we should also consider another case. If we have multiple content
repositories, each on a separate partition, and multiple threads reading the
content of the output FlowFiles, we will get better performance by spanning
across multiple partitions than by writing sequentially to the same file. The
case in which sequential reads/writes will be beneficial is if all of the
FlowFiles go to the same follow-on processor and that processor is
single-threaded. The current implementation is optimized instead for
multi-threaded processing. If we think it makes sense to do so, we can consider
refactoring later to write sequentially for multiple FlowFiles in the same
session, but I believe that should be done as a future enhancement, if we want
to support it, not on this ticket.
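To make the trade-off concrete, here is a minimal sketch of the two placement strategies discussed above. It is not the actual StandardProcessSession or FileSystemRepository code: the class ClaimPlacementSketch, the Placement type, and the round-robin choice of partition are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in NiFi.
final class ClaimPlacementSketch {

    /** Where one FlowFile's content would live under a given strategy. */
    static final class Placement {
        final int container; // index of the content repository (partition)
        final long offset;   // byte offset within the file on that partition
        final long length;   // number of bytes written for this FlowFile

        Placement(int container, long offset, long length) {
            this.container = container;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Sequential strategy: every FlowFile in the session is appended to the
     * same file on one partition, so each one starts where the last ended.
     */
    static List<Placement> sequential(long[] sizes) {
        List<Placement> placements = new ArrayList<>();
        long offset = 0;
        for (long size : sizes) {
            placements.add(new Placement(0, offset, size));
            offset += size;
        }
        return placements;
    }

    /**
     * Spread strategy: each FlowFile gets its own file on the next partition
     * (round-robin is an assumption), so concurrent readers can hit
     * different disks.
     */
    static List<Placement> spread(long[] sizes, int containers) {
        List<Placement> placements = new ArrayList<>();
        for (int i = 0; i < sizes.length; i++) {
            placements.add(new Placement(i % containers, 0, sizes[i]));
        }
        return placements;
    }
}
```

With sequential placement, a single-threaded follow-on processor reads one file front to back. With spread placement, several threads reading the output FlowFiles are not all contending for the same partition.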
Let me know if you have any more questions or concerns.
Thanks again for taking the time to review this in such depth!
-Mark
> Allow FileSystemRepository to write to the same file for multiple
> (non-parallel) sessions
> -----------------------------------------------------------------------------------------
>
> Key: NIFI-744
> URL: https://issues.apache.org/jira/browse/NIFI-744
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 0.3.0
>
> Attachments:
> 0001-NIFI-744-Refactored-ContentClaim-into-ContentClaim-a.patch
>
>
> Currently, when a ProcessSession is committed, the Content Claim that was
> being written to is now "finished" and will never be written to again.
> When a flow has processors that generate many, many FlowFiles, each in their
> own session, this means that we have many, many files on disk on the Content
> Repository, as well. Generally, this hasn't been a problem to write to these
> files. However, when the files are to be archived or destroyed, this is very
> taxing and can cause erratic performance.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)