[
https://issues.apache.org/jira/browse/NIFI-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008512#comment-16008512
]
Mark Payne commented on NIFI-3860:
----------------------------------
[~devriesb] - I think Joe did a great job above of outlining the benefits of
this and the tradeoffs that this addresses. Certainly there is a use case that
I have in mind, though. I have a Processor that I am working on that takes in a
FlowFile and spits out N number of FlowFiles. I may not know until I'm half-way
through reading the input FlowFile that I need a new output FlowFile. At this
point, I will need to create a new FlowFile and begin writing to it. However, I
need that stream to stay open until I've finished processing all of the
FlowFiles. This means that the existing
StandardProcessSession.write(FlowFile, OutputStreamCallback) won't really work,
since the stream is only open for the duration of the callback. Specifically,
I'll be using a library that wraps the OutputStream. This is important because
typically we handle this scenario by calling ProcessSession.append; however,
that provides a new OutputStream on each call, which doesn't work here because
the wrapping library needs to hold on to a single stream. What I need is the
ability to call ProcessSession.write and get an OutputStream - similar to how
we allow ProcessSession.read to return an InputStream. We avoided offering such
a write method, though, specifically because the most up-to-date version of the
FlowFile has to be returned once we finish writing to the stream, and a method
that returns an OutputStream has no way to do that. This change would make that
possible.
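
To make the use case above concrete, here is a minimal sketch in plain Java. It
does not use the NiFi API: ToyFlowFile, ToySession and OpenStreamSketch are
hypothetical stand-ins invented for this example, and GZIPOutputStream merely
plays the role of "a library that wraps the OutputStream". The point it tries
to show is that a stream-returning write has no return value through which to
hand back the new FlowFile version, so the session has to record that version
itself and accept the caller's older reference afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class OpenStreamSketch {
    static String roundTrip() {
        ToySession session = new ToySession();
        ToyFlowFile out = session.create();
        try {
            // The wrapping library (GZIP here) holds on to one underlying stream.
            try (GZIPOutputStream gzip = new GZIPOutputStream(session.write(out))) {
                gzip.write("first record\n".getBytes(StandardCharsets.UTF_8));
                // ... keep reading the input, possibly opening further outputs ...
                gzip.write("second record\n".getBytes(StandardCharsets.UTF_8));
            }
            // 'out' is now a stale reference (version 0); the session resolves it anyway.
            byte[] stored = session.contentOf(out);
            try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(stored))) {
                String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return session.mostRecent(out).version + ":" + text;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip());
    }
}

// Hypothetical stand-in for an immutable FlowFile: an id plus a version counter.
final class ToyFlowFile {
    final long id;
    final int version;

    ToyFlowFile(long id, int version) {
        this.id = id;
        this.version = version;
    }
}

// Hypothetical stand-in for a session; this is NOT the NiFi ProcessSession API.
final class ToySession {
    private final Map<Long, ToyFlowFile> latest = new HashMap<>();
    private final Map<Long, byte[]> content = new HashMap<>();
    private long nextId = 1;

    ToyFlowFile create() {
        ToyFlowFile ff = new ToyFlowFile(nextId++, 0);
        latest.put(ff.id, ff);
        content.put(ff.id, new byte[0]);
        return ff;
    }

    // Returns a stream that stays open for as long as the caller needs it.
    // The new version is recorded inside the session when the stream is closed.
    OutputStream write(ToyFlowFile ff) {
        final long id = ff.id;
        return new ByteArrayOutputStream() {
            private boolean closed;

            @Override
            public void close() throws IOException {
                if (closed) {
                    return; // a second close must not create another version
                }
                closed = true;
                super.close();
                content.put(id, toByteArray());
                latest.put(id, new ToyFlowFile(id, latest.get(id).version + 1));
            }
        };
    }

    // Relaxed lookup: any version of a FlowFile resolves to the newest one.
    ToyFlowFile mostRecent(ToyFlowFile ff) {
        return latest.get(ff.id);
    }

    byte[] contentOf(ToyFlowFile ff) {
        return content.get(ff.id);
    }
}
```

In this toy model the caller never receives the updated FlowFile object, which
is exactly why a stream-returning write only makes sense if the session is
willing to look up the most recent version on the caller's behalf.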
Also, more generally, we see many e-mails from users where their custom
processors throw an Exception because the FlowFiles given are not the most
recent version. I've even done this myself on a few occasions in error-handling
cases. Since we already know what the most recent version of the FlowFile is,
it makes sense to me to just use that version.
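
The failure mode mentioned above can be illustrated with another small,
self-contained sketch. Again, none of this is NiFi code: StaleVersionSketch and
its nested ToyFlowFile and ToySession are hypothetical, and the toy throws
IllegalStateException where NiFi would throw FlowFileHandlingException. A
"strict" flag mimics the current check, and the non-strict mode approximates
the proposed behavior of always using the most recent version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StaleVersionSketch {
    // Hypothetical immutable FlowFile: every change yields a new object.
    static final class ToyFlowFile {
        final long id;
        final int version;
        final Map<String, String> attributes;

        ToyFlowFile(long id, int version, Map<String, String> attributes) {
            this.id = id;
            this.version = version;
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        }
    }

    // Hypothetical session. strict = today's check; relaxed = the proposal.
    static final class ToySession {
        private final boolean strict;
        private final Map<Long, ToyFlowFile> latest = new HashMap<>();
        private long nextId = 1;

        ToySession(boolean strict) {
            this.strict = strict;
        }

        ToyFlowFile create() {
            ToyFlowFile ff = new ToyFlowFile(nextId++, 0, new HashMap<>());
            latest.put(ff.id, ff);
            return ff;
        }

        ToyFlowFile putAttribute(ToyFlowFile ff, String key, String value) {
            ToyFlowFile current = resolve(ff);
            Map<String, String> attrs = new HashMap<>(current.attributes);
            attrs.put(key, value);
            ToyFlowFile updated = new ToyFlowFile(current.id, current.version + 1, attrs);
            latest.put(updated.id, updated);
            return updated;
        }

        ToyFlowFile resolve(ToyFlowFile ff) {
            ToyFlowFile current = latest.get(ff.id);
            if (strict && current != ff) {
                // Stand-in for NiFi's FlowFileHandlingException.
                throw new IllegalStateException(
                        "FlowFile " + ff.id + " is not the most recent version");
            }
            return current;
        }
    }

    // The easy mistake: putAttribute's return value is dropped, so the list
    // keeps holding version 0 of every FlowFile.
    static int run(boolean strict) {
        ToySession session = new ToySession(strict);
        List<ToyFlowFile> flowFiles = new ArrayList<>();
        flowFiles.add(session.create());
        flowFiles.add(session.create());

        for (ToyFlowFile ff : flowFiles) {
            session.putAttribute(ff, "batch", "a"); // result ignored
        }
        for (ToyFlowFile ff : flowFiles) {
            session.putAttribute(ff, "state", "done"); // 'ff' is stale here
        }
        return session.resolve(flowFiles.get(0)).version;
    }

    public static void main(String[] args) {
        System.out.println("relaxed: version " + run(false));
        try {
            run(true);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

In strict mode the second loop fails on the first stale reference; in relaxed
mode both attribute updates are applied to the newest version, because the
session already knows which version that is.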
> Consider relaxing the constraint that ProcessSession enforces we give it the
> most recent version of a FlowFile
> --------------------------------------------------------------------------------------------------------------
>
> Key: NIFI-3860
> URL: https://issues.apache.org/jira/browse/NIFI-3860
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
>
> Currently, when we call methods on ProcessSession to access or modify a
> FlowFile and the FlowFile we pass in is not the most recent version, the
> ProcessSession will roll itself back and throw a FlowFileHandlingException
> with the message "<FlowFile> is not the most recent version of this FlowFile
> within this session". This was done to ensure that Processor developers know
> what they are doing and always have the most recent version of a FlowFile.
> However, this comes with a few downsides:
> * It can result in code being complex in error-handling cases when we need to
> ensure that no matter what we hold the most recent version of a FlowFile
> * It's easy to call session.putAttribute and forget to store the most recent
> version of the FlowFile, which gets returned - this is most problematic when
> dealing with a Collection<FlowFile>.
> * We have a method for ProcessSession.read(FlowFile) that returns an
> InputStream. However, we don't have a corresponding write() method. This is
> due to the fact that once we finish writing to the FlowFile, we would have to
> return the most up-to-date version of the FlowFile and there's no way to do
> that if returning an OutputStream.
> We should consider relaxing this constraint and instead just always make use
> of the most recent version of the FlowFile, even if an older version of the
> FlowFile is passed in.