[jira] [Commented] (NIFI-3225) Abstract Processor type that batches session.get() and session.commit() calls

Mark Payne (JIRA) Mon, 19 Dec 2016 14:00:07 -0800

    [ 
https://issues.apache.org/jira/browse/NIFI-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762429#comment-15762429
 ]


Mark Payne commented on NIFI-3225:
----------------------------------

[[email protected]] - I'd be very hesitant to modify this. The changes 
you describe here would have the same effect as the current model if we always 
call session.commit(). However, any time that the session is rolled back, it 
would behave a bit differently. Namely, we'd roll back more than we intended 
because session.checkpoint() hadn't been called. The idea behind the session is 
that if we call session.rollback(), it should roll back everything since the 
last time session.commit() or session.rollback() was called, but no more. 
Changing how frequently checkpoint() is called would change that.
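
To make the rollback-scope concern concrete, here is a toy model. It is not 
NiFi's ProcessSession implementation; the class RollbackScopeSketch, its two 
maps, and the "ff-N" flowfile ids are all hypothetical names chosen for 
illustration. It only assumes that checkpoint() moves pending work out of 
rollback's reach and that rollback() discards whatever is still pending.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT NiFi's ProcessSession. All names here are hypothetical.
class RollbackScopeSketch {
    // Work done since the last checkpoint or rollback.
    private final Map<String, String> records = new HashMap<>();
    // Work that has been checkpointed and is out of rollback's reach.
    private final Map<String, String> checkpointed = new HashMap<>();

    void process(String flowFileId) {
        records.put(flowFileId, "processed");
    }

    // Moves pending work from one HashMap to the other.
    void checkpoint() {
        checkpointed.putAll(records);
        records.clear();
    }

    // Discards only the work that has not been checkpointed.
    void rollback() {
        records.clear();
    }

    int retained() {
        return checkpointed.size();
    }

    // Processes `count` flowfiles; the last one "fails" and triggers rollback().
    // Returns how many flowfiles survive the rollback.
    static int retainedAfterFailure(int count, boolean checkpointEach) {
        RollbackScopeSketch session = new RollbackScopeSketch();
        for (int i = 0; i < count; i++) {
            session.process("ff-" + i);
            if (i == count - 1) {
                session.rollback();
            } else if (checkpointEach) {
                session.checkpoint();
            }
        }
        return session.retained();
    }

    public static void main(String[] args) {
        // Checkpoint after every flowfile: only the failing one is lost.
        System.out.println(retainedAfterFailure(3, true));
        // No intermediate checkpoints: the rollback discards the whole batch.
        System.out.println(retainedAfterFailure(3, false));
    }
}
```

In this sketch, skipping the per-flowfile checkpoint() widens what a single 
rollback() discards, which is the behavioral change described above.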

I'd also recommend digging into this with a profiler, such as VisualVM and/or 
YourKit. Because checkpoint() is going to be called for every flowfile in the 
case that you describe, it is reasonable to expect it to show up very 
frequently when using a sampler instead of a profiler.

As you said, session.checkpoint() is quite cheap, but not free. I think it's 
required for correctness, though. We may be able to improve its efficiency 
somewhat if its performance is concerning. The majority of its work is 
transferring data from one HashMap to another. We may be able to use some other 
type of data structure to be more efficient here, or perhaps tune the HashMap 
with non-default values for the constructor arguments.
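
As a rough illustration of the HashMap tuning idea, the sketch below presizes 
a map so that a known number of entries can be inserted without triggering a 
resize. The class and method names (PresizedMapSketch, capacityFor, presized) 
are hypothetical, and whether this helps checkpoint() in practice would need 
to be confirmed by measurement.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: names are hypothetical and not taken from NiFi.
class PresizedMapSketch {
    // The JDK default load factor, passed explicitly to show the second argument.
    static final float LOAD_FACTOR = 0.75f;

    // Smallest initial capacity that holds expectedEntries without a resize.
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / (double) LOAD_FACTOR);
    }

    // Builds a HashMap using the two-argument (initialCapacity, loadFactor) constructor.
    static <K, V> Map<K, V> presized(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries), LOAD_FACTOR);
    }

    public static void main(String[] args) {
        Map<String, String> target = presized(100);
        for (int i = 0; i < 100; i++) {
            target.put("ff-" + i, "record");
        }
        System.out.println(target.size());
    }
}
```

A lower load factor trades memory for fewer collisions, while a larger initial 
capacity avoids rehashing during bulk transfers such as putAll().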

> Abstract Processor type that batches session.get() and session.commit() calls
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-3225
>                 URL: https://issues.apache.org/jira/browse/NIFI-3225
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Bryan Rosander
>            Assignee: Bryan Rosander
>            Priority: Minor
>         Attachments: after.png, before.png
>
>
> For processors that are stateless and support batching, it should be safe to 
> get and process multiple input FlowFiles for each onTrigger() call.  
> This should amortize the cost of session.get(), session.checkpoint(), 
> session.commit() as well as any setup in onTrigger() that isn't dependent on 
> the FlowFile(s) attributes or content.
> An AbstractBatchingProcessor type should reduce boilerplate code in candidate 
> processors and encourage uniform configurability via a property to control 
> batch size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
