[ 
https://issues.apache.org/jira/browse/NIFI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275880#comment-14275880
 ] 

Mark Payne commented on NIFI-238:
---------------------------------

Ryan,

Can you explain what you mean by "I'm not clear on why I would use some method 
calls?"

I'll try to clear up some things here without going into as much detail as you 
would see in a developer guide, which we're working on.

The difference between ProcessContext and ProcessSession, in a nutshell, is 
that the session provides access to data (FlowFiles), while the context provides 
information about the environment and configuration (processor properties, etc.)

context.yield() is a way to indicate that there's nothing useful that a 
Processor can do, so it should not be triggered to run for a bit. For example, 
if you are pulling from an external source and you know there's no data, you 
can call context.yield() to have the framework essentially "pause" your 
Processor so that you don't abuse the remote resource by continually asking for 
data. The amount of time that the Processor is "paused" is controlled in the 
Processor configuration dialog ("Yield Duration") with a default of 1 second.
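
To make the "pause" concrete, here is a rough, self-contained sketch of the idea. Nothing in it is NiFi code: Scheduler, requestYield (which plays the role of context.yield()), mayTrigger and countPolls are hypothetical stand-ins, and the 100 ms tick is an arbitrary assumption made only so the simulation has a clock.

```java
import java.util.Queue;

class YieldSketch {
    /** Hypothetical stand-in for the framework's bookkeeping for one Processor. */
    static final class Scheduler {
        private final long yieldDurationMillis;
        private long yieldedUntilMillis = 0;

        Scheduler(long yieldDurationMillis) {
            this.yieldDurationMillis = yieldDurationMillis;
        }

        /** Plays the role of context.yield(): no triggers until the duration passes. */
        void requestYield(long nowMillis) {
            yieldedUntilMillis = nowMillis + yieldDurationMillis;
        }

        boolean mayTrigger(long nowMillis) {
            return nowMillis >= yieldedUntilMillis;
        }
    }

    /**
     * Simulates a Processor that is offered a trigger every 100 ms and pulls
     * one item from a remote source each time it runs. Returns how many times
     * the remote source was actually asked for data.
     */
    static int countPolls(Queue<String> remote, long yieldDurationMillis, long totalMillis) {
        Scheduler scheduler = new Scheduler(yieldDurationMillis);
        int polls = 0;
        for (long now = 0; now < totalMillis; now += 100) {
            if (!scheduler.mayTrigger(now)) {
                continue; // still "paused", so the remote source is left alone
            }
            polls++;
            if (remote.poll() == null) {
                scheduler.requestYield(now); // nothing to do, so back off
            }
        }
        return polls;
    }
}
```

With an always-empty source, a 1000 ms yield and 3000 ms of simulated time, countPolls asks the source 3 times. With a yield of 0 it asks 30 times, which is the kind of abuse of the remote resource that yielding is meant to avoid.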

Error handling is definitely something we want to address in the developer 
guide. Generally, calls to session.write and session.read will be wrapped in 
a try/catch where you catch ProcessException. Any IOException that is thrown by 
your callback will be wrapped in a ProcessException, and this is often what 
you want to catch. If any Exception (really any Throwable) escapes your 
onTrigger method, the framework will roll back the session. If that Throwable 
is not an instance of ProcessException, the framework will also 
"administratively yield" your Processor. This is done because anything other 
than a ProcessException escaping is assumed to be a bug, and such bugs can 
lead to Processors consuming large amounts of resources without accomplishing 
anything (do a bunch of work, then throw an Exception, roll back, and repeat). 
The administrative yield at least prevents the Processor from completely 
consuming your resources.
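
The rules above can be sketched in plain Java. This is only a model of the behavior described here, not the framework's implementation. The nested ProcessException is a stand-in that borrows the name of NiFi's exception, and IoCallback, write, writeOrRouteToFailure, trigger, Outcome and the "success"/"failure" strings are all hypothetical names made up for the illustration.

```java
import java.io.IOException;

class RollbackSketch {
    /** Stand-in that borrows the name of NiFi's exception; not the real class. */
    static class ProcessException extends RuntimeException {
        ProcessException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical callback shape: the body is allowed to throw IOException. */
    interface IoCallback {
        void process() throws IOException;
    }

    /** Models session.write/session.read: an IOException comes out wrapped. */
    static void write(IoCallback callback) {
        try {
            callback.process();
        } catch (IOException e) {
            throw new ProcessException("I/O failure in callback", e);
        }
    }

    /** Processor-side pattern: catch the wrapped exception and handle it. */
    static String writeOrRouteToFailure(IoCallback callback) {
        try {
            write(callback);
            return "success";
        } catch (ProcessException e) {
            return "failure"; // e.g. route the FlowFile somewhere for failures
        }
    }

    /** What the framework is described as doing after one onTrigger call. */
    enum Outcome { COMPLETED, ROLLED_BACK, ROLLED_BACK_AND_YIELDED }

    /** Models the framework's reaction to whatever escapes onTrigger. */
    static Outcome trigger(Runnable onTrigger) {
        try {
            onTrigger.run();
            return Outcome.COMPLETED;
        } catch (ProcessException e) {
            return Outcome.ROLLED_BACK; // expected kind of failure
        } catch (Throwable t) {
            return Outcome.ROLLED_BACK_AND_YIELDED; // assumed to be a bug
        }
    }
}
```

In this model, a callback that throws IOException leads to ROLLED_BACK if the Processor lets the wrapped exception escape, while something like an IllegalStateException escaping leads to ROLLED_BACK_AND_YIELDED. A Processor that catches the ProcessException itself, as in writeOrRouteToFailure, never reaches either branch.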

Regarding backpressure: After you draw a connection between two processors, you 
can right-click on the connection and click Configure. There, you can configure 
a backpressure threshold in terms of number of FlowFiles and/or size of 
FlowFiles in the queue. Once this value is reached, the source of the 
connection will no longer be triggered to run until the queue drops back down 
below this threshold. This is a "soft limit." I.e., if the source of the 
connection is a Processor that generates 1000 FlowFiles, and the connection is 
almost full, it will still put all 1000 FlowFiles onto the connection's queue, 
but it will then stop being triggered for a while.
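
A small model may help show why the limit is "soft". The classes below are hypothetical stand-ins rather than NiFi classes, and only the FlowFile-count threshold is modeled (the size threshold would work the same way). The key assumption, taken from the description above, is that the threshold is checked before the source is triggered, not for each FlowFile it emits.

```java
class BackpressureSketch {
    /** Hypothetical stand-in for a connection and its queue. */
    static final class Connection {
        private final int thresholdCount;
        private int queued;

        Connection(int thresholdCount, int initiallyQueued) {
            this.thresholdCount = thresholdCount;
            this.queued = initiallyQueued;
        }

        /** The source is only triggered while the queue is below the threshold. */
        boolean sourceMayRun() {
            return queued < thresholdCount;
        }

        /** Enqueueing never rejects anything; that is what makes the limit soft. */
        void enqueue(int flowFiles) {
            queued += flowFiles;
        }

        /** The downstream Processor pulling FlowFiles off the queue. */
        void dequeue(int flowFiles) {
            queued = Math.max(0, queued - flowFiles);
        }

        int queued() {
            return queued;
        }
    }

    /** Tries to trigger a source that emits batchSize FlowFiles per run. */
    static boolean triggerSource(Connection connection, int batchSize) {
        if (!connection.sourceMayRun()) {
            return false; // backpressure applied: the source is not triggered
        }
        connection.enqueue(batchSize); // the whole batch goes on the queue
        return true;
    }
}
```

Starting with a threshold of 10000 and 9999 FlowFiles queued, one trigger of a source that emits 1000 FlowFiles leaves 10999 in the queue. The next attempt to trigger the source is refused, and it stays refused until the destination has pulled the queue back below 10000.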

The user-guide has an explanation of the scheduling: 
http://nifi.incubator.apache.org/docs/nifi-docs/user-guide.html#configuring-a-processor

There's a "Scheduling Tab" section that is a sub-section of the "Configuring a 
Processor" section.

Hopefully this clears some things up instead of muddying the waters. Fire back 
with any other questions or if there's something that isn't clear here...

> Add processors to write datasets using Kite
> -------------------------------------------
>
>                 Key: NIFI-238
>                 URL: https://issues.apache.org/jira/browse/NIFI-238
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Ryan Blue
>
> I think it would be great to have a set of processors that parse incoming 
> flow files and add the data to Kite datasets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
