[https://issues.apache.org/jira/browse/NIFI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275880#comment-14275880]
Mark Payne commented on NIFI-238:
---------------------------------
Ryan,
Can you explain what you mean by "I'm not clear on why I would use some method
calls"?
I'll try to clear up some things here without going into as much detail as you
would see in a developer guide, which we're working on.
The difference between ProcessContext and ProcessSession, in a nutshell, is that
the session provides access to data (FlowFiles), while the context provides
information about the environment and configuration (processor properties, etc.).
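As a rough illustration of that split, here is a toy Java model. It does not use NiFi's API: ToyContext, ToySession, the "Prefix" property, and the representation of a FlowFile as a plain String are all hypothetical stand-ins. In a real processor the same two roles are played by the ProcessContext and ProcessSession arguments passed to onTrigger.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Toy model, NOT NiFi's API: every class and property name here is hypothetical.
class ContextVsSession {

    /** Stand-in for ProcessContext: answers configuration questions only. */
    static final class ToyContext {
        private final Map<String, String> properties;

        ToyContext(Map<String, String> properties) {
            this.properties = properties;
        }

        String getProperty(String name) {
            return properties.get(name);
        }
    }

    /** Stand-in for ProcessSession: hands out and accepts data (FlowFiles as Strings). */
    static final class ToySession {
        final Deque<String> incoming = new ArrayDeque<>();
        final List<String> transferred = new ArrayList<>();

        String get() {
            return incoming.poll(); // null when nothing is queued
        }

        void transfer(String flowFile) {
            transferred.add(flowFile);
        }
    }

    /** Processor logic: configuration comes from the context, data moves through the session. */
    static void onTrigger(ToyContext context, ToySession session) {
        String prefix = context.getProperty("Prefix"); // hypothetical property
        String flowFile = session.get();
        if (flowFile == null) {
            return; // no data this time
        }
        session.transfer(prefix + flowFile);
    }

    public static void main(String[] args) {
        ToyContext context = new ToyContext(Map.of("Prefix", "kite:"));
        ToySession session = new ToySession();
        session.incoming.add("record-1");
        onTrigger(context, session);
        System.out.println(session.transferred); // prints [kite:record-1]
    }
}
```

The point of the separation is that onTrigger never reads configuration from the session or data from the context.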
context.yield() is a way to indicate that there's nothing useful the Processor
can do right now, so the framework should not trigger it to run for a while. For
example, if you are pulling from an external source and you know there's no
data, you can call context.yield() to have the framework essentially "pause"
your Processor so that you don't abuse the remote resource by continually asking
for data. The amount of time that the Processor is "paused" is controlled in the
Processor configuration dialog ("Yield Duration") with a default of 1 second.
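To make the effect of yielding concrete, here is a toy simulation, assuming a hypothetical scheduler that would otherwise trigger the processor every 10 ms against a remote source that has no data. EmptySourcePoller, remoteCalls, and the tick interval are illustrative assumptions; NiFi's real scheduler is more involved.

```java
// Toy model, NOT NiFi's scheduler: names and timings are hypothetical.
class YieldModel {

    /** A processor polling a remote source that currently has no data. */
    static final class EmptySourcePoller {
        int remoteCalls = 0;

        /** Returns true when it found nothing, i.e. when it would call context.yield(). */
        boolean onTrigger() {
            remoteCalls++; // each trigger asks the remote system for data
            return true;   // nothing available, so yield
        }
    }

    /**
     * Triggers the processor every tickMillis for totalMillis, except while it is
     * yielded. Returns how many remote calls were made.
     */
    static int remoteCalls(long totalMillis, long tickMillis, long yieldMillis) {
        EmptySourcePoller poller = new EmptySourcePoller();
        long yieldedUntil = 0;
        for (long now = 0; now < totalMillis; now += tickMillis) {
            if (now < yieldedUntil) {
                continue; // still "paused": the scheduler skips this processor
            }
            if (poller.onTrigger()) {
                yieldedUntil = now + yieldMillis;
            }
        }
        return poller.remoteCalls;
    }

    public static void main(String[] args) {
        // 3 seconds of 10 ms scheduling ticks against an empty source.
        System.out.println("1 s yield: " + remoteCalls(3_000, 10, 1_000)); // prints "1 s yield: 3"
        System.out.println("no yield: " + remoteCalls(3_000, 10, 0));      // prints "no yield: 300"
    }
}
```

With the default 1 second Yield Duration the remote system sees 3 requests instead of 300 over the same 3 seconds.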
Error handling is definitely something we want to address in the developer
guide. Generally, calls to session.write and session.read will be surrounded by
a try/catch where you catch ProcessException. Any IOException that is thrown by
your callback will be wrapped in a ProcessException, and this is often what
you'll want to catch. If any Exception (really any Throwable) escapes your
onTrigger method, the framework will roll back the session. If that Throwable
is not an instance of ProcessException, the framework will also
"administratively yield" your Processor. This is done because anything other
than a ProcessException escaping onTrigger is assumed to be a bug, and such bugs
can sometimes lead to Processors consuming large amounts of resources without
accomplishing anything (do a bunch of work, throw an Exception, roll back, and
repeat). Administratively yielding at least prevents the Processor from
completely consuming your resources.
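The rules above can be sketched as a small toy model. It only mirrors the decision described in this comment, not how NiFi implements rollback or administrative yield: ToyProcessException stands in for ProcessException, write() for session.write, trigger() for the framework calling onTrigger, and the Outcome names are hypothetical.

```java
import java.io.IOException;

// Toy model, NOT NiFi's code: all names are hypothetical stand-ins.
class ErrorHandlingModel {

    /** Stand-in for NiFi's ProcessException. */
    static class ToyProcessException extends RuntimeException {
        ToyProcessException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** A stream callback that may throw IOException. */
    interface IoCallback {
        void process() throws IOException;
    }

    /** Stand-in for session.write: any IOException surfaces wrapped in a ToyProcessException. */
    static void write(IoCallback callback) {
        try {
            callback.process();
        } catch (IOException e) {
            throw new ToyProcessException("write failed", e);
        }
    }

    enum Outcome { COMMITTED, ROLLED_BACK, ROLLED_BACK_AND_ADMIN_YIELDED }

    /** Stand-in for the framework invoking onTrigger once. */
    static Outcome trigger(Runnable onTrigger) {
        try {
            onTrigger.run();
            return Outcome.COMMITTED;
        } catch (ToyProcessException e) {
            return Outcome.ROLLED_BACK;                   // anticipated failure: roll back only
        } catch (Throwable t) {
            return Outcome.ROLLED_BACK_AND_ADMIN_YIELDED; // presumed bug: roll back and back off
        }
    }

    public static void main(String[] args) {
        IoCallback failingWrite = () -> { throw new IOException("disk full"); };

        // Processor that catches the ProcessException and routes the data itself.
        StringBuilder route = new StringBuilder();
        Outcome handled = trigger(() -> {
            try {
                write(failingWrite);
                route.append("success");
            } catch (ToyProcessException e) {
                route.append("failure"); // e.g. transfer to a "failure" relationship
            }
        });

        Outcome unhandled = trigger(() -> write(failingWrite));
        Outcome bug = trigger(() -> { throw new IllegalStateException("bug"); });

        // prints "COMMITTED failure / ROLLED_BACK / ROLLED_BACK_AND_ADMIN_YIELDED"
        System.out.println(handled + " " + route + " / " + unhandled + " / " + bug);
    }
}
```

Catching the wrapped exception inside onTrigger lets the processor decide where the data goes; letting a non-ProcessException escape is what triggers the extra back-off.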
Regarding backpressure: after you draw a connection between two processors, you
can right-click on the connection and click Configure. There, you can configure
a backpressure threshold in terms of the number of FlowFiles and/or the total
size of the FlowFiles in the queue. Once this threshold is reached, the source of
the connection will no longer be triggered to run until the queue drops back
below it. This is a "soft limit": if the source of the connection is a Processor
that generates 1000 FlowFiles in a single run and the connection is almost full,
it will still put all 1000 FlowFiles onto the connection's queue, but it will
then stop being triggered until the queue drops below the threshold again.
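The soft-limit behavior can be sketched with a toy queue, assuming a hypothetical object-count threshold of 10,000, a queue already holding 9,990 FlowFiles, a source that emits 1,000 per trigger, and a downstream processor that drains 100 per round. Connection and run are illustrative names, not NiFi classes.

```java
// Toy model, NOT NiFi's implementation: names and numbers are hypothetical.
class BackpressureModel {

    /** A connection queue with a FlowFile-count backpressure threshold. */
    static final class Connection {
        final int threshold;
        int queued;
        int peak;

        Connection(int threshold, int initiallyQueued) {
            this.threshold = threshold;
            this.queued = initiallyQueued;
            this.peak = initiallyQueued;
        }

        boolean backpressureEngaged() {
            return queued >= threshold;
        }

        void enqueue(int n) {
            queued += n;
            peak = Math.max(peak, queued);
        }

        void dequeue(int n) {
            queued = Math.max(0, queued - n);
        }
    }

    /** Runs scheduling rounds and returns how many times the source was triggered. */
    static int run(Connection conn, int rounds, int batch, int drainPerRound) {
        int triggers = 0;
        for (int r = 0; r < rounds; r++) {
            if (!conn.backpressureEngaged()) {
                conn.enqueue(batch); // soft limit: the whole batch is accepted
                triggers++;
            }
            conn.dequeue(drainPerRound); // downstream processor consumes some
        }
        return triggers;
    }

    public static void main(String[] args) {
        Connection conn = new Connection(10_000, 9_990);
        int triggers = run(conn, 5, 1_000, 100);
        // prints "triggers=1, peak=10990, queued=10490"
        System.out.println("triggers=" + triggers + ", peak=" + conn.peak + ", queued=" + conn.queued);
    }
}
```

The queue overshoots the threshold by up to one batch, and the source is only triggered again once draining brings the queue back under 10,000.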
The User Guide has an explanation of scheduling:
http://nifi.incubator.apache.org/docs/nifi-docs/user-guide.html#configuring-a-processor
There's a "Scheduling Tab" section that is a sub-section of the "Configuring a
Processor" section.
Hopefully this clears some things up instead of muddying the waters. Fire back
with any other questions or if there's something that isn't clear here...
> Add processors to write datasets using Kite
> -------------------------------------------
>
> Key: NIFI-238
> URL: https://issues.apache.org/jira/browse/NIFI-238
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Ryan Blue
>
> I think it would be great to have a set of processors that parse incoming
> flow files and add the data to Kite datasets.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)