Joe,

Thank you for your input.

I'd like to avoid creating multiple threads.  Would it be possible to loop 
through a ProcessSession again once it's committed?  For example, take a total 
of 1000 requests and break them down into batches of 100.  Create/transfer one 
flowfile per request, then once 100 requests are processed, commit the session 
and loop through again.  Would it be better to still create one flowfile per 
request, but commit them in batches?
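
A rough sketch of the loop described above, in plain Java so it runs without 
NiFi on the classpath.  The UnitOfWork interface, RecordingUnitOfWork, and 
processInBatches are hypothetical names made up for illustration; in a real 
processor the transfer and commit calls would presumably map onto 
ProcessSession.transfer and ProcessSession.commit, which is an assumption 
about how the idea would be wired in, not tested NiFi code.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCommitSketch {

    /** Hypothetical stand-in for a session; this is NOT the NiFi API. */
    interface UnitOfWork {
        void transfer(String request);
        void commit();
    }

    /** Test double that records how many items each commit covered. */
    static class RecordingUnitOfWork implements UnitOfWork {
        private final List<String> pending = new ArrayList<>();
        final List<Integer> committedBatchSizes = new ArrayList<>();

        @Override
        public void transfer(String request) {
            pending.add(request);
        }

        @Override
        public void commit() {
            committedBatchSizes.add(pending.size());
            pending.clear();
        }
    }

    /** Transfers one item per request and commits after every batchSize items. */
    static void processInBatches(List<String> requests, int batchSize, UnitOfWork work) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int inBatch = 0;
        for (String request : requests) {
            work.transfer(request);
            inBatch++;
            if (inBatch == batchSize) {
                work.commit();
                inBatch = 0;
            }
        }
        // Commit whatever is left when the total is not a multiple of batchSize.
        if (inBatch > 0) {
            work.commit();
        }
    }

    public static void main(String[] args) {
        List<String> requests = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            requests.add("request-" + i);
        }
        RecordingUnitOfWork work = new RecordingUnitOfWork();
        processInBatches(requests, 100, work);
        System.out.println(work.committedBatchSizes.size() + " commits");
    }
}
```

With 1000 requests and a batch size of 100 this performs 10 commits on a 
single thread; a failure partway through would only affect the batch that 
had not been committed yet.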

Thanks
Kumiko

-----Original Message-----
From: Joe Witt [mailto:[email protected]] 
Sent: Thursday, May 26, 2016 7:17 PM
To: [email protected]
Subject: Re: Best way to process the processor requests in batch

Kumiko

A couple of quick thoughts to share.  You can absolutely code your processor to 
operate in batches, and you can of course multi-thread the processor.  The 
general unit-of-work concept Apache NiFi supports is called a ProcessSession; 
you can operate on as many flow files as you need in that session and then 
commit it as one batch.  NiFi will automatically track/record a lot of very 
nice information at the process session level.  In addition, NiFi will capture 
provenance information, which is itself useful for understanding specific 
items that went through that flow, their latencies, and such.  Beyond these 
options there is also a concept of counters, which you can use to capture, 
generally for development purposes, interesting things you'd like to observe 
over time.  You'll also want to get a good handle on what performance you 
should expect when interacting with the web service independent of NiFi, so 
you have a good baseline to work from.

The quota question is also one where you have choices and design decisions to 
make.  You can bake this quota handling logic into your processor itself, or 
you could wire in an existing or new processor that specifically handles the 
quota/grouping logic you need; it would have relationships such as 
'within quota' and 'exceeds quota'.
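
As a minimal sketch of the quota decision just described, here is a plain-Java 
daily counter that picks one of the two routes.  The class DailyQuotaSketch, 
the Route enum, and the route method are hypothetical names for illustration, 
not NiFi classes, and the count lives only in memory, so it is assumed that 
losing it on restart is acceptable; a real processor would need to decide 
where that state is kept.

```java
import java.time.LocalDate;

public class DailyQuotaSketch {

    /** Mirrors the 'within quota' and 'exceeds quota' relationships. */
    enum Route { WITHIN_QUOTA, EXCEEDS_QUOTA }

    private final int dailyLimit;
    private LocalDate currentDay; // day the counter below applies to
    private int usedToday;

    DailyQuotaSketch(int dailyLimit) {
        if (dailyLimit < 0) {
            throw new IllegalArgumentException("dailyLimit must not be negative");
        }
        this.dailyLimit = dailyLimit;
    }

    /** Decides the route for one request made on the given day. */
    synchronized Route route(LocalDate today) {
        // Start a fresh count whenever the calendar day changes.
        if (!today.equals(currentDay)) {
            currentDay = today;
            usedToday = 0;
        }
        if (usedToday < dailyLimit) {
            usedToday++;
            return Route.WITHIN_QUOTA;
        }
        return Route.EXCEEDS_QUOTA;
    }

    public static void main(String[] args) {
        DailyQuotaSketch quota = new DailyQuotaSketch(100);
        LocalDate day = LocalDate.of(2016, 5, 26);
        Route last = null;
        // The 101st request on the same day goes past the limit of 100.
        for (int i = 0; i < 101; i++) {
            last = quota.route(day);
        }
        System.out.println(last);
    }
}
```

The method is synchronized so the count stays consistent if the processor is 
later run with more than one thread.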

I apologize for not giving a more precise response.  There are many ways to 
approach this, and the best trade-offs will depend on finer details.  As you 
advance with this, please feel free to ask more questions.  If you find things 
you wish were available and you think should exist in NiFi, we'd love to have 
your contribution in any form (ideas, code, JIRAs, etc.).

Thanks
Joe

On Thu, May 26, 2016 at 9:08 PM, Kumiko Yada <[email protected]> wrote:
> Hello,
>
> We implemented a custom processor similar to InvokeHTTP, in which part of 
> the URL can be replaced with values from a Context Data List property; it 
> then writes the weather data to the flowfile.  For example, the URL to get 
> the weather feed has to include the ZIP code, so the ZIP code appears as {0} 
> in the URL and is replaced with each ZIP code from the Context Data List 
> property.
>
> URL
> http://example{0}/weather
>
> Context Data List:
> 00000
> 11111
> 22222
>
> The processor will make the following requests:
> http://example{0}/weather
>
> http://example00000/weather
> http://example11111/weather
> http://example22222/weather
>
> This processor handles one request at a time and has a performance issue.  
> I'd like to modify it to process in batches.  What is the best way to process 
> in batches?  Also, does NiFi keep track of how many requests the processor 
> has processed?  If so, how does NiFi track this, and how long does it keep 
> the data?  I'd like to add the quota priorities to this processor to keep 
> track of the quota.  For example, if the weather feed allows only 100 
> requests a day, I don't want the processor to execute once the quota is 
> reached.
>
> Thanks
> Kumiko
