Thanks Bryan, I got your point. Yeah, we could try PublishKafkaRecord; in 
some of our other cases we had already used PublishKafkaRecord (CSV data to 
Avro) to send out records.
In the use case mentioned below, we thought of sending out a bunch of records 
(as we are not doing anything with the data) in one shot instead of sending 
one record at a time.

Thanks,
Hemantha

-----Original Message-----
From: Bryan Bende <[email protected]> 
Sent: Friday, March 1, 2019 7:52 PM
To: [email protected]
Subject: Re: SplitRecord behaviour

Hello,

Flow files are not transferred until the session they came from is committed. 
So imagine we periodically commit and some of the splits are transferred, then 
half way through a failure is encountered: the entire original flow file will 
be reprocessed, producing some of the same splits that were already sent out. 
The way it is implemented now, it is either completely successful or not, but 
never partially successful producing duplicates.
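The duplicate scenario above can be illustrated with a small sketch (plain Python, not NiFi code; the function and record names are made up for illustration):

```python
# Sketch: why committing splits part-way through a flow file would
# produce duplicates when the whole flow file is reprocessed on failure.

def split_with_partial_commits(records, fail_at, downstream):
    """Transfer each split as soon as it is produced (hypothetical behaviour)."""
    for i, rec in enumerate(records):
        if i == fail_at:
            raise RuntimeError("failure part-way through")
        downstream.append(rec)  # this split is already delivered downstream

def split_all_or_nothing(records, fail_at, downstream):
    """Hold all splits until the whole flow file is processed (actual behaviour)."""
    pending = []
    for i, rec in enumerate(records):
        if i == fail_at:
            raise RuntimeError("failure part-way through")
        pending.append(rec)
    downstream.extend(pending)  # commit everything at once

records = ["r1", "r2", "r3", "r4"]

# Partial commits: the first attempt fails half-way, the retry reprocesses
# the whole flow file, so r1 and r2 are delivered twice.
out = []
try:
    split_with_partial_commits(records, fail_at=2, downstream=out)
except RuntimeError:
    pass
split_with_partial_commits(records, fail_at=None, downstream=out)
print(out)  # ['r1', 'r2', 'r1', 'r2', 'r3', 'r4'] -- duplicates

# All-or-nothing: the failed attempt leaves nothing downstream; the retry
# delivers each record exactly once.
out = []
try:
    split_all_or_nothing(records, fail_at=2, downstream=out)
except RuntimeError:
    pass
split_all_or_nothing(records, fail_at=None, downstream=out)
print(out)  # ['r1', 'r2', 'r3', 'r4']
```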

Based on the description of your flow with the three processors you mentioned, 
I wouldn't bother using SplitRecord, just have ListenHTTP
-> PublishKafkaRecord. PublishKafkaRecord can be configured with the
same reader and writer you were using in SplitRecord, and it will read each 
record and send it to Kafka, without having to produce unnecessary flow files.
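For the JSON-in / XML-out case described below, the two-processor flow would look roughly like this (a sketch only; the broker, port, and topic values are placeholders, and exact property labels vary by NiFi version):

```
ListenHTTP
  Listening Port   : 8080            # placeholder

PublishKafkaRecord
  Kafka Brokers    : broker:9092     # placeholder
  Topic Name       : my-topic        # placeholder
  Record Reader    : JsonTreeReader
  Record Writer    : XMLRecordSetWriter
```

With a record-oriented reader/writer pair like this, each record is read from the incoming flow file and published individually, so memory use stays bounded regardless of the incoming file size.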

Thanks,

Bryan

On Fri, Mar 1, 2019 at 3:44 AM Kumara M S, Hemantha (Nokia -
IN/Bangalore) <[email protected]> wrote:
>
> Hi All,
>
> We have a use case where we receive huge JSON (file size might vary from 
> 1 GB to 50 GB) via HTTP, convert it to XML (the XML format is not fixed; any 
> other format is fine), and send it out using Kafka. The restriction here is 
> that CPU & RAM usage (once fixed, it should handle files of all sizes) 
> should not change based on incoming file size.
>
> We used ListenHTTP --> SplitRecord --> PublishKafka, but we have observed 
> that SplitRecord sends data to PublishKafka only after processing the whole 
> FlowFile. Is there any reason it was designed this way? Would it not be 
> better to send splits to the next processor after each configured number of 
> records instead of sending all splits in one shot?
>
>
> Regards,
> Hemantha
>
