[ 
https://issues.apache.org/jira/browse/NIFI-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698458#comment-14698458
 ] 

Mark Payne commented on NIFI-853:
---------------------------------

Toivo,

You are certainly right - we need the ability to do batch processing. What I'm 
thinking of doing is adding a new property - Batch Size. It would default to, 
say, 100 (I am saying 100 only because that's what Sqoop defaults to; this 
number could be changed if it makes sense).

We would then pull in that many FlowFiles and create a Map<String, 
PreparedStatement>. The key would be the SQL statement, and the 
PreparedStatement would then be the compiled statement. So we could then 
iterate over those 100 FlowFiles and for each one get the PreparedStatement 
that is appropriate or create a new one if none exists. We would then do an 
"addBatch" on that prepared statement with the appropriate parameters (from 
FlowFile attributes). Then, after we have done this for all FlowFiles in our 
batch, we can then do an executeBatch for each PreparedStatement.
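
A minimal sketch of the batching described above, using plain JDBC. The names 
PutSqlBatchSketch, SqlFlowFile and runBatch are hypothetical and are not part 
of the attached patches; SqlFlowFile stands in for a FlowFile whose content is 
the SQL statement and whose attributes supply the ordered parameter values. 
Transaction handling and routing of failed FlowFiles are left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PutSqlBatchSketch {

    /** Hypothetical stand-in for a FlowFile: SQL text plus ordered parameter values. */
    static final class SqlFlowFile {
        final String sql;
        final List<Object> params;

        SqlFlowFile(String sql, List<Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /**
     * Groups the batch by SQL text, adds one JDBC batch entry per FlowFile,
     * then executes each distinct statement once.
     * Returns the number of distinct statements that were executed.
     */
    static int runBatch(Connection conn, List<SqlFlowFile> batch) throws SQLException {
        // Key: the SQL statement. Value: the compiled statement for that SQL.
        Map<String, PreparedStatement> statements = new LinkedHashMap<>();
        try {
            for (SqlFlowFile flowFile : batch) {
                // Reuse the statement for this SQL, or compile one if none exists yet.
                PreparedStatement stmt = statements.get(flowFile.sql);
                if (stmt == null) {
                    stmt = conn.prepareStatement(flowFile.sql);
                    statements.put(flowFile.sql, stmt);
                }
                // JDBC parameter indexes start at 1.
                for (int i = 0; i < flowFile.params.size(); i++) {
                    stmt.setObject(i + 1, flowFile.params.get(i));
                }
                stmt.addBatch();
            }
            // One round of execution per distinct SQL statement.
            for (PreparedStatement stmt : statements.values()) {
                stmt.executeBatch();
            }
            return statements.size();
        } finally {
            for (PreparedStatement stmt : statements.values()) {
                stmt.close();
            }
        }
    }
}
```

With a batch of 100 FlowFiles that all carry the same INSERT, this compiles one 
statement, calls addBatch 100 times and calls executeBatch once.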

This will add a bit of complexity to the code but not very much. It still gives 
us our flexibility and provides some good throughput by batching the updates.

In terms of 1 vs. many FlowFiles, you are right - one is of course going to be 
less resource-intensive to process than many. However, in the 0.3.0 release we 
are hoping to get in some improvements that rework how we handle some of the 
repository implementations and allow us to get MUCH better performance when 
dealing with many small FlowFiles. I think that will help tremendously in this 
respect.

Thoughts on this approach?

> Create Processors to put JSON data to a Relational Database
> -----------------------------------------------------------
>
>                 Key: NIFI-853
>                 URL: https://issues.apache.org/jira/browse/NIFI-853
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 0.3.0
>
>         Attachments: 
> 0001-NIFI-853-Added-processors-ConvertFlatJSONToSQL-PutSQ.patch, 
> 0002-NIFI-853-Made-updates-to-processors.patch
>
>
> Most of the discussion/design for these processors happened in the comments 
> of NIFI-293, which was the initial ticket for implementing JDBC functionality 
> in NiFi, but was closed in a previous version, so this ticket was created to 
> do the work.
> The idea is to have a processor that will take in FlowFiles whose contents 
> are arbitrary SQL INSERT/UPDATE commands. The commands can be parameterized 
> with the parameters' values and types in FlowFile attributes.
> We then should have a processor that converts a JSON document into a SQL 
> command to either update or insert data into a database table. We will also 
> want some other processors in the future probably to handle other data types, 
> such as converting XML, CSV, Avro, etc. into SQL commands.
> This breakout gives us a nice coherence to the "do only one thing and do it 
> well" principle by separating the logic of handling all of the incoming 
> formats from the logic of updating the database.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
