[ https://issues.apache.org/jira/browse/NIFI-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714726#comment-17714726 ]
Rohit Pawar commented on NIFI-4385:
-----------------------------------
[~mattyb149] [~pwicks]
Hello, I have a question regarding QueryDatabaseTable. In MySQL I am trying to
use server-side cursor streaming by enabling useCursorFetch=true in the
connection URL, to process the result in batches and save NiFi from
memory-overflow-like problems. Example of what I am expecting: data is fetched
and sent in parallel to the next processor (which writes the data to a DB), so
that the memory load does not increase.
QueryDatabaseTable -> PutDatabaseRecord
Problem: through QueryDatabaseTable I get all the data of the query (about 50k
rows) in half a second, placed on the queue, and only then does the next
processor start processing it.
Why does QueryDatabaseTable place all the data at once? I have set the values
fetchrow=50, batch=5, flowfile=5.
Am I missing anything?
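For reference, a minimal JDBC sketch (outside NiFi) of the streaming behavior I
expect from MySQL cursor fetch. The connection URL, credentials, and table name
are placeholders; this assumes MySQL Connector/J, where useCursorFetch=true
plus a positive fetch size makes the driver pull rows from a server-side cursor
in chunks instead of buffering the whole result set:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorFetchDemo {
    public static void main(String[] args) throws Exception {
        // useCursorFetch=true tells MySQL Connector/J to open a server-side
        // cursor, so rows arrive in fetch-size batches instead of all at once
        String url = "jdbc:mysql://localhost:3306/mydb?useCursorFetch=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement stmt = conn.prepareStatement("SELECT * FROM big_table")) {
            stmt.setFetchSize(50); // matches the fetch size I set on the processor
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // advancing past the current batch triggers the next
                    // 50-row fetch from the server; memory use stays bounded
                    System.out.println(rs.getObject(1));
                }
            }
        }
    }
}
{code}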
> Adjust the QueryDatabaseTable processor for handling big tables.
> ----------------------------------------------------------------
>
> Key: NIFI-4385
> URL: https://issues.apache.org/jira/browse/NIFI-4385
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.3.0
> Reporter: Tim Späth
> Priority: Major
>
> When querying large database tables, the *QueryDatabaseTable* processor does
> not perform very well.
> The processor always performs the full query and then transfers all
> flowfiles as one list, instead of transferring them incrementally as the
> *ResultSet* fetches the next rows (if a fetch size is given).
> If you want to query a billion rows from a table,
> the processor will add all flowfiles to an ArrayList<FlowFile> in memory
> before transferring the whole list after the last row has been fetched by
> the ResultSet.
> I've checked the code in
> *org.apache.nifi.processors.standard.QueryDatabaseTable.java*
> and in my opinion it would be no big deal to move the session.transfer to the
> proper position in the code (into the while loop where the flowfile is added
> to the list) to achieve real _stream support_ (see the sketch below).
> There was also a bug report for this problem
> which resulted in adding the new property *Maximum Number of Fragments*,
> but this property just limits the results.
> Now you have to multiply *Max Rows Per Flow File* by *Maximum Number of
> Fragments* to get your limit (for example, 10,000 rows per flowfile x 100
> fragments caps the query at 1,000,000 rows),
> which is not really a solution to the original problem imho.
> Also, the workaround with the GenerateTableFetch and/or ExecuteSQL
> processors is much slower than using a database cursor or a ResultSet
> and streaming the rows into flowfiles directly on the queue.
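A simplified sketch of the change suggested above, assuming the NiFi
ProcessSession API (this is not the actual QueryDatabaseTable code; REL_SUCCESS
mirrors the processor's success relationship and writeRowsAsRecords is a
hypothetical serialization helper): transfer and commit each flowfile inside
the result-set loop instead of collecting them all in a list:

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

class StreamingTransferSketch {

    static final Relationship REL_SUCCESS =
            new Relationship.Builder().name("success").build();

    void streamResults(ProcessSession session, ResultSet rs, int maxRowsPerFlowFile)
            throws SQLException {
        while (rs.next()) {
            FlowFile flowFile = session.create();
            // serialize the current row plus up to (maxRowsPerFlowFile - 1)
            // following rows into the flowfile content
            flowFile = session.write(flowFile,
                    (OutputStream out) -> writeRowsAsRecords(rs, out, maxRowsPerFlowFile));
            session.transfer(flowFile, REL_SUCCESS); // transfer inside the loop...
            session.commit(); // ...and commit, so downstream processors start immediately
        }
    }

    // hypothetical helper: writes rows as records, wrapping any SQLException
    void writeRowsAsRecords(ResultSet rs, OutputStream out, int maxRows)
            throws IOException {
        // serialization elided in this sketch
    }
}
{code}

With the transfer inside the loop, memory holds at most one flowfile's worth of
rows at a time instead of the entire result set.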