[ https://issues.apache.org/jira/browse/NIFI-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714726#comment-17714726 ]
Rohit Pawar commented on NIFI-4385:
-----------------------------------

[~mattyb149] [~pwicks] Hello, I have a question regarding QueryDatabaseTable. In MySQL I am trying to use server-side cursor streaming by enabling useCursorFetch=true in the connection URL, so that rows are processed in batches and NiFi is protected from memory overflow. What I expect: data is fetched and sent on to the next processor (which writes it to a database) in parallel, so that memory load does not grow: QueryDatabaseTable -> PutDatabaseRecord.

Problem: through QueryDatabaseTable I get all the data for the query (about 50k rows) in half a second, placed in the queue at once, and only then does the next processor start working on it. Why does QueryDatabaseTable place all the data at once, given that I have set fetchrow=50, batch=5, flowfile=5? Am I missing anything?

> Adjust the QueryDatabaseTable processor for handling big tables.
> ----------------------------------------------------------------
>
> Key: NIFI-4385
> URL: https://issues.apache.org/jira/browse/NIFI-4385
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.3.0
> Reporter: Tim Späth
> Priority: Major
>
> When querying large database tables, the *QueryDatabaseTable* processor does
> not perform very well.
> The processor always performs the full query and then transfers all
> flowfiles as one list, instead of transferring them individually as the
> *ResultSet* fetches the next rows (if a fetch size is given).
> If you want to query a billion rows from a table, the processor will collect
> all flowfiles in an ArrayList<FlowFile> in memory before transferring the
> whole list once the last row has been fetched by the ResultSet.
> I've checked the code in
> *org.apache.nifi.processors.standard.QueryDatabaseTable.java*,
> and in my opinion it would be no big deal to move the session.transfer to a
> proper position in the code (into the while loop where the flowfile is added
> to the list) to achieve real _stream support_. There was also a bug report
> for this problem, which resulted in adding the new property *Maximum Number
> of Fragments*, but that property just limits the results.
> Now you have to multiply *Max Rows Per Flow File* by *Maximum Number of
> Fragments* to get your limit, which is not really a solution to the original
> problem, IMHO.
> Also, the workaround with the GenerateTableFetch and/or ExecuteSQL
> processors is much slower than using a database cursor or a ResultSet and
> streaming the rows into flowfiles directly in the queue.
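For reference, the cursor-streaming setup described in the comment above would look roughly like this on the plain JDBC side. This is a minimal sketch assuming MySQL Connector/J, whose useCursorFetch and defaultFetchSize URL properties are real; the host, database, credentials, and table name are placeholders.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorFetchExample {
    public static void main(final String[] args) throws Exception {
        // useCursorFetch=true enables MySQL server-side cursors;
        // defaultFetchSize=50 (matching the comment's setting) controls how
        // many rows each round trip fetches instead of the whole result set.
        // Host, database, user, password, and table are placeholders.
        final String url = "jdbc:mysql://localhost:3306/mydb"
                + "?useCursorFetch=true&defaultFetchSize=50";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
            while (rs.next()) {
                // Rows arrive in 50-row chunks; the driver fetches the next
                // chunk from the server on demand. Process each row here.
            }
        }
    }
}
{code}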
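And a minimal sketch of the change the report proposes, i.e. moving session.transfer into the fetch loop so each FlowFile is handed downstream as soon as it is full. This is not the actual QueryDatabaseTable source: writeRows is a hypothetical helper (the real processor serializes to Avro, not CSV), but ProcessSession.create/write/transfer/commit and Relationship.Builder are standard NiFi API.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

public class StreamingTransferSketch {

    static final Relationship REL_SUCCESS =
            new Relationship.Builder().name("success").build();

    // Transfers each FlowFile as soon as it is full instead of collecting
    // them all in an ArrayList and transferring after the last row is read.
    void streamResultSet(final ProcessSession session, final ResultSet rs,
                         final int maxRowsPerFlowFile) throws SQLException {
        while (rs.next()) {
            FlowFile flowFile = session.create();
            flowFile = session.write(flowFile,
                    out -> writeRows(rs, out, maxRowsPerFlowFile));
            session.transfer(flowFile, REL_SUCCESS);
            // Committing inside the loop makes the FlowFile visible to the
            // downstream processor immediately, so memory use stays bounded.
            session.commit();
        }
    }

    // Hypothetical helper, not part of NiFi's API: writes the current row
    // plus up to maxRows - 1 following rows as CSV lines.
    private void writeRows(final ResultSet rs, final OutputStream out,
                           final int maxRows) throws IOException {
        try {
            final int cols = rs.getMetaData().getColumnCount();
            int written = 0;
            do {
                final StringBuilder row = new StringBuilder();
                for (int c = 1; c <= cols; c++) {
                    if (c > 1) {
                        row.append(',');
                    }
                    row.append(rs.getString(c));
                }
                row.append('\n');
                out.write(row.toString().getBytes(StandardCharsets.UTF_8));
                written++;
            } while (written < maxRows && rs.next());
        } catch (final SQLException e) {
            throw new IOException(e);
        }
    }
}
{code}

The key design point is that the downstream processor can start consuming while the ResultSet is still being read, which is exactly the streaming behavior the comment above expects from the fetchrow/batch/flowfile settings.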