Tim Späth created NIFI-4385:
-------------------------------

             Summary: Adjust the QueryDatabaseTable processor for handling big tables.
                 Key: NIFI-4385
                 URL: https://issues.apache.org/jira/browse/NIFI-4385
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 1.3.0
            Reporter: Tim Späth


When querying large database tables, the *QueryDatabaseTable* processor does
not perform well.
The processor always executes the full query and then transfers all FlowFiles
as a single list, instead of transferring them incrementally as the
*ResultSet* fetches the next rows (if a fetch size is given).
If you query a billion rows from a table, the processor will accumulate all
FlowFiles in an ArrayList<FlowFile> in memory before transferring the whole
list after the last row has been fetched from the ResultSet.
I've checked the code in
*org.apache.nifi.processors.standard.QueryDatabaseTable.java*,
and in my opinion it would be no big deal to move the session.transfer to a
proper position in the code (into the while loop where the FlowFile is added
to the list) to achieve real _streaming support_; see the sketch below.
There was already a bug report for this problem, which resulted in the new
property *Maximum Number of Fragments*, but that property only limits the
results:
you now have to multiply *Max Rows Per Flow File* by *Maximum Number of
Fragments* to get your limit,
which is not really a solution to the original problem imho.
The workaround with the GenerateTableFetch and/or ExecuteSQL processors is
also much slower than using a database cursor or a ResultSet
and streaming the rows into FlowFiles directly in the queue.
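
To illustrate the idea, here is a minimal sketch of a per-batch transfer. It
is hypothetical and simplified (plain-text rows instead of Avro, no
maximum-value state tracking or error recovery), and the class and method
names are placeholders, not the actual processor code:

{code:java}
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

// Hypothetical sketch only -- NOT the actual QueryDatabaseTable code.
// Rows are written as plain text instead of Avro to keep it short.
class StreamingTransferSketch {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .build();

    void streamResultSet(final ProcessSession session, final ResultSet rs,
                         final int maxRowsPerFlowFile) {
        boolean more = true;
        while (more) {
            FlowFile flowFile = session.create();
            final AtomicLong rowsWritten = new AtomicLong(0L);

            flowFile = session.write(flowFile, (final OutputStream out) -> {
                try {
                    // Write up to maxRowsPerFlowFile rows into this FlowFile.
                    while (rowsWritten.get() < maxRowsPerFlowFile && rs.next()) {
                        final String row = rs.getString(1) + "\n";
                        out.write(row.getBytes(StandardCharsets.UTF_8));
                        rowsWritten.incrementAndGet();
                    }
                } catch (final SQLException e) {
                    throw new ProcessException("Failed to read from the result set", e);
                }
            });

            if (rowsWritten.get() == 0) {
                // No rows left: drop the empty FlowFile and stop.
                session.remove(flowFile);
                more = false;
            } else {
                // The proposed change: transfer (and commit) each batch inside
                // the loop instead of collecting every FlowFile in a list, so
                // heap usage stays bounded even for a billion-row table.
                session.transfer(flowFile, REL_SUCCESS);
                session.commit();
            }
        }
    }
}
{code}

Committing per batch would also let downstream processors start consuming
while the query is still running, instead of waiting until the last row has
been fetched.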

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
