[ https://issues.apache.org/jira/browse/NIFI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102253#comment-15102253 ]
Daniel Cave commented on NIFI-1251:
-----------------------------------
Specifically, my suggestion is that this call in ExecuteSQL
"nrOfRows.set(JdbcCommon.convertToAvroStream(resultSet, out));"
be replaced, with a property added around it to control the batch size (the
number of rows to include in each Avro file) and to multithread up to a
specified number of threads. With this, the database will not be impacted, but
output speed should be greatly increased. Subdividing that map into batch maps
should not greatly affect the memory footprint (since the total number of
queried rows is unchanged and the total output size for all batches would be
reasonably close to the single-file size as long as batches are kept
reasonable), but would increase output speed roughly in proportion to the
number of batches. Related to this, the outgoing, provenance, and transfer
calls would need to be made as each batch is completed instead of once at the
end.
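As a rough illustration of the batching idea only (this is not NiFi code, and
it leaves out the multithreading part), the sketch below drains rows from an
iterator and hands them to a callback in groups of at most batchSize. The names
BatchSketch, emitInBatches, and onBatch are hypothetical. In ExecuteSQL the
callback would be where one batch is written to its own Avro file and the
per-batch provenance and transfer calls are made, assuming the processor were
restructured that way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchSketch {

    /**
     * Hypothetical helper: reads rows one at a time and passes them to onBatch
     * in lists of at most batchSize elements. Returns the total row count,
     * mirroring what nrOfRows tracks in the original call.
     */
    static <T> long emitInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> onBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            total++;
            if (batch.size() == batchSize) {
                // A full batch is ready: emit it now rather than waiting for the end.
                onBatch.accept(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Emit the final, possibly smaller, batch.
            onBatch.accept(batch);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        List<List<Integer>> emitted = new ArrayList<>();
        long n = emitInBatches(rows.iterator(), 3, emitted::add);
        // prints "7 rows in 3 batches: [[1, 2, 3], [4, 5, 6], [7]]"
        System.out.println(n + " rows in " + emitted.size() + " batches: " + emitted);
    }
}
```

Only one batch is held in memory at a time here, which is in line with the
point above that batching should not greatly change the memory footprint.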
> Allow ExecuteSQL to send out large result sets in chunks
> --------------------------------------------------------
>
> Key: NIFI-1251
> URL: https://issues.apache.org/jira/browse/NIFI-1251
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Fix For: 0.5.0
>
>
> Currently, when using ExecuteSQL, if a result set is very large, it can take
> quite a long time to pull back all of the results. It would be nice to have
> the ability to specify the maximum number of records to put into a FlowFile,
> so that if we pull back, say, 1 million records we can configure it to create
> 1,000 FlowFiles, each with 1,000 records. This way, we can begin processing the
> first 1,000 records while the next 1,000 are being pulled from the remote
> database.
> This suggestion comes from Vinay via the dev@ mailing list:
> Is there a way to have a streaming feature when a large result set is fetched
> from the database, basically to read data from the database in chunks of
> records instead of loading the full result set into memory?
> As part of ExecuteSQL, can a property called "FetchSize" be specified which
> indicates how many rows should be fetched from the resultSet?
> Since I am a bit new to using NiFi, can anyone guide me on the above?
> Thanks in advance
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)