[ 
https://issues.apache.org/jira/browse/NIFI-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351904#comment-16351904
 ] 

ASF GitHub Bot commented on NIFI-4836:
--------------------------------------

Github user MikeThomsen commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2447#discussion_r165436273
  
    --- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java
 ---
    @@ -123,8 +124,22 @@
         public static final PropertyDescriptor MAX_ROWS_PER_FLOW_FILE = new 
PropertyDescriptor.Builder()
                 .name("qdbt-max-rows")
                 .displayName("Max Rows Per Flow File")
    -            .description("The maximum number of result rows that will be 
included in a single FlowFile. " +
    -                    "This will allow you to break up very large result 
sets into multiple FlowFiles. If the value specified is zero, then all rows are 
returned in a single FlowFile.")
    +            .description("The maximum number of result rows that will be 
included in a single FlowFile. This will allow you to break up very large "
    +                    + "result sets into multiple FlowFiles. If the value 
specified is zero, then all rows are returned in a single FlowFile.")
    +            .defaultValue("0")
    --- End diff --
    
    When I did something similar on GetMongo, I think I chose to make it 
optional and have "blank" be the equivalent. Thoughts?


> Allow QueryDatabaseTables to send out batches of flow files while result set 
> is being processed
> -----------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4836
>                 URL: https://issues.apache.org/jira/browse/NIFI-4836
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>
> Currently QueryDatabaseTable (QDT) will not transfer the outgoing flowfiles 
> to the downstream relationship(s) until the entire result set has been 
> processed (regardless of whether Max Rows Per Flow File is set). This is so 
> the maxvalue.* and fragment.count attributes can be set correctly for each 
> flow file.
> However for very large result sets, the initial fetch can take a long time, 
> and depending on the setting of Max Rows Per FlowFile, there could be a great 
> number of FlowFiles transferred downstream as a large burst at the end of QDT 
> execution.
> It would be nice for the user to be able to choose to have FlowFiles be 
> transferred downstream while the result set is still being processed. This 
> alleviates the "large burst at the end" by replacing it with smaller output 
> batches during processing. The tradeoff will be that if an Output Batch Size 
> is set, then the maxvalue.* and fragment.count attributes will not be set on 
> the outgoing flow files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to