[ https://issues.apache.org/jira/browse/NIFI-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380467#comment-16380467 ]
ASF GitHub Bot commented on NIFI-4838:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2448#discussion_r171280456

--- Diff: nifi-nar-bundles/nifi-mongodb-bundle/nifi-mongodb-processors/src/main/java/org/apache/nifi/processors/mongodb/GetMongo.java ---
@@ -129,26 +144,44 @@
         .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
         .build();
     static final PropertyDescriptor RESULTS_PER_FLOWFILE = new PropertyDescriptor.Builder()
-        .name("results-per-flowfile")
-        .displayName("Results Per FlowFile")
-        .description("How many results to put into a flowfile at once. The whole body will be treated as a JSON array of results.")
-        .required(false)
-        .expressionLanguageSupported(true)
-        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
-        .build();
+        .name("results-per-flowfile")
+        .displayName("Results Per FlowFile")
+        .description("How many results to put into a flowfile at once. The whole body will be treated as a JSON array of results.")
+        .required(false)
+        .expressionLanguageSupported(true)
+        .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
+        .build();
+    static final PropertyDescriptor ESTIMATE_PROGRESS = new PropertyDescriptor.Builder()
+        .name("estimate-progress")
+        .displayName("Estimate Progress")
+        .description("If enabled, a count query will be run first, using the configured query, and attributes will be added to each flowfile showing how far they are into the result set.")
+        .required(true)
+        .addValidator(StandardValidators.BOOLEAN_VALIDATOR)
+        .allowableValues(GM_TRUE, GM_FALSE)
+        .defaultValue(GM_FALSE.getValue())
+        .build();
+    static final PropertyDescriptor PROGRESSIVE_COMMITS = new PropertyDescriptor.Builder()
+        .name("progressive-commits")
+        .displayName("Commit After Each Batch")
--- End diff --

I'm a little confused here about the term "batch".
It doesn't seem directly related to the Batch Size property (since the latter is kind of a server-side thing, like a JDBC "fetch size"?), and in the code a "batch" seems to refer to the number of files set in Results Per Flowfile. Can you explain a little more about what's going on with the progressive commits? If I have Results Per Flowfile set to 100 and Batch Size set to 1000, would I get 10 flow files committed at once as one "batch"? Or is it always one commit per flowfile (if Commit After Each Batch is set)?


> Make GetMongo support multiple commits and give some progress indication
> ------------------------------------------------------------------------
>
>                 Key: NIFI-4838
>                 URL: https://issues.apache.org/jira/browse/NIFI-4838
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Mike Thomsen
>            Assignee: Mike Thomsen
>            Priority: Major
>
> It shouldn't wait until the end to do a commit() call because the effect is
> that GetMongo looks like it has hung to a user who is pulling a very large
> data set.
> It should also have an option for running a count query to get the current
> approximate count of documents that would match the query and append an
> attribute that indicates where a flowfile stands in the total result count.
> Ex:
> query.progress.point.start = 2500
> query.progress.point.end = 5000
> query.count.estimate = 17,568,231



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
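
For reference, the arithmetic behind the progress attributes in the issue description, and behind the 100-per-flowfile / 1000-per-batch question above, can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and do not come from the pull request, and it assumes "start" is the number of documents emitted before the current flowfile and "end" is that number plus the documents in the current flowfile. It does not show how the PR actually commits sessions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are taken from GetMongo or the PR.
public class ProgressSketch {

    // Builds the three attributes named in the issue for one flowfile.
    // docsEmittedBefore: documents already written to earlier flowfiles (assumed meaning of "start").
    // docsInThisFlowFile: documents in the current flowfile (at most Results Per FlowFile).
    // countEstimate: result of the up-front count query; only an estimate, so "end" may exceed it.
    static Map<String, String> progressAttributes(long docsEmittedBefore,
                                                  int docsInThisFlowFile,
                                                  long countEstimate) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("query.progress.point.start", String.valueOf(docsEmittedBefore));
        attrs.put("query.progress.point.end", String.valueOf(docsEmittedBefore + docsInThisFlowFile));
        attrs.put("query.count.estimate", String.valueOf(countEstimate));
        return attrs;
    }

    // How many flowfiles one server-side batch would fill, rounding up,
    // if Batch Size and Results Per FlowFile were related at all (the open question above).
    static int flowFilesPerServerBatch(int batchSize, int resultsPerFlowFile) {
        return (batchSize + resultsPerFlowFile - 1) / resultsPerFlowFile;
    }

    public static void main(String[] args) {
        // Second flowfile of 2500 documents out of an estimated 17,568,231.
        System.out.println(progressAttributes(2500, 2500, 17_568_231L));
        // Batch Size 1000 with Results Per FlowFile 100.
        System.out.println(flowFilesPerServerBatch(1000, 100));
    }
}
```

With the values from the issue, the first call yields start 2500 and end 5000; the second call gives 10, which is the "10 flow files per batch" case the question asks about.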