GitHub user ijokarumawak commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2361#discussion_r158674300
--- Diff:
nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java
---
@@ -267,26 +267,28 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
commit(context, session, listCount);
listCount = 0;
} while (bucketLister.isTruncated());
- currentTimestamp = maxTimestamp;
+
+ if (maxTimestamp > currentTimestamp) {
+ currentTimestamp = maxTimestamp;
+ }
final long listMillis =
TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
getLogger().info("Successfully listed S3 bucket {} in {} millis",
new Object[]{bucket, listMillis});
if (!commit(context, session, listCount)) {
- if (currentTimestamp > 0) {
- persistState(context);
- }
getLogger().debug("No new objects in S3 bucket {} to list. Yielding.", new Object[]{bucket});
context.yield();
}
+
+ // Persist all state, including any currentKeys
+ persistState(context);
--- End diff --
Do we still need this? Isn't updating state within commit() enough? We
should minimize the number of state updates, since some state storage, e.g.
ZooKeeper, is not designed for frequent updates. If the processor didn't
find any new objects to list, it shouldn't have to update state, should
it?
I might be missing something, as the original reason for calling
persistState() when there was nothing to commit is not clear to me.
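
To make the concern concrete, here is a rough sketch of writing state only when it actually differs from what was last stored. It is not the ListS3 code: `PersistOnChangeSketch`, `CountingStateStore` and `persistIfChanged` are hypothetical names made up for illustration, and the map keys are placeholders. A plain in-memory map stands in for a backend such as ZooKeeper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not the actual ListS3 implementation:
// skip the state write when nothing changed since the last write.
public class PersistOnChangeSketch {

    // In-memory stand-in for a state backend; counts how often it is written.
    static class CountingStateStore {
        private Map<String, String> stored = new HashMap<>();
        private int writes = 0;

        void setState(Map<String, String> state) {
            stored = new HashMap<>(state); // defensive copy
            writes++;
        }

        Map<String, String> getState() {
            return new HashMap<>(stored);
        }

        int getWrites() {
            return writes;
        }
    }

    // Writes 'current' only if it differs from the stored state.
    // Returns true when a write happened.
    static boolean persistIfChanged(CountingStateStore store, Map<String, String> current) {
        if (Objects.equals(store.getState(), current)) {
            return false; // nothing new was listed, so no write is needed
        }
        store.setState(current);
        return true;
    }

    public static void main(String[] args) {
        CountingStateStore store = new CountingStateStore();
        Map<String, String> state = new HashMap<>();
        state.put("currentTimestamp", "1000"); // placeholder key names
        state.put("key-0", "a.txt");

        persistIfChanged(store, state); // first listing: state is written
        persistIfChanged(store, state); // no new objects: write is skipped
        state.put("currentTimestamp", "2000");
        persistIfChanged(store, state); // timestamp advanced: state is written

        System.out.println("writes = " + store.getWrites());
    }
}
```

With a guard like this, a run that finds no new objects costs one read and no write. Whether the comparison belongs inside persistState() or at the call site is a separate question.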
---