Github user adamlamar commented on the issue:
https://github.com/apache/nifi/pull/2361
@ijokarumawak I did as you suggested and pulled `persistState` out in the
case when no new keys have been listed, but this actually caused unit tests to
fail. This is because `currentTimestamp` never changes during the main loop, so
even though `commit` calls `persistState`, the value of `currentTimestamp`
doesn't change until the main loop exits. Which is why `persistState` is
required in both exit paths.
Instead, I took a slightly different approach with the change just pushed.
Since `currentTimestamp` is the current value persisted to the state manager,
`maxTimestamp` is the highest timestamp seen in the main loop, and
`currentKeys` is tied to `maxTimestamp` (not `currentTimestamp`), I removed the
`persistState` call in `commit`, and did `persistState` at the end of
`onTrigger` only. While this does continue to `persistState` on each exit, it
reduces the number of `persistState` calls to once per `onTrigger` rather than
once per 1000 keys iterated (which was done previously in `commit`).
I did a bunch of manual testing with concurrent `PutS3Object` and `ListS3`
and always got the correct number of listed keys, even when uploading 20k+
objects using 10 threads. I tried a few strategies to skip `persistState` if
nothing had changed, but in manual testing it always produced the wrong number
of keys, although sometimes only off by 1. The current code should be quite an
improvement to the load on the state manager, even if it isn't ideal.
I also introduced `totalListCount` which helps tighten up the log messages
a bit. Previously it would "successfully list X objects" followed by "no new
objects to list" in a single `onTrigger` (this was apparent in the unit test
output). `totalListCount` also avoids an unnecessary `yield`.
There's a lot going on in this one - let me know if you have any other
questions!
---