Craig created NIFI-9358:
---------------------------
Summary: QueryDatabaseTableRecord can miss records if Max column
is a timestamp
Key: NIFI-9358
URL: https://issues.apache.org/jira/browse/NIFI-9358
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Reporter: Craig
If the maxval column is a date or timestamp field, it is quite possible that
multiple records have the exact same timestamp. If there is a high volume of
updates, the processor can execute the query and record the new max value
while records are still being written to the database with that same
timestamp. Those late records are never picked up, because later queries only
look for values greater than the recorded max.
This is easy to conceive if you imagine that the tracked column is just a date
(MM/DD/YYYY), as obviously many records will be inserted during the same day,
hour, minute, second... It becomes less and less likely the higher the
resolution of the timestamp, but it can easily be demonstrated even at the
millisecond resolution.
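
The scenario above can be reproduced outside NiFi. The following is only a
minimal sketch, assuming an in-memory SQLite table: the table name "events",
the column "updated_at", the helper fetch_new() and the sample timestamps are
all made up for illustration and are not NiFi code.

```python
import sqlite3


def fetch_new(conn, last_max):
    """Hypothetical incremental fetch using a strict '>' filter on the max column."""
    if last_max is None:
        rows = conn.execute(
            "SELECT id, updated_at FROM events ORDER BY id"
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, updated_at FROM events WHERE updated_at > ? ORDER BY id",
            (last_max,),
        ).fetchall()
    # Remember the largest timestamp seen; keep the old one if nothing came back.
    new_max = max((ts for _, ts in rows), default=last_max)
    return rows, new_max


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT)")

# Row 1 is visible when the first fetch runs.
conn.execute("INSERT INTO events VALUES (1, '2021-11-01 12:00:00.123')")
first_rows, max_seen = fetch_new(conn, None)

# Row 2 arrives afterwards but carries the same millisecond timestamp.
conn.execute("INSERT INTO events VALUES (2, '2021-11-01 12:00:00.123')")
second_rows, max_seen2 = fetch_new(conn, max_seen)

fetched_ids = [row_id for row_id, _ in first_rows + second_rows]
print(fetched_ids)  # row 2 never shows up
```

In this sketch the second fetch returns nothing, so row 2 is skipped on every
later run as well, since the stored max never moves below its timestamp.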
The root of the issue is that the first filter added to the WHERE clause in the
generated query (within GenerateTableFetch.java, around line 351) uses the ">"
operator, which ensures no rows are pulled twice, but it also opens the door to
some records never getting pulled. With ">=" instead, the downside is that
records sharing the last recorded max value may be pulled twice, but the upside
is that no records will be missed. Users would need to ensure processing is
idempotent in this case.
My request is to change that maxval filter to ">=" or at least make it
optionally ">=".
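
To illustrate the requested behaviour, here is a rough sketch under the same
kind of assumptions: an in-memory SQLite table with the made-up names "events"
and "updated_at", and hypothetical helpers fetch_inclusive() and process(). It
is not the NiFi implementation, only a demonstration that ">=" combined with
idempotent (keyed) processing picks up the late row.

```python
import sqlite3


def fetch_inclusive(conn, last_max):
    """Hypothetical incremental fetch using an inclusive '>=' filter."""
    if last_max is None:
        query, params = "SELECT id, updated_at FROM events ORDER BY id", ()
    else:
        query = (
            "SELECT id, updated_at FROM events "
            "WHERE updated_at >= ? ORDER BY id"
        )
        params = (last_max,)
    rows = conn.execute(query, params).fetchall()
    new_max = max((ts for _, ts in rows), default=last_max)
    return rows, new_max


processed = {}


def process(rows):
    """Idempotent sink: keyed by id, so a re-delivered row just overwrites itself."""
    for row_id, ts in rows:
        processed[row_id] = ts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT)")

conn.execute("INSERT INTO events VALUES (1, '2021-11-01 12:00:00.123')")
first_rows, max_seen = fetch_inclusive(conn, None)
process(first_rows)

# A late row with the same timestamp as the recorded max value.
conn.execute("INSERT INTO events VALUES (2, '2021-11-01 12:00:00.123')")
second_rows, max_seen = fetch_inclusive(conn, max_seen)
process(second_rows)

second_ids = [row_id for row_id, _ in second_rows]
print(second_ids)         # row 1 is delivered again, row 2 is no longer missed
print(sorted(processed))  # each id ends up stored once
```

The cost is visible in the second fetch: row 1 comes back a second time, which
is why the downstream step has to tolerate duplicates.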