Craig created NIFI-9358:
---------------------------

             Summary: QueryDatabaseTableRecord can miss records if Max column is a timestamp
                 Key: NIFI-9358
                 URL: https://issues.apache.org/jira/browse/NIFI-9358
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
            Reporter: Craig


If the maxval column is a date or timestamp field, it is quite possible that
multiple records have the exact same timestamp. Under a high volume of
updates, the processor may execute the query and record the new max value while
records with that same timestamp are still being inserted or updated in the
database.

This is easy to picture if the tracked column is just a date (MM/DD/YYYY),
since many records will obviously be inserted on the same day; the same holds
for the same hour, minute, or second at finer resolutions. The problem becomes
less likely as the timestamp resolution increases, but it can easily be
demonstrated even at millisecond resolution.
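A simplified, in-memory Java sketch of the race follows. It does not use NiFi or JDBC: the hypothetical `Row` class stands in for a table row, and the hypothetical `fetchStrict` method mimics a `WHERE ts > lastMax` query followed by recording the new max value. A row that commits with an already-recorded timestamp after the first poll is never returned by later polls.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictMaxValueFetch {

    // Hypothetical in-memory stand-in for a table row: primary key plus tracked timestamp.
    static final class Row {
        final int id;
        final long ts; // e.g. epoch millis

        Row(int id, long ts) {
            this.id = id;
            this.ts = ts;
        }
    }

    // Mimics "SELECT * FROM t WHERE ts > lastMax".
    static List<Row> fetchStrict(List<Row> table, long lastMax) {
        List<Row> out = new ArrayList<>();
        for (Row r : table) {
            if (r.ts > lastMax) {
                out.add(r);
            }
        }
        return out;
    }

    // Mimics recording the new max value after a fetch.
    static long newMax(List<Row> fetched, long lastMax) {
        long max = lastMax;
        for (Row r : fetched) {
            max = Math.max(max, r.ts);
        }
        return max;
    }

    // Returns the ids seen across two polls when a same-timestamp row commits between them.
    static List<Integer> simulate() {
        List<Row> table = new ArrayList<>();
        table.add(new Row(1, 1000L));
        table.add(new Row(2, 1000L));

        // First poll: rows 1 and 2 are fetched and 1000 becomes the stored max value.
        List<Row> first = fetchStrict(table, Long.MIN_VALUE);
        long max = newMax(first, Long.MIN_VALUE);

        // A late transaction commits row 3 with the same timestamp after the first poll.
        table.add(new Row(3, 1000L));

        // Second poll: "ts > 1000" excludes row 3, so it is never fetched.
        List<Row> second = fetchStrict(table, max);

        List<Integer> seen = new ArrayList<>();
        for (Row r : first) seen.add(r.id);
        for (Row r : second) seen.add(r.id);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [1, 2]
    }
}
```

Real databases add detail (statement start time versus commit time, transaction isolation), but the essential effect is the same: once the stored max value reaches a timestamp, a strict ">" never revisits it.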

The root of the issue is that the first filter added to the WHERE clause of the
generated query (within GenerateTableFetch.java, around line 351) uses the ">"
operator. This ensures no rows are pulled twice, but it also opens the door to
some records never being pulled. Using ">=" instead would reverse the trade-off:
the downside is that records may be pulled twice, but the upside is that no
records sharing the last recorded max value will be missed. Users would need to
ensure their processing is idempotent in that case.
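The trade-off of ">=" can be sketched with the same kind of in-memory simulation. This is an illustrative assumption, not NiFi code: the hypothetical `fetchInclusive` method mimics `WHERE ts >= lastMax`, so the late row is caught, but rows at the previous max value are pulled again. Deduplicating by primary key (here a simple `LinkedHashSet`) is one assumed way to make downstream processing idempotent; an upsert on the target would be another.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InclusiveMaxValueFetch {

    // Hypothetical in-memory stand-in for a table row: primary key plus tracked timestamp.
    static final class Row {
        final int id;
        final long ts;

        Row(int id, long ts) {
            this.id = id;
            this.ts = ts;
        }
    }

    // Mimics "SELECT * FROM t WHERE ts >= lastMax": rows at the last max value are re-read.
    static List<Row> fetchInclusive(List<Row> table, long lastMax) {
        List<Row> out = new ArrayList<>();
        for (Row r : table) {
            if (r.ts >= lastMax) {
                out.add(r);
            }
        }
        return out;
    }

    // Returns every id fetched across two polls, duplicates included.
    static List<Integer> fetchedIds() {
        List<Row> table = new ArrayList<>();
        table.add(new Row(1, 1000L));
        table.add(new Row(2, 1000L));

        // First poll: fetch everything and record the max value (1000).
        List<Row> first = fetchInclusive(table, Long.MIN_VALUE);
        long max = Long.MIN_VALUE;
        for (Row r : first) {
            max = Math.max(max, r.ts);
        }

        // A late transaction commits row 3 with the already-recorded timestamp.
        table.add(new Row(3, 1000L));

        // Second poll: "ts >= 1000" returns rows 1, 2 (again) and 3.
        List<Row> second = fetchInclusive(table, max);

        List<Integer> ids = new ArrayList<>();
        for (Row r : first) ids.add(r.id);
        for (Row r : second) ids.add(r.id);
        return ids;
    }

    // Idempotent consumer keyed by primary key: re-pulled rows collapse to one entry.
    static Set<Integer> processIdempotently(List<Integer> ids) {
        return new LinkedHashSet<>(ids);
    }

    public static void main(String[] args) {
        List<Integer> ids = fetchedIds();
        System.out.println(ids);                      // prints [1, 2, 1, 2, 3]
        System.out.println(processIdempotently(ids)); // prints [1, 2, 3]
    }
}
```

Note that ">=" only protects rows that share the last recorded max value; a row committing late with a strictly older timestamp would still be skipped.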

My request is to change that maxval filter to ">=", or at least to make ">="
an option.
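A minimal sketch of what the optional behavior could look like follows. The `buildPredicate` helper and its `inclusive` flag are hypothetical, standing in for a processor property; the table `orders`, the column `updated_at`, and the `?` JDBC-style placeholder are illustrative assumptions and not necessarily how GenerateTableFetch builds its query.

```java
public class MaxValuePredicate {

    // Hypothetical helper, not NiFi's actual code: builds the max-value filter for the
    // generated WHERE clause, with ">=" available as an opt-in.
    static String buildPredicate(String column, boolean inclusive) {
        String operator = inclusive ? ">=" : ">";
        // "?" is a placeholder for the last recorded max value.
        return column + " " + operator + " ?";
    }

    public static void main(String[] args) {
        // Default (false) keeps the current strict comparison; true opts in to ">=".
        System.out.println("SELECT * FROM orders WHERE " + buildPredicate("updated_at", false));
        System.out.println("SELECT * FROM orders WHERE " + buildPredicate("updated_at", true));
    }
}
```

Defaulting the flag to the strict comparison would keep existing flows unchanged, while users whose processing is idempotent could opt in to ">=".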

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
