Matt Burgess created NIFI-2713:
----------------------------------
Summary: Database Fetch processors' max-value columns don't work
as expected
Key: NIFI-2713
URL: https://issues.apache.org/jira/browse/NIFI-2713
Project: Apache NiFi
Issue Type: Bug
Reporter: Matt Burgess
Assignee: Matt Burgess
Currently, for QueryDatabaseTable and GenerateTableFetch, the user can enter
any number of maximum-value columns, which are used to generate a SQL query
that will fetch all records whose values are greater than the last-observed
maximum values for those columns.
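
To make the behavior described above concrete, here is a minimal sketch of how such a query could be assembled. It is not the processors' actual code: the function name build_conjunctive_query, the table name "events", and the sample values are hypothetical, and the sketch assumes the per-column conditions are combined with AND, which is what the rest of this ticket implies.

```python
def build_conjunctive_query(table, last_max):
    """Build a SELECT that requires EVERY max-value column to exceed its
    last-observed maximum (hypothetical helper, for illustration only)."""
    # One "col > ?" condition per max-value column, in insertion order.
    clauses = [f"{col} > ?" for col in last_max]
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, list(last_max.values())


# Hypothetical state: the last-observed maximum for each column.
sql, params = build_conjunctive_query(
    "events", {"bucket": 7, "date_created": "2016-08-30 12:00:00"}
)
print(sql)
print(params)
```

With two max-value columns, both conditions end up in the same WHERE clause, so a record is fetched only if it exceeds both stored maximums.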
However, this makes multiple max-value columns not very useful: all of them
have to increase in lockstep, or records will be lost/skipped. And if they do
increase in lockstep, any single one of the columns would suffice, so
specifying more than one adds nothing.
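
The lost-record problem can be reproduced with a small in-memory SQLite table. This is only a sketch under assumptions: the table "events", its columns, and the integer stand-ins for timestamps are made up for illustration, and the query mimics the AND-combined conditions described in this ticket rather than quoting SQL generated by NiFi.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (bucket INTEGER, date_created INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (7, 100),  # already fetched; these are the stored maximums
        (7, 150),  # new row, same bucket, later timestamp
        (8, 200),  # new row, both columns increased
    ],
)

# Last-observed maximums from the previous fetch (hypothetical state).
last_bucket, last_created = 7, 100

# Both columns must exceed their stored maximums for a row to match.
fetched = conn.execute(
    "SELECT bucket, date_created FROM events "
    "WHERE bucket > ? AND date_created > ? "
    "ORDER BY bucket, date_created",
    (last_bucket, last_created),
).fetchall()

print(fetched)  # the row (7, 150) is missing from the result
```

The row (7, 150) is new, but because its bucket did not increase it fails the first condition. After this fetch the stored maximums move to (8, 200), so that row is never picked up by a later query either.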
The more likely use case is that there are multiple columns whose values
never decrease, but which increase at different rates. This is common with
very large tables that have, for example, a "date_created" timestamp column
and a "bucket number" column whose value increases once a day. Queries for a
day's worth of data are more efficient if they can be filtered on "bucket"
first (in this case), then on the timestamp. However, the generated SQL
queries would have to reflect that "bucket" may remain the same while the
timestamp is increasing, but once the bucket value has increased, only the
(new) timestamps for that bucket should be fetched.
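
One possible reading of this requirement is a hierarchical (lexicographic) comparison: a row is new if the first column increased, or if the first column is unchanged and the second column increased, and so on. The sketch below illustrates that reading only; it is not a proposed patch. The helper build_hierarchical_where, the table "events", and the integer stand-ins for timestamps are all hypothetical.

```python
import sqlite3


def build_hierarchical_where(columns_and_maxes):
    """Build an OR of branches, one per column. Branch i requires all earlier
    columns to equal their stored maximum and column i to exceed its own.
    Expects a non-empty list of (column, last_max) pairs in priority order."""
    branches, params = [], []
    for i, (col, val) in enumerate(columns_and_maxes):
        earlier = columns_and_maxes[:i]
        terms = [f"{c} = ?" for c, _ in earlier] + [f"{col} > ?"]
        branches.append("(" + " AND ".join(terms) + ")")
        params.extend(v for _, v in earlier)
        params.append(val)
    return " OR ".join(branches), params


where, params = build_hierarchical_where([("bucket", 7), ("date_created", 100)])
print(where)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (bucket INTEGER, date_created INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(7, 100), (7, 150), (8, 200)]
)
rows = conn.execute(
    f"SELECT bucket, date_created FROM events WHERE {where} "
    "ORDER BY bucket, date_created",
    params,
).fetchall()
print(rows)
```

Under this reading, a new row in the current bucket and a row in the next bucket are both returned, while the already-fetched row (7, 100) is not. Whether a row in a newer bucket should also be checked against the old timestamp maximum is a design decision this sketch does not settle.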
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)