Github user ijokarumawak commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1407#discussion_r95716780
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java
---
@@ -115,20 +128,36 @@ public GenerateTableFetch() {
@OnScheduled
public void setup(final ProcessContext context) {
+ // The processor is invalid if there is an incoming connection and max-value columns are defined
+ if (context.getProperty(MAX_VALUE_COLUMN_NAMES).isSet() && context.hasIncomingConnection()) {
+ throw new ProcessException("If an incoming connection is supplied, no max-value column names may be specified");
--- End diff --
I understand the concerns.
For backward compatibility, I think we should provide a migration path so that
existing flows can keep fetching rows based on the stored state even after an
upgrade. I did something similar before in the [TailFile
processor](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/TailFile.java#L348):
it checks the state key name format to determine whether the state is in the
current or the older format, then migrates the values.
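To make the migration idea concrete, here is a minimal sketch in plain Java. It is not the TailFile code and does not use the NiFi state API. It assumes the older format keys state by column name only and the newer format uses `TABLE.COLUMN`, as in the state table later in this comment. The class name `StateKeyMigration` and the `.` separator check are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of NiFi): upgrades older state keys to a table-qualified format. */
final class StateKeyMigration {

    // Assumption: the newer key format is "<table>.<column>".
    private static final String SEPARATOR = ".";

    /**
     * Returns a copy of the stored state in which every key that lacks the
     * separator (assumed to be an older, column-only key) is prefixed with the
     * table name. Keys that already contain the separator are kept unchanged.
     */
    static Map<String, String> migrate(Map<String, String> storedState, String tableName) {
        Map<String, String> migrated = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : storedState.entrySet()) {
            String key = entry.getKey();
            boolean olderFormat = !key.contains(SEPARATOR);
            String newKey = olderFormat ? tableName + SEPARATOR + key : key;
            migrated.put(newKey, entry.getValue());
        }
        return migrated;
    }
}
```

One limitation of this sketch: a column name that itself contains a `.` would be mistaken for a key in the newer format, so a real implementation would probably need a less ambiguous separator.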
I am concerned about this statement in your previous comment, in case we
support max-value columns with incoming flow files:
> and just document that all the specified tables must contain the
max-value columns.
A max-value column has not been required so far. I guess that is for users who
want to fetch all rows periodically and do not need to track the max value,
for example for master configuration tables. So I think we need to keep
supporting an empty max-value column.
One example flow that I think might be useful is to use GenerateFlowFile
or FetchFile to pass in configuration text such as:
```
# Table name : MAX value column(s)
USERS:LAST_UPDATED
ITEMS
PURCHASE_HISTORIES:LAST_UPDATED
```
Then pass it to SplitText and ExtractText to generate flow files with
attributes `tableName` and `maxColumns`, and pass those to the
GenerateTableFetch processor to generate the fetch SQL dynamically. This way,
users can easily change which tables to fetch.
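As a rough illustration of what the SplitText and ExtractText steps would produce from the configuration text above, the following plain-Java sketch parses each line into `tableName` and `maxColumns` attributes. It does not use any NiFi API. The class name `TableConfigParser` is hypothetical, and omitting `maxColumns` when no column is given is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser (not part of NiFi) for the "TABLE:MAX_COLUMN(S)" configuration text. */
final class TableConfigParser {

    /** Returns one attribute map per table line; comment and blank lines are skipped. */
    static List<Map<String, String>> parse(String configText) {
        List<Map<String, String>> flowFileAttributes = new ArrayList<>();
        for (String rawLine : configText.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split only on the first ':' so the rest is kept as the column list.
            String[] parts = line.split(":", 2);
            Map<String, String> attributes = new LinkedHashMap<>();
            attributes.put("tableName", parts[0].trim());
            // Assumption: a table without max-value columns gets no maxColumns attribute.
            if (parts.length == 2 && !parts[1].trim().isEmpty()) {
                attributes.put("maxColumns", parts[1].trim());
            }
            flowFileAttributes.add(attributes);
        }
        return flowFileAttributes;
    }
}
```

With the three-table example, `ITEMS` would yield only a `tableName` attribute, which is the case where GenerateTableFetch would have to accept an empty max-value column.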
Maybe after processing these incoming flow files, GenerateTableFetch would
have state like this (table `ITEMS` doesn't have a max-value column):
|KEY|VALUE|
|----|-------|
|USERS.LAST_UPDATED|2017.01.12 11:42:00|
|PURCHASE_HISTORIES.LAST_UPDATED|2017.01.12 11:59:32|
What do you think? Thanks!