Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1407#discussion_r95716780
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/GenerateTableFetch.java ---
    @@ -115,20 +128,36 @@ public GenerateTableFetch() {
     
         @OnScheduled
         public void setup(final ProcessContext context) {
    +        // The processor is invalid if there is an incoming connection and max-value columns are defined
    +        if (context.getProperty(MAX_VALUE_COLUMN_NAMES).isSet() && context.hasIncomingConnection()) {
    +            throw new ProcessException("If an incoming connection is supplied, no max-value column names may be specified");
    --- End diff --
    
    I understand the concerns.
    
    For backward compatibility, I think we should provide that, so that existing
    flows can keep fetching rows based on the stored state even after an upgrade.
    I did something similar before with the [TailFile
    processor](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/TailFile.java#L348):
    check the state key name format to determine whether it is the current or
    the older format, then migrate the values.
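    
    The key-format check described above could be sketched roughly as follows. This is not the TailFile code and it uses no NiFi API; it assumes, purely for illustration, that old-format keys are bare column names, that current keys look like `TABLE.COLUMN`, and that column names contain no `.`. `StateKeyMigration` and `migrate` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StateKeyMigration {
    // Assumed separator between table and column in the newer key format.
    static final String SEPARATOR = ".";

    /**
     * Returns a copy of the stored state in which every old-format key
     * (assumed to be a bare column name) is prefixed with the table name.
     * Keys that already contain the separator are treated as current and kept.
     */
    static Map<String, String> migrate(Map<String, String> stored, String tableName) {
        Map<String, String> migrated = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : stored.entrySet()) {
            String key = entry.getKey();
            boolean isCurrentFormat = key.contains(SEPARATOR);
            String newKey = isCurrentFormat ? key : tableName + SEPARATOR + key;
            migrated.put(newKey, entry.getValue());
        }
        return migrated;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("LAST_UPDATED", "2017.01.12 11:42:00"); // old-format key
        System.out.println(migrate(stored, "USERS"));
    }
}
```

    In the processor this would presumably run once when the stored state is first read after an upgrade, with the migrated map written back before any new SQL is generated.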
    
    If we support max-value columns with incoming files, I'm concerned about
    this statement in your previous comment:
    > and just document that all the specified tables must contain the max-value columns.
    
    The max-value column has not been required so far. I guess that is for
    users who want to fetch all rows periodically and don't need to track the
    max value, maybe for things like master configuration tables. So I think we
    need to keep supporting an empty max-value column.
    
    An example flow that I thought might be useful: use GenerateFlowFile or
    FetchFile to pass in configuration text such as:
    
    ```
    # Table name : MAX value column(s)
    USERS:LAST_UPDATED
    ITEMS
    PURCHASE_HISTORIES:LAST_UPDATED
    ```
    
    Then pass it to SplitText and ExtractText to generate flow files with the
    attributes `tableName` and `maxColumns`, and pass those to the
    GenerateTableFetch processor to generate the fetch SQL dynamically. This way,
    users can easily change which tables to fetch.
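    
    To make the intended attribute values concrete, here is a minimal plain-Java sketch of the parsing that SplitText and ExtractText would perform in the flow. It is only an illustration of the text format above, not NiFi code; `TableConfigParser` and `TableConfig` are hypothetical names, and treating a missing `:` as "no max-value columns" is my assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class TableConfigParser {
    /** One parsed line; the fields mirror the proposed flow file attributes. */
    static final class TableConfig {
        final String tableName;
        final String maxColumns; // empty string when no max-value column is given

        TableConfig(String tableName, String maxColumns) {
            this.tableName = tableName;
            this.maxColumns = maxColumns;
        }
    }

    /** Parses "TABLE[:COLUMNS]" lines, skipping blank lines and '#' comments. */
    static List<TableConfig> parse(String text) {
        List<TableConfig> result = new ArrayList<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf(':');
            String table = sep < 0 ? line : line.substring(0, sep).trim();
            String columns = sep < 0 ? "" : line.substring(sep + 1).trim();
            result.add(new TableConfig(table, columns));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "# Table name : MAX value column(s)\n"
                + "USERS:LAST_UPDATED\nITEMS\nPURCHASE_HISTORIES:LAST_UPDATED\n";
        for (TableConfig c : parse(text)) {
            System.out.println("tableName=" + c.tableName + " maxColumns=" + c.maxColumns);
        }
    }
}
```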
    
    After processing these incoming flow files, GenerateTableFetch might have
    state like this (table `ITEMS` has no max-value column, so it has no entry):
    
    |KEY|VALUE|
    |----|-------|
    |USERS.LAST_UPDATED|2017.01.12 11:42:00|
    |PURCHASE_HISTORIES.LAST_UPDATED|2017.01.12 11:59:32|
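    
    A rough sketch of how such state entries could be recorded, in plain Java rather than against the NiFi state API. The `TABLE.COLUMN` key layout follows the table above; `FetchStateSketch`, `record`, and the `observedMax` map (column name to the largest value seen) are hypothetical names for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FetchStateSketch {
    /**
     * Adds one "TABLE.COLUMN" entry per max-value column of the given table.
     * A table with no max-value columns (such as ITEMS) adds nothing.
     */
    static void record(Map<String, String> state, String tableName,
                       String maxColumns, Map<String, String> observedMax) {
        if (maxColumns == null || maxColumns.trim().isEmpty()) {
            return; // nothing to track for this table
        }
        for (String part : maxColumns.split(",")) {
            String column = part.trim();
            String value = observedMax.get(column);
            if (!column.isEmpty() && value != null) {
                state.put(tableName + "." + column, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> state = new LinkedHashMap<>();
        Map<String, String> users = new LinkedHashMap<>();
        users.put("LAST_UPDATED", "2017.01.12 11:42:00");
        record(state, "USERS", "LAST_UPDATED", users);
        record(state, "ITEMS", "", new LinkedHashMap<>());
        System.out.println(state);
    }
}
```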
    
    What do you think? Thanks!

