Andrej created NIFI-14549:
-----------------------------

             Summary: Saving state for ExecuteSQL and ExecuteSQLRecord processor
                 Key: NIFI-14549
                 URL: https://issues.apache.org/jira/browse/NIFI-14549
             Project: Apache NiFi
          Issue Type: Improvement
    Affects Versions: 2.4.0
         Environment: Docker version: 28.0.4; docker image: apache/nifi:2.4.0; 
Host OS: Debian 12
            Reporter: Andrej


Saving state for ExecuteSQL and ExecuteSQLRecord processor:

It would be much easier to load data incrementally with complex queries if the 
processor recorded state from the previous run.

My example: I need to transfer MS SQL Extended Events data from the system 
table-valued function sys.fn_xe_file_target_read_file. Since a very large 
amount of data is produced every minute, I need to use the function parameters 
to query only the data from a specific extended event file, starting at a given 
file offset. I need to remember the last values for the next query run.
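
To illustrate the query described above, here is a minimal Python sketch. It 
assumes the documented four-argument form of sys.fn_xe_file_target_read_file 
(path, mdpath, initial_file_name, initial_offset) and its file_name and 
file_offset result columns. The path in XEL_PATH, the helper name build_query, 
and the use of "?" parameter markers are hypothetical choices for illustration 
only; whether a given driver accepts markers in this position should be 
verified.

```python
from typing import Optional, Tuple

# Hypothetical wildcard path to the .xel files; replace with the real target.
XEL_PATH = r"C:\xe\my_session*.xel"


def build_query(last_file: Optional[str],
                last_offset: Optional[int]) -> Tuple[str, tuple]:
    """Return (sql, params) that resume reading after the saved position.

    initial_file_name and initial_offset are a paired argument: pass both
    or neither. When both are None (first run), every file matching the
    path is read from the beginning.
    """
    sql = (
        "SELECT event_data, file_name, file_offset "
        "FROM sys.fn_xe_file_target_read_file(?, NULL, ?, ?)"
    )
    return sql, (XEL_PATH, last_file, last_offset)
```

The file_name and file_offset values of the last row returned are the two 
values that would have to be kept for the next run.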

The QueryDatabaseTableRecord processor, which records state via Maximum-value 
Columns, would do this; however, it works with subqueries, which means that in 
this case all the data is read every time and only then filtered. I cannot 
afford this approach since there are millions of rows.

 

Currently I am solving this with a Groovy script to get the state from a JSON 
file --> ExecuteSQLRecord --> a Groovy script to get the last record --> write 
to the JSON file. All of this needs to be in a sub-Process Group where I allow 
only one FlowFile at a time. This is very complex, error-prone, and slow.
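
The state round trip in the workaround above can be sketched roughly as 
follows. This is a Python illustration, not the actual Groovy scripts; the 
function names, the JSON layout, and the assumption that records arrive ordered 
by file and offset are all hypothetical.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state, or an empty starting state if no file exists."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"file_name": None, "file_offset": None}


def advance_state(state, records):
    """Take the position from the last record; keep the old state if no rows.

    Assumes records are ordered so that the last one is the newest.
    """
    if not records:
        return state
    last = records[-1]
    return {"file_name": last["file_name"], "file_offset": last["file_offset"]}


def save_state(path, state):
    """Write to a temp file, then rename it over the old state file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)
```

Because load, query, and save are three separate steps, two FlowFiles running 
at once could read the same saved position, which is why the flow has to be 
limited to one FlowFile at a time.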

 

So a stateful ExecuteSQLRecord would remove all this trouble.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
