Andrej created NIFI-14549:
-----------------------------
Summary: Saving state for ExecuteSQL and ExecuteSQLRecord processor
Key: NIFI-14549
URL: https://issues.apache.org/jira/browse/NIFI-14549
Project: Apache NiFi
Issue Type: Improvement
Affects Versions: 2.4.0
Environment: Docker version: 28.0.4; docker image: apache/nifi:2.4.0;
Host OS: Debian 12
Reporter: Andrej
Saving state for the ExecuteSQL and ExecuteSQLRecord processors:
Incrementally loading data with complex queries would be much easier if the
processor recorded state from the previous run.
My example: I need to transfer MS SQL Extended Events data that is exposed
through the system table-valued function sys.fn_xe_file_target_read_file.
Since a very large amount of data is produced every minute, I need to use the
function's parameters to query only the data from a certain extended event
file, starting at a given file offset. I need to remember the last file name
and offset so that the next query run can continue from there.
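To illustrate the kind of query I mean, here is a minimal Python sketch that
builds the statement from the remembered values. The path in XE_PATH and the
function name build_xe_query are hypothetical, and the selected columns are
only an example. It assumes the driver accepts "?" placeholders as arguments
of the table-valued function, and it relies on the function expecting the
initial file name and initial offset to be either both set or both NULL.

```python
from typing import Optional, Tuple

# Hypothetical location of the extended event files; adjust to the real session.
XE_PATH = r"C:\xe\session*.xel"

# Arguments: path, metadata path, initial file name, initial offset.
XE_SQL = (
    "SELECT object_name, event_data, file_name, file_offset "
    "FROM sys.fn_xe_file_target_read_file(?, NULL, ?, ?)"
)


def build_xe_query(
    last_file: Optional[str], last_offset: Optional[int]
) -> Tuple[str, tuple]:
    """Return the SQL text and its parameters for the next incremental run."""
    if last_file is None or last_offset is None:
        # First run, or incomplete state: read from the beginning.
        return XE_SQL, (XE_PATH, None, None)
    # Later runs: let the function start at the remembered position.
    return XE_SQL, (XE_PATH, last_file, last_offset)
```

On the first run both values are None, so the function reads everything; on
later runs only the events after the remembered position should be returned.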
The QueryDatabaseTableRecord processor, which records state through
Maximum-value Columns, would do this; however, it works with subqueries, which
means that in this case all the data is read every time and only then
filtered. I cannot afford this approach since there are millions of rows.
Currently I am solving this with a Groovy script that gets the state from a
JSON file --> ExecuteSQLRecord --> a Groovy script that gets the last record
--> write to the JSON file. All of this has to sit in a sub-Process Group that
allows only one FlowFile at a time. This is very complex, error-prone and slow.
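The state handling in this workaround amounts to roughly the following,
sketched here in Python rather than Groovy. The function names and the JSON
layout are hypothetical, and the sketch assumes the rows arrive in read order
so that the last row carries the newest file name and offset.

```python
import json
import os
from typing import Dict, List, Optional


def load_state(path: str) -> Dict[str, Optional[object]]:
    """Return the saved position, or an empty position on the first run."""
    if not os.path.exists(path):
        return {"file_name": None, "file_offset": None}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_state(path: str, rows: List[dict]) -> bool:
    """Persist the position of the last row; keep the old state if no rows came back."""
    if not rows:
        return False
    last = rows[-1]
    state = {"file_name": last["file_name"], "file_offset": last["file_offset"]}
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    # Rename over the old file so a reader does not see a half-written state.
    os.replace(tmp, path)
    return True
```

Because the state is read before the query and written after it, two
FlowFiles running at once could overwrite each other's position, which is why
the group has to be limited to a single FlowFile.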
A stateful ExecuteSQLRecord would remove all this trouble.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)