Barry Welch created NIFI-6746:
---------------------------------
Summary: ExecuteSQL loses inbound attributes after first batch
Key: NIFI-6746
URL: https://issues.apache.org/jira/browse/NIFI-6746
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 1.9.2
Environment: Linux - Appears unrelated to O/S
Reporter: Barry Welch
When the SQL query is passed in the FlowFile content and the MaxRowsPerFlow and
OutputBatchSize parameters are both non-zero, ExecuteSQL appears to build and
release batches of records without waiting for the query to complete, so the
query's output is streamed. This is great!
The attributes associated with the inbound FlowFile containing the SQL query
are passed to the flowfiles generated for the first batch.
However, after the first batch is released, all subsequent batches are missing
those attributes.
I believe this is because ExecuteSQL takes a reference to the attributes
(inputFileAttrMap) of the FlowFile carrying the SQL query before it starts
processing the query's result set, but then removes the original FlowFile
object after the first batch is released, invalidating that reference.
This occurs at line 327 in AbstractExecuteSQL with these statements:
session.remove(fileToProcess);
fileToProcess = null;
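To make the suspected mechanism concrete, here is a small stand-alone model. It is not the NiFi source: FakeFlowFile, run, countWith and the snapshotFirst flag are hypothetical names invented for this sketch, and it assumes the output attributes are read from the input object each time a result flowfile is created. Under that assumption, dropping the input after the first batch leaves later outputs without attributes, while copying the attributes before the loop avoids the loss.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model only; none of these names come from the NiFi code base.
final class AttributeLossSketch {

    /** Hypothetical stand-in for a FlowFile: nothing but a map of attributes. */
    static final class FakeFlowFile {
        final Map<String, String> attributes;

        FakeFlowFile(Map<String, String> attributes) {
            this.attributes = new HashMap<>(attributes); // defensive copy
        }
    }

    /**
     * Emits totalFlowFiles outputs in batches of batchSize flow files.
     * Once a batch completes, the input is dropped (set to null), mirroring
     * the two statements quoted above. When snapshotFirst is true the
     * attributes are copied before the loop, so dropping the input is harmless.
     */
    static List<FakeFlowFile> run(FakeFlowFile input, int totalFlowFiles,
                                  int batchSize, boolean snapshotFirst) {
        Map<String, String> snapshot =
                snapshotFirst ? new HashMap<>(input.attributes) : null;
        FakeFlowFile fileToProcess = input;
        List<FakeFlowFile> out = new ArrayList<>();
        for (int i = 0; i < totalFlowFiles; i++) {
            Map<String, String> attrs;
            if (snapshot != null) {
                attrs = snapshot;                 // copy taken up front
            } else if (fileToProcess != null) {
                attrs = fileToProcess.attributes; // input still available
            } else {
                attrs = new HashMap<>();          // input gone: nothing to copy
            }
            out.add(new FakeFlowFile(attrs));
            boolean batchComplete = (i + 1) % batchSize == 0;
            if (batchComplete) {
                fileToProcess = null;             // models remove + null assignment
            }
        }
        return out;
    }

    /** Counts outputs that carry the given attribute key. */
    static long countWith(List<FakeFlowFile> files, String key) {
        return files.stream().filter(f -> f.attributes.containsKey(key)).count();
    }

    public static void main(String[] args) {
        Map<String, String> inbound = new HashMap<>();
        inbound.put("source", "demo");
        FakeFlowFile input = new FakeFlowFile(inbound);
        System.out.println(countWith(run(input, 17, 5, false), "source")); // prints 5
        System.out.println(countWith(run(input, 17, 5, true), "source"));  // prints 17
    }
}
```

In this model only the five flowfiles of the first batch keep the attribute unless the copy is taken up front. Whether the real processor behaves this way for the same reason would need to be confirmed against AbstractExecuteSQL itself.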
To replicate the error:
# Create a flowfile containing a query that returns 50 rows.
# Add a couple of attributes to the flowfile.
# Add an ExecuteSQL Processor with MaxRowsPerFlow set to 3 and OutputBatchSize set to 5.
# Run the query.
# Check the flowfiles produced after a SplitAvro Processor.
# The first 15 flowfiles will have the inbound attributes.
# The remaining 35 will not have the inbound attributes.
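The 15/35 split in the steps above follows from the two settings, as the sketch below works through. It assumes SplitAvro emits one record per flowfile and that each split inherits its parent's attributes; the class and method names are made up for illustration.

```java
// Arithmetic for the reproduction steps; names here are illustrative only.
final class BatchArithmeticSketch {

    /** Number of flowfiles ExecuteSQL emits: ceiling of rows / rows-per-flowfile. */
    static int flowFileCount(int totalRows, int maxRowsPerFlowFile) {
        return (totalRows + maxRowsPerFlowFile - 1) / maxRowsPerFlowFile;
    }

    /** Rows carried by the first released batch of flowfiles. */
    static int rowsInFirstBatch(int totalRows, int maxRowsPerFlowFile, int outputBatchSize) {
        return Math.min(totalRows, maxRowsPerFlowFile * outputBatchSize);
    }

    public static void main(String[] args) {
        int totalRows = 50;
        int maxRowsPerFlowFile = 3;
        int outputBatchSize = 5;

        int withAttributes = rowsInFirstBatch(totalRows, maxRowsPerFlowFile, outputBatchSize);
        int withoutAttributes = totalRows - withAttributes;

        System.out.println(flowFileCount(totalRows, maxRowsPerFlowFile)); // prints 17
        System.out.println(withAttributes);                               // prints 15
        System.out.println(withoutAttributes);                            // prints 35
    }
}
```

So ExecuteSQL itself emits 17 flowfiles, the first batch of 5 carries 15 rows, and after splitting to single records those 15 keep the inbound attributes while the other 35 do not.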
Let me know if you need more details.