Barry Welch created NIFI-6746:
---------------------------------

             Summary: ExecuteSQL loses inbound attributes after first batch
                 Key: NIFI-6746
                 URL: https://issues.apache.org/jira/browse/NIFI-6746
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.9.2
         Environment: Linux - Appears unrelated to O/S
            Reporter: Barry Welch


When the SQL query is passed in the FlowFile content and the MaxRowsPerFlow 
and OutputBatchSize properties are non-zero, ExecuteSQL appears to build and 
release batches of records without waiting for the query to complete, 
producing streaming output from the query.  This is great!

The attributes associated with the inbound FlowFile containing the SQL query 
are passed to the flowfiles generated for the first batch.

However, after the first batch is released, all subsequent batches are missing 
those attributes.

I believe this is because ExecuteSQL takes a reference to the attributes 
(inputFileAttrMap) of the FlowFile containing the SQL query before it starts 
processing the query's result set, but then deletes the original FlowFile 
object after the first batch is released, invalidating that reference.

This occurs at line 327 in AbstractExecuteSQL with these statements:

     session.remove(fileToProcess);
     fileToProcess = null;

To replicate the error:
 # Create a flowfile containing a query that returns 50 rows.
 # Add a couple of attributes to the flowfile.
 # Add an ExecuteSQL Processor with MaxRowsPerFlow set to 3 and 
OutputBatchSize set to 5.
 # Run the query.
 # Check the flowfiles produced by a downstream SplitAvro Processor.

Observed result: the first 15 flowfiles have the inbound attributes. The 
remaining 35 do not.
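
The 15/35 split matches what I would expect if only the first batch is created 
while the original FlowFile still exists: 5 flowfiles of 3 rows each. Below is 
a minimal sketch of that behaviour in plain Java. It is a toy model and uses no 
NiFi classes; the class name, the method rowsWithInboundAttributes, the 
snapshotAttributes flag and the attribute names are all hypothetical. Copying 
the attributes up front is only one possible remedy, not necessarily how the 
processor should be fixed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class BatchAttributeSketch {

    /**
     * Toy model of the suspected behaviour (not NiFi code).
     * Returns how many result rows land in flow files that carry the inbound attributes.
     * Assumes maxRowsPerFlowFile and outputBatchSize are both greater than zero.
     */
    static int rowsWithInboundAttributes(int totalRows, int maxRowsPerFlowFile,
                                         int outputBatchSize, Map<String, String> inbound,
                                         boolean snapshotAttributes) {
        // Stand-in for the inbound FlowFile; set to null once it has been "removed".
        Map<String, String> parent = inbound;
        // Hypothetical remedy: copy the attributes before any batch is released.
        Map<String, String> snapshot = snapshotAttributes
                ? Collections.unmodifiableMap(new HashMap<>(inbound))
                : null;

        int rowsWithAttributes = 0;
        int flowFilesInBatch = 0;

        for (int start = 0; start < totalRows; start += maxRowsPerFlowFile) {
            int rows = Math.min(maxRowsPerFlowFile, totalRows - start);

            // Attributes of the output flow file created for this slice of rows.
            Map<String, String> attrs = new HashMap<>();
            if (snapshot != null) {
                attrs.putAll(snapshot);
            } else if (parent != null) {
                attrs.putAll(parent);
            }
            if (attrs.equals(inbound)) {
                rowsWithAttributes += rows;
            }

            // Release the batch; from then on the inbound flow file is gone.
            flowFilesInBatch++;
            if (flowFilesInBatch == outputBatchSize) {
                parent = null;
                flowFilesInBatch = 0;
            }
        }
        return rowsWithAttributes;
    }

    public static void main(String[] args) {
        Map<String, String> inbound = new HashMap<>();
        inbound.put("source.system", "orders"); // hypothetical attribute
        inbound.put("request.id", "42");        // hypothetical attribute

        // Values from the reproduction steps: 50 rows, 3 rows per flow file, batches of 5.
        System.out.println(rowsWithInboundAttributes(50, 3, 5, inbound, false)); // 15
        System.out.println(rowsWithInboundAttributes(50, 3, 5, inbound, true));  // 50
    }
}
```

With the values from the steps above, the model leaves 15 rows with attributes 
and 35 without, which is the split seen after SplitAvro. With the attributes 
copied before the first batch is released, all 50 rows keep them.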

Let me know if you need more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
