Matt Burgess created NIFI-7740:
----------------------------------

             Summary: Add Records Per Transaction and Transactions Per Batch to 
PutHive3Streaming
                 Key: NIFI-7740
                 URL: https://issues.apache.org/jira/browse/NIFI-7740
             Project: Apache NiFi
          Issue Type: Improvement
            Reporter: Matt Burgess


The original PutHiveStreaming (for Hive 1.2.x) exposed properties to the user 
for tuning the number of records in an individual Hive Streaming transaction, 
as well as the number of transactions to be batched together (for performance).

These properties should be exposed in the PutHive3Streaming processor so that 
its performance can be tuned. The default values should preserve the current 
behavior: a setting of zero for Records Per Transaction puts all records into 
a single transaction, and a setting of 1 for Transactions Per Batch results in 
a single transaction in each batch.

For large files, Records Per Transaction should be set to something more 
manageable, such as 100K, and Transactions Per Batch to something such as 10. 
As a rule, the product of the two numbers should be larger than the largest 
expected number of records in the flow file(s); this ensures that any failed 
transaction batch causes a full rollback. The documentation for these 
properties should include this recommendation.
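
To make the sizing rule concrete, here is a minimal sketch of the arithmetic. 
It is not the processor's code: the function name batches_needed and the 
splitting model (records fill transactions in order, transactions fill batches 
in order) are assumptions made for illustration only.

```python
def batches_needed(num_records, records_per_txn, txns_per_batch):
    """Estimate how many transaction batches one flow file would span.

    Hypothetical model, not PutHive3Streaming's implementation:
    records_per_txn == 0 means "all records in a single transaction".
    """
    if num_records < 0 or records_per_txn < 0 or txns_per_batch < 1:
        raise ValueError("invalid settings")
    if num_records == 0:
        return 0
    if records_per_txn == 0:
        # Default behavior described in the issue: one transaction, one batch.
        return 1
    # Integer ceiling division: transactions needed, then batches needed.
    txns = (num_records + records_per_txn - 1) // records_per_txn
    return (txns + txns_per_batch - 1) // txns_per_batch


# With the example settings (100K records per transaction, 10 transactions per
# batch), one batch holds up to 1,000,000 records under this model.
print(batches_needed(1_000_000, 100_000, 10))  # single batch
print(batches_needed(1_000_001, 100_000, 10))  # spills into a second batch
```

Under this model, a flow file that spans more than one batch is the case the 
rule is meant to avoid, since a failure in a later batch would presumably not 
undo batches that were already committed.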



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
