[ 
https://issues.apache.org/jira/browse/NIFI-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Turcsanyi updated NIFI-7740:
----------------------------------
    Component/s: Extensions

> Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-7740
>                 URL: https://issues.apache.org/jira/browse/NIFI-7740
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The original PutHiveStreaming (for Hive 1.2.x) exposed properties to the user 
> for tuning the number of records in an individual Hive Streaming transaction, 
> as well as the number of transactions to be batched together (for 
> performance).
> These properties should be exposed in the PutHive3Streaming processor so 
> that its performance can be tuned. The default values should preserve the 
> current behavior: a setting of zero for Records Per Transaction puts all 
> records into a single transaction, and a setting of 1 for Transactions Per 
> Batch results in a single transaction in each batch.
> For large files, Records Per Transaction should be set to something more 
> manageable, such as 100K, and Transactions Per Batch to something such as 
> 10. As a rule, the product of the two numbers should be larger than the 
> largest expected number of records in the flow file(s); this ensures that 
> any failed transaction batch causes a full rollback. The documentation for 
> these properties should include this prescription.
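
The sizing rule in the description can be checked with simple arithmetic. The sketch below is only an illustrative model, not the processor's implementation: the function name plan_transactions and its return shape are hypothetical, and it assumes records are split evenly into transactions, with an empty flow file still counted as one transaction.

```python
import math


def plan_transactions(total_records, records_per_transaction=0,
                      transactions_per_batch=1):
    """Model how a flow file would be split (hypothetical helper).

    Returns (num_transactions, num_batches, fits_in_one_batch).
    """
    if total_records < 0 or records_per_transaction < 0:
        raise ValueError("record counts must not be negative")
    if transactions_per_batch < 1:
        raise ValueError("transactions_per_batch must be at least 1")

    if records_per_transaction == 0:
        # Zero means "all records in a single transaction" (the default).
        num_transactions = 1
        fits_in_one_batch = True
    else:
        # Assumption: an empty flow file still counts as one transaction.
        num_transactions = max(
            1, math.ceil(total_records / records_per_transaction))
        # Rule from the description: the product of the two settings
        # should be strictly larger than the number of records.
        capacity = records_per_transaction * transactions_per_batch
        fits_in_one_batch = capacity > total_records

    num_batches = math.ceil(num_transactions / transactions_per_batch)
    return num_transactions, num_batches, fits_in_one_batch


if __name__ == "__main__":
    # Suggested settings from the description: 100K records, 10 transactions.
    print(plan_transactions(950_000, 100_000, 10))
    print(plan_transactions(2_500_000, 100_000, 10))
```

Under this model, a flow file of 950,000 records with the suggested settings stays within one transaction batch, while one of 2,500,000 records spans several batches and so falls outside the rule.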



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
