[ https://issues.apache.org/jira/browse/NIFI-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Turcsanyi updated NIFI-7740:
----------------------------------
Fix Version/s: 1.13.0
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming
> ---------------------------------------------------------------------------
>
> Key: NIFI-7740
> URL: https://issues.apache.org/jira/browse/NIFI-7740
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Priority: Major
> Fix For: 1.13.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The original PutHiveStreaming (for Hive 1.2.x) exposed properties to the user
> for tuning the number of records in an individual Hive Streaming transaction,
> as well as the number of transactions to be batched together (for
> performance).
> These properties should be exposed in the PutHive3Streaming processor in
> order to tune its performance. The defaults should preserve the current
> behavior: a Records Per Transaction value of 0 puts all records into a
> single transaction, and a Transactions Per Batch value of 1 puts a single
> transaction in each batch.
> For large files, Records Per Transaction should be set to something more
> manageable, such as 100,000, and Transactions Per Batch to something such
> as 10. As a rule, the product of the two values should be larger than the
> largest expected number of records in the flow file(s), so that all of a
> flow file's records fit in a single transaction batch and any failed batch
> causes a full rollback. The documentation for these properties should
> include this guidance.
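
The sizing rule in the description can be checked with a little arithmetic.
The sketch below is illustrative only and is not taken from the
PutHive3Streaming source: the class and method names are hypothetical, and it
assumes records fill transactions in order and transactions fill batches in
order, with 0 records per transaction meaning "all records in one
transaction" as the description states.

```java
// Hypothetical helper, not part of NiFi: models only the arithmetic
// described in NIFI-7740, not the processor's actual implementation.
final class StreamingBatchSizing {

    private StreamingBatchSizing() {}

    /** Number of transactions needed; recordsPerTransaction == 0 means one transaction for everything. */
    static long transactionsNeeded(long recordCount, long recordsPerTransaction) {
        if (recordCount < 0 || recordsPerTransaction < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        if (recordCount == 0) {
            return 0;
        }
        if (recordsPerTransaction == 0) {
            return 1;
        }
        // Ceiling division without floating point.
        return (recordCount + recordsPerTransaction - 1) / recordsPerTransaction;
    }

    /** Number of transaction batches needed for the given settings. */
    static long batchesNeeded(long recordCount, long recordsPerTransaction, int transactionsPerBatch) {
        if (transactionsPerBatch < 1) {
            throw new IllegalArgumentException("transactionsPerBatch must be at least 1");
        }
        long transactions = transactionsNeeded(recordCount, recordsPerTransaction);
        return (transactions + transactionsPerBatch - 1) / transactionsPerBatch;
    }

    /** True when every record of the flow file lands in a single transaction batch. */
    static boolean fitsInSingleBatch(long recordCount, long recordsPerTransaction, int transactionsPerBatch) {
        return batchesNeeded(recordCount, recordsPerTransaction, transactionsPerBatch) <= 1;
    }

    public static void main(String[] args) {
        // Suggested settings from the description: 100,000 records x 10 transactions.
        System.out.println(fitsInSingleBatch(1_000_000L, 100_000L, 10));
        System.out.println(fitsInSingleBatch(1_000_001L, 100_000L, 10));
        // Defaults: 0 records per transaction, 1 transaction per batch.
        System.out.println(fitsInSingleBatch(250_000L, 0L, 1));
    }
}
```

Under this simplified model a flow file fits in one batch when its record
count does not exceed the product of the two settings. The description asks
for a product strictly larger than the largest expected record count, which
leaves some headroom.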
--
This message was sent by Atlassian Jira
(v8.3.4#803005)