[
https://issues.apache.org/jira/browse/CASSANDRA-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906578#action_12906578
]
Philip (flip) Kromer commented on CASSANDRA-1434:
-------------------------------------------------
Right now the code does { buffer n mutations, grouping each according to its
endpoint. After n total writes, check that all endpoint writes have finished,
and dispatch to each endpoint its share of the n mutations }.
This is non-blocking at the socket level but ends up blocking at the
application level, and the wide variance in batch size has bad effects on GC at
the Cassandra end.
I think the ColumnFamilyRecordWriter would see a speedup and improved stability
with { buffer mutations, grouping each according to its endpoint. When an
endpoint has seen n writes, check that any previous write to that endpoint has
finished, and dispatch to this endpoint a full buffer of n mutations }.
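
The second scheme above could be sketched roughly as follows. This is only an
illustration, not the actual ColumnFamilyRecordWriter code: the class name
PerEndpointBatcher, the String endpoint key, and the sender callback are all
hypothetical, and the sketch assumes a single writer thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch: buffers mutations per endpoint and dispatches each
 * endpoint's buffer independently once it holds batchSize mutations.
 * Not thread-safe; assumes one writer thread.
 */
class PerEndpointBatcher<M> {
    private final int batchSize;
    // Assumed callback that starts an asynchronous write of one batch to one endpoint.
    private final BiFunction<String, List<M>, CompletableFuture<Void>> sender;
    private final Map<String, List<M>> buffers = new HashMap<>();
    private final Map<String, CompletableFuture<Void>> inFlight = new HashMap<>();

    PerEndpointBatcher(int batchSize,
                       BiFunction<String, List<M>, CompletableFuture<Void>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffer one mutation; dispatch only this endpoint's buffer when it is full. */
    void write(String endpoint, M mutation) {
        List<M> buffer = buffers.computeIfAbsent(endpoint, k -> new ArrayList<>());
        buffer.add(mutation);
        if (buffer.size() >= batchSize) {
            flush(endpoint);
        }
    }

    /** Wait for this endpoint's previous write (if any), then send its buffer. */
    void flush(String endpoint) {
        List<M> batch = buffers.remove(endpoint);
        if (batch == null || batch.isEmpty()) {
            return;
        }
        CompletableFuture<Void> previous = inFlight.get(endpoint);
        if (previous != null) {
            previous.join(); // blocks on this endpoint only, not on the others
        }
        inFlight.put(endpoint, sender.apply(endpoint, batch));
    }

    /** Send any partial buffers and wait for all outstanding writes. */
    void close() {
        for (String endpoint : new ArrayList<>(buffers.keySet())) {
            flush(endpoint);
        }
        for (CompletableFuture<Void> pending : inFlight.values()) {
            pending.join();
        }
    }
}
```

With this shape every dispatched batch except the final partial ones has
exactly n mutations, and a slow endpoint only stalls writes destined for that
endpoint.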
> ColumnFamilyOutputFormat performs blocking writes for large batches
> -------------------------------------------------------------------
>
> Key: CASSANDRA-1434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1434
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Reporter: Stu Hood
> Assignee: Stu Hood
> Fix For: 0.7 beta 2
>
> Attachments: 0001-Switch-to-TFramedTransport-in-TestRingCache.patch,
> 0002-Add-kth-endpoint-method-to-RingCache-and-improve-con.patch,
> 0003-Remove-nesting-in-RingCache.patch,
> 0004-Fix-regression-introduced-on-1322-add-all-replicas-o.patch
>
>
> ColumnFamilyOutputFormat batches
> {{mapreduce.output.columnfamilyoutputformat.batch.threshold}} mutations
> (or {{Long.MAX_VALUE}} by default), and then performs a blocking write.