[ 
https://issues.apache.org/jira/browse/CASSANDRA-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158635#comment-13158635
 ] 

Brandon Williams commented on CASSANDRA-3045:
---------------------------------------------

bq. are there any benchmarks or is there anything anecdotal about performance?

Using the simplest job possible (a map-only job that copies a CF), I see a 
20-25% gain.  I suspect that test is read-limited, though.  If you're generating 
the output on a Hadoop cluster and loading it into a separate Cassandra cluster 
(i.e., not colocated), I'd expect the gain to be even larger, but creating such 
a workload is a bit too much work for me to test.  If anyone has an existing 
case like this, I'd love for them to test and chime in.
                
> Update ColumnFamilyOutputFormat to use new bulkload API
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3045
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3045
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Jonathan Ellis
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-Remove-gossip-SS-requirement-from-BulkLoader.txt, 
> 0002-Allow-DD-loading-without-yaml.txt, 
> 0003-hadoop-output-support-for-bulk-loading.txt
>
>
> The bulk loading interface added in CASSANDRA-1278 is a great fit for Hadoop 
> jobs.


        
