[jira] [Updated] (CASSANDRA-7776) Allow multiple MR jobs to concurrently write to the same column family from the same node using CqlBulkOutputFormat

Paul Pak (JIRA) Fri, 12 Sep 2014 14:26:18 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Paul Pak updated CASSANDRA-7776:
--------------------------------
    Description: 
After sstable files are written, all files in the specified output directory 
are loaded (transferred) to the remote Cassandra cluster. If several MR jobs on 
the same node write to the same table (and therefore to the same output 
directory), each job's load process transfers every sstable file in that 
directory, so the same files are transferred multiple times. Furthermore, if 
cleanup of successful outputs is enabled 
([CASSANDRA-7777|https://issues.apache.org/jira/browse/CASSANDRA-7777]), 
write/load contention between the jobs could cause errors.

This can be remedied simply by using a unique output directory for each MR job.
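The remedy above (one output directory per MR job) could look roughly like the sketch below. It is not the attached patch and not a Cassandra API. The class `UniqueOutputDir`, its `create` method, and the `<base>/<keyspace>/<table>-<uuid>` layout are all hypothetical, chosen only for illustration. Because each job writes its sstables into its own directory, its load step can only pick up its own files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/**
 * Hypothetical helper (not part of Cassandra or Hadoop): gives each MR job
 * its own sstable output directory, so that one job's load or cleanup step
 * never touches files written by another job targeting the same table.
 */
public final class UniqueOutputDir {
    private UniqueOutputDir() {}

    /** Creates <base>/<keyspace>/<table>-<random UUID> and returns its path. */
    public static Path create(Path base, String keyspace, String table) throws IOException {
        // The random UUID suffix keeps concurrent jobs on the same node apart
        // even when they write to the same keyspace and table.
        Path dir = base.resolve(keyspace).resolve(table + "-" + UUID.randomUUID());
        // createDirectories also creates missing parents and returns the path.
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "bulk-output");
        // Two jobs writing to the same table get two distinct directories.
        Path jobA = create(base, "ks", "events");
        Path jobB = create(base, "ks", "events");
        System.out.println(jobA);
        System.out.println(jobB);
    }
}
```

A random UUID keeps the sketch independent of any Hadoop context. If one is available, a job or task-attempt ID would serve the same purpose.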

  was:
After sstable files are written, all files in the specified output directory 
are loaded (transferred) to the remote cassandra cluster. If multiple writes 
occur on a node to the same table (i.e. directory), then the multiple load 
processes end up transferring the same sstable files multiple times. 
Furthermore, if directory cleanup of successful outputs is set to occur 
([CASSANDRA-7777|]), then there could be errors caused by write/load contention.

This can be simply remedied by using unique output directories for each MR job.


> Allow multiple MR jobs to concurrently write to the same column family from 
> the same node using CqlBulkOutputFormat
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7776
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7776
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Paul Pak
>            Assignee: Paul Pak
>            Priority: Minor
>              Labels: cql3, hadoop
>         Attachments: trunk-7776-v1.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
