[jira] Updated: (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

Jonathan Ellis (JIRA) Wed, 29 Dec 2010 11:44:13 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jonathan Ellis updated CASSANDRA-1278:
--------------------------------------

      Component/s: Tools
    Fix Version/s:     (was: 0.8)
                   0.7.1
         Assignee: Matthew F. Dennis

All a "bulk load" API needs to do is pretend it's a streaming source, and send 
data rows (in sorted order) to the target.  Since Hadoop sorts as part of the 
reduce stage, we should be able to do this directly in CFOF/CFRW 
(ColumnFamilyOutputFormat/ColumnFamilyRecordWriter).

The tricky part is that StreamOutSession.begin assumes that it has a list of 
physical files to stream from (via addFilesToStream).
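
As a rough illustration of the "send rows in sorted order" idea only, here is a 
minimal sketch.  RowSink and streamSorted are hypothetical names invented for 
this example and are not Cassandra APIs; natural String order stands in for 
whatever order the partitioner actually requires, and nothing here touches 
StreamOutSession.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BulkLoadSketch {
    /** Hypothetical stand-in for the receiving end of a stream; not a Cassandra API. */
    interface RowSink {
        void accept(String key, byte[] row);
    }

    /** Sends every row to the sink in ascending key order. */
    static void streamSorted(Map<String, byte[]> rows, RowSink sink) {
        // Assumption: natural String order stands in for partitioner order.
        // Copying into a TreeMap gives iteration in sorted key order.
        for (Map.Entry<String, byte[]> e : new TreeMap<>(rows).entrySet()) {
            sink.accept(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> rows = new HashMap<>();
        rows.put("c", new byte[] {3});
        rows.put("a", new byte[] {1});
        rows.put("b", new byte[] {2});

        // Record the order in which the sink sees the keys.
        List<String> order = new ArrayList<>();
        streamSorted(rows, (key, row) -> order.add(key));
        System.out.println(order);
    }
}
```

In the Hadoop case the sort would already have happened in the reduce stage, so 
the writer would only need to forward rows as they arrive rather than re-sort 
them as this sketch does.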

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.7.1
>
>
> Currently bulk loading into Cassandra is a black art.  People are either 
> directed to just do it responsibly with thrift or a higher level client, or 
> they have to explore the contrib/bmt example - 
> http://wiki.apache.org/cassandra/BinaryMemtable  That contrib module requires 
> delving into the code to find out how it works and then applying it to the 
> given problem.  Using either method, the user also needs to keep in mind that 
> overloading the cluster is possible - which will hopefully be addressed in 
> CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading.  Perhaps it could include code in the Core to make 
> it more pluggable for external clients of different types.
> Bulk loading data is something that many people who are new to Cassandra 
> need to do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
