[ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis reassigned CASSANDRA-1278:
-----------------------------------------
Assignee: Sylvain Lebresne (was: Matthew F. Dennis)
I think we've been over-engineering the problem. Ed was on the right track:
bq. I would personally like to see a JMX function like 'nodetool addsstable
mykeyspace mycf mysstable-file'. Most people can generate and move an
SSTable on their own (sstableWriter + scp).
(This is, btw, the HBase bulk load approach, which despite some clunkiness does
seem to solve the problem for those users.)
The main drawback is that because of Cassandra's replication strategies, data
from a naively-written sstable could span many nodes -- even the entire cluster.
So we can improve the experience a lot with a simple tool that just streams
ranges from a local table to the right nodes. Since it's doing the exact thing
that existing node movement needs -- sending ranges from an existing sstable --
it should not require any new code from Streaming.
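
To make the routing idea concrete, here is a minimal sketch of how such a tool
might decide which rows go to which nodes. It is not Cassandra's Streaming code:
the ring layout, the MD5-based token_for, and the "owner plus the next rf - 1
nodes clockwise" placement are simplifying assumptions, and every name below
(token_for, replicas_for, plan_streams) is hypothetical.

```python
import bisect
import hashlib
from collections import defaultdict


def token_for(key: bytes) -> int:
    """Hash a row key to a ring position (assumed MD5-based, for illustration)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def replicas_for(token, ring, rf):
    """Return the nodes that should receive a row with the given token.

    ring is a list of (node_token, node_name) sorted by node_token. A node is
    assumed to own the range (previous_token, node_token]; the remaining
    replicas are the next nodes clockwise on the ring.
    """
    tokens = [t for t, _ in ring]
    # First node whose token is >= the row token; wrap to index 0 past the end.
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def plan_streams(keys, ring, rf):
    """Group the row keys of a local sstable by the node they must be sent to."""
    plan = defaultdict(list)
    for key in keys:
        for node in replicas_for(token_for(key), ring, rf):
            plan[node].append(key)
    return dict(plan)
```

With a ring of [(10, "a"), (20, "b"), (30, "c")] and rf = 2, a row whose token
is 15 would go to nodes b and c, and a row whose token is 35 would wrap around
to a and b. A real tool would stream contiguous ranges rather than individual
keys, but the grouping step is the same in spirit.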
Sylvain volunteered to take a stab at this.
> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
> Key: CASSANDRA-1278
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Jeremy Hanna
> Assignee: Sylvain Lebresne
> Fix For: 0.8.1
>
> Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt,
> 1278-cassandra-0.7.txt
>
> Original Estimate: 40h
> Time Spent: 40h 40m
> Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art. People are either
> directed to just do it responsibly with Thrift or a higher-level client, or
> they have to explore the contrib/bmt example
> (http://wiki.apache.org/cassandra/BinaryMemtable). That contrib module requires
> delving into the code to find out how it works and then applying it to the
> given problem. With either method, the user also needs to keep in mind that
> overloading the cluster is possible, which will hopefully be addressed in
> CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents
> dealing with bulk loading. Perhaps it could include code in the Core to make
> it more pluggable for external clients of different types.
> Bulk loading their data is something that many people who are new to
> Cassandra need to do.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira