[jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

Jonathan Ellis (JIRA) Sat, 30 Apr 2011 05:36:45 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027327#comment-13027327 ]


Jonathan Ellis commented on CASSANDRA-1278:
-------------------------------------------

bq. If you're comparing to the streams we use for repair and similar, they 
require table names and byte ranges be known up front

We've had enough trouble debugging streaming when people use it all the time 
for repair. I shudder to think of the bugs we'll introduce to a second-class 
protocol that gets used only slightly more often than BMT (BinaryMemtable).

Maybe we've been too clever here: why not just write out the full sstable on 
the client and stream it over (indexes and all), so that

 - we move the [primary] index build off the server, which should give a nice 
performance boost
 - we have filenames and sizes ready to go so streaming will be happy
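A minimal sketch of the client-side half of this idea, assuming a toy on-disk 
format that is NOT Cassandra's real SSTable layout: rows are written in sorted 
key order to a data file, a primary index records each key's byte offset into 
that file, and the function returns the (filename, size) pairs a streaming 
session could announce up front. The file names Data.db/Index.db and the 
functions write_sstable and read_index are hypothetical.

```python
import os
import struct


def write_sstable(directory, rows):
    """Write a toy sstable (hypothetical format, not Cassandra's real one).

    rows maps bytes keys to bytes values. Returns the (filename, size)
    pairs a streaming session could announce before sending any data.
    """
    data_path = os.path.join(directory, "Data.db")
    index_path = os.path.join(directory, "Index.db")
    with open(data_path, "wb") as data, open(index_path, "wb") as index:
        for key in sorted(rows):  # sstables are sorted by key
            offset = data.tell()
            value = rows[key]
            # Data record: 2-byte key length, key, 4-byte value length, value.
            data.write(struct.pack(">H", len(key)) + key)
            data.write(struct.pack(">I", len(value)) + value)
            # Primary index entry: 2-byte key length, key, 8-byte data offset.
            # Building this on the client is the work moved off the server.
            index.write(struct.pack(">H", len(key)) + key + struct.pack(">Q", offset))
    # Files are closed here, so sizes are final and can be announced up front.
    return [(os.path.basename(p), os.path.getsize(p)) for p in (data_path, index_path)]


def read_index(index_path):
    """Load the toy primary index back into a {key: data offset} dict."""
    offsets = {}
    with open(index_path, "rb") as f:
        while True:
            header = f.read(2)
            if not header:  # end of file
                break
            (key_len,) = struct.unpack(">H", header)
            key = f.read(key_len)
            (offset,) = struct.unpack(">Q", f.read(8))
            offsets[key] = offset
    return offsets
```

Because the client knows every file name and exact size once it closes the 
files, the receiver never has to discover them mid-stream.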

We're still talking about a minor change to streaming, namely recognizing that 
we're getting all the components and not just the data file, but that's 
something we can deal with at the StreamInSession level; I don't think we'll 
need to change the protocol itself.
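The receiving side of that change can be sketched as a single decision made 
once the announced filenames are known. This assumes Cassandra-style names 
ending in "-<Component>.db" and assumes Data, Index and Filter are the 
components that matter; Plan, FULL_COMPONENTS, component_of and plan_incoming 
are hypothetical names, not StreamInSession's actual API.

```python
from enum import Enum

# Components the receiver needs to add the sstable without rebuilding anything
# locally (assumed set; real Cassandra versions may write more components).
FULL_COMPONENTS = frozenset({"Data.db", "Index.db", "Filter.db"})


class Plan(Enum):
    ADD_AS_IS = "add the streamed sstable as-is"
    REBUILD_INDEXES = "rebuild primary index and bloom filter from Data.db"


def component_of(filename):
    """Return the component suffix, e.g. 'ks-cf-f-1-Data.db' -> 'Data.db'."""
    return filename.rsplit("-", 1)[-1]


def plan_incoming(filenames):
    """Decide what to do with the files announced for one incoming sstable."""
    components = {component_of(name) for name in filenames}
    if "Data.db" not in components:
        raise ValueError("stream does not include a Data.db component")
    if FULL_COMPONENTS <= components:
        # The client already built the index and filter: skip the server-side build.
        return Plan.ADD_AS_IS
    # Data-only stream: rebuild the remaining components on the server.
    return Plan.REBUILD_INDEXES
```

Keeping the check at session level, after the file list is known, is what 
leaves the wire protocol itself untouched.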

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.8.1
>
>         Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art.  People are either 
> directed to just do it responsibly with Thrift or a higher-level client, or 
> they have to explore the contrib/bmt example 
> (http://wiki.apache.org/cassandra/BinaryMemtable).  That contrib module requires 
> delving into the code to find out how it works and then applying it to the 
> given problem.  Using either method, the user also needs to keep in mind that 
> overloading the cluster is possible, which will hopefully be addressed in 
> CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading.  Perhaps it could include code in the core to make 
> it more pluggable for external clients of different types.
> This matters because bulk loading their data into Cassandra is one of the 
> first things many people new to Cassandra need to do.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
