[jira] [Updated] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

Sylvain Lebresne (JIRA) Fri, 20 May 2011 03:53:33 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sylvain Lebresne updated CASSANDRA-1278:
----------------------------------------

    Attachment: 0001-Add-bulk-loader-utility-v2.patch

bq. It'd be nice if it printed the filename and the time it took for each file, since just having the percentages reset is a bit confusing.

The fact that the percentages reset is really just a bug (I tested at first with 
only one sstable, my bad). Anyway, that's fixed. I also agree with Jonathan's 
objection to printing the filename. In general, I'm not sure that giving too 
much information is really necessary.

bq. Also, this should respect SS.RING_DELAY

Yes. I think it was the fat client that wasn't respecting it: it was waiting 
for a hard-coded 5 seconds, which is almost never enough. I've updated 
SS.initClient() to use RING_DELAY instead.
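
A minimal sketch of the change described above, assuming a simplified fat-client startup. `FatClientStartup`, `Gossip`, and `initClient(Gossip, long)` are hypothetical names used for illustration only, not Cassandra's actual `StorageService` code; the caller is assumed to pass in the RING_DELAY value.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not Cassandra's actual StorageService.initClient().
final class FatClientStartup {
    /** Hypothetical stand-in for whatever starts gossip/messaging in the client. */
    interface Gossip {
        void start();
    }

    /**
     * Starts gossip, then blocks for ringDelayMs (the caller passes RING_DELAY)
     * instead of a hard-coded 5000 ms, so the client has time to learn the ring
     * before it starts streaming. Returns the elapsed time in milliseconds.
     */
    static long initClient(Gossip gossip, long ringDelayMs) {
        long start = System.nanoTime();
        gossip.start();
        try {
            TimeUnit.MILLISECONDS.sleep(ringDelayMs);
        } catch (InterruptedException e) {
            // Preserve the interrupt status so callers can notice it.
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Tying the client's wait to the same delay the server uses means both sides share one assumption about how long gossip needs to settle, instead of the client guessing a separate, shorter constant.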


Attaching v2, which:
  * uses RING_DELAY
  * updates the progress indication so that the percentages work. For each 
host, it also shows the number of files to be transferred to it and how many 
have already been transferred. Finally, it adds a total percentage as well as 
an approximate transfer rate.
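
The per-host and total progress described above can be sketched as follows, assuming progress is tracked in bytes accumulated across all of a host's sstables, so the percentage does not reset when a new file starts. `BulkLoadProgress`, its methods, and the output format are hypothetical and are not the exact format printed by the patch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of bulk-load progress reporting; not the patch's code.
final class BulkLoadProgress {
    /** Per-host counters: files and bytes to send, and how many are done. */
    static final class HostProgress {
        final int totalFiles;
        final long totalBytes;
        int doneFiles;
        long doneBytes;

        HostProgress(int totalFiles, long totalBytes) {
            this.totalFiles = totalFiles;
            this.totalBytes = totalBytes;
        }

        long percent() {
            return totalBytes == 0 ? 100 : doneBytes * 100 / totalBytes;
        }
    }

    private final Map<String, HostProgress> hosts = new LinkedHashMap<>();
    private final long startNanos;

    BulkLoadProgress(long startNanos) {
        this.startNanos = startNanos;
    }

    void addHost(String host, int files, long bytes) {
        hosts.put(host, new HostProgress(files, bytes));
    }

    /** Bytes accumulate across files, so the percentage never resets. */
    void recordBytes(String host, long bytes) {
        hosts.get(host).doneBytes += bytes;
    }

    void fileDone(String host) {
        hosts.get(host).doneFiles++;
    }

    private long doneBytes() {
        long done = 0;
        for (HostProgress p : hosts.values()) done += p.doneBytes;
        return done;
    }

    /** Total percentage over all hosts, weighted by bytes. */
    int totalPercent() {
        long total = 0;
        for (HostProgress p : hosts.values()) total += p.totalBytes;
        return total == 0 ? 100 : (int) (doneBytes() * 100 / total);
    }

    /** Approximate average transfer rate since start, in MiB/s. */
    double rateMiBps(long nowNanos) {
        double seconds = (nowNanos - startNanos) / 1e9;
        return seconds <= 0 ? 0.0 : doneBytes() / (1024.0 * 1024.0) / seconds;
    }

    String render(long nowNanos) {
        StringBuilder sb = new StringBuilder("progress: ");
        for (Map.Entry<String, HostProgress> e : hosts.entrySet()) {
            HostProgress p = e.getValue();
            sb.append('[').append(e.getKey()).append(' ')
              .append(p.doneFiles).append('/').append(p.totalFiles)
              .append(" (").append(p.percent()).append("%)] ");
        }
        sb.append(String.format(Locale.ROOT, "[total: %d%% - %.1fMiB/s (avg)]",
                totalPercent(), rateMiBps(nowNanos)));
        return sb.toString();
    }
}
```

Taking the clock as a parameter (for example, passing `System.nanoTime()` at each refresh) keeps the sketch deterministic to test while still giving a running average rate in real use.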


> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.1
>
>         Attachments: 0001-Add-bulk-loader-utility-v2.patch, 
> 0001-Add-bulk-loader-utility.patch, 1278-cassandra-0.7-v2.txt, 
> 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art.  People are either 
> directed to just do it responsibly with Thrift or a higher level client, or 
> they have to explore the contrib/bmt example - 
> http://wiki.apache.org/cassandra/BinaryMemtable  That contrib module requires 
> delving into the code to find out how it works and then applying it to the 
> given problem.  Using either method, the user also needs to keep in mind that 
> overloading the cluster is possible, which will hopefully be addressed in 
> CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading.  Perhaps it could include code in the Core to make 
> it more pluggable for external clients of different types.
> It's just that this is something many people who are new to Cassandra need to 
> do: bulk load their data into Cassandra.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
