[jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

Sylvain Lebresne (JIRA) Thu, 19 May 2011 07:49:29 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036231#comment-13036231 ]


Sylvain Lebresne commented on CASSANDRA-1278:
---------------------------------------------

I'd love to, but as it turns out, Descriptor is fairly heavily hardwired to 
treat the directory where the file sits as the keyspace name. By hardwired I 
mean that even adding a constructor to Descriptor that decouples the ksname 
field from the directory argument doesn't work: streaming only transmits the 
name of the file (including the directory), not the ksname field, so the 
receiving side would get the wrong name.

That is, I don't think we can do that without adding a new argument to the 
stream header, which felt like overkill at first (though it's probably 
doable).
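
To make the problem concrete, here is a minimal Java sketch of the behaviour 
described above. It is not Cassandra's actual code: SketchDescriptor, 
streamedName() and receive() are hypothetical stand-ins, and the file name is 
only illustrative. It assumes, as described, that the receiver rebuilds the 
descriptor from the streamed path alone.

```java
import java.io.File;

// Hypothetical, simplified stand-in for Descriptor; not Cassandra's real class.
public class KeyspaceFromPathSketch {

    static final class SketchDescriptor {
        final File directory;
        final String ksname;
        final String fileName;

        // Usual case: the keyspace name is the name of the parent directory.
        SketchDescriptor(File file) {
            this(file, file.getParentFile().getName());
        }

        // The extra constructor discussed above: ksname decoupled from the directory.
        SketchDescriptor(File file, String ksname) {
            this.directory = file.getParentFile();
            this.ksname = ksname;
            this.fileName = file.getName();
        }

        // What gets sent in this sketch: the file path only, with no ksname field.
        String streamedName() {
            return new File(directory, fileName).getPath();
        }
    }

    // Receiving side: only the path is available, so ksname is re-derived from it.
    static SketchDescriptor receive(String streamedName) {
        return new SketchDescriptor(new File(streamedName));
    }

    public static void main(String[] args) {
        // The sender keeps its sstable outside a keyspace-named directory
        // and overrides ksname explicitly.
        SketchDescriptor sent = new SketchDescriptor(
                new File("bulkload/Standard1-g-1-Data.db"), "Keyspace1");
        SketchDescriptor received = receive(sent.streamedName());

        System.out.println("sender ksname:   " + sent.ksname);
        System.out.println("receiver ksname: " + received.ksname);
    }
}
```

In this sketch the sender sees "Keyspace1" while the receiver ends up with 
"bulkload", which is the mismatch that a new stream header argument would be 
needed to avoid.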

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.1
>
>         Attachments: 0001-Add-bulk-loader-utility.patch, 
> 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently, bulk loading into Cassandra is a black art.  People are either 
> directed to just do it responsibly with Thrift or a higher-level client, or 
> they have to explore the contrib/bmt example 
> (http://wiki.apache.org/cassandra/BinaryMemtable).  That contrib module 
> requires delving into the code to find out how it works and then applying it 
> to the given problem.  With either method, the user also needs to keep in 
> mind that overloading the cluster is possible, which will hopefully be 
> addressed in CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading.  Perhaps it could include code in the core to make 
> it more pluggable for external clients of different types.
> Bulk loading data into Cassandra is simply something that many people who 
> are new to Cassandra need to do.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
