[ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027291#comment-13027291 ]

Stu Hood commented on CASSANDRA-1278:
-------------------------------------

> It was intentional as previously only the streaming was buffered (at 4k)
It was the other way around IIRC: (non-encrypted) streaming used 
channel.transferTo, which bypassed the buffering entirely. The buffering was 
for internode messaging: see CASSANDRA-1943.
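
Roughly what I mean by "bypassed the buffering entirely" (illustrative Java only, not the actual Cassandra streaming code; the class and method names here are made up):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public final class ZeroCopyStream
    {
        // Sends an SSTable file over a socket without copying it through a
        // userspace buffer: transferTo hands the copy to the kernel directly,
        // so any output-stream buffering on this path is simply never used.
        public static void stream(RandomAccessFile sstable, SocketChannel socket) throws IOException
        {
            FileChannel file = sstable.getChannel();
            long position = 0;
            long remaining = file.size();
            while (remaining > 0)
            {
                // transferTo may move fewer bytes than requested, so loop until done
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }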

> We could construct something that buffers up X amount of data and then frames 
> the data being sent and change the inner loop to decompose that but it's 
> extra complexity, code and overhead.
You're already buffering rows in StreamingProxyFlusher.bufferRow: the change 
would simply be to continue to buffer rows until a threshold was reached. The 
benefit here is that the code on the receiving side doesn't need to change when 
the proxy starts sending it a different SSTable version/format. I've never 
heard of somebody regretting having framing in a protocol: it's always the 
other way around.
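
A minimal sketch of the framing I'm suggesting (illustrative names, not the real StreamingProxyFlusher API; the 64KB threshold is arbitrary):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public final class FramedRowBuffer
    {
        private static final int FRAME_THRESHOLD = 64 * 1024; // illustrative 64KB

        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private final DataOutputStream out;

        public FramedRowBuffer(DataOutputStream out)
        {
            this.out = out;
        }

        // Keep appending serialized rows in memory, and emit a frame once the
        // threshold is reached: the receiver deals in whole frames and never
        // has to understand the row format inside them.
        public void bufferRow(byte[] serializedRow) throws IOException
        {
            buffer.write(serializedRow);
            if (buffer.size() >= FRAME_THRESHOLD)
                flushFrame();
        }

        // One frame = a 4-byte payload length followed by the buffered rows.
        public void flushFrame() throws IOException
        {
            if (buffer.size() == 0)
                return;
            out.writeInt(buffer.size());
            buffer.writeTo(out);
            buffer.reset();
        }
    }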

Also, an SSTable version (as usually held by Descriptor) should be added to the 
header of your protocol so that clients don't break by sending unversioned 
blobs: not having versioning is my primary complaint vis-a-vis BinaryMemtables.
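
Something as simple as this would do (hypothetical helper, just to show a version-first header; the version string is the value Descriptor already carries):

    import java.io.DataOutputStream;
    import java.io.IOException;

    public final class LoaderHeader
    {
        // Write the SSTable version before anything else, so the receiver can
        // refuse or adapt to a format it doesn't understand instead of
        // silently misreading unversioned blobs.
        public static void writeHeader(DataOutputStream out, String sstableVersion) throws IOException
        {
            out.writeUTF(sstableVersion);
        }
    }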

> If we buffer it on the other side we consume more memory for a longer period 
> of time
I was talking about buffering on the client side: the server side can do one 
system call to flush to disk, such that it never enters userspace.
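
For example, something along these lines on the receiving side (a sketch only; whether the copy truly stays out of userspace depends on the JVM and OS, so treat that as an assumption):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.ReadableByteChannel;

    public final class DirectReceive
    {
        // Drain the incoming socket straight into the destination file with
        // transferFrom, avoiding an intermediate userspace buffer.
        public static void receive(ReadableByteChannel socket, RandomAccessFile target, long length) throws IOException
        {
            FileChannel file = target.getChannel();
            long written = 0;
            while (written < length)
            {
                // transferFrom may move fewer bytes than requested, so loop
                long n = file.transferFrom(socket, written, length - written);
                if (n == 0)
                    break; // peer closed the connection or no data available
                written += n;
            }
        }
    }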

Thanks!

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.8.1
>
>         Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 
> 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art.  People are either 
> directed to just do it responsibly with thrift or a higher-level client, or 
> they have to explore the contrib/bmt example 
> (http://wiki.apache.org/cassandra/BinaryMemtable).  That contrib module 
> requires delving into the code to find out how it works and then applying it 
> to the given problem.  Using either method, the user also needs to keep in 
> mind that overloading the cluster is possible - which will hopefully be 
> addressed in CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents 
> dealing with bulk loading.  Perhaps it could include code in the Core to make 
> it more pluggable for external clients of different types.
> It is just that this is something that many who are new to Cassandra need to 
> do - bulk load their data into Cassandra.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
