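You can probably leverage the bulk writer API: write the saved mutations out locally as SSTables, then stream those files to the destination node the way the bulk loader does. Look at SSTableSimpleUnsortedWriter, for example.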
On Tue, Apr 9, 2013 at 4:16 PM, William Katsak <wkat...@cs.rutgers.edu> wrote:
> Hello,
>
> I apologize for my very vague email; I shouldn't have written it in such a
> hurry. I would like to clarify my use case and requirements, so that maybe
> someone can give me some advice.
>
> I am building a research version of Cassandra in which a missed write is a
> normal case (e.g. out of n replicas, it would be normal for at least one of
> them to miss a write). I keep track of missed writes similarly to how
> default Cassandra does for HintedHandoff (a column family in the system
> keyspace that stores serialized RowMutations). Later, when the nodes that
> were missed are ready to receive writes again, the node caching the
> RowMutations sends them one at a time until they have all been delivered.
> This all happens in the context of a live, serving system.
>
> My system works and does what it is supposed to; now I am trying to
> improve performance. I currently have two optimizations in mind, but am not
> sure how to approach them:
>
> 1) Minimize the transfer of redundant RowMutations by merging all
> RowMutations for the same key and transmitting only one per key. If a
> subset of keys is very popular, this minimizes how much I need to transfer
> to bring a node back up to date. I am thinking I can go inside the
> RowMutation and merge each ColumnFamily, then create a new RowMutation with
> the merged CFs. Is ColumnFamily.diff() the right way to merge an individual
> CF, or am I misunderstanding it?
>
> 2) Serialize a whole bunch of RowMutations into a chunk, stream the chunk
> to the appropriate node, deserialize them, and apply them individually. In
> this case, I would avoid having to wait for an ACK on each mutation, and
> could send large amounts of data more efficiently. Is this feasible with
> the existing streaming infrastructure, or would I have to implement a new
> facility?
>
> Again, my codebase is on top of Cassandra 1.1.6. I would very much
> appreciate any insight anyone could give me.
>
> Thanks very much,
> Bill Katsak
>
> On 04/08/2013 12:10 PM, William Katsak wrote:
>
>> Hello,
>>
>> I am sorry to bother the list with this question, but I was wondering:
>> assuming I have many saved (small) mutations (of the type that hinted
>> handoff uses), is there any easy way to put these all together and bulk
>> transmit (stream) them to a destination node?
>>
>> My codebase is based on Cassandra 1.1.6.
>>
>> Thanks very much in advance,
>> Bill Katsak

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
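For illustration only, here is a minimal Java sketch of the two optimizations Bill describes: coalescing saved mutations per key, then packing the result into one chunk instead of sending each mutation and waiting for its ACK. It does not use Cassandra's own classes or the bulk-writer approach suggested above. Cell, Mutation, cols, coalesce, pack and unpack are hypothetical stand-ins. The sketch assumes a simple last-write-wins rule per column (the newer timestamp wins, which is how Cassandra reconciles ordinary columns) and ignores deletions, TTLs, counters and super columns.

```java
import java.io.*;
import java.util.*;

// Hypothetical, simplified stand-ins for saved hints; NOT Cassandra's RowMutation/ColumnFamily.
public class HintCoalescer {

    /** A column value tagged with its write timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** One saved mutation: a row key plus the columns it wrote. */
    static final class Mutation {
        final String key;
        final Map<String, Cell> columns;
        Mutation(String key, Map<String, Cell> columns) { this.key = key; this.columns = columns; }
    }

    /** Convenience helper: a single-column map. */
    static Map<String, Cell> cols(String name, String value, long timestamp) {
        Map<String, Cell> m = new LinkedHashMap<>();
        m.put(name, new Cell(value, timestamp));
        return m;
    }

    /** Optimization 1: merge all mutations per key, keeping the newest cell for each column. */
    static Map<String, Map<String, Cell>> coalesce(List<Mutation> hints) {
        Map<String, Map<String, Cell>> merged = new LinkedHashMap<>();
        for (Mutation m : hints) {
            Map<String, Cell> row = merged.computeIfAbsent(m.key, k -> new LinkedHashMap<>());
            for (Map.Entry<String, Cell> e : m.columns.entrySet()) {
                // Last write wins; on a timestamp tie this sketch keeps the earlier-seen cell.
                row.merge(e.getKey(), e.getValue(),
                        (old, inc) -> inc.timestamp > old.timestamp ? inc : old);
            }
        }
        return merged;
    }

    /** Optimization 2: serialize every merged row into one chunk sent in a single transfer. */
    static byte[] pack(Map<String, Map<String, Cell>> rows) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(rows.size());                    // row count
            for (Map.Entry<String, Map<String, Cell>> row : rows.entrySet()) {
                out.writeUTF(row.getKey());
                out.writeInt(row.getValue().size());      // column count for this row
                for (Map.Entry<String, Cell> c : row.getValue().entrySet()) {
                    out.writeUTF(c.getKey());
                    out.writeUTF(c.getValue().value);
                    out.writeLong(c.getValue().timestamp);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Receiver side: decode the chunk back into rows, to be applied one by one. */
    static Map<String, Map<String, Cell>> unpack(byte[] chunk) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(chunk))) {
            Map<String, Map<String, Cell>> rows = new LinkedHashMap<>();
            int rowCount = in.readInt();
            for (int i = 0; i < rowCount; i++) {
                String key = in.readUTF();
                int colCount = in.readInt();
                Map<String, Cell> row = new LinkedHashMap<>();
                for (int j = 0; j < colCount; j++) {
                    String name = in.readUTF();
                    String value = in.readUTF();
                    row.put(name, new Cell(value, in.readLong()));
                }
                rows.put(key, row);
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Mutation> hints = new ArrayList<>();
        hints.add(new Mutation("user1", cols("email", "a@x", 1L)));
        hints.add(new Mutation("user1", cols("email", "b@x", 3L)));
        hints.add(new Mutation("user2", cols("name", "Bob", 2L)));
        Map<String, Map<String, Cell>> received = unpack(pack(coalesce(hints)));
        // prints "2 rows; user1.email=b@x"
        System.out.println(received.size() + " rows; user1.email=" + received.get("user1").get("email").value);
    }
}
```

Coalescing first means a hot key that was updated many times is shipped once. Packing the merged rows into one chunk replaces a round trip per mutation with a single transfer, although the receiver still applies each decoded row individually. Note that writeUTF caps each string at 65535 bytes of modified UTF-8, so real payloads would need length-prefixed byte arrays instead.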