Yes, I meant the "row header index". What I'm doing is storing an object (e.g. UserProfile) that is always read or written as a whole (a user updates their details on a single page in the UI). I serialize the object to binary JSON using the SMILE format, then compress it with Snappy on the client side. So as far as Cassandra is concerned, it's just storing a byte[].
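To make the idea concrete, here is a rough sketch of the client-side round trip. It is not my actual code: it uses only the Python standard library, so plain JSON stands in for SMILE and zlib stands in for Snappy, and the names pack_profile, unpack_profile and blob_to_json_text are made up for illustration.

```python
import json
import zlib


def pack_profile(profile: dict) -> bytes:
    """Serialize the whole object, then compress it, all on the client."""
    # Stand-in for SMILE (binary JSON): compact UTF-8 encoded JSON text.
    raw = json.dumps(profile, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # Stand-in for Snappy: zlib from the standard library.
    return zlib.compress(raw)


def unpack_profile(blob: bytes) -> dict:
    """Reverse of pack_profile: decompress, then deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def blob_to_json_text(blob: bytes) -> str:
    """Roughly what a custom display type would do: byte[] to readable JSON."""
    return json.dumps(unpack_profile(blob), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical UserProfile object, always read and written as a whole.
    profile = {"id": 42, "name": "Example User", "email": "user@example.com"}
    blob = pack_profile(profile)  # this opaque byte[] is all the database sees
    print(len(blob), "bytes stored")
    print(blob_to_json_text(blob))
```

The database only ever receives the output of pack_profile as a single column value, so all of the serialization and compression cost stays on the client.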
Now on the client side, I'm using cassandra-cli with a custom type that knows how to turn a byte[] into JSON text and back. The only issue is CASSANDRA-4081, where "assume" doesn't work with custom types. If CASSANDRA-4081 gets fixed, I'll get the best of both worlds.

The advantages of this over the Thrift-based super column families are:

1. It saves CPU on the Cassandra nodes, since serialization/deserialization and compression/decompression happen on the client nodes, where there is plenty of idle CPU time.
2. It saves network bandwidth, since I'm sending a compressed byte[] over the wire.

-- Drew

On Mar 29, 2012, at 11:16 AM, Jonathan Ellis wrote:

> On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian <d...@venarc.com> wrote:
>>> I think this is a much better approach because that gives you the
>>> ability to update or retrieve just parts of objects efficiently,
>>> rather than making column values just blobs with a bunch of special
>>> case logic to introspect them. Which feels like a big step backwards
>>> to me.
>>
>> Unless your access pattern involves reading/writing the whole document each
>> time. In that case you're better off serializing the whole document and
>> storing it in a column as a byte[] without incurring the overhead of column
>> indexes. Right?
>
> Hmm, not sure what you're thinking of there.
>
> If you mean the "index" that's part of the row header for random
> access within a row, then no, serializing to byte[] doesn't save you
> anything.
>
> If you mean secondary indexes, don't declare any if you don't want any. :)
>
> Just telling C* to store a byte[] *will* be slightly lighter-weight
> than giving it named columns, but we're talking negligible compared to
> the overhead of actually moving the data on or off disk in the first
> place. Not even close to being worth giving up being able to deal
> with your data from standard tools like cqlsh, IMO.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com