But it isn't special-case logic. The current AbstractType code, and the indexing of AbstractTypes, would for the most part already support this; someone just has to write the code for a JSONType or ProtoBuffType.
The problem isn't writing the code to break objects up; the problem is encode/decode time. Encoding/decoding to Thrift is already a significant portion of the write timeline, and adding object-to-column encode/decode on top of that makes it even longer. For a read-heavy load that wants the JSON/protobuf blob to be the thing served to clients, an increase in the write timeline to parse/index the blob is probably acceptable, so that you don't have to pay the re-assembly penalty every time you hit the database for that object.

But once we get multi-range slicing, I think the break-it-up-into-multiple-columns approach will be best for the average case. That is the other problem I have with doing the break-into-columns thing right now: either I use Super Columns and can't index (so why did I break them up?), or I can't get multiple objects at once without pulling a huge slice from the start of o1 to the end of o5 and then throwing away the majority of the data I pulled back that doesn't belong to o1 or o5.

-Jeremiah

________________________________________
From: Jonathan Ellis [jbel...@gmail.com]
Sent: Thursday, March 29, 2012 11:23 AM
To: dev@cassandra.apache.org
Subject: Re: Document storage

On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah Jordan
<jeremiah.jor...@morningstar.com> wrote:
> It's not clear what 3647 actually is; there is no code attached, and no
> real example in it.
>
> Aside from that, the reason this would be useful to me (if we could get
> indexing of attributes working) is that I already have my data in
> JSON/Thrift/protobuf. Depending on how large the data is, it isn't
> trivial to break it up into columns to insert, and re-assemble into
> columns to read.

I don't understand the problem. Assuming Cassandra support for maps and lists, I could write a Python module that takes JSON (or Thrift, or protobuf) objects and splits them into Cassandra rows by fields in a couple of hours.
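A minimal sketch of the kind of splitting module described above, assuming the document has already been decoded from JSON into Python objects (the function name and the "." path separator are illustrative choices, not anything from Cassandra itself):

```python
import json

def json_to_columns(doc, sep="."):
    """Flatten a decoded JSON document into flat {column_name: value}
    pairs, joining nested field names with `sep`. List indices become
    path components as well, so every leaf value gets its own column."""
    columns = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + [str(key)])
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, path + [str(i)])
        else:
            columns[sep.join(path)] = node

    walk(doc, [])
    return columns

doc = json.loads('{"user": {"name": "ada", "tags": ["a", "b"]}}')
print(json_to_columns(doc))
# {'user.name': 'ada', 'user.tags.0': 'a', 'user.tags.1': 'b'}
```

Each flattened pair would then map onto a column name/value in a row, which is what makes per-field update and retrieval possible instead of rewriting the whole blob.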
I'm pretty sure this is essentially what Brian's REST API for Cassandra does now. I think this is a much better approach, because it gives you the ability to update or retrieve just parts of objects efficiently, rather than making column values just blobs with a bunch of special-case logic to introspect them. That feels like a big step backwards to me.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com