But it isn't special-case logic.  The current AbstractType and indexing of
abstract types would, for the most part, already support this.  Someone just
has to write the code for a JSONType or ProtoBuffType.

The problem isn't writing the code to break objects up; the problem is
encode/decode time.  Encoding/decoding to Thrift is already a significant
portion of the time spent writing data, and adding object-to-column
encoding/decoding on top of that makes it even longer.  For a read-heavy load
that wants the JSON/Proto blob as the thing served to clients, an increase in
write time to parse/index the blob is probably acceptable, so that you don't
have to pay the reassembly penalty every time you hit the database for that
object.
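
Rough pycassa-style sketch of the write path I have in mind (the column family
and field names are made up; assume a secondary index was declared on the
extracted column):

    import json
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    docs = ColumnFamily(pool, 'Documents')

    def write_document(key, json_blob):
        # Parse once at write time so reads can serve the blob untouched.
        doc = json.loads(json_blob)
        docs.insert(key, {
            'blob': json_blob,              # the thing served back to clients
            'user_name': doc['user_name'],  # pulled out so it can be indexed
        })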

But once we get multi-range slicing, I think the break-it-up-into-multiple-columns
approach will be best for the average case and for most people.  That is the
other problem I have with breaking objects into columns right now: I either
have to use Super Columns and lose the ability to index (so why did I break
them up?), or I can't get multiple objects at once without pulling a huge
slice from the start of o1 to the end of o5 and then throwing away the
majority of the data I pulled back that doesn't belong to o1 or o5.
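
To make that second case concrete, getting o1 and o5 today looks roughly like
this with pycassa (the 'oN:field' column-naming scheme is just for
illustration):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, 'Objects')

    # One contiguous slice from the start of o1 through the end of o5; all of
    # o2, o3, and o4 comes back too and gets thrown away client side.
    cols = cf.get('row_key', column_start='o1:', column_finish='o5:~',
                  column_count=10000)
    wanted = dict((name, value) for name, value in cols.items()
                  if name.startswith('o1:') or name.startswith('o5:'))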

-Jeremiah

________________________________________
From: Jonathan Ellis [jbel...@gmail.com]
Sent: Thursday, March 29, 2012 11:23 AM
To: dev@cassandra.apache.org
Subject: Re: Document storage

On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah Jordan
<jeremiah.jor...@morningstar.com> wrote:
> It's not clear what 3647 actually is; there is no code attached, and no real
> example in it.
>
> Aside from that, the reason this would be useful to me (if we could get
> indexing of attributes working) is that I already have my data in
> JSON/Thrift/ProtoBuff.  Depending on how large the data is, it isn't trivial
> to break it up into columns to insert and reassemble from columns to read.

I don't understand the problem.  Assuming Cassandra support for maps
and lists, I could write a Python module that takes JSON (or Thrift,
or protobuf) objects and splits them into Cassandra rows by field in
a couple of hours.  I'm pretty sure this is essentially what Brian's
REST API for Cassandra does now.
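
Roughly something like this (completely untested; the dotted column-name
scheme is arbitrary):

    import json

    def flatten(obj, prefix=''):
        # Turn a decoded json object into {column_name: value} pairs,
        # one column per leaf field.
        if isinstance(obj, dict):
            items = obj.items()
        elif isinstance(obj, list):
            items = enumerate(obj)
        else:
            return {prefix: obj}
        columns = {}
        for key, value in items:
            name = '%s.%s' % (prefix, key) if prefix else str(key)
            columns.update(flatten(value, name))
        return columns

    def json_to_columns(json_blob):
        return flatten(json.loads(json_blob))

    # json_to_columns('{"user": {"name": "brian", "langs": ["py", "java"]}}')
    # -> {'user.name': 'brian', 'user.langs.0': 'py', 'user.langs.1': 'java'}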

I think this is a much better approach, because it gives you the
ability to update or retrieve just parts of objects efficiently,
rather than making column values just blobs with a bunch of
special-case logic to introspect them.  That feels like a big step
backwards to me.
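
For example, with columns laid out that way (and cf a pycassa ColumnFamily),
updating one field or fetching a couple of fields doesn't touch the rest of
the object:

    # Update a single field without rewriting the whole blob:
    cf.insert('row_key', {'user.name': 'new_name'})

    # Retrieve only the fields you care about:
    cf.get('row_key', columns=['user.name', 'user.langs.0'])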

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
