Re: Document storage

Rick Branson Thu, 29 Mar 2012 07:45:49 -0700

Ben,

You can create a "materialized path" for each field in the document:


{
  ["user", "firstName"]: "ben",
  ["user", "skills", <TimeUUID>]: "java",
  ["user", "skills", <TimeUUID>]: "javascript",
  ["user", "skills", <TimeUUID>]: "html",
  ["user", "education", "school"]: "cmu",
  ["user", "education", "major"]: "computer science"
}

This way each field could be independently updated, and you can take 
sub-document slices with queries such as "give me everything under 
user/skills." 
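
As a rough illustration of the idea (a sketch under assumptions, not Cassandra client code): the Python below uses a plain dict to stand in for one row of composite columns, with tuples as the materialized paths and uuid.uuid1() standing in for the TimeUUID component. The helper names flatten, slice_under, and unflatten are hypothetical and exist only for this example.

```python
import uuid


def flatten(doc, prefix=()):
    """Yield (path, value) pairs, one per leaf field of a nested document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, prefix + (key,))
    elif isinstance(doc, list):
        for item in doc:
            # Assumption: a time-based UUID plays the role of <TimeUUID>.
            yield from flatten(item, prefix + (uuid.uuid1(),))
    else:
        yield prefix, doc


def slice_under(columns, prefix):
    """Return only the columns whose path starts with the given prefix."""
    prefix = tuple(prefix)
    return {path: value for path, value in columns.items()
            if path[:len(prefix)] == prefix}


def _collapse(node):
    """Turn dicts keyed only by UUIDs back into lists, recursively."""
    if not isinstance(node, dict):
        return node
    if node and all(isinstance(key, uuid.UUID) for key in node):
        # List order follows the iteration order of the columns.
        return [_collapse(value) for value in node.values()]
    return {key: _collapse(value) for key, value in node.items()}


def unflatten(columns):
    """Rebuild the nested document from (path -> value) columns."""
    root = {}
    for path, value in columns.items():
        node = root
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return _collapse(root)


doc = {
    "user": {
        "firstName": "ben",
        "skills": ["java", "javascript", "html"],
        "education": {"school": "cmu", "major": "computer science"},
    }
}

columns = dict(flatten(doc))                       # one entry per leaf field
skills = slice_under(columns, ["user", "skills"])  # "everything under user/skills"
rebuilt = unflatten(columns)                       # back to the nested form
```

Two caveats about the sketch: empty lists and empty sub-documents produce no columns, so they do not survive the round trip, and list order on rebuild depends on the columns coming back in insertion order, which a TimeUUID-ordered comparator would be expected to provide in a real column family.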

Rick


On Thursday, March 29, 2012 at 7:27 AM, Ben McCann wrote:

> Could you explain further how I would use CASSANDRA-3647? There's still
> very little documentation on composite columns and it was not clear to me
> whether they could be used to store document oriented data. Say for
> example that I had a document like:
> 
> user: {
>   firstName: 'ben',
>   skills: ['java', 'javascript', 'html'],
>   education: {
>     school: 'cmu',
>     major: 'computer science'
>   }
> }
> 
> How would I flatten this to be stored and then reconstruct the document?
> 
> 
> On Thu, Mar 29, 2012 at 5:44 AM, Jake Luciani <jak...@gmail.com> wrote:
> 
> > Is there a reason you would prefer a JSONType over CASSANDRA-3647? It
> > would seem the only thing a JSON type offers you is validation. 3647 takes
> > it much further by deconstructing a JSON document using composite columns
> > to flatten the document out, with the ability to access and update portions
> > of the document (as well as reconstruct it).
> > 
> > On Wed, Mar 28, 2012 at 11:58 AM, Ben McCann <b...@benmccann.com> wrote:
> > 
> > > Hi,
> > > 
> > > I was wondering if it would be interesting to add some type of
> > > document-oriented data type.
> > > 
> > > I've found it somewhat awkward to store document-oriented data in
> > > Cassandra today. I can make a JSON/Protobuf/Thrift, serialize it, and
> > > store it, but Cassandra cannot differentiate it from any other string or
> > > byte array. However, if my column validation_class could be a JsonType
> > > that would allow tools to potentially do more interesting introspection
> > > on the column value. E.g. bug 3647
> > > <https://issues.apache.org/jira/browse/CASSANDRA-3647> calls for
> > > supporting arbitrarily nested "documents" in CQL. Running a query against
> > > the JSON column in Pig is possible as well, but again in this use case it
> > > would be helpful to be able to encode in column metadata that the column
> > > is stored as JSON. For debugging, running nightly reports, etc. it would
> > > be quite useful compared to the opaque string and byte array types we
> > > have today. JSON is appealing because it would be easy to implement.
> > > Something like Thrift or Protocol Buffers would actually be interesting
> > > since they would be more space efficient. However, they would also be a
> > > bit more difficult to implement because of the extra typing information
> > > they provide. I'm hoping with Cassandra 1.0's addition of compression
> > > that storing JSON is not too inefficient.
> > > 
> > > Would there be interest in adding a JsonType? I could look at putting a
> > > patch together.
> > > 
> > > Thanks,
> > > Ben
> > 
> > 
> > 
> > 
> > 
> > --
> > http://twitter.com/tjake
>
