Hi Ben,

Sure, there's nothing really to it, but I'll email it to you. The reason I'm using Snappy on the type instead of sstable_compression is that when you set sstable_compression, the compression happens on the Cassandra nodes. I see two advantages with my approach:
1. Saving extra CPU usage on the Cassandra nodes, since compression/decompression can easily be done on the client nodes, where there is plenty of idle CPU time.
2. Saving network bandwidth, since you're sending a compressed byte[] over the wire.

One thing to note about my approach is that when I define the schema in Cassandra, I define the columns as byte[] and not my custom type, and I do all the conversion on the client side.

-- Drew

On Mar 29, 2012, at 12:04 AM, Ben McCann wrote:

> Sounds awesome Drew. Mind sharing your custom type? I just wrote a basic
> JSON type and did the validation the same way you did, but I don't have any
> SMILE support yet. It seems that if your type were committed to the
> Cassandra codebase then the issue you ran into of the CLI only supporting
> built-in types would no longer be a problem for you (though fixing the
> issue anyway would be good and I voted for it). Btw, any reason you
> compress it with Snappy yourself instead of just setting sstable_compression
> to SnappyCompressor
> <http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression>
> and letting Cassandra do that part?
>
> -Ben
>
> On Wed, Mar 28, 2012 at 11:28 PM, Drew Kutcharian <d...@venarc.com> wrote:
>
>> I'm actually doing something almost the same. I serialize my objects into
>> byte[] using Jackson's SMILE format, then compress it using Snappy, then
>> store the byte[] in Cassandra. I actually created a simple Cassandra Type
>> for this but I hit a wall with cassandra-cli:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-4081
>>
>> Please vote on the JIRA if you are interested.
>>
>> Validation is pretty simple: you just need to read the value and parse it
>> using Jackson. If you don't get any exceptions, your JSON/SMILE is valid ;)
>>
>> -- Drew
>>
>> On Mar 28, 2012, at 9:28 PM, Ben McCann wrote:
>>
>>> I don't imagine sort is a meaningful operation on JSON data. As long as
>>> the sorting is consistent I would think that should be sufficient.
>>>
>>> On Wed, Mar 28, 2012 at 8:51 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>
>>>> Some work I did stores JSON blobs in columns. The question on JSON
>>>> type is how to sort it.
>>>>
>>>> On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna
>>>> <jeremy.hanna1...@gmail.com> wrote:
>>>>> I don't speak for the project, but you might give it a day or two for
>>>>> people to respond and/or perhaps create a jira ticket. Seems like that's
>>>>> a reasonable data type that would get some traction - a json type.
>>>>> However, what would validation look like? That's one of the main reasons
>>>>> there are the data types and validators, in order to validate on insert.
>>>>>
>>>>> On Mar 29, 2012, at 12:27 AM, Ben McCann wrote:
>>>>>
>>>>>> Any thoughts? I'd like to submit a patch, but only if it will be
>>>>>> accepted.
>>>>>>
>>>>>> Thanks,
>>>>>> Ben
>>>>>>
>>>>>> On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann <b...@benmccann.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was wondering if it would be interesting to add some type of
>>>>>>> document-oriented data type.
>>>>>>>
>>>>>>> I've found it somewhat awkward to store document-oriented data in
>>>>>>> Cassandra today. I can make a JSON/Protobuf/Thrift, serialize it, and
>>>>>>> store it, but Cassandra cannot differentiate it from any other string
>>>>>>> or byte array. However, if my column validation_class could be a
>>>>>>> JsonType that would allow tools to potentially do more interesting
>>>>>>> introspection on the column value. E.g. bug 3647
>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-3647> calls for
>>>>>>> supporting arbitrarily nested "documents" in CQL. Running a query
>>>>>>> against the JSON column in Pig is possible as well, but again in this
>>>>>>> use case it would be helpful to be able to encode in column metadata
>>>>>>> that the column is stored as JSON. For debugging, running nightly
>>>>>>> reports, etc. it would be quite useful compared to the opaque string
>>>>>>> and byte array types we have today. JSON is appealing because it would
>>>>>>> be easy to implement. Something like Thrift or Protocol Buffers would
>>>>>>> actually be interesting since they would be more space efficient.
>>>>>>> However, they would also be a bit more difficult to implement because
>>>>>>> of the extra typing information they provide. I'm hoping with
>>>>>>> Cassandra 1.0's addition of compression that storing JSON is not too
>>>>>>> inefficient.
>>>>>>>
>>>>>>> Would there be interest in adding a JsonType? I could look at putting
>>>>>>> a patch together.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ben
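
For anyone following along, the client-side approach Drew describes at the top of the thread (serialize, compress on the client, store an opaque byte[], and treat "reads back without an exception" as valid) might look roughly like the sketch below. It is not Drew's actual type, which is not shown in the thread. To stay runnable on a bare JDK it makes two substitutions, both assumptions: the payload is plain UTF-8 JSON text rather than Jackson SMILE, and java.util.zip's Deflater/Inflater stand in for Snappy. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: all conversion happens on the client, so the
// Cassandra column only ever holds an opaque, already-compressed byte[].
class ClientSideBlobCodec {

    // Compress on the client. Deflater is a JDK stand-in for Snappy (assumption).
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Decompress on the client; corrupt or truncated input is reported
    // as an unchecked IllegalArgumentException.
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed value");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed value", e);
        } finally {
            inflater.end();
        }
    }

    // Document text -> bytes to store in the byte[] column.
    static byte[] toColumnValue(String json) {
        return compress(json.getBytes(StandardCharsets.UTF_8));
    }

    // Stored bytes -> document text.
    static String fromColumnValue(byte[] value) {
        return new String(decompress(value), StandardCharsets.UTF_8);
    }

    // Validation in the spirit of the thread: the value is accepted if it can
    // be read back without an exception. This sketch only checks that the
    // bytes decompress; a real validator would also parse the result with a
    // JSON/SMILE parser such as Jackson.
    static boolean isReadable(byte[] value) {
        try {
            decompress(value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"example\",\"tags\":[\"a\",\"b\",\"c\"]}";
        byte[] stored = toColumnValue(json);
        System.out.println("raw bytes:    " + json.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("stored bytes: " + stored.length);
        System.out.println("round trip ok: " + fromColumnValue(stored).equals(json));
    }
}
```

The trade-off is the one discussed above: the nodes spend no CPU on compression and less data crosses the network, but Cassandra and its tools see only bytes, so anything that wants to inspect the column has to know how to decode it.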