Re: Document storage
Ben,

You can create a materialized path for each field in the document:

{
  [user, firstName]: ben,
  [user, skills, TimeUUID]: java,
  [user, skills, TimeUUID]: javascript,
  [user, skills, TimeUUID]: html,
  [user, education, school]: cmu,
  [user, education, major]: computer science
}

This way each field can be independently updated, and you can take sub-document slices with queries such as "give me everything under user/skills".

Rick

On Thursday, March 29, 2012 at 7:27 AM, Ben McCann wrote:

Could you explain further how I would use CASSANDRA-3647? There's still very little documentation on composite columns, and it was not clear to me whether they can be used to store document-oriented data. Say, for example, that I had a document like:

user: {
  firstName: 'ben',
  skills: ['java', 'javascript', 'html'],
  education: {
    school: 'cmu',
    major: 'computer science'
  }
}

How would I flatten this to be stored, and then reconstruct the document?

On Thu, Mar 29, 2012 at 5:44 AM, Jake Luciani jak...@gmail.com wrote:

Is there a reason you would prefer a JSONType over CASSANDRA-3647? It would seem the only thing a JSON type offers you is validation. 3647 takes it much further by deconstructing a JSON document using composite columns to flatten the document out, with the ability to access and update portions of the document (as well as reconstruct it).

On Wed, Mar 28, 2012 at 11:58 AM, Ben McCann b...@benmccann.com wrote:

Hi,

I was wondering if it would be interesting to add some type of document-oriented data type. I've found it somewhat awkward to store document-oriented data in Cassandra today. I can make a JSON/Protobuf/Thrift value, serialize it, and store it, but Cassandra cannot differentiate it from any other string or byte array. However, if my column validation_class could be a JsonType, that would allow tools to potentially do more interesting introspection on the column value. E.g. bug 3647 (https://issues.apache.org/jira/browse/CASSANDRA-3647) calls for supporting arbitrarily nested documents in CQL. Running a query against the JSON column in Pig is possible as well, but again, in this use case it would be helpful to be able to encode in the column metadata that the column is stored as JSON. For debugging, running nightly reports, etc., it would be quite useful compared to the opaque string and byte array types we have today.

JSON is appealing because it would be easy to implement. Something like Thrift or Protocol Buffers would actually be interesting since they would be more space-efficient. However, they would also be a bit more difficult to implement because of the extra typing information they provide. I'm hoping that with Cassandra 1.0's addition of compression, storing JSON is not too inefficient.

Would there be interest in adding a JsonType? I could look at putting a patch together.

Thanks,
Ben

--
http://twitter.com/tjake
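Rick's materialized-path scheme at the top of this thread can be sketched in a few lines of Python. This is an illustration only, not CASSANDRA-3647's actual implementation: `flatten` and `slice_under` are hypothetical names, and `uuid.uuid1()` stands in for Cassandra's TimeUUID component for list elements.

```python
import uuid

def flatten(doc, path=()):
    """Recursively flatten a nested document into (materialized-path, value)
    pairs, mimicking composite-column names. List elements each get a fresh
    time-based UUID (standing in for TimeUUID) so they remain independently
    addressable and updatable."""
    items = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            items.extend(flatten(value, path + (key,)))
    elif isinstance(doc, list):
        for element in doc:
            items.extend(flatten(element, path + (str(uuid.uuid1()),)))
    else:
        items.append((path, doc))
    return items

def slice_under(columns, prefix):
    """Sub-document slice: everything whose path starts with `prefix`,
    analogous to a composite-column range query on user/skills."""
    return [(p, v) for p, v in columns if p[:len(prefix)] == prefix]

doc = {
    'user': {
        'firstName': 'ben',
        'skills': ['java', 'javascript', 'html'],
        'education': {'school': 'cmu', 'major': 'computer science'},
    }
}
columns = flatten(doc)
skills = slice_under(columns, ('user', 'skills'))
print(sorted(v for _, v in skills))  # → ['html', 'java', 'javascript']
```

Reconstructing the document is the inverse walk: group the pairs by shared path prefixes and rebuild the nesting level by level.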
Re: RFC: Cassandra Virtual Nodes
I like this idea. It feels like a good 80/20 solution -- 80% of the benefits, 20% of the effort.

More like 5% of the effort. I can't even enumerate all the places full vnode support would change, but an active token range concept would be relatively limited in scope. It only addresses 1 of Sam's original 5 points, though, so I wouldn't call it an 80% solution.

To support a form of DF, I think some tweaking of replica placement could achieve this effect quite well. We could introduce a variable into replica placement, which I'm going to (incorrectly) call DF for the purposes of illustration. The key range for a node would be sub-divided by DF (1 by default), and this would be used to further distribute replica selection based on the sub-partition a token falls into.

Currently, the offset formula works out to be something like this:

  offset = replica

For RandomPartitioner, DF placement might look something like:

  offset = replica + (token % DF)

Now, I realize replica selection is actually much more complicated than this, but these formulas are for illustration purposes. Modifying replica placement and the partitioners to support this seems straightforward, but I'm unsure of what's required to get it working for ring management operations. On the surface, it does seem like this could be added without any kind of difficult migration support.

Thoughts?
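The two offset formulas above can be made concrete with a tiny sketch. This is purely illustrative, as the post itself cautions: `replica_offsets` is a hypothetical helper and real Cassandra replica selection (racks, DCs, snitches) is far more involved.

```python
def replica_offsets(token, rf, df=1):
    """Illustrative offsets into the ring's ordered node list for each
    replica of a key. With df=1 this degenerates to the current behaviour
    (offset = replica). With df > 1, the key's sub-partition (token % df)
    shifts placement, so keys owned by one node spread their replicas
    across more distinct neighbor sets."""
    return [replica + (token % df) for replica in range(rf)]

# With DF=1 (today), every token a node owns maps to the same replica set:
print(replica_offsets(token=12345, rf=3))        # [0, 1, 2]
# With DF=4, tokens in different sub-partitions land on shifted sets:
print(replica_offsets(token=12345, rf=3, df=4))  # 12345 % 4 == 1 -> [1, 2, 3]
print(replica_offsets(token=12346, rf=3, df=4))  # 12346 % 4 == 2 -> [2, 3, 4]
```

The effect is that a single node's data is replicated onto roughly RF-1+DF neighbors rather than exactly RF-1, which is what lets rebuild and failure load fan out.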
Re: RFC: Cassandra Virtual Nodes
I think if we could go back and rebuild Cassandra from scratch, vnodes would likely be implemented from the beginning. However, I'm concerned that implementing them now could be a big distraction from more productive uses of all of our time, and could introduce major potential stability issues into what is becoming a business-critical piece of infrastructure for many people. Instead of just complaining and pedantry, though, I'd like to offer a feasible alternative:

Has there been consideration given to the idea of supporting a single token range for a node? While not theoretically as capable as vnodes, it seems to me to be more practical, as it would have a significantly lower impact on the codebase and provides a much clearer migration path. It also seems to solve a majority of complaints regarding operational issues with Cassandra clusters.

Each node would have a lower and an upper token, which would form a range that would be actively distributed via gossip. Read and replication requests would only be routed to a replica when the key of the operation matched the replica's token range in the gossip tables. Each node would locally store its own current active token range as well as a target token range it's moving towards. As a new node undergoes bootstrap, the bounds would be gradually expanded to allow it to handle requests for a wider slice of the keyspace as it moves towards its target token range.

This idea boils down to a move from hard cutovers to smoother operations, by gradually adjusting active token ranges over a period of time. It would apply to token change operations (nodetool 'move' and 'removetoken') as well. Failure during streaming could be recovered at the bounds instead of restarting the whole process, as the active bounds would effectively track the progress of bootstrap and target token changes. Implicitly, these operations would be throttled to some degree.

Node repair (AES) could also be modified using the same overall ideas to provide a more gradual impact on the cluster, similar to the ideas given in CASSANDRA-3721. While this doesn't spread the load for these operations over the cluster as evenly as vnodes do, that could likely be worked around by performing concurrent (throttled) bootstrap and node repair (AES) operations. It does allow some kind of active load balancing, though clearly not as flexible or as useful as what vnodes provide -- but you should be using RandomPartitioner, or sort-of-randomized keys with OPP, right? ;)

As a side note: vnodes fail to provide solutions to node-based limitations that seem to me to cause a substantial portion of operational issues, such as the impact of node restarts / upgrades, and GC- and compaction-induced latency. I think some progress could be made here by allowing a pack of independent Cassandra nodes to be run on a single host; somewhat (but nowhere near entirely) similar to the pre-fork model used by some UNIX-based servers.

Input?

--
Rick Branson
DataStax
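The active/target token range proposal above can be sketched as a small state machine. This is a hypothetical illustration of the idea, not Cassandra code: `NodeRange`, `owns`, and `expand` are invented names, and ring wraparound is ignored for simplicity.

```python
class NodeRange:
    """Sketch of the 'active token range' idea: a node gossips an active
    (lower, upper] range and only serves requests for keys inside it.
    During bootstrap, the active range expands step by step toward the
    target range, so cutover is gradual rather than all-at-once."""

    def __init__(self, target_lower, target_upper):
        self.target_lower, self.target_upper = target_lower, target_upper
        # Start with an empty active range at the upper bound.
        self.active_lower = target_upper
        self.active_upper = target_upper

    def owns(self, token):
        """Route a read/replication request here only if the token falls
        inside the currently active range."""
        return self.active_lower < token <= self.active_upper

    def expand(self, step):
        """Widen the active range as streamed data arrives. A streaming
        failure can resume from active_lower instead of restarting the
        whole bootstrap, since the bound tracks progress."""
        self.active_lower = max(self.target_lower, self.active_lower - step)

node = NodeRange(target_lower=0, target_upper=100)
assert not node.owns(50)              # nothing streamed yet
node.expand(30)                       # active range is now (70, 100]
assert node.owns(80) and not node.owns(50)
node.expand(30); node.expand(30); node.expand(30)
assert node.owns(1)                   # bootstrap complete: full (0, 100]
```

The same gradual-bounds mechanic would apply to 'move' and 'removetoken', shrinking one node's active range as another's grows.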
Re: RFC: Cassandra Virtual Nodes
On Mon, Mar 19, 2012 at 4:45 PM, Peter Schuller peter.schul...@infidyne.com wrote:

>> As a side note: vnodes fail to provide solutions to node-based limitations that seem to me to cause a substantial portion of operational issues, such as the impact of node restarts / upgrades, and GC- and compaction-induced latency.
>
> Actually, it does. At least assuming DF > RF (as in the original proposal, and mine). The impact of a node suffering from a performance degradation is mitigated because the effects are spread out over DF-1 (N-1 in the original post) nodes instead of just RF nodes.

You've got me on one of those, after some rethinking. Any node outage (an upgrade/restart) definitely has a big impact, and DF mitigates it by distributing the load more evenly. But (and correct me if I'm wrong) for things like additional latency caused by GC/compaction, those requests will just be slower, rather than timing out or getting redirected via the dynamic snitch.

>> I think some progress could be made here by allowing a pack of independent Cassandra nodes to be run on a single host; somewhat (but nowhere near entirely) similar to the pre-fork model used by some UNIX-based servers.
>
> I have pretty significant knee-jerk negative reactions to that idea, to be honest, even if the pack is limited to a handful of instances. In order for vnodes to be useful with random placement, we'd need much more than a handful of vnodes per node (Cassandra instances in a pack, in that model).

Fair enough. I'm not super fond of the idea personally, but I don't see a way around the limitations of the current JVM GC without multiple processes.

After rethinking my ideas a bit, what I've actually settled on is to keep the existing node tokens, but add an additional "active" token that would be used to determine the data range that a node is ready to receive reads for. This should gain all of the benefits highlighted in my earlier post, but with less complexity in implementation.
Node repair (AES) would still allow ranges to be specified.
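Peter's point about degradation being spread over DF-1 nodes instead of RF nodes comes down to simple arithmetic. A minimal sketch, with `extra_load_fraction` as a hypothetical helper and the RF/DF values chosen purely for illustration:

```python
def extra_load_fraction(spread):
    """When one node degrades or goes down, its share of work is
    redistributed over `spread` other nodes; each absorbs 1/spread
    of the affected node's load."""
    return 1.0 / spread

rf, df = 3, 16
# Classic placement: a failed node's work lands on its RF-1 replica peers...
print(extra_load_fraction(rf - 1))   # 0.5 -> each peer absorbs 50% extra
# ...with DF-style placement it spreads over DF-1 nodes instead:
print(extra_load_fraction(df - 1))   # ~0.067 -> each absorbs about 6.7%
```

This is the failure/outage case; as noted above, it does not help with GC or compaction latency, where the slow node still serves its requests, just more slowly.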