Re: Document storage

2012-03-29 Thread Rick Branson
Ben,

You can create a materialized path for each field in the document:

{
[user, firstName]: ben,
[user, skills, TimeUUID]: java,
[user, skills, TimeUUID]: javascript,
[user, skills, TimeUUID]: html,
[user, education, school]: cmu,
[user, education, major]: computer science 
}

This way each field could be independently updated, and you can take 
sub-document slices with queries such as give me everything under 
user/skills. 
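To make that concrete, here's a rough Python sketch (not Cassandra code; every name here is made up for illustration) of flattening a nested document into path-keyed columns and taking a sub-document slice:

```python
import uuid

def flatten(doc, prefix=()):
    """Flatten a nested document into {path-tuple: value} 'columns'."""
    cols = {}
    for key, value in doc.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            cols.update(flatten(value, path))
        elif isinstance(value, list):
            # each list element gets a TimeUUID component so it stays
            # independently addressable and updatable
            for item in value:
                cols[path + (uuid.uuid1(),)] = item
        else:
            cols[path] = value
    return cols

def slice_under(cols, prefix):
    """Sub-document slice: everything under a path prefix, e.g. user/skills."""
    return {p: v for p, v in cols.items() if p[:len(prefix)] == prefix}

doc = {"user": {"firstName": "ben",
                "skills": ["java", "javascript", "html"],
                "education": {"school": "cmu", "major": "computer science"}}}
cols = flatten(doc)
skills = slice_under(cols, ("user", "skills"))
```

The TimeUUID path component is what lets each skills entry be updated independently without read-modify-write on the whole list.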

Rick


On Thursday, March 29, 2012 at 7:27 AM, Ben McCann wrote:

 Could you explain further how I would use CASSANDRA-3647? There's still
 very little documentation on composite columns and it was not clear to me
 whether they could be used to store document oriented data. Say for
 example that I had a document like:
 
 user: {
 firstName: 'ben',
 skills: ['java', 'javascript', 'html'],
 education: {
 school: 'cmu',
 major: 'computer science'
 }
 }
 
 How would I flatten this to be stored and then reconstruct the document?
 
 
 On Thu, Mar 29, 2012 at 5:44 AM, Jake Luciani jak...@gmail.com wrote:
 
  Is there a reason you would prefer a JSONType over CASSANDRA-3647? It
  would seem the only thing a JSON type offers you is validation. 3647 takes
  it much further by deconstructing a JSON document using composite columns
  to flatten the document out, with the ability to access and update portions
  of the document (as well as reconstruct it).
  
  On Wed, Mar 28, 2012 at 11:58 AM, Ben McCann b...@benmccann.com wrote:
  
   Hi,
   
   I was wondering if it would be interesting to add some type of
   document-oriented data type.
   
    I've found it somewhat awkward to store document-oriented data in
    Cassandra today. I can make a JSON/Protobuf/Thrift, serialize it, and
    store it, but Cassandra cannot differentiate it from any other string
    or byte array. However, if my column validation_class could be a
    JsonType, that would allow tools to potentially do more interesting
    introspection on the column value. E.g. bug 3647
    https://issues.apache.org/jira/browse/CASSANDRA-3647 calls for
    supporting arbitrarily nested documents in CQL. Running a query
    against the JSON column in Pig is possible as well, but again in this
    use case it would be helpful to be able to encode in column metadata
    that the column is stored as JSON. For debugging, running nightly
    reports, etc. it would be quite useful compared to the opaque string
    and byte array types we have today. JSON is appealing because it
    would be easy to implement. Something like Thrift or Protocol Buffers
    would actually be interesting since they would be more space
    efficient. However, they would also be a bit more difficult to
    implement because of the extra typing information they provide. I'm
    hoping that with Cassandra 1.0's addition of compression, storing
    JSON is not too inefficient.
   
   Would there be interest in adding a JsonType? I could look at putting a
   patch together.
   
   Thanks,
   Ben
  
  
  
  
  
  --
  http://twitter.com/tjake
 





Re: RFC: Cassandra Virtual Nodes

2012-03-20 Thread Rick Branson
  I like this idea. It feels like a good 80/20 solution -- 80% of the
  benefits, 20% of the effort. More like 5% of the effort. I can't
  even enumerate all the places full vnode support would change, but an
  active token range concept would be relatively limited in scope.
 
 
 It only addresses 1 of Sam's original 5 points, so I wouldn't call it
 an 80% solution.
 
To support a form of DF, I think some tweaking of the replica placement could 
achieve this effect quite well. We could introduce a variable into replica 
placement, which I'm going to (incorrectly) call DF for the purposes of 
illustration. The key range for a node would be sub-divided by DF (1 by 
default), and this sub-partition would be used to further distribute replica 
selection.

Currently, the offset formula works out to be something like this:

offset = replica

For RandomPartitioner, DF placement might look something like:

offset = replica + (token % DF)

Now, I realize replica selection is actually much more complicated than this, 
but these formulas are for illustration purposes.
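As a runnable toy version of the two formulas (again purely for illustration; real replica selection involves the ring, racks, and strategy, none of which is modeled here):

```python
def offset_current(replica, token, df):
    # today's behavior: the offset depends only on the replica index
    return replica

def offset_with_df(replica, token, df):
    # proposed: the key's token selects one of DF sub-partitions,
    # shifting the starting replica for that key
    return replica + (token % df)

# with DF=1 (the default) both formulas agree for every key,
# so existing clusters would be unaffected
assert all(offset_current(r, t, 1) == offset_with_df(r, t, 1)
           for r in range(3) for t in range(100))
```

The point of the `token % DF` term is that keys in different sub-partitions of a node's range start their replica walk at different nodes, spreading replicas of that node's data across DF times as many peers.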

Modifying replica placement and the partitioners to support this seems 
straightforward, but I'm unsure of what's required to get it working for ring 
management operations. On the surface, it does seem like this could be added 
without any kind of difficult migration support. 

Thoughts?




Re: RFC: Cassandra Virtual Nodes

2012-03-19 Thread Rick Branson
I think if we could go back and rebuild Cassandra from scratch, vnodes
would likely be implemented from the beginning. However, I'm concerned that
implementing them now could be a big distraction from more productive uses
of all of our time and introduce major potential stability issues into what
is becoming a business critical piece of infrastructure for many people.
However, instead of just complaining and pedantry, I'd like to offer a
feasible alternative:

Has there been consideration given to the idea of a supporting a single
token range for a node?

While not theoretically as capable as vnodes, it seems to me to be more
practical as it would have a significantly lower impact on the codebase and
provides a much clearer migration path. It also seems to solve a majority
of complaints regarding operational issues with Cassandra clusters.

Each node would have a lower and an upper token, which would form a range
that would be actively distributed via gossip. Read and replication
requests would only be routed to a replica when the key of these operations
matched the replica's token range in the gossip tables. Each node would
locally store its own current active token range as well as a target token
range it's moving towards.

As a new node undergoes bootstrap, the bounds would be gradually expanded
to allow it to handle requests for a wider range of the keyspace as it
moves towards its target token range. This idea boils down to a move from
hard cutovers to smoother operations by gradually adjusting active token
ranges over a period of time. It would apply to token change operations
(nodetool 'move' and 'removetoken') as well.
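A minimal sketch of the idea, in Python with plain integer tokens (hypothetical names; wrap-around ranges and gossip mechanics are ignored for brevity):

```python
class Node:
    def __init__(self, active_lower, active_upper, target_lower, target_upper):
        self.active = (active_lower, active_upper)  # gossiped; serves reads now
        self.target = (target_lower, target_upper)  # range being moved towards

    def owns(self, token):
        # route a read/replication request here only if the key's token
        # falls inside the currently advertised active range
        lower, upper = self.active
        return lower <= token <= upper

    def expand_step(self, step):
        # bootstrap progress: widen the active range towards the target,
        # a little at a time, instead of a hard cutover
        lo, hi = self.active
        tlo, thi = self.target
        self.active = (max(tlo, lo - step), min(thi, hi + step))

node = Node(40, 60, 0, 100)  # mid-bootstrap: serves [40, 60] of target [0, 100]
assert node.owns(50) and not node.owns(10)
node.expand_step(20)         # gossip a wider active range: now [20, 80]
assert node.owns(25)
```

Because the active bounds only ever grow towards the target, a streaming failure can resume from the current bounds rather than restarting from scratch.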

Failure during streaming could be recovered at the bounds instead of
restarting the whole process, as the active bounds would effectively track
the progress of bootstrap and target token changes. Implicitly, these
operations would be throttled to some degree. Node repair (AES) could also
be modified using the same overall ideas to provide a more gradual impact on
the cluster, similar to the ideas given in CASSANDRA-3721.

While this doesn't spread the load for these operations over the cluster as
evenly as vnodes do, that could likely be worked around by performing
concurrent (throttled) bootstrap and node repair (AES) operations. It does
allow some kind of active load balancing, though clearly not as flexible or
as useful as vnodes. But you should be using RandomPartitioner or
sort-of-randomized keys with OPP anyway, right? ;)

As a side note: vnodes fail to provide solutions to node-based limitations
that seem to me to cause a substantial portion of operational issues such
as impact of node restarts / upgrades, GC and compaction induced latency. I
think some progress could be made here by allowing a pack of independent
Cassandra nodes to be ran on a single host; somewhat (but nowhere near
entirely) similar to a pre-fork model used by some UNIX-based servers.

Input?

--
Rick Branson
DataStax


Re: RFC: Cassandra Virtual Nodes

2012-03-19 Thread Rick Branson
On Mon, Mar 19, 2012 at 4:45 PM, Peter Schuller
peter.schul...@infidyne.com wrote:
  As a side note: vnodes fail to provide solutions to node-based limitations
  that seem to me to cause a substantial portion of operational issues such
  as impact of node restarts / upgrades, GC and compaction induced latency. I

 Actually, it does. At least assuming DF > RF (as in the original
 proposal, and mine). The impact of a node suffering from a performance
 degradation is mitigated because the effects are spread out over DF-1
 (N-1 in the original post) nodes instead of just RF nodes.

You've got me on one of those after some rethinking. Any node outage
(an upgrade/restart) definitely has its impact reduced by distributing
the load more evenly, but (and correct me if I'm wrong) for things
like additional latency caused by GC/compaction, those requests will
just be slower rather than timing out or getting redirected via the
dynamic snitch.

  think some progress could be made here by allowing a pack of independent
  Cassandra nodes to be ran on a single host; somewhat (but nowhere near
  entirely) similar to a pre-fork model used by some UNIX-based servers.

 I have pretty significant knee-jerk negative reactions to that idea to
 be honest, even if the pack is limited to a handful of instances. In
 order for vnodes to be useful with random placement, we'd need much
 more than a handful of vnodes per node (cassandra instances in a
 pack in that model).


Fair enough, I'm not super fond of the idea personally, but I don't
see a way around the limitations of the current JVM GC without
multiple processes.

After rethinking my ideas a bit, what I've settled on is to keep the
existing node tokens, but add an additional active token that would be
used to determine the data range that a node is ready to receive reads
for. This should gain all of the benefits highlighted in my earlier
post, but with less implementation complexity. Node repair (AES) would
still allow ranges to be specified.