Re: Getting the most-recent version from time-series data

2014-02-28 Thread Clint Kelly
Hi Tupshin, Thanks for your help once again, I really appreciate it. Quick question regarding the issue of token-aware routing, etc. Let's say that I am using the table described earlier: CREATE TABLE time_series_stuff ( key text, family text, version int, val text, PRIMARY KEY (key,

Re: Getting the most-recent version from time-series data

2014-02-28 Thread Clint Kelly
Hi Tupshin, BTW, you asked earlier about the number of different distinct family values. There could easily be millions of different families, each with many different values. Right now I see two options: 1. Query the table once just to get all of the distinct families, then do separate

Re: Getting the most-recent version from time-series data

2014-02-28 Thread Tupshin Harper
You are correct that with that schema, all data for a given key would be in a single partition, and hence on the same node(s). I missed that before. -Tupshin On Fri, Feb 28, 2014 at 12:47 PM, Clint Kelly clint.ke...@gmail.com wrote: Hi Tupshin, Thanks for your help once again, I really

Re: Getting the most-recent version from time-series data

2014-02-26 Thread Tupshin Harper
And one last clarification. Where I said stored procedure earlier, I meant prepared statement. Sorry for the confusion. Too much typing while tired. -Tupshin On Tue, Feb 25, 2014 at 10:36 PM, Tupshin Harper tups...@tupshin.com wrote: I failed to address the matter of not knowing the families

Re: Getting the most-recent version from time-series data

2014-02-25 Thread Clint Kelly
Hi Jonathan, Thanks for the suggestion! I see a couple of problems with this approach: 1. I do not know a priori all of the family names (so I still would not know what value to use for LIMIT). 2. The versions here are similar to timestamps, so one family may get updated far more often than
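
The second objection above can be illustrated with a small in-memory sketch. It assumes the suggestion being answered was a single query capped with LIMIT (the original suggestion is not shown in this listing), and the rows, family names, and helper functions are hypothetical. When one family is updated far more often than another, the newest n rows can all belong to the busy family, so no fixed LIMIT reliably yields one row per family.

```python
# Hypothetical rows for a single key: (family, version, val).
# "busy" is updated far more often than "quiet".
rows = [
    ("busy", 5, "e"),
    ("busy", 4, "d"),
    ("busy", 3, "c"),
    ("quiet", 1, "a"),
]


def newest_n(rows, n):
    """Emulate a query that returns the n rows with the highest versions."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


def latest_per_family(rows):
    """Keep only the highest-version (version, val) pair for each family."""
    latest = {}
    for family, version, val in rows:
        if family not in latest or version > latest[family][0]:
            latest[family] = (version, val)
    return latest


# A LIMIT of 2 (one per family, if the family count were known) misses "quiet".
families_seen = {family for family, _, _ in newest_n(rows, 2)}
per_family = latest_per_family(rows)
```

Here `families_seen` contains only the busy family, while `per_family` holds the result the thread is actually after.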

Re: Getting the most-recent version from time-series data

2014-02-25 Thread Tupshin Harper
Hi Clint, What you are describing could actually be accomplished with the Thrift API and a multiget_slice with a slicerange having a count of 1. Initially I was thinking that this was an important feature gap between Thrift and CQL, and was going to suggest that it should be implemented (possible

Re: Getting the most-recent version from time-series data

2014-02-25 Thread Tupshin Harper
I failed to address the matter of not knowing the families in advance. I can't really recommend any solution to that other than storing the list of families in another structure that is readily queryable. I don't know how many families you are thinking of, but if it is in the millions or more, you
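
A minimal sketch of the idea of keeping the list of families in a separate, readily queryable structure. It is an in-memory stand-in rather than Cassandra code: the class name, its methods, and the dictionary layout are all hypothetical, and it uses only the column names that appear in this thread (key, family, version, val) without reconstructing the truncated schema.

```python
from collections import defaultdict


class FamilyIndexedStore:
    """In-memory stand-in: a data structure plus a separate family index."""

    def __init__(self):
        # (key, family) -> {version: val}; stands in for the time-series table.
        self._data = defaultdict(dict)
        # key -> set of families; stands in for the separate lookup structure.
        self._families = defaultdict(set)

    def write(self, key, family, version, val):
        # Every write updates both the data and the family index.
        self._data[(key, family)][version] = val
        self._families[key].add(family)

    def families(self, key):
        # Read the index only; no scan over the versioned data is needed.
        return sorted(self._families.get(key, ()))

    def latest(self, key, family):
        # Return (version, val) for the highest version of one family.
        versions = self._data[(key, family)]
        newest = max(versions)
        return newest, versions[newest]

    def latest_for_all_families(self, key):
        # One lookup per known family, analogous to one query per family.
        return {f: self.latest(key, f) for f in self.families(key)}


store = FamilyIndexedStore()
store.write("k1", "a", 1, "old")
store.write("k1", "a", 2, "new")
store.write("k1", "b", 7, "only")
result = store.latest_for_all_families("k1")
```

The trade-off is an extra write per update in exchange for knowing which families to ask for; with millions of families the index itself becomes large, which is the concern raised above.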