I failed to address the matter of not knowing the families in advance. I can't really recommend any solution to that other than storing the list of families in another structure that is readily queryable. I don't know how many families you are thinking, but if it is in the millions or more, You might consider constructing another table such as: CREATE TABLE families ( key int, family text, PRIMARY KEY (key, family) );
store your families there, with a knowable set of keys (I suggest something like the last 3 digits of the md5 hash of the family). So then you could retrieve your families in nice sized batches SELECT family FROM id WHERE key=0; and then do the fan-out selects that I described previously. -Tupshin On Tue, Feb 25, 2014 at 10:15 PM, Tupshin Harper <tups...@tupshin.com>wrote: > Hi Clint, > > What you are describing could actually be accomplished with the Thrift API > and a multiget_slice with a slicerange having a count of 1. Initially I was > thinking that this was an important feature gap between Thrift and CQL, and > was going to suggest that it should be implemented (possible syntax is in > https://issues.apache.org/jira/browse/CASSANDRA-6167 which is almost a > superset of this feature). > > But then I was convinced by some colleagues, that with a modern CQL driver > that is token aware, you are actually better off (in terms of latency, > throughput, and reliability), by doing each query separately on the client. > > The reasoning is that if you did this with a single query, it would > necessarily be sent to a coordinator that wouldn't own most of the data > that you are looking for. That coordinator would then need to fan out the > read to all the nodes owning the partitions you are looking for. > > Far better to just do it directly on the client. The token aware client > will send each request for a row straight to a node that owns it. With a > separate connection open to each node, this is done in parallel from the > get-go. Fewer hops. Less load on the coordinator. No bottlenecks. And with > a stored procedure, very very little additional overhead to the client, > server, or network. > > -Tupshin > > > On Tue, Feb 25, 2014 at 7:48 PM, Clint Kelly <clint.ke...@gmail.com>wrote: > >> Hi everyone, >> >> Let's say that I have a table that looks like the following: >> >> CREATE TABLE time_series_stuff ( >> key text, >> family text, >> version int, >> val text, >> PRIMARY KEY (key, family, version) >> ) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND >> bloom_filter_fp_chance=0.010000 AND >> caching='KEYS_ONLY' AND >> comment='' AND >> dclocal_read_repair_chance=0.000000 AND >> gc_grace_seconds=864000 AND >> index_interval=128 AND >> read_repair_chance=0.100000 AND >> replicate_on_write='true' AND >> populate_io_cache_on_flush='false' AND >> default_time_to_live=0 AND >> speculative_retry='99.0PERCENTILE' AND >> memtable_flush_period_in_ms=0 AND >> compaction={'class': 'SizeTieredCompactionStrategy'} AND >> compression={'sstable_compression': 'LZ4Compressor'}; >> >> cqlsh:fiddle> select * from time_series_stuff ; >> >> key | family | version | val >> --------+---------+---------+-------- >> monday | revenue | 3 | $$$$$$ >> monday | revenue | 2 | $$$ >> monday | revenue | 1 | $$ >> monday | revenue | 0 | $ >> monday | traffic | 2 | medium >> monday | traffic | 1 | light >> monday | traffic | 0 | heavy >> >> (7 rows) >> >> Now let's say that I'd like to perform a query that gets me the most >> recent N versions of "revenue" and "traffic." >> >> Is there a CQL query to do this? Let's say that N=1. Then I know that I >> can do: >> >> cqlsh:fiddle> select * from time_series_stuff where key='monday' and >> family='revenue' limit 1; >> >> key | family | version | val >> --------+---------+---------+-------- >> monday | revenue | 3 | $$$$$$ >> >> (1 rows) >> >> cqlsh:fiddle> select * from time_series_stuff where key='monday' and >> family='traffic' limit 1; >> >> key | family | version | val >> --------+---------+---------+-------- >> monday | traffic | 2 | medium >> >> (1 rows) >> >> But what if I have lots of "families" and I want to get the most recent N >> versions of all of them in a single CQL statement. Is that possible? >> Unfortunately I am working on something where the family names and the >> number of most-recent versions are not known a priori (I am porting some >> code that was designed for HBase). >> >> Best regards, >> Clint >> > >