Thank you, Janne. Yes, these are random-access (scatter) reads, so I've decided on option 1. As you wrote, it will never make sense to look at ranges of key3, which removes the only reason I had to consider option 2.
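
For anyone finding this thread later, here is a rough toy model of why scatter reads should spread out better with option 1. It is only a sketch under stated assumptions: NUM_NODES and node_for are made up for illustration, and an MD5 hash stands in for the real partitioner (Cassandra's default is Murmur3Partitioner). It ignores replication, token ranges and caching entirely, so it is no substitute for the benchmarking suggested in the thread.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size, for illustration only


def node_for(partition_key: tuple) -> int:
    """Map a partition key to a node index.

    Assumption: a stand-in for the real partitioner. Any stable hash
    is enough to show the effect.
    """
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def nodes_option1(key1, key2, key3_values):
    """primary key ((key1, key2, key3)): key3 is part of the partition key."""
    return {node_for((key1, key2, k3)) for k3 in key3_values}


def nodes_option2(key1, key2, key3_values):
    """primary key ((key1, key2), key3): key3 is only a clustering column."""
    return {node_for((key1, key2)) for _ in key3_values}


key3_values = range(1000)
spread1 = nodes_option1("a", "b", key3_values)
spread2 = nodes_option2("a", "b", key3_values)
print("option 1 touches", len(spread1), "node(s)")
print("option 2 touches", len(spread2), "node(s)")
```

In this model, option 2 always lands every key3 value for a given (key1, key2) in one partition, which is what makes range queries on key3 cheap but also concentrates the reads. Option 1 hashes each (key1, key2, key3) separately, so the same reads fan out.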

On Fri, Dec 30, 2016 at 3:40 AM, Janne Jalkanen <janne.jalka...@ecyrd.com> wrote:

> In practice, the performance you’re getting is likely to be impacted by
> your reading patterns. If you do a lot of sequential reads where key1 and
> key2 stay the same, and only key3 varies, then you may be getting better
> performance out of the second option due to hitting the row and disk caches
> more often. If you are doing a lot of scatter reads, then you’re likely to
> get better performance out of the first option, because the reads will be
> distributed more evenly to multiple nodes. It also depends on how large
> rows you’re planning to use, as this will directly impact things like
> compaction which has an overall impact of the entire cluster speed. For
> just a few values of key3, I doubt there would be much difference in
> performance, but if key3 has a cardinality of say, a million, you might be
> better off with option 1.
>
> As always the advice is - benchmark your intended use case - put a few
> hundred gigs of mock data to a cluster, trigger compactions and do perf
> tests for different kinds of read/write loads. :-)
>
> (Though if I didn’t know what my read pattern would be, I’d probably go
> for option 1 purely on a gut feeling if I was sure I would never need range
> queries on key3; shorter rows *usually* are a bit better for performance,
> compaction, etc. Really wide rows can sometimes be a headache
> operationally.)
>
> May you have energy and success!
> /Janne
>
> On 28 Dec 2016, at 16:44, Manoj Khangaonkar <khangaon...@gmail.com> wrote:
>
> In the first case, the partitioning is based on key1,key2,key3.
>
> In the second case, partitioning is based on key1 , key2. Additionally you
> have a clustered key key3. This means within a partition you can do range
> queries on key3 efficiently. That is the difference.
>
> regards
>
> On Tue, Dec 27, 2016 at 7:42 AM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
>> Wondering if there's a difference when querying by primary key between
>> the two definitions below:
>>
>> primary key ((key1, key2, key3))
>> primary key ((key1, key2), key3)
>>
>> In terms of read speed/efficiency... I don't have much of a reason
>> otherwise to prefer one setup over the other, so would prefer the most
>> efficient for querying.
>>
>> Thanks.
>
> --
> http://khangaonkar.blogspot.com/