In practice, your performance is likely to depend on your read patterns.  If you do a lot of sequential reads where key1 and key2 stay the same and only key3 varies, you may get better performance out of the second option, because you hit the row and disk caches more often. If you are doing a lot of scatter reads, you’re likely to get better performance out of the first option, because the reads will be distributed more evenly across multiple nodes.  It also depends on how large your rows are going to be, as this directly affects things like compaction, which in turn affects the speed of the entire cluster.  For just a few values of key3, I doubt there would be much difference in performance, but if key3 has a cardinality of, say, a million, you might be better off with option 1.
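
To make the "distributed more evenly" point concrete, here is a rough toy model in Python, not real Cassandra behaviour. It assumes a hypothetical 4-node cluster, uses an MD5 hash as a stand-in for the real partitioner, and ignores replication entirely; the names node_for and nodes_touched are made up for this sketch.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for(partition_key):
    """Map a partition key tuple to a node index.

    MD5 modulo NODES is a toy stand-in for the token ring; a real
    cluster uses its configured partitioner and replication strategy.
    """
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def nodes_touched(rows, partition_key_of):
    """Count how many single-row reads land on each node."""
    return Counter(node_for(partition_key_of(row)) for row in rows)


# 1000 rows that share key1 and key2 and differ only in key3.
rows = [("a", "b", key3) for key3 in range(1000)]

# Option 1: primary key ((key1, key2, key3)), all three columns are hashed.
option1 = nodes_touched(rows, lambda row: row)

# Option 2: primary key ((key1, key2), key3), only key1 and key2 are hashed.
option2 = nodes_touched(rows, lambda row: row[:2])

print("option 1 reads per node:", dict(option1))
print("option 2 reads per node:", dict(option2))
```

In this model, option 2 sends every read for a given (key1, key2) pair to the same node, which is the cache-friendly case, while option 1 spreads the same reads over several nodes.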

As always, the advice is: benchmark your intended use case. Put a few hundred gigs of mock data into a cluster, trigger compactions, and run perf tests for different kinds of read/write loads. :-)

(Though if I didn’t know what my read pattern would be, I’d probably go for option 1 purely on a gut feeling if I was sure I would never need range queries on key3; shorter rows *usually* are a bit better for performance, compaction, etc.  Really wide rows can sometimes be a headache operationally.)

May you have energy and success!
/Janne



On 28 Dec 2016, at 16:44, Manoj Khangaonkar <khangaon...@gmail.com> wrote:

In the first case, the partitioning is based on key1,key2,key3.

In the second case, partitioning is based on key1 and key2. Additionally, key3 is a clustering key, which means that within a partition the rows are ordered by key3 and you can do range queries on key3 efficiently. That is the difference.
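
As a rough illustration of why range queries on key3 are cheap in the second layout, here is a small in-memory sketch in Python. It is only a toy model under the assumption that each (key1, key2) partition keeps its key3 values sorted; the class ClusteredTable and its methods are hypothetical and have nothing to do with the actual storage engine or driver API.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class ClusteredTable:
    """Toy model of primary key ((key1, key2), key3)."""

    def __init__(self):
        self._keys = defaultdict(list)  # (key1, key2) -> sorted key3 values
        self._values = {}               # (key1, key2, key3) -> value

    def insert(self, key1, key2, key3, value):
        # Writing to an existing primary key overwrites the value (an upsert).
        if (key1, key2, key3) not in self._values:
            insort(self._keys[(key1, key2)], key3)
        self._values[(key1, key2, key3)] = value

    def get(self, key1, key2, key3):
        """Lookup by full primary key."""
        return self._values.get((key1, key2, key3))

    def range(self, key1, key2, low, high):
        """Return rows with low <= key3 <= high inside one partition."""
        keys = self._keys.get((key1, key2), [])
        start = bisect_left(keys, low)
        end = bisect_right(keys, high)
        return [(k3, self._values[(key1, key2, k3)]) for k3 in keys[start:end]]


table = ClusteredTable()
for key3 in (5, 1, 3, 9):
    table.insert("a", "b", key3, f"v{key3}")

print(table.range("a", "b", 2, 6))
```

With the first layout there is no such ordering to exploit: each (key1, key2, key3) combination is its own partition, so a range over key3 would have to touch every candidate partition separately.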

regards

On Tue, Dec 27, 2016 at 7:42 AM, Voytek Jarnot <voytek.jar...@gmail.com> wrote:
Wondering if there's a difference when querying by primary key between the two definitions below:

primary key ((key1, key2, key3))
primary key ((key1, key2), key3)

In terms of read speed/efficiency... I don't have much of a reason otherwise to prefer one setup over the other, so would prefer the most efficient for querying.

Thanks.


