Re: Performance of Data Types used for Primary keys

Reid Pinchback Fri, 06 Mar 2020 07:43:45 -0800

If you care about low-latency reads, I’d worry less about column data types 
and more about the general quality of the data modeling and usage patterns, and 
about tuning the things that you see causing latency spikes.  There is no single 
cause of latency spikes, so expect to spend a couple of months playing 
whack-a-mole as you identify root causes.


What you’re likely to see affecting latency variance the most are GC and I/O 
artifacts.  That’s quick to say, but isolating what specifically to do about 
them is where the hard work comes in.  I haven’t seen overly simplistic guesses 
about what to do pan out very well.  A lot of the tuning knobs in C* can start 
to feel like a kid’s teeter-totter, because making one dynamic better sometimes 
comes at the expense of making something else worse.  Quality metric gathering 
and heap examinations will be your friends, and expect to do bursts of 
per-second and sometimes sub-second metric examinations.  With I/O in 
particular, you often won’t realize what is going on without a metric frequency 
high enough to see when and how I/O ops are suddenly getting queued up.
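
To illustrate the point about metric frequency, here is a minimal sketch on made-up numbers. The queue-depth samples, the 100 ms sampling interval, and the helper name `bucket_mean_and_max` are all assumptions for illustration, not output from any real node or tool. It shows how a 300 ms burst of queued I/O stands out in per-second buckets but nearly vanishes in a coarse average.

```python
from statistics import mean


def bucket_mean_and_max(samples, bucket_size):
    """Group fine-grained samples into fixed-size buckets; return (mean, max) per bucket."""
    buckets = []
    for start in range(0, len(samples), bucket_size):
        chunk = samples[start:start + bucket_size]
        buckets.append((mean(chunk), max(chunk)))
    return buckets


# Hypothetical I/O queue depth sampled every 100 ms for 6 seconds (60 samples).
queue_depth = [1] * 60
queue_depth[25:28] = [40, 55, 38]  # a 300 ms burst of queued I/O ops

# Per-second view: 10 samples per bucket. The burst is obvious in bucket 2.
per_second = bucket_mean_and_max(queue_depth, 10)

# Coarse view: one bucket for the whole window. The mean barely moves.
coarse = bucket_mean_and_max(queue_depth, 60)

for second, (avg, peak) in enumerate(per_second):
    print(f"second {second}: mean={avg:.1f} max={peak}")
print(f"whole window: mean={coarse[0][0]:.2f} max={coarse[0][1]}")
```

If your collector only keeps the mean per interval, the coarse view is all you get, which is why short bursts of high-frequency sampling are worth the trouble.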

Throughput in C* is easier to tune for than latency, and writes are easier to 
make fast than reads because of how C* is designed.  With read latency, 
you’re in your worst-case tuning scenario, particularly if you’re looking for 
tight latency at three nines (99.9%).
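
As a rough sketch of why the three-nines target is so much harder than it sounds, the snippet below computes nearest-rank percentiles over a made-up set of read latencies. The sample values and the function name `percentile_nearest_rank` are assumptions for illustration; in practice you would feed in whatever your own client-side or server-side metrics report.

```python
import math
from fractions import Fraction


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Fraction avoids floating-point surprises such as 99.9 / 100 * n landing just above an integer.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical read latencies in ms: 997 fast reads and 3 slow ones (e.g. hit by a GC pause).
latencies_ms = [2.0] * 997 + [80.0] * 3

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
p999 = percentile_nearest_rank(latencies_ms, 99.9)
print(f"p50={p50} ms  p99={p99} ms  p99.9={p999} ms")
```

With these numbers the median and the 99th percentile both look fine, while the 99.9th percentile is set entirely by the three slow reads. That is the tail you end up chasing.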

Don’t forget to see how your numbers stack up during repairs.  That includes 
both nodetool and Reaper-managed repairs, but per my comment on usage patterns, 
if you have antipatterns like write-then-read-back going on, under the hood 
you’ll be triggering the equivalent of localized repairs.  All of that adds to 
GC pressure, and hence to latency variance.

From: "Hanauer, Arnulf, Vodacom South Africa (External)" <arnulf.hana...@vcontractor.co.za>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, March 6, 2020 at 5:15 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Performance of Data Types used for Primary keys

Hi Cassandra folks,

Is there any difference in the performance of general operations when using a 
TEXT-based primary key versus a BIGINT primary key?

Our use case requires low-latency reads. The primary key is currently 
TEXT-based, but the data could work with BIGINT. We are trying to optimise 
where possible.
Any experiences that could point to a winner?


Kind regards
Arnulf Hanauer