RE: Questions about high read latency and related metrics

2023-05-17 Thread Michail Kotsiouros via user
Hello C* community,
I have been experimenting a bit with my lab node. From observing how the 
metrics progress over time, I am assuming the following:

  1.  The EstimatedPartitionSizeHistogram metric derives from read operations. 
Cassandra reports values to this metric as it serves read queries.
  2.  The PartitionSize metrics derive from compaction activity. 
Cassandra reports values to these metrics as it compacts sstables.

I am not sure whether those assumptions are valid, but they at least provide a 
good explanation for the progress of the stats observed.
Thanks a lot and CU on the next topic.

BR
MK


RE: Questions about high read latency and related metrics

2023-05-11 Thread Michail Kotsiouros via user
Hello Erick,
No, the Max/Min/Mean vs. histogram difference is clear.
What confuses me is the description of those metrics:
"Size of the compacted partition (in bytes)" vs. 
"estimated partition size".
I am after what is measured by each metric.
To be more specific:
Which metric should be considered when we want to see the partition size over time?
Does “compacted partition” mean that only the partitions which have 
undergone a compaction in the respective sstables are taken into account for 
the PartitionSize metrics?
What does “estimated” mean in the EstimatedPartitionSizeHistogram metric?
Excuse me if those questions sound trivial.
BR
MK



Re: Questions about high read latency and related metrics

2023-05-11 Thread Erick Ramirez
Is it the concept of histograms that's not clear? Something else?



RE: Questions about high read latency and related metrics

2023-05-11 Thread Michail Kotsiouros via user
Hello Erick,
Thanks a lot for the immediate reply, but the difference between those two 
metrics is still not clear to me.

BR
MK



Re: Questions about high read latency and related metrics

2023-05-11 Thread Erick Ramirez
The min/max/mean partition sizes are the sizes in bytes which are the same
statistics reported by nodetool tablestats.
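
As a rough illustration of tracking these three values over time, the sketch below pulls them out of text in the style of `nodetool tablestats` output. It assumes the labels read "Compacted partition minimum/maximum/mean bytes", which is worth checking against your Cassandra version. The sample text, its numbers, and the function name `parse_partition_sizes` are invented for this example.

```python
import re

# Hypothetical excerpt in the style of `nodetool tablestats` output.
# The table name and all numbers are invented for illustration.
SAMPLE = """\
Table: events
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 943127
Compacted partition mean bytes: 3311
"""

# Assumed label format: "Compacted partition <minimum|maximum|mean> bytes: <n>"
LINE = re.compile(r"Compacted partition (minimum|maximum|mean) bytes:\s*(\d+)")


def parse_partition_sizes(text):
    """Return a dict such as {'minimum': ..., 'maximum': ..., 'mean': ...} in bytes.

    Only meant for output covering a single table; with several tables,
    later values would overwrite earlier ones.
    """
    return {kind: int(value) for kind, value in LINE.findall(text)}


sizes = parse_partition_sizes(SAMPLE)
print(sizes)
```

Running something like this on a schedule and storing the result with a timestamp would give a simple time series of the minimum, maximum and mean partition size.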

EstimatedPartitionSizeHistogram is the distribution of partition sizes
within specified ranges (percentiles) and is the same histogram reported by
nodetool tablehistograms (in the Partition Size column). Cheers!
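
To make the idea of a distribution read as percentiles more concrete, here is a minimal sketch of how a percentile can be read off a bucketed histogram. It is not Cassandra's code or its actual bucket layout; the bucket bounds, the counts, and the function name `percentile_from_buckets` are assumptions made up for the example.

```python
def percentile_from_buckets(upper_bounds, counts, percent):
    """Return the upper bound of the bucket that contains the given percentile.

    upper_bounds: ascending bucket upper bounds (bytes)
    counts:       number of partitions that fell into each bucket
    percent:      integer percentile, e.g. 50, 95, 99
    """
    total = sum(counts)
    if total == 0:
        return 0
    seen = 0
    for bound, count in zip(upper_bounds, counts):
        seen += count
        # Integer comparison avoids floating-point rounding surprises.
        if seen * 100 >= percent * total:
            return bound
    return upper_bounds[-1]


# Hypothetical bucket upper bounds (bytes) and partition counts per bucket.
bounds = [100, 1_000, 10_000, 100_000, 1_000_000]
counts = [50, 30, 15, 4, 1]

for percent in (50, 95, 99):
    value = percentile_from_buckets(bounds, counts, percent)
    print(f"p{percent}: <= {value} bytes")
```

Because the result is always a bucket boundary, the reported sizes are approximations. A growing gap between the 50th and 99th percentile would suggest a few partitions getting much larger than the rest, which the mean alone can hide.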



Questions about high read latency and related metrics

2023-05-11 Thread Michail Kotsiouros via user
Hello Cassandra community,
I see the following metrics in JMX
Metric Name

org.apache.cassandra.metrics.Table...

MinPartitionSize                 Gauge  Size of the smallest compacted partition (in bytes).
MaxPartitionSize                 Gauge  Size of the largest compacted partition (in bytes).
MeanPartitionSize                Gauge  Size of the average compacted partition (in bytes).

And

EstimatedPartitionSizeHistogram  Gauge  Histogram of estimated partition size (in bytes).

Could you please help me clarify the difference between those two metrics?

We suspect that the partition size, which keeps increasing because of the 
application data model, has an impact on read latency.
Which would be the appropriate metric to monitor: 
PartitionSize or EstimatedPartitionSizeHistogram?

BR
Michail Kotsiouros