On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna <isa...@xsense.co> wrote:

> Hi
>
> I have a table that stores time series data. The data is not TTLed, since
> we want to retain it for the foreseeable future, and there are no updates
> or deletes (a delete could happen if some scrambled data reached the
> table, but that is extremely rare).
>
> We write incoming data to the table constantly, roughly 5 million rows a
> day, mostly newly generated data from the past week, though occasionally
> we also receive older data that got stuck somewhere. Our reads are usually
> for the most recent one to three months of data, but we do fetch older
> data for specific time periods in the past as well.
>
> Lately we have been facing performance trouble with this table (see the
> histogram below). While compaction is working on the table, read latency
> even climbs to 10-20 seconds!
>
> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>                            (micros)      (micros)         (bytes)
> 50%           215.00          17.08      89970.66            1916         149
> 75%           446.00          24.60     223875.79            2759         215
> 95%           535.00          35.43     464228.84            8239         642
> 98%           642.00          51.01     668489.53           24601        1916
> 99%           642.00          73.46     962624.93           42510        3311
> Min             0.00           2.30      10090.81              43           0
> Max           770.00        1358.10    2395318.86         5839588      454826
>
>
>
> As you can see, we are scanning hundreds of sstables. It turns out we are
> using DTCS (min: 4, max: 32); the table folder contains ~33K files, ~130 GB
> per node (cleanup pending after growing the cluster), and compaction takes
> a very long time to complete.
>
> As I understand it, DTCS is deprecated, so my questions are:
>
>    1. Should we switch to TWCS even though our data is not TTLed? Since
>    we do not delete at all, can we still use it? Will it improve
>    performance?
>
It will probably be better than DTCS here, but you'll still have
potentially lots of sstables over time.

Lots of sstables isn't a big deal in itself; the problem comes from
scanning more than a handful on each read. Does your table have some form
of date bucketing to avoid touching old data files?
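
A minimal sketch of what the switch could look like, issued here through
the Python driver (keyspace and table names are placeholders, and the
one-week window is just the value you propose below). Note that altering
the schema flips compaction on every node at once; see the safer rollout
options further down.

    from cassandra.cluster import Cluster

    # Connect to a placeholder keyspace; adjust contact points as needed.
    session = Cluster(["127.0.0.1"]).connect("myks")

    # TWCS only accepts MINUTES/HOURS/DAYS as the window unit, so a
    # one-week window is expressed as 7 DAYS.
    session.execute("""
        ALTER TABLE timeseries
        WITH compaction = {
            'class': 'TimeWindowCompactionStrategy',
            'compaction_window_unit': 'DAYS',
            'compaction_window_size': '7'
        }
    """)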



>
>    2. If we should switch, I am thinking of using a time window of one
>    week; this way reads will scan tens of sstables instead of the
>    hundreds they scan today. Does that sound reasonable?
>
Tens is better than hundreds, but it's still a lot: with one-week windows
and no TTL, you add roughly 52 windows per year, and a read spanning months
still touches every window in that range.


>
>    3. Is there a recommended size for a window bucket in terms of disk
>    space?
>
When I wrote it, I wrote it for a use case that had 30 windows over the
whole set of data. Since then, I've seen it used with anywhere from 5 to 60
buckets.
With no TTL, you're effectively accumulating an unbounded number of
buckets. So the only way to ensure you're not touching too many sstables is
to put the date (in some form) into the partition key and let the database
use that (plus bloom filters) to avoid reading sstables it doesn't need.
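
To make that concrete, here's a sketch of one common bucketing shape
(table and column names are illustrative, not from your schema): a coarse
time bucket in the partition key means a time-ranged read only touches the
partitions, and therefore the sstables, holding those buckets, and bloom
filters skip the rest.

    from datetime import date, datetime, timedelta

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("myks")

    # One partition per sensor per week; rows cluster by timestamp.
    session.execute("""
        CREATE TABLE IF NOT EXISTS timeseries_bucketed (
            sensor_id text,
            week      date,       -- e.g. the Monday of the ISO week
            ts        timestamp,
            value     double,
            PRIMARY KEY ((sensor_id, week), ts)
        ) WITH CLUSTERING ORDER BY (ts DESC)
    """)

    # A four-week read enumerates the week buckets it spans instead of
    # hitting every sstable in the table.
    start, end = datetime(2021, 8, 16), datetime(2021, 9, 13)
    weeks = [date(2021, 8, 16) + i * timedelta(weeks=1) for i in range(4)]
    for wk in weeks:
        for row in session.execute(
            "SELECT ts, value FROM timeseries_bucketed "
            "WHERE sensor_id = %s AND week = %s AND ts >= %s AND ts < %s",
            ("sensor-42", wk, start, end),
        ):
            print(row.ts, row.value)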

>
>    4. If TWCS is not a good idea, should I switch to STCS instead? Could
>    that yield better performance than the current situation?
>
LCS will give you better read performance. STCS will probably be better
than DTCS given the 215-sstable p50 you're seeing (which is crazy, by the
way; I'm surprised you're not just OOMing).
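
If you do go the LCS route, the change is the same kind of one-liner (same
placeholder names as the earlier sketch; 160 MB is the usual sstable
target, shown only to make the knob visible):

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("myks")
    session.execute("""
        ALTER TABLE timeseries
        WITH compaction = {
            'class': 'LeveledCompactionStrategy',
            'sstable_size_in_mb': '160'
        }
    """)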


>    5. What are the risks of changing the compaction strategy on a
>    production system? Can it be done on the fly, or is it better to go
>    through a full test and backup cycle?
>
>
The risk is that you trigger a ton of compactions, which drops the
performance of the whole system all at once and your front-door queries all
time out.
You can approach this a few ways:
- Use the JMX endpoint to change compaction on one instance at a time
(rather than doing it in the schema), which lets you control how many nodes
are re-writing all their data at any given point in time (a sketch follows
this list)
- Make an entirely new table, and populate it by reading from the old one
and writing to the new one, so you don't have the massive compaction kick
off (see the backfill sketch at the end of this reply)
- Use user-defined compaction to force-compact some of those 33K sstables
into fewer sstables in advance, hopefully taking away some of the pain
you're seeing, before you fire off the big compaction (sketched below,
after a caveat)
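
For the first option, here's a hedged sketch of the per-node change using
the jmxterm CLI (a separate download, not part of Cassandra). The MBean
and its CompactionParametersJson attribute are from Cassandra 3.x+; host,
port, jar path, and keyspace/table names are placeholders, and the quoting
may need adjustment for your jmxterm version.

    import json
    import subprocess

    # The same TWCS settings as the schema-level sketch, as compact JSON.
    params = json.dumps({
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": "DAYS",
        "compaction_window_size": "7",
    }, separators=(",", ":"))

    # jmxterm reads commands from stdin; -n suppresses the interactive
    # prompt. This changes compaction on this one node only; the schema
    # still says DTCS until you eventually ALTER it.
    command = (
        "set -b org.apache.cassandra.db:type=Tables,keyspace=myks,"
        f"table=timeseries CompactionParametersJson {params}\n"
    )
    subprocess.run(
        ["java", "-jar", "jmxterm-uber.jar",
         "-l", "node1.example.com:7199", "-n"],
        input=command, text=True, check=True,
    )

Run it against one node, wait for compactions to settle (watch nodetool
compactionstats), then move on to the next.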

The third option - user-defined compaction - will make TWCS less
effective, because TWCS uses the max timestamp per sstable for bucketing,
and merging sstables loses that granularity.
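
With that caveat in mind, a sketch of the user-defined-compaction route via
nodetool (the --user-defined flag exists in Cassandra 3.4+; paths and batch
size are placeholders):

    import glob
    import subprocess

    # Every data file for the table on this node; adjust the path to your
    # data directory layout.
    paths = sorted(glob.glob(
        "/var/lib/cassandra/data/myks/timeseries-*/*-Data.db"))

    batch = 200  # hypothetical batch size; tune to what the node can absorb
    for i in range(0, len(paths), batch):
        # Each call compacts exactly the sstables named, nothing else.
        subprocess.run(
            ["nodetool", "compact", "--user-defined", *paths[i:i + batch]],
            check=True,
        )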

Really, though, the main thing you need to do is get a time component into
your partition key so you avoid scanning every sstable looking for data;
either that, or bite the bullet and use LCS so the compaction system keeps
the sstable count at a manageable level for reads.
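
And here's the backfill sketch promised above, combining the new-table
option with the re-keying advice: page through the old table and rewrite
each row into the bucketed copy, preserving the original write timestamp.
Column names are the same placeholders as before; at 5 million rows a day
you'd more likely reach for spark-cassandra-connector or sstableloader, so
this only shows the shape.

    from datetime import timedelta

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("myks")

    insert = session.prepare(
        "INSERT INTO timeseries_bucketed (sensor_id, week, ts, value) "
        "VALUES (?, ?, ?, ?) USING TIMESTAMP ?"
    )

    # fetch_size pages through the full scan instead of buffering it all.
    scan = SimpleStatement(
        "SELECT sensor_id, ts, value, writetime(value) AS wt "
        "FROM timeseries",
        fetch_size=1000,
    )
    for row in session.execute(scan):
        # Bucket each row by the Monday of its week, matching the schema.
        week = row.ts.date() - timedelta(days=row.ts.weekday())
        session.execute(
            insert, (row.sensor_id, week, row.ts, row.value, row.wt))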



>
> All input will be appreciated,
>
> Thank you
>
