On Tue, Jun 29, 2021 at 5:49 PM Scott Carey <scottca...@apache.org> wrote:
>
> I'd like to discuss the inclusion of the above tickets for a 3.11.x
> release.  These are not a pure 'bug fix' so I'll need a waiver to get them
> into 3.11.x  (and implicitly, 4.0.x).
>
> The first two are straightforward oversights: neither *nodetool
> garbagecollect* nor *nodetool scrub* currently accepts a *--user-defined*
> parameter list of SSTables in the same way that *nodetool compact* does.
>
> This is an operational problem for large tables.
>
> I often need to scrub just one file that is corrupted for some reason, and
> not scrub an entire 1TB+ of data for a table on a node.  This renders
> 'nodetool scrub' operationally useless for large tables.

Not having user-defined options for these compaction types is clearly
an oversight, and the alternative of deleting the large 1TB+ sstable
and then repairing is a cure worse than the disease, so I think this
should be added to 3.11.x and 4.0.x. I am +1 here.

> For *garbagecollect* it is often operationally easy to identify which
> tables are likely to be full of bloat, and operationally useful to do this
> task in small increments.  The existing order that garbagecollect processes
> SSTables prevents it from being useful in any incremental fashion -- if you
> stop it and later restart, it will first process the SSTables you just
> garbage collected.
>
> The third ticket adds an option for *nodetool garbagecollect*,
> *--oldest-fraction*, that can select a fraction of the oldest table data in
> bytes, and garbagecollect only the SSTables that 'cover' that percentage of
> data.  Operationally, this lends itself to easy automation -- for example
> running this once a week on 10% of a table's data would imply that there is
> no data on disk that has been overwritten within the last 10 weeks.  This
> caps data bloat in ways neither LCS nor STCS can currently achieve without
> regular major compactions or full-pass garbagecollect.

This is a less obvious addition, and I personally lack the operational
experience to comment firsthand on how much relief it would provide,
so I'll leave that to others.  But it does make sense to me, and since
it isn't heavily modifying anything, my inclination is that this could
be an acceptable addition as well.
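
For anyone trying to picture the selection rule, here is a rough Python
sketch of one possible reading of *--oldest-fraction* as described
above: order the SSTables oldest first, then take them until the
selected bytes cover the requested fraction of the table. This is not
taken from the patch. The SSTableInfo type, the use of a maximum
timestamp as the age key, and the function name are all assumptions
made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical stand-in for per-SSTable metadata."""
    name: str           # illustrative file name, not a real path
    size_bytes: int     # on-disk size of the SSTable
    max_timestamp: int  # assumed age key: smaller means older data


def select_oldest_fraction(sstables: List[SSTableInfo],
                           fraction: float) -> List[SSTableInfo]:
    """Pick the oldest SSTables that together cover `fraction` of total bytes."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    total_bytes = sum(s.size_bytes for s in sstables)
    target_bytes = total_bytes * fraction

    selected: List[SSTableInfo] = []
    covered = 0
    # Walk from oldest to newest and stop once the target is covered.
    for sstable in sorted(sstables, key=lambda s: s.max_timestamp):
        if covered >= target_bytes:
            break
        selected.append(sstable)
        covered += sstable.size_bytes
    return selected


if __name__ == "__main__":
    tables = [
        SSTableInfo("a", 100, max_timestamp=4),
        SSTableInfo("b", 200, max_timestamp=1),
        SSTableInfo("c", 300, max_timestamp=3),
        SSTableInfo("d", 400, max_timestamp=2),
    ]
    for picked in select_oldest_fraction(tables, 0.5):
        print(picked.name, picked.size_bytes)
```

Because whole SSTables are selected, a run can cover somewhat more than
the requested fraction. In the weekly 10% example from the proposal,
that only shortens the time needed to cycle through all of the data.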

Kind Regards,
Brandon
