I think this would be beneficial since we have a loose agreement that UCS is a
promising option as the new default.

I plan to summarize the views expressed in this thread and propose a plan to
make compaction usability smoother for new users of Cassandra, if there are
any short-term actions we can agree on to address outstanding issues.

On Sun, 8 Dec 2024 at 12:48 Jordan West <jw...@apache.org> wrote:

> While we continue the discussion here on short-term defaults, do we all
> feel it would be beneficial to start a new thread on what is required to
> get UCS over the line as a default, so we can have both discussions going
> at once?
>
> On Sun, Dec 8, 2024 at 8:44 AM Paulo Motta <pa...@apache.org> wrote:
>
>>
>> Hi Dave,
>>
>> I appreciate these performance/cost considerations and I believe these
>> should be taken into account when evaluating default changes.
>>
>> I am trying to frame this as a usability issue with the database caused by
>> shipping with STCS by default.
>>
>> I think it's possible to classify workloads into two types:
>> a) Mutable (superset)
>> b) Immutable (or semi-immutable)
>>
>> The majority of current use cases might be b) Immutable, but shipping
>> with STCS provides a bad user experience to users of a) Mutable use cases.
>> In turn, this reinforces the idea that "Cassandra is not good for mutable use cases".
>>
>> I believe the use cases that will be covered by CQL Transactions tend to
>> be a) Mutable, and it might make sense to optimize for this reality.
>>
>> Existing users with immutable use cases are familiar with STCS and can
>> continue using it.
>>
>> Thanks,
>>
>> Paulo
>>
>> On Sun, Dec 8, 2024 at 10:33 AM Dave Herrington <he...@rhinosource.com>
>> wrote:
>>
>>> …the analysis I describe would need to be weighted by table size.  I
>>> have several representative production cluster tablestats analyses that
>>> show r:w ratio by table, including table size.  I can check to see how this
>>> analysis plays out on a few of these.
>>>
>>> -Dave
>>>
>>> David A. Herrington II
>>> President and Chief Engineer
>>> RhinoSource, Inc.
>>>
>>> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>>>
>>> www.rhinosource.com
>>>
>>>
>>> On Sun, Dec 8, 2024 at 7:22 AM Dave Herrington <he...@rhinosource.com>
>>> wrote:
>>>
>>>> Paulo,
>>>>
>>>> I understand your perspective.
>>>>
>>>> Short of waiting for UCS to prove itself out, I guess it comes down to
>>>> the assertion that a strong majority of Cassandra use cases would benefit
>>>> from using LCS vs. STCS.
>>>>
>>>> The conventional wisdom is that workloads need to be read-heavy to make
>>>> the extra resource consumption of LCS pay off.  4:1 read:write is the
>>>> threshold I use to decide whether or not to use LCS.
>>>>
>>>> I think this ratio is important in this analysis.  Has this LCS
>>>> “payoff” threshold changed to 2:1 or better, in favor of LCS?  This would
>>>> be good to know.
>>>>
>>>> With an up-to-date threshold in hand, what is the fraction of Cassandra
>>>> use cases that meet this updated threshold?
>>>>
>>>> For example, say this LCS payoff r:w ratio has improved to 2:1.  What
>>>> percentage of Cassandra tables across all clusters currently in operation
>>>> are 2:1 read-to-write or more?
>>>>
>>>> If the answer is a solid majority, I think this would justify the
>>>> default change.
>>>>
>>>> -Dave
>>>>
>>>> David A. Herrington II
>>>> President and Chief Engineer
>>>> RhinoSource, Inc.
>>>>
>>>> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>>>>
>>>> www.rhinosource.com
>>>>
>>>>
>>>> On Sun, Dec 8, 2024 at 5:43 AM Paulo Motta <pa...@apache.org> wrote:
>>>>
>>>>> Hi Dave,
>>>>>
>>>>> I'm also in the field and my experience is different.
>>>>>
>>>>> Over the past few years I have regularly seen new users shoot themselves
>>>>> in the foot with the default compaction strategy, STCS. I have been
>>>>> recommending that they switch to LCS, and they no longer encounter
>>>>> issues after making this switch. I would like to generalize this
>>>>> recommendation to prevent new users from having bad experiences and
>>>>> abandoning the database.
>>>>>
>>>>> This is not a cost issue, it's an ease of use matter. STCS does not
>>>>> work for mutable workloads and this is a massive functional limitation
>>>>> with the database.
>>>>>
>>>>> I don't want people to download Cassandra 5.1 to try out transactions
>>>>> and start facing issues due to bad STCS performance on mutable data.
>>>>>
>>>>> If you would like to optimize for cost, then you can read the docs or
>>>>> hire a consultant to optimize the cost for you. Otherwise, the database
>>>>> should work out of the box and this is provided by LCS. If LCS can not
>>>>> keep up, it means the cluster is under provisioned and needs to be
>>>>> expanded, it's not a functional issue but a capacity issue.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Paulo
>>>>>
>>>>> On Sun, Dec 8, 2024 at 1:26 AM Dave Herrington <he...@rhinosource.com>
>>>>> wrote:
>>>>>
>>>>>> Chiming in from the field, I think maintaining the familiar status
>>>>>> quo until a panacea compaction strategy proves itself out (could that be
>>>>>> UCS?) makes sense.  I feel it could be maddening to customers if
>>>>>> LCS started showing up in schemas after an upgrade just because the
>>>>>> default changed.  If UCS proves itself as the fits-all solution, then
>>>>>> we’d be doing them a favor by making it the default. In time.
>>>>>>
>>>>>> -Dave
>>>>>>
>>>>>> David A. Herrington II
>>>>>> President and Chief Engineer
>>>>>> RhinoSource, Inc.
>>>>>>
>>>>>> *Data Lake Architecture, Cloud Computing and Advanced Analytics.*
>>>>>>
>>>>>> www.rhinosource.com
>>>>>>
>>>>>>
>>>>>> On Sat, Dec 7, 2024 at 7:32 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Dec 7, 2024, at 7:08 PM, Mick Semb Wever <m...@apache.org> wrote:
>>>>>>>
>>>>>>> Chiming in with my two cents…
>>>>>>>
>>>>>>>
>>>>>>>> When people have the luxury of working in environments where
>>>>>>>> clusters are massively over provisioned, LCS as a default makes a lot
>>>>>>>> of sense, because there's not much downside.  The use cases where you'd
>>>>>>>> actually fall behind in compaction are pretty slim, so the negative
>>>>>>>> impact isn't felt.
>>>>>>>>
>>>>>>>> Most people aren't doing this.  Putting LCS as the default
>>>>>>>> significantly changes the performance profile of new clusters in a way
>>>>>>>> that actively harms a portion of the community.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Haddad's statement here resonates above everything else that's been
>>>>>>> said so far.  It is this particular audience that I'm thinking first
>>>>>>> about not screwing over, everyone else is a step in front of them wrt
>>>>>>> knowing what compaction is and making an informed decision into
>>>>>>> changing it.
>>>>>>>
>>>>>>>
>>>>>>> “You have to over-provision (iops) to use LCS” isn’t that different
>>>>>>> from “you have to over-provision (space) to use LCS” (by perhaps 50%).
>>>>>>>
>>>>>>> Both of them are sub-optimal and you’re trading off either extra
>>>>>>> space or extra compute/ops.
>>>>>>>
>>>>>>>
>>>>>>>
