Re: What branches should perf fixes be targeting

Josh McKenzie Thu, 23 Jan 2025 07:24:06 -0800

> Of note, it's been 13 months since 5.0 GA. :)
On a scale of 1-10, I'm a 10 out of 10 for being wrong here. It's been 13 
months *since we initially intended to release 5.0*. Stabilization of CI and 
some bugs took us to mid 2024. So it's not as bad as all that. Thanks to those 
that pointed this out to me; brain derped.


So keeping things constrained to this thread: I think "bugfix only to 
non-trunk, ML for consensus otherwise" is a very workable solution. We can 
augment our wiki to reflect that since it's not here 
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530302#Patching,versioning,andLTSreleases-Wheretoapplypatches>
 yet, assuming consensus on the thread here.

On Thu, Jan 23, 2025, at 9:45 AM, Dmitry Konstantinov wrote:
> >> That is ... 6 branches at once. We were there, 3.0, 3.11, 4.0, 4.1, 5.0, 
> >> trunk. If there was a bug in 3.0, because we were supporting that, we had 
> >> to put this into 6 branches
> My idea is not to increase the number of support branches (that is definitely 
> not what I want; I am more a fan of release-ready trunk-based development 
> with a faster feedback loop, though it is not always applicable).
> The option was about releasing minor versions without long-term support, the 
> way the JDK released JDK 9/10 as short-term and then JDK 11 as long-term, 
> then 12/13 as short-term, and so on.
> So, in the case of Cassandra, for example: we now have 5.0.x as a long-term 
> support version with a branch; we can release 5.1/5.2 from trunk (without any 
> new support branches for them) and then 5.3 as a long-term version again, with 
> a bug-fix branch. The only overhead here is the more frequent releases (say, 
> once every 3 or 6 months); there is no overhead for branches/merges.
> 
> 
> On Thu, 23 Jan 2025 at 14:31, Štefan Miklošovič <smikloso...@apache.org> 
> wrote:
>> 
>> 
>> On Thu, Jan 23, 2025 at 3:20 PM Dmitry Konstantinov <netud...@gmail.com> 
>> wrote:
>>> Hi Stefan,
>>> 
>>> Thanks a lot for the detailed feedback! A few comments:
>>> 
>>> >> I think this is already the case, more or less. We are not doing perf 
>>> >> changes in older branches.
>>> Yes, I understand the idea about stability of older branches. The primary 
>>> issue for me is that if I contribute even a small improvement to trunk, I 
>>> cannot really use it for a long time (except by keeping it in my own fork), 
>>> because there is no release that delivers it back to me or anybody else.
>>> 
>>> >> Maybe it would be better to make the upgrading process as smooth as 
>>> >> possible so respective businesses are open to upgrade their clusters in 
>>> >> a more frequent manner.
>>> About the upgrade process: in my personal experience (3.0.x -> 3.11.x -> 
>>> 4.0.x -> 4.1.x), upgrading Cassandra has been positive (I suppose the 
>>> automated tests that cover it are really helpful); I have not experienced 
>>> any serious issues with it. I suppose that most of the time, people have 
>>> issues with upgrades because they delay them for too long and stay on very 
>>> old unsupported versions until the last moment.
>>> 
>>> >>  Cassandra is not JDK. We need to fix bugs in older branches we said we 
>>> >> support
>>> Regarding the necessity of supporting older branches, it is the same story 
>>> for the JDK: they currently support and fix bugs in JDK 8, JDK 11, JDK 17 
>>> and JDK 21 as LTS versions, and JDK 23 as the latest release, while 
>>> developing and releasing JDK 24.
>> 
>> That is ... 6 branches at once. We were there, 3.0, 3.11, 4.0, 4.1, 5.0, 
>> trunk. If there was a bug in 3.0, because we were supporting that, we had to 
>> put this into 6 branches. That means 6 builds in CI. Each CI takes a couple 
>> hours ... If there is something wrong or the patch is changed we need to 
>> rebuild. So what looks like "just merge up from 3.0 and that's it" becomes a 
>> multi-day odyssey somebody needs to invest resources into. As we dropped 3.0 
>> and 3.11 and we took care of 4.0+ that is better but still not fun when done 
>> "at scale". 
>>  
>>> Another example, Postgres does a major release every year: 
>>> https://www.postgresql.org/support/versioning/ and supports the last 5 
>>> major versions.
>> 
>> Yeah, but they have most probably way more man-power as well etc ... 
>>  
>>> 
>>> >> please keep in mind that there are people behind the releases who are 
>>> >> spending time on that.
>>> Yes, as I already mentioned, I am really thankful to Brandon and Mick for 
>>> doing it! It is hard, exhausting and not the most exciting work to do. 
>>> Please contact me if I can help somehow with it, like checking and fixing 
>>> CI test failures (I've already been doing that for a while), doing some 
>>> scripting, etc.
>>> I have a hypothesis (maybe I am completely wrong here) that the low 
>>> interest in the release process is somehow related to many contributors 
>>> having their own Cassandra fork, so there is no big demand for regular 
>>> mainline releases if you already have the changes in a fork.
>>> 
>>> Regards,
>>> Dmitry
>>> 
>>> 
>>> On Thu, 23 Jan 2025 at 12:30, Štefan Miklošovič <smikloso...@apache.org> 
>>> wrote:
>>>> I think the current guidelines are sensible.
>>>> 
>>>> Going through your suggestions:
>>>> 
>>>> 1) I think this is already the case, more or less. We are not doing perf 
>>>> changes in older branches. This is what we see in CASSANDRA-19429, a user 
>>>> reported that it is a performance improvement, and most probably he is 
>>>> right, but I am hesitant to refactor / introduce changes into older 
>>>> branches. 
>>>> 
>>>> Cassandra has a lot of inertia; we cannot mess with what works, even if 
>>>> performance improvements are appealing. Maybe it would be better to make 
>>>> the upgrading process as smooth as possible so respective businesses are 
>>>> open to upgrade their clusters in a more frequent manner.
>>>> 
>>>> 2) Well, but Cassandra is not JDK. We need to fix bugs in older branches 
>>>> we said we support. This is again related to inertia Cassandra has as a 
>>>> database. Bug fixes are always welcome, especially if there is 0 risk 
>>>> in deploying them. 
>>>> 
>>>> What particularly resonates with me is your wording "more frequent and 
>>>> predictable". Well ... I understand it would be the most ideal outcome, 
>>>> but please keep in mind that there are people behind the releases who are 
>>>> spending time on that. I have been following this project for a couple 
>>>> years and the only people who are taking care of releases are Brandon and 
>>>> Mick. I was helping here and there to at least stage it and I am willing 
>>>> to continue to do so, but that is basically it. "two and a half" people 
>>>> are doing releases. For all these years.
>>>> 
>>>> So if you ask for more frequent releases, that is something which is going 
>>>> to directly affect respective people involved in them. I guess they are 
>>>> doing it basically out of courtesy and it would be great to see more PMCs 
>>>> involved in release processes. As of now, it looks like everybody just 
>>>> assumes that "it will be somehow released" and "releases just happen" but 
>>>> that is not the case. Releases are not "just happening". There are people 
>>>> behind them who need to plan when it is going to happen and they need to 
>>>> find time for that etc. There are a lot of things not visible behind the 
>>>> scenes and doing releases is a job in itself.
>>>> 
>>>> So if we ask for more frequent releases, it is a good question to ask who 
>>>> would be actually releasing that.
>>>> 
>>>> On Wed, Jan 22, 2025 at 12:17 PM Dmitry Konstantinov <netud...@gmail.com> 
>>>> wrote:
>>>>> Hi all, 
>>>>> 
>>>>> I am one of the contributors for the recent perf changes, like: 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-20165 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-20226 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-19557 
>>>>> ...
>>>>> 
>>>>> My motivation: I am currently using 4.1.x and planning to adopt 5.0.x in 
>>>>> the next quarter. Of course, I want to have it in the best possible shape 
>>>>> from a performance point of view; performance is one of the important 
>>>>> selling points for upgrades. In general, performance is one of the key 
>>>>> reasons why people select NoSQL, and Cassandra in particular, so any 
>>>>> improvement here should be appreciated by users, especially in the current 
>>>>> cloud-oriented world where every such improvement is a potential cost saving.
>>>>> 
>>>>> For me the question is tightly related to release scheduling. We have 
>>>>> periodic and quite frequent patch releases now; many thanks to the 
>>>>> people who spend their time on them. When it comes to minor releases, 
>>>>> the release process looks much slower and less predictable: it can be a 
>>>>> year or even more before I get a minor release that includes a change, 
>>>>> and nobody can give even a preliminary date for it.
>>>>> As a result, when I have a performance patch and it is suggested to merge 
>>>>> it only to trunk, I will not get the improvement back to use for a long 
>>>>> time.
>>>>> So, I have 2 options in this case:
>>>>> 1) relax and wait (potentially losing interest due to the delayed 
>>>>> feedback)
>>>>> 2) keep my own private fork to accumulate such changes, with the 
>>>>> corresponding overhead (which is what I actually do now)
>>>>> 
>>>>> As someone who supports Cassandra in production for systems with 99.999% 
>>>>> availability requirements, of course I care about stability too, 
>>>>> but I think we need some balance here, and we should rely more on things 
>>>>> like test coverage and different policies for different branches so that 
>>>>> we do not stagnate out of fear of any change. I am not talking about 
>>>>> massive breaking changes, especially those that modify (even in a 
>>>>> compatible way) network communication protocols or on-disk data formats; 
>>>>> those should get a separate, individual discussion.
>>>>> 
>>>>> The situation reminds me of the story of the JDK prior to Java 9. There 
>>>>> were also some big-bang minor releases (1.5/1.6/1.7/1.8) which we waited a 
>>>>> very long time for, and Java was evolving very slowly. Now we have a model 
>>>>> where a new release is available every half year and some of them are 
>>>>> supported long term. So the people who prefer stability select and 
>>>>> use LTS versions, the people who want access to new 
>>>>> features/improvements can take the latest release, and all are happy. 
>>>>> Similar models, like stable/latest releases, exist for other products.
>>>>> 
>>>>> So, my suggestion is one of the following options:
>>>>> 1) Classify the current release branches as more and less stable, like: 
>>>>> -- 4.0.x/4.1.x - avoid perf changes unless the change is really bug-like
>>>>> -- 5.0.x - more relaxed rules
>>>>> 
>>>>> 2) Do something similar to JDK with LTS versions: make minor releases for 
>>>>> the latest major version (like 5.1/5.2) more frequent and predictable, 
>>>>> like a release train; do not create a fix branch for every one; 
>>>>> periodically establish fix branches for selected minor versions and 
>>>>> release patch versions for them. 
>>>>> 
>>>>> Thank you,
>>>>> Dmitry
>>>>> 
>>>>> On Wed, 22 Jan 2025 at 09:02, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>> 
>>>>>> I think the status quo is fine - perf goes to trunk, if you think 
>>>>>> something is special, it goes to the mailing list to justify exceptions
>>>>>> 
>>>>>> 
>>>>>>> On Jan 22, 2025, at 3:36 AM, Jordan West <jw...@apache.org> wrote:
>>>>>>> 
>>>>>>> Thanks for the initial feedback. I hear a couple different themes / 
>>>>>>> POVs. 
>>>>>>> 
>>>>>>> David/Paulo, it sounds like maybe a guide for perf backports + mailing 
>>>>>>> list consensus when necessary + clear documentation of this could be a 
>>>>>>> way forward. I agree that each change comes with stability risks but at 
>>>>>>> the same time the greatest stability risk with Cassandra historically 
>>>>>>> has been major version upgrades (although we have made great 
>>>>>>> improvements here). For folks who want only the performance 
>>>>>>> improvements, we are asking them to take greater risk by upgrading a 
>>>>>>> major version or to maintain a fork. The fork is reasonable for some of 
>>>>>>> the larger operators but not others. That said, I do agree we need to 
>>>>>>> use judgement. Not all changes are worth backporting and some may incur 
>>>>>>> too much risk. We could also add to the guide suggestions of how to 
>>>>>>> de-risk a change (e.g. code is isolated, config to turn it off / off by 
>>>>>>> default, etc). 
>>>>>>> 
>>>>>>> Jeff, I agree 1% wins aren't worth it if they are invasive and in risky 
>>>>>>> areas. Not all of the improvements are that minor.
>>>>>>> 
>>>>>>> Jordan
>>>>>>> 
>>>>>>> On Tue, Jan 21, 2025 at 1:57 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> We expect users to treat patch and minor releases as low risk. 
>>>>>>>> Changing something deep in the storage engine to be 1% faster is not 
>>>>>>>> worth the risk, because most users will skip the type of qualification 
>>>>>>>> that finds those one in a billion regressions.
>>>>>>>> 
>>>>>>>> Patch releases are for bug fixes not perf improvements. 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jan 21, 2025, at 9:10 PM, Jordan West <jw...@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> Hi folks,
>>>>>>>>> 
>>>>>>>>> A topic that’s come up recently is what branches are valid targets 
>>>>>>>>> for performance improvements. Should they only go into trunk? This 
>>>>>>>>> has come up in the context of BTI improvements, Dmitry’s work on 
>>>>>>>>> reducing object overhead, and my work on CASSANDRA-15452.
>>>>>>>>> 
>>>>>>>>> We currently have guidelines published: 
>>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530302#Patching,versioning,andLTSreleases-Wheretoapplypatches.
>>>>>>>>>  But there’s no explicit discussion of how to handle performance 
>>>>>>>>> improvements. We tend to discuss whether they’re “bugfixes”.
>>>>>>>>> 
>>>>>>>>> I’d like to discuss whether performance improvements should target 
>>>>>>>>> more than just trunk. I believe they should target every active 
>>>>>>>>> branch because performance is a major selling point of Cassandra. 
>>>>>>>>> It’s not practical to ask users to upgrade major versions for simple 
>>>>>>>>> performance wins. A major version can be deployed for years, 
>>>>>>>>> especially when the next one has major changes. But we shouldn’t 
>>>>>>>>> target non-supported major versions, either. Also, there will be 
>>>>>>>>> exceptions: patches that are too large, invasive, risky, or 
>>>>>>>>> complicated to backport. For these, we rely on the contributor and 
>>>>>>>>> reviewer’s judgment and the mailing list. So, I’m proposing an 
>>>>>>>>> allowance to backport to active branches, not a requirement to merge 
>>>>>>>>> them.
>>>>>>>>> 
>>>>>>>>> I’m curious to hear your thoughts.
>>>>>>>>> Jordan
>>>>> 
>>>>> 
>>>>> --
>>>>> Dmitry Konstantinov
>>> 
>>> 
>>> --
>>> Dmitry Konstantinov
> 
> 
> --
> Dmitry Konstantinov
