Re: [PROPOSAL] Improve our "collaboration schema"

Robert Stupp Wed, 22 Jan 2025 10:12:52 -0800

Hey,

I'm generally in favor of these things. But I find it quite difficult toput all that into a guideline/rules/bylaws and be sure that everybodyunderstands it in the exact same way as it is meant. In practice thereare also often situations that could not be resolved with such rules,worse if those are technically enforced. In the worst case,someone/something can abuse such technical "barriers".

My intent is not to shoot down this effort, but to think carefully aboutthe day-to-day implications, and also consider how suchrules/guidelines/bylaws look in maybe 1, 2 or 5 years. Not sure whetherit would be a good idea to change such rules/guidelines/bylaws regularly.

I'll be very nit-picky in the following ("little") wall of text. Butwe're discussing guidelines/rules/bylaws, the phrasing is extremelyimportant here.



On 22.01.25 13:42, Jean-Baptiste Onofré wrote:

Hi folks,

Thanks to your feedback and comments (in PRs, this thread, and the
other one regarding main branch stability), I'm updating the proposal
as follow:
* Changes on public interfaces/extension points (potentially used by
parties) should be discussed and approved on the dev mailing list (a
vote can be used if needed). The discussion on the dev mailing list (a
proposal document can be attached) should happen before any PR/change

Having in-code documentation that describe the intent _and_ scope("blast radius") is often beneficial.

In my opinion, configuration is also a "user facing" thing - the samerules should apply to configurations as well. Major (what's "major"?)changes to documentation as well.

Technically speaking, a typo-fix is also a change, but a discussion forthat on the ML seems to be overkill.

Specifically for Java: "public" or "interface" are not good indicatorsto "mark" a "user facing API/SPI" - and even that can be quite nuancedand open for interpretation. Both "public" and "interface" can benecessary in Java for technical reasons, for example when "crossing"packages in an implementation. In other cases the "public" modifiermight be there although it's not needed.

I propose to rephrase this to "stable user facing APIs and SPIs", andthen mark those those explicitly as API or SPI in the code with itsstatus (experimental, stable, deprecated/for-removal). Similar for otherprogramming languages (e.g. Python) or specification formats (OpenAPI).Not sure whether annotations make sense. But we should have a consistentapproach across all programming languages and file/spec formats.

For the future: I'm in favor to have modules that express the"audience". For example, for a (future) "foobar" component, there mightbe three or more more modules to help there. The "blast radius" of eachcomponent's API / SPI module however can vary (needs docs/text).

- polaris-foobar-api - consumer facing API, not necessarily user facing
- polaris-foobar-spi - provider facing SPI

- polaris-foobar-impl-client - an implementation, intended for clients,a runtimeOnly-dependency- polaris-foobar-impl-server - an implementation, intended for servers,a runtimeOnly-dependency

Also for the future and considering the above module structure, itshould be relatively easy, at best just by changing aruntime-dependency, to replace a module polaris-foobar-impl withpolaris-foobar-awesome-impl. CDI is our friend here ;)

* The original author of a code should be tagged in a PR comment
(something like "@author this PR changed a code you are the original
author, can you please take a look ?") as background/context can be
provided and valuable review made. It would be possible to "enforce"
that in the .asf.yaml but it would also need a much more detailed
.github/CODEOWNERS file. So, I propose to do it "explicitly" by the PR
author.

Shouldn't we give up on CODEOWNERS in that case? Or in general? Gettingtwo notifications (every committer's in CODEOWNERS) might not help then.

Also, not sure about only original author - over time things willchange, and the "original author(s)" may not even work on it anymore, orthe work has already been massively refactored, or someone else tookover the work. "Original author" feels not future proof ;)

Some committers are absolute experts on say Foo.java, others have a goodunderstanding and others might not even have a clue anymore, because itchanged drastically over time.

Practically speaking, if a file's moved, it's can be hard in Git toidentify the "original author" - it can appear as a new file. Also,Git's "rename/move detection" isn't perfect - it's based on heuristics,that regularly look rather funny than correct.


What we should have is to know "who knows what".

* No duplicated PRs is allowed (in order to keep history and comments)

TBH, I don't understand what you mean here. Creating 5 PRs for exactlythe same change??

However, a legit use case is if someone disappears from the project forwhatever reason (long holiday, sick for a long time, etc), but has agood PR open. Does that "No duplicated PRs" rule mean that nobody elsecan ever resume on that work?

* All discussions in PRs should be removed before merging (see
https://github.com/apache/polaris/pull/840)

BTW: i hope it's a typo: I think, it means "resolved" not "removed" -quite a difference ;)

I'd be in favor of having this, but as a "soft rule", not technicallyenforced.

Comments/threads, whether those are "plain" PR comments or comments oncode lines, can be very useful later to understand things. Resolvingcomments hides that information.

Simple use case is a discussion on a PR, that led to the conclusion todo something in a separate PR. Linking to the _unresolved_ conversationgives a lot of context.

Or an endless back and forth on nit-picks with an infinite loop of"Resolved" -> "Unresolved" -> "Resolved"... But also the definition of"what's a nit" can become quite heated.

Having it technically enforced could also mean that if someone justcomments on something, merging will be blocked - even if that was noteven the intent of the commenter. Or someone can actually abuse it andblock all PR merges, even the PR to fix the .asf.yaml to get out of thatsituation. Or some random evil bot comes along and places comments onthings (yes, that happens on GH).

* We wait at least 2 working days to comment/review. It doesn't mean
that the review should be completed in 3 working days, it simply means
that you have time to comment and start discussion. If there's no
comment/concern after 3 working days, we can consider a lazy consensus
and merge the PR (if approved).

I get the intent here - and it makes sense. 2 "working days" as arule-of-thumb also makes sense. Side note: the above phrase actuallymeans that you have to wait "3 working days".

Also for this point, I think a "soft rule" with good intent consideringthe blast radius at best knowledge makes the most sense.

However some PRs are required for hygiene and security reasons (forexample PRs from Renovate). Other PRs may just fix a typo in a javadoc.Having a general/hard rule of "at least 2 days" isn't practical. Buthaving a (very likely incomplete) listing of exceptions is neither. Inother cases it might be absolutely required to wait for someone, whomaybe on an island w/o the laptop for the next 5 days.

Following the 2-working-day rule strictly would also mean that afollow-up fix for an already merged PR cannot go in before these 2 days.

It's hard to put such a rule with all its legit exceptions into wordsthat everybody understands the same way. I think the only way is to saysomething like "Regular PRs **should** be open for comments for about 2days." But again, some PRs don't need that at all, some PRs must go inasap (security, build-fixes) and others need even more time than two days.

2 days can be 2 working-days for me and my reviewer, but 2 holidays forsomeone else. By the rule, I can hit merge and nobody can complain. :shrug:

And considering time zones it could be 2 days (think: there's daylight)for one guy, but 1 x daylight for someone else.

So - what's a "day" and what's a "working day". It's all tricky and fullof edge cases and exceptions and what not....

* If it's hard to find a consensus on a PR, the discussion should be
bring on the dev mailing list to be discussion globally with the
community


+1 on that one.


Robert

Thoughts ?

If we agree on the proposal, I will create a PR to update the
Contributing Guide (to be clear for any contributor).

NB: in order to facilitate PR "triage", I created
https://github.com/apache/polaris/pull/839 to run renovate only once a
day and add the "renovatebot" label to easily filter the PRs/emails.

Regards
JB

JB

On Fri, Jan 17, 2025 at 5:55 AM Jean-Baptiste Onofré <[email protected]> wrote:

Hi folks,

It's great to see soooo much activity on Apache Polaris right now !

By experience, this high activity can introduce frustration in the
community because contributors might be "lost" or have the feeling of
missing important changes/proposals.

I propose some good practices we could use to improve our
collaboration and facilitate team work:
1. GH Issues & PRs should be descriptive and accurate, especially the
description should describe a minimum the purpose and the labels
should be meaningful (a bug is a breaking change, an improvement is a
change on existing, etc).
2. We should avoid duplicating PRs (like creating a new PR and closing
another one on the same topic), else we lose the history and comments.
3. All discussions in a PR should be resolved before merging,
4. I propose to wait 2 working days before merging PR to give a chance
to contributors to take a look.
5. It's good to take the time for discussion in PR. If you think this
discussion needs to move on the dev mailing list (to include the whole
community) that's also fine.

Regarding 3 and 4, I think I can update the .asf.yaml to enforce that.

Thoughts ?

Regards
JB


--
Robert Stupp
@snazy

Re: [PROPOSAL] Improve our "collaboration schema"

Reply via email to