Hey,

I'm generally in favor of these things. But I find it quite difficult to put all that into a guideline/rules/bylaws and be sure that everybody understands it in the exact same way as it is meant. In practice there are also often situations that could not be resolved with such rules, worse if those are technically enforced. In the worst case, someone/something can abuse such technical "barriers".

My intent is not to shoot down this effort, but to think carefully about the day-to-day implications, and also consider how such rules/guidelines/bylaws look in maybe 1, 2 or 5 years. Not sure whether it would be a good idea to change such rules/guidelines/bylaws regularly.

I'll be very nit-picky in the following ("little") wall of text. But we're discussing guidelines/rules/bylaws, the phrasing is extremely important here.


On 22.01.25 13:42, Jean-Baptiste Onofré wrote:
Hi folks,

Thanks to your feedback and comments (in PRs, this thread, and the
other one regarding main branch stability), I'm updating the proposal
as follow:
* Changes on public interfaces/extension points (potentially used by
parties) should be discussed and approved on the dev mailing list (a
vote can be used if needed). The discussion on the dev mailing list (a
proposal document can be attached) should happen before any PR/change

Having in-code documentation that describe the intent _and_ scope ("blast radius") is often beneficial.

In my opinion, configuration is also a "user facing" thing - the same rules should apply to configurations as well. Major (what's "major"?) changes to documentation as well.

Technically speaking, a typo-fix is also a change, but a discussion for that on the ML seems to be overkill.

Specifically for Java: "public" or "interface" are not good indicators to "mark" a "user facing API/SPI" - and even that can be quite nuanced and open for interpretation. Both "public" and "interface" can be necessary in Java for technical reasons, for example when "crossing" packages in an implementation. In other cases the "public" modifier might be there although it's not needed.

I propose to rephrase this to "stable user facing APIs and SPIs", and then mark those those explicitly as API or SPI in the code with its status (experimental, stable, deprecated/for-removal). Similar for other programming languages (e.g. Python) or specification formats (OpenAPI). Not sure whether annotations make sense. But we should have a consistent approach across all programming languages and file/spec formats.

For the future: I'm in favor to have modules that express the "audience". For example, for a (future) "foobar" component, there might be three or more more modules to help there. The "blast radius" of each component's API / SPI module however can vary (needs docs/text).
- polaris-foobar-api - consumer facing API, not necessarily user facing
- polaris-foobar-spi - provider facing SPI
- polaris-foobar-impl-client - an implementation, intended for clients, a runtimeOnly-dependency - polaris-foobar-impl-server - an implementation, intended for servers, a runtimeOnly-dependency

Also for the future and considering the above module structure, it should be relatively easy, at best just by changing a runtime-dependency, to replace a module polaris-foobar-impl with polaris-foobar-awesome-impl. CDI is our friend here ;)

* The original author of a code should be tagged in a PR comment
(something like "@author this PR changed a code you are the original
author, can you please take a look ?") as background/context can be
provided and valuable review made. It would be possible to "enforce"
that in the .asf.yaml but it would also need a much more detailed
.github/CODEOWNERS file. So, I propose to do it "explicitly" by the PR
author.

Shouldn't we give up on CODEOWNERS in that case? Or in general? Getting two notifications (every committer's in CODEOWNERS) might not help then.

Also, not sure about only original author - over time things will change, and the "original author(s)" may not even work on it anymore, or the work has already been massively refactored, or someone else took over the work. "Original author" feels not future proof ;)

Some committers are absolute experts on say Foo.java, others have a good understanding and others might not even have a clue anymore, because it changed drastically over time.

Practically speaking, if a file's moved, it's can be hard in Git to identify the "original author" - it can appear as a new file. Also, Git's "rename/move detection" isn't perfect - it's based on heuristics, that regularly look rather funny than correct.

What we should have is to know "who knows what".

* No duplicated PRs is allowed (in order to keep history and comments)

TBH, I don't understand what you mean here. Creating 5 PRs for exactly the same change??

However, a legit use case is if someone disappears from the project for whatever reason (long holiday, sick for a long time, etc), but has a good PR open. Does that "No duplicated PRs" rule mean that nobody else can ever resume on that work?

* All discussions in PRs should be removed before merging (see
https://github.com/apache/polaris/pull/840)

BTW: i hope it's a typo: I think, it means "resolved" not "removed" - quite a difference ;)

I'd be in favor of having this, but as a "soft rule", not technically enforced.

Comments/threads, whether those are "plain" PR comments or comments on code lines, can be very useful later to understand things. Resolving comments hides that information.

Simple use case is a discussion on a PR, that led to the conclusion to do something in a separate PR. Linking to the _unresolved_ conversation gives a lot of context.

Or an endless back and forth on nit-picks with an infinite loop of "Resolved" -> "Unresolved" -> "Resolved"... But also the definition of "what's a nit" can become quite heated.

Having it technically enforced could also mean that if someone just comments on something, merging will be blocked - even if that was not even the intent of the commenter. Or someone can actually abuse it and block all PR merges, even the PR to fix the .asf.yaml to get out of that situation. Or some random evil bot comes along and places comments on things (yes, that happens on GH).

* We wait at least 2 working days to comment/review. It doesn't mean
that the review should be completed in 3 working days, it simply means
that you have time to comment and start discussion. If there's no
comment/concern after 3 working days, we can consider a lazy consensus
and merge the PR (if approved).

I get the intent here - and it makes sense. 2 "working days" as a rule-of-thumb also makes sense. Side note: the above phrase actually means that you have to wait "3 working days".

Also for this point, I think a "soft rule" with good intent considering the blast radius at best knowledge makes the most sense.

However some PRs are required for hygiene and security reasons (for example PRs from Renovate). Other PRs may just fix a typo in a javadoc. Having a general/hard rule of "at least 2 days" isn't practical. But having a (very likely incomplete) listing of exceptions is neither. In other cases it might be absolutely required to wait for someone, who maybe on an island w/o the laptop for the next 5 days.

Following the 2-working-day rule strictly would also mean that a follow-up fix for an already merged PR cannot go in before these 2 days.

It's hard to put such a rule with all its legit exceptions into words that everybody understands the same way. I think the only way is to say something like "Regular PRs **should** be open for comments for about 2 days." But again, some PRs don't need that at all, some PRs must go in asap (security, build-fixes) and others need even more time than two days.

2 days can be 2 working-days for me and my reviewer, but 2 holidays for someone else. By the rule, I can hit merge and nobody can complain. :shrug:

And considering time zones it could be 2 days (think: there's daylight) for one guy, but 1 x daylight for someone else.

So - what's a "day" and what's a "working day". It's all tricky and full of edge cases and exceptions and what not....

* If it's hard to find a consensus on a PR, the discussion should be
bring on the dev mailing list to be discussion globally with the
community

+1 on that one.


Robert


Thoughts ?

If we agree on the proposal, I will create a PR to update the
Contributing Guide (to be clear for any contributor).

NB: in order to facilitate PR "triage", I created
https://github.com/apache/polaris/pull/839 to run renovate only once a
day and add the "renovatebot" label to easily filter the PRs/emails.

Regards
JB

JB

On Fri, Jan 17, 2025 at 5:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi folks,

It's great to see soooo much activity on Apache Polaris right now !

By experience, this high activity can introduce frustration in the
community because contributors might be "lost" or have the feeling of
missing important changes/proposals.

I propose some good practices we could use to improve our
collaboration and facilitate team work:
1. GH Issues & PRs should be descriptive and accurate, especially the
description should describe a minimum the purpose and the labels
should be meaningful (a bug is a breaking change, an improvement is a
change on existing, etc).
2. We should avoid duplicating PRs (like creating a new PR and closing
another one on the same topic), else we lose the history and comments.
3. All discussions in a PR should be resolved before merging,
4. I propose to wait 2 working days before merging PR to give a chance
to contributors to take a look.
5. It's good to take the time for discussion in PR. If you think this
discussion needs to move on the dev mailing list (to include the whole
community) that's also fine.

Regarding 3 and 4, I think I can update the .asf.yaml to enforce that.

Thoughts ?

Regards
JB

--
Robert Stupp
@snazy

Reply via email to