Hey,
I'm generally in favor of these things. But I find it quite difficult to
put all that into a guideline/rules/bylaws and be sure that everybody
understands it in the exact same way as it is meant. In practice there
are also often situations that could not be resolved with such rules,
worse if those are technically enforced. In the worst case,
someone/something can abuse such technical "barriers".
My intent is not to shoot down this effort, but to think carefully about
the day-to-day implications, and also consider how such
rules/guidelines/bylaws look in maybe 1, 2 or 5 years. Not sure whether
it would be a good idea to change such rules/guidelines/bylaws regularly.
I'll be very nit-picky in the following ("little") wall of text. But
we're discussing guidelines/rules/bylaws, the phrasing is extremely
important here.
On 22.01.25 13:42, Jean-Baptiste Onofré wrote:
Hi folks,
Thanks to your feedback and comments (in PRs, this thread, and the
other one regarding main branch stability), I'm updating the proposal
as follow:
* Changes on public interfaces/extension points (potentially used by
parties) should be discussed and approved on the dev mailing list (a
vote can be used if needed). The discussion on the dev mailing list (a
proposal document can be attached) should happen before any PR/change
Having in-code documentation that describe the intent _and_ scope
("blast radius") is often beneficial.
In my opinion, configuration is also a "user facing" thing - the same
rules should apply to configurations as well. Major (what's "major"?)
changes to documentation as well.
Technically speaking, a typo-fix is also a change, but a discussion for
that on the ML seems to be overkill.
Specifically for Java: "public" or "interface" are not good indicators
to "mark" a "user facing API/SPI" - and even that can be quite nuanced
and open for interpretation. Both "public" and "interface" can be
necessary in Java for technical reasons, for example when "crossing"
packages in an implementation. In other cases the "public" modifier
might be there although it's not needed.
I propose to rephrase this to "stable user facing APIs and SPIs", and
then mark those those explicitly as API or SPI in the code with its
status (experimental, stable, deprecated/for-removal). Similar for other
programming languages (e.g. Python) or specification formats (OpenAPI).
Not sure whether annotations make sense. But we should have a consistent
approach across all programming languages and file/spec formats.
For the future: I'm in favor to have modules that express the
"audience". For example, for a (future) "foobar" component, there might
be three or more more modules to help there. The "blast radius" of each
component's API / SPI module however can vary (needs docs/text).
- polaris-foobar-api - consumer facing API, not necessarily user facing
- polaris-foobar-spi - provider facing SPI
- polaris-foobar-impl-client - an implementation, intended for clients,
a runtimeOnly-dependency
- polaris-foobar-impl-server - an implementation, intended for servers,
a runtimeOnly-dependency
Also for the future and considering the above module structure, it
should be relatively easy, at best just by changing a
runtime-dependency, to replace a module polaris-foobar-impl with
polaris-foobar-awesome-impl. CDI is our friend here ;)
* The original author of a code should be tagged in a PR comment
(something like "@author this PR changed a code you are the original
author, can you please take a look ?") as background/context can be
provided and valuable review made. It would be possible to "enforce"
that in the .asf.yaml but it would also need a much more detailed
.github/CODEOWNERS file. So, I propose to do it "explicitly" by the PR
author.
Shouldn't we give up on CODEOWNERS in that case? Or in general? Getting
two notifications (every committer's in CODEOWNERS) might not help then.
Also, not sure about only original author - over time things will
change, and the "original author(s)" may not even work on it anymore, or
the work has already been massively refactored, or someone else took
over the work. "Original author" feels not future proof ;)
Some committers are absolute experts on say Foo.java, others have a good
understanding and others might not even have a clue anymore, because it
changed drastically over time.
Practically speaking, if a file's moved, it's can be hard in Git to
identify the "original author" - it can appear as a new file. Also,
Git's "rename/move detection" isn't perfect - it's based on heuristics,
that regularly look rather funny than correct.
What we should have is to know "who knows what".
* No duplicated PRs is allowed (in order to keep history and comments)
TBH, I don't understand what you mean here. Creating 5 PRs for exactly
the same change??
However, a legit use case is if someone disappears from the project for
whatever reason (long holiday, sick for a long time, etc), but has a
good PR open. Does that "No duplicated PRs" rule mean that nobody else
can ever resume on that work?
* All discussions in PRs should be removed before merging (see
https://github.com/apache/polaris/pull/840)
BTW: i hope it's a typo: I think, it means "resolved" not "removed" -
quite a difference ;)
I'd be in favor of having this, but as a "soft rule", not technically
enforced.
Comments/threads, whether those are "plain" PR comments or comments on
code lines, can be very useful later to understand things. Resolving
comments hides that information.
Simple use case is a discussion on a PR, that led to the conclusion to
do something in a separate PR. Linking to the _unresolved_ conversation
gives a lot of context.
Or an endless back and forth on nit-picks with an infinite loop of
"Resolved" -> "Unresolved" -> "Resolved"... But also the definition of
"what's a nit" can become quite heated.
Having it technically enforced could also mean that if someone just
comments on something, merging will be blocked - even if that was not
even the intent of the commenter. Or someone can actually abuse it and
block all PR merges, even the PR to fix the .asf.yaml to get out of that
situation. Or some random evil bot comes along and places comments on
things (yes, that happens on GH).
* We wait at least 2 working days to comment/review. It doesn't mean
that the review should be completed in 3 working days, it simply means
that you have time to comment and start discussion. If there's no
comment/concern after 3 working days, we can consider a lazy consensus
and merge the PR (if approved).
I get the intent here - and it makes sense. 2 "working days" as a
rule-of-thumb also makes sense. Side note: the above phrase actually
means that you have to wait "3 working days".
Also for this point, I think a "soft rule" with good intent considering
the blast radius at best knowledge makes the most sense.
However some PRs are required for hygiene and security reasons (for
example PRs from Renovate). Other PRs may just fix a typo in a javadoc.
Having a general/hard rule of "at least 2 days" isn't practical. But
having a (very likely incomplete) listing of exceptions is neither. In
other cases it might be absolutely required to wait for someone, who
maybe on an island w/o the laptop for the next 5 days.
Following the 2-working-day rule strictly would also mean that a
follow-up fix for an already merged PR cannot go in before these 2 days.
It's hard to put such a rule with all its legit exceptions into words
that everybody understands the same way. I think the only way is to say
something like "Regular PRs **should** be open for comments for about 2
days." But again, some PRs don't need that at all, some PRs must go in
asap (security, build-fixes) and others need even more time than two days.
2 days can be 2 working-days for me and my reviewer, but 2 holidays for
someone else. By the rule, I can hit merge and nobody can complain. :shrug:
And considering time zones it could be 2 days (think: there's daylight)
for one guy, but 1 x daylight for someone else.
So - what's a "day" and what's a "working day". It's all tricky and full
of edge cases and exceptions and what not....
* If it's hard to find a consensus on a PR, the discussion should be
bring on the dev mailing list to be discussion globally with the
community
+1 on that one.
Robert
Thoughts ?
If we agree on the proposal, I will create a PR to update the
Contributing Guide (to be clear for any contributor).
NB: in order to facilitate PR "triage", I created
https://github.com/apache/polaris/pull/839 to run renovate only once a
day and add the "renovatebot" label to easily filter the PRs/emails.
Regards
JB
JB
On Fri, Jan 17, 2025 at 5:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi folks,
It's great to see soooo much activity on Apache Polaris right now !
By experience, this high activity can introduce frustration in the
community because contributors might be "lost" or have the feeling of
missing important changes/proposals.
I propose some good practices we could use to improve our
collaboration and facilitate team work:
1. GH Issues & PRs should be descriptive and accurate, especially the
description should describe a minimum the purpose and the labels
should be meaningful (a bug is a breaking change, an improvement is a
change on existing, etc).
2. We should avoid duplicating PRs (like creating a new PR and closing
another one on the same topic), else we lose the history and comments.
3. All discussions in a PR should be resolved before merging,
4. I propose to wait 2 working days before merging PR to give a chance
to contributors to take a look.
5. It's good to take the time for discussion in PR. If you think this
discussion needs to move on the dev mailing list (to include the whole
community) that's also fine.
Regarding 3 and 4, I think I can update the .asf.yaml to enforce that.
Thoughts ?
Regards
JB
--
Robert Stupp
@snazy