Hi everyone,
Martjin, I like your ideas. I think these labels will help, make it obvious 
what work is actionable. I really feel these sort of process improvements will 
incrementally help work to flow through appropriately.

2 additional thoughts – I hope these help this discussion:

  *   A triaged label on the issue would indicate that a maintainer has agreed 
this is a valid issue – this would be a better pool of issues for contributors 
to pickup. I am not sure if maintainers currently do this sort of work.
  *   I like the codeowners idea; did you find a way though this within the 
Apache rules? An extension to this is that increasingly we are moving out parts 
of the code from the main Flink repository to other repositories; would this be 
doable. Could experts in those repositories be given write access to those 
repos; so that each non core repo can work through its issues and merge its prs 
more independently. This is how LF project Egeria works with its connectors and 
UIS;  I guess the concern is that in ASF these people would need to be  
committers, or could they be a committer on a subset of repos. Another way to 
manage who can merge prs is to gate the pr process using git actions, so that 
if an approved approver indicates a pr is good then the raiser can merge – this 
would give us granularity on write access – PyTorch follows this sort of 
process.

      kind regards, David.


From: Martijn Visser <martijnvis...@apache.org>
Date: Thursday, 12 October 2023 at 10:32
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] Re: FW: RE: Close orphaned/stale PRs
Hi everyone,

I'm overall +1 on Ryan's comment.
When we're talking about component ownership, I've started a
discussion on the Infra mailing list in the beginning of the year on
it. In principle, the "codeowners" idea goes against ASF principles.

Let's summarize things:
1. From a project perspective, we can have a discussion about closing
PRs automatically that a) are not followed-up within X number of days
after a review and/or b) PRs that don't have a passing build and/or
don't follow contribution guidelines and/or C) need to be rebased
2. In order to help understand which PRs are OK to get reviewed, we
could consider automatically adding a label "Ready for review" in case
1b (passing build/contribution guidelines met) is the case.
3. In order to help contributors, we could consider automatically
adding a label in case their PR isn't mergeable for the situations
that are displayed in situation 1

When that's done, we can see what the effect is on the PRs queue.

Best regards,

Martijn

On Wed, Oct 4, 2023 at 5:13 PM David Radley <david_rad...@uk.ibm.com> wrote:
>
> Hi Ryan,
>
> I agree that good communication is key to determining what can be worked on.
>
> In terms of metrics , we can use the gh cli to list prs and we can export 
> issues from Jira. A view across them, you could join on the Flink issue (at 
> the start of the pr comment and the flink issue itself – you could then see 
> which prs have an assigned Jira would be expected to be reviewed. There is no 
> explicit reviewer field in the Jira issue; I am not sure if we can easily get 
> this info without having a custom field (which others have tried).
>
> In terms of what prs a committer could / should review – I would think that 
> component ownership helps scope the subset of prs to review / merge.
>
>                 Kind regards, David.
>
>
> From: Ryan Skraba <ryan.skr...@aiven.io.INVALID>
> Date: Wednesday, 4 October 2023 at 15:09
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: [EXTERNAL] Re: FW: RE: Close orphaned/stale PRs
> Hey, this has been an interesting discussion -- this is something that
> has been on my mind as an open source contributor and committer (I'm
> not a Flink committer).
>
> A large number of open PRs doesn't _necessarily_ mean a project is
> unhealthy or has technical debt. If it's fun and easy to get your
> contribution accepted and committed, even for a small fix, you're more
> likely to raise another PR, and another.  I wouldn't be surprised if
> there's a natural equilibrium where adding capacity to smoothly review
> and manage more PRs cause more PRs to be submitted.  Everyone wins!
>
> I don't think there's a measure for the "average PR lifetime", or
> "time to first comment", but those would be more interesting things to
> know and those are the worrisome ones.
>
> As a contributor, I'm pretty willing to wait as long as necessary (and
> rebase and fix merge conflicts) if there's good communication in
> place. I'm pretty patient, especially if I knew that the PR would be
> looked at and merged for a specific fix version (for example).  I'd
> expect simple and obvious fixes with limited scope to take less time
> than a more complex, far-reaching change.  I'd probably appreciate
> that the boring-cyborg welcomes me on my first PR, but I'd be pretty
> irritated if any PR were closed without any human interaction.
>
> As a reviewer or committer, it's just overwhelming to see the big
> GitHub list, and sometimes it feels random just "picking one near the
> top" to look at.  In projects where I have the committer role, I
> sometimes feel more badly about work I'm *not* doing than the work I'm
> getting done! This isn't sustainable either.  A lot of people on the
> project are volunteering after hours, and grooming, reviewing and
> commenting PRs shouldn't be a thankless, unending job to feel bad
> about.
>
> As a contributor, one "magic" solution that I'd love to see is a
> better UI that could show (for example) tentative "review dates", like
> the number at a butcher shop, and proposed reviewers.
>
> If I was committing to reviewing a PR every day, it would be great if
> I could know which ones were the best "next" candidates to review: the
> one waiting longest, or a new, critical fix in my domain.  As it
> stands, there's next to no chance that the PRs in the middle of the
> list are going to get any attention, but closing them stand to lose
> valuable work or (worse) turn off a potential contributor forever.
>
> Taking a look at some open PRs that I authored or interacted with: I
> found one that should have been closed, one that was waiting for MY
> attention for a merge-squash-rebase (oops), another where I made some
> requested changes and it's back in review limbo.  Unfortunately, I
> don't think any of these would have been brought to my attention by a
> nag-bot. I don't think I'm alone; automated emails get far less
> attention  with sometime not giving automated emails much attention.
>
> OK, one more thing to think about: some underrepresented groups in
> tech can find it difficult to demand attention, through constant
> pinging and commenting and reminding...  So finding ways to make sure
> that non-squeaky wheels also get some love is a really fair goal.
>
> There's some pretty good ideas in this conversation, and I'm really
> glad to hear it being brought up!  I'd love to hear any other
> brainstorming for ideas, and get the virtual circle that David
> mentioned!
>
> All my best, Ryan
>
>
>
>
>
>
>
> On Wed, Oct 4, 2023 at 12:03 PM David Radley <david_rad...@uk.ibm.com> wrote:
> >
> > Hi,
> > To add I agree with Martijn’s insights; I think we are saying similar 
> > things. To progress agreed upon work, and not blanket close all stale prs,
> >       Kind regards, David.
> >
> > From: David Radley <david_rad...@uk.ibm.com>
> > Date: Wednesday, 4 October 2023 at 10:59
> > To: dev@flink.apache.org <dev@flink.apache.org>
> > Subject: [EXTERNAL] RE: Close orphaned/stale PRs
> > Hi ,
> > I agree Venkata this issue is bigger than closing out stale prs.
> >
> > We can see that issues are being raised at a rate way above the resolution 
> > time. 
> > https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12315522&periodName=daily&daysprevious=90&cumulative=true&versionLabels=major&selectedProjectId=12315522&reportKey=com.atlassian.jira.jira-core-reports-plugin%3Acreatedvsresolved-report&atl_token=A5KQ-2QAV-T4JA-FDED_19ff17decb93662bafa09e4b3ffb3a385c202015_lin&Next=Next
> > Gaining over 500 issues to the backlog every 3 months.
> >
> > We have over 1000 open prs. This is a lot of technical debt. I came across 
> > a 6 month old pr recently that had not been merged. A second Jira issue was 
> > raised  for the same problem and a second pr fixed the issue (identically). 
> > The first pr was still on the backlog until we noticed it.
> >
> > I am looking to contribute to the community to be able to identify issues I 
> > can work on and then be reasonably certain they will be reviewed and merged 
> > so I can build on contributions. I have worked as a maintainer and 
> > committer in other communities and managed to spend some of the week 
> > addressing incoming work; I am happy to do this in some capacity with the 
> > support of committer(s) for Flink.  It seems to me it is virtuous circle to 
> > enable more contributions, to get more committers , builds those committers 
> > that can help merge and review the backlog.
> >
> > Some thoughts ( I am new to this – so apologise if I have misunderstood 
> > something or am unaware of other existing mechanisms) :
> >
> >   1.  If there is an issue that a committer has assigned to a contributor 
> > as per the 
> > process<https://flink.apache.org/how-to-contribute/contribute-code/   > , 
> > and there is a pr then it should be with the committer to review the pr, or 
> > return it to the work queue. I do not know how many prs are like this. It 
> > seems to me that if a committer assigns an issue, they are indicating they 
> > will review, unassign themselves or merge. I do not think these prs should 
> > be closed as stale.
> >   2.  Could we have a Git action to notify committers (tagged in the pr?) 
> > if a pr (that has an assigned Jira)  has not been reviewed in a certain 
> > period (7 days?) then subsequent nags if there has been no response . In 
> > this way busy committers can see that a pr needs looking at.
> >   3.  Other prs have been raised without a committer saying that they will 
> > fix it.  In this case there is likely to be value, but the merging and 
> > review work has not been taken on by anyone. I notice spelling mistake prs 
> > that have not been merged (there are 8 with this query 
> > https://github.com/apache/flink/pulls?q=is%3Apr+is%3Aopen+spelling   ) , 
> > these are typical newbee prs as they are simple but useful improvements.; 
> > it would be great if these simpler ones could just be merged – maybe they 
> > should be marked as a [hotfix] to indicate they should be merged.  If 
> > simpler prs are not merged – it is very difficult for new contributors to 
> > gain eminence to get towards being a committer.
> >   4.  There are also issues that have been raised by people who do not want 
> > to fix them. It seems to me that we need a “triaged” state to indicate the 
> > issue looks valid and reasonable, so could be picked up by someone – at 
> > which time they would need to agree with a committer to get the associated 
> > pr reviewed and merged. This triaged state would be a pool of issues that 
> > new contributors to choose from
> >
> >
> >
> > I am happy to help to improve – once we have consensus,
> >
> >
> >
> > Kind regards, David.
> >
> >
> >
> >
> > From: Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
> > Date: Wednesday, 4 October 2023 at 00:36
> > To: dev@flink.apache.org <dev@flink.apache.org>
> > Subject: [EXTERNAL] Re: Close orphaned/stale PRs
> > Gentle ping to surface this up for more discussions.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Tue, Sep 26, 2023 at 4:59 PM Venkatakrishnan Sowrirajan 
> > <vsowr...@asu.edu>
> > wrote:
> >
> > > Hi Martijn,
> > >
> > > Agree with your point that closing a PR without any review feedback even
> > > after 'X' days is discouraging to a new contributor. I understand that 
> > > this
> > > is a capacity problem. Capacity problem cannot be solved by this proposal
> > > and it is beyond the scope of this proposal.
> > >
> > > Regarding your earlier question,
> > > > What's the added value of
> > > closing these PRs
> > >
> > >    - Having lots of inactive PRs lingering around shows the project is
> > >    less active. I am not saying this is the only way to determine how 
> > > active a
> > >    project is, but this is one of the key factors.
> > >    - A large number of PRs open can be discouraging for (new)
> > >    contributors but on the other hand I agree closing an inactive PR 
> > > without
> > >    any reviews can also drive contributors away.
> > >
> > > Having said all of that, I agree closing PRs that don't have any reviews
> > > to start with should be avoided from the final proposal.
> > >
> > > > I'm +1 for (automatically) closing up PRs after X days which:
> > > a) Don't have a CI that has passed
> > > b) Don't follow the code contribution guide (like commit naming
> > > conventions)
> > > c) Have changes requested but aren't being followed-up by the contributor
> > >
> > > In general, I'm largely +1 on your above proposal except for the
> > > implementation feasibility.
> > >
> > > Also, I have picked a few other popular projects that have implemented the
> > > Github's actions stale rule to see if we can borrow some ideas. Below
> > > projects are listed in the order of the most invasive (for lack of a 
> > > better
> > > word) to the least invasive actions taken wrt PR without any updates for a
> > > long period of time.
> > >
> > > 1. Trino
> > >
> > > TL;DR - No updates in the PR for the last 21 days, tag other maintainers
> > > for review. If there are no updates for 21 days after that, close the PR
> > > with this message - "*Closing this pull request, as it has been stale for
> > > six weeks. Feel free to re-open at any time.*"
> > > Trino's stale PR Github action rule (stale.yaml)
> > > <https://github.com/trinodb/trino/blob/master/.github/workflows/stale.yml 
> > >    >
> > >
> > >
> > > 2. Apache Spark
> > >
> > > TL;DR - No updates in the PR in the last 100 days, closing the PR with
> > > this message - "*We're closing this PR because it hasn't been updated in
> > > a while. This isn't a judgement on the merit of the PR in any way. It's
> > > just a way of keeping the PR queue manageable. If you'd like to revive 
> > > this
> > > PR, please reopen it and ask a committer to remove the Stale tag!*"
> > > Spark's discussion in their mailing list
> > > <https://lists.apache.org/thread/yg3ggtvpt2dbjpnb2q0yblq30sc1g2yx    > on
> > > closing stale PRs. Spark's stale PR github action rule (stale.yaml
> > > <https://github.com/apache/spark/blob/master/.github/workflows/stale.yml  
> > >   >
> > > ).
> > >
> > > 3. Python
> > >
> > > TL;DR - No updates in the PR for the last 30 days, then tag the PR as
> > > stale. Note: Python project *doesn't* close the stale PRs.
> > >
> > > Python discussion
> > > <https://discuss.python.org/t/decision-needed-should-we-close-stale-prs-and-how-many-lapsed-days-are-prs-considered-stale/4637
> > >     >
> > > in the mailing list to close stale PRs. Python's stale PR github action
> > > rule (stale.yaml
> > > <https://github.com/python/cpython/blob/main/.github/workflows/stale.yml  
> > >   >)
> > >
> > > Few others Apache Beam
> > > <https://github.com/apache/beam/blob/master/.github/workflows/stale.yml   
> > >  > (closes
> > > inactive PRs after 60+ days), Apache Airflow
> > > <https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml  
> > >   > (closes
> > > inactive PRs after 50 days)
> > >
> > > Let me know what you think. Looking forward to hearing from others in the
> > > community and their experiences.
> > >
> > > [1] Github Action - Close Stale Issues -
> > > https://github.com/marketplace/actions/close-stale-issues
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
> > > On Thu, Sep 21, 2023 at 6:03 AM Martijn Visser <martijnvis...@apache.org>
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> I really believe that the problem of the number of open PRs is just
> > >> that there aren't enough reviewers/resources available to review them.
> > >>
> > >> > Stale PRs can clutter the repository, and closing them helps keep it
> > >> organized and ensures that only relevant and up-to-date PRs are present.
> > >>
> > >> Sure, but what's the indicator that the PR is stale? The fact that
> > >> there has been no reviewer yet to review it, doesn't mean that the PR
> > >> is stale. For me, a stale PR is a PR that has been reviewed, changes
> > >> have been requested and the contributor isn't participating in the
> > >> discussion anymore. But that's a different story compared to closing
> > >> PRs where there has been no review done at all.
> > >>
> > >> > It mainly helps the project maintainers/reviewers to focus on only the
> > >> actively updated trimmed list of PRs that are ready for review.
> > >>
> > >> I disagree that closing PRs helps with this. If you want to help
> > >> maintainers/reviewers, we should have a situation where it's obvious
> > >> that a PR is really ready (meaning, CI has passed, PR contents/commit
> > >> message etc are following the code contribution guidelines).
> > >>
> > >> > It helps Flink users who are waiting on a PR that enhances an existing
> > >> feature or fixes an issue a clear indication on whether the PR will be
> > >> continually worked on and eventually get a closure or not and therefore
> > >> will be closed.
> > >>
> > >> Having other PRs being closed doesn't increase the guarantee that
> > >> other PRs will be reviewed. It's still a capacity problem.
> > >>
> > >> > It would be demotivating for any contributor when there is no feedback
> > >> for a PR within a sufficient period of time anyway.
> > >>
> > >> Definitely. But I think it would be even worse if someone makes a
> > >> contribution, there is no response but after X days they get a message
> > >> that their PR was closed automatically.
> > >>
> > >> I'm +1 for (automatically) closing up PRs after X days which:
> > >> a) Don't have a CI that has passed
> > >> b) Don't follow the code contribution guide (like commit naming
> > >> conventions)
> > >> c) Have changes requested but aren't being followed-up by the contributor
> > >>
> > >> I'm -1 for automatically closing PRs where no maintainers have taken a
> > >> review for the reasons I've listed above.
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Wed, Sep 20, 2023 at 7:41 AM Venkatakrishnan Sowrirajan
> > >> <vsowr...@asu.edu> wrote:
> > >> >
> > >> > Thanks for your response, Martijn.
> > >> >
> > >> > > What's the added value of
> > >> > closing these PRs
> > >> >
> > >> > It mainly helps the project maintainers/reviewers to focus on only the
> > >> > actively updated trimmed list of PRs that are ready for review.
> > >> >
> > >> > It helps Flink users who are waiting on a PR that enhances an existing
> > >> > feature or fixes an issue a clear indication on whether the PR will be
> > >> > continually worked on and eventually get a closure or not and therefore
> > >> > will be closed.
> > >> >
> > >> > Btw, I am open to other suggestions or enhancements on top of the
> > >> proposal
> > >> > as well.
> > >> >
> > >> > > it would
> > >> > just close PRs where maintainers haven't been able to perform a
> > >> > review, but getting a PR closed without any feedback is also
> > >> > demotivating for a (potential new) contributor
> > >> >
> > >> > It would be demotivating for any contributor when there is no feedback
> > >> for
> > >> > a PR within a sufficient period of time anyway. I don't see closing the
> > >> PR
> > >> > which is inactive after a sufficient period of time (say 60 to 90 days)
> > >> > would be any more discouraging than not getting any feedback. The
> > >> problem
> > >> > of not getting feedback due to not enough maintainer's bandwidth has to
> > >> be
> > >> > solved through other mechanisms.
> > >> >
> > >> > > I think the important
> > >> > thing is that we get into a cycle where maintainers can see which PRs
> > >> > are ready for review, and also a way to divide the bulk of the work.
> > >> >
> > >> > Yes, exactly my point as well. It helps the maintainers to see a 
> > >> > trimmed
> > >> > list which is ready to be reviewed.
> > >> >
> > >> > +1 for the other automation to nudge/help the contributor to fix the PR
> > >> > that follows the contribution guide, CI checks passed etc.
> > >> >
> > >> > > IIRC we can't really fix that until we can
> > >> > finally move to dedicated Github Action Runners instead of the current
> > >> > setup with Azure, but that's primarily blocked by ASF Infra.
> > >> >
> > >> > Curious, if you can share the JIRA or prior discussion on this topic. I
> > >> > would like to learn more about why Github Actions cannot be used for
> > >> Apache
> > >> > Flink.
> > >> >
> > >> > Regards
> > >> > Venkata krishnan
> > >> >
> > >> >
> > >> > On Tue, Sep 19, 2023 at 2:00 PM Martijn Visser <
> > >> martijnvis...@apache.org>
> > >> > wrote:
> > >> >
> > >> > > Hi Venkata,
> > >> > >
> > >> > > Thanks for opening the discussion, I've been thinking about it quite 
> > >> > > a
> > >> > > bit but I'm not sure what's the right approach.
> > >> > >
> > >> > > From your proposal, the question would be "What's the added value of
> > >> > > closing these PRs"? I don't see an immediate value of that: it would
> > >> > > just close PRs where maintainers haven't been able to perform a
> > >> > > review, but getting a PR closed without any feedback is also
> > >> > > demotivating for a (potential new) contributor. I think the important
> > >> > > thing is that we get into a cycle where maintainers can see which PRs
> > >> > > are ready for review, and also a way to divide the bulk of the work.
> > >> > > Because doing proper reviews requires time, and these resources are
> > >> > > scarce.
> > >> > >
> > >> > > I do think that we can make lives a bit easier with some automation:
> > >> > > * There are a lot of PRs which don't follow the contribution guide 
> > >> > > (No
> > >> > > Jira ticket, no correct commit message etc). For the externalized
> > >> > > connector repositories, we've been trying Boring Cyborg to provide
> > >> > > information back to contributors if their PRs are as expected. If the
> > >> > > PR doesn't follow the contribution guide, I'm included to give such a
> > >> > > PR less attention review. That's primarily because there are other 
> > >> > > PRs
> > >> > > out there that do follow these guides.
> > >> > > * There are even more PRs where the CI has failed: in those cases, a
> > >> > > review also makes less sense, given that the PR can't be merged as 
> > >> > > is.
> > >> > > I do see that contributors sometimes don't know where to look for the
> > >> > > status of the CI, but IIRC we can't really fix that until we can
> > >> > > finally move to dedicated Github Action Runners instead of the 
> > >> > > current
> > >> > > setup with Azure, but that's primarily blocked by ASF Infra.
> > >> > >
> > >> > > I'm curious what others in the community think.
> > >> > >
> > >> > > Best regards,
> > >> > >
> > >> > > Martijn
> > >> > >
> > >> > > On Tue, Sep 19, 2023 at 10:33 PM Venkatakrishnan Sowrirajan
> > >> > > <vsowr...@asu.edu> wrote:
> > >> > > >
> > >> > > > Hi Flink devs,
> > >> > > >
> > >> > > > There are currently over 1,000 open pull requests
> > >> > > > <
> > >> > >
> > >> https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc   
> > >> <https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc  
> > >>  >
> > >> > > >
> > >> > > > (PRs) in the Apache Flink repository, with only 162 having been
> > >> updated
> > >> > > in
> > >> > > > the last two months
> > >> > > > <
> > >> > >
> > >> https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated
> > >>    
> > >> :>2023-07-19<https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated
> > >>    :>2023-07-19>
> > >> > > >.
> > >> > > > This means that more than 85% of the PRs are stale and have not 
> > >> > > > been
> > >> > > > touched.
> > >> > > >
> > >> > > > I suggest setting up Github actions to monitor these stale PRs, and
> > >> > > > automatically closing them if they have not been updated in the
> > >> last 'x'
> > >> > > > days. Authors would still be able to reopen the closed PRs if they
> > >> > > continue
> > >> > > > with their work. This would help to keep the PR queue manageable.
> > >> > > >
> > >> > > > Not sure if this has been discussed in the Apache Flink community
> > >> before.
> > >> > > >
> > >> > > > Thoughts?
> > >> > > >
> > >> > > > Regards
> > >> > > > Venkata krishnan
> > >> > >
> > >>
> > >
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Reply via email to