justinmclean commented on PR #18:
URL: https://github.com/apache/comdev/pull/18#issuecomment-4613300724
Some other comments and possible considerations, roughly in order:
- Move the confidentiality framing to the top. "These reports concern
governance decisions; treat as confidential PMC guidance, private@ only"
belongs in the system role at the very top of the prompt, not as an
afterthought at the end. An AI that reads the prompt sequentially might
generate content with a "share publicly" framing before it ever hits that rule.
- Identity matching is a major problem. Mapping a GitHub username
→ Apache ID → mailing-list From address is the single hardest part of this
task. You may need to add an explicit step that tries to resolve identities;
otherwise, you'll get the same person counted twice, unrelated identities
merged together, and similar errors. I've tried to solve this myself and have
not had much success.
- Tier criteria need quantitative anchors. "Tier 1 — Strong Candidates" and
"Tier 2 — Growing" are pure vibes right now. Even soft thresholds help: e.g.,
"Tier 1: ≥10 merged PRs in last 12 months across project repos AND sustained
dev@ engagement (≥20 messages across ≥3 months) AND at least one substantive
design discussion. Tier 2: meets one or two of those." Projects can tune the
numbers, but the structure forces consistency.
- Some high-signal data sources are missing:
  - Code review activity
  - Release vote participation
- Steps 1–5 are mostly independent. Possibly add: "Steps 1, 2, 3, and 5 can
run in parallel"; otherwise, an agent will likely run them serially, and the
report will take much longer than it needs to.
- The bot filter is incomplete and brittle. Add: also exclude commits where the
author email is [email protected] or the author is GitHub Actions, asfgit, or
another infra-related account, plus any login containing -bot, -ci, or
-automation. You could also filter to humans by checking that the GitHub user
type is "User" and that the real-name field is non-empty.
- The sampling in Step 4 is fragile. "8 months every 3 months" plus a top-10
filter will miss bursty contributors whose activity falls between sample points.
- The "no employer mentions" rule is good. Possibly extend: "Evaluate
contributions, not communication style, language fluency, time zone, or
response speed. A contributor who works asynchronously is not weaker than
one who matches your working hours."
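On the identity-matching point, here is a minimal sketch of what an explicit resolution step could look like. It assumes each data source yields `(source, identifier, email)` tuples (a hypothetical format) and merges only identities that share an exact, case-insensitive email. That errs toward counting someone twice rather than merging strangers; it does not solve the harder cases above.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to group identities that refer to one person."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_identities(records):
    """Group hypothetical (source, identifier, email) records sharing an email.

    Emails are compared case-insensitively. Records without an email stay on
    their own, so an unmatched GitHub login is never guessed into a group.
    """
    uf = UnionFind()
    for source, ident, email in records:
        node = (source, ident)
        uf.find(node)  # register the node even if it has no email
        if email:
            uf.union(node, ("__email__", email.strip().lower()))
    groups = defaultdict(set)
    for node in list(uf.parent):
        if node[0] != "__email__":  # email nodes only act as join points
            groups[uf.find(node)].add(node)
    return list(groups.values())
```

In practice the "unmatched" groups are the ones a human would still need to review by hand.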
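For the tier criteria, a sketch of how the example thresholds could be encoded so every candidate is judged the same way. The `Activity` fields and constant names are hypothetical; the numbers are just the example values from the tier suggestion and are meant to be tuned per project.

```python
from dataclasses import dataclass
from typing import Optional

# Example thresholds; each project is expected to tune these.
MIN_MERGED_PRS = 10         # merged PRs across project repos, last 12 months
MIN_DEV_MESSAGES = 20       # dev@ messages in the same window
MIN_DEV_MONTHS = 3          # distinct months with dev@ activity
MIN_DESIGN_DISCUSSIONS = 1  # substantive design threads


@dataclass
class Activity:
    merged_prs_12m: int
    dev_messages: int
    dev_active_months: int
    design_discussions: int


def tier(a: Activity) -> Optional[int]:
    """Return 1 if all three criteria are met, 2 if one or two are, else None."""
    criteria = [
        a.merged_prs_12m >= MIN_MERGED_PRS,
        # "Sustained" engagement needs both volume and spread across months.
        a.dev_messages >= MIN_DEV_MESSAGES and a.dev_active_months >= MIN_DEV_MONTHS,
        a.design_discussions >= MIN_DESIGN_DISCUSSIONS,
    ]
    met = sum(criteria)
    if met == len(criteria):
        return 1
    if met > 0:
        return 2
    return None
```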
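For running Steps 1, 2, 3, and 5 in parallel, a sketch using Python's `concurrent.futures`. The step functions are hypothetical stand-ins that just sleep; real ones would query GitHub or the mailing-list archives. Threads are a reasonable fit because these steps mostly wait on network I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(steps):
    """Run independent, I/O-bound steps concurrently; return results keyed by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(steps))) as pool:
        futures = {name: pool.submit(fn) for name, fn in steps.items()}
        # result() re-raises any exception from a step, so failures aren't silent.
        return {name: fut.result() for name, fut in futures.items()}


def fake_step(label, delay=0.2):
    """Hypothetical stand-in for a step that waits on a network call."""
    def step():
        time.sleep(delay)
        return f"{label} done"
    return step


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_parallel({n: fake_step(n) for n in ("step1", "step2", "step3", "step5")})
    elapsed = time.perf_counter() - start
    # Wall time tracks the slowest step rather than the sum of all four.
    print(results, f"{elapsed:.2f}s")
```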
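For the bot filter, a sketch of the suggested checks as one predicate. The deny-lists are hypothetical placeholders (the example address in the comment is not reproduced here). The regex matches `-bot`, `-ci`, or `-automation` only as a whole hyphenated segment, so a login like `jane-cinder` is not flagged; that is a deliberate tightening of "containing". `user_type` and `real_name` are assumed to come from the `type` and `name` fields of the GitHub user record.

```python
import re

# Hyphen-prefixed login segments (-bot, -ci, -automation), plus the "[bot]"
# suffix GitHub uses for App accounts such as "github-actions[bot]".
BOT_LOGIN = re.compile(r"-(?:bot|ci|automation)(?:-|$)|\[bot\]$", re.IGNORECASE)

# Hypothetical deny-lists; extend with the project's own infra/automation accounts.
EXCLUDED_LOGINS = {"asfgit", "github-actions"}
EXCLUDED_EMAILS = set()  # add known automation addresses here (lowercase)


def is_probable_bot(login, email, user_type, real_name):
    """Return True if a commit author looks automated rather than human.

    Missing values (None) are treated as empty strings.
    """
    login_l = (login or "").lower()
    if login_l in EXCLUDED_LOGINS or BOT_LOGIN.search(login_l):
        return True
    if (email or "").strip().lower() in EXCLUDED_EMAILS:
        return True
    # Strict human check: type "User" with a non-empty name. Note this also
    # rejects real people who never set a display name.
    return user_type != "User" or not (real_name or "").strip()
```

The last check is the strictest one, so it may be worth logging what it rejects rather than dropping those authors silently.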
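For the Step 4 sampling point, a small illustration of how ranking per snapshot drops a bursty contributor, compared with ranking after summing the whole window. The data, month indices, and snapshot schedule are made up and do not reproduce the prompt's exact schedule. Full-window ranking is just one possible alternative; it still favors high-volume posters.

```python
from collections import Counter


def top_n_from_samples(monthly, sample_months, n):
    """Union of the top-n posters in each sampled month only."""
    selected = set()
    for month in sample_months:
        counts = Counter(monthly.get(month, {}))
        selected.update(name for name, _ in counts.most_common(n))
    return selected


def top_n_full_window(monthly, n):
    """Top-n posters after summing message counts over every month in the window."""
    total = Counter()
    for counts in monthly.values():
        total.update(counts)
    return {name for name, _ in total.most_common(n)}


# Hypothetical data: two steady posters, plus one contributor active in a single month.
monthly = {m: {"steady_a": 5, "steady_b": 4} for m in range(9)}
monthly[4]["bursty_c"] = 60

print(top_n_from_samples(monthly, [0, 3, 6], 2))  # bursty_c is never sampled
print(top_n_full_window(monthly, 2))               # bursty_c ranks first overall
```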