justinmclean commented on PR #18:
URL: https://github.com/apache/comdev/pull/18#issuecomment-4613300724

   Some other comments/possible considerations, roughly in order:
   
   - Move the confidentiality framing to the top. "These reports concern 
governance decisions; treat as confidential PMC guidance, private@ only" 
belongs in the system role at the very top of the prompt, not as an 
afterthought at the end. An AI that reads the prompt sequentially might 
generate content with a "share publicly" framing before it ever hits that rule.
   
   - Identity matching is a huge problem. Mapping a GitHub username 
→ Apache ID → mailing-list From address is the single hardest part of this 
task. May need to add an explicit step to try to resolve this; otherwise, 
you'll get the same person counted twice, or merged identities that don't 
belong together, or other issues. I've tried to solve this issue and have not 
had much success. 
   
   - Tier criteria need quantitative anchors. "Tier 1 — Strong Candidates" and 
"Tier 2 — Growing" are pure vibes right now. Even soft thresholds help: e.g., 
"Tier 1: ≥10 merged PRs in last 12 months across project repos AND sustained 
dev@ engagement (≥20 messages across ≥3 months) AND at least one substantive 
design discussion. Tier 2: meets one or two of those." Projects can tune the 
numbers, but the structure forces consistency.
   
   - Missing high-signal data sources.
     - Code review activity
     - Release vote participation
   
   - Steps 1–5 are mostly independent. Possibly add: "Steps 1, 2, 3, and 5 can 
run in parallel"; otherwise, an agent will serialize, and the report takes much 
longer than it needs to.
   
   - Bot filter is incomplete and brittle. Add: also exclude commits where the 
author email is [email protected], GitHub Actions, asfgit, infra-related 
accounts, and any login containing -bot, -ci, or -automation. Possibly try: 
filter to humans by checking that the GitHub user type is "User" and that the 
real-name field is non-empty.
   
   - The sampling in Step 4 is fragile. "8 months every 3 months" plus a top-10 filter 
will miss bursty contributors.
   
   - The "no employer mentions" rule is good. Possibly extend: "Evaluate 
contributions, not communication style, language fluency, time zone, or 
response speed. A contributor who contributes asynchronously is not weaker than 
one who matches your working hours."
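
To make the identity-resolution step a bit more concrete, here is a minimal sketch of one possible approach: treat every known link between two identifiers as an edge and merge linked identifiers with union-find. The `merge_identities` name and the `github:` / `apache:` / `mail:` prefixes are made up for illustration. It only merges identities that are already explicitly linked, so it does not solve the hard part (discovering the links in the first place).

```python
from collections import defaultdict


def merge_identities(links):
    """Group identifiers that are explicitly linked into one cluster per person.

    `links` is an iterable of (identifier_a, identifier_b) pairs. The
    identifier format is a hypothetical convention, not an existing standard.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for ident in list(parent):
        groups[find(ident)].add(ident)
    # Sort for deterministic output.
    return sorted(sorted(group) for group in groups.values())


links = [
    ("github:jdoe", "apache:jdoe"),
    ("apache:jdoe", "mail:jdoe@example.org"),
    ("github:asmith", "apache:asmith"),
]
people = merge_identities(links)
```

Anything that ends up in a cluster of size one is a candidate for manual review, since it may be a second identity of someone already counted.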

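The tier thresholds suggested above could be expressed roughly as follows. This is a sketch only: the `Activity` fields, the `classify` function, and the "Unranked" label for contributors who meet none of the criteria are assumptions for illustration, and the default numbers are just the example values from the bullet, which projects would tune.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    # Hypothetical per-contributor summary; field names are illustrative.
    merged_prs_12mo: int      # merged PRs in the last 12 months
    dev_messages: int         # messages sent to dev@
    dev_active_months: int    # distinct months with dev@ activity
    design_discussions: int   # substantive design discussions


def classify(a, min_prs=10, min_msgs=20, min_months=3):
    """Return a tier label based on how many of the three criteria are met."""
    criteria = [
        a.merged_prs_12mo >= min_prs,
        a.dev_messages >= min_msgs and a.dev_active_months >= min_months,
        a.design_discussions >= 1,
    ]
    met = sum(criteria)
    if met == 3:
        return "Tier 1"
    if met >= 1:
        return "Tier 2"
    return "Unranked"  # assumed label; the prompt under review may differ
```

For example, `classify(Activity(12, 25, 4, 1))` meets all three criteria, while a contributor with many merged PRs but little dev@ activity meets only one and lands in Tier 2.
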

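The bot-filter suggestion above might look something like this. It is a sketch under assumptions: accounts are plain dicts with `login`, `email`, `type`, and `name` keys (mirroring the fields of a GitHub user object), and the `is_probably_human` name and the blocklist parameters are hypothetical. Blocked e-mail addresses are left for the caller to supply.

```python
BOT_MARKERS = ("-bot", "-ci", "-automation")


def is_probably_human(
    account,
    blocked_logins=frozenset({"asfgit", "github-actions[bot]"}),
    blocked_emails=frozenset(),
):
    """Heuristic filter: True if the account looks like a human contributor."""
    login = (account.get("login") or "").lower()
    email = (account.get("email") or "").lower()
    name = (account.get("name") or "").strip()

    # Explicit blocklists for known service accounts.
    if login in blocked_logins or email in blocked_emails:
        return False
    # Substring markers for automation accounts.
    if any(marker in login for marker in BOT_MARKERS):
        return False
    # Require a "User" account type and a non-empty real name.
    return account.get("type") == "User" and bool(name)


accounts = [
    {"login": "jdoe", "type": "User", "name": "Jane Doe"},
    {"login": "dependabot[bot]", "type": "Bot", "name": ""},
    {"login": "project-ci", "type": "User", "name": "Project CI"},
    {"login": "asfgit", "type": "User", "name": "ASF Git"},
]
humans = [a["login"] for a in accounts if is_probably_human(a)]
```

One caveat with the substring rule: a marker like `-ci` also matches human logins such as `anna-cioni`, so matching only at the end of the login, or reviewing excluded accounts by hand, may be safer.
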
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

