rodrigoluizs commented on issue #8381: URL: https://github.com/apache/incubator-devlake/issues/8381#issuecomment-2842399237
Thanks @klesh ! I went ahead and did a bit of analysis on how this could be implemented. Below is a draft proposal with possible approaches, pros/cons, and a few open questions for feedback. Let me know what you think — happy to iterate on it or prepare an RFC based on where you’d see the most value! # Excluding Automated Bot Changes from DORA Metrics --- ## 1. Filtering Approaches I'm thinking about two main approaches to apply bot filtering in the **queries used by Grafana dashboards**: ### Option 1: Filter dynamically by author name pattern (e.g., `%bot%`) **SQL example:** ```sql SELECT a.full_name, prm.* FROM devlake.project_pr_metrics prm INNER JOIN devlake.pull_requests pr ON prm.id = pr.id INNER JOIN devlake.accounts a ON pr.author_name = a.full_name WHERE a.full_name NOT LIKE '%bot%' ``` **Pros:** - No schema changes needed - Quick to implement **Cons:** - Pattern matching is error-prone (some bots might not match) - Makes queries heavier and harder to maintain - Hard to manually override for special cases --- ### Option 2: Introduce a flag to pre-classify bots and filter on it Add a boolean `is_bot` field to the `accounts` table and propagate it into `project_pr_metrics` with a new `is_authored_by_bot` field. **SQL example:** ```sql SELECT * FROM devlake.project_pr_metrics WHERE NOT is_authored_by_bot ``` **Pros:** - Queries stay clean and fast - Easier to manually control (allow exceptions, etc.) - More future-proof if bot naming conventions change **Cons:** - Requires schema changes and changes to data ingestion --- ## 2. How to Populate the New Flag In case we decide to go for a new flag, we see two options for how the `is_bot` flag could be populated: ### Option 1: Automatic Detection - When saving accounts, automatically set `is_bot = true` if: - `full_name` or `user_name` contains typical bot patterns like `bot`, `renovate`, `dependabot`, etc. **Pros:** - No manual maintenance needed - Works for most typical bots **Cons:** - Might incorrectly miss or misclassify accounts if naming conventions vary - This could be addressed by allowing the bot name patterns to be defined in the Git integration configs --- ### Option 2: Manual Control - Provide an API endpoint (or reuse an existing admin tool) that allows us to manually set or update `is_bot` on accounts. **Pros:** - Full flexibility for maintainers - Correct classification possible even for unusual bot names **Cons:** - Requires manual effort to maintain the bot list --- **Note:** Both approaches could potentially be combined if preferred — automatic detection for most accounts, with manual overrides where needed. --- ## 3. Grafana Dashboard Behavior We also considered two options for displaying the filtered DORA metrics: ### Option 1: Add a new variable to existing dashboards - New Grafana variable: `include_bots` - Values: `true`, `false` - Default: `true` (bots included) to preserve current behavior and avoid surprising users - Queries would check the variable: ```sql WHERE (${include_bots} = 'true' OR is_authored_by_bot = FALSE) ``` **Pros:** - No need to maintain separate dashboards - Easy for users to switch views - Low maintenance overhead **Cons:** - Slightly more complex queries - Might be overlooked without proper UI hint --- ### Option 2: Create separate dashboards excluding bots - Clone existing DORA dashboards - Hardcode filtering (`WHERE is_authored_by_bot = FALSE`) - Label clearly, e.g., “DORA Metrics (Human-only Changes)” **Pros:** - Very clear and easy for users to find - No need for users to toggle anything **Cons:** - Need to maintain two sets of dashboards - Possible risk of dashboard drift over time --- ## Summary | Decision Point | Option 1 | Option 2 | |----------------------|---------------------------------------------------|----------------------------------------------------| | Filtering Method | Author name pattern matching | Pre-calculated `is_bot` flag | | Bot Detection | Automatic detection | Manual control (or both combined) | | Grafana Dashboards | Add a variable to existing dashboards | Create new dashboards for human-only metrics | --- ## Open Questions for Maintainers - What’s the preferred approach for filtering out automated changes — would dynamic filtering (e.g., by name pattern), a flag-based approach, or something else work best within DevLake’s model? - For identifying bots, do you see more value in automatic detection, manual, both combined or would you suggest a different mechanism? - What would be the best way to surface filtered metrics in Grafana — a dashboard variable, a separate dashboard, or another method? - If this sounds good, should I prepare a full RFC? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@devlake.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org