rodrigoluizs commented on issue #8381:
URL: 
https://github.com/apache/incubator-devlake/issues/8381#issuecomment-2842399237

   Thanks @klesh !
   I went ahead and did a bit of analysis on how this could be implemented.
   
   Below is a draft proposal with possible approaches, pros/cons, and a few 
open questions for feedback. Let me know what you think — happy to iterate on 
it or prepare an RFC based on where you’d see the most value!
   
   # Excluding Automated Bot Changes from DORA Metrics
   
   ---
   
   ## 1. Filtering Approaches
   
   I'm thinking about two main approaches to apply bot filtering in the 
**queries used by Grafana dashboards**:
   
   ### Option 1: Filter dynamically by author name pattern (e.g., `%bot%`)
   
   **SQL example:**
   
   ```sql
   SELECT a.full_name, prm.*
   FROM devlake.project_pr_metrics prm
   INNER JOIN devlake.pull_requests pr ON prm.id = pr.id
   INNER JOIN devlake.accounts a ON pr.author_name = a.full_name
   WHERE a.full_name NOT LIKE '%bot%'
   ```
   
   **Pros:**
   - No schema changes needed
   - Quick to implement
   
   **Cons:**
   - Pattern matching is error-prone (some bots might not match)
   - Makes queries heavier and harder to maintain
   - Hard to manually override for special cases
   
   ---
   
   ### Option 2: Introduce a flag to pre-classify bots and filter on it
   
   Add a boolean `is_bot` field to the `accounts` table and propagate it into 
`project_pr_metrics` with a new `is_authored_by_bot` field.
   
   **SQL example:**
   
   ```sql
   SELECT *
   FROM devlake.project_pr_metrics
   WHERE NOT is_authored_by_bot
   ```
   
   **Pros:**
   - Queries stay clean and fast
   - Easier to manually control (allow exceptions, etc.)
   - More future-proof if bot naming conventions change
   
   **Cons:**
   - Requires schema changes and changes to data ingestion
   
   ---
   
   ## 2. How to Populate the New Flag
   
   In case we decide to go for a new flag, we see two options for how the 
`is_bot` flag could be populated:
   
   ### Option 1: Automatic Detection
   
   - When saving accounts, automatically set `is_bot = true` if:
     - `full_name` or `user_name` contains typical bot patterns like `bot`, 
`renovate`, `dependabot`, etc.
   
   **Pros:**
   - No manual maintenance needed
   - Works for most typical bots
   
   **Cons:**
   - Might incorrectly miss or misclassify accounts if naming conventions vary
     - This could be addressed by allowing the bot name patterns to be defined 
in the Git integration configs
   
   ---
   
   ### Option 2: Manual Control
   
   - Provide an API endpoint (or reuse an existing admin tool) that allows us 
to manually set or update `is_bot` on accounts.
   
   **Pros:**
   - Full flexibility for maintainers
   - Correct classification possible even for unusual bot names
   
   **Cons:**
   - Requires manual effort to maintain the bot list
   
   ---
   
   **Note:**  
   Both approaches could potentially be combined if preferred — automatic 
detection for most accounts, with manual overrides where needed.
   
   ---
   
   ## 3. Grafana Dashboard Behavior
   
   We also considered two options for displaying the filtered DORA metrics:
   
   ### Option 1: Add a new variable to existing dashboards
   
   - New Grafana variable: `include_bots`
     - Values: `true`, `false`
     - Default: `true` (bots included) to preserve current behavior and avoid 
surprising users
   - Queries would check the variable:
   
   ```sql
   WHERE (${include_bots} = 'true' OR is_authored_by_bot = FALSE)
   ```
   
   **Pros:**
   - No need to maintain separate dashboards
   - Easy for users to switch views
   - Low maintenance overhead
   
   **Cons:**
   - Slightly more complex queries
   - Might be overlooked without proper UI hint
   
   ---
   
   ### Option 2: Create separate dashboards excluding bots
   
   - Clone existing DORA dashboards
   - Hardcode filtering (`WHERE is_authored_by_bot = FALSE`)
   - Label clearly, e.g., “DORA Metrics (Human-only Changes)”
   
   **Pros:**
   - Very clear and easy for users to find
   - No need for users to toggle anything
   
   **Cons:**
   - Need to maintain two sets of dashboards
   - Possible risk of dashboard drift over time
   
   ---
   
   ## Summary
   
   | Decision Point       | Option 1                                          | 
Option 2                                           |
   
|----------------------|---------------------------------------------------|----------------------------------------------------|
   | Filtering Method     | Author name pattern matching                      | 
Pre-calculated `is_bot` flag                       |
   | Bot Detection        | Automatic detection                               | 
Manual control (or both combined)                  |
   | Grafana Dashboards   | Add a variable to existing dashboards             | 
Create new dashboards for human-only metrics       |
   
   ---
   
   ## Open Questions for Maintainers
   
   - What’s the preferred approach for filtering out automated changes — would 
dynamic filtering (e.g., by name pattern), a flag-based approach, or something 
else work best within DevLake’s model?
   - For identifying bots, do you see more value in automatic detection, 
manual, both combined or would you suggest a different mechanism?
   - What would be the best way to surface filtered metrics in Grafana — a 
dashboard variable, a separate dashboard, or another method?
   - If this sounds good, should I prepare a full RFC?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@devlake.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to