dmora opened a new issue, #8646:
URL: https://github.com/apache/incubator-devlake/issues/8646

   
   ## Search before asking
   
   - [x] I searched the issues and found no similar feature request
   
   ## Use case
   
   **Problem**: Bot accounts (dependabot, github-actions, renovate, etc.) 
significantly skew DevLake metrics, particularly DORA calculations.
   
   In our organization's deployment:
   - **28% of commits** are from bots (1,763 of 6,222)
   - **12.5% of deployment commits** are bot-generated (70 of 560)
   - **Lead Time for Changes** is skewed by automated dependency updates
   
   The current bot handling (PR #7845) only ignores PRs where `author_id=0`, 
which doesn't cover:
   - Commits authored by `*[bot]` accounts
   - PRs with valid author IDs but bot-generated content (dependabot has a real 
GitHub account)
   - Title-pattern bot PRs ("Bump X from Y to Z")
   
   **Current workaround**: Users must create MySQL views or modify every 
Grafana dashboard query manually, which is error-prone and doesn't persist 
across DevLake upgrades.
   
   ## Describe the solution you'd like
   
   Add a **Bot Exclusion** section to Scope Config with the following options:
   
   ### 1. Author Pattern Exclusion
   ```
   Exclude authors matching patterns:
   [ ] *[bot]
   [ ] dependabot*
   [ ] renovate*
   [ ] github-actions*
   [ ] Custom: ___________
   ```
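One detail worth pinning down for the author patterns: in common glob implementations (Python's `fnmatch`, for example) `[bot]` is a character class, so `*[bot]` would match any name ending in `b`, `o`, or `t`. A minimal sketch of the intended semantics, assuming `*` is the only wildcard and every other character is literal. The function names are hypothetical and not part of DevLake:

```python
import re


def compile_author_pattern(pattern: str) -> re.Pattern:
    """Compile a pattern where '*' is the only wildcard (assumed semantics)."""
    # Escape each literal chunk so that '[bot]' is matched verbatim,
    # then join the chunks with '.*' in place of each '*'.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def is_bot_author(author: str, patterns: list[str]) -> bool:
    """Return True if the whole author name matches any pattern."""
    return any(compile_author_pattern(p).fullmatch(author) for p in patterns)
```

With these semantics `dependabot[bot]` matches `*[bot]`, while a human author such as `robert` does not.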
   
   ### 2. Title Pattern Exclusion (for PRs)
   ```
   Exclude PRs with titles matching:
   [ ] Bump * from * to *
   [ ] Update * from * to *
   [ ] Custom regex: ___________
   ```
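To illustrate how the title options could be evaluated, here is a rough sketch that combines the wildcard presets with a custom regex. It assumes `*` means "one or more characters" and that custom regexes are applied with a substring search. Both are assumptions about the desired behavior, and the helper names are hypothetical:

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Convert a title glob to a regex; '*' is assumed to mean one or more characters."""
    # All characters other than '*' are escaped and matched literally.
    return re.compile(".+".join(re.escape(part) for part in glob.split("*")))


def is_bot_title(title: str, globs, custom_regexes=()) -> bool:
    """Return True if the title fully matches a glob or contains a custom regex match."""
    if any(glob_to_regex(g).fullmatch(title) for g in globs):
        return True
    return any(re.search(rx, title) for rx in custom_regexes)
```

For example, `is_bot_title("Bump lodash from 4.17.20 to 4.17.21", ["Bump * from * to *"])` is true, while a title like "Bump version for release" is not matched because it lacks the `from ... to ...` structure.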
   
   ### 3. Exclusion Scope
   ```
   Apply exclusion to:
   [x] Commits
   [x] Pull Requests
   [x] DORA Metrics (deployment commits)
   [ ] Issues
   ```
   
   ### Implementation Suggestion
   
   The filtering could be applied during the **transformation phase** (similar 
to how issue type mapping works) rather than at collection, allowing users to:
   1. Still collect all data for audit purposes
   2. Filter at query time via domain layer tables
   3. Toggle filtering without re-collecting data
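The three points above could be sketched roughly as follows: the transformation step only flags records, and the exclusion is applied when reading them. The `Commit` type, the `is_bot` field, and both function names are hypothetical; this is not DevLake's actual domain layer schema:

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Commit:
    sha: str
    author_name: str
    is_bot: bool = False  # hypothetical flag set during transformation


def flag_bots(commits: List[Commit], is_bot_author: Callable[[str], bool]) -> List[Commit]:
    """Transformation step: keep every record, only set the flag."""
    return [replace(c, is_bot=is_bot_author(c.author_name)) for c in commits]


def visible_commits(commits: List[Commit], exclude_bots: bool) -> List[Commit]:
    """Query-time step: drop flagged records only when exclusion is enabled."""
    return [c for c in commits if not (exclude_bots and c.is_bot)]
```

Because nothing is deleted, switching `exclude_bots` on or off changes only what a dashboard sees and does not require collecting the data again.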
   
   ### API Example
   ```json
   {
     "scopeConfig": {
       "botExclusion": {
         "enabled": true,
         "authorPatterns": ["*[bot]", "dependabot*"],
         "titlePatterns": ["Bump * from * to *"],
         "applyTo": ["commits", "pull_requests", "cicd_deployment_commits"]
       }
     }
   }
   ```
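As a sketch of how a payload like this might be validated on receipt, the snippet below parses the `botExclusion` object and rejects unknown `applyTo` targets. The allowed target names are taken from the example above plus `issues` from the Exclusion Scope list; the class and function names are hypothetical:

```python
import json
from dataclasses import dataclass, field
from typing import List

# Assumed set of valid targets; "issues" mirrors the optional checkbox above.
ALLOWED_TARGETS = {"commits", "pull_requests", "cicd_deployment_commits", "issues"}


@dataclass
class BotExclusion:
    enabled: bool = False
    author_patterns: List[str] = field(default_factory=list)
    title_patterns: List[str] = field(default_factory=list)
    apply_to: List[str] = field(default_factory=list)


def parse_bot_exclusion(payload: str) -> BotExclusion:
    """Parse the proposed JSON shape; a missing section yields a disabled config."""
    raw = json.loads(payload).get("scopeConfig", {}).get("botExclusion", {})
    apply_to = list(raw.get("applyTo", []))
    unknown = set(apply_to) - ALLOWED_TARGETS
    if unknown:
        raise ValueError(f"unknown applyTo targets: {sorted(unknown)}")
    return BotExclusion(
        enabled=bool(raw.get("enabled", False)),
        author_patterns=list(raw.get("authorPatterns", [])),
        title_patterns=list(raw.get("titlePatterns", [])),
        apply_to=apply_to,
    )
```

Defaulting to a disabled configuration when the section is absent would keep existing scope configs behaving as they do today.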
   
   ## Related issues
   
   - #7845 - fix(github): ignore bot account (partial solution for author_id=0)
   - #7786 - fix(github): process bot account in pull_requests table
   
   ## Are you willing to submit a PR?
   
   - [ ] Yes, I am willing to submit a PR
   
   ## Code of Conduct
   
   - [x] I agree to follow this project's Code of Conduct


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

