rodrigoluizs commented on issue #8381:
URL: https://github.com/apache/incubator-devlake/issues/8381#issuecomment-2898676920

   Thanks for the feedback and questions, @Startrekzky and @klesh!
   
   > **Which tables will the `is_bot` flag be added, only table accounts?**
   
   Yes, the `is_bot` flag would be added only to the `accounts` table. In 
addition, I plan to add a new column to `project_pr_metrics` called 
`is_authored_by_bot`, since I believe that name better reflects the context 
of the pull request entity and makes the intent clearer when querying.
   
   > **Which cases will the `is_bot` take effect? I'm not worrying about the 
dashboard queries but the plugin's internal processing logic, for instance, the 
calculation in the DORA plugin to generate table `project_pr_metrics` might 
also take the bot PRs or commits. If so, the plugin's processing logic needs to 
be updated as well after the `is_bot` flag is introduced.**
   
   You’re absolutely right. For this to work reliably, the DORA plugin’s 
processing logic that populates `project_pr_metrics` would also need to 
propagate the `is_bot` value from the author account into the 
`is_authored_by_bot` field during metric calculation. That way, downstream 
queries (like Grafana dashboards) can filter without needing to join back to 
the `accounts` table.
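
   To make the propagation step concrete, here is a minimal sketch in Python 
with an in-memory SQLite database. It is only an illustration: the schema is 
heavily simplified and assumed (in particular the `pull_requests.author_id` 
link and the helper name `propagate_bot_flag` are my own), and the real change 
would live in the Go code of the DORA plugin.

```python
import sqlite3

# Assumed, simplified schema for illustration only; the real DevLake tables
# have many more columns and the DORA plugin itself is written in Go.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id TEXT PRIMARY KEY, is_bot INTEGER NOT NULL DEFAULT 0);
CREATE TABLE pull_requests (id TEXT PRIMARY KEY, author_id TEXT);
CREATE TABLE project_pr_metrics (
    id TEXT PRIMARY KEY,  -- pull request id
    is_authored_by_bot INTEGER NOT NULL DEFAULT 0
);
INSERT INTO accounts VALUES ('acc-1', 0), ('acc-2', 1);
INSERT INTO pull_requests VALUES ('pr-1', 'acc-1'), ('pr-2', 'acc-2'), ('pr-3', NULL);
INSERT INTO project_pr_metrics (id) VALUES ('pr-1'), ('pr-2'), ('pr-3');
""")


def propagate_bot_flag(conn):
    """Copy the author's accounts.is_bot into project_pr_metrics.

    Hypothetical helper: PRs with no known author fall back to 0 (not a bot).
    """
    conn.execute("""
        UPDATE project_pr_metrics
        SET is_authored_by_bot = COALESCE((
            SELECT a.is_bot
            FROM pull_requests pr
            JOIN accounts a ON a.id = pr.author_id
            WHERE pr.id = project_pr_metrics.id
        ), 0)
    """)
    conn.commit()


propagate_bot_flag(conn)

# Downstream queries can now filter without joining back to accounts.
human_prs = [row[0] for row in conn.execute(
    "SELECT id FROM project_pr_metrics WHERE is_authored_by_bot = 0 ORDER BY id")]
```

   In this sketch a PR whose author cannot be resolved is treated as 
human-authored, which seems like the safer default, but that choice is open 
for discussion.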
   
   > **For bot detection, we could use the environment variables with default 
values to achieve both auto + manual control**
   
   Using environment variables to control the bot name patterns sounds like a 
great way to support both automatic detection and manual overrides — I’ll 
incorporate that into the plan as well.
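
   As a rough sketch of what I have in mind (in Python for brevity, not 
DevLake's Go code): a default list of patterns gives automatic detection, and 
setting the variable replaces that list for manual control. The variable name 
`BOT_NAME_PATTERNS`, the default patterns, and both function names are 
assumptions for illustration; none of them exist in DevLake today.

```python
import os
import re

# Hypothetical variable name and defaults; neither exists in DevLake today.
ENV_VAR = "BOT_NAME_PATTERNS"
DEFAULT_PATTERNS = r"\[bot\]$,^dependabot$,^renovate$"


def load_bot_patterns(environ=os.environ):
    """Return compiled patterns from the environment, or the defaults.

    The value is a comma-separated list of regular expressions, so a
    pattern that itself contains a comma is not supported in this sketch.
    """
    raw = environ.get(ENV_VAR, DEFAULT_PATTERNS)
    return [re.compile(p.strip(), re.IGNORECASE) for p in raw.split(",") if p.strip()]


def is_bot(account_name, patterns):
    """True if any pattern matches somewhere in the account name."""
    return any(p.search(account_name) for p in patterns)
```

   With no variable set, `is_bot("dependabot[bot]", load_bot_patterns())` would 
match the default `\[bot\]$` pattern; setting the variable to something like 
`^ci-` would replace the defaults entirely.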
   
   > **For Grafana dashboards, if the `is_bot` is added, updating the SQL in 
the existing dashboard would be my choice.**
   
   That was also my preferred approach — nice to hear that you agree!  
   Just a small note: my intention was to filter on the new 
`is_authored_by_bot` column in `project_pr_metrics`.
   
   ---
   
   ### Follow-up
   
   Based on your input, my current understanding of the preferred direction is:
   
   1. **Filtering approach:** Use **Option 2** — introduce a flag (`is_bot` in 
`accounts`, and `is_authored_by_bot` in `project_pr_metrics`)
   2. **Bot detection:** Combine **automatic detection** with **manual 
override**, using an environment variable to define bot name patterns
   3. **Dashboard behavior:**  
      - Update existing dashboards to support filtering based on 
`is_authored_by_bot`  
      - I’d like your feedback on introducing an **`include_bots` 
variable** that controls whether bot-authored changes are included in the 
queries.  
        The goal is to avoid a breaking change and to keep the DORA metrics 
the same for users who do not explicitly opt in to this new feature.
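
   To illustrate the `include_bots` idea, here is a minimal sketch of the 
query shape, again in Python with SQLite. The table is reduced to a few 
columns, the `pr_stats` helper is hypothetical, and in Grafana the 
`:include_bots` parameter would be a dashboard variable rather than a bound 
parameter.

```python
import sqlite3

# Simplified stand-in for project_pr_metrics; illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_pr_metrics (
    id TEXT PRIMARY KEY,
    pr_cycle_time INTEGER,
    is_authored_by_bot INTEGER NOT NULL DEFAULT 0
);
INSERT INTO project_pr_metrics VALUES
    ('pr-1', 120, 0), ('pr-2', 5, 1), ('pr-3', 60, 0);
""")

# When include_bots is 1 the predicate is always true, so the result is
# identical to today's query; when it is 0, bot-authored PRs are dropped.
QUERY = """
SELECT COUNT(*), AVG(pr_cycle_time)
FROM project_pr_metrics
WHERE (:include_bots = 1 OR is_authored_by_bot = 0)
"""


def pr_stats(conn, include_bots=True):
    """Return (pr_count, avg_cycle_time); bots are included by default."""
    return conn.execute(QUERY, {"include_bots": int(include_bots)}).fetchone()
```

   Defaulting to including bots is what keeps existing dashboards unchanged 
until someone opts in.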
   
   ---
   
   Does this align with how you both see it?
   
   Do you agree with the proposed column names — `is_bot` for the `accounts` 
table and `is_authored_by_bot` for `project_pr_metrics`?
   
   Additionally, I’d appreciate some clarification on how new environment 
variables can be introduced in DevLake, as I’m not very familiar with that part 
of the project yet.
   
   Just want to make sure we’re on the same page before moving forward with an 
RFC or implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

