rodrigoluizs commented on issue #8381: URL: https://github.com/apache/incubator-devlake/issues/8381#issuecomment-2898676920
Thanks for the feedback and questions, @Startrekzky and @klesh!

> **Which tables will the `is_bot` flag be added, only table accounts?**

Yes, the `is_bot` flag would be added only to the `accounts` table. In addition, I plan to add a new column to `project_pr_metrics` called `is_authored_by_bot`. I believe that name better reflects the context of the pull request entity and makes the intent clearer when querying.

> **Which cases will the `is_bot` take effect? I'm not worrying about the dashboard queries but the plugin's internal processing logic, for instance, the calculation in the DORA plugin to generate table `project_pr_metrics` might also take the bot PRs or commits. If so, the plugin's processing logic needs to be updated as well after the `is_bot` flag is introduced.**

You’re absolutely right. For this to work reliably, the DORA plugin’s processing logic that populates `project_pr_metrics` would also need to be updated so that it propagates the `is_bot` value from the author account into the `is_authored_by_bot` field during metric calculation. That way, downstream queries (such as Grafana dashboards) can filter without joining back to the `accounts` table.

> **For bot detection, we could use the environment variables with default values to achieve both auto + manual control**

Using environment variables to control the bot name patterns sounds like a great way to support both automatic detection and manual overrides. I’ll incorporate that into the plan as well.

> **For Grafana dashboards, if the `is_bot` is added, updating the SQL in the existing dashboard would be my choice.**

That was also my preferred approach, so it's nice to hear that you agree! Just a small note: my intention was to filter on the new `is_authored_by_bot` column in `project_pr_metrics`.

---

### Follow-up

Based on your input, my current understanding of the preferred direction is:

1. **Filtering approach:** Use **Option 2**: introduce a flag (`is_bot` in `accounts`, and `is_authored_by_bot` in `project_pr_metrics`).
2. **Bot detection:** Combine **automatic detection** with **manual override**, using an environment variable to define bot name patterns.
3. **Dashboard behavior:**
   - Update existing dashboards to support filtering based on `is_authored_by_bot`.
   - I’d like your feedback on the idea of introducing an **`include_bots` variable** that controls whether bot-authored changes are filtered out of the queries. The goal is to avoid a breaking change and to keep the DORA metrics unchanged for users who do not explicitly opt in to this new feature.

---

Does this align with how you both see it? Do you agree with the proposed column names: `is_bot` for the `accounts` table and `is_authored_by_bot` for `project_pr_metrics`?

Additionally, I’d appreciate some clarification on how new environment variables can be introduced in DevLake, as I’m not very familiar with that part of the project yet.

I just want to make sure we’re on the same page before moving forward with an RFC or implementation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@devlake.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
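
For illustration only, the detection and propagation steps proposed in this comment could be sketched roughly as below. This is a Python sketch for brevity, not DevLake code (the real change would live in the Go plugins). The variable name `BOT_NAME_PATTERNS`, the default patterns, the `author_id` key, and the dictionary-shaped rows are all assumptions made for the example; none of them come from DevLake.

```python
import os
import re

# Hypothetical variable name and defaults; neither is defined by DevLake.
ENV_VAR = "BOT_NAME_PATTERNS"
DEFAULT_PATTERNS = r"\[bot\]$,^dependabot,^renovate"


def load_bot_patterns(environ=os.environ):
    """Compile the comma-separated regex list from the environment.

    Falls back to DEFAULT_PATTERNS when the variable is unset, which gives
    automatic detection by default and a manual override when it is set.
    Limitation of this sketch: a pattern cannot itself contain a comma.
    """
    raw = environ.get(ENV_VAR, DEFAULT_PATTERNS)
    return [re.compile(p.strip(), re.IGNORECASE) for p in raw.split(",") if p.strip()]


def is_bot(account_name, patterns):
    """Return True when any pattern matches the account name."""
    return any(p.search(account_name) for p in patterns)


def mark_pr_metrics(pr_rows, accounts, patterns):
    """Copy the author's bot status onto each PR metric row.

    Rows are plain dicts standing in for `accounts` and `project_pr_metrics`
    records; the `author_id` key is an assumption of this sketch.
    Unknown authors are treated as non-bots.
    """
    bot_by_id = {a["id"]: is_bot(a["name"], patterns) for a in accounts}
    return [
        {**row, "is_authored_by_bot": bot_by_id.get(row["author_id"], False)}
        for row in pr_rows
    ]
```

Storing the result on each PR row, rather than resolving it at query time, mirrors the point made above: dashboards can filter on `is_authored_by_bot` without joining back to `accounts`.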