mindlesscloud opened a new issue, #3867:
URL: https://github.com/apache/incubator-devlake/issues/3867
## What and why to refactor
In v0.12.0, DevLake introduced `entities` for blueprints. They indicate which
kinds of data will be processed, and users can select the `entities` they need
to customize data processing.
There are currently five types of `entities`:
- CODE
- CODEREVIEW
- TICKET
- CICD
- CROSS
Beneath the surface, every DevLake plugin consists of a number of subtasks, and
each subtask is assigned one or more `entities`. For example, the `entities` of
the subtask `GitHub issue collector` is `TICKET`, so if a blueprint's `entities`
are `CODE, CODEREVIEW`, the `GitHub issue collector` will not be executed. In
general, if we treat the `entities` of the blueprint and of a subtask as sets, the
subtask is executed only if their intersection is nonempty. This seems like an
elegant solution. However, subtasks are not independent, and in some cases this
causes serious problems. Suppose subtask A depends on subtask B, and the
`entities` of A and B are `CODE` and `TICKET`, respectively. If we execute a
blueprint whose `entities` are `CODE`, A will be executed without B. The bug
reported in [#3720](https://github.com/apache/incubator-devlake/issues/3720) was
essentially caused by this inconsistency.
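The selection rule and the failure mode it causes can be illustrated with a minimal Go sketch. It rests on simplifying assumptions: the `Entity` and `Subtask` types, the `shouldRun` function and the subtasks `A` and `B` are hypothetical and do not reflect DevLake's actual plugin API.

```go
package main

import "fmt"

// Entity mirrors the five blueprint entity types listed above.
type Entity string

const (
	Code       Entity = "CODE"
	CodeReview Entity = "CODEREVIEW"
	Ticket     Entity = "TICKET"
	CICD       Entity = "CICD"
	Cross      Entity = "CROSS"
)

// Subtask is a HYPOTHETICAL, simplified stand-in for a plugin subtask.
type Subtask struct {
	Name      string
	Entities  []Entity
	DependsOn []string // names of subtasks whose output this one reads
}

// shouldRun applies the rule described above: a subtask runs only if its
// entities intersect the blueprint's entities. Dependencies are ignored,
// which is exactly the source of the inconsistency.
func shouldRun(blueprint []Entity, st Subtask) bool {
	selected := make(map[Entity]bool, len(blueprint))
	for _, e := range blueprint {
		selected[e] = true
	}
	for _, e := range st.Entities {
		if selected[e] {
			return true
		}
	}
	return false
}

func main() {
	b := Subtask{Name: "B", Entities: []Entity{Ticket}}
	a := Subtask{Name: "A", Entities: []Entity{Code}, DependsOn: []string{"B"}}
	blueprint := []Entity{Code}

	fmt.Println("A runs:", shouldRun(blueprint, a))
	fmt.Println("B runs:", shouldRun(blueprint, b))
}
```

With a `CODE` blueprint, `A` is selected while `B`, which produces the data `A` reads, is skipped.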
## Describe the solution you'd like
There are three ways to fix it.
1. Check every subtask and update its `entities` so that they are consistent
with the dependencies among subtasks.
2. Refactor the plugins and the framework so that DevLake is aware of the
dependencies among subtasks; the subtasks that a selected subtask depends on
will be inferred and executed as well.
3. Select some subtasks as defaults that are always executed. This could be
achieved by assigning all five `entities` to them.
After discussion within the team, we will go with the first option because it
is simple and does not break the current architecture. As a first step, we are
going to refactor the pull-request-related subtasks of the `GitHub` plugin. The
plan is summarized in the following table. Any comments or suggestions are
appreciated.
| subtask                     | current             | after refactor      |
|-----------------------------|---------------------|---------------------|
| pr collector                | CODEREVIEW          | CROSS CODEREVIEW    |
| pr extractor                | CODEREVIEW          | CROSS CODEREVIEW    |
| pr convertor                | CODEREVIEW          | CROSS CODEREVIEW    |
| pr review collector         | CROSS CODEREVIEW    | CROSS CODEREVIEW    |
| pr review extractor         | CROSS CODEREVIEW    | CROSS CODEREVIEW    |
| pr review convertor         | CROSS CODEREVIEW    | CODEREVIEW          |
| pr review comment collector | CODEREVIEW          | CROSS CODEREVIEW    |
| pr review comment extractor | CODEREVIEW          | CROSS CODEREVIEW    |
| pr review comment convertor | CODEREVIEW          | CODEREVIEW          |
| pr commit collector         | CODEREVIEW          | CROSS CODEREVIEW    |
| pr commit extractor         | CODEREVIEW          | CROSS CODEREVIEW    |
| pr commit convertor         | CODEREVIEW          | CROSS CODEREVIEW    |
| pr issue convertor          | CROSS               | CROSS               |
| pr issue enricher           | CROSS               | CROSS               |
| pr label convertor          | CODEREVIEW          | CODEREVIEW          |
## Related issues
#3720