mindlesscloud opened a new issue, #3867:
URL: https://github.com/apache/incubator-devlake/issues/3867

   ## What and why to refactor
   Since v0.12.0, DevLake has supported `entities` on blueprints, which indicate what kind of data will be processed. Users can select the `entities` they need to customize data processing.
   There are currently five types of `entities`:
    - CODE
    - CODEREVIEW
    - TICKET
    - CICD
    - CROSS
   
   Under the hood, every DevLake plugin consists of a number of subtasks, and each subtask is assigned one or more `entities`. For example, the `entities` of the subtask `GitHub issue collector` are `TICKET`, so if a blueprint's `entities` are `CODE, CODEREVIEW`, the `GitHub issue collector` will not be executed. In general, we treat the `entities` of the blueprint and of each subtask as sets, and a subtask is executed only if the intersection of the two sets is nonempty. This seems like an elegant solution. However, subtasks are not independent, and in some cases this rule causes serious problems. Assume a subtask A depends on another subtask B, and the `entities` of A and B are `CODE` and `TICKET` respectively. If we execute a blueprint whose only entity is `CODE`, A will be executed without B. The bug reported in [#3720](https://github.com/apache/incubator-devlake/issues/3720) was essentially caused by this inconsistency.
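The selection rule and its failure mode can be sketched in Go, the language DevLake is written in. This is a minimal illustration, not DevLake's actual code: the `Subtask` type and the `intersects` and `selectSubtasks` helpers are hypothetical, and dependencies are deliberately not modeled, which is exactly the gap described above.

```go
package main

import "fmt"

// Entity is one of the five blueprint entity types listed in the issue.
type Entity string

const (
	Code       Entity = "CODE"
	CodeReview Entity = "CODEREVIEW"
	Ticket     Entity = "TICKET"
	CICD       Entity = "CICD"
	Cross      Entity = "CROSS"
)

// Subtask is a hypothetical, simplified stand-in for a plugin subtask:
// a name plus the set of entities it is assigned.
type Subtask struct {
	Name     string
	Entities []Entity
}

// intersects reports whether two entity sets share at least one element.
func intersects(a, b []Entity) bool {
	set := make(map[Entity]bool, len(a))
	for _, e := range a {
		set[e] = true
	}
	for _, e := range b {
		if set[e] {
			return true
		}
	}
	return false
}

// selectSubtasks keeps only the subtasks whose entities intersect the
// blueprint's entities. Dependencies between subtasks are ignored.
func selectSubtasks(blueprint []Entity, all []Subtask) []string {
	var selected []string
	for _, st := range all {
		if intersects(blueprint, st.Entities) {
			selected = append(selected, st.Name)
		}
	}
	return selected
}

func main() {
	// A depends on B, but their entity sets do not overlap.
	subtasks := []Subtask{
		{Name: "B", Entities: []Entity{Ticket}},
		{Name: "A", Entities: []Entity{Code}},
	}
	// A CODE-only blueprint selects A but silently drops its prerequisite B.
	fmt.Println(selectSubtasks([]Entity{Code}, subtasks)) // [A]
}
```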
   
   ## Describe the solution you'd like
   There are three possible ways to fix this:
   1. Check every subtask and update its `entities` so that they are consistent with the dependencies among subtasks.
   2. Refactor the plugins and the framework so that DevLake is aware of the dependencies among subtasks; the subtasks that a selected subtask depends on will then be inferred and executed as well.
   3. Select some subtasks as defaults that are always executed, which could be achieved by assigning all five `entities` to them.
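Option 2 essentially means expanding the selected subtasks with everything they transitively depend on. A minimal sketch, assuming a hypothetical `deps` map from each subtask to the subtasks it directly depends on (DevLake's framework does not necessarily expose such a map) and an acyclic dependency graph:

```go
package main

import "fmt"

// withDependencies expands the selected subtasks with every subtask they
// transitively depend on, using a post-order depth-first walk so that each
// prerequisite appears before the subtasks that need it.
// Assumes the graph described by deps (hypothetical) is acyclic.
func withDependencies(selected []string, deps map[string][]string) []string {
	visited := make(map[string]bool)
	var order []string
	var visit func(name string)
	visit = func(name string) {
		if visited[name] {
			return
		}
		visited[name] = true
		for _, d := range deps[name] {
			visit(d)
		}
		order = append(order, name)
	}
	for _, name := range selected {
		visit(name)
	}
	return order
}

func main() {
	// Hypothetical edge: A depends on B.
	deps := map[string][]string{"A": {"B"}}
	fmt.Println(withDependencies([]string{"A"}, deps)) // [B A]
}
```

Because the walk is post-order, the result doubles as an execution order, which is also why this option requires changes to both the plugins (to declare dependencies) and the framework (to resolve them).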
   
   After discussion within the team, we will go with the first option because it is simple and does not break the current architecture. As a first step, we are going to refactor the pull-request-related subtasks of the `GitHub` plugin. The plan is summarized in the following table. Any comments or suggestions are appreciated.
   
   | subtask                     | current           | after refactor    |
   |-----------------------------|-------------------|-------------------|
   | pr collector                | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr extractor                | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr convertor                | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr review collector         | CROSS, CODEREVIEW | CROSS, CODEREVIEW |
   | pr review extractor         | CROSS, CODEREVIEW | CROSS, CODEREVIEW |
   | pr review convertor         | CROSS, CODEREVIEW | CODEREVIEW        |
   | pr review comment collector | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr review comment extractor | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr review comment convertor | CODEREVIEW        | CODEREVIEW        |
   | pr commit collector         | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr commit extractor         | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr commit convertor         | CODEREVIEW        | CROSS, CODEREVIEW |
   | pr issue convertor          | CROSS             | CROSS             |
   | pr issue enricher           | CROSS             | CROSS             |
   | pr label convertor          | CODEREVIEW        | CODEREVIEW        |
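Option 1 boils down to a simple invariant: if subtask A depends on subtask B, the `entities` of A must be a subset of the `entities` of B. If some entity `e` belonged to A but not to B, a blueprint selecting only `e` would run A without B; conversely, if A's entities are a subset of B's, any blueprint that intersects A also intersects B. The sketch below checks that invariant for part of the table. The dependency edges in `deps` are hypothetical, chosen only for illustration (for instance, the review collector depending on the PR extractor), and are not taken from the GitHub plugin's source.

```go
package main

import (
	"fmt"
	"sort"
)

// violations returns one message per (subtask, prerequisite, entity) where the
// subtask carries an entity its prerequisite lacks: a blueprint selecting only
// that entity would run the subtask without its prerequisite.
func violations(entities map[string][]string, deps map[string][]string) []string {
	var out []string
	for sub, prereqs := range deps {
		for _, pre := range prereqs {
			has := make(map[string]bool)
			for _, e := range entities[pre] {
				has[e] = true
			}
			for _, e := range entities[sub] {
				if !has[e] {
					out = append(out, fmt.Sprintf("%s -> %s: missing %s", sub, pre, e))
				}
			}
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	// HYPOTHETICAL dependency edges (subtask -> prerequisites), illustration only.
	deps := map[string][]string{
		"pr extractor":        {"pr collector"},
		"pr convertor":        {"pr extractor"},
		"pr review collector": {"pr extractor"},
		"pr review extractor": {"pr review collector"},
		"pr review convertor": {"pr review extractor"},
	}
	// Entity assignments from the "current" and "after refactor" columns.
	current := map[string][]string{
		"pr collector":        {"CODEREVIEW"},
		"pr extractor":        {"CODEREVIEW"},
		"pr convertor":        {"CODEREVIEW"},
		"pr review collector": {"CROSS", "CODEREVIEW"},
		"pr review extractor": {"CROSS", "CODEREVIEW"},
		"pr review convertor": {"CROSS", "CODEREVIEW"},
	}
	after := map[string][]string{
		"pr collector":        {"CROSS", "CODEREVIEW"},
		"pr extractor":        {"CROSS", "CODEREVIEW"},
		"pr convertor":        {"CROSS", "CODEREVIEW"},
		"pr review collector": {"CROSS", "CODEREVIEW"},
		"pr review extractor": {"CROSS", "CODEREVIEW"},
		"pr review convertor": {"CODEREVIEW"},
	}
	fmt.Println(violations(current, deps)) // [pr review collector -> pr extractor: missing CROSS]
	fmt.Println(violations(after, deps))   // []
}
```

Under these assumed edges, a `CROSS`-only blueprint would run the review collector without the PR extractor in the current assignment, while the planned assignment satisfies the invariant.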
   
   ## Related issues
   #3720 
   
   
   

