dosubot[bot] commented on issue #7516:
URL: https://github.com/apache/incubator-devlake/issues/7516#issuecomment-2132776650

   <!-- Greeting -->
   @gustavobini Hello! I'm here to help you with any bugs, questions, or 
contributions. I'll assist you while waiting for a human maintainer.
   
   <!-- Answer -->
   The `extractApiBuilds` task for Jenkins multi-branch jobs gets slower on 
each run because it reprocesses the entire raw table. To avoid this, change 
the task so that each execution handles only new or changed data. The 
`StatefulApiCollector` mechanism in the DevLake framework is designed for 
this kind of state tracking.
   
   Here's how you can modify the `collectMultiBranchJobApiBuilds` function:
   
   1. **Use the `StatefulApiCollector` to manage the state and track the last 
processed build.**
   2. **Modify the `collectMultiBranchJobApiBuilds` function to use this 
collector.**
   
   Here is the modified code:
   
   ```go
   func collectMultiBranchJobApiBuilds(taskCtx plugin.SubTaskContext) errors.Error {
        db := taskCtx.GetDal()
        data := taskCtx.GetData().(*JenkinsTaskData)
        logger := taskCtx.GetLogger()
   
        clauses := []dal.Clause{
                dal.Select("j.full_name,j.name,j.path,j.class,j.url"),
                dal.From("_tool_jenkins_jobs as j"),
                dal.Where(`j.connection_id = ? and j.class = ? and j._raw_data_table = ?`,
                        data.Options.ConnectionId, WORKFLOW_JOB, fmt.Sprintf("_raw_%s", RAW_JOB_TABLE)),
        }
        cursor, err := db.Cursor(clauses...)
        if err != nil {
                return err
        }
        defer cursor.Close()
   
        iterator, err := helper.NewDalCursorIterator(db, cursor, reflect.TypeOf(SimpleJob{}))
        if err != nil {
                return err
        }
   
        collectorWithState, err := helper.NewStatefulApiCollector(helper.RawDataSubTaskArgs{
                Params: JenkinsApiParams{
                        ConnectionId: data.Options.ConnectionId,
                        FullName:     data.Options.JobFullName,
                },
                Ctx:   taskCtx,
                Table: RAW_BUILD_TABLE,
        })
        if err != nil {
                return err
        }
   
        logger.Debug("About to call collectorWithState.InitCollector")
   
        err = collectorWithState.InitCollector(helper.ApiCollectorArgs{
                ApiClient:   data.ApiClient,
                Input:       iterator,
                UrlTemplate: "{{ .Input.Path }}api/json",
                Query: func(reqData *helper.RequestData) (url.Values, errors.Error) {
                        query := url.Values{}
                        treeValue := "allBuilds[timestamp,number,duration,building,estimatedDuration,fullDisplayName,result,actions[lastBuiltRevision[SHA1,branch[name]],remoteUrls,mercurialRevisionNumber,causes[*]],changeSet[kind,revisions[revision]]]"
                        query.Set("tree", treeValue)
   
                        logger.Debug("Query: %v", query)
   
                        return query, nil
                },
                ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
                        var data struct {
                                Builds []json.RawMessage `json:"allBuilds"`
                        }
                        err := helper.UnmarshalResponse(res, &data)
                        if err != nil {
                                return nil, err
                        }
   
                        builds := make([]json.RawMessage, 0, len(data.Builds))
                        for _, build := range data.Builds {
                                var buildObj map[string]interface{}
                                err := json.Unmarshal(build, &buildObj)
                                if err != nil {
                                        return nil, errors.Convert(err)
                                }
                                if buildObj["result"] != nil {
                                        builds = append(builds, build)
                                }
                        }
   
                        logger.Debug("Returning this number of builds: %v", len(builds))
                        return builds, nil
                },
                AfterResponse: ignoreHTTPStatus404,
        })
   
        if err != nil {
                return err
        }
   
        return collectorWithState.Execute()
   }
   ```
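
The filtering step inside `ResponseParser` can be tried without the DevLake helpers. The following is a minimal standalone sketch that uses only the Go standard library. The function name `filterFinishedBuilds` and the sample payload are hypothetical and not part of DevLake. It shows that the parser keeps a build only when its `result` field is set, which excludes builds that are still running.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterFinishedBuilds is a hypothetical helper (not a DevLake API). It parses
// a Jenkins "allBuilds" payload and keeps only the builds whose "result" field
// is present and non-null, i.e. builds that have finished.
func filterFinishedBuilds(body []byte) ([]json.RawMessage, error) {
	var payload struct {
		Builds []json.RawMessage `json:"allBuilds"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}

	finished := make([]json.RawMessage, 0, len(payload.Builds))
	for _, raw := range payload.Builds {
		// A pointer distinguishes a missing or null result from a real value.
		var build struct {
			Result *string `json:"result"`
		}
		if err := json.Unmarshal(raw, &build); err != nil {
			return nil, err
		}
		if build.Result != nil {
			finished = append(finished, raw)
		}
	}
	return finished, nil
}

func main() {
	// Made-up sample: build 2 is still running, build 1 has finished.
	body := []byte(`{"allBuilds":[{"number":2,"result":null},{"number":1,"result":"SUCCESS"}]}`)
	builds, err := filterFinishedBuilds(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(builds), string(builds[0]))
}
```

Decoding `result` into a `*string` avoids building a full `map[string]interface{}` for every build, but the effect is the same as the `buildObj["result"] != nil` check above.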
   
   This code uses the `StatefulApiCollector` to persist collection state 
between runs, which is the basis for processing only new or changed builds. 
Note what the two callbacks do: the `ResponseParser` keeps only builds whose 
`result` is set, so builds still in progress are skipped, and `AfterResponse` 
is set to `ignoreHTTPStatus404`, which, as its name suggests, ignores HTTP 404 
responses instead of failing the task. Neither callback filters out builds 
that were collected in an earlier run, so check the collector source to 
confirm that the stored state actually narrows each run 
<sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_extractor.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_collector.go)</sup>.
   
   
   

