dosubot[bot] commented on issue #7516: URL: https://github.com/apache/incubator-devlake/issues/7516#issuecomment-2132776650
@gustavobini Hello! I'm here to help you with any bugs, questions, or contributions. I'll assist you while waiting for a human maintainer.

To address the `extractApiBuilds` task in Jenkins multi-branch jobs slowing down with each run because it reprocesses the entire raw table, you can modify the task so that each execution only handles new or changed data. One way to do this is the `StatefulApiCollector` mechanism provided by the DevLake framework.

Here's how you can modify the `collectMultiBranchJobApiBuilds` function:

1. **Use the `StatefulApiCollector` to manage the state and track the last processed build.**
2. **Modify the `collectMultiBranchJobApiBuilds` function to use this collector.**

Here is the modified code:

```go
func collectMultiBranchJobApiBuilds(taskCtx plugin.SubTaskContext) errors.Error {
	db := taskCtx.GetDal()
	data := taskCtx.GetData().(*JenkinsTaskData)
	logger := taskCtx.GetLogger()

	// Select the workflow jobs that were collected for this connection.
	clauses := []dal.Clause{
		dal.Select("j.full_name,j.name,j.path,j.class,j.url"),
		dal.From("_tool_jenkins_jobs as j"),
		dal.Where(`j.connection_id = ? and j.class = ? and j._raw_data_table = ?`,
			data.Options.ConnectionId, WORKFLOW_JOB, fmt.Sprintf("_raw_%s", RAW_JOB_TABLE)),
	}
	cursor, err := db.Cursor(clauses...)
	if err != nil {
		return err
	}
	defer cursor.Close()

	// Each job row becomes one input for the collector.
	iterator, err := helper.NewDalCursorIterator(db, cursor, reflect.TypeOf(SimpleJob{}))
	if err != nil {
		return err
	}

	// The stateful collector persists collection state between runs.
	collectorWithState, err := helper.NewStatefulApiCollector(helper.RawDataSubTaskArgs{
		Params: JenkinsApiParams{
			ConnectionId: data.Options.ConnectionId,
			FullName:     data.Options.JobFullName,
		},
		Ctx:   taskCtx,
		Table: RAW_BUILD_TABLE,
	})
	if err != nil {
		return err
	}

	logger.Debug("About to call collectorWithState.InitCollector")
	err = collectorWithState.InitCollector(helper.ApiCollectorArgs{
		ApiClient:   data.ApiClient,
		Input:       iterator,
		UrlTemplate: "{{ .Input.Path }}api/json",
		Query: func(reqData *helper.RequestData) (url.Values, errors.Error) {
			query := url.Values{}
			// Request only the build fields that are needed.
			treeValue := "allBuilds[timestamp,number,duration,building,estimatedDuration,fullDisplayName,result,actions[lastBuiltRevision[SHA1,branch[name]],remoteUrls,mercurialRevisionNumber,causes[*]],changeSet[kind,revisions[revision]]]"
			query.Set("tree", treeValue)
			logger.Debug("Query: %v", query)
			return query, nil
		},
		ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
			// Named body to avoid shadowing the outer task data variable.
			var body struct {
				Builds []json.RawMessage `json:"allBuilds"`
			}
			err := helper.UnmarshalResponse(res, &body)
			if err != nil {
				return nil, err
			}
			builds := make([]json.RawMessage, 0, len(body.Builds))
			for _, build := range body.Builds {
				var buildObj map[string]interface{}
				err := json.Unmarshal(build, &buildObj)
				if err != nil {
					return nil, errors.Convert(err)
				}
				// Keep only builds that already have a result.
				if buildObj["result"] != nil {
					builds = append(builds, build)
				}
			}
			logger.Debug("Returning this number of builds: %v", len(builds))
			return builds, nil
		},
		AfterResponse: ignoreHTTPStatus404,
	})
	if err != nil {
		return err
	}

	return collectorWithState.Execute()
}
```

This code wraps the collection in a `StatefulApiCollector`, which persists collection state between runs so that later runs can be limited to new or changed builds.
The `ResponseParser` function skips builds whose `result` is still null (typically builds that are in progress); it does not by itself compare builds against the saved state. `AfterResponse` is set to `ignoreHTTPStatus404`, which, as the name suggests, lets the collector continue when a job URL returns HTTP 404. This approach relies on the existing DevLake framework to manage state, with the aim of avoiding reprocessing of the entire raw table <sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_extractor.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_collector.go)</sup>.
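
To make the "only new or changed builds" idea concrete, here is a minimal standalone sketch of a filter that could run inside a response parser. It is not DevLake code: `buildSummary`, `filterFinishedSince`, `sampleBuilds`, and the `sinceMillis` cutoff are hypothetical names introduced for illustration, and the sketch assumes the cutoff comes from whatever state the collector saved on the previous run. It relies only on the `timestamp` (milliseconds since the Unix epoch) and `result` fields requested in the `tree` query above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSummary holds the two fields this sketch inspects. Jenkins reports
// "timestamp" in milliseconds since the Unix epoch and leaves "result"
// null while a build has no result yet.
type buildSummary struct {
	Timestamp int64   `json:"timestamp"`
	Result    *string `json:"result"`
}

// filterFinishedSince is a hypothetical helper (not part of DevLake).
// It keeps builds that have a result and whose timestamp is strictly
// greater than sinceMillis.
func filterFinishedSince(builds []json.RawMessage, sinceMillis int64) ([]json.RawMessage, error) {
	kept := make([]json.RawMessage, 0, len(builds))
	for _, raw := range builds {
		var b buildSummary
		if err := json.Unmarshal(raw, &b); err != nil {
			return nil, err
		}
		if b.Result == nil {
			continue // no result yet, skip for now
		}
		if b.Timestamp <= sinceMillis {
			continue // at or before the cutoff, assumed already collected
		}
		kept = append(kept, raw)
	}
	return kept, nil
}

// sampleBuilds returns made-up build payloads for the demonstration.
func sampleBuilds() []json.RawMessage {
	return []json.RawMessage{
		json.RawMessage(`{"number":1,"timestamp":1000,"result":"SUCCESS"}`),
		json.RawMessage(`{"number":2,"timestamp":3000,"result":"FAILURE"}`),
		json.RawMessage(`{"number":3,"timestamp":4000,"result":null}`),
	}
}

func main() {
	kept, err := filterFinishedSince(sampleBuilds(), 2000)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(kept)) // prints 1: only build 2 is finished and newer than the cutoff
}
```

One caveat with a timestamp cutoff: a build that was still running during the previous collection has a timestamp before the cutoff, so it would be skipped once it finishes. A real implementation would likely need some margin on the cutoff or a comparison against build numbers already stored.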