dosubot[bot] commented on issue #7516: URL: https://github.com/apache/incubator-devlake/issues/7516#issuecomment-2132788285
To support incremental data collection in `collectMultiBranchJobApiBuilds`, you can change it to run the collection logic once for each job returned by the iterator, with each job keeping its own collector state. This can help limit processing to new or changed data.

### Modifying `collectMultiBranchJobApiBuilds`:

DevLake's stateful collector helpers persist collection state between runs. `NewStatefulApiCollectorForFinalizableEntity` does not accept an iterator directly, so one option is to loop over the jobs yourself and create a `NewStatefulApiCollector` for each job. Here's a modified version of `collectMultiBranchJobApiBuilds`:

```go
func collectMultiBranchJobApiBuilds(taskCtx plugin.SubTaskContext) errors.Error {
	db := taskCtx.GetDal()
	data := taskCtx.GetData().(*JenkinsTaskData)
	logger := taskCtx.GetLogger()

	// Select the multibranch (workflow) jobs collected for this connection
	clauses := []dal.Clause{
		dal.Select("j.full_name,j.name,j.path,j.class,j.url"),
		dal.From("_tool_jenkins_jobs as j"),
		dal.Where(`j.connection_id = ? and j.class = ? and j._raw_data_table = ?`,
			data.Options.ConnectionId, WORKFLOW_JOB, fmt.Sprintf("_raw_%s", RAW_JOB_TABLE)),
	}
	cursor, err := db.Cursor(clauses...)
	if err != nil {
		return err
	}
	defer cursor.Close()

	iterator, err := helper.NewDalCursorIterator(db, cursor, reflect.TypeOf(SimpleJob{}))
	if err != nil {
		return err
	}

	for iterator.HasNext() {
		// DevLake helper iterators expose Fetch(), not Next()
		item, err := iterator.Fetch()
		if err != nil {
			return err
		}
		job := item.(*SimpleJob)

		// One stateful collector per job; FullName in Params gives each job its own state
		collectorWithState, err := helper.NewStatefulApiCollector(helper.RawDataSubTaskArgs{
			Params: JenkinsApiParams{
				ConnectionId: data.Options.ConnectionId,
				FullName:     job.FullName,
			},
			Ctx:   taskCtx,
			Table: RAW_BUILD_TABLE,
		})
		if err != nil {
			return err
		}

		err = collectorWithState.InitCollector(helper.ApiCollectorArgs{
			ApiClient: data.ApiClient,
			// No Input here: passing the shared job iterator would make this
			// collector consume the remaining jobs, so build this job's URL directly.
			UrlTemplate: job.Path + "api/json",
			Query: func(reqData *helper.RequestData) (url.Values, errors.Error) {
				query := url.Values{}
				treeValue := "allBuilds[timestamp,number,duration,building,estimatedDuration,fullDisplayName,result,actions[lastBuiltRevision[SHA1,branch[name]],remoteUrls,mercurialRevisionNumber,causes[*]],changeSet[kind,revisions[revision]]]"
				query.Set("tree", treeValue)
				logger.Debug("Query: %v", query)
				return query, nil
			},
			ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
				var body struct {
					Builds []json.RawMessage `json:"allBuilds"`
				}
				if err := helper.UnmarshalResponse(res, &body); err != nil {
					return nil, err
				}
				builds := make([]json.RawMessage, 0, len(body.Builds))
				for _, build := range body.Builds {
					var buildObj map[string]interface{}
					if err := json.Unmarshal(build, &buildObj); err != nil {
						return nil, errors.Convert(err)
					}
					// Keep only finished builds; builds still running have a null result
					if buildObj["result"] != nil {
						builds = append(builds, build)
					}
				}
				logger.Debug("Returning this number of builds: %v", len(builds))
				return builds, nil
			},
			AfterResponse: ignoreHTTPStatus404,
		})
		if err != nil {
			return err
		}
		if err = collectorWithState.Execute(); err != nil {
			return err
		}
	}
	return nil
}
```

Note that this change alone does not reduce what Jenkins returns: the `tree` query still requests `allBuilds`, so every run fetches each job's full build history. To actually skip builds that were already collected, the `ResponseParser` would also need to drop builds older than the previous successful collection, for example by comparing each build's `timestamp` (epoch milliseconds in the Jenkins JSON API) with the collector's saved state.

### Cleaning or Archiving Raw Tables:

The sources this answer draws on say nothing specific about cleaning or archiving raw tables.
In general, it is good practice to clean or archive raw tables periodically so they do not grow without bound and degrade performance. A scheduled task can archive or delete old data according to your retention policy.

### Summary:

- Modify `collectMultiBranchJobApiBuilds` to run the collection logic once for each job returned by the iterator.
- Use `NewStatefulApiCollector` so that each job keeps its own collection state between runs.
- Consider a scheduled task that periodically cleans or archives raw tables.

Together, these changes should make collection more incremental and help keep the raw tables from growing unchecked <sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_collector.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_extractor.go)</sup>.