dosubot[bot] commented on issue #7516:
URL: https://github.com/apache/incubator-devlake/issues/7516#issuecomment-2132788285

   To collect builds incrementally in `collectMultiBranchJobApiBuilds`, you can create and run a separate stateful collector for each job the iterator returns. Each job then keeps its own collection state, so later runs can skip builds that were already collected.
   
   ### Modifying `collectMultiBranchJobApiBuilds`:
   `NewStatefulApiCollectorForFinalizableEntity` manages collection state and tracks the last processed build, but it does not support iterators directly. The version below works around this by looping over the iterator itself and creating a `NewStatefulApiCollector` for each job.
   
   Here's a modified version of the `collectMultiBranchJobApiBuilds` function:
   
   ```go
   func collectMultiBranchJobApiBuilds(taskCtx plugin.SubTaskContext) errors.Error {
        db := taskCtx.GetDal()
        data := taskCtx.GetData().(*JenkinsTaskData)
        logger := taskCtx.GetLogger()
   
        clauses := []dal.Clause{
                dal.Select("j.full_name,j.name,j.path,j.class,j.url"),
                dal.From("_tool_jenkins_jobs as j"),
                   dal.Where(`j.connection_id = ? and j.class = ? and j._raw_data_table = ?`,
                           data.Options.ConnectionId, WORKFLOW_JOB, fmt.Sprintf("_raw_%s", RAW_JOB_TABLE)),
        }
        cursor, err := db.Cursor(clauses...)
        if err != nil {
                return err
        }
        defer cursor.Close()
   
           iterator, err := helper.NewDalCursorIterator(db, cursor, reflect.TypeOf(SimpleJob{}))
        if err != nil {
                return err
        }
   
        for iterator.HasNext() {
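                   // Fetch returns the next row (a *SimpleJob) together with an error.
                   item, err := iterator.Fetch()
                   if err != nil {
                           return err
                   }
                   job := item.(*SimpleJob)
                   collectorWithState, err := helper.NewStatefulApiCollector(helper.RawDataSubTaskArgs{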
                        Params: JenkinsApiParams{
                                ConnectionId: data.Options.ConnectionId,
                                FullName:     job.FullName,
                        },
                        Ctx:   taskCtx,
                        Table: RAW_BUILD_TABLE,
                })
                if err != nil {
                        return err
                }
   
                err = collectorWithState.InitCollector(helper.ApiCollectorArgs{
                        ApiClient:   data.ApiClient,
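                           // Build the URL from the current job. Passing the shared iterator
                           // as Input would let the first collector drain the remaining jobs.
                           UrlTemplate: fmt.Sprintf("%sapi/json", job.Path),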
                           Query: func(reqData *helper.RequestData) (url.Values, errors.Error) {
                                   query := url.Values{}
                                   treeValue := "allBuilds[timestamp,number,duration,building,estimatedDuration,fullDisplayName,result,actions[lastBuiltRevision[SHA1,branch[name]],remoteUrls,mercurialRevisionNumber,causes[*]],changeSet[kind,revisions[revision]]]"
                                query.Set("tree", treeValue)
   
                                logger.Debug("Query: %v", query)
                                return query, nil
                        },
                           ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
                                   var data struct {
                                           Builds []json.RawMessage `json:"allBuilds"`
                                }
                                err := helper.UnmarshalResponse(res, &data)
                                if err != nil {
                                        return nil, err
                                }
   
                                   // Keep only finished builds: a build still running has a null result.
                                   builds := make([]json.RawMessage, 0, len(data.Builds))
                                for _, build := range data.Builds {
                                        var buildObj map[string]interface{}
                                        err := json.Unmarshal(build, &buildObj)
                                        if err != nil {
                                                return nil, errors.Convert(err)
                                        }
                                        if buildObj["result"] != nil {
                                                builds = append(builds, build)
                                        }
                                }
   
                                   logger.Debug("Returning this number of builds: %v", len(builds))
                                return builds, nil
                        },
                        AfterResponse: ignoreHTTPStatus404,
                })
   
                if err != nil {
                        return err
                }
   
                err = collectorWithState.Execute()
                if err != nil {
                        return err
                }
        }
   
        return nil
   }
   ```
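
   The filtering step inside `ResponseParser` can be tried in isolation. The sketch below is a standalone program, not DevLake code: `filterCompleted` is a hypothetical helper name, the payload is made up to resemble the `allBuilds` array, and it decodes `result` into a typed pointer instead of a map, assuming `result` is always either a string or null.

   ```go
   package main

   import (
   	"encoding/json"
   	"fmt"
   )

   // filterCompleted keeps only builds whose "result" field is non-null,
   // which is how the parser above distinguishes finished builds.
   func filterCompleted(all []json.RawMessage) ([]json.RawMessage, error) {
   	done := make([]json.RawMessage, 0, len(all))
   	for _, raw := range all {
   		var b struct {
   			Result *string `json:"result"`
   		}
   		if err := json.Unmarshal(raw, &b); err != nil {
   			return nil, err
   		}
   		if b.Result != nil {
   			done = append(done, raw)
   		}
   	}
   	return done, nil
   }

   func main() {
   	// Hypothetical payload shaped like the "allBuilds" response.
   	payload := []byte(`{"allBuilds":[
   		{"number":3,"result":null,"building":true},
   		{"number":2,"result":"SUCCESS","building":false},
   		{"number":1,"result":"FAILURE","building":false}]}`)

   	var body struct {
   		Builds []json.RawMessage `json:"allBuilds"`
   	}
   	if err := json.Unmarshal(payload, &body); err != nil {
   		panic(err)
   	}
   	done, err := filterCompleted(body.Builds)
   	if err != nil {
   		panic(err)
   	}
   	fmt.Println("completed builds:", len(done))
   }
   ```

   Skipping in-progress builds means they are not stored half-finished; they are expected to be picked up by a later run once Jenkins reports a result.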
   
   ### Cleaning or Archiving Raw Tables:
   Nothing in the code referenced here covers cleaning or archiving raw tables. Because raw tables otherwise grow indefinitely and can hurt performance, it is generally good practice to archive or delete old rows with a scheduled task that follows your retention policy.
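
   A retention-based cleanup could start from something like the sketch below. It only builds the statement and the cutoff time; it does not touch a database. The table name `_raw_jenkins_api_builds`, the `created_at` column, the 90-day window, and the helper name `purgeStatement` are all assumptions for illustration, so check them against your own schema before running anything.

   ```go
   package main

   import (
   	"fmt"
   	"time"
   )

   // purgeStatement returns a parameterized DELETE and the cutoff to bind to it.
   // The table name must come from a trusted constant, never from user input,
   // because it is interpolated into the SQL text.
   func purgeStatement(table string, now time.Time, retentionDays int) (string, time.Time) {
   	cutoff := now.AddDate(0, 0, -retentionDays)
   	stmt := fmt.Sprintf("DELETE FROM %s WHERE created_at < ?", table)
   	return stmt, cutoff
   }

   func main() {
   	// Assumed table name and retention window; adjust to your deployment.
   	now := time.Date(2024, time.May, 27, 0, 0, 0, 0, time.UTC)
   	stmt, cutoff := purgeStatement("_raw_jenkins_api_builds", now, 90)
   	fmt.Println(stmt)
   	fmt.Println("cutoff:", cutoff.Format(time.RFC3339))
   }
   ```

   Deleting raw rows removes the ability to re-run extraction over them, so archiving to another table first may be the safer choice.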
   
   ### Summary:
   - Modify `collectMultiBranchJobApiBuilds` to create and execute one collector for each job returned by the iterator.
   - Use `NewStatefulApiCollector` to manage state and track the last processed 
build.
   - Consider implementing a scheduled task to clean or archive raw tables 
periodically.
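
   The idea of per-job state from the summary can be illustrated without DevLake. This is a conceptual sketch only: `build`, `jobState`, and `newBuilds` are hypothetical names, the state lives in memory, and DevLake's own collector state may be stored and keyed differently.

   ```go
   package main

   import "fmt"

   // build is a hypothetical, trimmed-down build record.
   type build struct {
   	Number int
   	Result string
   }

   // jobState remembers the highest build number already collected per job.
   type jobState map[string]int

   // newBuilds returns the builds not seen before for the given job and
   // advances that job's state to the highest build number encountered.
   func (s jobState) newBuilds(job string, all []build) []build {
   	last := s[job]
   	maxSeen := last
   	fresh := make([]build, 0, len(all))
   	for _, b := range all {
   		if b.Number > last {
   			fresh = append(fresh, b)
   			if b.Number > maxSeen {
   				maxSeen = b.Number
   			}
   		}
   	}
   	s[job] = maxSeen
   	return fresh
   }

   func main() {
   	state := jobState{}
   	first := state.newBuilds("folder/job/main", []build{{1, "SUCCESS"}, {2, "FAILURE"}})
   	second := state.newBuilds("folder/job/main", []build{{1, "SUCCESS"}, {2, "FAILURE"}, {3, "SUCCESS"}})
   	fmt.Println(len(first), len(second))
   }
   ```

   Keying the state by job is what makes one collector per job useful: a new build on one branch does not force the other branches to be collected again.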
   
   This approach should make build collection incremental for each job and help keep raw tables from degrading system performance <sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_collector.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_extractor.go)</sup>.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
