dosubot[bot] commented on issue #7516:
URL:
https://github.com/apache/incubator-devlake/issues/7516#issuecomment-2132776650
@gustavobini Hello! I'm here to help you with any bugs, questions, or
contributions. I'll assist you while waiting for a human maintainer.
To address the issue of the `extractApiBuilds` task in Jenkins multi-branch
jobs getting incrementally slower due to reprocessing the entire raw table, you
can modify the task to only extract new or changed data during each execution.
This can be achieved by leveraging the `StatefulApiCollector` mechanism
provided by the DevLake framework.
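The general idea of processing only new data can be shown outside DevLake. The following is a minimal standalone sketch, not DevLake's API: `rawRow` and `newRowsSince` are hypothetical names, and timestamps are plain Unix seconds. It keeps a watermark (the latest timestamp already handled) and returns only rows newer than it.

```go
package main

import "fmt"

// rawRow is a hypothetical stand-in for one row of a raw-data table.
type rawRow struct {
	ID        int
	CreatedAt int64 // Unix seconds
}

// newRowsSince returns the rows created strictly after the watermark,
// together with the updated watermark (the latest CreatedAt seen so far).
func newRowsSince(rows []rawRow, watermark int64) ([]rawRow, int64) {
	fresh := make([]rawRow, 0, len(rows))
	next := watermark
	for _, r := range rows {
		if r.CreatedAt > watermark {
			fresh = append(fresh, r)
			if r.CreatedAt > next {
				next = r.CreatedAt
			}
		}
	}
	return fresh, next
}

func main() {
	rows := []rawRow{{ID: 1, CreatedAt: 100}, {ID: 2, CreatedAt: 200}}

	// First run: the watermark is zero, so every row counts as new.
	fresh, mark := newRowsSince(rows, 0)
	fmt.Println(len(fresh), mark)

	// Second run: one row has arrived since; only that row is returned.
	rows = append(rows, rawRow{ID: 3, CreatedAt: 300})
	fresh, mark = newRowsSince(rows, mark)
	fmt.Println(len(fresh), mark)
}
```

In a real task the watermark would be persisted between runs and the comparison would be pushed into the database query rather than done in memory.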
Here's how you can modify the `collectMultiBranchJobApiBuilds` function:
1. **Use the `StatefulApiCollector` to manage the state and track the last
processed build.**
2. **Modify the `collectMultiBranchJobApiBuilds` function to use this
collector.**
Here is the modified code:
```go
func collectMultiBranchJobApiBuilds(taskCtx plugin.SubTaskContext) errors.Error {
	db := taskCtx.GetDal()
	data := taskCtx.GetData().(*JenkinsTaskData)
	logger := taskCtx.GetLogger()

	// Select the workflow jobs of this connection that came from the raw job table.
	clauses := []dal.Clause{
		dal.Select("j.full_name,j.name,j.path,j.class,j.url"),
		dal.From("_tool_jenkins_jobs as j"),
		dal.Where(`j.connection_id = ? and j.class = ? and j._raw_data_table = ?`,
			data.Options.ConnectionId, WORKFLOW_JOB, fmt.Sprintf("_raw_%s", RAW_JOB_TABLE)),
	}
	cursor, err := db.Cursor(clauses...)
	if err != nil {
		return err
	}
	defer cursor.Close()

	// Each selected job becomes one input to the collector.
	iterator, err := helper.NewDalCursorIterator(db, cursor, reflect.TypeOf(SimpleJob{}))
	if err != nil {
		return err
	}

	collectorWithState, err := helper.NewStatefulApiCollector(helper.RawDataSubTaskArgs{
		Params: JenkinsApiParams{
			ConnectionId: data.Options.ConnectionId,
			FullName:     data.Options.JobFullName,
		},
		Ctx:   taskCtx,
		Table: RAW_BUILD_TABLE,
	})
	if err != nil {
		return err
	}

	logger.Debug("About to call collectorWithState.InitCollector")
	err = collectorWithState.InitCollector(helper.ApiCollectorArgs{
		ApiClient:   data.ApiClient,
		Input:       iterator,
		UrlTemplate: "{{ .Input.Path }}api/json",
		Query: func(reqData *helper.RequestData) (url.Values, errors.Error) {
			query := url.Values{}
			// Ask Jenkins for only the build fields that are needed.
			treeValue := "allBuilds[timestamp,number,duration,building,estimatedDuration,fullDisplayName,result,actions[lastBuiltRevision[SHA1,branch[name]],remoteUrls,mercurialRevisionNumber,causes[*]],changeSet[kind,revisions[revision]]]"
			query.Set("tree", treeValue)
			logger.Debug("Query: %v", query)
			return query, nil
		},
		ResponseParser: func(res *http.Response) ([]json.RawMessage, errors.Error) {
			var data struct {
				Builds []json.RawMessage `json:"allBuilds"`
			}
			err := helper.UnmarshalResponse(res, &data)
			if err != nil {
				return nil, err
			}
			// Keep only builds that already have a result.
			builds := make([]json.RawMessage, 0, len(data.Builds))
			for _, build := range data.Builds {
				var buildObj map[string]interface{}
				err := json.Unmarshal(build, &buildObj)
				if err != nil {
					return nil, errors.Convert(err)
				}
				if buildObj["result"] != nil {
					builds = append(builds, build)
				}
			}
			logger.Debug("Returning this number of builds: %v", len(builds))
			return builds, nil
		},
		AfterResponse: ignoreHTTPStatus404,
	})
	if err != nil {
		return err
	}

	return collectorWithState.Execute()
}
```
This code builds the collector with `StatefulApiCollector`, which the DevLake
framework provides for keeping collection state between runs. Two details are
worth checking against the code as written. The `ResponseParser` does not
filter out builds that have already been processed: it keeps only builds whose
`result` is non-null, which skips builds that are still in progress.
`AfterResponse` is set to `ignoreHTTPStatus404`, which, judging by its name,
ignores 404 responses rather than doing any post-processing. The snippet also
never reads the collector's saved state back into the query, so avoiding a full
reprocess of the raw table in `extractApiBuilds` would likely need a further
change that limits collection or extraction to data added since the last run
<sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_extractor.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/jenkins/tasks/build_collector.go)</sup>.
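The filtering step inside `ResponseParser` can be isolated as a small standalone function, which makes its behaviour easier to test. The sketch below uses only the Go standard library; `completedBuilds` is a hypothetical name, and the sample payload is made up for illustration, assuming Jenkins reports `"result": null` for a build that has not finished.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// completedBuilds parses a Jenkins-style response body and returns the raw
// JSON of every build whose "result" field is present and non-null.
func completedBuilds(body []byte) ([]json.RawMessage, error) {
	var payload struct {
		Builds []json.RawMessage `json:"allBuilds"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}
	done := make([]json.RawMessage, 0, len(payload.Builds))
	for _, raw := range payload.Builds {
		// A pointer stays nil when the field is null or missing.
		var build struct {
			Result *string `json:"result"`
		}
		if err := json.Unmarshal(raw, &build); err != nil {
			return nil, err
		}
		if build.Result != nil {
			done = append(done, raw)
		}
	}
	return done, nil
}

func main() {
	// Made-up sample payload: build 2 has no result yet.
	body := []byte(`{"allBuilds":[{"number":1,"result":"SUCCESS"},{"number":2,"result":null},{"number":3,"result":"FAILURE"}]}`)
	builds, err := completedBuilds(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(builds)) // prints 2
}
```

Decoding `result` into a `*string` avoids building a full `map[string]interface{}` for each build when only one field is inspected.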
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.