dosubot[bot] commented on issue #7508:
URL:
https://github.com/apache/incubator-devlake/issues/7508#issuecomment-2128826930
The `pull_requests` table may not contain every repository added to the
project after a successful pipeline run because the `CollectPrs` function
builds its collection scope from the `GithubApiParams` struct, which carries
the `ConnectionId` and `Name` parameters. If these parameters are set
incorrectly, or if a newly added repository is not part of the scope they
define, the pull requests for that repository are never collected and so never
reach the table.
Additionally, `CollectPrs` collects pull requests through a GraphQL query that
orders the results by `CREATED_AT` and pages through them with a cursor.
This approach can miss pull requests that are updated while collection is in
progress. Those pull requests are picked up on the next run, but until then
the table may be incomplete.
Here is the relevant code snippet from the `CollectPrs` function:
```go
func CollectPrs(taskCtx plugin.SubTaskContext) errors.Error {
	data := taskCtx.GetData().(*tasks.GithubTaskData)
	var err errors.Error
	apiCollector, err := api.NewStatefulApiCollector(api.RawDataSubTaskArgs{
		Ctx: taskCtx,
		Params: tasks.GithubApiParams{
			ConnectionId: data.Options.ConnectionId,
			Name:         data.Options.Name,
		},
		Table: RAW_PRS_TABLE,
	})
	if err != nil {
		return err
	}
	err = apiCollector.InitGraphQLCollector(api.GraphqlCollectorArgs{
		GraphqlClient: data.GraphqlClient,
		PageSize:      10,
		BuildQuery: func(reqData *api.GraphqlRequestData) (interface{}, map[string]interface{}, error) {
			query := &GraphqlQueryPrWrapper{}
			if reqData == nil {
				return query, map[string]interface{}{}, nil
			}
			ownerName := strings.Split(data.Options.Name, "/")
			variables := map[string]interface{}{
				"pageSize":   graphql.Int(reqData.Pager.Size),
				"skipCursor": (*graphql.String)(reqData.Pager.SkipCursor),
				"owner":      graphql.String(ownerName[0]),
				"name":       graphql.String(ownerName[1]),
			}
			return query, variables, nil
		},
		GetPageInfo: func(iQuery interface{}, args *api.GraphqlCollectorArgs) (*api.GraphqlQueryPageInfo, error) {
			query := iQuery.(*GraphqlQueryPrWrapper)
			return query.Repository.PullRequests.PageInfo, nil
		},
		ResponseParser: func(iQuery interface{}, variables map[string]interface{}) ([]interface{}, error) {
			query := iQuery.(*GraphqlQueryPrWrapper)
			prs := query.Repository.PullRequests.Prs
			for _, rawL := range prs {
				if apiCollector.GetSince() != nil &&
					!apiCollector.GetSince().Before(rawL.CreatedAt) {
					return nil, api.ErrFinishCollect
				}
			}
			return nil, nil
		},
	})
	if err != nil {
		return err
	}
	return apiCollector.Execute()
}
```
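To make the stop condition in `ResponseParser` easier to reason about, here is a simplified, standalone simulation. It is not DevLake code: the `pr` type, `fetchPage`, `collectSince`, and `samplePRs` are hypothetical, and it assumes the pages arrive newest-created first, which the snippet does not show directly. It illustrates how a cutoff on `CreatedAt` ends collection at the first older pull request, even when that pull request was updated after the cutoff.

```go
package main

import (
	"fmt"
	"time"
)

// pr is a hypothetical, trimmed-down pull request record.
type pr struct {
	Number    int
	CreatedAt time.Time
	UpdatedAt time.Time
}

// fetchPage mimics cursor pagination over an in-memory slice. It returns up to
// size items starting at cursor, plus the next cursor (-1 when exhausted).
func fetchPage(all []pr, cursor, size int) ([]pr, int) {
	if cursor >= len(all) {
		return nil, -1
	}
	end := cursor + size
	if end >= len(all) {
		return all[cursor:], -1
	}
	return all[cursor:end], end
}

// collectSince walks pages sorted newest-created first (an assumption) and
// stops at the first PR whose CreatedAt is not after since, using the same
// comparison as the snippet: !since.Before(createdAt).
func collectSince(all []pr, since time.Time, pageSize int) []int {
	var collected []int
	cursor := 0
	for cursor != -1 {
		page, next := fetchPage(all, cursor, pageSize)
		for _, p := range page {
			if !since.Before(p.CreatedAt) {
				return collected // plays the role of api.ErrFinishCollect
			}
			collected = append(collected, p.Number)
		}
		cursor = next
	}
	return collected
}

// samplePRs returns made-up data: PR #2 was created before the cutoff but
// updated after it.
func samplePRs() ([]pr, time.Time) {
	day := func(d int) time.Time { return time.Date(2024, 5, d, 0, 0, 0, 0, time.UTC) }
	prs := []pr{
		{Number: 4, CreatedAt: day(10), UpdatedAt: day(10)},
		{Number: 3, CreatedAt: day(8), UpdatedAt: day(8)},
		{Number: 2, CreatedAt: day(3), UpdatedAt: day(9)},
		{Number: 1, CreatedAt: day(1), UpdatedAt: day(1)},
	}
	return prs, day(5)
}

func main() {
	prs, since := samplePRs()
	fmt.Println("collected:", collectSince(prs, since, 2))
}
```

Running this prints `collected: [4 3]`. PR #2 is skipped by the creation-time cutoff even though it changed after `since`, so in this simplified model an incremental run does not refresh it.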
Ensure that `ConnectionId` and `Name` are set correctly for every repository
you want to track. Also keep the limitations of the current pagination and
ordering strategy in mind, since the collection logic may need to change
before it captures all relevant pull requests
<sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/github/tasks/pr_extractor.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/python/plugins/azuredevops/azuredevops/streams/pull_requests.py)</sup><sup>[[3]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/github_graphql/tasks/pr_collector.go)</sup>.
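One way to act on the first recommendation is to compare the repositories you expect against the scopes that are actually configured, and to confirm each `Name` has the `owner/repo` shape that `CollectPrs` splits on. The following is a minimal sketch in plain Go, not a DevLake API: `splitRepoName`, `missingRepos`, and the sample lists are hypothetical, and you would need to obtain the configured scope names from your own project setup.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRepoName parses a scope name of the form "owner/repo".
// Hypothetical helper: CollectPrs indexes the split result directly, so a
// malformed name would fail there; this check surfaces the problem earlier.
func splitRepoName(fullName string) (owner, repo string, err error) {
	parts := strings.Split(fullName, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("invalid repository name %q, want \"owner/repo\"", fullName)
	}
	return parts[0], parts[1], nil
}

// missingRepos returns the expected repositories that do not appear among the
// configured scope names. Names are compared case-insensitively, which is an
// assumption of this sketch.
func missingRepos(expected, configured []string) []string {
	seen := make(map[string]bool, len(configured))
	for _, name := range configured {
		seen[strings.ToLower(name)] = true
	}
	var missing []string
	for _, name := range expected {
		if !seen[strings.ToLower(name)] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Made-up example data.
	expected := []string{"apache/incubator-devlake", "example-org/new-service"}
	configured := []string{"apache/incubator-devlake"}

	for _, name := range configured {
		if _, _, err := splitRepoName(name); err != nil {
			fmt.Println("scope problem:", err)
		}
	}
	for _, name := range missingRepos(expected, configured) {
		fmt.Println("not in scope:", name)
	}
}
```

Any repository reported as not in scope would have no collection task of its own, which would match the symptom of its pull requests being absent from the table.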