dosubot[bot] commented on issue #7508: URL: https://github.com/apache/incubator-devlake/issues/7508#issuecomment-2128826930
The `pull_requests` table may be missing repositories that were added to the project, even after a successful pipeline run, because of how `CollectPrs` is scoped. The function builds its `GithubApiParams` from `data.Options.ConnectionId` and `data.Options.Name`, and it splits `Name` on `/` to obtain the owner and repository name for the GraphQL query. If these parameters are set incorrectly, or if a new repository is not included in the scope they define, the pull requests for that repository are not collected and never reach the table.

Pagination is a second possible cause. `CollectPrs` collects pull requests with a GraphQL query that orders results by `CREATED_AT` and uses cursor pagination. This approach may miss pull requests that are updated while collection is in progress. Those pull requests will be collected in the next run, but the table can be incomplete until then.

Here is the relevant code snippet from the `CollectPrs` function:

```go
func CollectPrs(taskCtx plugin.SubTaskContext) errors.Error {
	data := taskCtx.GetData().(*tasks.GithubTaskData)
	var err errors.Error
	// The collector is scoped by connection ID and repository name.
	apiCollector, err := api.NewStatefulApiCollector(api.RawDataSubTaskArgs{
		Ctx: taskCtx,
		Params: tasks.GithubApiParams{
			ConnectionId: data.Options.ConnectionId,
			Name:         data.Options.Name,
		},
		Table: RAW_PRS_TABLE,
	})
	if err != nil {
		return err
	}

	err = apiCollector.InitGraphQLCollector(api.GraphqlCollectorArgs{
		GraphqlClient: data.GraphqlClient,
		PageSize:      10,
		BuildQuery: func(reqData *api.GraphqlRequestData) (interface{}, map[string]interface{}, error) {
			query := &GraphqlQueryPrWrapper{}
			if reqData == nil {
				return query, map[string]interface{}{}, nil
			}
			// Name is split on "/" into owner and repository name.
			ownerName := strings.Split(data.Options.Name, "/")
			variables := map[string]interface{}{
				"pageSize":   graphql.Int(reqData.Pager.Size),
				"skipCursor": (*graphql.String)(reqData.Pager.SkipCursor),
				"owner":      graphql.String(ownerName[0]),
				"name":       graphql.String(ownerName[1]),
			}
			return query, variables, nil
		},
		GetPageInfo: func(iQuery interface{}, args *api.GraphqlCollectorArgs) (*api.GraphqlQueryPageInfo, error) {
			query := iQuery.(*GraphqlQueryPrWrapper)
			return query.Repository.PullRequests.PageInfo, nil
		},
		ResponseParser: func(iQuery interface{}, variables map[string]interface{}) ([]interface{}, error) {
			query := iQuery.(*GraphqlQueryPrWrapper)
			prs := query.Repository.PullRequests.Prs
			for _, rawL := range prs {
				// Finish once a pull request created at or before the "since" time is reached.
				if apiCollector.GetSince() != nil && !apiCollector.GetSince().Before(rawL.CreatedAt) {
					return nil, api.ErrFinishCollect
				}
			}
			return nil, nil
		},
	})
	if err != nil {
		return err
	}

	return apiCollector.Execute()
}
```

Ensure that the `ConnectionId` and `Name` parameters are set so that every repository you want to track is in scope. Also keep in mind the limitations of the current pagination and ordering strategy, and plan for potential updates to the collection logic so that all relevant pull requests are captured <sup>[[1]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/github/tasks/pr_extractor.go)</sup><sup>[[2]](https://github.com/apache/incubator-devlake/blob/main/backend/python/plugins/azuredevops/azuredevops/streams/pull_requests.py)</sup><sup>[[3]](https://github.com/apache/incubator-devlake/blob/main/backend/plugins/github_graphql/tasks/pr_collector.go)</sup>.
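The snippet above indexes `ownerName[1]` directly, so a `Name` value without a `/` would cause an index-out-of-range panic rather than a clear error. As a minimal sketch of how the "correctly set" check could look, the following uses a hypothetical helper, `splitRepoName`, which is not part of DevLake and is only an illustration; it assumes `Name` is expected in `owner/repo` form, as the split in `BuildQuery` suggests.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRepoName is a hypothetical helper (not part of DevLake). It splits a
// full repository name of the assumed form "owner/repo" and returns an error
// for any other shape instead of panicking on a missing element.
func splitRepoName(fullName string) (owner, repo string, err error) {
	parts := strings.Split(fullName, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("invalid repository name %q: expected \"owner/repo\"", fullName)
	}
	return parts[0], parts[1], nil
}

func main() {
	// Example inputs: one well-formed name and two malformed ones.
	names := []string{"apache/incubator-devlake", "incubator-devlake", "apache/"}
	for _, name := range names {
		owner, repo, err := splitRepoName(name)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("owner=%s repo=%s\n", owner, repo)
	}
}
```

A check of this kind, run before the query variables are built, would surface a misconfigured scope as an explicit error for that repository instead of a failed or silently empty collection.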