FlomoN opened a new pull request, #8616:
URL: https://github.com/apache/incubator-devlake/pull/8616
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### ⚠️ Pre Checklist
> Please complete _ALL_ items in this checklist, and remove before submitting
- [x ] I have read through the [Contributing
Documentation](https://devlake.apache.org/community/).
- [ ] I have added relevant tests. -> Will do
- [ ] I have added relevant documentation. -> Will do. I guess on the
website? Could you point me to the right file?
- [x] I will add labels to the PR, such as `pr-type/bug-fix`,
`pr-type/feature-development`, etc.
<!--
Thanks for submitting a pull request!
We appreciate you spending the time to work on these changes.
Please fill out as many sections below as possible.
-->
### Summary
This PR fixes the Pagination bug from #8615 and adds the ability to adjust
the collection paramaters of the Github GraphQL Job collector through
environment variables.
The bug was caused by a pagination implementation for workflow jobs in
`job_collector.go` that didn't play well together with the simultaneous
batching of multiple workflow runs. In `getPageInfo` the first workflow run
that had `HasNextPage` true returned it's `EndCursor` which was then used for
pagination for all Workflow runs in the batch, but only worked for the one it
came from, therefore missing all the pages of other workflow runs in the same
batch.
```go
func getPageInfo(query interface{}, args *helper.GraphqlCollectorArgs)
(*helper.GraphqlQueryPageInfo, error) {
queryWrapper := query.(*GraphqlQueryCheckRunWrapper)
hasNextPage := false
endCursor := ""
for _, node := range queryWrapper.Node {
if node.CheckSuite.CheckRuns.PageInfo.HasNextPage {
hasNextPage = true
endCursor =
node.CheckSuite.CheckRuns.PageInfo.EndCursor // <- This cursor will be used for
all workflow runs, since only one skipCursor variable exists.
break // <- Then stops after the first was found
}
}
return &helper.GraphqlQueryPageInfo{
EndCursor: endCursor,
HasNextPage: hasNextPage,
}, nil
}
```
Since some people reported having timeout issues with this plugin
(@ClaudioMascaro and @robaca) for large repositories with large workflows I
didn't want to take away either option. But combining pagination and batching
at the same time didn't seem feasible without adding a lot of hard to maintain
complexity to it (like keeping track of all produced EndCursors and submitting
them to following collection, but only for those workflow runs that had more
pages...).
I ended up implementing a configurable mode switch:
- Either you choose **BATCHING** mode, if you want a little bit better
performance while collecting to reduce graphQL API calls, when your Workflow
Runs dont have more than 20 Jobs (this number is also adjustable however).
- OR you choose **PAGINATING** mode, if your workflow runs typically have
more than 20 Jobs and would potentially cause timeouts of the graphql API when
trying to fetch all jobs at once.
I think Rate Limit wise it doesnt make a difference (If I understood that
point system correctly), since both ways would in total consume about the same
amount of points.
Since it was suggested in #8469 to have the batch size and page size
configurable via environment variable, I also added this to give more
flexibility to the users in finding the sweet spot for their setup.
### Does this close any open issues?
Closes #8615
### Screenshots
Include any relevant screenshots here.
### Other Information
Maybe it would also make sense to add these configuration options to Scope
Config and thus make them configurable on a per scope config basis, but this
would also need migration scripts and so on. So maybe an improvement for the
future?
I will add tests aswell in the following days, if this approach seems
appropriate to you?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]