sstojak1 opened a new issue, #7457:
URL: https://github.com/apache/incubator-devlake/issues/7457
## What and why to refactor
The
[pr_collector](https://github.com/apache/incubator-devlake/blob/5aac71130d12868f89074414c441e54cef51ca64/backend/plugins/bitbucket_server/tasks/pr_collector.go#L30)
for Bitbucket Server re-adds the same pull request data to
`RAW_PULL_REQUEST_TABLE` on every run instead of replacing it.
As a result, the `extractApiPullRequests` subtask gets slower with each run,
because it has to process every record in the raw table, duplicates included.
For instance, if a repository has 1,000 pull requests, then after 10 job runs
the raw table will contain 10,000 rows, and `extractApiPullRequests` will have
to process all of them.
## Describe the solution you'd like
1. Delete all records from the raw table and re-import the pull requests on
each job run. This would prevent duplicates and keep the
`extractApiPullRequests` task from slowing down.
2. Check how the Bitbucket plugin handles this, and reuse its logic if it
works better.
## Related issues
## Additional context
How to reproduce:
Run Collect Data for Bitbucket Server more than once and observe how the size
of the `_raw_bitbucket_server_api_pull_requests` table grows after each run.