sstojak1 opened a new issue, #7457: URL: https://github.com/apache/incubator-devlake/issues/7457
## What and why to refactor

The [pr_collector](https://github.com/apache/incubator-devlake/blob/5aac71130d12868f89074414c441e54cef51ca64/backend/plugins/bitbucket_server/tasks/pr_collector.go#L30) for Bitbucket Server adds the same data to RAW_PULL_REQUEST_TABLE again on every run. As a result, the extractApiPullRequests task slows down, because it has to process every record in the raw table, duplicates included. For instance, if a repository has 1,000 pull requests, the raw table will contain 10,000 rows after 10 job runs, and extractApiPullRequests will have to process each of them.

## Describe the solution you'd like

1. Delete all records from the raw table and import the pull requests again on each job run. This would prevent duplicates and keep the extractApiPullRequests task from slowing down.
2. Check how the Bitbucket plugin handles this, and reuse its logic if it is better.

## Additional context

How to recreate: Run Collect Data for Bitbucket Server more than once and observe the size of _raw_bitbucket_server_api_pull_requests table.