sstojak1 opened a new issue, #7457:
URL: https://github.com/apache/incubator-devlake/issues/7457

   
   ## What and why to refactor
   The [pr_collector](https://github.com/apache/incubator-devlake/blob/5aac71130d12868f89074414c441e54cef51ca64/backend/plugins/bitbucket_server/tasks/pr_collector.go#L30) for Bitbucket Server appends the same data to RAW_PULL_REQUEST_TABLE on every run.
   As a result, the extractApiPullRequests task slows down, because it has to process every record in the raw table, duplicates included.
   For instance, if a repository has 1,000 pull requests, then after 10 job runs the raw table will contain 10,000 rows, and extractApiPullRequests will have to process each of them.
   
   ## Describe the solution you'd like
   
   1. Delete all records from the raw table and re-import the PRs on each job run. This would prevent duplicates and keep the extractApiPullRequests task from slowing down.
   2. Check how the Bitbucket plugin handles this and reuse its logic if it is better.
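
A minimal sketch of the first option, under stated assumptions: the raw table is modeled as an in-memory slice, and `rawRecord`, `rawTable`, `deleteScope`, and `collectFull` are hypothetical names, not DevLake APIs. Scoping the delete to one repository is also an assumption; the proposal above only says to delete the records before re-importing.

```go
package main

import "fmt"

// rawRecord is a hypothetical stand-in for a row of the raw table,
// tagged with the scope (repository) it was collected for.
type rawRecord struct {
	Scope string
	PRID  int
}

// rawTable is an in-memory stand-in for the raw pull request table.
type rawTable struct {
	rows []rawRecord
}

// deleteScope removes every row that belongs to the given scope.
func (t *rawTable) deleteScope(scope string) {
	kept := t.rows[:0]
	for _, r := range t.rows {
		if r.Scope != scope {
			kept = append(kept, r)
		}
	}
	t.rows = kept
}

// collectFull clears the scope's old rows, then stores the fresh result,
// so repeated runs leave one row per pull request.
func (t *rawTable) collectFull(scope string, prIDs []int) {
	t.deleteScope(scope)
	for _, id := range prIDs {
		t.rows = append(t.rows, rawRecord{Scope: scope, PRID: id})
	}
}

func main() {
	table := &rawTable{}
	prIDs := []int{1, 2, 3}
	for run := 0; run < 10; run++ {
		table.collectFull("repo-a", prIDs)
	}
	fmt.Println(len(table.rows))
}
```

Running it prints `3`: ten runs still leave one row per pull request. The trade-off in this approach is that every run re-fetches all pull requests, so it fixes the extractor's workload but not the collector's.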
   
   ## Additional context
   
   How to reproduce:
   Run Collect Data for Bitbucket Server more than once and observe how the _raw_bitbucket_server_api_pull_requests table grows with each run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@devlake.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
