wouldd commented on issue #7826:
URL: 
https://github.com/apache/incubator-devlake/issues/7826#issuecomment-2343343281

   @klesh the problem does not happen consistently (which is part of the 
problem), so I don't think it obviously coincides with code changes. Rather, 
I suspect a subtle timing condition that depends on the Jira project itself 
and on how things happen to run in the code.
   I've been adding debug logging as best I can to flesh out my understanding 
of what's happening and I do wonder if there is a potential problem with the 
batch divider logic.
   My understanding is that the code batches DB writes by issue type into 
sets of 500 before each set is written in one go. The first time the code 
sees a given issue type, it creates an empty batch to start using, and at 
that point it calls delete on the database:
   
![image](https://github.com/user-attachments/assets/e644c204-460e-41a1-9557-3ea826cc0442)
   I'm seeing quite a few deletes against the same raw data table during the 
process, and it's not clear to me that each delete is scoped to just the batch 
being created. I'm wondering if there is a scenario in which data has already 
been written by one batch when another batch is created, and that creation 
triggers a wipe of the data that was already written?
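
   The scenario asked about above can be sketched with a toy model. This is 
not DevLake's code: `ToyBatchDivider`, its parameters, and the `scoped_delete` 
switch are hypothetical names made up for illustration, and whether the real 
delete is scoped is exactly the open question. The sketch only shows that, if 
the delete were unscoped, rows flushed by one batch would be lost when a batch 
for a new issue type is created later.

```python
class ToyBatchDivider:
    """Toy model of the suspected behaviour (hypothetical, not DevLake's code)."""

    def __init__(self, table, batch_size, scoped_delete):
        self.table = table                  # list of (issue_type, payload) rows
        self.batch_size = batch_size        # 500 in the description; small here
        self.scoped_delete = scoped_delete  # assumption under test
        self.batches = {}                   # issue_type -> pending rows

    def add(self, issue_type, payload):
        if issue_type not in self.batches:
            # First time this type is seen: create its batch and delete old rows.
            self.batches[issue_type] = []
            if self.scoped_delete:
                self.table[:] = [r for r in self.table if r[0] != issue_type]
            else:
                self.table.clear()          # unscoped: wipes rows of every type
        batch = self.batches[issue_type]
        batch.append((issue_type, payload))
        if len(batch) >= self.batch_size:
            self.flush(issue_type)

    def flush(self, issue_type):
        # Write the pending rows in one go and start a fresh batch.
        self.table.extend(self.batches[issue_type])
        self.batches[issue_type] = []

    def close(self):
        # Flush whatever is still pending for every type.
        for issue_type in list(self.batches):
            self.flush(issue_type)


def run(scoped_delete):
    table = [("Bug", "stale-row")]          # leftover from a previous run
    divider = ToyBatchDivider(table, batch_size=2, scoped_delete=scoped_delete)
    for issue_type, payload in [("Bug", "B-1"), ("Bug", "B-2"), ("Story", "S-1")]:
        divider.add(issue_type, payload)
    divider.close()
    return sorted(payload for _, payload in table)


if __name__ == "__main__":
    print(run(scoped_delete=False))  # Bug rows flushed earlier are gone
    print(run(scoped_delete=True))   # all fresh rows survive
```

In the unscoped run, the two `Bug` rows are flushed before `Story` is first 
seen, so creating the `Story` batch removes them. In the scoped run, only the 
stale row is removed.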
   
   In general, my observation is that the structure of these raw data tables 
forces a situation where there is no unique identifier for a given issue 
payload. Maybe I'm misreading things, but it seems there would be no need to 
purge this table ahead of a full refresh if the id were based on the unique 
Jira issue id: the code could just do a create-or-update, which would mean 
you'd never have weird gaps when the data is dropped.
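
   As a minimal sketch of the create-or-update idea, assuming a raw table 
keyed by the Jira issue id: the table name `raw_issues`, its columns, and the 
helper names below are hypothetical, and SQLite (3.24 or newer, for 
`ON CONFLICT ... DO UPDATE`) stands in for whatever database DevLake is 
configured with. A refresh then overwrites rows in place, with no purge step.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical raw table keyed by issue id.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_issues ("
        " issue_id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL)"
    )
    return conn


def upsert_issue(conn, issue_id, payload):
    # Insert the row, or overwrite the payload if the issue id already exists.
    conn.execute(
        "INSERT INTO raw_issues (issue_id, payload) VALUES (?, ?) "
        "ON CONFLICT(issue_id) DO UPDATE SET payload = excluded.payload",
        (issue_id, payload),
    )


def all_rows(conn):
    return conn.execute(
        "SELECT issue_id, payload FROM raw_issues ORDER BY issue_id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    upsert_issue(conn, 10001, '{"status": "Open"}')
    upsert_issue(conn, 10002, '{"status": "Open"}')
    # A later full refresh sees issue 10001 again with new content.
    upsert_issue(conn, 10001, '{"status": "Done"}')
    print(all_rows(conn))
```

One trade-off of this approach is that issues deleted in Jira would linger in 
the table unless they are removed some other way, which a purge before a full 
refresh handles implicitly.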
   
   I will say that, having instrumented the code and switched on debug 
logging, I have not caught a failure scenario. That could be bad luck, or it 
could be that the act of logging more has shifted the timing enough to make 
the problem less likely.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
