wouldd commented on issue #7826: URL: https://github.com/apache/incubator-devlake/issues/7826#issuecomment-2378969448
@d4x1 I alluded to this observation in an earlier comment. The current implementation must delete all the contents of the raw tables before repopulating them, because it uses randomly generated primary keys, so if anything goes wrong during the process you can wind up with no data. I'm not 100% certain, but I believe there are cases where an SQL deadlock error in a batch save causes a failure that gets swallowed.

So I updated the code to let each plugin specify the JSON path to the unique id of the object being retrieved from the remote system. Everything pulled into the raw tables always has a fairly obvious unique value from that system (often it's called `id`). I basically set a const on each plugin's objects defining what that field is for the object's payload, and then used its value as the key when storing data in the raw tables. That means the collector can always just do a createOrUpdate, so regardless of any transitory issues I never run into previously fetched data disappearing. I also added explicit deadlock detection/retry logic around those createOrUpdate calls.

I'll note that I also had to upgrade Grafana, because the version you were using had a bug handling true uint64 values even though the tables were already defined that way. I confess I never completely identified the exact code path that was causing my problems; I just reworked things into an architecture that seemed more appropriate to me and avoided the entire need to delete things as part of any refresh.