klesh commented on issue #6763:
URL: https://github.com/apache/incubator-devlake/issues/6763#issuecomment-1909636977
Hi, sorry for replying so late.

# TLDR:
Upgrade to v0.19 or later.

# Long version:
All tables whose names start with `_raw` are designed for the Incremental Collection feature. Each one contains all historical records of an Entity from the API responses, in this case the Gitlab `job`. Historical means that a specific `job` might be collected multiple times, and the latest collected record (the one with the highest `id` in `_raw_github_api_jobs`) ends up in the `_tool_github_jobs` table.

For APIs that support filtering records by an `updated at` timestamp, this works fine and produces few duplicate records, if any. However, the Gitlab jobs API doesn't support such a filter, which leads to the problem you are all facing.

So I took another route to avoid the problem: https://github.com/apache/incubator-devlake/pull/5889

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
