klesh commented on issue #6763:
URL: 
https://github.com/apache/incubator-devlake/issues/6763#issuecomment-1909636977

   Hi, sorry for replying so late.
   
   # TLDR: upgrade to v0.19 or later
   
   
   # Long version:
   All tables whose names start with `_raw` exist to support the Incremental 
Collection feature. Each one holds every historical record of an entity from 
the API responses, in this case the GitLab `job`. "Historical" means that a 
specific `job` might be collected multiple times, and only the latest collected 
record (the one with the highest `id` in `_raw_github_api_jobs`) ends up in the 
`_tool_github_jobs` table.
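
As a rough illustration of the "latest record wins" rule described above, here is a minimal Python sketch. It is not DevLake's actual extraction code; the row layout and the `job_id` and `status` fields are hypothetical and chosen only for the example.

```python
from typing import Dict, List

# Hypothetical raw rows: `id` is the auto-increment id of the raw row,
# `job_id` identifies the job, `status` stands in for the API payload.
raw_rows: List[dict] = [
    {"id": 1, "job_id": 100, "status": "running"},
    {"id": 2, "job_id": 101, "status": "success"},
    {"id": 3, "job_id": 100, "status": "success"},  # job 100 collected again
]


def latest_per_job(rows: List[dict]) -> Dict[int, dict]:
    """Keep, for each job, only the raw row with the highest `id`."""
    latest: Dict[int, dict] = {}
    for row in rows:
        current = latest.get(row["job_id"])
        if current is None or row["id"] > current["id"]:
            latest[row["job_id"]] = row
    return latest


# One row per job survives, even though job 100 appears twice in the raw rows.
tool_rows = latest_per_job(raw_rows)
```

Here the raw side keeps all three rows, while the result keeps only two, with job 100 represented by its most recently collected row.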
   For APIs that support filtering records by an `updated at` timestamp, this 
design works fine and produces few duplicate records, if any.
   However, the GitLab jobs API doesn't support such a filter, so each run 
collects jobs it already has again, which leads to the problem you are all 
facing.
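
To make the difference concrete, here is a toy Python model of how a raw table grows over repeated collections with and without an `updated at` filter. It is only a sketch under simplifying assumptions: the `collect` helper, the integer timestamps and the in-memory lists are all hypothetical, and no job changes between runs.

```python
from typing import List, Optional


def collect(api_records: List[dict], raw_table: List[dict],
            since: Optional[int] = None) -> None:
    """Append fetched records to raw_table.

    `since` models an API-side `updated at` filter: when it is set,
    records not updated after it are never returned by the API.
    """
    for rec in api_records:
        if since is not None and rec["updated_at"] <= since:
            continue
        raw_table.append(dict(rec))


# Five jobs with toy integer timestamps; none of them changes between runs.
jobs = [{"job_id": i, "updated_at": i} for i in range(1, 6)]

with_filter: List[dict] = []
without_filter: List[dict] = []
since: Optional[int] = None

for _ in range(3):  # three collection runs
    collect(jobs, with_filter, since=since)  # API supports the filter
    collect(jobs, without_filter)            # API has no such filter
    since = max(job["updated_at"] for job in jobs)
```

After three runs the filtered raw table holds 5 rows, while the unfiltered one holds 15, because every run appends all five jobs again.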
   So I took another route to avoid the problem: 
https://github.com/apache/incubator-devlake/pull/5889 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

