klesh opened a new issue, #2117: URL: https://github.com/apache/incubator-devlake/issues/2117
## Description

To support differential data collection for `changelogs`, we filter issues by their `updated` date and select only those where `updated` > `changelog_updated`. This way we collect only the new portion of the data instead of running a full collection, which reduces collection time drastically.

However, this approach depends on extra fields on the `issues` table, which is not elegant: we have to update `changelog_updated` whenever issue changelogs are collected successfully, and because `api_extractor` does not support partial updates, we have to call `db.Update` directly in `changelog_collector`. That side-effect operation makes the collector less portable.

Issue #1711 tried to remove those fields and calculate `changelog_updated` dynamically from the `jira_changelogs` table. @mindlesscloud later showed that this is unreliable: collection or extraction might fail and leave some data missing. We agreed to keep the original approach until we decide otherwise.

The same problem applies to `remotelinks`/`worklogs`.

## Describe the solution you'd like

1. Close #1711
2. Make `api_collector` / `extractor` and other helpers support partial updates by introducing a special struct, `PartialUpdate`
3. Update Jira worklog/remotelink/changelog to use `PartialUpdate` instead of calling `db` directly
