klesh opened a new issue, #2117:
URL: https://github.com/apache/incubator-devlake/issues/2117

   ## Description
   
   To support differential data collection for `changelogs`, we filter issues
by their `updated` date and select only those where `updated` >
`changelog_updated`. That way we collect only the new portion of the data
instead of running a full collection, which reduces collection time
drastically. However, this approach depends on extra fields on the `issues`
table, which is not elegant: we have to update `changelog_updated` whenever
issue changelogs are collected successfully, and because `api_extractor` does
not support partial updates, we have to call `db.Update` directly in
`changelog_collector`. That side effect makes the collector less portable.
   
   Issue #1711 tried to remove those fields and calculate `changelog_updated`
dynamically from the `jira_changelogs` table. @mindlesscloud later showed that
this is unreliable: collection or extraction might fail and leave data
missing. We agreed to keep the original approach until we decide otherwise.
   
   The same problem applies to `remotelinks` and `worklogs`.
   
   
   ## Describe the solution you'd like
   1. Close #1711 
   2. Make `api_collector` / `extractor` and other helpers support partial
updates by introducing a special struct, `PartialUpdate`
   3. Update Jira worklog/remotelink/changelog to use `PartialUpdate` instead
of calling `db` directly
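
   One possible shape for the proposal, sketched in Go. Only the name
`PartialUpdate` comes from this issue. Its fields, the `Store` interface, the
`Persist` helper, and the column names are assumptions made for illustration,
not an existing DevLake API.

```go
package main

import "fmt"

// PartialUpdate is a hypothetical result type: instead of a full record, an
// extractor returns the columns to change and the rows they apply to.
type PartialUpdate struct {
	Table  string
	Where  map[string]interface{} // row selector, e.g. primary key columns
	Fields map[string]interface{} // columns to overwrite
}

// Store is a hypothetical stand-in for the database layer used by the helpers.
type Store interface {
	Save(record interface{}) error
	UpdateColumns(table string, where, fields map[string]interface{}) error
}

// Persist sketches what a helper could do with extractor results: apply
// PartialUpdate values as column updates and save everything else whole.
func Persist(s Store, results []interface{}) error {
	for idx, r := range results {
		var err error
		switch v := r.(type) {
		case PartialUpdate:
			err = s.UpdateColumns(v.Table, v.Where, v.Fields)
		default:
			err = s.Save(r)
		}
		if err != nil {
			return fmt.Errorf("persist result %d: %w", idx, err)
		}
	}
	return nil
}

// logStore is an in-memory Store that only records which calls were made.
type logStore struct{ calls []string }

func (l *logStore) Save(record interface{}) error {
	l.calls = append(l.calls, "save")
	return nil
}

func (l *logStore) UpdateColumns(table string, where, fields map[string]interface{}) error {
	l.calls = append(l.calls, "update:"+table)
	return nil
}

func main() {
	s := &logStore{}
	results := []interface{}{
		struct{ ID int }{ID: 1}, // a full record, saved as usual
		PartialUpdate{
			Table:  "issues",
			Where:  map[string]interface{}{"issue_id": 10001},
			Fields: map[string]interface{}{"changelog_updated": "2022-06-01T00:00:00Z"},
		},
	}
	if err := Persist(s, results); err != nil {
		panic(err)
	}
	fmt.Println(s.calls)
}
```

With something like this, the Jira collectors would return a `PartialUpdate`
alongside their regular results, and the write would stay inside the helper
rather than in the collector itself.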
   

