keon94 opened a new issue, #3123:
URL: https://github.com/apache/incubator-devlake/issues/3123

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar feature requirement.
   
   
   ### Description
   
   Do a POC on adding support for streamed, incremental data collection using the [Singer spec](https://www.singer.io/). The existing incremental-update support does not protect against a failure or error partway through the collection process. When one occurs, the user has to restart the collection from the beginning.
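A minimal sketch of the message stream the Singer spec defines, in Python because Singer's reference taps are Python. Under the spec, a tap writes one JSON message per line. A `SCHEMA` message describes a stream, a `RECORD` message carries one row, and a `STATE` message carries an opaque bookmark that collection can later resume from. The `issues` stream, the `last_page` bookmark, and the `write_message`/`run_tap` helpers are hypothetical and chosen only for illustration. This is not DevLake's API or the singer-python library's API.

```python
import io
import json


def write_message(out, message):
    """Serialize one Singer-style message as a single JSON line (hypothetical helper)."""
    out.write(json.dumps(message) + "\n")


def run_tap(out, pages, state):
    """Emit a SCHEMA message, then RECORD messages and a STATE checkpoint per page.

    `pages` is a list of pages, each a list of records. Collection starts at the
    page after state["last_page"], so a saved state lets a later run resume.
    """
    write_message(out, {
        "type": "SCHEMA",
        "stream": "issues",
        "schema": {"type": "object", "properties": {"id": {"type": "integer"}}},
        "key_properties": ["id"],
    })
    start = state.get("last_page", -1) + 1
    for page_no in range(start, len(pages)):
        for record in pages[page_no]:
            write_message(out, {"type": "RECORD", "stream": "issues", "record": record})
        # Checkpoint only after every record of the page has been emitted.
        write_message(out, {"type": "STATE", "value": {"last_page": page_no}})


buf = io.StringIO()
pages = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
run_tap(buf, pages, state={})
types = [json.loads(line)["type"] for line in buf.getvalue().splitlines()]
```

Because a `STATE` message follows only a fully emitted page, restarting with `state={"last_page": 0}` re-emits only the second page.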
   
   ### Use case
   
   The goal is to make the collection phase of DevLake plugins resilient, and to do so using a standard. Currently, if collection is interrupted partway through, there is no way to resume from the point of interruption. This becomes a problem when pulling data from large data sources. We can address it by recording incremental state as collection progresses. For example, if a collector has to make 10 API calls to perform 10 DB writes and the 5th call fails, the next attempt should resume from the 5th call rather than starting again from the 1st.
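A minimal sketch of the resume behaviour described above. It assumes a collector that fetches numbered pages and performs one DB write per page. The `collect`, `fetch_page`, and `write_page` functions and the `TransientError` class are hypothetical. The in-memory `state` dict stands in for state that a real implementation would persist durably, for example as the last Singer `STATE` value.

```python
class TransientError(Exception):
    """Stand-in for a network or API failure during collection (hypothetical)."""


def collect(fetch_page, write_page, state, total_pages=10):
    """Resume after state["last_page"], checkpointing after each committed write.

    Pages are numbered 1..total_pages. `state` is an in-memory stand-in for
    durably persisted collection state.
    """
    start = state.get("last_page", 0) + 1
    for page in range(start, total_pages + 1):
        rows = fetch_page(page)      # one API call
        write_page(page, rows)       # one DB write
        state["last_page"] = page    # advance the checkpoint only after the write
    return state


calls, written = [], []
fail_on = {5}  # the 5th API call fails once, then succeeds on retry


def fetch_page(page):
    calls.append(page)
    if page in fail_on:
        fail_on.discard(page)
        raise TransientError(f"API call {page} failed")
    return [f"row-{page}"]


def write_page(page, rows):
    written.append(page)


state = {}
try:
    collect(fetch_page, write_page, state)   # first attempt stops at page 5
except TransientError:
    pass
state_after_failure = dict(state)
collect(fetch_page, write_page, state)       # second attempt resumes at page 5
```

The checkpoint advances only after a write succeeds. A crash between the write and the checkpoint would therefore repeat that page on the next attempt, so writes need to be idempotent (for example, upserts keyed on the record ID).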
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

