keon94 opened a new issue, #3123: URL: https://github.com/apache/incubator-devlake/issues/3123
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

Do a POC on adding support for streamed, incremental data collection using the [Singer spec](https://www.singer.io/). The existing incremental update support does not protect against a failure or error partway through the collection process, which forces the user to start all over again.

### Use case

The goal is to add resilience to the collection phase of devlake plugins, and to do so using a standard. Currently, if collection is interrupted partway through, we have no means of recovering or resuming from the point of interruption. This becomes a problem when pulling data from large data sources.

We can alleviate this problem by recording incremental state as the collection takes place. If, say, a given collector has to make 10 API calls to perform 10 DB writes, and a failure happens on the 5th invocation, our next attempt should pick the collection back up from that step.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
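To illustrate the use case described in this issue, here is a minimal sketch of checkpointed collection using Singer-style `RECORD` and `STATE` messages. It is not DevLake's implementation: the `collect` and `last_state` helpers, the `issues` stream name, the `next_page` bookmark key, and the in-memory `sink` are all hypothetical names chosen for the example. The state is advanced only after a page's records have been written, so a retry re-fetches the failed page rather than skipping it.

```python
import json
from typing import Callable, Dict, List


def collect(fetch_page: Callable[[int], List[dict]],
            total_pages: int,
            state: Dict[str, int],
            sink: List[str]) -> Dict[str, int]:
    """Collect pages, appending a STATE message after each page is written."""
    start = state.get("next_page", 0)  # hypothetical bookmark key
    for page in range(start, total_pages):
        records = fetch_page(page)  # may raise on a network error
        for record in records:
            sink.append(json.dumps(
                {"type": "RECORD", "stream": "issues", "record": record}))
        # Checkpoint only after the page's records are safely written.
        state = {"next_page": page + 1}
        sink.append(json.dumps({"type": "STATE", "value": state}))
    return state


def last_state(sink: List[str]) -> Dict[str, int]:
    """Return the most recent STATE value, or an empty state if none exists."""
    for line in reversed(sink):
        message = json.loads(line)
        if message["type"] == "STATE":
            return message["value"]
    return {}


def make_flaky_fetch(fail_on_page: int):
    """Build a fake API that fails once on the given page, then succeeds."""
    already_failed = {"value": False}
    calls: List[int] = []

    def fetch(page: int) -> List[dict]:
        calls.append(page)
        if page == fail_on_page and not already_failed["value"]:
            already_failed["value"] = True
            raise ConnectionError(f"simulated failure on page {page}")
        return [{"id": page}]

    return fetch, calls


# Simulate 10 API calls where the 5th one (page index 4) fails once.
sink: List[str] = []
fetch, calls = make_flaky_fetch(fail_on_page=4)
try:
    collect(fetch, 10, {}, sink)
except ConnectionError:
    pass  # the run is interrupted; pages 0-3 are already checkpointed

# The next attempt resumes from the last checkpoint instead of page 0.
resume_from = last_state(sink)
final_state = collect(fetch, 10, resume_from, sink)
```

In this sketch the second run makes only six calls (pages 4 through 9) rather than repeating all ten. A real collector would persist the state somewhere durable, such as a database row or a state file, instead of reading it back from an in-memory list.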
