Startrekzky opened a new issue, #6853: URL: https://github.com/apache/incubator-devlake/issues/6853
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar feature requirement.

### Use case

As a user who has large repos with more than 100,000 commits, I'd like to have incremental sync when collecting Git data. Currently, every pipeline takes more than 5 hours to collect data. That makes it difficult to utilize DevLake in my org.

### Description

Support incremental sync in the GitExtractor plugin. Specifically:

| Entity | Sync Mode | Cursor Field |
| ------ | --------- | ------------ |
| repos | Full refresh. There's no need to be incremental | N/A |
| refs | Full refresh. There's no create/update date of the ref as far as I know | N/A |
| commits | Incremental | committed_date. It seems to make more sense than commits.authored_date |
| commit_files | Incremental | committed_date. Update the commit_files of the new commits. |

### Related issues

https://github.com/apache/incubator-devlake/issues/6138

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
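
To illustrate the cursor-based sync proposed in the Description, here is a minimal sketch in Python. It is not GitExtractor code: `Commit`, `select_new_commits`, and `advance_cursor` are hypothetical names invented for this example, and the stored cursor is assumed to be the latest `committed_date` seen in the previous run.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified commit record (not DevLake's model)."""
    sha: str
    committed_date: datetime


def select_new_commits(commits: Iterable[Commit],
                       cursor: Optional[datetime]) -> List[Commit]:
    """Return the commits to collect in this run.

    With no cursor (first run) everything is collected, which is
    equivalent to a full refresh. Otherwise only commits whose
    committed_date is strictly later than the cursor are kept.
    """
    if cursor is None:
        return list(commits)
    return [c for c in commits if c.committed_date > cursor]


def advance_cursor(cursor: Optional[datetime],
                   synced: Iterable[Commit]) -> Optional[datetime]:
    """Return the new cursor: the latest committed_date seen so far."""
    dates = [c.committed_date for c in synced]
    if cursor is not None:
        dates.append(cursor)
    return max(dates) if dates else None


def day(d: int) -> datetime:
    """Helper for the example: a UTC timestamp in January 2024."""
    return datetime(2024, 1, d, tzinfo=timezone.utc)


if __name__ == "__main__":
    history = [Commit("a", day(1)), Commit("b", day(2)), Commit("c", day(3))]

    # First run: no cursor yet, so all commits are collected.
    first = select_new_commits(history, None)
    cursor = advance_cursor(None, first)

    # A new commit lands; the second run collects only that commit.
    history.append(Commit("d", day(4)))
    second = select_new_commits(history, cursor)
    print([c.sha for c in second])
```

The same selection would drive `commit_files`, since only the files of newly selected commits need updating. One caveat of this sketch: committer dates in Git are not guaranteed to be unique or monotonic (for example, after merging an older branch), so a strict `>` comparison can miss commits. A real implementation would likely also need to deduplicate by SHA.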
