oliviertassinari opened a new issue, #8543:
URL: https://github.com/apache/incubator-devlake/issues/8543

   ### Search before asking
   
   - [x] I have searched the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar feature request.
   
   
   ### Use case
   
   Get up-to-date GitHub data. We want to track our SLA on how fast we can 
close issues, so we need to sync every hour.
   
   ### Description
   
   It seems that DevLake currently syncs with GitHub by iterating over all 
the objects and copying them into a MySQL database. In our case, it takes 6 
hours to sync 2 years' worth of history, so we sync every 8 hours. This is 
too slow for the hourly sync we need.
   
   Could we also rely on the GitHub Events API to collect events in near real 
time and update the data models accordingly? This way we could have:
   
   1. Real-time data
   2. A daily sync that collects all the data to fix any potential skew from 1.
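
To make the two-step idea concrete, here is a minimal sketch in Python. It is not how DevLake works today: `Store`, `apply_event`, and `reconcile` are hypothetical names, the store is a plain in-memory dict standing in for the MySQL tables, and the event is assumed to look like a GitHub `IssuesEvent` (a `type` field plus a `payload.issue` object carrying `number`, `state`, and `updated_at`).

```python
from typing import Any

Issue = dict[str, Any]
Store = dict[int, Issue]  # hypothetical stand-in for the issues table, keyed by issue number


def apply_event(store: Store, event: dict[str, Any]) -> bool:
    """Step 1: upsert the issue carried by an IssuesEvent. Returns True if the store changed."""
    if event.get("type") != "IssuesEvent":
        return False  # other event types are ignored in this sketch
    issue = event["payload"]["issue"]
    current = store.get(issue["number"])
    # Skip stale or duplicate deliveries. Assumes ISO 8601 UTC timestamps
    # in one format, which compare chronologically as plain strings.
    if current is not None and current["updated_at"] >= issue["updated_at"]:
        return False
    store[issue["number"]] = issue
    return True


def reconcile(store: Store, snapshot: list[Issue]) -> int:
    """Step 2: daily full sync. Overwrites rows that differ from the snapshot; returns how many."""
    fixed = 0
    for issue in snapshot:
        if store.get(issue["number"]) != issue:
            store[issue["number"]] = issue
            fixed += 1
    return fixed
```

With this shape, missed or out-of-order events only cause temporary skew: `reconcile` treats the full snapshot as the source of truth and repairs whatever the event stream got wrong.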
   
   For example, this is how these tools work:
   
   - https://www.gharchive.org/ is real-time
   - https://ossinsight.io/blog/why-we-choose-tidb-to-support-ossinsight is 
real-time
   - https://docs.airbyte.com/integrations/sources/github#notes has 4 real-time 
incremental streams (comments, commits, issues, and review comments)
   
   ### Related issues
   
   https://github.com/apache/incubator-devlake/pull/1253
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

