Re: [PR] Checking on the gradle cache [kafka]

via GitHub Wed, 15 Jan 2025 14:10:27 -0800


mumrah commented on PR #18449:
URL: https://github.com/apache/kafka/pull/18449#issuecomment-2594038906


   Ok, there is a fundamental problem here. The `pull_request` event builds 
the merge commit of this PR against the base rather than just the PR 
contents. This means the build will include changes on trunk which have not 
yet been cached.
   
   When trunk is moving quickly, our PRs have little hope of benefiting from 
much caching.
   
   For example:
   ```
   (trunk) HEAD --- A --- B --- C
   (PR) HEAD --- X --- Y --- Z --- C
   ```
   
   If commit C was the last trunk commit to be built, there will be Gradle 
cache files for that commit. Commits A and B are still building. If the PR 
were simply building X, this would be fine, and we would expect cache hits for 
anything not changed by X, Y, Z. However, the `pull_request` event will result 
in a build of something totally different:
   
   ```
   (merge) HEAD --- A --- B --- C
               `X --- Y --- Z --- C
   ```
   
   So when the PR is built, it fetches the latest cache (from C) but the 
build includes file changes from A and B in addition to the PR changes. This 
greatly increases cache misses.
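
   To make the effect concrete, here is a rough sketch of the situation above. It is a toy model, not how Gradle computes cache keys: each commit is reduced to the set of modules it touches, the module names are made up for illustration, and a "miss" is simply any module changed relative to the cached commit C. Real invalidation also spreads to dependent modules, so actual misses would be at least this bad.

```python
# Hypothetical model: each commit is the set of Gradle modules it touches.
# Module names are illustrative, not taken from the Kafka build.
TRUNK_SINCE_CACHE = {          # trunk commits newer than the cached commit C
    "A": {"clients", "core"},
    "B": {"streams"},
}
PR_COMMITS = {                 # commits on the PR branch, which forked from C
    "X": {"core"},
    "Y": {"core"},
    "Z": {"tools"},
}
ALL_MODULES = {"clients", "core", "streams", "tools", "connect", "raft"}


def touched(commits):
    """Union of modules changed by a group of commits."""
    modules = set()
    for changed in commits.values():
        modules |= changed
    return modules


def cache_misses(changed_modules):
    """Modules that must be rebuilt when restoring the cache from C."""
    return ALL_MODULES & changed_modules


# Building only the PR contents: misses are limited to what X, Y, Z changed.
pr_head_misses = cache_misses(touched(PR_COMMITS))

# Building the merge commit: trunk changes from A and B are pulled in as well.
merge_commit_misses = cache_misses(
    touched(PR_COMMITS) | touched(TRUNK_SINCE_CACHE)
)

print("PR head only:", sorted(pr_head_misses))
print("merge commit:", sorted(merge_commit_misses))
```

   In this toy setup the PR-only build misses on two modules, while the merge-commit build misses on four, even though the PR itself did not change.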
   
   ---
   
   I think the merge queue might be a solution to this. If we do a full build 
as part of the merge queue, then no code will land on trunk that has not been 
built, tested, and cached. The risk with this approach is that flaky builds 
will prevent things from getting into trunk.
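
   A minimal sketch of the property I am after, and of the risk. This is a hypothetical model rather than GitHub's merge queue implementation: the function name, the commit ids, and the pass/fail outcomes are all assumptions made for illustration.

```python
def run_merge_queue(queue, build_passes):
    """Land queued commits on trunk only if their full build passes.

    queue        -- commit ids in queue order (hypothetical)
    build_passes -- dict of commit id -> bool, the full-build outcome
    Returns (trunk, cached, rejected).
    """
    trunk, cached, rejected = [], set(), []
    for commit in queue:
        if build_passes[commit]:
            cached.add(commit)       # the queue build populates the cache first
            trunk.append(commit)     # only then does the commit land
        else:
            rejected.append(commit)  # a flaky failure blocks a good change too
    return trunk, cached, rejected


# Assumed outcomes: B fails its queue build (for example, due to a flaky test).
trunk, cached, rejected = run_merge_queue(
    ["A", "B", "C"], {"A": True, "B": False, "C": True}
)
print(trunk, sorted(cached), rejected)
```

   Every commit that reaches trunk is already in the cache, so a PR's merge commit would always be based on something cached. The cost is visible in `rejected`: B never lands until its build goes green.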
   
   @ijuma @dajac thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
