I realized that although I had responded on Aug. 24, the response was sent only to Uv, not to the list. The commits on the PR referenced have changed now, since I went ahead and rebased for this PR in order to be able to merge it. But the response should be applicable to other PRs as well. And in fact, the changed commits are themselves a good example of why it would be good to change this workflow so that it doesn't effectively require force-pushes!
My response from Aug. 24: Thank you for your considered response! Having the specific commands is helpful. Taking https://github.com/apache/solr/pull/1351 as a concrete example, I wonder whether there's a reason that you don't use the more straightforward `git merge-base` command for determining the merge base? Running the `git rev-list` command you provided results in 14dd04903a68f69dcfc2104fc70a8ff1c44986b3^ as the merge base, whereas the more straightforward `git merge-base <PR_HEAD> <UPSTREAM_HEAD>` results in 9c43fc138b04c1420b912ec4371eea77335f905b, which is in fact the "merge base", according to the usual git semantics. Using the latter commit with format-patch command you supplied in this case generates a sequence of 5 patches that don't apply cleanly. Using the other commit (the one that crave.io is currently using) with the format-patch command generates a sequence of 455 patches, which don't apply cleanly. But even if they did apply cleanly, with 455 separate commits it's clear (unless I'm missing something) that in this case the goal of minimizing bandwidth and achieving incremental updates is not being achieved. What I'm suggesting is that in step 2.2, rather than using `git format-patch`, using `git diff 9c43fc138b04c1420b912ec4371eea77335f905b..<PR_HEAD>` (where 9c43fc138b04c1420b912ec4371eea77335f905b is the "merge base" according to the usual git semantics) would generate a single patch (albeit without the irrelevant commit metadata) that is guaranteed to apply cleanly if the PR_HEAD has recently merged from (or rebased onto) upstream head. I'm also a bit confused -- in 2.2, you state that you're using `git format-patch` to generate the patches, but subsequently you say "we don't use format-patch"? On Tue, Aug 22, 2023 at 8:54 PM Yuvraaj Kelkar <yuvr...@gmail.com> wrote: > > Sorry for the delay, I had to collect this information from the team. > > As you said: Crave doesn't need to know the commit history, it just needs > enough information from git to be able to reproduce the filesystem state on > to the remote side without having to do a full rsync. > We have to account for the possibility that there's any combination of: > (common merge base whether on branch or not) + (staged commits) + (unstaged > commits) > > Patch process: > > Get the merge base > > Use the command git rev-list --topo-order --reverse <localBranch> --not > --remotes=<remote> -- to get a list of commit ID. > If the list is empty, then the HEAD commit in the branch is the common merge > base. Use it. > If the list is NOT empty, then use the first entry in the list to get one > commit ID before it. This is the merge base. > > With the merge base, creare a series of patches > > Staged changes: git --no-pager diff --src-prefix=a/ --dst-prefix=b/ --binary > --no-ext-diff --ignore-submodules --no-color --cached > Local changes: git --no-pager format-patch --src-prefix=a/ --dst-prefix=b/ > --no-color --stdout <mergeBase> > > Send patches list across to the remote side, and then apply using git apply > --verbose --whitespace=nowarn <patch> > > In the past, we did use git am , but it is all git apply now. > Also, we don't use format-patch because we had to implement "incremental git > patching" for users repeatedly doing incremental builds in the common case, > eg: > User does incremental builds as changes keep getting "collected". At the end > of 3-4 days, there was 200 files modified. Patching 200 files on every > incremental caused a mtime update that causes the build system from > unnecessarily rebuilding far more than actually needed. > A simple git diff allows us to do incremental diff patching. > > But besides all this explanation, we're looking to improve the entire > patching process so that there's no need to repeatedly restart Github Actions. > One of the ways we're trying to get debug information is to get access to the > Action Runner so that we can look at the git tree so we can get a repro > without having to guess every time. > > Thanks, > -Uv > > On Aug 2 2023, at 10:50 am, Michael Gibney <mich...@michaelgibney.net> wrote: > > Hi Uv: > > regarding https://lists.apache.org/thread/l8hgqwkm9vt8yhygoo2blw5j4n5gh121 > > Would it be possible for Crave, instead of using `git format-patch` / > `git am`, to instead generate/apply a simple single diff/patch, via > `git diff <merge-base>..<pr_head> > file.patch` / `git apply > file.patch`? > > This would address a couple of problems, and should be more efficient > on IO as well (which is I gather the motivation for the patch-based > approach in the first place). > > In some cases (e.g., https://github.com/apache/solr/pull/1351), Crave > fails to apply patches even when the source PR branch is completely > up-to-date (has merged in the latest commit from `main`). The > workaround of rebasing is not ideal, in that it necessitates > force-pushes, which are considered an anti-pattern in some contexts, > and can generally be an inconvenience for transparency and > collaborative development. > > But separately relevant: Crave doesn't care about the commit history > -- it's only using patches to optimize IO bandwidth. So generating a > single diff between the merge-base and the PR head should iiuc be > exactly what you want, and would be guaranteed to always apply > cleanly. > > Would you mind sharing how you're deriving the "merge-base" commit, > and the command you're using (I assume, `git format-patch`?) to > generate the patch files? Having just merged `main` into a PR branch, > the merge-base of the PR branch and `main` should be: `main`. > > Thank you for your consideration! > Michael --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org For additional commands, e-mail: dev-h...@solr.apache.org