Re: [PATCH v1] teach git to support a virtual (partially populated) work directory
On 11/28/2018 8:31 AM, SZEDER Gábor wrote:
> On Tue, Nov 27, 2018 at 02:50:57PM -0500, Ben Peart wrote:
>> diff --git a/t/t1092-virtualworkdir.sh b/t/t1092-virtualworkdir.sh
>> new file mode 100755
>> index 00..0cdfe9b362
>> --- /dev/null
>> +++ b/t/t1092-virtualworkdir.sh
>> @@ -0,0 +1,393 @@
>> +#!/bin/sh
>> +
>> +test_description='virtual work directory tests'
>> +
>> +. ./test-lib.sh
>> +
>> +# We need total control of the virtual work directory hook
>> +sane_unset GIT_TEST_VIRTUALWORKDIR
>> +
>> +clean_repo () {
>> +	rm .git/index &&
>> +	git -c core.virtualworkdir=false reset --hard HEAD &&
>> +	git -c core.virtualworkdir=false clean -fd &&
>> +	touch untracked.txt &&
>
> We would usually run '>untracked.txt' instead, sparing the external
> process. A further nit is that a function called 'clean_repo' creates
> new untracked files...

Thanks, all good suggestions I've incorporated for the next iteration.

>> +	touch dir1/untracked.txt &&
>> +	touch dir2/untracked.txt
>> +}
>> +
>> +test_expect_success 'setup' '
>> +	mkdir -p .git/hooks/ &&
>> +	cat > .gitignore <<-\EOF &&
>
> CodingGuidelines suggest no space between redirection operator and
> filename.
>
>> +	.gitignore
>> +	expect*
>> +	actual*
>> +	EOF
>> +	touch file1.txt &&
>> +	touch file2.txt &&
>> +	mkdir -p dir1 &&
>> +	touch dir1/file1.txt &&
>> +	touch dir1/file2.txt &&
>> +	mkdir -p dir2 &&
>> +	touch dir2/file1.txt &&
>> +	touch dir2/file2.txt &&
>> +	git add . &&
>> +	git commit -m "initial" &&
>> +	git config --local core.virtualworkdir true
>> +'
>> +test_expect_success 'verify files not listed are ignored by git clean -f -x' '
>> +	clean_repo &&
>
> I find it odd to clean the repo right after setting it up; but then
> again, 'clean_repo' not only cleans, but also creates new files.
> Perhaps rename it to 'reset_repo'? Dunno.
>
>> +	write_script .git/hooks/virtual-work-dir <<-\EOF &&
>> +	printf "untracked.txt\0"
>> +	printf "dir1/\0"
>> +	EOF
>> +	mkdir -p dir3 &&
>> +	touch dir3/untracked.txt &&
>> +	git clean -f -x &&
>> +	test -f file1.txt &&
>
> Please use the 'test_path_is_file', ...
>
>> +	test -f file2.txt &&
>> +	test ! -f untracked.txt &&
>
> ... 'test_path_is_missing', and ...
>
>> +	test -d dir1 &&
>
> ... 'test_path_is_dir' helpers, respectively, because they print
> informative error messages on failure.
>
>> +	test -f dir1/file1.txt &&
>> +	test -f dir1/file2.txt &&
>> +	test ! -f dir1/untracked.txt &&
>> +	test -f dir2/file1.txt &&
>> +	test -f dir2/file2.txt &&
>> +	test -f dir2/untracked.txt &&
>> +	test -d dir3 &&
>> +	test -f dir3/untracked.txt
>> +'
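As a reading aid for the hook that the quoted test installs with write_script, here is a minimal stand-alone sketch. It assumes the hook only has to check its single version argument (currently 1, per the githooks.txt text in the patch) and print NUL-terminated paths, as the printf calls in the test do. The function name virtual_work_dir_hook, the MODIFIED_LIST variable and the modified-paths.txt default are hypothetical and not part of the patch.

```shell
#!/bin/sh
# Hypothetical sketch of a virtual-work-dir hook body; not taken from the patch.
# MODIFIED_LIST (assumed name) points at a file with one path per line.

virtual_work_dir_hook () {
	version=$1
	list=${MODIFIED_LIST:-modified-paths.txt}

	# The patch's documentation says the only argument is a version (currently 1).
	if test "$version" != 1
	then
		echo "unsupported virtual-work-dir version: $version" >&2
		return 1
	fi

	# Emit every non-empty line as a NUL-terminated path, mirroring the
	# printf "path\0" calls in the quoted test.
	while IFS= read -r path || test -n "$path"
	do
		if test -n "$path"
		then
			printf '%s\0' "$path"
		fi
	done <"$list"
	return 0
}
```

To try it as an actual hook, one could presumably save the function in .git/hooks/virtual-work-dir and end the file with a call such as virtual_work_dir_hook "$@"; a non-zero exit for an unknown version is this sketch's own choice.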
Re: [PATCH v1] teach git to support a virtual (partially populated) work directory
On Tue, Nov 27, 2018 at 02:50:57PM -0500, Ben Peart wrote:
> diff --git a/t/t1092-virtualworkdir.sh b/t/t1092-virtualworkdir.sh
> new file mode 100755
> index 00..0cdfe9b362
> --- /dev/null
> +++ b/t/t1092-virtualworkdir.sh
> @@ -0,0 +1,393 @@
> +#!/bin/sh
> +
> +test_description='virtual work directory tests'
> +
> +. ./test-lib.sh
> +
> +# We need total control of the virtual work directory hook
> +sane_unset GIT_TEST_VIRTUALWORKDIR
> +
> +clean_repo () {
> +	rm .git/index &&
> +	git -c core.virtualworkdir=false reset --hard HEAD &&
> +	git -c core.virtualworkdir=false clean -fd &&
> +	touch untracked.txt &&

We would usually run '>untracked.txt' instead, sparing the external
process. A further nit is that a function called 'clean_repo' creates
new untracked files...

> +	touch dir1/untracked.txt &&
> +	touch dir2/untracked.txt
> +}
> +
> +test_expect_success 'setup' '
> +	mkdir -p .git/hooks/ &&
> +	cat > .gitignore <<-\EOF &&

CodingGuidelines suggest no space between redirection operator and
filename.

> +	.gitignore
> +	expect*
> +	actual*
> +	EOF
> +	touch file1.txt &&
> +	touch file2.txt &&
> +	mkdir -p dir1 &&
> +	touch dir1/file1.txt &&
> +	touch dir1/file2.txt &&
> +	mkdir -p dir2 &&
> +	touch dir2/file1.txt &&
> +	touch dir2/file2.txt &&
> +	git add . &&
> +	git commit -m "initial" &&
> +	git config --local core.virtualworkdir true
> +'
> +test_expect_success 'verify files not listed are ignored by git clean -f -x' '
> +	clean_repo &&

I find it odd to clean the repo right after setting it up; but then
again, 'clean_repo' not only cleans, but also creates new files.
Perhaps rename it to 'reset_repo'? Dunno.

> +	write_script .git/hooks/virtual-work-dir <<-\EOF &&
> +	printf "untracked.txt\0"
> +	printf "dir1/\0"
> +	EOF
> +	mkdir -p dir3 &&
> +	touch dir3/untracked.txt &&
> +	git clean -f -x &&
> +	test -f file1.txt &&

Please use the 'test_path_is_file', ...

> +	test -f file2.txt &&
> +	test ! -f untracked.txt &&

... 'test_path_is_missing', and ...

> +	test -d dir1 &&

... 'test_path_is_dir' helpers, respectively, because they print
informative error messages on failure.

> +	test -f dir1/file1.txt &&
> +	test -f dir1/file2.txt &&
> +	test ! -f dir1/untracked.txt &&
> +	test -f dir2/file1.txt &&
> +	test -f dir2/file2.txt &&
> +	test -f dir2/untracked.txt &&
> +	test -d dir3 &&
> +	test -f dir3/untracked.txt
> +'
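For readers without git's test suite at hand, the review points above can be sketched as follows. The three functions are simplified stand-ins, written here only so the snippet runs outside test-lib.sh; the real helpers ship with git's test library and their exact messages differ. The demo_dir variable and the message wording are assumptions of this sketch.

```shell
# Simplified stand-ins for the helpers named in the review (assumption:
# the real ones in git's test library behave similarly but are not copied here).
test_path_is_file () {
	if ! test -f "$1"
	then
		echo "File $1 doesn't exist" >&2
		return 1
	fi
}

test_path_is_dir () {
	if ! test -d "$1"
	then
		echo "Directory $1 doesn't exist" >&2
		return 1
	fi
}

test_path_is_missing () {
	if test -e "$1"
	then
		echo "Path $1 exists but should be missing" >&2
		return 1
	fi
}

# Demo in a scratch directory (demo_dir is a hypothetical name).
demo_dir=$(mktemp -d) &&
(
	cd "$demo_dir" &&
	mkdir dir1 &&
	# '>file' creates an empty file without spawning touch(1),
	# and there is no space between the operator and the filename.
	>file1.txt &&
	>dir1/file1.txt &&
	test_path_is_file file1.txt &&
	test_path_is_dir dir1 &&
	test_path_is_missing untracked.txt
)
```

Unlike a bare 'test -f', a failing helper says which path was at fault, which is the reason given in the review for preferring them.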
[PATCH v1] teach git to support a virtual (partially populated) work directory
From: Ben Peart

To make git perform well on the very largest repos, we must make git
operations O(modified) instead of O(size of repo). This takes advantage
of the fact that the number of files a developer has modified
(especially in very large repos) is typically a tiny fraction of the
overall repo size.

We accomplished this by utilizing the existing internal logic for the
skip worktree bit and excludes to tell git to ignore all files and
folders other than those that have been modified. This logic is driven
by an external process that monitors writes to the repo and communicates
the list of files and folders with changes to git via the virtual work
directory hook in this patch.

The external process maintains a list of files and folders that have
been modified. When git runs, it requests the list of files and folders
that have been modified via the virtual work directory hook. Git then
sets/clears the skip-worktree bit on the cache entries and builds a
hashmap of the modified files/folders that is used by the excludes logic
to avoid scanning the entire repo looking for changes and untracked
files.

With this system, we have been able to make local git command
performance on extremely large repos (millions of files, 1/2 million
folders) entirely manageable (30 second checkout, 3.5 seconds status,
4 second add, 7 second commit, etc).

On index load, clear/set the skip worktree bits based on the virtual
work directory data. Use virtual work directory data to update
skip-worktree bit in unpack-trees. Use virtual work directory data to
exclude files and folders not explicitly requested.

Signed-off-by: Ben Peart
---

I believe I've incorporated all the feedback from the RFC. Renaming the
feature, updating the setting to be a boolean with a hard coded hook
name, labeling the feature "experimental," and only calling get_dtype()
if the feature is turned on.

If there are other suggestions on how to ensure this is a useful and
general purpose feature please let me know.

Notes:
    Base Ref: master
    Web-Diff: https://github.com/benpeart/git/commit/65c3ca2e5f
    Checkout: git fetch https://github.com/benpeart/git virtual-workdir-v1 && git checkout 65c3ca2e5f

 Documentation/config/core.txt |   9 +
 Documentation/githooks.txt    |  23 ++
 Makefile                      |   1 +
 cache.h                       |   1 +
 config.c                      |  32 ++-
 config.h                      |   1 +
 dir.c                         |  26 ++-
 environment.c                 |   1 +
 read-cache.c                  |   2 +
 t/t1092-virtualworkdir.sh     | 393 ++
 unpack-trees.c                |  23 +-
 virtualworkdir.c              | 314 +++
 virtualworkdir.h              |  25 +++
 13 files changed, 843 insertions(+), 8 deletions(-)
 create mode 100755 t/t1092-virtualworkdir.sh
 create mode 100644 virtualworkdir.c
 create mode 100644 virtualworkdir.h

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index d0e6635fe0..49b7699a4e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -68,6 +68,15 @@ core.fsmonitor::
 	avoiding unnecessary processing of files that have not changed.
 	See the "fsmonitor-watchman" section of linkgit:githooks[5].
 
+core.virtualWorkDir::
+	Please regard this as an experimental feature.
+	If set to true, utilize the virtual-work-dir hook to identify all
+	files and directories that are present in the working directory.
+	Git will only track and update files listed in the virtual work
+	directory. Using the virtual work directory will supersede the
+	sparse-checkout settings which will be ignored.
+	See the "virtual-work-dir" section of linkgit:githooks[6].
+
 core.trustctime::
 	If false, the ctime differences between the index and the
 	working tree are ignored; useful when the inode change time
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 959044347e..9888d504b4 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -485,6 +485,29 @@ The exit status determines whether git will use the data from the
 hook to limit its search. On error, it will fall back to verifying
 all files and folders.
 
+virtual-work-dir
+
+
+Please regard this as an experimental feature.
+
+The "Virtual Work Directory" hook allows populating the working directory
+sparsely. The virtual work directory data is typically automatically
+generated by an external process. Git will limit what files it checks for
+changes as well as which directories are checked for untracked files based
+on the path names given. Git will also only update those files listed in the
+virtual work directory.
+
+The hook is invoked when the configuration option core.virtualWorkDir is
+set to true. The hook takes one argument, a version (currently 1).
+
+The hook should output to stdout the list of