Re: [PATCH v1] teach git to support a virtual (partially populated) work directory

2018-11-29 Thread Ben Peart



On 11/28/2018 8:31 AM, SZEDER Gábor wrote:
> On Tue, Nov 27, 2018 at 02:50:57PM -0500, Ben Peart wrote:
>
> > diff --git a/t/t1092-virtualworkdir.sh b/t/t1092-virtualworkdir.sh
> > new file mode 100755
> > index 00..0cdfe9b362
> > --- /dev/null
> > +++ b/t/t1092-virtualworkdir.sh
> > @@ -0,0 +1,393 @@
> > +#!/bin/sh
> > +
> > +test_description='virtual work directory tests'
> > +
> > +. ./test-lib.sh
> > +
> > +# We need total control of the virtual work directory hook
> > +sane_unset GIT_TEST_VIRTUALWORKDIR
> > +
> > +clean_repo () {
> > +   rm .git/index &&
> > +   git -c core.virtualworkdir=false reset --hard HEAD &&
> > +   git -c core.virtualworkdir=false clean -fd &&
> > +   touch untracked.txt &&
>
> We would usually run '>untracked.txt' instead, sparing the external
> process.
>
> A further nit is that a function called 'clean_repo' creates new
> untracked files...

Thanks, all good suggestions I've incorporated for the next iteration.

> > +   touch dir1/untracked.txt &&
> > +   touch dir2/untracked.txt
> > +}
> > +
> > +test_expect_success 'setup' '
> > +   mkdir -p .git/hooks/ &&
> > +   cat > .gitignore <<-\EOF &&
>
> CodingGuidelines suggest no space between redirection operator and
> filename.
>
> > +   .gitignore
> > +   expect*
> > +   actual*
> > +   EOF
> > +   touch file1.txt &&
> > +   touch file2.txt &&
> > +   mkdir -p dir1 &&
> > +   touch dir1/file1.txt &&
> > +   touch dir1/file2.txt &&
> > +   mkdir -p dir2 &&
> > +   touch dir2/file1.txt &&
> > +   touch dir2/file2.txt &&
> > +   git add . &&
> > +   git commit -m "initial" &&
> > +   git config --local core.virtualworkdir true
> > +'
>
> > +test_expect_success 'verify files not listed are ignored by git clean -f -x' '
> > +   clean_repo &&
>
> I find it odd to clean the repo right after setting it up; but then
> again, 'clean_repo' not only cleans, but also creates new files.
> Perhaps rename it to 'reset_repo'?  Dunno.
>
> > +   write_script .git/hooks/virtual-work-dir <<-\EOF &&
> > +   printf "untracked.txt\0"
> > +   printf "dir1/\0"
> > +   EOF
> > +   mkdir -p dir3 &&
> > +   touch dir3/untracked.txt &&
> > +   git clean -f -x &&
> > +   test -f file1.txt &&
>
> Please use the 'test_path_is_file', ...
>
> > +   test -f file2.txt &&
> > +   test ! -f untracked.txt &&
>
> ... 'test_path_is_missing', and ...
>
> > +   test -d dir1 &&
>
> ... 'test_path_is_dir' helpers, respectively, because they print
> informative error messages on failure.
>
> > +   test -f dir1/file1.txt &&
> > +   test -f dir1/file2.txt &&
> > +   test ! -f dir1/untracked.txt &&
> > +   test -f dir2/file1.txt &&
> > +   test -f dir2/file2.txt &&
> > +   test -f dir2/untracked.txt &&
> > +   test -d dir3 &&
> > +   test -f dir3/untracked.txt
> > +'


Re: [PATCH v1] teach git to support a virtual (partially populated) work directory

2018-11-28 Thread SZEDER Gábor
On Tue, Nov 27, 2018 at 02:50:57PM -0500, Ben Peart wrote:

> diff --git a/t/t1092-virtualworkdir.sh b/t/t1092-virtualworkdir.sh
> new file mode 100755
> index 00..0cdfe9b362
> --- /dev/null
> +++ b/t/t1092-virtualworkdir.sh
> @@ -0,0 +1,393 @@
> +#!/bin/sh
> +
> +test_description='virtual work directory tests'
> +
> +. ./test-lib.sh
> +
> +# We need total control of the virtual work directory hook
> +sane_unset GIT_TEST_VIRTUALWORKDIR
> +
> +clean_repo () {
> + rm .git/index &&
> + git -c core.virtualworkdir=false reset --hard HEAD &&
> + git -c core.virtualworkdir=false clean -fd &&
> + touch untracked.txt &&

We would usually run '>untracked.txt' instead, sparing the external
process.

A further nit is that a function called 'clean_repo' creates new
untracked files...
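
To illustrate the redirection point, here is a minimal standalone sketch
(not part of the patch). The scratch directory and the file names are made
up for the example; it only shows that a bare redirection creates an empty
file without forking 'touch'.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing else is touched.
scratch=$(mktemp -d) && cd "$scratch" || exit 1

# Creates the file by running the external 'touch' program.
touch with-touch.txt

# Creates the file using only the shell's own redirection
# (written without a space after '>', per CodingGuidelines).
>with-redirect.txt

# Caveat: unlike 'touch', '>' truncates a file that already has content.
ls
```

In a test that only needs fresh empty files the two forms are
interchangeable; the difference matters only when the file may already
hold data.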

> + touch dir1/untracked.txt &&
> + touch dir2/untracked.txt
> +}
> +
> +test_expect_success 'setup' '
> + mkdir -p .git/hooks/ &&
> + cat > .gitignore <<-\EOF &&

CodingGuidelines suggest no space between redirection operator and
filename.

> + .gitignore
> + expect*
> + actual*
> + EOF
> + touch file1.txt &&
> + touch file2.txt &&
> + mkdir -p dir1 &&
> + touch dir1/file1.txt &&
> + touch dir1/file2.txt &&
> + mkdir -p dir2 &&
> + touch dir2/file1.txt &&
> + touch dir2/file2.txt &&
> + git add . &&
> + git commit -m "initial" &&
> + git config --local core.virtualworkdir true
> +'


> +test_expect_success 'verify files not listed are ignored by git clean -f -x' '
> + clean_repo &&

I find it odd to clean the repo right after setting it up; but then
again, 'clean_repo' not only cleans, but also creates new files.
Perhaps rename it to 'reset_repo'?  Dunno.

> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "untracked.txt\0"
> + printf "dir1/\0"
> + EOF
> + mkdir -p dir3 &&
> + touch dir3/untracked.txt &&
> + git clean -f -x &&
> + test -f file1.txt &&

Please use the 'test_path_is_file', ...

> + test -f file2.txt &&
> + test ! -f untracked.txt &&

... 'test_path_is_missing', and ...

> + test -d dir1 &&

... 'test_path_is_dir' helpers, respectively, because they print
informative error messages on failure.
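
For readers unfamiliar with these helpers, the sketch below shows how the
assertions might read after the change. So that it runs outside git's t/
directory, it defines simplified stand-ins for the three helpers; those
stand-ins are assumptions written for this example, and a real test script
gets the actual implementations from test-lib.sh. The file names are
likewise only illustrative.

```shell
#!/bin/sh
# Simplified stand-ins for git's test helpers, defined here ONLY so the
# sketch is runnable on its own. Real tests must not redefine them.
test_path_is_file () {
	test -f "$1" || { echo "File $1 doesn't exist" >&2; return 1; }
}
test_path_is_dir () {
	test -d "$1" || { echo "Directory $1 doesn't exist" >&2; return 1; }
}
test_path_is_missing () {
	if test -e "$1"
	then
		echo "Path $1 exists" >&2
		return 1
	fi
}

scratch=$(mktemp -d) && cd "$scratch" || exit 1
mkdir dir1 && >file1.txt

# The same checks as in the quoted test, spelled with the helpers.
test_path_is_file file1.txt &&
test_path_is_missing untracked.txt &&
test_path_is_dir dir1 &&
echo "all checks passed"

# A failing check reports what went wrong on stderr.
test_path_is_file no-such-file.txt || echo "check failed as expected"
```

One behavioural note: 'test ! -f path' also succeeds when path is a
directory, whereas test_path_is_missing requires that nothing exists at
that path, so the helper is the slightly stricter check.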

> + test -f dir1/file1.txt &&
> + test -f dir1/file2.txt &&
> + test ! -f dir1/untracked.txt &&
> + test -f dir2/file1.txt &&
> + test -f dir2/file2.txt &&
> + test -f dir2/untracked.txt &&
> + test -d dir3 &&
> + test -f dir3/untracked.txt
> +'


[PATCH v1] teach git to support a virtual (partially populated) work directory

2018-11-27 Thread Ben Peart
From: Ben Peart 

To make git perform well on the very largest repos, we must make git
operations O(modified) instead of O(size of repo).  This takes advantage of
the fact that the number of files a developer has modified (especially
in very large repos) is typically a tiny fraction of the overall repo size.

We accomplished this by utilizing the existing internal logic for the skip
worktree bit and excludes to tell git to ignore all files and folders other
than those that have been modified.  This logic is driven by an external
process that monitors writes to the repo and communicates the list of files
and folders with changes to git via the virtual work directory hook in this
patch.

The external process maintains a list of files and folders that have been
modified.  When git runs, it requests the list of files and folders that
have been modified via the virtual work directory hook.  Git then sets/clears
the skip-worktree bit on the cache entries and builds a hashmap of the
modified files/folders that is used by the excludes logic to avoid scanning
the entire repo looking for changes and untracked files.

With this system, we have been able to make local git command performance on
extremely large repos (millions of files, 1/2 million folders) entirely
manageable (30 second checkout, 3.5 seconds status, 4 second add, 7 second
commit, etc).

On index load, clear/set the skip worktree bits based on the virtual
work directory data. Use virtual work directory data to update skip-worktree
bit in unpack-trees. Use virtual work directory data to exclude files and
folders not explicitly requested.

Signed-off-by: Ben Peart 
---

I believe I've incorporated all the feedback from the RFC: renaming the
feature, updating the setting to be a boolean with a hard-coded hook name,
labeling the feature "experimental," and only calling get_dtype() if the
feature is turned on.

If there are other suggestions on how to ensure this is a useful and general
purpose feature please let me know.
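
As a rough illustration of the hook contract described in the commit
message, here is a sketch of what a hook body might do. It assumes the
NUL-terminated output format used by the test script in this patch and the
single version argument mentioned in the githooks.txt hunk. The function
name emit_virtual_workdir and the file modified-paths.txt are hypothetical;
in a real setup the list would come from the external monitoring process.

```shell
#!/bin/sh
# Hypothetical stand-in for the body of .git/hooks/virtual-work-dir.
# $1: hook version (the patch documents version 1)
# $2: file with one modified path per line (an assumption of this sketch)
emit_virtual_workdir () {
	version=$1
	list=$2
	# Refuse versions this sketch does not understand.
	test "$version" = 1 || return 1
	while IFS= read -r path
	do
		# Each entry is terminated by a NUL byte, as in the test script.
		printf '%s\0' "$path"
	done <"$list"
}

scratch=$(mktemp -d) && cd "$scratch" || exit 1
printf '%s\n' untracked.txt dir1/ >modified-paths.txt

# Show the entries one per line for readability.
emit_virtual_workdir 1 modified-paths.txt | tr '\0' '\n'
```

The trailing slash on dir1/ follows the test script, where a directory
entry is written that way.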

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/65c3ca2e5f
Checkout: git fetch https://github.com/benpeart/git virtual-workdir-v1 && git checkout 65c3ca2e5f

 Documentation/config/core.txt |   9 +
 Documentation/githooks.txt    |  23 ++
 Makefile                      |   1 +
 cache.h                       |   1 +
 config.c                      |  32 ++-
 config.h                      |   1 +
 dir.c                         |  26 ++-
 environment.c                 |   1 +
 read-cache.c                  |   2 +
 t/t1092-virtualworkdir.sh     | 393 ++
 unpack-trees.c                |  23 +-
 virtualworkdir.c              | 314 +++
 virtualworkdir.h              |  25 +++
 13 files changed, 843 insertions(+), 8 deletions(-)
 create mode 100755 t/t1092-virtualworkdir.sh
 create mode 100644 virtualworkdir.c
 create mode 100644 virtualworkdir.h

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index d0e6635fe0..49b7699a4e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -68,6 +68,15 @@ core.fsmonitor::
    avoiding unnecessary processing of files that have not changed.
    See the "fsmonitor-watchman" section of linkgit:githooks[5].
 
+core.virtualWorkDir::
+   Please regard this as an experimental feature.
+   If set to true, utilize the virtual-work-dir hook to identify all
+   files and directories that are present in the working directory.
+   Git will only track and update files listed in the virtual work
+   directory.  Using the virtual work directory will supersede the
+   sparse-checkout settings which will be ignored.
+   See the "virtual-work-dir" section of linkgit:githooks[5].
+
 core.trustctime::
    If false, the ctime differences between the index and the
    working tree are ignored; useful when the inode change time
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 959044347e..9888d504b4 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -485,6 +485,29 @@ The exit status determines whether git will use the data from the
 hook to limit its search.  On error, it will fall back to verifying
 all files and folders.
 
+virtual-work-dir
+~~~~~~~~~~~~~~~~
+
+Please regard this as an experimental feature.
+
+The "Virtual Work Directory" hook allows populating the working directory
+sparsely. The virtual work directory data is typically automatically
+generated by an external process.  Git will limit what files it checks for
+changes as well as which directories are checked for untracked files based
+on the path names given. Git will also only update those files listed in the
+virtual work directory.
+
+The hook is invoked when the configuration option core.virtualWorkDir is
+set to true.  The hook takes one argument, a version (currently 1).
+
+The hook should output to stdout the list of