Karsten Blees <karsten.bl...@gmail.com> writes:

> With core.ignorecase=true, name-hash.c builds a case insensitive index of
> all tracked directories. Currently, the existing cache entry structures are
> added multiple times to the same hashtable (with different name lengths and
> hash codes). However, there's only one dir_next pointer, which gets
> completely messed up in case of hash collisions. In the worst case, this
> causes an endless loop if ce == ce->dir_next (see t7062).
>
> Use a separate hashtable and separate structures for the directory index
> so that each directory entry has its own next pointer. Use reference
> counting to track which directory entry contains files.
>
> There are only slight changes to the name-hash.c API:
> - new free_name_hash() used by read_cache.c::discard_index()
> - remove_name_hash() takes an additional index_state parameter
> - index_name_exists() for a directory (trailing '/') may return a cache
>   entry that has been removed (CE_UNHASHED). This is not a problem as the
>   return value is only used to check if the directory exists (dir.c) or to
>   normalize casing of directory names (read-cache.c).
>
> Getting rid of cache_entry.dir_next reduces memory consumption, especially
> with core.ignorecase=false (which doesn't use that member at all).
>
> With core.ignorecase=true, building the directory index is slightly faster
> as we add / check the parent directory first (instead of going through all
> directory levels for each file in the index). E.g. with WebKit (~200k
> files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms
> to 130ms.
>
> Signed-off-by: Karsten Blees <bl...@dcon.de>
> ---

One thing that still puzzles me is what guarantee we have on the
liftime of these ce's that are borrowed by these dir_hash entries.
There are a few places where we call free(ce) around "aliased"
entries (only happens with ignore_case set).  I do not think it is a
new issue (we used to borrow a ce to represent a directory in the
name_hash by using the leading prefix of its name anyway, and this
patch only changes which hash table is used to hold it), and I do
not think it will be an issue for case sensitive systems, so I would
stop being worried about it for now, though ;-)

Thanks, will replace and queue.


> diff --git a/t/t7062-wtstatus-ignorecase.sh b/t/t7062-wtstatus-ignorecase.sh
> new file mode 100755
> index 0000000..73709db
> --- /dev/null
> +++ b/t/t7062-wtstatus-ignorecase.sh
> @@ -0,0 +1,20 @@
> +#!/bin/sh
> +
> +test_description='git-status with core.ignorecase=true'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'status with hash collisions' '
> +     # note: "V/", "V/XQANY/" and "WURZAUP/" produce the same hash code
> +     # in name-hash.c::hash_name
> +     mkdir V &&
> +     mkdir V/XQANY &&
> +     mkdir WURZAUP &&
> +     touch V/XQANY/test &&
> +     git config core.ignorecase true &&
> +     git add . &&
> +     # test is successful if git status completes (no endless loop)
> +     git status
> +'
> +
> +test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to