> From: Lars Schneider <larsxschnei...@gmail.com>
> Consider the case of a file that requires filtering and is present in
> branch A but not in branch B. If A is the current HEAD and we checkout B
> then the following happens:
> 1. ce_compare_data() opens the file
> 2. index_fd() detects that the file requires to run a clean filter and
> calls index_stream_convert_blob()
> 4. index_stream_convert_blob() calls convert_to_git_filter_fd()
> 5. convert_to_git_filter_fd() calls apply_filter() which creates a
> new long running filter process (in case it is the first file
> of this kind to be filtered)
> 6. The new filter process inherits all file handles. This is the
> default on Linux/OSX and is explicitly defined in the
> `CreateProcessW` call in `mingw.c` on Windows.
> 7. ce_compare_data() closes the file
> 8. Git unlinks the file as it is not present in B
> The unlink operation does not work on Windows because the filter process
> has still an open handle to the file. Apparently that is no problem on
> Linux/OSX. Probably because "[...] the two file descriptors share open
> file status flags" (see fork(2)).
Wait a minute. "that is no problem" may be true as long as "that"
is "unlinking the now-gone file in the filesystem", but the reason
does not have anything to do with the "open file status flags";
unlike on Windows, on POSIX you _can_ unlink a file that has an open
file descriptor on it.
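
To illustrate the POSIX behaviour described above, here is a minimal
sketch. The helper name unlink_while_open() is made up for this
example and is not anything in our tree; it only shows that unlink(2)
succeeds while a descriptor is still open and that the data stays
reachable through that descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, unlink it while the descriptor
 * is still open, then read the data back through that descriptor.
 * Returns 0 if the unlink succeeded and the data stayed readable,
 * -1 otherwise.
 */
static int unlink_while_open(const char *path)
{
	static const char msg[] = "still here";
	char buf[sizeof(msg)] = { 0 };
	int fd, ok;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg) ||
	    unlink(path) != 0) {	/* POSIX allows this with fd open */
		close(fd);
		return -1;
	}
	/* The name is gone; the contents live until the last close(2). */
	ok = access(path, F_OK) != 0 &&
	     lseek(fd, 0, SEEK_SET) == 0 &&
	     read(fd, buf, sizeof(buf)) == (ssize_t)sizeof(msg) &&
	     memcmp(buf, msg, sizeof(msg)) == 0;
	close(fd);
	return ok ? 0 : -1;
}
```

On a POSIX system, calling this with any writable path should return
0. The open file status flags do not enter the picture at all.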
And even on POSIX systems, if you spawn a long-running helper, any
file descriptor that is open in the parent process at the time the
helper is spawned is leaked into the helper. CLOEXEC is a possible
solution (but not necessarily the only or the best one) to the fd
leak in this case.
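
For reference, a rough sketch of what the CLOEXEC approach looks like
with plain POSIX calls. The helper names below are hypothetical and
are not the ones used in the patch; the sketch also assumes a platform
where O_CLOEXEC is available to open(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open read-only, closed automatically on exec. */
static int open_cloexec(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if fd has close-on-exec set, 0 if not, -1 on error. */
static int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* For a descriptor opened elsewhere, set the flag after the fact. */
static int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

Note that the flag only takes effect at execve(2); a forked child
that never execs still holds its copy of the descriptor.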
How much does the code that spawns these long-running helpers know
about the file descriptors that happen to be open? The parent is
very likely to have pack windows open into .pack files and they need
to be closed on the child side after fork(2) starts the child
process but before execve(2) runs the helper, if we want to avoid
file descriptor leaks.
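
The alternative hinted at above, closing descriptors on the child
side between fork(2) and execve(2), could be sketched like this. It
assumes the caller knows which descriptors to close, which is exactly
the open question. spawn_closing_fds() is a made-up name and is not
our run-command API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] and wait for it, closing the
 * listed descriptors in the child only. Returns the child's exit
 * status, or -1 on failure.
 */
static int spawn_closing_fds(char *const argv[], const int *fds, size_t nr)
{
	pid_t pid;
	int status;
	size_t i;

	assert(argv != NULL && argv[0] != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: after fork(2), before execve(2). */
		for (i = 0; i < nr; i++)
			close(fds[i]);
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	/* Parent: its own copies of the descriptors stay open. */
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The burden here is on the caller to enumerate the descriptors, e.g.
those backing open pack windows, which is why marking them CLOEXEC at
open time may end up being less error prone.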
> Fix this problem by opening files in read-cache with the `O_CLOEXEC`
> flag to ensure that the file descriptor does not remain open in a newly
> spawned process. `O_CLOEXEC` is defined as `O_NOINHERIT` on Windows. A
> similar fix for temporary file handles was applied on Git for Windows
After a few iterations, the final version of the same commit is now
in our tree as d5cb9cbd ("Git 2.10-rc2", 2016-08-26).