On 08/09, Junio C Hamano wrote:
> Thomas Gummerer <t.gumme...@gmail.com> writes:
> > On 08/08, Junio C Hamano wrote:
> >> ...
> >> Let me ask the same question in a more direct way.  Which part of
> >> this test break with your series?
> >> 
> >>         test_expect_success 'git add --refresh with pathspec' '
> >>                 git reset --hard &&
> >>                 echo >foo && echo >bar && echo >baz &&
> >>                 git add foo bar baz && H=$(git rev-parse :foo) && git rm 
> >> -f foo &&
> >>                 echo "100644 $H 3  foo" | git update-index --index-info &&
> >>    # "sleep 1 &&" in the update here ...
> >>                 test-chmtime -60 bar baz &&
> >>                 >expect &&
> >>                 git add --refresh bar >actual &&
> >>                 test_cmp expect actual &&
> >> 
> >>                 git diff-files --name-only >actual &&
> >>                 ! grep bar actual&&
> >>                 grep baz actual
> >>         '
> >> 
> >> We prepare an index with bunch of paths, we make "foo" unmerged, we
> >> smudge bar and baz stat-dirty, so that "diff-files" would report
> >> them, even though their contents match what is recorded in the
> >> index.
> >
> > After getting confused a bit myself, I now think here is the problem.
> > The v5 code smudges baz when doing git add --refresh bar.  Therefore
> > baz isn't considered stat-dirty by the code, but a racily smudged entry
> > and therefore its content gets checked, thus not showing up in
> > git diff-files.
> So in short, the breakage is the last one among the three choices I
> gave you in my message you are responding to.  The user asked to
> refresh "bar" so that later diff-files won't report a false change
> on it, but "baz" effectively ends up getting refreshed at the same
> time and a false change is not reported.


> That "breakage" is, from the correctness point of view, not a
> breakage.  As the primary purpose of "refreshing" is to support
> commands that want to rely on a quick ce_modified() call to tell
> files that are modified in the working tree since it was last added
> to the index---you refresh once, and then you call such commands
> many times without having to worry about having to compare the
> contents between the indexed objects and the working tree files.
> But from the performance point of view, which is the whole point of
> "refresh", the behaviour of the new code is dubious.  If the user is
> working in a large working tree (which automatically means large
> index, the primary reason we are doing this v5 experiment), the user
> often is working in a deep and narrow subdirectory of it, and a path
> limited refresh (the test names a specific file "bar", but imagine
> it were "." to limit it to the directory the user is working in) may
> be a cheap way not to bother even checking outside the area the user
> currently is working in.

That's true, but once we have the partial reader/writer, we do not
bother checking outside the area the user is currently working in

Also and probably more importantly, this will only affect a *very*
small number of entries, because timestamps outside of the directory
in which the user is working in are rarely updated recently and
thus racy.

> Also, smudging more entries than necessary
> to be checked by ce_modified_check_fs() later at runtime may mean
> that it defeats the "refresh once and then compare cheaply many
> times" pattern that is employed by existing scripts.

The new racy code also calls ce_modified_check_fs() only if the size
and the stat_crc are not changed.  It's true that ce_modified_check_fs()
can be called multiple times, when match_stat_crc() is called, but that
could be solved by adding an additional flag CE_IS_MODIFIED, which
indicates that ce_modified_check_fs() was already run.

> Is the root cause really where the "racily-clean so smudge to tell
> later runtime to check contents" bit goes?  I am hoping that the
> issue is not coming from the difference between the current code and
> your code when they decide to "smudge", what entries they decide to
> "smudge" and based on what condition.

I just gave it a try using a CE_SMUDGED flag, instead of the mtime
as smudge marker, which which this test works without any problems.
It doesn't work the other way round, the test as the test doesn't
break when using mtime as smudge marker in v2, because we do the
ce_modified_check_fs() test earlier.
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to