Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-27 Thread Junio C Hamano
Duy Nguyen  writes:

> Sounds good, except..
>
>> When write_index() writes out an index that wasn't initialized from
>> the filesystem, a new UNTR gets added only when the configuration is
>> set to 'true' and there is no in-core UNTR already.
>
> Do we really need this?

We don't; you're right.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-24 Thread Christian Couder
On Thu, Dec 24, 2015 at 2:56 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Duy Nguyen  writes:
>>
>>> In that case we can just check config once in read_index_from and
>>> destroy UNTR extension. Or the middle ground, we check config once in
>>> that place, make a note in struct index_state, and make invalidate_*
>>> check that note instead of config file. The middle ground has an
>>> advantage over destroying UNTR: (probably) many operations will touch
>>> index but do not add or remove entries.
>>
>> Or we can even teach read_index_from() to skip reading the extension
>> without even parsing when the configuration tells it that the
>> feature is force-disabled.  It can also add an empty one when the
>> configuration tells it that the feature is force-enabled and there
>> is no UNTR extension yet.
>
> Thinking about this a bit more, I am starting to feel that the
> config that can be used to optionally override the presence of
> in-index UNTR extension does not have to be too bad a thing,
> provided if it is done with a few tweaks to the design I read in
> Christian & Ævar's messages.

Great!

> One tweak is to address the following from Ævar's message:
>
>>> Once this series is applied and git is shipped with it existing
>>> users that have set "git update-index --untracked-cache" will have
>>> their UC feature disabled. This is because we fall back to "oh no
>>> config here? Must have been disabled, rm it from the index" clause
>>> which keeps our UC from going stale in the face of config
>>> flip-flopping.
>
> The above would happen only if the configuration is a boolean that
> defaults to false.  I'd think we can have this a tristate instead.
> That is, config.untrackedCache can be explicitly set to 'true',
> 'false', or 'keep'.  And make a missing config.untrackedCache
> default to 'keep'.

Ok. The first RFC patch series about this had a tristate and I
switched to a boolean as it seemed that people prefered a boolean, but
you are right that a tristate is more backward compatible.

> When read_index_from() reads an existing index:
>
> When the configuration is set to 'true':
> an existing UNTR is kept, otherwise a new UNTR gets added.
> When the configuration is set to 'false:
> an existing UNTR is dropped.
> When the configuration is set to 'keep':
> an existing UNTR is kept, otherwise nothing happens.
>
> When write_index() writes out an index that wasn't initialized from
> the filesystem, a new UNTR gets added only when the configuration is
> set to 'true' and there is no in-core UNTR already.

My current patch series does these changes in
wt_status_collect_untracked() because currently the UNTR is mostly
used only in git status, so it feels safer to me to not affect other
code paths.

> That way, those who use /etc/gitconfig to force the feature over
> their users would be able to set it to 'true', those who have been
> using the feature in some but not all of their repositories won't
> see any different behaviour until they muck with the configuration
> (and if they are paranoid and want to opt out of their administrator's
> setting, they can set it to 'keep' in their $HOME/.gitconfig to make
> sure their repositories are not affected).
>
> Orthogonal to the "config overrides the existing users' practice"
> issue, I still think that [PATCH v2 10/10] goes too far to remove
> the safety based on the working tree location.  Checking kernel
> version and other thing may be too much, but the check based on the
> working tree location is a cheap way to make sure that those who do
> unusual things (namely, use $GIT_DIR/$GIT_WORK_TREE or their
> equivalents to tell Git that the working tree for this invocation is
> at a place different from what UNTR data was prepared for) are not
> harmed by incorrectly reusing the cached data for an unrelated
> location.  So another tweak I'd feel better to see is to resurrect
> that safety.

This has been changed in v3, see patch 09/11, and I think it should
now work as you suggest.

> I wouldn't have as much issue with such a scheme as I had with the
> earlier description of the design in the Christian's series.

Great! I am preparing a v4 that I hope to send in a few days.
By the way the v3 does not pass its own tests due to a bug introduced
in last minute changes.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-24 Thread Duy Nguyen
On Thu, Dec 24, 2015 at 8:56 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Duy Nguyen  writes:
>>
>>> In that case we can just check config once in read_index_from and
>>> destroy UNTR extension. Or the middle ground, we check config once in
>>> that place, make a note in struct index_state, and make invalidate_*
>>> check that note instead of config file. The middle ground has an
>>> advantage over destroying UNTR: (probably) many operations will touch
>>> index but do not add or remove entries.
>>
>> Or we can even teach read_index_from() to skip reading the extension
>> without even parsing when the configuration tells it that the
>> feature is force-disabled.  It can also add an empty one when the
>> configuration tells it that the feature is force-enabled and there
>> is no UNTR extension yet.
>
> Thinking about this a bit more, I am starting to feel that the
> config that can be used to optionally override the presence of
> in-index UNTR extension does not have to be too bad a thing,
> provided if it is done with a few tweaks to the design I read in
> Christian & Ævar's messages.
>
> One tweak is to address the following from Ævar's message:
>
>>> Once this series is applied and git is shipped with it existing
>>> users that have set "git update-index --untracked-cache" will have
>>> their UC feature disabled. This is because we fall back to "oh no
>>> config here? Must have been disabled, rm it from the index" clause
>>> which keeps our UC from going stale in the face of config
>>> flip-flopping.
>
> The above would happen only if the configuration is a boolean that
> defaults to false.  I'd think we can have this a tristate instead.
> That is, config.untrackedCache can be explicitly set to 'true',
> 'false', or 'keep'.  And make a missing config.untrackedCache
> default to 'keep'.
>
> When read_index_from() reads an existing index:
>
> When the configuration is set to 'true':
> an existing UNTR is kept, otherwise a new UNTR gets added.
> When the configuration is set to 'false:
> an existing UNTR is dropped.
> When the configuration is set to 'keep':
> an existing UNTR is kept, otherwise nothing happens.

Sounds good, except..

> When write_index() writes out an index that wasn't initialized from
> the filesystem, a new UNTR gets added only when the configuration is
> set to 'true' and there is no in-core UNTR already.

Do we really need this? If the index is created fresh from the core,
first write will not have UNTR. Next time it's loaded up, UNTR is
added if config is true, based on the above rules. The new UNTR
extension (should?) marks the index as dirty as writing out is
required.

> Orthogonal to the "config overrides the existing users' practice"
> issue,

Yeah it looks orthogonal to me too. There should be a separate config
file like /etc/gitconfig.lockdown, that will override user config vars
(but we still can't undefine user config vars yet, known missing
feature).
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-23 Thread Junio C Hamano
Junio C Hamano  writes:

> Duy Nguyen  writes:
>
>> In that case we can just check config once in read_index_from and
>> destroy UNTR extension. Or the middle ground, we check config once in
>> that place, make a note in struct index_state, and make invalidate_*
>> check that note instead of config file. The middle ground has an
>> advantage over destroying UNTR: (probably) many operations will touch
>> index but do not add or remove entries.
>
> Or we can even teach read_index_from() to skip reading the extension
> without even parsing when the configuration tells it that the
> feature is force-disabled.  It can also add an empty one when the
> configuration tells it that the feature is force-enabled and there
> is no UNTR extension yet.

Thinking about this a bit more, I am starting to feel that the
config that can be used to optionally override the presence of
in-index UNTR extension does not have to be too bad a thing,
provided if it is done with a few tweaks to the design I read in
Christian & Ævar's messages.

One tweak is to address the following from Ævar's message:

>> Once this series is applied and git is shipped with it existing
>> users that have set "git update-index --untracked-cache" will have
>> their UC feature disabled. This is because we fall back to "oh no
>> config here? Must have been disabled, rm it from the index" clause
>> which keeps our UC from going stale in the face of config
>> flip-flopping.

The above would happen only if the configuration is a boolean that
defaults to false.  I'd think we can have this a tristate instead.
That is, config.untrackedCache can be explicitly set to 'true',
'false', or 'keep'.  And make a missing config.untrackedCache
default to 'keep'.

When read_index_from() reads an existing index:

When the configuration is set to 'true':
an existing UNTR is kept, otherwise a new UNTR gets added.
When the configuration is set to 'false:
an existing UNTR is dropped.
When the configuration is set to 'keep':
an existing UNTR is kept, otherwise nothing happens.

When write_index() writes out an index that wasn't initialized from
the filesystem, a new UNTR gets added only when the configuration is
set to 'true' and there is no in-core UNTR already.

That way, those who use /etc/gitconfig to force the feature over
their users would be able to set it to 'true', those who have been
using the feature in some but not all of their repositories won't
see any different behaviour until they muck with the configuration
(and if they are paranoid and want to opt out of their administrator's
setting, they can set it to 'keep' in their $HOME/.gitconfig to make
sure their repositories are not affected).

Orthogonal to the "config overrides the existing users' practice"
issue, I still think that [PATCH v2 10/10] goes too far to remove
the safety based on the working tree location.  Checking kernel
version and other thing may be too much, but the check based on the
working tree location is a cheap way to make sure that those who do
unusual things (namely, use $GIT_DIR/$GIT_WORK_TREE or their
equivalents to tell Git that the working tree for this invocation is
at a place different from what UNTR data was prepared for) are not
harmed by incorrectly reusing the cached data for an unrelated
location.  So another tweak I'd feel better to see is to resurrect
that safety.

I wouldn't have as much issue with such a scheme as I had with the
earlier description of the design in the Christian's series.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-22 Thread Duy Nguyen
On Tue, Dec 22, 2015 at 1:30 AM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Thu, Dec 17, 2015 at 2:44 PM, Jeff King  wrote:
>>> I think we may actually be thinking of the same thing. Naively, I would
>>> expect:
>>>
>>> ..
>>>   - if there is cache data in the index but that config flag is not set,
>>> presumably we would not update it (we could even explicitly drop it,
>>> but my understanding is that is not necessary for correctness, but
>>> only as a possible optimization).
>>
>> No, if somebody adds or removes something from the index, we either
>> update or drop it, or it's stale. There's the invalidate_untracked()
>> or something in dir.c that we can hook in, check config var and do
>> that. And because config is cached recently, it should be a cheap
>> operation.
>
> Checking the config may be cheap, but it bothers me a lot that we
> have to call that "invalidate" thing every time we go into the
> codepath to deal with the index, from code cleanliness point of
> view.

In that case we can just check config once in read_index_from and
destroy UNTR extension. Or the middle ground, we check config once in
that place, make a note in struct index_state, and make invalidate_*
check that note instead of config file. The middle ground has an
advantage over destroying UNTR: (probably) many operations will touch
index but do not add or remove entries. Though I may be wrong,
replacing an entry may be implemented as delete then add, I haven't
checked the code.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-22 Thread Junio C Hamano
Duy Nguyen  writes:

> In that case we can just check config once in read_index_from and
> destroy UNTR extension. Or the middle ground, we check config once in
> that place, make a note in struct index_state, and make invalidate_*
> check that note instead of config file. The middle ground has an
> advantage over destroying UNTR: (probably) many operations will touch
> index but do not add or remove entries.

Or we can even teach read_index_from() to skip reading the extension
without even parsing when the configuration tells it that the
feature is force-disabled.  It can also add an empty one when the
configuration tells it that the feature is force-enabled and there
is no UNTR extension yet.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-21 Thread Junio C Hamano
Duy Nguyen  writes:

> On Thu, Dec 17, 2015 at 2:44 PM, Jeff King  wrote:
>> I think we may actually be thinking of the same thing. Naively, I would
>> expect:
>>
>> ..
>>   - if there is cache data in the index but that config flag is not set,
>> presumably we would not update it (we could even explicitly drop it,
>> but my understanding is that is not necessary for correctness, but
>> only as a possible optimization).
>
> No, if somebody adds or removes something from the index, we either
> update or drop it, or it's stale. There's the invalidate_untracked()
> or something in dir.c that we can hook in, check config var and do
> that. And because config is cached recently, it should be a cheap
> operation.

Checking the config may be cheap, but it bothers me a lot that we
have to call that "invalidate" thing every time we go into the
codepath to deal with the index, from code cleanliness point of
view.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-18 Thread Christian Couder
On Thu, Dec 17, 2015 at 1:36 PM, Duy Nguyen  wrote:
> On Wed, Dec 16, 2015 at 4:53 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamano  wrote:
>>> Ævar Arnfjörð Bjarmason  writes:
>>> I still have a problem with the approach from "design cleanliness"
>>> point of view[...]
>>>
>>> In any case I think we already have agreed to disagree on this
>>> point, so there is no use discussing it any longer from my side.  I
>>> am not closing the door to this series, but I am not convinced,
>>> either.  At least not yet.
>>
>> In general the fantastic thing about the git configuration facility is
>> that it provides both systems administrators and normal users with
>> what they want. It's possible to configure things system-wide and
>> override those on a user or repository basis.
>>
>> Of course hindsight is 20/20, but I think that given what's been
>> covered in this thread it's been established that it's categorically
>> better that if we introduce features like these that they be
>> configured through the normal configuration facility rather than the
>> configuration being sticky to the index.
>
> A minor note for implementers. We need to check that config is loaded
> first. read-cache.c, being part of the core, does not bother itself
> with config loading. And I think so far it has not used any config
> vars. If a command forgets (*) to load the config, the cache may be
> deleted (if we do it the safe way).
>
> (*) is there any command deliberately avoid loading config? git-clone
> and git-init are special, but for this particular case it's probably
> fine.

Thanks for this note.

Looking at the current patch, the global variable in which the value
of the core.untrackedCache config var is stored is
"use_untracked_cache".
It is used in the following places:

- wt_status_collect_untracked() in wt-status.c which is called only by
"git status" and "git commit" after the config has been loaded.

- cmd_update_index() in builtin/update-index.c which loads the config
before using it

- validate_untracked_cache() in dir.c where it is used in:

   if (use_untracked_cache != 1 && !ident_in_untracked(dir->untracked)) {
warning(_("Untracked cache is disabled on this system."));
return NULL;
}

but this "if" and its contents are removed by patch 10/10 in v2.

So at the end of this patch series, there is no risk of
use_untracked_cache not being properly setup.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-17 Thread Duy Nguyen
On Wed, Dec 16, 2015 at 4:53 AM, Ævar Arnfjörð Bjarmason
 wrote:
> On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamano  wrote:
>> Ævar Arnfjörð Bjarmason  writes:
>> I still have a problem with the approach from "design cleanliness"
>> point of view[...]
>>
>> In any case I think we already have agreed to disagree on this
>> point, so there is no use discussing it any longer from my side.  I
>> am not closing the door to this series, but I am not convinced,
>> either.  At least not yet.
>
> In general the fantastic thing about the git configuration facility is
> that it provides both systems administrators and normal users with
> what they want. It's possible to configure things system-wide and
> override those on a user or repository basis.
>
> Of course hindsight is 20/20, but I think that given what's been
> covered in this thread it's been established that it's categorically
> better that if we introduce features like these that they be
> configured through the normal configuration facility rather than the
> configuration being sticky to the index.

A minor note for implementers. We need to check that config is loaded
first. read-cache.c, being part of the core, does not bother itself
with config loading. And I think so far it has not used any config
vars. If a command forgets (*) to load the config, the cache may be
deleted (if we do it the safe way).

(*) is there any command deliberately avoid loading config? git-clone
and git-init are special, but for this particular case it's probably
fine.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-17 Thread Duy Nguyen
On Thu, Dec 17, 2015 at 2:44 PM, Jeff King  wrote:
> I think we may actually be thinking of the same thing. Naively, I would
> expect:
>
> ..
>   - if there is cache data in the index but that config flag is not set,
> presumably we would not update it (we could even explicitly drop it,
> but my understanding is that is not necessary for correctness, but
> only as a possible optimization).

No, if somebody adds or removes something from the index, we either
update or drop it, or it's stale. There's the invalidate_untracked()
or something in dir.c that we can hook in, check config var and do
that. And because config is cached recently, it should be a cheap
operation.

Apart from that the idea is fine.

> You could have a config option for "if there is a cache there, pretend
> it isn't and ignore it", but I don't see much point.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-16 Thread Jeff King
On Tue, Dec 15, 2015 at 10:05:57PM -0800, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > This is why the_index.has_untracked_cache is not just a simple "Do I
> > want to use this feature?" boolean configuration.  The index also
> > stores the real data, and "Am I using this feature?" bit goes hand
> > in hand with that real data.  Thinking that this is merely a boolean
> > configuration is the real source of the confusion, and introducing a
> > config that overrules what the user has stored in the index needs to
> > add complexity.
> >
> > The additional complexity may (or may not) be justifiable, but in
> > any case "all other things being equal, this is a config" feels like
> > a flawed observation.
> 
> To put it another way, the "bit" in the index (i.e. the presence of
> the cached data) is "Am I using the feature now?".  The effect of
> the feature has to (and is designed to) persist, as it is a cache
> and you do not want a stale cache to give you wrong answers.  There
> is no "Do I want to use the feature?" preference, in other words.
> 
> And I do not mind creating such a preference bit as a configuration.
> 
> That is why I suggested such a configuration to cause the equivalent
> of "update-index --untracked-cache" when a new index is created from
> scratch (as opposed to the case where the previously created cache
> data is carried forward across read_cache() -> do things to the
> index -> write_cache() flow).  Doing it that way will not have to
> involve additional complexity that comes from the desire that
> setting a single configuration on (or off) has to suddenly change
> the behaviour of an index file that is already using (or not using)
> the feature.

I think we may actually be thinking of the same thing. Naively, I would
expect:

  - if there is untracked cache data in the index, we will make use of
it when enumerating untracked files (and my understanding is that if
it is stale, we can detect that)

  - if core.untrackedCache is set, we will update and write out an
untracked cache when we are enumerating the untracked files

  - if there is cache data in the index but that config flag is not set,
presumably we would not update it (we could even explicitly drop it,
but my understanding is that is not necessary for correctness, but
only as a possible optimization).

You could have a config option for "if there is a cache there, pretend
it isn't and ignore it", but I don't see much point.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Christian Couder
On Tue, Dec 15, 2015 at 10:49 AM, Torsten Bögershausen  wrote:
> On 15.12.15 10:34, Christian Couder wrote:
>> On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano  wrote:
>>> Junio C Hamano  writes:
>>>
>>> The primary reason why I do not like your "configuration decides" is
>>> it will be a huge source of confusions and bugs.  Imagine what
>>> should happen in this sequence, and when should a stale cached
>>> information be discarded?
>>>
>>>  - the configuration is set to 'yes'.
>>>  - the index is updated and written by various commands.
>>>  - more work is done in the working tree without updating the index.
>>>  - the configuration is set to 'no'.
>>>  - more work is done in the working tree without updating the index.
>>>  - the configuration is set to 'yes'.
>>>  - more work is done in the working tree without updating the index.
>>>  - somebody asks "what untracked paths are there?"
>>
>
>> As far as I understand the UC just stores the mtime of the directories
>> in the working tree to avoid the need of lstat'ing all the files in
>> the directories.
>
> This is what I understand:
> UC stores the mtime of the directories in the working tree to avoid the need
> opendir() readdir() closedir() to find new, yet untracked, files.
> (including sub-directories)

I think you are probably right too.

In the v2 patch series I just sent, there is:

+This feature works by recording the mtime of the working tree
+directories and then omitting reading directories and stat calls
+against files in those directories whose mtime hasn't changed.

I hope it is better.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Christian Couder
On Tue, Dec 15, 2015 at 11:02 AM, Duy Nguyen  wrote:
> On Tue, Dec 15, 2015 at 4:34 PM, Christian Couder
>  wrote:
>> On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano  wrote:
>>> Junio C Hamano  writes:
>>>
>>> The primary reason why I do not like your "configuration decides" is
>>> it will be a huge source of confusions and bugs.  Imagine what
>>> should happen in this sequence, and when should a stale cached
>>> information be discarded?
>>>
>>>  - the configuration is set to 'yes'.
>>>  - the index is updated and written by various commands.
>>>  - more work is done in the working tree without updating the index.
>>>  - the configuration is set to 'no'.
>>>  - more work is done in the working tree without updating the index.
>>>  - the configuration is set to 'yes'.
>>>  - more work is done in the working tree without updating the index.
>>>  - somebody asks "what untracked paths are there?"
>>
>> As far as I understand the UC just stores the mtime of the directories
>> in the working tree to avoid the need of lstat'ing all the files in
>> the directories.
>>
>> When somebody asks "what untracked paths are there", if the UC has not
>> been discarded when the configuration was set to no, then git will
>> just ask for the mtimes of the directories in the working tree and
>> compare them with what is in the UC.
>>
>> I don't see how it can create confusion and bugs, as the work done in
>> the working tree should anyway have changed the mtime of the
>> directories where work has been done.
>
> Any operation that can add or remove an entry from the index may lead
> to UC update. For example, if file "foo" is tracked, then the user
> does "git rm --cached foo", "foo" may become either untracked or
> ignored. So if you disable UC while doing this removal, then re-enable
> UC again, "git-status" may show incorrect output. So, as long as UC
> extension exists in the index, it must be updated, or it may become
> outdated and useless.

When UC is disabled, it is removed from the index, and when it is
(re-)enabled, it is recreated, so I don't think that your example
applies to the code in this patch.

Thanks anyway for this insight,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> The way the "config decides" patch series deals with this is that if
> you have the UC information in the index and the configuration is set
> to core.untrackedCache=false the UC will be removed from the index.
>
> Otherwise you would indeed easily end up with a stale cache,...

Yeah, that's "correctness" thing; it goes without saying that it
would be unacceptable if the series did not get this right.

I still have a problem with the approach from "design cleanliness"
point of view, primarily that the index already has a bit to tell us
if the user already said that she wants to use the feature, but
because you want to make config win, the code needs to always read
the config to allow it to disable the feature in the index, just in
case the data that is already in the index says otherwise, and the
code has to keep doing that even after the data is removed [*1*].
Unfortunately, I do not think you can solve the "design cleanliness"
problem unless you give up "we want /etc/gitconfig override".

In any case I think we already have agreed to disagree on this
point, so there is no use discussing it any longer from my side.  I
am not closing the door to this series, but I am not convinced,
either.  At least not yet.

> Once this series is applied and git is shipped with it, existing
> users that have set "git update-index --untracked-cache" will have
> their UC feature disabled.

Well, the fix to that is merely just a one-shot thing away, so it
would not be too much of a hassle, no [*2*]?  So I do not think it
would be such a big issue.

> We *could* make even that use-case work by detecting the legacy marker
> for the UC in the index (the uname info), then we'd do a one-time "git
> config --local core.untrackedCache true" and remove the marker.

I do not think we want to go there---that would mean you would need
to revamp the repository format version because the old tools would
be now unusable on the index/config combo your version mucked with.


[Footnote]

*1* I also do not want to see that design pattern imitated and used
in other parts of the system, and this gives a precedent for
people to copy.

*2* Yet, those who are broken by this series may say "it is
unacceptable that we have to survey all existing repositories
and selectively add the configuration to the ones that have
enabled the feature before updating.", the same way you complain
against the "index already knows" approach.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Christian Couder
On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
> The primary reason why I do not like your "configuration decides" is
> it will be a huge source of confusions and bugs.  Imagine what
> should happen in this sequence, and when should a stale cached
> information be discarded?
>
>  - the configuration is set to 'yes'.
>  - the index is updated and written by various commands.
>  - more work is done in the working tree without updating the index.
>  - the configuration is set to 'no'.
>  - more work is done in the working tree without updating the index.
>  - the configuration is set to 'yes'.
>  - more work is done in the working tree without updating the index.
>  - somebody asks "what untracked paths are there?"

As far as I understand the UC just stores the mtime of the directories
in the working tree to avoid the need of lstat'ing all the files in
the directories.

When somebody asks "what untracked paths are there", if the UC has not
been discarded when the configuration was set to no, then git will
just ask for the mtimes of the directories in the working tree and
compare them with what is in the UC.

I don't see how it can create confusion and bugs, as the work done in
the working tree should anyway have changed the mtime of the
directories where work has been done.

Maybe as the UC has not been updated for a long time, there will be a
lot of mtimes that are different, so there will not be a big speed up
or it could be even slower than if the UC was not used, but that's
all.

> In the "configuration decides" world, I am not sure how a sane
> implementation efficiently invalidates the cache as needed, without
> the config subsystem having intimate knowledge with the untracked
> cache (so far, the config subsystem is merely a key-value store and
> does not care _what_ it stores; you would want to invalidate the
> cache in the index when somebody sets the variable to 'no', which
> means the config subsystem needs to know that this variable is
> special).

In the current patch series and in the one I am preparing and will
probably send soon, this hunk takes care of removing or addind the UC
if needed:

diff --git a/wt-status.c b/wt-status.c
index 435fc28..3e0fe02 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -586,6 +586,15 @@ static void wt_status_collect_untracked(struct
wt_status *s)
dir.flags |= DIR_SHOW_IGNORED_TOO;
else
dir.untracked = the_index.untracked;
+
+   if (!dir.untracked && use_untracked_cache == 1) {
+   add_untracked_cache();
+   dir.untracked = the_index.untracked;
+   } else if (dir.untracked && use_untracked_cache == 0) {
+   remove_untracked_cache();
+   dir.untracked = NULL;
+   }
+
setup_standard_excludes();

fill_directory(, >pathspec);

So when the config option is changed, the UC is removed or recreated
only the next time "git status" (and maybe also "git commit" and a few
other commands) is called.

And anyway I don't think people will change the UC config very often.
Maybe they will play with it a bit when they discover it, but
afterwards they will just set it or not and be done with it. So I
think it is not worth trying to optimize what happens when the config
is set or unset. We should just make sure that it works and it is well
documented.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Duy Nguyen
On Tue, Dec 15, 2015 at 4:34 PM, Christian Couder
 wrote:
> On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano  wrote:
>> Junio C Hamano  writes:
>>
>> The primary reason why I do not like your "configuration decides" is
>> it will be a huge source of confusions and bugs.  Imagine what
>> should happen in this sequence, and when should a stale cached
>> information be discarded?
>>
>>  - the configuration is set to 'yes'.
>>  - the index is updated and written by various commands.
>>  - more work is done in the working tree without updating the index.
>>  - the configuration is set to 'no'.
>>  - more work is done in the working tree without updating the index.
>>  - the configuration is set to 'yes'.
>>  - more work is done in the working tree without updating the index.
>>  - somebody asks "what untracked paths are there?"
>
> As far as I understand the UC just stores the mtime of the directories
> in the working tree to avoid the need of lstat'ing all the files in
> the directories.
>
> When somebody asks "what untracked paths are there", if the UC has not
> been discarded when the configuration was set to no, then git will
> just ask for the mtimes of the directories in the working tree and
> compare them with what is in the UC.
>
> I don't see how it can create confusion and bugs, as the work done in
> the working tree should anyway have changed the mtime of the
> directories where work has been done.

Any operation that can add or remove an entry from the index may lead
to UC update. For example, if file "foo" is tracked, then the user
does "git rm --cached foo", "foo" may become either untracked or
ignored. So if you disable UC while doing this removal, then re-enable
UC again, "git-status" may show incorrect output. So, as long as UC
extension exists in the index, it must be updated, or it may become
outdated and useless.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Torsten Bögershausen
On 15.12.15 10:34, Christian Couder wrote:
> On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano  wrote:
>> Junio C Hamano  writes:
>>
>> The primary reason why I do not like your "configuration decides" is
>> it will be a huge source of confusions and bugs.  Imagine what
>> should happen in this sequence, and when should a stale cached
>> information be discarded?
>>
>>  - the configuration is set to 'yes'.
>>  - the index is updated and written by various commands.
>>  - more work is done in the working tree without updating the index.
>>  - the configuration is set to 'no'.
>>  - more work is done in the working tree without updating the index.
>>  - the configuration is set to 'yes'.
>>  - more work is done in the working tree without updating the index.
>>  - somebody asks "what untracked paths are there?"
> 

> As far as I understand the UC just stores the mtime of the directories
> in the working tree to avoid the need of lstat'ing all the files in
> the directories.

This is what I understand:
UC stores the mtime of the directories in the working tree to avoid the need 
opendir() readdir() closedir() to find new, yet untracked, files.
(including sub-directories)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Ævar Arnfjörð Bjarmason
On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason  writes:
> I still have a problem with the approach from "design cleanliness"
> point of view[...]
>
> In any case I think we already have agreed to disagree on this
> point, so there is no use discussing it any longer from my side.  I
> am not closing the door to this series, but I am not convinced,
> either.  At least not yet.

In general the fantastic thing about the git configuration facility is
that it provides both systems administrators and normal users with
what they want. It's possible to configure things system-wide and
override those on a user or repository basis.

Of course hindsight is 20/20, but I think that given what's been
covered in this thread it's been established that it's categorically
better that if we introduce features like these that they be
configured through the normal configuration facility rather than the
configuration being sticky to the index. It gives you everything that
the per-index configuration gives you and more.

So assuming that's the case, how do we migrate something that's
configured via the index towards being configured through git-config?

I think there's no general answer to that, but in this case the worst
case scenario with accepting this series as-is is that we downgrade
some users who've opted in to it to pre-v2.5.0 "git status"
performance.

Since the change in performance really isn't noticeable except on
really large repositories, which are more likely to have someone
involved watching the changelog on upgrades I think that's OK.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> Of course hindsight is 20/20, but I think that given what's been
> covered in this thread it's been established that it's categorically
> better that if we introduce features like these that they be
> configured through the normal configuration facility rather than the
> configuration being sticky to the index.

I doubt that any such thing has been established at all in this
thread.  It may be true that you and perhaps Christian loudly
repeated it, but loudly repeating something and establishing
something as a fact are slightly different.

The thing is, I do not necessarily view this as "configuration".
The way I see the feature is that you say "--untracked" when you
want the states of untracked paths be kept track of in the index,
just like you say "git add Makefile" when you want the state of
'Makefile' be kept track of in the index.  Either the index keeps
track of it, or it doesn't, based solely on user's request, and the
bit to tell us which is the case is already in the index, exactly
because that is part of the data that is kept track of in the index.

> Since the change in performance really isn't noticeable except on
> really large repositories, which are more likely to have someone
> involved watching the changelog on upgrades I think that's OK.

Especially it is dubious to me that the trade-off you are making
with this design is a good one.  In order to avoid paying a one-time
cost to run "update-index --untracked-cache" at sites that _do_ want
to use that feature (and after that, if you teach "git init" and
"git clone" to pay attention to the "give you the default"
configuration to run it for you, so that your users won't have to),
you are forcing all codepaths that makes any write to the index (not
just "init"-time) to make an extra check with the configuration all
the time for everybody, because you made the presence of the
untracked cache data in the index not usable as a sign that the user
wants to use that feature.  If the feature is something only those
with really large repositories care about, is it a good trade-off to
make everybody pay the runtime cost and make code more complex and
fragile?  I am not yet convinced.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Junio C Hamano
Jeff King  writes:

>> If the feature is something only those with really large repositories
>> care about, is it a good trade-off to make everybody pay the runtime
>> cost and make code more complex and fragile?  I am not yet convinced.
>
> I'm not sure I understand the runtime and complexity costs of the config
> option. Isn't it just:
>
>   if (core_untracked_cache) {
>   /* do the thing */
>   }
>
> and loading core_untracked_cache in git_default_core_config()? Versus:
>
>   if (the_index.has_untracked_cache) {
> /* do the thing */
>   }

The latter is pretty much so, but the former needs to be a bit more
involved, e.g. more like:

if (core_untracked_cache) {
if (!the_index.has_untracked_cache)
create_and_attach_uc(_index);
/* do the thing */
} else {
if (the_index.has_untracked_cache)
destroy and remove untracked cache;
}

Otherwise, after adding the cache to the index, flipping the config
off, doing things with the index and working tree and then filpping
the config back on, the index would have a stale cache that does not
know what happened to the working tree while config was off, and
your "git status" will start throwing a wrong result.

This is why the_index.has_untracked_cache is not just a simple "Do I
want to use this feature?" boolean configuration.  The index also
stores the real data, and "Am I using this feature?" bit goes hand
in hand with that real data.  Thinking that this is merely a boolean
configuration is the real source of the confusion, and introducing a
config that overrules what the user has stored in the index needs to
add complexity.

The additional complexity may (or may not) be justifiable, but in
any case "all other things being equal, this is a config" feels like
a flawed observation.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Junio C Hamano
Junio C Hamano  writes:

> This is why the_index.has_untracked_cache is not just a simple "Do I
> want to use this feature?" boolean configuration.  The index also
> stores the real data, and "Am I using this feature?" bit goes hand
> in hand with that real data.  Thinking that this is merely a boolean
> configuration is the real source of the confusion, and introducing a
> config that overrules what the user has stored in the index needs to
> add complexity.
>
> The additional complexity may (or may not) be justifiable, but in
> any case "all other things being equal, this is a config" feels like
> a flawed observation.

To put it another way, the "bit" in the index (i.e. the presence of
the cached data) is "Am I using the feature now?".  The effect of
the feature has to (and is designed to) persist, as it is a cache
and you do not want a stale cache to give you wrong answers.  There
is no "Do I want to use the feature?" preference, in other words.

And I do not mind creating such a preference bit as a configuration.

That is why I suggested such a configuration to cause the equivalent
of "update-index --untracked-cache" when a new index is created from
scratch (as opposed to the case where the previously created cache
data is carried forward across read_cache() -> do things to the
index -> write_cache() flow).  Doing it that way will not have to
involve additional complexity that comes from the desire that
setting a single configuration on (or off) has to suddenly change
the behaviour of an index file that is already using (or not using)
the feature.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Christian Couder
On Tue, Dec 15, 2015 at 2:04 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamano  wrote:
>
> I'm replying to & quoting from two E-Mails of yours at once here for
> clarity & less noise. I'm working wich Christian on getting this
> integrated, and we both thought it would be good to have some fresh
> input on the matter from me.
>
>> Christian Couder  writes:
>
>>> If you want only some repos to use the UC, you will set
>>> core.untrackedCache in the repo config. Then after cloning such a
>>> repo, you will copy the config file, and this will not be enough to
>>> enable the UC.
>>
>> Surely.  "Does this index file keeps track of the untracked files'
>> states?" is a property of the index.  Cloning does not propagate the
>> configuration and copying or not copying is irrelevant.  If you want
>> to enable, running "update-index --untracked-cache" is a way to do
>> so.  I cannot see what's so hard about it.
>>
>>> And if you have set core.untrackedCache in the global config when you
>>> clone, UC is enabled, but if you have just set it in the repo config
>>> after the clone, it is not enabled.
>>
>> That's fine.  In your patch series, if you set it in the global, you
>> will get the cache in the new one.  With the cleaned-up semantics I
>> suggested, the same thing will happen.
>>
>> And with the cleaned-up semantics, the configuration is *ONLY* used
>> to give the *DEFAULT* before other things happen, i.e. creation of
>> the index file for the first time.  Because the configuration is
>> only the default, an explicit "update-index --[no-]untracked-cache"
>> will defeat it, just like any other config/option interaction.
>
> As you know Christian is working on this for Booking.com to integrate
> features we find useful into git.git in such a way that we don't have
> to maintain some internal fork of Git.
>
> What we're trying to do, and what a lot of other big deployments of
> Git elsewhere would also find useful, is to ship a default sensible
> configuration for all users on the system in /etc/gitconfig.
>
> I'd like to be able to easily enable some feature that aids Git
> performance globally on our thousands of machines and for our hundreds
> of users by just tweaking something in puppet to change
> /etc/gitconfig, and more importantly if that change ends up being bad
> reverting that config in /etc/gitconfig should undo the change.
>
> It's an unacceptable level of complexity for system-level automation
> to have to scour the filesystem for existing Git repositories and run
> "git update-index" on each of them, that's why we're submitting
> patches to make this a config option, so we can simply flip a flag in
> /etc/gitconfig.
>
> It's also unacceptable to have the config simply provide the default
> which'll be frozen either at clone time or after an initial "git
> status".
>
> Let's say I ship a /etc/gitconfig that says "new clones should use the
> untracked cache". Now I roll that out across our fleet of machines and
> it turns out the morning after that the feature doesn't work properly
> for whatever reason. If it's just a "default until clone or status"
> type of thing even if I revert the configuration a lot of users &
> their repositories in the wild will still be broken, and will have to
> be manually fixed. Which again leads to the scouring the filesystem
> problem.
>
> So that gives some more context for why we're pushing for this change.
> I believe this feature breaks no existing use-case and just supports
> new ones, and I think that your objections to it are based on a simple
> misunderstanding as will become apparent if you read on below.
>
>> The biggest issue I had with your patch series, IIRC, is that
>> configuration will defeat the command line option.
>
> I think it's a moot point to focus on configuration v.s. command-line
> option. The important question is whether or not this feature can
> still be configured on a repo-local basis with this series as before.
> That's still the case since --local git configuration overrides
> --global and --system, so users who want to enable/disable this
> per-repo still can.
>
>>> Shouldn't it be nice if they could just enable core.untrackedCache in
>>> the global config files without having to also cd into every repo and
>>> use "git update-index --untracked-cache" there?
>>
>> NO.  It is bad to change the behaviour behind users' back.
>
> I'm not quite sure what the objection here is exactly. If you're a
> normal user you can enable/disable this per-repo just like you can
> now, and enable/disable it for all your repos in ~/.gitconfig.
>
> If you mean that the user's configuration shouldn't be changed by the
> global config in /etc/gitconfig I do think that's a moot point. If
> you're a user on a system where I have root and I want to change your
> Git configuration I'm going to be able to do that whatever the
> mechanism is.
>
> That's indeed 

Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Ævar Arnfjörð Bjarmason
On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamano  wrote:

I'm replying to & quoting from two E-Mails of yours at once here for
clarity & less noise. I'm working wich Christian on getting this
integrated, and we both thought it would be good to have some fresh
input on the matter from me.

> Christian Couder  writes:

>> If you want only some repos to use the UC, you will set
>> core.untrackedCache in the repo config. Then after cloning such a
>> repo, you will copy the config file, and this will not be enough to
>> enable the UC.
>
> Surely.  "Does this index file keeps track of the untracked files'
> states?" is a property of the index.  Cloning does not propagate the
> configuration and copying or not copying is irrelevant.  If you want
> to enable, running "update-index --untracked-cache" is a way to do
> so.  I cannot see what's so hard about it.
>
>> And if you have set core.untrackedCache in the global config when you
>> clone, UC is enabled, but if you have just set it in the repo config
>> after the clone, it is not enabled.
>
> That's fine.  In your patch series, if you set it in the global, you
> will get the cache in the new one.  With the cleaned-up semantics I
> suggested, the same thing will happen.
>
> And with the cleaned-up semantics, the configuration is *ONLY* used
> to give the *DEFAULT* before other things happen, i.e. creation of
> the index file for the first time.  Because the configuration is
> only the default, an explicit "update-index --[no-]untracked-cache"
> will defeat it, just like any other config/option interaction.

As you know Christian is working on this for Booking.com to integrate
features we find useful into git.git in such a way that we don't have
to maintain some internal fork of Git.

What we're trying to do, and what a lot of other big deployments of
Git elsewhere would also find useful, is to ship a default sensible
configuration for all users on the system in /etc/gitconfig.

I'd like to be able to easily enable some feature that aids Git
performance globally on our thousands of machines and for our hundreds
of users by just tweaking something in puppet to change
/etc/gitconfig, and more importantly if that change ends up being bad
reverting that config in /etc/gitconfig should undo the change.

It's an unacceptable level of complexity for system-level automation
to have to scour the filesystem for existing Git repositories and run
"git update-index" on each of them, that's why we're submitting
patches to make this a config option, so we can simply flip a flag in
/etc/gitconfig.

It's also unacceptable to have the config simply provide the default
which'll be frozen either at clone time or after an initial "git
status".

Let's say I ship a /etc/gitconfig that says "new clones should use the
untracked cache". Now I roll that out across our fleet of machines and
it turns out the morning after that the feature doesn't work properly
for whatever reason. If it's just a "default until clone or status"
type of thing even if I revert the configuration a lot of users &
their repositories in the wild will still be broken, and will have to
be manually fixed. Which again leads to the scouring the filesystem
problem.

So that gives some more context for why we're pushing for this change.
I believe this feature breaks no existing use-case and just supports
new ones, and I think that your objections to it are based on a simple
misunderstanding as will become apparent if you read on below.

> The biggest issue I had with your patch series, IIRC, is that
> configuration will defeat the command line option.

I think it's a moot point to focus on configuration v.s. command-line
option. The important question is whether or not this feature can
still be configured on a repo-local basis with this series as before.
That's still the case since --local git configuration overrides
--global and --system, so users who want to enable/disable this
per-repo still can.

>> Shouldn't it be nice if they could just enable core.untrackedCache in
>> the global config files without having to also cd into every repo and
>> use "git update-index --untracked-cache" there?
>
> NO.  It is bad to change the behaviour behind users' back.

I'm not quite sure what the objection here is exactly. If you're a
normal user you can enable/disable this per-repo just like you can
now, and enable/disable it for all your repos in ~/.gitconfig.

If you mean that the user's configuration shouldn't be changed by the
global config in /etc/gitconfig I do think that's a moot point. If
you're a user on a system where I have root and I want to change your
Git configuration I'm going to be able to do that whatever the
mechanism is.

That's indeed that's what we're doing to enable this at Booking.com
currently, we run a job to find some limited set of common checkouts
and run "git update-index" for users as root. The problem with that is
that it's needlessly complex, hence this 

Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Ævar Arnfjörð Bjarmason
On Wed, Dec 16, 2015 at 12:03 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason  writes:
>
>> Of course hindsight is 20/20, but I think that given what's been
>> covered in this thread it's been established that it's categorically
>> better that if we introduce features like these that they be
>> configured through the normal configuration facility rather than the
>> configuration being sticky to the index.
>
> I doubt that any such thing has been established at all in this
> thread.  It may be true that you and perhaps Christian loudly
> repeated it, but loudly repeating something and establishing
> something as a fact are slightly different.
>
> The thing is, I do not necessarily view this as "configuration".
> The way I see the feature is that you say "--untracked" when you
> want the states of untracked paths be kept track of in the index.

You probably know this, but the --untracked-cache has no bearing on
what we actually keep track of, it's just an optimization for how
efficiently we execute "git status" commands without the "-uno"
option. We still produce the same output.

> just like you say "git add Makefile" when you want the state of
> 'Makefile' be kept track of in the index.  Either the index keeps
> track of it, or it doesn't, based solely on user's request, and the
> bit to tell us which is the case is already in the index, exactly
> because that is part of the data that is kept track of in the index.

What I mean by "[we've] established that it's categorically better [to
do this via git-config]" is that we can still do all that stuff, we
can just also do more stuff now.

>> Since the change in performance really isn't noticeable except on
>> really large repositories, which are more likely to have someone
>> involved watching the changelog on upgrades I think that's OK.
>
> Especially it is dubious to me that the trade-off you are making
> with this design is a good one.  In order to avoid paying a one-time
> cost to run "update-index --untracked-cache" at sites that _do_ want
> to use that feature (and after that, if you teach "git init" and
> "git clone" to pay attention to the "give you the default"
> configuration to run it for you, so that your users won't have to),

It's not unreasonable to avoid the cost of running "update-index
--untracked-cache", it's the difference between just adjusting
/etc/gitconfig and continually having to traverse the entire /
filesystem if you want to enable this feature on a system-wide basis.
It should be easy to enable any Git feature via the configuration
facility either on a --system, or --global or --local basis.

> you are forcing all codepaths that makes any write to the index (not
> just "init"-time) to make an extra check with the configuration all
> the time for everybody, because you made the presence of the
> untracked cache data in the index not usable as a sign that the user
> wants to use that feature.

Maybe I'm misunderstanding Christian's patches but don't we already
parse the git configuration on any commands that update the index
anyway? See git_default_core_config().
We already parse the git configuration to run "git status".

> If the feature is something only those
> with really large repositories care about, is it a good trade-off to
> make everybody pay the runtime cost and make code more complex and
> fragile?  I am not yet convinced.

I was arguing that only users with really large repositories would
notice if we turned this off because the enabling facility had changed
from per-index to config. But it doesn't follow that the expense of
checking the git configuration which we're parsing anyway for the
index-related commands makes things more complex & fragile.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Jeff King
On Tue, Dec 15, 2015 at 03:03:14PM -0800, Junio C Hamano wrote:

> The thing is, I do not necessarily view this as "configuration".
> The way I see the feature is that you say "--untracked" when you
> want the states of untracked paths be kept track of in the index,
> just like you say "git add Makefile" when you want the state of
> 'Makefile' be kept track of in the index.  Either the index keeps
> track of it, or it doesn't, based solely on user's request, and the
> bit to tell us which is the case is already in the index, exactly
> because that is part of the data that is kept track of in the index.

I know this is a fairly subjective argument, but it feels quite weird
for me for such a config to persist in the index and not be mentioned
anywhere else.

Is there any other user-specified configuration option for which:

  rm -f .git/index
  git read-tree HEAD

will actually _lose_ information?

It seems to me that all other things being equal, we should be in favor
of a config option simply because it reduces the cognitive burden on the
user: it's one fewer place they need to be aware that git is keeping
persistent decisions.

> If the feature is something only those with really large repositories
> care about, is it a good trade-off to make everybody pay the runtime
> cost and make code more complex and fragile?  I am not yet convinced.

I'm not sure I understand the runtime and complexity costs of the config
option. Isn't it just:

  if (core_untracked_cache) {
/* do the thing */
  }

and loading core_untracked_cache in git_default_core_config()? Versus:

  if (the_index.has_untracked_cache) {
/* do the thing */
  }

I ask as somebody who hasn't followed the topic closely. I just don't
see what is that different about this versus other config options. What
am I missing?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-14 Thread Christian Couder
On Tue, Dec 8, 2015 at 11:43 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Christian Couder  writes:
>>
>>> When we know that mtime is fully supported by the environment, we
>>> might want the untracked cache to be always used by default without
>>> any mtime test or kernel version check being performed.
>>>
>>> Also when we know that mtime is not supported by the environment,
>>> for example because the repo is shared over a network file system,
>>> then we might want 'git update-index --untracked-cache' to fail
>>> immediately instead of preforming tests (because it might work on
>>> some systems using the repo over the network file system but not
>>> others).
>>> ...
>> The logic in this paragraph is fuzzy to me.  Shouldn't the config
>> give a sensible default, that is overriden by command line options?

The problem is that "git update-index --[no-|force-]untracked-cache"
is not just changing things for the duration of the current command.
It is in fact changing the configuration of the repository, because it
is adding the UC (untracked cache) in the index and saving the index,
which will persist the use of the UC for the following commands.

So it is very different from something like "git status
--no-untracked-cache" that would perform a "git status" without using
the UC.

In fact "git update-index --[no-|force-]untracked-cache" is very bad
because it means that two repositories can be configured differently
even if they have the same config files.

>> I agree that it is insane to do a runtime check when the user says
>> "update-index --untracked-cache" to enable it, as the user _knows_
>> that enabling it would help (or the user _knows_ that she wants to
>> play with it).  Similarly, shouldn't the config be ignored when the
>> user says "update-index --no-untracked-cache" (hence removing the
>> untracked cache from the resulting index no matter the config is set
>> to)?  ...
>
> As I think about this more, it really seems to me that we shouldn't
> need to make this configuration variable that special.  Because I
> think it is a *BUG* in the current implementation to do the runtime
> check even when the user explicitly tells us to use untracked-cache,
> I'd imagine something along the lines of the following would be a
> lot simpler, easier to understand and a generally more useful
> bugfix:
>
>  1 Add one boolean configuration variable, core.untrackedCache, that
>defaults to 'false'.
>
>  2 When creating an index file in an empty repository, enable the
>untracked cache in the index (even without the user runninng
>"update-index --untracked-cache") iff the configuration is set to
>'true'.  No runtime check necessary.

I guess this means that when cloning a repo, it would not use the UC
unless either "git -c core.untrackedCache=true clone ..." is used, or
core.untrackedCache is set in the global config.

>  3 When working on an existing index file, unless the operation is
>"update-index --[no-]untracked-cache", keep the current setting,
>regardless of this configuration variable.  No runtime check
>necessary.
>
>  4 "update-index --[no-]untracked-cache" should enable or disable
>the untracked cache as the user tells us, regardless of the
>configuration variable.  No runtime check necessary.

If you want only some repos to use the UC, you will set
core.untrackedCache in the repo config. Then after cloning such a
repo, you will copy the config file, and this will not be enough to
enable the UC. The original repo will use the UC but the cloned one
will not, and you might wonder why is "git status" slower in the
cloned repo despite the machines and the config being the same for
both repos?

And if you have set core.untrackedCache in the global config when you
clone, UC is enabled, but if you have just set it in the repo config
after the clone, it is not enabled.

This is quite bad in my opinion.

Also think about system administrators who have a lot of machines with
lots of repos everywhere on these machines and just upgrade Git from
let's say v2.4.0 to v2.8.0. They find out that using UC git status
will be faster and want to provide that to the machine's users knowing
that they only use recent Linux machines where mtime works well.
Shouldn't it be nice if they could just enable core.untrackedCache in
the global config files without having to also cd into every repo and
use "git update-index --untracked-cache" there?

If we consider that "git update-index --[no-|force-]untracked-cache"
should not have been created in the first place, and that the good
mechanism for this is a config variable, and that maybe "git
update-index --[no-|force-]untracked-cache" should just be deprecated,
then the important thing is to make the core.untrackedCache config
variable work in the best possible way.

> It is OK to then add an "auto-detect" on top of the above, that
> would only affect the second bullet 

Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-14 Thread Junio C Hamano
Junio C Hamano  writes:

> If you stop thinking that "update-index --untracked-cache" is
> somehow a "configuration", things will get clearer to you.
> ...
>> "git update-index --[no-|force-]untracked-cache" is a bad way, so
>> let's make it easy for people to not use it at all.
>
> As I disagree with that basic premise, I have to disagree with the
> conclusion as well.

Having said all that, I do not think an option to "update-index"
must be the _only_ interface to tell the index to use (or ignore)
the untracked cache.  Two obvious places that can also have the same
command line option would be "git init" and "git clone".  If either
the per-user configuration (or the per-site one the administrator
sets) gave the default for these two commands, that would make it
unnecessary to use "update-index", unless you are experimenting or
working around bugs in the implementation.

The primary reason why I do not like your "configuration decides" is
it will be a huge source of confusions and bugs.  Imagine what
should happen in this sequence, and when should a stale cached
information be discarded?

 - the configuration is set to 'yes'.
 - the index is updated and written by various commands.
 - more work is done in the working tree without updating the index.
 - the configuration is set to 'no'.
 - more work is done in the working tree without updating the index.
 - the configuration is set to 'yes'.
 - more work is done in the working tree without updating the index.
 - somebody asks "what untracked paths are there?"

In the "configuration decides" world, I am not sure how a sane
implementation efficiently invalidates the cache as needed, without
the config subsystem having intimate knowledge with the untracked
cache (so far, the config subsystem is merely a key-value store and
does not care _what_ it stores; you would want to invalidate the
cache in the index when somebody sets the variable to 'no', which
means the config subsystem needs to know that this variable is
special).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-14 Thread Junio C Hamano
Christian Couder  writes:

> In fact "git update-index --[no-|force-]untracked-cache" is very bad
> because it means that two repositories can be configured differently
> even if they have the same config files.

If you stop thinking that "update-index --untracked-cache" is
somehow a "configuration", things will get clearer to you.

Imagine there is no configuration whatsoever.  The index primarily
keeps track of what will be in the next "write-tree", but optionally
it can keep track of other kinds of information, such as the last
unmerged states for paths whose conflicts have been resolved.  Duy's
work adds another kind of information to the mix--cached stat bits
for untracked paths to speed up the next "git status", and the option
is to start keeping track of that information in the index.

Because it is a "cache", once you start keeping track of it, you
need to maintain it--otherwise you will be fooled by stale
information still in the cache.  So of course the effect has to
persist.  There is nothing _wrong_ with that.  If you want to stop
maintaining it, you can tell the index "don't use untracked-cache".

So I think the above "is very bad because" is total nonsense.

> If you want only some repos to use the UC, you will set
> core.untrackedCache in the repo config. Then after cloning such a
> repo, you will copy the config file, and this will not be enough to
> enable the UC.

Surely.  "Does this index file keeps track of the untracked files'
states?" is a property of the index.  Cloning does not propagate the
configuration and copying or not copying is irrelevant.  If you want
to enable, running "update-index --untracked-cache" is a way to do
so.  I cannot see what's so hard about it.

> And if you have set core.untrackedCache in the global config when you
> clone, UC is enabled, but if you have just set it in the repo config
> after the clone, it is not enabled.

That's fine.  In your patch series, if you set it in the global, you
will get the cache in the new one.  With the cleaned-up semantics I
suggested, the same thing will happen.

And with the cleaned-up semantics, the configuration is *ONLY* used
to give the *DEFAULT* before other things happen, i.e. creation of
the index file for the first time.  Because the configuration is
only the default, an explicit "update-index --[no-]untracked-cache"
will defeat it, just like any other config/option interaction.

The biggest issue I had with your patch series, IIRC, is that
configuration will defeat the command line option.

> Shouldn't it be nice if they could just enable core.untrackedCache in
> the global config files without having to also cd into every repo and
> use "git update-index --untracked-cache" there?

NO.  It is bad to change the behaviour behind users' back.

Set that once, _future_ repositories their users will create will
use that by default, and tell their users what to do to their
existing repositories.  Problem solved.

> "git update-index --[no-|force-]untracked-cache" is a bad way, so
> let's make it easy for people to not use it at all.

As I disagree with that basic premise, I have to disagree with the
conclusion as well.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-09 Thread Torsten Bögershausen
On 08.12.15 18:15, Christian Couder wrote:
> When we know that mtime is fully supported by the environment, we
> might want the untracked cache to be always used by default without
> any mtime test or kernel version check being performed.
> 
For the people which didn't follow the discussion, or read this in 
a year or 2:
A short description what "mtime is fully supported by the environment" means
would be nice:
When a file system object is added or deleted in a directory, 
and stat(dirname, ) returns an updated st.st_mtime.
> Also when we know that mtime is not supported by the environment,
> for example because the repo is shared over a network file system,
> then we might want 'git update-index --untracked-cache' to fail
> immediately instead of preforming tests (because it might work on
 performing
> some systems using the repo over the network file system but not
> others).
> 
> The normal way to achieve the above in Git is to use a config
> variable. That's why this patch introduces "core.untrackedCache".
> 
> To keep things simple, this variable is a bool which default to
> false.
If this is the case, can we remove some code?
e.g add_untracked_ident() in dir.c, do we still need it?
And probably some other functions as well.
Or would it be better to say 
"false" is false,
"true" is true
"unset" is as before (Where when different machines/OS/mount points
  access the same repo over a network file system, some use the UT, some don't)

> 
> When "git status" is run, it now adds or removes the untracked
> cache in the index to respect the value of this variable.
> 
> This means that `git update-index --[no-|force-]untracked-cache`,
> to be compatible with the new config variable, have to set or
> unset it.
what does unset mean ? Set to false ?
 This new behavior is backward incompatible change, but
> that is deliberate.

> 
> Also `--untracked-cache` used to check that the underlying
> operating system and file system change `st_mtime` field of a
> directory if files are added or deleted in that directory. But
> those tests take a long time and there is now
> `--test-untracked-cache` to perform them.
> 
> So to be more consistent with other git commands, this patch
> prevents `--untracked-cache` to perform tests. This means that
> after this patch there is no difference any more between
> `--untracked-cache` and `--force-untracked-cache`.
> 
> Signed-off-by: Christian Couder 
> ---
>  Documentation/config.txt   |  7 +++
>  Documentation/git-update-index.txt | 28 ++--
>  builtin/update-index.c | 23 +--
>  cache.h|  1 +
>  config.c   |  4 
>  contrib/completion/git-completion.bash |  1 +
>  dir.c  |  2 +-
>  environment.c  |  1 +
>  t/t7063-status-untracked-cache.sh  |  4 +---
>  wt-status.c|  9 +
>  10 files changed, 56 insertions(+), 24 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 2d06b11..94820eb 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -308,6 +308,13 @@ core.trustctime::
>   crawlers and some backup systems).
>   See linkgit:git-update-index[1]. True by default.
>  
> +core.untrackedCache::
> + Determines if untracked cache will be enabled. Using
> + 'git update-index --[no-|force-]untracked-cache' will set
> + this variable. 
set to what ? true ? false ?

Before setting it to true, you should check
> + that mtime is working properly on your system.
> + See linkgit:git-update-index[1]. False by default.
> +
isn't this what "git update-index --test-untracked-cached" is good for?

>  core.checkStat::
>   Determines which stat fields to match between the index
>   and work tree. The user can set this to 'default' or
> diff --git a/Documentation/git-update-index.txt 
> b/Documentation/git-update-index.txt
> index 0ff7396..0fb39db 100644
> --- a/Documentation/git-update-index.txt
> +++ b/Documentation/git-update-index.txt
> @@ -175,22 +175,28 @@ may not support it yet.
>  --no-untracked-cache::
>   Enable or disable untracked cache extension. This could speed
>   up for commands that involve determining untracked files such
> + as `git status`.
> ++
> +The underlying operating system and file system must change `st_mtime`
> +field of a directory if files are added or deleted in that
> +directory. You can test that using
> +`--test-untracked-cache`. 

`--untracked-cache` used to test that too
> +but it doesn't anymore.
Do we need this historical comment ?

> ++
> +This sets the `core.untrackedCache` configuration variable to 'true'
> +or 'false' in the repo config file, (see linkgit:git-config[1]), so
> +that the untracked cache stays enabled or disabled.
>  
>  --test-untracked-cache::
>   

Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-08 Thread Junio C Hamano
Christian Couder  writes:

> When we know that mtime is fully supported by the environment, we
> might want the untracked cache to be always used by default without
> any mtime test or kernel version check being performed.
>
> Also when we know that mtime is not supported by the environment,
> for example because the repo is shared over a network file system,
> then we might want 'git update-index --untracked-cache' to fail
> immediately instead of preforming tests (because it might work on
> some systems using the repo over the network file system but not
> others).
>
> The normal way to achieve the above in Git is to use a config
> variable. That's why this patch introduces "core.untrackedCache".
>
> To keep things simple, this variable is a bool which default to
> false.
>
> When "git status" is run, it now adds or removes the untracked
> cache in the index to respect the value of this variable.
>
> This means that `git update-index --[no-|force-]untracked-cache`,
> to be compatible with the new config variable, have to set or
> unset it. This new behavior is backward incompatible change, but
> that is deliberate.

The logic in this paragraph is fuzzy to me.  Shouldn't the config
give a sensible default, that is overriden by command line options?
I agree that it is insane to do a runtime check when the user says
"update-index --untracked-cache" to enable it, as the user _knows_
that enabling it would help (or the user _knows_ that she wants to
play with it).  Similarly, shouldn't the config be ignored when the
user says "update-index --no-untracked-cache" (hence removing the
untracked cache from the resulting index no matter the config is set
to)?  Why is it a good idea to allow an explicit operation from the
command line to muck with the configuration?  If the user wants to
change the configuration permanently, "git config" has been there
for the purpose for all the configuration variables to do so for
almost forever.  Why is it a good idea to make this variable so
special that the user does not have to use "git config" but disable
or enable it in some other way?

> Also `--untracked-cache` used to check that the underlying
> operating system and file system change `st_mtime` field of a
> directory if files are added or deleted in that directory. But
> those tests take a long time and there is now
> `--test-untracked-cache` to perform them.
>
> So to be more consistent with other git commands, this patch
> prevents `--untracked-cache` to perform tests.

Good change.

> This means that
> after this patch there is no difference any more between
> `--untracked-cache` and `--force-untracked-cache`.

A tip to write the explanation.  Every time you say "This means
that...", you are (perhaps unconsciously) admitting that what you
wrote immedidately before that may not be understandable and what
comes after it may be a better explanation.  Discard the sentence
before "This means that", and try to formulate your explanation
around what you wrote after it, adding information in the discarded
earlier sentence back to the later one.  Doing this exercise often
helps the result read much better.

>
> Signed-off-by: Christian Couder 
> ---
>  Documentation/config.txt   |  7 +++
>  Documentation/git-update-index.txt | 28 ++--
>  builtin/update-index.c | 23 +--
>  cache.h|  1 +
>  config.c   |  4 
>  contrib/completion/git-completion.bash |  1 +
>  dir.c  |  2 +-
>  environment.c  |  1 +
>  t/t7063-status-untracked-cache.sh  |  4 +---
>  wt-status.c|  9 +
>  10 files changed, 56 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 2d06b11..94820eb 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -308,6 +308,13 @@ core.trustctime::
>   crawlers and some backup systems).
>   See linkgit:git-update-index[1]. True by default.
>  
> +core.untrackedCache::
> + Determines if untracked cache will be enabled. Using
> + 'git update-index --[no-|force-]untracked-cache' will set
> + this variable. Before setting it to true, you should check
> + that mtime is working properly on your system.
> + See linkgit:git-update-index[1]. False by default.
> +
>  core.checkStat::
>   Determines which stat fields to match between the index
>   and work tree. The user can set this to 'default' or
> diff --git a/Documentation/git-update-index.txt 
> b/Documentation/git-update-index.txt
> index 0ff7396..0fb39db 100644
> --- a/Documentation/git-update-index.txt
> +++ b/Documentation/git-update-index.txt
> @@ -175,22 +175,28 @@ may not support it yet.
>  --no-untracked-cache::
>   Enable or disable untracked cache extension. This could 

Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-08 Thread Junio C Hamano
Junio C Hamano  writes:

> Christian Couder  writes:
>
>> When we know that mtime is fully supported by the environment, we
>> might want the untracked cache to be always used by default without
>> any mtime test or kernel version check being performed.
>>
>> Also when we know that mtime is not supported by the environment,
>> for example because the repo is shared over a network file system,
>> then we might want 'git update-index --untracked-cache' to fail
>> immediately instead of preforming tests (because it might work on
>> some systems using the repo over the network file system but not
>> others).
>> ...
> The logic in this paragraph is fuzzy to me.  Shouldn't the config
> give a sensible default, that is overriden by command line options?
> I agree that it is insane to do a runtime check when the user says
> "update-index --untracked-cache" to enable it, as the user _knows_
> that enabling it would help (or the user _knows_ that she wants to
> play with it).  Similarly, shouldn't the config be ignored when the
> user says "update-index --no-untracked-cache" (hence removing the
> untracked cache from the resulting index no matter the config is set
> to)?  ...

As I think about this more, it really seems to me that we shouldn't
need to make this configuration variable that special.  Because I
think it is a *BUG* in the current implementation to do the runtime
check even when the user explicitly tells us to use untracked-cache,
I'd imagine something along the lines of the following would be a
lot simpler, easier to understand and a generally more useful
bugfix:

 1 Add one boolean configuration variable, core.untrackedCache, that
   defaults to 'false'.

 2 When creating an index file in an empty repository, enable the
   untracked cache in the index (even without the user runninng
   "update-index --untracked-cache") iff the configuration is set to
   'true'.  No runtime check necessary.

 3 When working on an existing index file, unless the operation is
   "update-index --[no-]untracked-cache", keep the current setting,
   regardless of this configuration variable.  No runtime check
   necessary.

 4 "update-index --[no-]untracked-cache" should enable or disable
   the untracked cache as the user tells us, regardless of the
   configuration variable.  No runtime check necessary.

It is OK to then add an "auto-detect" on top of the above, that
would only affect the second bullet point, like so:

 2a When creating an index file in an empty repository, if the
configuration is set to 'auto', do the lengthy runtime check and
enable the untracked cache in the index (even without the user
runninng "update-index --untracked-cache").

without changing any other in the first 4-bullet list.

Am I missing some other requirements?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html