Re: [PATCH 7/8] config: add core.untrackedCache
Duy Nguyenwrites: > Sounds good, except.. > >> When write_index() writes out an index that wasn't initialized from >> the filesystem, a new UNTR gets added only when the configuration is >> set to 'true' and there is no in-core UNTR already. > > Do we really need this? We don't; you're right. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Thu, Dec 24, 2015 at 2:56 AM, Junio C Hamanowrote: > Junio C Hamano writes: > >> Duy Nguyen writes: >> >>> In that case we can just check config once in read_index_from and >>> destroy UNTR extension. Or the middle ground, we check config once in >>> that place, make a note in struct index_state, and make invalidate_* >>> check that note instead of config file. The middle ground has an >>> advantage over destroying UNTR: (probably) many operations will touch >>> index but do not add or remove entries. >> >> Or we can even teach read_index_from() to skip reading the extension >> without even parsing when the configuration tells it that the >> feature is force-disabled. It can also add an empty one when the >> configuration tells it that the feature is force-enabled and there >> is no UNTR extension yet. > > Thinking about this a bit more, I am starting to feel that the > config that can be used to optionally override the presence of > in-index UNTR extension does not have to be too bad a thing, > provided if it is done with a few tweaks to the design I read in > Christian & Ævar's messages. Great! > One tweak is to address the following from Ævar's message: > >>> Once this series is applied and git is shipped with it existing >>> users that have set "git update-index --untracked-cache" will have >>> their UC feature disabled. This is because we fall back to "oh no >>> config here? Must have been disabled, rm it from the index" clause >>> which keeps our UC from going stale in the face of config >>> flip-flopping. > > The above would happen only if the configuration is a boolean that > defaults to false. I'd think we can have this a tristate instead. > That is, config.untrackedCache can be explicitly set to 'true', > 'false', or 'keep'. And make a missing config.untrackedCache > default to 'keep'. Ok. The first RFC patch series about this had a tristate and I switched to a boolean as it seemed that people prefered a boolean, but you are right that a tristate is more backward compatible. > When read_index_from() reads an existing index: > > When the configuration is set to 'true': > an existing UNTR is kept, otherwise a new UNTR gets added. > When the configuration is set to 'false: > an existing UNTR is dropped. > When the configuration is set to 'keep': > an existing UNTR is kept, otherwise nothing happens. > > When write_index() writes out an index that wasn't initialized from > the filesystem, a new UNTR gets added only when the configuration is > set to 'true' and there is no in-core UNTR already. My current patch series does these changes in wt_status_collect_untracked() because currently the UNTR is mostly used only in git status, so it feels safer to me to not affect other code paths. > That way, those who use /etc/gitconfig to force the feature over > their users would be able to set it to 'true', those who have been > using the feature in some but not all of their repositories won't > see any different behaviour until they muck with the configuration > (and if they are paranoid and want to opt out of their administrator's > setting, they can set it to 'keep' in their $HOME/.gitconfig to make > sure their repositories are not affected). > > Orthogonal to the "config overrides the existing users' practice" > issue, I still think that [PATCH v2 10/10] goes too far to remove > the safety based on the working tree location. Checking kernel > version and other thing may be too much, but the check based on the > working tree location is a cheap way to make sure that those who do > unusual things (namely, use $GIT_DIR/$GIT_WORK_TREE or their > equivalents to tell Git that the working tree for this invocation is > at a place different from what UNTR data was prepared for) are not > harmed by incorrectly reusing the cached data for an unrelated > location. So another tweak I'd feel better to see is to resurrect > that safety. This has been changed in v3, see patch 09/11, and I think it should now work as you suggest. > I wouldn't have as much issue with such a scheme as I had with the > earlier description of the design in the Christian's series. Great! I am preparing a v4 that I hope to send in a few days. By the way the v3 does not pass its own tests due to a bug introduced in last minute changes. Thanks, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Thu, Dec 24, 2015 at 8:56 AM, Junio C Hamanowrote: > Junio C Hamano writes: > >> Duy Nguyen writes: >> >>> In that case we can just check config once in read_index_from and >>> destroy UNTR extension. Or the middle ground, we check config once in >>> that place, make a note in struct index_state, and make invalidate_* >>> check that note instead of config file. The middle ground has an >>> advantage over destroying UNTR: (probably) many operations will touch >>> index but do not add or remove entries. >> >> Or we can even teach read_index_from() to skip reading the extension >> without even parsing when the configuration tells it that the >> feature is force-disabled. It can also add an empty one when the >> configuration tells it that the feature is force-enabled and there >> is no UNTR extension yet. > > Thinking about this a bit more, I am starting to feel that the > config that can be used to optionally override the presence of > in-index UNTR extension does not have to be too bad a thing, > provided if it is done with a few tweaks to the design I read in > Christian & Ævar's messages. > > One tweak is to address the following from Ævar's message: > >>> Once this series is applied and git is shipped with it existing >>> users that have set "git update-index --untracked-cache" will have >>> their UC feature disabled. This is because we fall back to "oh no >>> config here? Must have been disabled, rm it from the index" clause >>> which keeps our UC from going stale in the face of config >>> flip-flopping. > > The above would happen only if the configuration is a boolean that > defaults to false. I'd think we can have this a tristate instead. > That is, config.untrackedCache can be explicitly set to 'true', > 'false', or 'keep'. And make a missing config.untrackedCache > default to 'keep'. > > When read_index_from() reads an existing index: > > When the configuration is set to 'true': > an existing UNTR is kept, otherwise a new UNTR gets added. > When the configuration is set to 'false: > an existing UNTR is dropped. > When the configuration is set to 'keep': > an existing UNTR is kept, otherwise nothing happens. Sounds good, except.. > When write_index() writes out an index that wasn't initialized from > the filesystem, a new UNTR gets added only when the configuration is > set to 'true' and there is no in-core UNTR already. Do we really need this? If the index is created fresh from the core, first write will not have UNTR. Next time it's loaded up, UNTR is added if config is true, based on the above rules. The new UNTR extension (should?) marks the index as dirty as writing out is required. > Orthogonal to the "config overrides the existing users' practice" > issue, Yeah it looks orthogonal to me too. There should be a separate config file like /etc/gitconfig.lockdown, that will override user config vars (but we still can't undefine user config vars yet, known missing feature). -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Junio C Hamanowrites: > Duy Nguyen writes: > >> In that case we can just check config once in read_index_from and >> destroy UNTR extension. Or the middle ground, we check config once in >> that place, make a note in struct index_state, and make invalidate_* >> check that note instead of config file. The middle ground has an >> advantage over destroying UNTR: (probably) many operations will touch >> index but do not add or remove entries. > > Or we can even teach read_index_from() to skip reading the extension > without even parsing when the configuration tells it that the > feature is force-disabled. It can also add an empty one when the > configuration tells it that the feature is force-enabled and there > is no UNTR extension yet. Thinking about this a bit more, I am starting to feel that the config that can be used to optionally override the presence of in-index UNTR extension does not have to be too bad a thing, provided if it is done with a few tweaks to the design I read in Christian & Ævar's messages. One tweak is to address the following from Ævar's message: >> Once this series is applied and git is shipped with it existing >> users that have set "git update-index --untracked-cache" will have >> their UC feature disabled. This is because we fall back to "oh no >> config here? Must have been disabled, rm it from the index" clause >> which keeps our UC from going stale in the face of config >> flip-flopping. The above would happen only if the configuration is a boolean that defaults to false. I'd think we can have this a tristate instead. That is, config.untrackedCache can be explicitly set to 'true', 'false', or 'keep'. And make a missing config.untrackedCache default to 'keep'. When read_index_from() reads an existing index: When the configuration is set to 'true': an existing UNTR is kept, otherwise a new UNTR gets added. When the configuration is set to 'false: an existing UNTR is dropped. When the configuration is set to 'keep': an existing UNTR is kept, otherwise nothing happens. When write_index() writes out an index that wasn't initialized from the filesystem, a new UNTR gets added only when the configuration is set to 'true' and there is no in-core UNTR already. That way, those who use /etc/gitconfig to force the feature over their users would be able to set it to 'true', those who have been using the feature in some but not all of their repositories won't see any different behaviour until they muck with the configuration (and if they are paranoid and want to opt out of their administrator's setting, they can set it to 'keep' in their $HOME/.gitconfig to make sure their repositories are not affected). Orthogonal to the "config overrides the existing users' practice" issue, I still think that [PATCH v2 10/10] goes too far to remove the safety based on the working tree location. Checking kernel version and other thing may be too much, but the check based on the working tree location is a cheap way to make sure that those who do unusual things (namely, use $GIT_DIR/$GIT_WORK_TREE or their equivalents to tell Git that the working tree for this invocation is at a place different from what UNTR data was prepared for) are not harmed by incorrectly reusing the cached data for an unrelated location. So another tweak I'd feel better to see is to resurrect that safety. I wouldn't have as much issue with such a scheme as I had with the earlier description of the design in the Christian's series. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 22, 2015 at 1:30 AM, Junio C Hamanowrote: > Duy Nguyen writes: > >> On Thu, Dec 17, 2015 at 2:44 PM, Jeff King wrote: >>> I think we may actually be thinking of the same thing. Naively, I would >>> expect: >>> >>> .. >>> - if there is cache data in the index but that config flag is not set, >>> presumably we would not update it (we could even explicitly drop it, >>> but my understanding is that is not necessary for correctness, but >>> only as a possible optimization). >> >> No, if somebody adds or removes something from the index, we either >> update or drop it, or it's stale. There's the invalidate_untracked() >> or something in dir.c that we can hook in, check config var and do >> that. And because config is cached recently, it should be a cheap >> operation. > > Checking the config may be cheap, but it bothers me a lot that we > have to call that "invalidate" thing every time we go into the > codepath to deal with the index, from code cleanliness point of > view. In that case we can just check config once in read_index_from and destroy UNTR extension. Or the middle ground, we check config once in that place, make a note in struct index_state, and make invalidate_* check that note instead of config file. The middle ground has an advantage over destroying UNTR: (probably) many operations will touch index but do not add or remove entries. Though I may be wrong, replacing an entry may be implemented as delete then add, I haven't checked the code. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Duy Nguyenwrites: > In that case we can just check config once in read_index_from and > destroy UNTR extension. Or the middle ground, we check config once in > that place, make a note in struct index_state, and make invalidate_* > check that note instead of config file. The middle ground has an > advantage over destroying UNTR: (probably) many operations will touch > index but do not add or remove entries. Or we can even teach read_index_from() to skip reading the extension without even parsing when the configuration tells it that the feature is force-disabled. It can also add an empty one when the configuration tells it that the feature is force-enabled and there is no UNTR extension yet. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Duy Nguyenwrites: > On Thu, Dec 17, 2015 at 2:44 PM, Jeff King wrote: >> I think we may actually be thinking of the same thing. Naively, I would >> expect: >> >> .. >> - if there is cache data in the index but that config flag is not set, >> presumably we would not update it (we could even explicitly drop it, >> but my understanding is that is not necessary for correctness, but >> only as a possible optimization). > > No, if somebody adds or removes something from the index, we either > update or drop it, or it's stale. There's the invalidate_untracked() > or something in dir.c that we can hook in, check config var and do > that. And because config is cached recently, it should be a cheap > operation. Checking the config may be cheap, but it bothers me a lot that we have to call that "invalidate" thing every time we go into the codepath to deal with the index, from code cleanliness point of view. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Thu, Dec 17, 2015 at 1:36 PM, Duy Nguyenwrote: > On Wed, Dec 16, 2015 at 4:53 AM, Ævar Arnfjörð Bjarmason > wrote: >> On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamano wrote: >>> Ævar Arnfjörð Bjarmason writes: >>> I still have a problem with the approach from "design cleanliness" >>> point of view[...] >>> >>> In any case I think we already have agreed to disagree on this >>> point, so there is no use discussing it any longer from my side. I >>> am not closing the door to this series, but I am not convinced, >>> either. At least not yet. >> >> In general the fantastic thing about the git configuration facility is >> that it provides both systems administrators and normal users with >> what they want. It's possible to configure things system-wide and >> override those on a user or repository basis. >> >> Of course hindsight is 20/20, but I think that given what's been >> covered in this thread it's been established that it's categorically >> better that if we introduce features like these that they be >> configured through the normal configuration facility rather than the >> configuration being sticky to the index. > > A minor note for implementers. We need to check that config is loaded > first. read-cache.c, being part of the core, does not bother itself > with config loading. And I think so far it has not used any config > vars. If a command forgets (*) to load the config, the cache may be > deleted (if we do it the safe way). > > (*) is there any command deliberately avoid loading config? git-clone > and git-init are special, but for this particular case it's probably > fine. Thanks for this note. Looking at the current patch, the global variable in which the value of the core.untrackedCache config var is stored is "use_untracked_cache". It is used in the following places: - wt_status_collect_untracked() in wt-status.c which is called only by "git status" and "git commit" after the config has been loaded. - cmd_update_index() in builtin/update-index.c which loads the config before using it - validate_untracked_cache() in dir.c where it is used in: if (use_untracked_cache != 1 && !ident_in_untracked(dir->untracked)) { warning(_("Untracked cache is disabled on this system.")); return NULL; } but this "if" and its contents are removed by patch 10/10 in v2. So at the end of this patch series, there is no risk of use_untracked_cache not being properly setup. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Wed, Dec 16, 2015 at 4:53 AM, Ævar Arnfjörð Bjarmasonwrote: > On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamano wrote: >> Ævar Arnfjörð Bjarmason writes: >> I still have a problem with the approach from "design cleanliness" >> point of view[...] >> >> In any case I think we already have agreed to disagree on this >> point, so there is no use discussing it any longer from my side. I >> am not closing the door to this series, but I am not convinced, >> either. At least not yet. > > In general the fantastic thing about the git configuration facility is > that it provides both systems administrators and normal users with > what they want. It's possible to configure things system-wide and > override those on a user or repository basis. > > Of course hindsight is 20/20, but I think that given what's been > covered in this thread it's been established that it's categorically > better that if we introduce features like these that they be > configured through the normal configuration facility rather than the > configuration being sticky to the index. A minor note for implementers. We need to check that config is loaded first. read-cache.c, being part of the core, does not bother itself with config loading. And I think so far it has not used any config vars. If a command forgets (*) to load the config, the cache may be deleted (if we do it the safe way). (*) is there any command deliberately avoid loading config? git-clone and git-init are special, but for this particular case it's probably fine. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Thu, Dec 17, 2015 at 2:44 PM, Jeff Kingwrote: > I think we may actually be thinking of the same thing. Naively, I would > expect: > > .. > - if there is cache data in the index but that config flag is not set, > presumably we would not update it (we could even explicitly drop it, > but my understanding is that is not necessary for correctness, but > only as a possible optimization). No, if somebody adds or removes something from the index, we either update or drop it, or it's stale. There's the invalidate_untracked() or something in dir.c that we can hook in, check config var and do that. And because config is cached recently, it should be a cheap operation. Apart from that the idea is fine. > You could have a config option for "if there is a cache there, pretend > it isn't and ignore it", but I don't see much point. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 10:05:57PM -0800, Junio C Hamano wrote: > Junio C Hamanowrites: > > > This is why the_index.has_untracked_cache is not just a simple "Do I > > want to use this feature?" boolean configuration. The index also > > stores the real data, and "Am I using this feature?" bit goes hand > > in hand with that real data. Thinking that this is merely a boolean > > configuration is the real source of the confusion, and introducing a > > config that overrules what the user has stored in the index needs to > > add complexity. > > > > The additional complexity may (or may not) be justifiable, but in > > any case "all other things being equal, this is a config" feels like > > a flawed observation. > > To put it another way, the "bit" in the index (i.e. the presence of > the cached data) is "Am I using the feature now?". The effect of > the feature has to (and is designed to) persist, as it is a cache > and you do not want a stale cache to give you wrong answers. There > is no "Do I want to use the feature?" preference, in other words. > > And I do not mind creating such a preference bit as a configuration. > > That is why I suggested such a configuration to cause the equivalent > of "update-index --untracked-cache" when a new index is created from > scratch (as opposed to the case where the previously created cache > data is carried forward across read_cache() -> do things to the > index -> write_cache() flow). Doing it that way will not have to > involve additional complexity that comes from the desire that > setting a single configuration on (or off) has to suddenly change > the behaviour of an index file that is already using (or not using) > the feature. I think we may actually be thinking of the same thing. Naively, I would expect: - if there is untracked cache data in the index, we will make use of it when enumerating untracked files (and my understanding is that if it is stale, we can detect that) - if core.untrackedCache is set, we will update and write out an untracked cache when we are enumerating the untracked files - if there is cache data in the index but that config flag is not set, presumably we would not update it (we could even explicitly drop it, but my understanding is that is not necessary for correctness, but only as a possible optimization). You could have a config option for "if there is a cache there, pretend it isn't and ignore it", but I don't see much point. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 10:49 AM, Torsten Bögershausenwrote: > On 15.12.15 10:34, Christian Couder wrote: >> On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano wrote: >>> Junio C Hamano writes: >>> >>> The primary reason why I do not like your "configuration decides" is >>> it will be a huge source of confusions and bugs. Imagine what >>> should happen in this sequence, and when should a stale cached >>> information be discarded? >>> >>> - the configuration is set to 'yes'. >>> - the index is updated and written by various commands. >>> - more work is done in the working tree without updating the index. >>> - the configuration is set to 'no'. >>> - more work is done in the working tree without updating the index. >>> - the configuration is set to 'yes'. >>> - more work is done in the working tree without updating the index. >>> - somebody asks "what untracked paths are there?" >> > >> As far as I understand the UC just stores the mtime of the directories >> in the working tree to avoid the need of lstat'ing all the files in >> the directories. > > This is what I understand: > UC stores the mtime of the directories in the working tree to avoid the need > opendir() readdir() closedir() to find new, yet untracked, files. > (including sub-directories) I think you are probably right too. In the v2 patch series I just sent, there is: +This feature works by recording the mtime of the working tree +directories and then omitting reading directories and stat calls +against files in those directories whose mtime hasn't changed. I hope it is better. Thanks, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 11:02 AM, Duy Nguyenwrote: > On Tue, Dec 15, 2015 at 4:34 PM, Christian Couder > wrote: >> On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano wrote: >>> Junio C Hamano writes: >>> >>> The primary reason why I do not like your "configuration decides" is >>> it will be a huge source of confusions and bugs. Imagine what >>> should happen in this sequence, and when should a stale cached >>> information be discarded? >>> >>> - the configuration is set to 'yes'. >>> - the index is updated and written by various commands. >>> - more work is done in the working tree without updating the index. >>> - the configuration is set to 'no'. >>> - more work is done in the working tree without updating the index. >>> - the configuration is set to 'yes'. >>> - more work is done in the working tree without updating the index. >>> - somebody asks "what untracked paths are there?" >> >> As far as I understand the UC just stores the mtime of the directories >> in the working tree to avoid the need of lstat'ing all the files in >> the directories. >> >> When somebody asks "what untracked paths are there", if the UC has not >> been discarded when the configuration was set to no, then git will >> just ask for the mtimes of the directories in the working tree and >> compare them with what is in the UC. >> >> I don't see how it can create confusion and bugs, as the work done in >> the working tree should anyway have changed the mtime of the >> directories where work has been done. > > Any operation that can add or remove an entry from the index may lead > to UC update. For example, if file "foo" is tracked, then the user > does "git rm --cached foo", "foo" may become either untracked or > ignored. So if you disable UC while doing this removal, then re-enable > UC again, "git-status" may show incorrect output. So, as long as UC > extension exists in the index, it must be updated, or it may become > outdated and useless. When UC is disabled, it is removed from the index, and when it is (re-)enabled, it is recreated, so I don't think that your example applies to the code in this patch. Thanks anyway for this insight, Christian. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Ævar Arnfjörð Bjarmasonwrites: > The way the "config decides" patch series deals with this is that if > you have the UC information in the index and the configuration is set > to core.untrackedCache=false the UC will be removed from the index. > > Otherwise you would indeed easily end up with a stale cache,... Yeah, that's "correctness" thing; it goes without saying that it would be unacceptable if the series did not get this right. I still have a problem with the approach from "design cleanliness" point of view, primarily that the index already has a bit to tell us if the user already said that she wants to use the feature, but because you want to make config win, the code needs to always read the config to allow it to disable the feature in the index, just in case the data that is already in the index says otherwise, and the code has to keep doing that even after the data is removed [*1*]. Unfortunately, I do not think you can solve the "design cleanliness" problem unless you give up "we want /etc/gitconfig override". In any case I think we already have agreed to disagree on this point, so there is no use discussing it any longer from my side. I am not closing the door to this series, but I am not convinced, either. At least not yet. > Once this series is applied and git is shipped with it, existing > users that have set "git update-index --untracked-cache" will have > their UC feature disabled. Well, the fix to that is merely just a one-shot thing away, so it would not be too much of a hassle, no [*2*]? So I do not think it would be such a big issue. > We *could* make even that use-case work by detecting the legacy marker > for the UC in the index (the uname info), then we'd do a one-time "git > config --local core.untrackedCache true" and remove the marker. I do not think we want to go there---that would mean you would need to revamp the repository format version because the old tools would be now unusable on the index/config combo your version mucked with. [Footnote] *1* I also do not want to see that design pattern imitated and used in other parts of the system, and this gives a precedent for people to copy. *2* Yet, those who are broken by this series may say "it is unacceptable that we have to survey all existing repositories and selectively add the configuration to the ones that have enabled the feature before updating.", the same way you complain against the "index already knows" approach. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamanowrote: > Junio C Hamano writes: > > The primary reason why I do not like your "configuration decides" is > it will be a huge source of confusions and bugs. Imagine what > should happen in this sequence, and when should a stale cached > information be discarded? > > - the configuration is set to 'yes'. > - the index is updated and written by various commands. > - more work is done in the working tree without updating the index. > - the configuration is set to 'no'. > - more work is done in the working tree without updating the index. > - the configuration is set to 'yes'. > - more work is done in the working tree without updating the index. > - somebody asks "what untracked paths are there?" As far as I understand the UC just stores the mtime of the directories in the working tree to avoid the need of lstat'ing all the files in the directories. When somebody asks "what untracked paths are there", if the UC has not been discarded when the configuration was set to no, then git will just ask for the mtimes of the directories in the working tree and compare them with what is in the UC. I don't see how it can create confusion and bugs, as the work done in the working tree should anyway have changed the mtime of the directories where work has been done. Maybe as the UC has not been updated for a long time, there will be a lot of mtimes that are different, so there will not be a big speed up or it could be even slower than if the UC was not used, but that's all. > In the "configuration decides" world, I am not sure how a sane > implementation efficiently invalidates the cache as needed, without > the config subsystem having intimate knowledge with the untracked > cache (so far, the config subsystem is merely a key-value store and > does not care _what_ it stores; you would want to invalidate the > cache in the index when somebody sets the variable to 'no', which > means the config subsystem needs to know that this variable is > special). In the current patch series and in the one I am preparing and will probably send soon, this hunk takes care of removing or addind the UC if needed: diff --git a/wt-status.c b/wt-status.c index 435fc28..3e0fe02 100644 --- a/wt-status.c +++ b/wt-status.c @@ -586,6 +586,15 @@ static void wt_status_collect_untracked(struct wt_status *s) dir.flags |= DIR_SHOW_IGNORED_TOO; else dir.untracked = the_index.untracked; + + if (!dir.untracked && use_untracked_cache == 1) { + add_untracked_cache(); + dir.untracked = the_index.untracked; + } else if (dir.untracked && use_untracked_cache == 0) { + remove_untracked_cache(); + dir.untracked = NULL; + } + setup_standard_excludes(); fill_directory(, >pathspec); So when the config option is changed, the UC is removed or recreated only the next time "git status" (and maybe also "git commit" and a few other commands) is called. And anyway I don't think people will change the UC config very often. Maybe they will play with it a bit when they discover it, but afterwards they will just set it or not and be done with it. So I think it is not worth trying to optimize what happens when the config is set or unset. We should just make sure that it works and it is well documented. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 4:34 PM, Christian Couderwrote: > On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamano wrote: >> Junio C Hamano writes: >> >> The primary reason why I do not like your "configuration decides" is >> it will be a huge source of confusions and bugs. Imagine what >> should happen in this sequence, and when should a stale cached >> information be discarded? >> >> - the configuration is set to 'yes'. >> - the index is updated and written by various commands. >> - more work is done in the working tree without updating the index. >> - the configuration is set to 'no'. >> - more work is done in the working tree without updating the index. >> - the configuration is set to 'yes'. >> - more work is done in the working tree without updating the index. >> - somebody asks "what untracked paths are there?" > > As far as I understand the UC just stores the mtime of the directories > in the working tree to avoid the need of lstat'ing all the files in > the directories. > > When somebody asks "what untracked paths are there", if the UC has not > been discarded when the configuration was set to no, then git will > just ask for the mtimes of the directories in the working tree and > compare them with what is in the UC. > > I don't see how it can create confusion and bugs, as the work done in > the working tree should anyway have changed the mtime of the > directories where work has been done. Any operation that can add or remove an entry from the index may lead to UC update. For example, if file "foo" is tracked, then the user does "git rm --cached foo", "foo" may become either untracked or ignored. So if you disable UC while doing this removal, then re-enable UC again, "git-status" may show incorrect output. So, as long as UC extension exists in the index, it must be updated, or it may become outdated and useless. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On 15.12.15 10:34, Christian Couder wrote: > On Mon, Dec 14, 2015 at 10:30 PM, Junio C Hamanowrote: >> Junio C Hamano writes: >> >> The primary reason why I do not like your "configuration decides" is >> it will be a huge source of confusions and bugs. Imagine what >> should happen in this sequence, and when should a stale cached >> information be discarded? >> >> - the configuration is set to 'yes'. >> - the index is updated and written by various commands. >> - more work is done in the working tree without updating the index. >> - the configuration is set to 'no'. >> - more work is done in the working tree without updating the index. >> - the configuration is set to 'yes'. >> - more work is done in the working tree without updating the index. >> - somebody asks "what untracked paths are there?" > > As far as I understand the UC just stores the mtime of the directories > in the working tree to avoid the need of lstat'ing all the files in > the directories. This is what I understand: UC stores the mtime of the directories in the working tree to avoid the need opendir() readdir() closedir() to find new, yet untracked, files. (including sub-directories) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > I still have a problem with the approach from "design cleanliness" > point of view[...] > > In any case I think we already have agreed to disagree on this > point, so there is no use discussing it any longer from my side. I > am not closing the door to this series, but I am not convinced, > either. At least not yet. In general the fantastic thing about the git configuration facility is that it provides both systems administrators and normal users with what they want. It's possible to configure things system-wide and override those on a user or repository basis. Of course hindsight is 20/20, but I think that given what's been covered in this thread it's been established that it's categorically better that if we introduce features like these that they be configured through the normal configuration facility rather than the configuration being sticky to the index. It gives you everything that the per-index configuration gives you and more. So assuming that's the case, how do we migrate something that's configured via the index towards being configured through git-config? I think there's no general answer to that, but in this case the worst case scenario with accepting this series as-is is that we downgrade some users who've opted in to it to pre-v2.5.0 "git status" performance. Since the change in performance really isn't noticeable except on really large repositories, which are more likely to have someone involved watching the changelog on upgrades I think that's OK. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Ævar Arnfjörð Bjarmasonwrites: > Of course hindsight is 20/20, but I think that given what's been > covered in this thread it's been established that it's categorically > better that if we introduce features like these that they be > configured through the normal configuration facility rather than the > configuration being sticky to the index. I doubt that any such thing has been established at all in this thread. It may be true that you and perhaps Christian loudly repeated it, but loudly repeating something and establishing something as a fact are slightly different. The thing is, I do not necessarily view this as "configuration". The way I see the feature is that you say "--untracked" when you want the states of untracked paths be kept track of in the index, just like you say "git add Makefile" when you want the state of 'Makefile' be kept track of in the index. Either the index keeps track of it, or it doesn't, based solely on user's request, and the bit to tell us which is the case is already in the index, exactly because that is part of the data that is kept track of in the index. > Since the change in performance really isn't noticeable except on > really large repositories, which are more likely to have someone > involved watching the changelog on upgrades I think that's OK. Especially it is dubious to me that the trade-off you are making with this design is a good one. In order to avoid paying a one-time cost to run "update-index --untracked-cache" at sites that _do_ want to use that feature (and after that, if you teach "git init" and "git clone" to pay attention to the "give you the default" configuration to run it for you, so that your users won't have to), you are forcing all codepaths that makes any write to the index (not just "init"-time) to make an extra check with the configuration all the time for everybody, because you made the presence of the untracked cache data in the index not usable as a sign that the user wants to use that feature. If the feature is something only those with really large repositories care about, is it a good trade-off to make everybody pay the runtime cost and make code more complex and fragile? I am not yet convinced. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Jeff Kingwrites: >> If the feature is something only those with really large repositories >> care about, is it a good trade-off to make everybody pay the runtime >> cost and make code more complex and fragile? I am not yet convinced. > > I'm not sure I understand the runtime and complexity costs of the config > option. Isn't it just: > > if (core_untracked_cache) { > /* do the thing */ > } > > and loading core_untracked_cache in git_default_core_config()? Versus: > > if (the_index.has_untracked_cache) { > /* do the thing */ > } The latter is pretty much so, but the former needs to be a bit more involved, e.g. more like: if (core_untracked_cache) { if (!the_index.has_untracked_cache) create_and_attach_uc(_index); /* do the thing */ } else { if (the_index.has_untracked_cache) destroy and remove untracked cache; } Otherwise, after adding the cache to the index, flipping the config off, doing things with the index and working tree and then filpping the config back on, the index would have a stale cache that does not know what happened to the working tree while config was off, and your "git status" will start throwing a wrong result. This is why the_index.has_untracked_cache is not just a simple "Do I want to use this feature?" boolean configuration. The index also stores the real data, and "Am I using this feature?" bit goes hand in hand with that real data. Thinking that this is merely a boolean configuration is the real source of the confusion, and introducing a config that overrules what the user has stored in the index needs to add complexity. The additional complexity may (or may not) be justifiable, but in any case "all other things being equal, this is a config" feels like a flawed observation. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Junio C Hamanowrites: > This is why the_index.has_untracked_cache is not just a simple "Do I > want to use this feature?" boolean configuration. The index also > stores the real data, and "Am I using this feature?" bit goes hand > in hand with that real data. Thinking that this is merely a boolean > configuration is the real source of the confusion, and introducing a > config that overrules what the user has stored in the index needs to > add complexity. > > The additional complexity may (or may not) be justifiable, but in > any case "all other things being equal, this is a config" feels like > a flawed observation. To put it another way, the "bit" in the index (i.e. the presence of the cached data) is "Am I using the feature now?". The effect of the feature has to (and is designed to) persist, as it is a cache and you do not want a stale cache to give you wrong answers. There is no "Do I want to use the feature?" preference, in other words. And I do not mind creating such a preference bit as a configuration. That is why I suggested such a configuration to cause the equivalent of "update-index --untracked-cache" when a new index is created from scratch (as opposed to the case where the previously created cache data is carried forward across read_cache() -> do things to the index -> write_cache() flow). Doing it that way will not have to involve additional complexity that comes from the desire that setting a single configuration on (or off) has to suddenly change the behaviour of an index file that is already using (or not using) the feature. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 2:04 PM, Ævar Arnfjörð Bjarmasonwrote: > On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamano wrote: > > I'm replying to & quoting from two E-Mails of yours at once here for > clarity & less noise. I'm working wich Christian on getting this > integrated, and we both thought it would be good to have some fresh > input on the matter from me. > >> Christian Couder writes: > >>> If you want only some repos to use the UC, you will set >>> core.untrackedCache in the repo config. Then after cloning such a >>> repo, you will copy the config file, and this will not be enough to >>> enable the UC. >> >> Surely. "Does this index file keeps track of the untracked files' >> states?" is a property of the index. Cloning does not propagate the >> configuration and copying or not copying is irrelevant. If you want >> to enable, running "update-index --untracked-cache" is a way to do >> so. I cannot see what's so hard about it. >> >>> And if you have set core.untrackedCache in the global config when you >>> clone, UC is enabled, but if you have just set it in the repo config >>> after the clone, it is not enabled. >> >> That's fine. In your patch series, if you set it in the global, you >> will get the cache in the new one. With the cleaned-up semantics I >> suggested, the same thing will happen. >> >> And with the cleaned-up semantics, the configuration is *ONLY* used >> to give the *DEFAULT* before other things happen, i.e. creation of >> the index file for the first time. Because the configuration is >> only the default, an explicit "update-index --[no-]untracked-cache" >> will defeat it, just like any other config/option interaction. > > As you know Christian is working on this for Booking.com to integrate > features we find useful into git.git in such a way that we don't have > to maintain some internal fork of Git. > > What we're trying to do, and what a lot of other big deployments of > Git elsewhere would also find useful, is to ship a default sensible > configuration for all users on the system in /etc/gitconfig. > > I'd like to be able to easily enable some feature that aids Git > performance globally on our thousands of machines and for our hundreds > of users by just tweaking something in puppet to change > /etc/gitconfig, and more importantly if that change ends up being bad > reverting that config in /etc/gitconfig should undo the change. > > It's an unacceptable level of complexity for system-level automation > to have to scour the filesystem for existing Git repositories and run > "git update-index" on each of them, that's why we're submitting > patches to make this a config option, so we can simply flip a flag in > /etc/gitconfig. > > It's also unacceptable to have the config simply provide the default > which'll be frozen either at clone time or after an initial "git > status". > > Let's say I ship a /etc/gitconfig that says "new clones should use the > untracked cache". Now I roll that out across our fleet of machines and > it turns out the morning after that the feature doesn't work properly > for whatever reason. If it's just a "default until clone or status" > type of thing even if I revert the configuration a lot of users & > their repositories in the wild will still be broken, and will have to > be manually fixed. Which again leads to the scouring the filesystem > problem. > > So that gives some more context for why we're pushing for this change. > I believe this feature breaks no existing use-case and just supports > new ones, and I think that your objections to it are based on a simple > misunderstanding as will become apparent if you read on below. > >> The biggest issue I had with your patch series, IIRC, is that >> configuration will defeat the command line option. > > I think it's a moot point to focus on configuration v.s. command-line > option. The important question is whether or not this feature can > still be configured on a repo-local basis with this series as before. > That's still the case since --local git configuration overrides > --global and --system, so users who want to enable/disable this > per-repo still can. > >>> Shouldn't it be nice if they could just enable core.untrackedCache in >>> the global config files without having to also cd into every repo and >>> use "git update-index --untracked-cache" there? >> >> NO. It is bad to change the behaviour behind users' back. > > I'm not quite sure what the objection here is exactly. If you're a > normal user you can enable/disable this per-repo just like you can > now, and enable/disable it for all your repos in ~/.gitconfig. > > If you mean that the user's configuration shouldn't be changed by the > global config in /etc/gitconfig I do think that's a moot point. If > you're a user on a system where I have root and I want to change your > Git configuration I'm going to be able to do that whatever the > mechanism is. > > That's indeed
Re: [PATCH 7/8] config: add core.untrackedCache
On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamanowrote: I'm replying to & quoting from two E-Mails of yours at once here for clarity & less noise. I'm working wich Christian on getting this integrated, and we both thought it would be good to have some fresh input on the matter from me. > Christian Couder writes: >> If you want only some repos to use the UC, you will set >> core.untrackedCache in the repo config. Then after cloning such a >> repo, you will copy the config file, and this will not be enough to >> enable the UC. > > Surely. "Does this index file keeps track of the untracked files' > states?" is a property of the index. Cloning does not propagate the > configuration and copying or not copying is irrelevant. If you want > to enable, running "update-index --untracked-cache" is a way to do > so. I cannot see what's so hard about it. > >> And if you have set core.untrackedCache in the global config when you >> clone, UC is enabled, but if you have just set it in the repo config >> after the clone, it is not enabled. > > That's fine. In your patch series, if you set it in the global, you > will get the cache in the new one. With the cleaned-up semantics I > suggested, the same thing will happen. > > And with the cleaned-up semantics, the configuration is *ONLY* used > to give the *DEFAULT* before other things happen, i.e. creation of > the index file for the first time. Because the configuration is > only the default, an explicit "update-index --[no-]untracked-cache" > will defeat it, just like any other config/option interaction. As you know Christian is working on this for Booking.com to integrate features we find useful into git.git in such a way that we don't have to maintain some internal fork of Git. What we're trying to do, and what a lot of other big deployments of Git elsewhere would also find useful, is to ship a default sensible configuration for all users on the system in /etc/gitconfig. I'd like to be able to easily enable some feature that aids Git performance globally on our thousands of machines and for our hundreds of users by just tweaking something in puppet to change /etc/gitconfig, and more importantly if that change ends up being bad reverting that config in /etc/gitconfig should undo the change. It's an unacceptable level of complexity for system-level automation to have to scour the filesystem for existing Git repositories and run "git update-index" on each of them, that's why we're submitting patches to make this a config option, so we can simply flip a flag in /etc/gitconfig. It's also unacceptable to have the config simply provide the default which'll be frozen either at clone time or after an initial "git status". Let's say I ship a /etc/gitconfig that says "new clones should use the untracked cache". Now I roll that out across our fleet of machines and it turns out the morning after that the feature doesn't work properly for whatever reason. If it's just a "default until clone or status" type of thing even if I revert the configuration a lot of users & their repositories in the wild will still be broken, and will have to be manually fixed. Which again leads to the scouring the filesystem problem. So that gives some more context for why we're pushing for this change. I believe this feature breaks no existing use-case and just supports new ones, and I think that your objections to it are based on a simple misunderstanding as will become apparent if you read on below. > The biggest issue I had with your patch series, IIRC, is that > configuration will defeat the command line option. I think it's a moot point to focus on configuration v.s. command-line option. The important question is whether or not this feature can still be configured on a repo-local basis with this series as before. That's still the case since --local git configuration overrides --global and --system, so users who want to enable/disable this per-repo still can. >> Shouldn't it be nice if they could just enable core.untrackedCache in >> the global config files without having to also cd into every repo and >> use "git update-index --untracked-cache" there? > > NO. It is bad to change the behaviour behind users' back. I'm not quite sure what the objection here is exactly. If you're a normal user you can enable/disable this per-repo just like you can now, and enable/disable it for all your repos in ~/.gitconfig. If you mean that the user's configuration shouldn't be changed by the global config in /etc/gitconfig I do think that's a moot point. If you're a user on a system where I have root and I want to change your Git configuration I'm going to be able to do that whatever the mechanism is. That's indeed that's what we're doing to enable this at Booking.com currently, we run a job to find some limited set of common checkouts and run "git update-index" for users as root. The problem with that is that it's needlessly complex, hence this
Re: [PATCH 7/8] config: add core.untrackedCache
On Wed, Dec 16, 2015 at 12:03 AM, Junio C Hamanowrote: > Ævar Arnfjörð Bjarmason writes: > >> Of course hindsight is 20/20, but I think that given what's been >> covered in this thread it's been established that it's categorically >> better that if we introduce features like these that they be >> configured through the normal configuration facility rather than the >> configuration being sticky to the index. > > I doubt that any such thing has been established at all in this > thread. It may be true that you and perhaps Christian loudly > repeated it, but loudly repeating something and establishing > something as a fact are slightly different. > > The thing is, I do not necessarily view this as "configuration". > The way I see the feature is that you say "--untracked" when you > want the states of untracked paths be kept track of in the index. You probably know this, but the --untracked-cache has no bearing on what we actually keep track of, it's just an optimization for how efficiently we execute "git status" commands without the "-uno" option. We still produce the same output. > just like you say "git add Makefile" when you want the state of > 'Makefile' be kept track of in the index. Either the index keeps > track of it, or it doesn't, based solely on user's request, and the > bit to tell us which is the case is already in the index, exactly > because that is part of the data that is kept track of in the index. What I mean by "[we've] established that it's categorically better [to do this via git-config]" is that we can still do all that stuff, we can just also do more stuff now. >> Since the change in performance really isn't noticeable except on >> really large repositories, which are more likely to have someone >> involved watching the changelog on upgrades I think that's OK. > > Especially it is dubious to me that the trade-off you are making > with this design is a good one. In order to avoid paying a one-time > cost to run "update-index --untracked-cache" at sites that _do_ want > to use that feature (and after that, if you teach "git init" and > "git clone" to pay attention to the "give you the default" > configuration to run it for you, so that your users won't have to), It's not unreasonable to avoid the cost of running "update-index --untracked-cache", it's the difference between just adjusting /etc/gitconfig and continually having to traverse the entire / filesystem if you want to enable this feature on a system-wide basis. It should be easy to enable any Git feature via the configuration facility either on a --system, or --global or --local basis. > you are forcing all codepaths that makes any write to the index (not > just "init"-time) to make an extra check with the configuration all > the time for everybody, because you made the presence of the > untracked cache data in the index not usable as a sign that the user > wants to use that feature. Maybe I'm misunderstanding Christian's patches but don't we already parse the git configuration on any commands that update the index anyway? See git_default_core_config(). We already parse the git configuration to run "git status". > If the feature is something only those > with really large repositories care about, is it a good trade-off to > make everybody pay the runtime cost and make code more complex and > fragile? I am not yet convinced. I was arguing that only users with really large repositories would notice if we turned this off because the enabling facility had changed from per-index to config. But it doesn't follow that the expense of checking the git configuration which we're parsing anyway for the index-related commands makes things more complex & fragile. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 03:03:14PM -0800, Junio C Hamano wrote: > The thing is, I do not necessarily view this as "configuration". > The way I see the feature is that you say "--untracked" when you > want the states of untracked paths be kept track of in the index, > just like you say "git add Makefile" when you want the state of > 'Makefile' be kept track of in the index. Either the index keeps > track of it, or it doesn't, based solely on user's request, and the > bit to tell us which is the case is already in the index, exactly > because that is part of the data that is kept track of in the index. I know this is a fairly subjective argument, but it feels quite weird for me for such a config to persist in the index and not be mentioned anywhere else. Is there any other user-specified configuration option for which: rm -f .git/index git read-tree HEAD will actually _lose_ information? It seems to me that all other things being equal, we should be in favor of a config option simply because it reduces the cognitive burden on the user: it's one fewer place they need to be aware that git is keeping persistent decisions. > If the feature is something only those with really large repositories > care about, is it a good trade-off to make everybody pay the runtime > cost and make code more complex and fragile? I am not yet convinced. I'm not sure I understand the runtime and complexity costs of the config option. Isn't it just: if (core_untracked_cache) { /* do the thing */ } and loading core_untracked_cache in git_default_core_config()? Versus: if (the_index.has_untracked_cache) { /* do the thing */ } I ask as somebody who hasn't followed the topic closely. I just don't see what is that different about this versus other config options. What am I missing? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 8, 2015 at 11:43 PM, Junio C Hamanowrote: > Junio C Hamano writes: > >> Christian Couder writes: >> >>> When we know that mtime is fully supported by the environment, we >>> might want the untracked cache to be always used by default without >>> any mtime test or kernel version check being performed. >>> >>> Also when we know that mtime is not supported by the environment, >>> for example because the repo is shared over a network file system, >>> then we might want 'git update-index --untracked-cache' to fail >>> immediately instead of preforming tests (because it might work on >>> some systems using the repo over the network file system but not >>> others). >>> ... >> The logic in this paragraph is fuzzy to me. Shouldn't the config >> give a sensible default, that is overriden by command line options? The problem is that "git update-index --[no-|force-]untracked-cache" is not just changing things for the duration of the current command. It is in fact changing the configuration of the repository, because it is adding the UC (untracked cache) in the index and saving the index, which will persist the use of the UC for the following commands. So it is very different from something like "git status --no-untracked-cache" that would perform a "git status" without using the UC. In fact "git update-index --[no-|force-]untracked-cache" is very bad because it means that two repositories can be configured differently even if they have the same config files. >> I agree that it is insane to do a runtime check when the user says >> "update-index --untracked-cache" to enable it, as the user _knows_ >> that enabling it would help (or the user _knows_ that she wants to >> play with it). Similarly, shouldn't the config be ignored when the >> user says "update-index --no-untracked-cache" (hence removing the >> untracked cache from the resulting index no matter the config is set >> to)? ... > > As I think about this more, it really seems to me that we shouldn't > need to make this configuration variable that special. Because I > think it is a *BUG* in the current implementation to do the runtime > check even when the user explicitly tells us to use untracked-cache, > I'd imagine something along the lines of the following would be a > lot simpler, easier to understand and a generally more useful > bugfix: > > 1 Add one boolean configuration variable, core.untrackedCache, that >defaults to 'false'. > > 2 When creating an index file in an empty repository, enable the >untracked cache in the index (even without the user runninng >"update-index --untracked-cache") iff the configuration is set to >'true'. No runtime check necessary. I guess this means that when cloning a repo, it would not use the UC unless either "git -c core.untrackedCache=true clone ..." is used, or core.untrackedCache is set in the global config. > 3 When working on an existing index file, unless the operation is >"update-index --[no-]untracked-cache", keep the current setting, >regardless of this configuration variable. No runtime check >necessary. > > 4 "update-index --[no-]untracked-cache" should enable or disable >the untracked cache as the user tells us, regardless of the >configuration variable. No runtime check necessary. If you want only some repos to use the UC, you will set core.untrackedCache in the repo config. Then after cloning such a repo, you will copy the config file, and this will not be enough to enable the UC. The original repo will use the UC but the cloned one will not, and you might wonder why is "git status" slower in the cloned repo despite the machines and the config being the same for both repos? And if you have set core.untrackedCache in the global config when you clone, UC is enabled, but if you have just set it in the repo config after the clone, it is not enabled. This is quite bad in my opinion. Also think about system administrators who have a lot of machines with lots of repos everywhere on these machines and just upgrade Git from let's say v2.4.0 to v2.8.0. They find out that using UC git status will be faster and want to provide that to the machine's users knowing that they only use recent Linux machines where mtime works well. Shouldn't it be nice if they could just enable core.untrackedCache in the global config files without having to also cd into every repo and use "git update-index --untracked-cache" there? If we consider that "git update-index --[no-|force-]untracked-cache" should not have been created in the first place, and that the good mechanism for this is a config variable, and that maybe "git update-index --[no-|force-]untracked-cache" should just be deprecated, then the important thing is to make the core.untrackedCache config variable work in the best possible way. > It is OK to then add an "auto-detect" on top of the above, that > would only affect the second bullet
Re: [PATCH 7/8] config: add core.untrackedCache
Junio C Hamanowrites: > If you stop thinking that "update-index --untracked-cache" is > somehow a "configuration", things will get clearer to you. > ... >> "git update-index --[no-|force-]untracked-cache" is a bad way, so >> let's make it easy for people to not use it at all. > > As I disagree with that basic premise, I have to disagree with the > conclusion as well. Having said all that, I do not think an option to "update-index" must be the _only_ interface to tell the index to use (or ignore) the untracked cache. Two obvious places that can also have the same command line option would be "git init" and "git clone". If either the per-user configuration (or the per-site one the administrator sets) gave the default for these two commands, that would make it unnecessary to use "update-index", unless you are experimenting or working around bugs in the implementation. The primary reason why I do not like your "configuration decides" is it will be a huge source of confusions and bugs. Imagine what should happen in this sequence, and when should a stale cached information be discarded? - the configuration is set to 'yes'. - the index is updated and written by various commands. - more work is done in the working tree without updating the index. - the configuration is set to 'no'. - more work is done in the working tree without updating the index. - the configuration is set to 'yes'. - more work is done in the working tree without updating the index. - somebody asks "what untracked paths are there?" In the "configuration decides" world, I am not sure how a sane implementation efficiently invalidates the cache as needed, without the config subsystem having intimate knowledge with the untracked cache (so far, the config subsystem is merely a key-value store and does not care _what_ it stores; you would want to invalidate the cache in the index when somebody sets the variable to 'no', which means the config subsystem needs to know that this variable is special). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
Christian Couderwrites: > In fact "git update-index --[no-|force-]untracked-cache" is very bad > because it means that two repositories can be configured differently > even if they have the same config files. If you stop thinking that "update-index --untracked-cache" is somehow a "configuration", things will get clearer to you. Imagine there is no configuration whatsoever. The index primarily keeps track of what will be in the next "write-tree", but optionally it can keep track of other kinds of information, such as the last unmerged states for paths whose conflicts have been resolved. Duy's work adds another kind of information to the mix--cached stat bits for untracked paths to speed up the next "git status", and the option is to start keeping track of that information in the index. Because it is a "cache", once you start keeping track of it, you need to maintain it--otherwise you will be fooled by stale information still in the cache. So of course the effect has to persist. There is nothing _wrong_ with that. If you want to stop maintaining it, you can tell the index "don't use untracked-cache". So I think the above "is very bad because" is total nonsense. > If you want only some repos to use the UC, you will set > core.untrackedCache in the repo config. Then after cloning such a > repo, you will copy the config file, and this will not be enough to > enable the UC. Surely. "Does this index file keeps track of the untracked files' states?" is a property of the index. Cloning does not propagate the configuration and copying or not copying is irrelevant. If you want to enable, running "update-index --untracked-cache" is a way to do so. I cannot see what's so hard about it. > And if you have set core.untrackedCache in the global config when you > clone, UC is enabled, but if you have just set it in the repo config > after the clone, it is not enabled. That's fine. In your patch series, if you set it in the global, you will get the cache in the new one. With the cleaned-up semantics I suggested, the same thing will happen. And with the cleaned-up semantics, the configuration is *ONLY* used to give the *DEFAULT* before other things happen, i.e. creation of the index file for the first time. Because the configuration is only the default, an explicit "update-index --[no-]untracked-cache" will defeat it, just like any other config/option interaction. The biggest issue I had with your patch series, IIRC, is that configuration will defeat the command line option. > Shouldn't it be nice if they could just enable core.untrackedCache in > the global config files without having to also cd into every repo and > use "git update-index --untracked-cache" there? NO. It is bad to change the behaviour behind users' back. Set that once, _future_ repositories their users will create will use that by default, and tell their users what to do to their existing repositories. Problem solved. > "git update-index --[no-|force-]untracked-cache" is a bad way, so > let's make it easy for people to not use it at all. As I disagree with that basic premise, I have to disagree with the conclusion as well. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/8] config: add core.untrackedCache
On 08.12.15 18:15, Christian Couder wrote: > When we know that mtime is fully supported by the environment, we > might want the untracked cache to be always used by default without > any mtime test or kernel version check being performed. > For the people which didn't follow the discussion, or read this in a year or 2: A short description what "mtime is fully supported by the environment" means would be nice: When a file system object is added or deleted in a directory, and stat(dirname, ) returns an updated st.st_mtime. > Also when we know that mtime is not supported by the environment, > for example because the repo is shared over a network file system, > then we might want 'git update-index --untracked-cache' to fail > immediately instead of preforming tests (because it might work on performing > some systems using the repo over the network file system but not > others). > > The normal way to achieve the above in Git is to use a config > variable. That's why this patch introduces "core.untrackedCache". > > To keep things simple, this variable is a bool which default to > false. If this is the case, can we remove some code? e.g add_untracked_ident() in dir.c, do we still need it? And probably some other functions as well. Or would it be better to say "false" is false, "true" is true "unset" is as before (Where when different machines/OS/mount points access the same repo over a network file system, some use the UT, some don't) > > When "git status" is run, it now adds or removes the untracked > cache in the index to respect the value of this variable. > > This means that `git update-index --[no-|force-]untracked-cache`, > to be compatible with the new config variable, have to set or > unset it. what does unset mean ? Set to false ? This new behavior is backward incompatible change, but > that is deliberate. > > Also `--untracked-cache` used to check that the underlying > operating system and file system change `st_mtime` field of a > directory if files are added or deleted in that directory. But > those tests take a long time and there is now > `--test-untracked-cache` to perform them. > > So to be more consistent with other git commands, this patch > prevents `--untracked-cache` to perform tests. This means that > after this patch there is no difference any more between > `--untracked-cache` and `--force-untracked-cache`. > > Signed-off-by: Christian Couder> --- > Documentation/config.txt | 7 +++ > Documentation/git-update-index.txt | 28 ++-- > builtin/update-index.c | 23 +-- > cache.h| 1 + > config.c | 4 > contrib/completion/git-completion.bash | 1 + > dir.c | 2 +- > environment.c | 1 + > t/t7063-status-untracked-cache.sh | 4 +--- > wt-status.c| 9 + > 10 files changed, 56 insertions(+), 24 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 2d06b11..94820eb 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -308,6 +308,13 @@ core.trustctime:: > crawlers and some backup systems). > See linkgit:git-update-index[1]. True by default. > > +core.untrackedCache:: > + Determines if untracked cache will be enabled. Using > + 'git update-index --[no-|force-]untracked-cache' will set > + this variable. set to what ? true ? false ? Before setting it to true, you should check > + that mtime is working properly on your system. > + See linkgit:git-update-index[1]. False by default. > + isn't this what "git update-index --test-untracked-cached" is good for? > core.checkStat:: > Determines which stat fields to match between the index > and work tree. The user can set this to 'default' or > diff --git a/Documentation/git-update-index.txt > b/Documentation/git-update-index.txt > index 0ff7396..0fb39db 100644 > --- a/Documentation/git-update-index.txt > +++ b/Documentation/git-update-index.txt > @@ -175,22 +175,28 @@ may not support it yet. > --no-untracked-cache:: > Enable or disable untracked cache extension. This could speed > up for commands that involve determining untracked files such > + as `git status`. > ++ > +The underlying operating system and file system must change `st_mtime` > +field of a directory if files are added or deleted in that > +directory. You can test that using > +`--test-untracked-cache`. `--untracked-cache` used to test that too > +but it doesn't anymore. Do we need this historical comment ? > ++ > +This sets the `core.untrackedCache` configuration variable to 'true' > +or 'false' in the repo config file, (see linkgit:git-config[1]), so > +that the untracked cache stays enabled or disabled. > > --test-untracked-cache:: >
Re: [PATCH 7/8] config: add core.untrackedCache
Christian Couderwrites: > When we know that mtime is fully supported by the environment, we > might want the untracked cache to be always used by default without > any mtime test or kernel version check being performed. > > Also when we know that mtime is not supported by the environment, > for example because the repo is shared over a network file system, > then we might want 'git update-index --untracked-cache' to fail > immediately instead of preforming tests (because it might work on > some systems using the repo over the network file system but not > others). > > The normal way to achieve the above in Git is to use a config > variable. That's why this patch introduces "core.untrackedCache". > > To keep things simple, this variable is a bool which default to > false. > > When "git status" is run, it now adds or removes the untracked > cache in the index to respect the value of this variable. > > This means that `git update-index --[no-|force-]untracked-cache`, > to be compatible with the new config variable, have to set or > unset it. This new behavior is backward incompatible change, but > that is deliberate. The logic in this paragraph is fuzzy to me. Shouldn't the config give a sensible default, that is overriden by command line options? I agree that it is insane to do a runtime check when the user says "update-index --untracked-cache" to enable it, as the user _knows_ that enabling it would help (or the user _knows_ that she wants to play with it). Similarly, shouldn't the config be ignored when the user says "update-index --no-untracked-cache" (hence removing the untracked cache from the resulting index no matter the config is set to)? Why is it a good idea to allow an explicit operation from the command line to muck with the configuration? If the user wants to change the configuration permanently, "git config" has been there for the purpose for all the configuration variables to do so for almost forever. Why is it a good idea to make this variable so special that the user does not have to use "git config" but disable or enable it in some other way? > Also `--untracked-cache` used to check that the underlying > operating system and file system change `st_mtime` field of a > directory if files are added or deleted in that directory. But > those tests take a long time and there is now > `--test-untracked-cache` to perform them. > > So to be more consistent with other git commands, this patch > prevents `--untracked-cache` to perform tests. Good change. > This means that > after this patch there is no difference any more between > `--untracked-cache` and `--force-untracked-cache`. A tip to write the explanation. Every time you say "This means that...", you are (perhaps unconsciously) admitting that what you wrote immedidately before that may not be understandable and what comes after it may be a better explanation. Discard the sentence before "This means that", and try to formulate your explanation around what you wrote after it, adding information in the discarded earlier sentence back to the later one. Doing this exercise often helps the result read much better. > > Signed-off-by: Christian Couder > --- > Documentation/config.txt | 7 +++ > Documentation/git-update-index.txt | 28 ++-- > builtin/update-index.c | 23 +-- > cache.h| 1 + > config.c | 4 > contrib/completion/git-completion.bash | 1 + > dir.c | 2 +- > environment.c | 1 + > t/t7063-status-untracked-cache.sh | 4 +--- > wt-status.c| 9 + > 10 files changed, 56 insertions(+), 24 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 2d06b11..94820eb 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -308,6 +308,13 @@ core.trustctime:: > crawlers and some backup systems). > See linkgit:git-update-index[1]. True by default. > > +core.untrackedCache:: > + Determines if untracked cache will be enabled. Using > + 'git update-index --[no-|force-]untracked-cache' will set > + this variable. Before setting it to true, you should check > + that mtime is working properly on your system. > + See linkgit:git-update-index[1]. False by default. > + > core.checkStat:: > Determines which stat fields to match between the index > and work tree. The user can set this to 'default' or > diff --git a/Documentation/git-update-index.txt > b/Documentation/git-update-index.txt > index 0ff7396..0fb39db 100644 > --- a/Documentation/git-update-index.txt > +++ b/Documentation/git-update-index.txt > @@ -175,22 +175,28 @@ may not support it yet. > --no-untracked-cache:: > Enable or disable untracked cache extension. This could
Re: [PATCH 7/8] config: add core.untrackedCache
Junio C Hamanowrites: > Christian Couder writes: > >> When we know that mtime is fully supported by the environment, we >> might want the untracked cache to be always used by default without >> any mtime test or kernel version check being performed. >> >> Also when we know that mtime is not supported by the environment, >> for example because the repo is shared over a network file system, >> then we might want 'git update-index --untracked-cache' to fail >> immediately instead of preforming tests (because it might work on >> some systems using the repo over the network file system but not >> others). >> ... > The logic in this paragraph is fuzzy to me. Shouldn't the config > give a sensible default, that is overriden by command line options? > I agree that it is insane to do a runtime check when the user says > "update-index --untracked-cache" to enable it, as the user _knows_ > that enabling it would help (or the user _knows_ that she wants to > play with it). Similarly, shouldn't the config be ignored when the > user says "update-index --no-untracked-cache" (hence removing the > untracked cache from the resulting index no matter the config is set > to)? ... As I think about this more, it really seems to me that we shouldn't need to make this configuration variable that special. Because I think it is a *BUG* in the current implementation to do the runtime check even when the user explicitly tells us to use untracked-cache, I'd imagine something along the lines of the following would be a lot simpler, easier to understand and a generally more useful bugfix: 1 Add one boolean configuration variable, core.untrackedCache, that defaults to 'false'. 2 When creating an index file in an empty repository, enable the untracked cache in the index (even without the user runninng "update-index --untracked-cache") iff the configuration is set to 'true'. No runtime check necessary. 3 When working on an existing index file, unless the operation is "update-index --[no-]untracked-cache", keep the current setting, regardless of this configuration variable. No runtime check necessary. 4 "update-index --[no-]untracked-cache" should enable or disable the untracked cache as the user tells us, regardless of the configuration variable. No runtime check necessary. It is OK to then add an "auto-detect" on top of the above, that would only affect the second bullet point, like so: 2a When creating an index file in an empty repository, if the configuration is set to 'auto', do the lengthy runtime check and enable the untracked cache in the index (even without the user runninng "update-index --untracked-cache"). without changing any other in the first 4-bullet list. Am I missing some other requirements? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html