Re: [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators

2017-04-12 Thread Stefan Beller
On Wed, Apr 12, 2017 at 5:06 PM, Daniel Ferreira (theiostream)
 wrote:
> Hey there! I'm sorry for bothering you, but any chance you might have
> overlooked this patch for a review?

Sort of. I read through them and found no issue back then, but did
not reply, anticipating someone else would.

>  (I'm just not familiar with how
> long patches usually take to be reviewed, and since I always got an
> answer within two days of sending it I wondered if you may have just
> not noticed it.)

I reviewed it again and found just a very minor nit, no need for a resend
if that is the only nit.

Thanks,
Stefan


Re: [PATCH v8 1/5] dir_iterator: add tests for dir_iterator API

2017-04-12 Thread Stefan Beller
On Wed, Apr 5, 2017 at 6:39 PM, Daniel Ferreira  wrote:

> +int cmd_main(int argc, const char **argv) {

Style (minor nit, in case resend is needed):

We only put the { brace on the same line for statements
(if, for, while, switch), not for function declarations.

I tried to find a good rationale for it; this is the best justification I
could find
https://www.kernel.org/doc/Documentation/process/coding-style.rst
(section 3):

Heretic people all over the world have claimed that this inconsistency
is ...  well ...  inconsistent, but all right-thinking people know that
(a) K&R are **right** and (b) K&R are right.  Besides, functions are
special anyway (you can't nest them in C).


Re: [PATCH v8 0/5] [GSoC] remove_subtree(): reimplement using iterators

2017-04-12 Thread Daniel Ferreira (theiostream)
Hey there! I'm sorry for bothering you, but any chance you might have
overlooked this patch for a review? (I'm just not familiar with how
long patches usually take to be reviewed, and since I always got an
answer within two days of sending it I wondered if you may have just
not noticed it.)

-- Daniel.

On Wed, Apr 5, 2017 at 10:39 PM, Daniel Ferreira  wrote:
> This is the seventh version of a patch series that implements the GSoC
> microproject of converting a recursive call to readdir() to use dir_iterator.
>
> v1: 
> https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsah16a...@mail.gmail.com/T/#t
> v2: 
> https://public-inbox.org/git/cacsjy8dxh-qpbblfafwpawusba9gvxa7x+mxljevykhk1zo...@mail.gmail.com/T/#t
> v3: 
> https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3jbfjkhwcdv_lphkcd71zrwm...@mail.gmail.com/T/#t
> v4: 
> https://public-inbox.org/git/1490747533-89143-1-git-send-email-bnm...@gmail.com/T/#e437a63e0c22c00c69b5d92977c9b438ed2b9fd3a
> v5: 
> https://public-inbox.org/git/1490844730-47634-1-git-send-email-bnm...@gmail.com/T/#m2323f15e45de699f2e09364f40a62e17047cf453
> v6: 
> https://public-inbox.org/git/1491107726-21504-1-git-send-email-bnm...@gmail.com/T/#t
> v7: 
> https://public-inbox.org/git/1491163388-41255-1-git-send-email-bnm...@gmail.com/T/#t
>
> Travis CI build: https://travis-ci.org/theiostream/git/jobs/21982
>
> In this version, I applied pretty much all suggestions Michael and Stefan had
> for the patch.
>
> Michael, regarding the patch you sent for parsing the arguments on the
> test-dir-iterator helper, I did not squash because it'd generate too many
> merge conflicts. I just preferred to add your code manually. Let me know how I
> can properly credit you for it.
>
> The only "controversial" part of this code is how I ended up caching the
> result of lstat() on the dir_iterator_level struct. Having to handle the
> case of the root directory ended up making set_iterator_data() way more
> complicated (it has to handle the case of level->stat not being set by
> push_dir_level()), and required introducing the st_is_set member in the
> level struct. This issue would be solved if we could lstat() the root
> dir in dir_iterator_begin() and possibly return an error there.
> Regardless, I appreciate other suggestions to make this less complex.
>
> Daniel Ferreira (5):
>   dir_iterator: add tests for dir_iterator API
>   remove_subtree(): test removing nested directories
>   dir_iterator: add helpers to dir_iterator_advance
>   dir_iterator: refactor state machine model
>   remove_subtree(): reimplement using iterators
>
>  Makefile|   1 +
>  dir-iterator.c  | 247 +---
>  dir-iterator.h  |  35 --
>  entry.c |  39 +++
>  refs/files-backend.c|   2 +-
>  t/helper/.gitignore |   1 +
>  t/helper/test-dir-iterator.c|  48 
>  t/t0065-dir-iterator.sh |  93 +++
>  t/t2000-checkout-cache-clash.sh |  11 ++
>  9 files changed, 373 insertions(+), 104 deletions(-)
>  create mode 100644 t/helper/test-dir-iterator.c
>  create mode 100755 t/t0065-dir-iterator.sh
>
> --
> 2.7.4 (Apple Git-66)
>


Re: Rebase sequencer changes prevent exec commands from modifying the todo file?

2017-04-12 Thread Stephen Hicks
On Wed, Apr 12, 2017 at 3:05 PM, Johannes Schindelin
 wrote:
>
> Hi Stephen,
>
> On Tue, 11 Apr 2017, Stephen Hicks wrote:
>
> > Thanks for the tips.  I think I have an approach that works, by simply
> > returning sequencer_continue() immediately after a successful exec.
>
> I am not sure that that works really as expected, as you re-enter the
> sequencer_continue() and neither the original author nor I expected nested
> calls.

Alright, I can reproduce the longer version in place here.

> > I'm hesitant to only use mtime, size, and inode, since it's quite likely
> > that these are all identical even if the file has changed.
>
> Not at all. The mtime and the size will most likely be different.

In my experience, mtime is almost always the same, since the file is
written pretty much immediately before the exec - as long as the
command takes a small fraction of a second (which it usually does) the
mtime (which is in seconds, not millis or micros) will look the same.
For shell redirects, the inode stays the same; the size will usually be
different, but even that is not guaranteed.

> I am reluctant to take your wholesale approach, as I perform literally
> dozens of rebases with >100 commits, including plenty of exec calls, and I
> want the rebase to become faster instead of slower.

I'm in favor of that, but not sure of a reliable way to tell whether
the file is modified.  As mentioned above, mtime, size, and inode only
get us *most* of the way there.  This leaves it open to very
surprising issues when suddenly one edit out of a thousand
accidentally doesn't change any of these things and so doesn't get
picked up.  This would also be a very difficult issue to debug,
since everything would look very reasonable from the outside.
Potentially it could be mitigated by actually writing a message when
the file is reloaded, making it more obvious when that doesn't occur
(though it's still not the most ergonomic situation).

> > Say, the command is simply a `sed -i 's/^exec / /g'`, then the
> > timestamp (in seconds) will almost definitely be the same, and the size
> > and inode will be the same as well.
>
> Try it. The inode is different.

You are correct that sed -i does change the inode.  `c=$(sed 's/^exec / /g'
git-rebase-todo); echo "$c" > git-rebase-todo` is a better example, albeit
a bit more contrived.  I've added a failing test case to demonstrate the
problem.
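The inode behavior under discussion can be checked with a few lines of
portable shell (a standalone sketch, not part of any patch; `todo` stands
in for git-rebase-todo):

```shell
# Rewriting a file through a shell redirect reuses the directory entry's
# inode, while the temp-file-plus-mv strategy (what `sed -i` effectively
# does) installs a different inode.
printf 'exec make test\n' > todo
inode_orig=$(ls -i todo | awk '{print $1}')

# In-place rewrite via redirect: content changes, inode does not.
c=$(sed 's/^exec /# /' todo)
printf '%s\n' "$c" > todo
inode_redirect=$(ls -i todo | awk '{print $1}')

# Rewrite via temp file + mv: the entry now points at a new inode,
# because the temp file was created while the old inode still existed.
sed 's/^# /exec /' todo > todo.tmp && mv todo.tmp todo
inode_mv=$(ls -i todo | awk '{print $1}')

[ "$inode_orig" = "$inode_redirect" ] && echo "redirect kept the inode"
[ "$inode_orig" != "$inode_mv" ] && echo "mv installed a new inode"
```

This is exactly why an mtime/size/inode heuristic misses the redirect
case: the inode is unchanged, the mtime granularity is one second, and the
size can easily coincide.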

> > Granted this is a contrived example, but it would be unfortunate if
> > accidentally keeping the size the same were to cause the change to not
> > be picked up.
> >
> > Another option would be to hash the contents, but at that point, I'm not
> > sure it's any better than simply unconditionally re-parsing the TODO.
>
> Again, my intent is to make rebase faster, not slower. Hashing the
> contents would make it slower. So would re-reading it.
>
> > https://github.com/git/git/pull/343
>
> Thank you for starting to work on this. I left a couple of comments.
> Please do not be offended by their terseness, I really wanted to point out
> a couple of things I think we can improve together, but I am also way past
> my bedtime.
>
> Ciao,
> Johannes


Re: [PATCH] ls-files: properly prepare submodule environment

2017-04-12 Thread Jacob Keller
On Wed, Apr 12, 2017 at 9:58 AM, Brandon Williams  wrote:
> On 04/11, Jacob Keller wrote:
>> From: Jacob Keller 
>>
>> Since commit e77aa336f116 ("ls-files: optionally recurse into
>> submodules", 2016-10-07) ls-files has known how to recurse into
>> submodules when displaying files.
>>
>> Unfortunately this code does not work as expected. Initially, it
>> produced results indicating that a --super-prefix can't be used from
>> a subdirectory:
>>
>>   fatal: can't use --super-prefix from a subdirectory
>>
>> After commit b58a68c1c187 ("setup: allow for prefix to be passed to git
>> commands", 2017-03-17) this behavior changed and resulted in repeated
>> calls of ls-files with an expanding super-prefix.
>>
>> This recursive expanding super-prefix appears to be the result of
>> ls-files acting on the super-project instead of on the submodule files.
>>
>> We can fix this by properly preparing the submodule environment when
>> setting up the submodule process. This ensures that the command we
>> execute properly reads the directory and ensures that we correctly
>> operate in the submodule instead of repeating ourselves on the
>> super-project contents forever.
>>
>> While we're here, let's also add some tests to ensure that ls-files works
>> with recurse-submodules to catch these issues in the future.
>>
>> Signed-off-by: Jacob Keller 
>> ---
>> I found this fix by accident after looking at git-grep code and
>> comparing how ls-files instantiated the submodule. Can someone who knows
>> more about submodules explain the reasoning better for this fix?
>> Essentially we "recurse" into the submodule folder, but still operate as
>> if we're at the root of the project but with a new prefix. So then we
>> keep recursing into the submodule forever.
>
> The reason why this fix is required is that the env var GIT_DIR is set
> to be the parent's gitdir.  When recursing, the child process just uses
> the parent's gitdir instead of its own, causing it to recurse into itself
> again and again.  The child process's environment needs to have the
> GIT_DIR var cleared so that the child will do discovery and actually
> find its own gitdir.

Right. That makes sense, but it raises the question of how or where
this ever worked in the first place.
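The mechanism Brandon describes is easy to sketch in plain shell
(illustrative only: the path below is a made-up value, and the actual fix
clears the variable in C via prepare_submodule_repo_env()):

```shell
# A child process inherits GIT_DIR from its parent's environment, so a
# git command spawned in a submodule directory would still operate on
# the superproject's repository.
GIT_DIR=/path/to/super/.git
export GIT_DIR

# Child with the inherited variable: it sees the superproject's gitdir.
inherited=$(sh -c 'echo "${GIT_DIR-unset}"')

# Child started with GIT_DIR cleared: git would fall back to normal
# gitdir discovery from the current directory, i.e. find the submodule.
cleared=$(env -u GIT_DIR sh -c 'echo "${GIT_DIR-unset}"')

echo "inherited=$inherited"
echo "cleared=$cleared"
```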

>
>>
>> I also added some tests here, and I *definitely* think this should be a
>> maintenance backport into any place which supports ls-files
>> --recurse-submodules, since as far as I can tell it is otherwise
>> useless.
>>
>> There were no tests for it, so I added some based on git-grep tests. I
>> did not try them against the broken setups, because of the endless
>> recursion.
>
> There are tests for ls-files --recurse-submodules in
> t3007-ls-files-recurse-submodules.sh, though it looks like this
> particular case (which triggers this bug) maybe didn't have any tests.
> I'm actually unsure of why the existing tests didn't catch this (I'm
> probably just bad at writing tests).

It seems to me like this would be a problem for any setup with
submodules, no? Or is it a case specific to my setup? I have a submodule
within a submodule, and I'm not setting GIT_DIR manually myself. I want
to isolate exactly what scenario fails here so that the commit
description can be accurate and we know for sure the test cases cover
it.

Thanks,
Jake

>
>
>>
>>  builtin/ls-files.c |   4 +
>>  t/t3080-ls-files-recurse-submodules.sh | 162 +
>>  2 files changed, 166 insertions(+)
>>  create mode 100755 t/t3080-ls-files-recurse-submodules.sh
>>
>> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
>> index d449e46db551..e9b3546ca053 100644
>> --- a/builtin/ls-files.c
>> +++ b/builtin/ls-files.c
>> @@ -15,6 +15,7 @@
>>  #include "string-list.h"
>>  #include "pathspec.h"
>>  #include "run-command.h"
>> +#include "submodule.h"
>>
>>  static int abbrev;
>>  static int show_deleted;
>> @@ -203,6 +204,9 @@ static void show_gitlink(const struct cache_entry *ce)
>>   struct child_process cp = CHILD_PROCESS_INIT;
>>   int status;
>>
>> + prepare_submodule_repo_env(&cp.env_array);
>> + argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
>
> Yes these lines need to be included in order to prepare the environment.
> Thanks for catching that.
>
>> +
>>   if (prefix_len)
>>   argv_array_pushf(&cp.env_array, "%s=%s",
>>GIT_TOPLEVEL_PREFIX_ENVIRONMENT,
>> diff --git a/t/t3080-ls-files-recurse-submodules.sh b/t/t3080-ls-files-recurse-submodules.sh
>> new file mode 100755
>> index ..6788a8f09635
>> --- /dev/null
>> +++ b/t/t3080-ls-files-recurse-submodules.sh
>> @@ -0,0 +1,162 @@
>> +#!/bin/sh
>> +
>> +test_description='Test ls-files recurse-submodules feature
>> +
>> +This test verifies the recurse-submodules feature correctly lists files across
>> +submodules.
>> +'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'setup directory structure and 

[PATCHv2] t6500: don't run detached auto gc at the end of the test script

2017-04-12 Thread SZEDER Gábor
The last test in 't6500-gc', 'background auto gc does not run if
gc.log is present and recent but does if it is old', added in
a831c06a2 (gc: ignore old gc.log files, 2017-02-10), may sporadically
trigger an error message from the test harness:

  rm: cannot remove 'trash directory.t6500-gc/.git/objects': Directory not empty

The test in question ends with executing an auto gc in the background,
which occasionally takes so long that it's still running when
'test_done' is about to remove the trash directory.  This 'rm -rf
$trash' in the foreground might race with the detached auto gc to
create and delete files and directories, and gc might (re-)create a
path that 'rm' already visited and removed, triggering the above error
message when 'rm' attempts to remove its parent directory.

Commit bb05510e5 (t5510: run auto-gc in the foreground, 2016-05-01)
fixed the same problem in a different test script by simply
disallowing background gc.  Unfortunately, what worked there is not
applicable here, because the purpose of this test is to check the
behavior of a detached auto gc.

Make sure that the test doesn't continue before the gc is finished in
the background with a clever bit of shell trickery:

  - Open fd 9 in the shell, to be inherited by the background gc
process, because our daemonize() only closes the standard fds 0,
1 and 2.
  - Duplicate this fd 9 to stdout.
  - Read 'git gc's stdout, and thus fd 9, through a command
substitution.  We don't actually care about gc's output, but this
construct has two useful properties:
  - This read blocks as long as stdout or fd 9 are open.  While stdout is
    closed after the main gc process creates the background process
    and exits, fd 9 remains open until the background process exits.
  - The variable assignment from the command substitution gets its
exit status from the command executed within the command
substitution, i.e. a failing main gc process will cause the test
to fail.
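The blocking property of this construct can be demonstrated without git at
all; here is a minimal sketch in which a backgrounded subshell that closes
its stdout stands in for the daemonized gc:

```shell
# Stand-in for a daemonizing command: it closes stdout (as daemonize()
# closes fds 0, 1 and 2) and returns immediately, leaving a background
# child that still holds the inherited fd 9 open.
fake_daemon () {
	( exec >/dev/null; sleep 2 ) &
}

start=$(date +%s)
# The command substitution reads fake_daemon's stdout; with 9>&1 the
# background child inherits a duplicate of that pipe on fd 9, so the
# read only returns once the child exits and fd 9 is finally closed.
doesnt_matter=$(fake_daemon 9>&1)
end=$(date +%s)

elapsed=$((end - start))
echo "waited ${elapsed}s for the background child"
```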

Note, that this fd trickery doesn't work on Windows, because due to
MSYS limitations the git process only inherits the standard fds 0, 1
and 2 from the shell.  Luckily, it doesn't matter in this case,
because on Windows daemonize() is basically a noop, thus 'git gc
--auto' always runs in the foreground.

And since we can now continue the test reliably after the detached gc
finished, check that there is only a single packfile left at the end,
i.e. that the detached gc actually did what it was supposed to do.
Also add a comment at the end of the test script to warn developers of
future tests about this issue of long running detached gc processes.

Helped-by: Jeff King 
Helped-by: Johannes Sixt 
Signed-off-by: SZEDER Gábor 
---
 t/t6500-gc.sh | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 08de2e8ab..cc7acd101 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,6 +67,16 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
test_line_count = 2 new # There is one new pack and its .idx
 '
 
+run_and_wait_for_auto_gc () {
+   # We read stdout from gc for the side effect of waiting until the
+   # background gc process exits, closing its fd 9.  Furthermore, the
+   # variable assignment from a command substitution preserves the
+   # exit status of the main gc process.
+   # Note: this fd trickery doesn't work on Windows, but there is no
+   # need to, because on Win the auto gc always runs in the foreground.
+   doesnt_matter=$(git gc --auto 9>&1)
+}
+
test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
test_commit foo &&
test_commit bar &&
@@ -80,7 +90,13 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
test-chmtime =-345600 .git/gc.log &&
test_must_fail git gc --auto &&
test_config gc.logexpiry 2.days &&
-   git gc --auto
+   run_and_wait_for_auto_gc &&
+   ls .git/objects/pack/pack-*.pack >packs &&
+   test_line_count = 1 packs
 '
 
+# DO NOT leave a detached auto gc process running near the end of the
+# test script: it can run long enough in the background to racily
+# interfere with the cleanup in 'test_done'.
+
 test_done
-- 
2.12.2.613.g9c5b79913



Re: Rebase sequencer changes prevent exec commands from modifying the todo file?

2017-04-12 Thread Johannes Schindelin
Hi Stephen,

On Tue, 11 Apr 2017, Stephen Hicks wrote:

> Thanks for the tips.  I think I have an approach that works, by simply
> returning sequencer_continue() immediately after a successful exec.

I am not sure that that works really as expected, as you re-enter the
sequencer_continue() and neither the original author nor I expected nested
calls.

> I'm hesitant to only use mtime, size, and inode, since it's quite likely
> that these are all identical even if the file has changed.

Not at all. The mtime and the size will most likely be different.

I am reluctant to take your wholesale approach, as I perform literally
dozens of rebases with >100 commits, including plenty of exec calls, and I
want the rebase to become faster instead of slower.

> Say, the command is simply a `sed -i 's/^exec / /g'`, then the
> timestamp (in seconds) will almost definitely be the same, and the size
> and inode will be the same as well.

Try it. The inode is different.

> Granted this is a contrived example, but it would be unfortunate if
> accidentally keeping the size the same were to cause the change to not
> be picked up.
> 
> Another option would be to hash the contents, but at that point, I'm not
> sure it's any better than simply unconditionally re-parsing the TODO.

Again, my intent is to make rebase faster, not slower. Hashing the
contents would make it slower. So would re-reading it.

> https://github.com/git/git/pull/343

Thank you for starting to work on this. I left a couple of comments.
Please do not be offended by their terseness, I really wanted to point out
a couple of things I think we can improve together, but I am also way past
my bedtime.

Ciao,
Johannes


Re: [PATCH] t6500: don't run detached auto gc at the end of the test script

2017-04-12 Thread SZEDER Gábor
On Wed, Apr 12, 2017 at 2:50 AM, Jeff King  wrote:
> On Wed, Apr 12, 2017 at 02:27:05AM +0200, SZEDER Gábor wrote:
>
>> >> I wonder if you could make it a general test-lib function, like:
>> >>
>> >>   run_and_wait () {
>> >> # we read stdout from the child only for the side effect
>> >> # of waiting until all child sub-processes exit, closing their
>> >> # fd 9.
>> >> does_not_matter=$("$@" 9>&1)
>> >
>> >
>> > I'm afraid this won't work on Windows when the invoked command is git. FD
>> > inheritance between MSYS (bash) and non-MSYS programs (git) is only
>> > implemented for FDs 0,1,2. That's a deficiency of MSYS, and I don't think
>> > that was improved in MSYS2.
>>
>> Oh, that's a pity, I was almost ready with v2...
>>
>> Unfortunately, this makes the general helper function unworkable, of
>> course.  Though in this particular case it wouldn't matter, because on
>> Windows daemonize() is basically a noop and 'git gc --auto' remains in
>> the foreground anyway.
>
> That makes it tempting to use in this scenario as a one-off with a
> comment.

Ok, I'll send it out in a minute, and Junio can then take his pick.

>> I think we should stick with my initial patch, then.
>
> I'm not entirely opposed, but my understanding was that it didn't
> actually fix the race, it just made it a bit bigger. Which is sort of
> unsatisfying.
>
> I couldn't get the original to show a failure, though, even under heavy
> load. So maybe widening the race is enough.

Just to be clear: it's only an occasionally appearing error message.
There is no failure in the sense of test failure, because 'rm -rf
$trash' erroring out during housekeeping does not fail the test suite.


RE: Proposal for "fetch-any-blob Git protocol" and server design

2017-04-12 Thread Kevin David
Hi Jonathan, 

I work on the network protocols for the GVFS project at Microsoft.
I shared a couple thoughts and questions below.

Thanks,

Kevin

-Original Message-
From: Stefan Beller [mailto:sbel...@google.com]
Sent: Tuesday, March 28, 2017 7:20 PM
To: Jonathan Tan ; Ben Peart 
Cc: git@vger.kernel.org; Mark Thomas ; Jeff Hostetler 

Subject: Re: Proposal for "fetch-any-blob Git protocol" and server design

+cc Ben Peart, who sent
"[RFC] Add support for downloading blobs on demand" to the list recently.
This proposal here seems like it has the same goal, so maybe your review could 
go a long way here?

Thanks,
Stefan

On Tue, Mar 14, 2017 at 3:57 PM, Jonathan Tan  wrote:
> As described in "Background" below, there have been at least 2 patch 
> sets to support "partial clones" and on-demand blob fetches, where the 
> server part that supports on-demand blob fetches was treated at least 
> in outline. Here is a proposal treating that server part in detail.
>
> == Background
>
> The desire for Git to support (i) missing blobs and (ii) fetching them 
> as needed from a remote repository has surfaced on the mailing list a 
> few times, most recently in the form of RFC patch sets [1] [2].
>
> A local repository that supports (i) will be created by a "partial 
> clone", that is, a clone with some special parameters (exact 
> parameters are still being discussed) that does not download all blobs 
> normally downloaded. Such a repository should support (ii), which is what 
> this proposal describes.
>
> == Design
>
> A new endpoint "server" is created. The client will send a message in 
> the following format:
>
> 
> fbp-request = PKT-LINE("fetch-blob-pack")
>   1*want
>   flush-pkt
> want = PKT-LINE("want" SP obj-id)
> 
>
> The client may send one or more SHA-1s for which it wants blobs, then 
> a flush-pkt.

I know we're considering server behavior here, but how large do you
generally expect these blob-want requests to be? I ask because we took an
initial approach very similar to this; however, we had a hard time being
clever about figuring out what set of blobs to request for those clients
that didn't want the entire set, and ended up falling back to single-blob
requests.

Obviously, this could be due to the nature of our
filesystem-virtualization-based client, but I also suspect that the teams
attacking this problem are more often than not dealing with very large
blob objects, so the cost of a round-trip becomes lower relative to
sending the object content itself.
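For concreteness, the fbp-request grammar quoted above frames each line as
a pkt-line: four hex digits giving the total length (including the four
digits themselves), then the payload. A quick shell sketch with dummy
object ids:

```shell
# Emit one pkt-line: length = payload bytes + trailing LF + the 4 hex
# digits of the length field itself.
pkt_line () {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

oid1=0123456789abcdef0123456789abcdef01234567
oid2=89abcdef0123456789abcdef0123456789abcdef

# A request for two blobs, terminated by a flush-pkt ("0000").
request=$(
	pkt_line "fetch-blob-pack"
	pkt_line "want $oid1"
	pkt_line "want $oid2"
	printf '0000'
)
printf '%s\n' "$request"
```

"fetch-blob-pack" is 15 bytes plus a newline, so its pkt-line begins with
"0014"; each want line carries a 40-hex-digit id and begins with "0032".
A real client would write this byte stream to the server; the sketch only
shows the framing.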

>
> The server will then reply:
>
> 
> server-reply = flush-pkt | PKT-LINE("ERR" SP message)
> 
>
> If there was no error, the server will then send them in a packfile, 
> formatted like described in "Packfile Data" in pack-protocol.txt with 
> "side-band-64k" enabled.
> Any server that supports "partial clone" will also support this, and 
> the client will automatically assume this. (How a client discovers 
> "partial clone" is not covered by this proposal.)

Along the same lines as above, this is where we started and it worked well
for low-volume requests. However, when we started ramping up the load,
`pack-objects` operating on a very large packed repository (~150 GiB)
became very computationally expensive, even with `--compression=1
--depth=0 --window=0`.

Being a bit more clever about packing objects (e.g. splitting blobs out
from commits and trees) improved this a bit, but we still hit bottlenecks
from what appeared to be a large number of memory-mapping operations on a
~140 GiB packfile of blobs.

Each `pack-objects` process would consume approximately one CPU core for
the duration of the request. It's possible that further splitting of these
large blob packs would have improved performance in some scenarios, but
that would increase the number of pack-index lookups necessary to find a
single object.

>
> The server will perform reachability checks on requested blobs through 
> the equivalent of "git rev-list --use-bitmap-index" (like "git 
> upload-pack" when using the allowreachablesha1inwant option), unless 
> configured to suppress reachability checks through a config option.
> The server administrator is highly recommended to regularly regenerate 
> the bitmap (or suppress reachability checks).
>
> === Endpoint support for forward compatibility
>
> This "server" endpoint requires that the first line be understood, but 
> will ignore any other lines starting with words that it does not 
> understand. This allows new "commands" to be added (distinguished by 
> their first lines) and existing commands to be "upgraded" with backwards 
> compatibility.

This seems like a clever way to avoid the canonical
`/info/refs?service=git-upload-pack` capability negotiation on every call.
However, using error handling to fall back seems slightly wonky to me.
Hopefully users are 

Re: [PATCH 3/4] submodule.c: harden submodule_move_head against broken submodules

2017-04-12 Thread Philip Oakley

From: "Stefan Beller" 

Early on in submodule_move_head just after the check if the submodule is
initialized, we need to check if the submodule is populated correctly.

If the submodule is initialized but doesn't look like populated, this

s/like//   or s/like/like it is/


is a red flag and can indicate multiple sorts of failures:
(1) The submodule may be recorded at an object name, that is missing.
(2) The submodule '.git' file link may be broken and it is not pointing
   at a repository.

In both cases we want to complain to the user in the non-forced mode,
and in the forced mode ignoring the old state and just moving the
submodule into its new state with a fixed '.git' file link.

Signed-off-by: Stefan Beller 
---

--
Philip


submodule.c   | 17 +
t/lib-submodule-update.sh | 23 ---
2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 2fa42519a4..dda1ead854 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1331,10 +1331,19 @@ int submodule_move_head(const char *path,
 int ret = 0;
 struct child_process cp = CHILD_PROCESS_INIT;
 const struct submodule *sub;
+ int *errorcode, error_code;

 if (!is_submodule_initialized(path))
 return 0;

+ if (flags & SUBMODULE_MOVE_HEAD_FORCE)
+ errorcode = &error_code;
+ else
+ errorcode = NULL;
+
+ if (old && !is_submodule_populated_gently(path, errorcode))
+ return 0;
+
 sub = submodule_from_path(null_sha1, path);

 if (!sub)
@@ -1361,6 +1370,14 @@ int submodule_move_head(const char *path,
 /* make sure the index is clean as well */
 submodule_reset_index(path);
 }
+
+ if (old && (flags & SUBMODULE_MOVE_HEAD_FORCE)) {
+ struct strbuf sb = STRBUF_INIT;
+ strbuf_addf(&sb, "%s/modules/%s",
+ get_git_common_dir(), sub->name);
+ connect_work_tree_and_git_dir(path, sb.buf);
+ strbuf_release(&sb);
+ }
 }

 prepare_submodule_repo_env_no_git_dir(_array);
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 22dd9e060c..f0b1b18206 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -1213,14 +1213,31 @@ test_submodule_forced_switch_recursing () {
 )
 '
 # Updating a submodule from an invalid sha1 updates
 - test_expect_success "$command: modified submodule does not update submodule work tree from invalid commit" '
 + test_expect_success "$command: modified submodule does update submodule work tree from invalid commit" '

 prolog &&
 reset_work_tree_to_interested invalid_sub1 &&
 (
 cd submodule_update &&
 git branch -t valid_sub1 origin/valid_sub1 &&
- test_must_fail $command valid_sub1 &&
- test_superproject_content origin/invalid_sub1
+ $command valid_sub1 &&
+ test_superproject_content origin/valid_sub1 &&
+ test_submodule_content sub1 origin/valid_sub1
+ )
+ '
+
+ # Old versions of Git were buggy writing the .git link file
+ # (e.g. before f8eaa0ba98b and then moving the superproject repo
+ # whose submodules contained absolute paths)
+ test_expect_success "$command: updating submodules fixes .git links" '
+ prolog &&
+ reset_work_tree_to_interested add_sub1 &&
+ (
+ cd submodule_update &&
+ git branch -t modify_sub1 origin/modify_sub1 &&
+ echo "gitdir: bogus/path" >sub1/.git &&
+ $command modify_sub1 &&
+ test_superproject_content origin/modify_sub1 &&
+ test_submodule_content sub1 origin/modify_sub1
 )
 '
}
--
2.12.2.603.g7b28dc31ba






Re: [PATCH 1/4] entry.c: submodule recursing: respect force flag correctly

2017-04-12 Thread Philip Oakley

From: "Stefan Beller" 

In case of a non-forced worktree update, the submodule movement is tested
in a dry run first, such that it doesn't matter if the actual update is
done via the force flag. However for correctness, we want to give the
flag is specified by the user. All callers have been inspected and updated

s/flag is/flag as/


if needed.

Signed-off-by: Stefan Beller 
---

--
Philip



entry.c| 8 
unpack-trees.c | 7 ++-
2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/entry.c b/entry.c
index d2b512da90..d6b263f78e 100644
--- a/entry.c
+++ b/entry.c
@@ -208,7 +208,8 @@ static int write_entry(struct cache_entry *ce,
 sub = submodule_from_ce(ce);
 if (sub)
 return submodule_move_head(ce->name,
- NULL, oid_to_hex(&ce->oid), SUBMODULE_MOVE_HEAD_FORCE);
+ NULL, oid_to_hex(&ce->oid),
+ state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
 break;
 default:
 return error("unknown file mode for %s in index", path);
@@ -282,12 +283,11 @@ int checkout_entry(struct cache_entry *ce,
 unlink_or_warn(ce->name);

 return submodule_move_head(ce->name,
- NULL, oid_to_hex(&ce->oid),
- SUBMODULE_MOVE_HEAD_FORCE);
+ NULL, oid_to_hex(&ce->oid), 0);
 } else
 return submodule_move_head(ce->name,
 "HEAD", oid_to_hex(&ce->oid),
- SUBMODULE_MOVE_HEAD_FORCE);
+ state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
 }

 if (!changed)
diff --git a/unpack-trees.c b/unpack-trees.c
index 8333da2cc9..7316ca99c2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -252,14 +252,18 @@ static int check_submodule_move_head(const struct cache_entry *ce,
   const char *new_id,
  struct unpack_trees_options *o)
{
+ unsigned flags = SUBMODULE_MOVE_HEAD_DRY_RUN;
 const struct submodule *sub = submodule_from_ce(ce);
 if (!sub)
 return 0;

+ if (o->reset)
+ flags |= SUBMODULE_MOVE_HEAD_FORCE;
+
 switch (sub->update_strategy.type) {
 case SM_UPDATE_UNSPECIFIED:
 case SM_UPDATE_CHECKOUT:
- if (submodule_move_head(ce->name, old_id, new_id, SUBMODULE_MOVE_HEAD_DRY_RUN))
+ if (submodule_move_head(ce->name, old_id, new_id, flags))
 return o->gently ? -1 :
 add_rejected_path(o, ERROR_WOULD_LOSE_SUBMODULE, ce->name);
 return 0;
@@ -308,6 +312,7 @@ static void unlink_entry(const struct cache_entry *ce)
 case SM_UPDATE_CHECKOUT:
 case SM_UPDATE_REBASE:
 case SM_UPDATE_MERGE:
+ /* state.force is set at the caller. */
 submodule_move_head(ce->name, "HEAD", NULL,
 SUBMODULE_MOVE_HEAD_FORCE);
 break;
--
2.12.2.603.g7b28dc31ba






[PATCH] refs.h: rename submodule arguments to submodule_path

2017-04-12 Thread Stefan Beller
In submodule land we carefully need to distinguish between the path of a
submodule and its name.

The path of a submodule is the path that is recorded in the working tree
of the superproject and describes where the submodule is bound to the
superprojects tree.

The name as introduced in 941987a554 (git-submodule: give submodules
proper names, 2007-06-11) exists to track submodules across renames of
submodules. It is also used for the internal path in .git/modules/
to store the git directory of the submodule inside the superproject.

When looking up ref functions to use, I was confused about which of the
two submodule properties is meant by the argument of the ref functions.
The context in which the functions were used, however, revealed that it
is the path of the submodule.
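As a concrete illustration (all values invented), the two properties sit
side by side in the superproject's .gitmodules, while the name alone keys
the internal git directory:

```ini
# .gitmodules in the superproject -- values invented for illustration
[submodule "libfoo"]
	path = third_party/foo
	url = https://example.com/foo.git
```

Here "libfoo" is the name (stable across renames, and the key for
.git/modules/libfoo), while third_party/foo is the path, which is what
these ref functions expect.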

Rename the arguments to clearly describe what is expected as an input
argument.

Signed-off-by: Stefan Beller 
---
 refs.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs.h b/refs.h
index 3df0d45ebb..451f92958a 100644
--- a/refs.h
+++ b/refs.h
@@ -86,7 +86,7 @@ int peel_ref(const char *refname, unsigned char *sha1);
  * successful, return 0 and set sha1 to the name of the object;
  * otherwise, return a non-zero value.
  */
-int resolve_gitlink_ref(const char *submodule, const char *refname,
+int resolve_gitlink_ref(const char *submodule_path, const char *refname,
unsigned char *sha1);
 
 /*
 int for_each_glob_ref(each_ref_fn fn, const char *pattern, void *cb_data);
 int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
 const char *prefix, void *cb_data);
 
-int head_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
-int for_each_ref_submodule(const char *submodule,
+int head_ref_submodule(const char *submodule_path, each_ref_fn fn, void *cb_data);
+int for_each_ref_submodule(const char *submodule_path,
   each_ref_fn fn, void *cb_data);
-int for_each_ref_in_submodule(const char *submodule, const char *prefix,
+int for_each_ref_in_submodule(const char *submodule_path, const char *prefix,
each_ref_fn fn, void *cb_data);
-int for_each_tag_ref_submodule(const char *submodule,
+int for_each_tag_ref_submodule(const char *submodule_path,
   each_ref_fn fn, void *cb_data);
-int for_each_branch_ref_submodule(const char *submodule,
+int for_each_branch_ref_submodule(const char *submodule_path,
  each_ref_fn fn, void *cb_data);
-int for_each_remote_ref_submodule(const char *submodule,
+int for_each_remote_ref_submodule(const char *submodule_path,
  each_ref_fn fn, void *cb_data);
 
 int head_ref_namespaced(each_ref_fn fn, void *cb_data);
-- 
2.12.2.603.g7b28dc31ba



[no subject]

2017-04-12 Thread David Taylor
On Tue, Apr 11, 2017 at 10:14 PM, taylor, david wrote:
> We are using Git in a distributed environment.
>
> In the United States, we have the master repository in one state and a
> build cluster in a different state.  In addition to people in the US
> doing builds, we have people in other countries (Ireland, India, Israel,
> Russia, possibly others) doing builds -- using the build cluster.
>
> The local mirror of the repository is NFS accessible.  The plan is to
> make builds faster through the use of work trees.  The build cluster
> nodes involved in the build will have a worktree in RAM -- checked out
> for the duration of the build.  Since the worktree is in RAM, it will
> not be NFS accessible.
>
> [Cloning takes 20+ minutes when the network is unloaded.  Building, with
> sources NFS mounted, takes 5-10 minutes.]

Using worktrees in this scenario kinda defeats the distributed nature
of git. Cloning takes long, yes. But what about just "git pull" (or
"git fetch && git checkout -f" if you want to avoid merging)?

Merging isn't the issue.  Speed is an issue.  Repeatability is an issue.
Disk space is an issue.

If someone does a build on their desktop instead of using the build
cluster, it will take hours rather than minutes.  And if they are not in
Massachusetts, they probably don't have access to the controlled
toolchain that is used to do the builds.

Users may choose to share their repository amongst several work trees
rather than having lots of clones.  Their choice.  Such work trees would
be 'long lived'.

I was thinking of a different use for work trees.  Work trees that would
be short lived -- less than, say, two hours.  Typically less than 30
minutes.

There would be a local repository mirroring the master repository.  It
would be a true mirror -- it would be updated only from the master via
git fetch; there would never be any 'git commit's to it.

When someone who is remote to Massachusetts (which is where the build
cluster lives), wants to do a build they will invoke a script (yet to be
written) that will determine two things:

  . a SHA1 that exists in the master repository and is an ancestor of
  their workspace

  . their workspace differences relative to that SHA1

A lightly loaded build cluster node will be selected, it will be given
the SHA1 and the patch file.  A short lived worktree will be created
which has the SHA1 checked out and the patch applied.

Once the workspace has been recreated, it is built.

[Actually, the build target is decomposed into a dozen pieces that are
built on separate build cluster nodes in parallel using separate copies
of the workspace.]

The build deliverables will be delivered back to the requestor and the
intermediate build products and the work tree will be deleted.

[The script will also be used by Jenkins for continuous integration builds.]
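The two determinations the script has to make can be sketched as a
runnable toy (a sketch only: every repository name, path, and identity
below is invented, and a throwaway "mirror" stands in for the
NFS-accessible mirror):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the local mirror of the master repository.
git init -q mirror
git -C mirror symbolic-ref HEAD refs/heads/master
(cd mirror && echo base >file && git add file && git commit -qm base)

# Stand-in for a user's workspace with an uncommitted change.
git clone -q mirror workspace
cd workspace
echo change >>file

# 1. a SHA1 that exists in the master repository and is an ancestor
#    of the workspace:
base=$(git merge-base HEAD origin/master)

# 2. the workspace differences relative to that SHA1:
git diff "$base" >workspace.patch

# A build node would then recreate the workspace in a short-lived
# worktree, e.g.:  git worktree add --detach /ram/build.$$ "$base"
#                  followed by "git apply" of workspace.patch there.
echo "base=$base"
```

Note that merge-base only covers committed ancestry; the diff step is
what carries any uncommitted edits across to the build node.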

> This presents a few problems.
>
> When we are done with a work tree, we want to clean up -- think: prune.
> But, you cannot prune just one worktree; you have to prune the set.  And
> no machine has access to all the worktrees.  So, no machine knows which
> ones are prunable.

By "prune one worktree", did you mean delete one? Or delete a branch
the worktree uses and prune the object database?

As in:

  rm -rf /path/to/top/of/work/tree

and then ideally:

  git worktree prune /path/to/top/of/work/tree

or, alternatively, just:

  git worktree prune

> There is no 'lock' option to 'add'.  If someone does a 'prune' after you
> do an 'add' and before you do a 'lock', then your 'add' is undone.
>
> Are there any plans to add a '[--lock]' option to 'add' to create the
> worktree in the locked state?  And/or plans to add a [...] option to
> prune to say 'prune only this path / these paths'?

So this is "git worktree prune". Adding "worktree add --locked" sounds
reasonable (and quite simple too, because "worktree add" does lock the
worktree at creation time; we just need to stop it from releasing the
lock). I might be able to do it quickly (it does not mean "available
in the next release" though).

Yes, add the worktree, and if there are no errors, leave it in the
locked state.

If you need to just prune "this path", I think it's the equivalent of
"git worktree remove" (i.e. delete a specific worktree). Work has been
going on for a while to add that command. Maybe it'll be available
later this year.

I'm not familiar with 'git worktree remove', but, yes, just 'git
worktree prune' a specific specified path.  I was thinking do what 'git
worktree prune' does, but do it only for the path(s) specified on the
command line.

[While I only anticipate giving it one path, I see no need / reason to
limit it to just one path on the command line.]
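Today's closest two-step equivalent, shown against a throwaway
repository (paths invented); a path argument to prune would fold the
final two commands into one targeted call:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
(cd repo && git commit -q --allow-empty -m init)
git -C repo worktree add ../wt >/dev/null 2>&1

rm -rf wt                              # worktree directory is gone ...
test -d repo/.git/worktrees/wt         # ... but its admin area remains
git -C repo worktree prune             # prunes *every* stale worktree
test ! -d repo/.git/worktrees/wt
```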

> If there are no plans, is the above an acceptable interface?  And if we
> implemented it, would it be looked

Re: [PATCH 1/5] run-command: convert sane_execvp to sane_execvpe

2017-04-12 Thread Brandon Williams
On 04/10, Brandon Williams wrote:
> Convert 'sane_execvp()' to 'sane_execvpe()' which optionally takes a
> pointer to an array of 'char *' which should be used as the environment
> for the process being exec'd.  If no environment is provided (by passing
> NULL instead) then the already existing environment, as stored in
> 'environ', will be used.

Turns out we probably can't use execvpe since it isn't portable.  From
some of the other discussion it makes more sense to just move to using
execve instead.

> 
> Signed-off-by: Brandon Williams 
> ---
>  cache.h   |  3 +--
>  exec_cmd.c|  2 +-
>  run-command.c | 15 ++-
>  3 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index 5c8078291..10d40ecae 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -2185,8 +2185,7 @@ int checkout_fast_forward(const unsigned char *from,
> const unsigned char *to,
> int overwrite_ignore);
>  
> -
> -int sane_execvp(const char *file, char *const argv[]);
> +int sane_execvpe(const char *file, char *const argv[], char *const envp[]);
>  
>  /*
>   * A struct to encapsulate the concept of whether a file has changed
> diff --git a/exec_cmd.c b/exec_cmd.c
> index fb94aeba9..c375f354d 100644
> --- a/exec_cmd.c
> +++ b/exec_cmd.c
> @@ -118,7 +118,7 @@ int execv_git_cmd(const char **argv) {
>   trace_argv_printf(nargv.argv, "trace: exec:");
>  
>   /* execvp() can only ever return if it fails */
> - sane_execvp("git", (char **)nargv.argv);
> + sane_execvpe("git", (char **)nargv.argv, NULL);
>  
>   trace_printf("trace: exec failed: %s\n", strerror(errno));
>  
> diff --git a/run-command.c b/run-command.c
> index 574b81d3e..682bc3ca5 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -168,10 +168,15 @@ static int exists_in_PATH(const char *file)
>   return r != NULL;
>  }
>  
> -int sane_execvp(const char *file, char * const argv[])
> +int sane_execvpe(const char *file, char * const argv[], char *const envp[])
>  {
> - if (!execvp(file, argv))
> - return 0; /* cannot happen ;-) */
> + if (envp) {
> + if (!execvpe(file, argv, envp))
> + return 0; /* cannot happen ;-) */
> + } else {
> + if (!execvp(file, argv))
> + return 0; /* cannot happen ;-) */
> + }
>  
>   /*
>* When a command can't be found because one of the directories
> @@ -226,7 +231,7 @@ static int execv_shell_cmd(const char **argv)
>   struct argv_array nargv = ARGV_ARRAY_INIT;
>   prepare_shell_cmd(&nargv, argv);
>   trace_argv_printf(nargv.argv, "trace: exec:");
> - sane_execvp(nargv.argv[0], (char **)nargv.argv);
> + sane_execvpe(nargv.argv[0], (char **)nargv.argv, NULL);
>   argv_array_clear(&nargv);
>   return -1;
>  }
> @@ -442,7 +447,7 @@ int start_command(struct child_process *cmd)
>   else if (cmd->use_shell)
>   execv_shell_cmd(cmd->argv);
>   else
> - sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
> + sane_execvpe(cmd->argv[0], (char *const*) cmd->argv, 
> NULL);
>   if (errno == ENOENT) {
>   if (!cmd->silent_exec_failure)
>   error("cannot run %s: %s", cmd->argv[0],
> -- 
> 2.12.2.715.g7642488e1d-goog
> 

-- 
Brandon Williams


git subtree bug

2017-04-12 Thread Andrew Hoyt
Hi guys,

I've been using git subtrees extensively to include third party tools
in my projects. Today I stumbled across a bug with git-subtree when I
was trying to update the jacoco (https://github.com/jacoco/jacoco.git)
subtree in one of my projects.

The problem stems from adding a project as a subtree with a prefix
that has the same name as a top-level subdirectory in that project.
From then on, any changes made to the top-level directory of that
sub-project will cause issues when you attempt a git subtree pull
(e.g. the entire subtree AND the parent project will be deleted in
your working directory).

Steps to reproduce:

mkdir --parents /tmp/project/prj
cd /tmp/project
echo "file in subdirectory" > prj/file_in_subdirectory
echo "file in root" > file_in_root
git init
git add .
git commit -m "Init subproject repository"

mkdir /tmp/test
cd /tmp/test
git init
git commit --allow-empty -m "Init test repository"
git subtree add --prefix prj /tmp/project master

cd /tmp/project
echo "temp file" > temp
git add temp
git commit -m "add temp file"

cd /tmp/test
git subtree pull --prefix=prj /tmp/project master


In the above example, the problem occurs because we've added the
subtree with the prefix "prj" when it happens to contain a top-level
directory also called "prj". A change is then made to "project"'s
top-level directory (the file "temp" is created) and thus the "git
subtree pull" command puts "test" into a broken state. If we had added
the subtree with any other prefix, the problem would not have
occurred. Likewise, if we had added "temp" anywhere other than the top
level of "project" the subtree pull would not have caused problems.

Anyhow, I'm not sure if you guys are aware of the problem or not, but
I figured I'd bring it to your attention just in case.

Thanks so much,
- Andrew


Re: [PATCH] fetch-pack: show clearer error message upon ERR

2017-04-12 Thread Jonathan Tan

On 04/11/2017 02:47 PM, Jonathan Nieder wrote:

nit about the error message: since this isn't a sign of an internal
error, we don't need to tell the user that it happened in git
fetch-pack.  How about using the same error used e.g. in
run_remote_archiver and get_remote_heads?

die(_("remote error: %s"), arg);

With that change,
Reviewed-by: Jonathan Nieder 


Thanks - used your error message suggestion in the latest version [1].

[1] 
https://public-inbox.org/git/20170412180602.12713-1-jonathanta...@google.com/



Follow-up ideas:

* would it be straightforward to verify this error message is produced
  correctly in tests?  If not, is there some way to manually trigger it
  as a sanity-check?


The best idea I can think of is to intercept fetch-pack's output and 
substitute a fake hash for a certain hash. This is probably easiest to 
do if we run upload-pack in stateless RPC mode (so that we can pipe 
fetch-pack's output into sed, and then into upload-pack) but that would 
require something like my test-un-pkt helper (in [2]). The change in 
this patch seems to be at best correct and at worst harmless, so I 
didn't pursue this further.


[2] 
https://public-inbox.org/git/40f36d5eeb984adc220a4038fc77ed6ad1398fef.1491851452.git.jonathanta...@google.com/



* Documentation/technical/pack-protocol.txt only says that an error-line
  can be produced in response to the initial git-upload-pack command.
  Can it be updated to say where they are now allowed?


Done (in the latest version).


* Are there other points in the protocol where it would be useful to
  allow an error-line?  Is there some lower level place where they could
  be handled?


For fetch-pack/upload-pack, I'm not sure - the sideband already has its 
own error reporting mechanism, so everything might already be covered 
(although I didn't look into detail). Lower-level handling would be 
ideal, but something that I'm not sure off-hand how to do - pkt-lines 
are used quite diversely (for both text and binary data, and with 
sideband indicator and without), so we would need to ensure that a 
solution would work with all those (and any future uses).
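For reference, the ERR line under discussion travels as an ordinary
pkt-line: a four-digit lower-case hex length (which counts itself)
followed by the payload. A minimal sketch of the framing -- pkt_line is
an illustrative helper, not a git command, and real git normally
terminates textual payloads with an LF that the length also counts:

```shell
#!/bin/sh
set -e

# Frame a payload as a pkt-line: 4 hex digits of total length (prefix
# included), then the payload itself.
pkt_line () {
	printf '%04x%s' $(( ${#1} + 4 )) "$1"
}

pkt_line 'NAK'; echo                 # -> 0007NAK
pkt_line 'ERR not our ref'; echo     # -> 0013ERR not our ref
```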


[PATCH v2] fetch-pack: show clearer error message upon ERR

2017-04-12 Thread Jonathan Tan
Currently, fetch-pack prints a confusing error message ("expected
ACK/NAK") when the server it's communicating with sends a pkt-line
starting with "ERR".  Replace it with a less confusing error message.

Also update the documentation describing the fetch-pack/upload-pack
protocol (pack-protocol.txt) to indicate that "ERR" can be sent in the
place of "ACK" or "NAK". In practice, this has been done for quite some
time by other Git implementations (e.g. JGit sends "want $id not valid")
and by Git itself (since commit bdb31ea: "upload-pack: report "not our
ref" to client", 2017-02-23) whenever a "want" line references an object
that it does not have. (This is uncommon, but can happen if a repository
is garbage-collected during a negotiation.)

Signed-off-by: Jonathan Tan 
---

Changes from v1:
 - used jrnieder's suggested change to error message
 - added documentation about "ERR" in protocol
 - updated commit message to explain documentation change

 Documentation/technical/pack-protocol.txt | 7 ++-
 fetch-pack.c  | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index c59ac9936..5b0ba3ef2 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -351,14 +351,19 @@ ACK after 'done' if there is at least one common base and multi_ack or
 if there is no common base found.
 
+Instead of 'ACK' or 'NAK', the server may send an error message (for
+example, if it does not recognize an object in a 'want' line received
+from the client).
+
 Then the server will start sending its packfile data.
 
 
-  server-response = *ack_multi ack / nak
+  server-response = *ack_multi ack / nak / error-line
   ack_multi   = PKT-LINE("ACK" SP obj-id ack_status)
   ack_status  = "continue" / "common" / "ready"
   ack = PKT-LINE("ACK" SP obj-id)
   nak = PKT-LINE("NAK")
+  error-line =  PKT-LINE("ERR" SP explanation-text)
 
 
 A simple clone may look like this (with no 'have' lines):
diff --git a/fetch-pack.c b/fetch-pack.c
index d07d85ce3..4bed270c5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -276,6 +276,8 @@ static enum ack_type get_ack(int fd, unsigned char *result_sha1)
return ACK;
}
}
+   if (skip_prefix(line, "ERR ", &arg))
+   die(_("remote error: %s"), arg);
die(_("git fetch-pack: expected ACK/NAK, got '%s'"), line);
 }
 
-- 
2.12.2.715.g7642488e1d-goog



Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

2017-04-12 Thread Taylor Blau
I think this is a great approach and one that I'd be happy to implement in LFS.
The additional capability isn't too complex, so I think other similar filters to
LFS shouldn't have a hard time implementing it either.

I left a few comments, mostly expressing approval to the documentation changes.
I'll leave the C review to someone more expert than me.

+1 from me on the protocol changes.

> +Delay
> +^^^^^
> +
> +If the filter supports the "delay" capability, then Git can send the
> +flag "delay-able" after the filter command and pathname.

Nit: I think either way is fine, but `can_delay` will save us 1 byte for
each new checkout entry.

> +"delay-id", a number that identifies the blob, and a flush packet. The
> +filter acknowledges this number with a "success" status and a flush
> +packet.

I mentioned this in another thread, but I'd prefer, if possible, that we use the
pathname as a unique identifier for referring back to a particular checkout
entry. I think adding an additional identifier adds unnecessary complication to
the protocol and introduces a forced mapping on the filter side from id to
path.

Both Git and the filter are going to have to keep these paths in memory
somewhere, be that in-process, or on disk. That being said, I can see potential
troubles with a large number of long paths that exceed the memory available to
Git or the filter when stored in a hashmap/set.

On Git's side, I think trading that for some CPU time might make sense. If Git
were to SHA1 each path and store that in a hashmap, it would consume more CPU
time, but less memory to store each path. Git and the filter could then exchange
path names, and Git would simply SHA1 the pathname each time it needed to refer
back to memory associated with that entry in a hashmap.

> +by a "success" status that is also terminated with a flush packet. If
> +no blobs for the delayed paths are available, yet, then the filter is
> +expected to block the response until at least one blob becomes
> +available. The filter can tell Git that it has no more delayed blobs
> +by sending an empty list.

I think the blocking nature of the `list_available_blobs` command is a great
synchronization mechanism for telling the filter when it can and can't send new
blob information to Git. I was tentatively thinking of suggesting to remove this
command and instead allow the filter to send readied blobs in sequence after all
unique checkout entries had been sent from Git to the filter. But I think this
approach allows us more flexibility, and isn't that much extra
complication or bytes across the pipe.


--
Thanks,
Taylor Blau


Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

2017-04-12 Thread Taylor Blau
> > (And at this point, may I suggest to change "delay-id" into "request-id=1" ?
>
> If there is no objection by another reviewer then I am happy to change it.

I think "delay-id" may be more illustrative of what's occurring in this request.
That being said, my preference would be that we remove the
"delay-id"/"request-id" entirely from the protocol, and make Git responsible for
handling the path lookup by a hashmap.

Is the concern that a hashmap covering all entries in a large checkout would be
too large to keep in memory? If so, keeping an opaque ID as a part of the
protocol is something I would not object to.

> >> +packet:  git> 
> >> +packet:  git>   # empty content!
> >> +packet:  git< status=success
> >> +packet:  git< 
> >> +packet:  git< SMUDGED_CONTENT
> >> +packet:  git< 
> >> +packet:  git< 
> >
> > OK, good.
> >
> > The question is: what happens next?
> >
> > 2 things, kind of in parallel, but we need to prioritize and serialize:
> > - Send the next blob
> > - Fetch ready blobs
> > - And of course: ask for more ready blobs.
> > (it looks as if Peff and Jakub had useful comments already,
> >  so I can stop here?)
>
> I would like to keep the mechanism as follows:
>
> 1. sends all blobs to the filter
> 2. fetch blobs until we are done
>
> @Taylor: Do you think that would be OK for LFS?

I think that this would be fine for LFS and filters of this kind in general. For
LFS in particular, my initial inclination would be to have the protocol open
support writing blob data back to Git at anytime during the checkout process,
not just after all blobs have been sent to the filter.

That being said, I don't think this holds up in practice. The blobs are too big
to fit in memory anyway, and will just end up getting written to LFS's object
cache in .git/lfs/objects.

Since they're already in there, all we would have to do is keep the list of
`readyIds map[int]*os.File` in memory (or even map int -> LFS OID, and open the
file later), and then `io.Copy()` from the open file back to Git.

This makes me think of adding another capability to the protocol, which would
just be exchanging paths on disk in `/tmp` or any other directory so that we
wouldn't have to stream content over the pipe. Instead of responding with

packet:  git< status=success
packet:  git< 
packet:  git< SMUDGED_CONTENT
packet:  git< 
packet:  git< 

We could respond with:

packet:  git< status=success
packet:  git< 
packet:  git< /path/to/contents.dat # <-
packet:  git< 
packet:  git< 

Git would then be responsible for opening that file on disk (the filter would
guarantee that to be possible), and then copying its contents into the working
tree.

I think that's a topic for later discussion, though :-).

> > In general, Git should not have a unlimited number of blobs outstanding,
> > as memory constraints may apply.
> > There may be a config variable for the number of outstanding blobs,
> > (similar to the window size in other protocols) and a variable
> > for the number of "send bytes in outstanding blobs"
> > (similar to window size (again!) in e.g TCP)
> >
> > The number of outstanding blobs may be less important, and it is more
> > important to monitor the number of bytes we keep in memory in some way.
> >
> > Something like "we set a limit to 500K of outstanding data", once we are
> > above the limit, don't send any new blobs.
>
> I don't expect the filter to keep everything in memory. If there is no memory
> anymore then I expect the filter to spool to disk. This keeps the protocol 
> simple.
> If this turns out to be not sufficient then we could improve that later, too.

Agree.


--
Thanks,
Taylor Blau


Re: [PATCH] ls-files: properly prepare submodule environment

2017-04-12 Thread Brandon Williams
On 04/11, Jacob Keller wrote:
> From: Jacob Keller 
> 
> Since commit e77aa336f116 ("ls-files: optionally recurse into
> submodules", 2016-10-07) ls-files has known how to recurse into
> submodules when displaying files.
> 
> Unfortunately this code does not work as expected. Initially, it
> produced results indicating that a --super-prefix can't be used from
> a subdirectory:
> 
>   fatal: can't use --super-prefix from a subdirectory
> 
> After commit b58a68c1c187 ("setup: allow for prefix to be passed to git
> commands", 2017-03-17) this behavior changed and resulted in repeated
> calls of ls-files with an expanding super-prefix.
> 
> This recursive expanding super-prefix appears to be the result of
> ls-files acting on the super-project instead of on the submodule files.
> 
> We can fix this by properly preparing the submodule environment when
> setting up the submodule process. This ensures that the command we
> execute properly reads the directory and ensures that we correctly
> operate in the submodule instead of repeating ourselves on the
> super-project contents forever.
> 
> While we're here lets also add some tests to ensure that ls-files works
> with recurse-submodules to catch these issues in the future.
> 
> Signed-off-by: Jacob Keller 
> ---
> I found this fix on accident after looking at git-grep code and
> comparing how ls-files instantiated the submodule. Can someone who knows
> more about submodules explain the reasoning better for this fix?
> Essentially we "recurse" into the submodule folder, but still operate as
> if we're at the root of the project but with a new prefix. So then we
> keep recursing into the submodule forever.

The reason why this fix is required is that the env var GIT_DIR is set
to be the parent's gitdir.  When recursing, the child process just uses
the parent's gitdir instead of its own, causing it to recurse into
itself again and again.  The child process's environment needs to have
the GIT_DIR var cleared so that the child will do discovery and actually
find its own gitdir.
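The effect is easy to reproduce outside of ls-files (throwaway
repositories; "sub" merely stands in for a submodule's repository):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
unset GIT_DIR

git init -q parent
git init -q parent/sub      # stand-in for a submodule's repository

# With discovery, a command run inside "sub" finds sub's own gitdir:
(cd parent/sub && git rev-parse --git-dir)    # prints ".git"

# With the parent's GIT_DIR exported, discovery is skipped and the same
# command answers for the parent -- the recursing child saw exactly this:
(cd parent/sub && GIT_DIR="$tmp/parent/.git" git rev-parse --git-dir)
```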

> 
> I also added some tests here, and I *definitely* think this should be a
> maintenance backport into any place which supports ls-files
> --recurse-submodules, since as far as I can tell it is otherwise
> useless.
> 
> There were no tests for it, so I added some based on git-grep tests. I
> did not try them against the broken setups, because of the endless
> recursion.

There are tests for ls-files --recurse-submodules in
t3007-ls-files-recurse-submodules.sh, though it looks like this
particular case (which triggers this bug) maybe didn't have any tests.
I'm actually unsure of why the existing tests didn't catch this (I'm
probably just bad at writing tests).


> 
>  builtin/ls-files.c |   4 +
>  t/t3080-ls-files-recurse-submodules.sh | 162 +
>  2 files changed, 166 insertions(+)
>  create mode 100755 t/t3080-ls-files-recurse-submodules.sh
> 
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index d449e46db551..e9b3546ca053 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -15,6 +15,7 @@
>  #include "string-list.h"
>  #include "pathspec.h"
>  #include "run-command.h"
> +#include "submodule.h"
>  
>  static int abbrev;
>  static int show_deleted;
> @@ -203,6 +204,9 @@ static void show_gitlink(const struct cache_entry *ce)
>   struct child_process cp = CHILD_PROCESS_INIT;
>   int status;
>  
> + prepare_submodule_repo_env(&cp.env_array);
> + argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);

Yes these lines need to be included in order to prepare the environment.
Thanks for catching that.

> +
>   if (prefix_len)
>   argv_array_pushf(_array, "%s=%s",
>GIT_TOPLEVEL_PREFIX_ENVIRONMENT,
> diff --git a/t/t3080-ls-files-recurse-submodules.sh 
> b/t/t3080-ls-files-recurse-submodules.sh
> new file mode 100755
> index ..6788a8f09635
> --- /dev/null
> +++ b/t/t3080-ls-files-recurse-submodules.sh
> @@ -0,0 +1,162 @@
> +#!/bin/sh
> +
> +test_description='Test ls-files recurse-submodules feature
> +
> +This test verifies the recurse-submodules feature correctly lists files 
> across
> +submodules.
> +'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup directory structure and submodule' '
> + echo "foobar" >a &&
> + mkdir b &&
> + echo "bar" >b/b &&
> + git add a b &&
> + git commit -m "add a and b" &&
> + git init submodule &&
> + echo "foobar" >submodule/a &&
> + git -C submodule add a &&
> + git -C submodule commit -m "add a" &&
> + git submodule add ./submodule &&
> + git commit -m "added submodule"
> +'
> +
> +test_expect_success 'ls-files correctly lists files in a submodule' '
> + cat >expect <<-\EOF &&
> + .gitmodules
> + a
> + b/b
> + submodule/a
> + EOF
> +
> + git ls-files --recurse-submodules >actual &&
> + test_cmp expect actual

RE: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-12 Thread Kevin Willford

> -Original Message-
> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On
> Behalf Of Duy Nguyen
> Sent: Wednesday, April 12, 2017 7:21 AM
> To: Kevin Willford 
> Cc: Kevin Willford ; git@vger.kernel.org;
> gits...@pobox.com; p...@peff.net
> Subject: Re: [PATCH 3/3] reset.c: update files when using sparse to avoid
> data loss.
> 
> On Wed, Apr 12, 2017 at 5:30 AM, Kevin Willford wrote:
> > The loss of the skip-worktree bits is part of the problem if you are
> > talking about modified files.  The other issue that I was having is
> > when running a reset and there were files added in the commit that is
> > being reset, there will not be an entry in the index and not a file on
> > disk so the data for that file is completely lost at that point.
> > "status" also doesn't include anything about this loss of data.  On
> > modified files status will at least have the file as deleted since
> > there is still an index entry but again the previous version of the file 
> > and it's
> data is lost.
> 
> Well, we could have "deleted" index entries, those marked with
> CE_REMOVE. They are never written down to file though, so 'status'
> won't benefit from that. Hopefully we can restore the file before the index
> file is written down and we really lose skip-worktree bits?

Because this is a reset --mixed it will never run through unpack_trees()
and the entries are never marked with CE_REMOVE.

> 
> > To me this is totally unexpected behavior, for example if I am on a
> > commit where there are only files that where added and run a reset
> > HEAD~1 and then a status, it will show a clean working directory.
> > Regardless of skip-worktree bits the user needs to have the data in
> > the working directory after the reset or data is lost which is always bad.
> 
> I agree we no longer have a place to save the skip-worktree bit, we have to
> restore the state back as if skip-worktree bit does not exist.
> It would be good if we could keep the logic in unpack_trees() though.
> For example, if the file is present on disk even if skip-worktree bit is on,
> unpack_trees() would abort instead of silently overwriting it.
> This is a difference between skip-worktree and assume-unchanged bits.
> If you do explicit checkout_entry() you might have to add more checks to
> keep behavior consistent.
> --
> Duy

Because this is a reset --mixed it will follow the code path calling
read_from_tree and ends up calling update_index_from_diff in the
format_callback of the diff, so unpack_trees() is never called in the
--mixed case.  This code change also only applies when the file does not
exist, the skip-worktree bit is on, and the updated index entry either
will be missing (covers the added scenario) or was not missing before
(covers the modified scenario).  If there is a better way to get the
previous index entry to disk than what I am doing, I am happy to
implement it correctly.

Thanks,
Kevin


[PATCH] worktree: change 'worktree' to 'working tree' in help text

2017-04-12 Thread Nguyễn Thái Ngọc Duy
We have been trying to keep the terminology consistent on the
user-facing front. Let's keep sticking to "working tree".

While at it, change 'other' to 'another'. The latter is probably more
correct.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/worktree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3dab07c829..b8ea1ef966 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -330,7 +330,7 @@ static int add(int ac, const char **av, const char *prefix)
char *path;
const char *branch;
struct option options[] = {
-   OPT__FORCE(, N_("checkout  even if already checked out in other worktree")),
+   OPT__FORCE(, N_("checkout  even if already checked out in another working tree")),
OPT_STRING('b', NULL, _branch, N_("branch"),
   N_("create a new branch")),
OPT_STRING('B', NULL, _branch_force, N_("branch"),
-- 
2.11.0.157.gd943d85



[PATCH] worktree add: add --lock option

2017-04-12 Thread Nguyễn Thái Ngọc Duy
As explained in the document. This option has an advantage over the
command sequence "git worktree add && git worktree lock": there will be
no gap that somebody can accidentally "prune" the new worktree (or soon,
explicitly "worktree remove" it).

"worktree add" does keep a lock on while it's preparing the worktree.
If --lock is specified, this lock remains after the worktree is created.

Suggested-by: David Taylor 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 A patch that adds --lock may look like this.

 Documentation/git-worktree.txt |  7 ++-
 builtin/worktree.c | 12 +++-
 t/t2025-worktree-add.sh|  6 ++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index 553cf8413f..b472acc356 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -9,7 +9,7 @@ git-worktree - Manage multiple working trees
 SYNOPSIS
 
 [verse]
-'git worktree add' [-f] [--detach] [--checkout] [-b ]  []
+'git worktree add' [-f] [--detach] [--checkout] [--lock] [-b ]  []
 'git worktree list' [--porcelain]
 'git worktree lock' [--reason ] 
 'git worktree prune' [-n] [-v] [--expire ]
@@ -107,6 +107,11 @@ OPTIONS
such as configuring sparse-checkout. See "Sparse checkout"
in linkgit:git-read-tree[1].
 
+--lock::
+   Keep the working tree locked after creation. This is the
+   equivalent of `git worktree lock` after `git worktree add`,
+   but without race condition.
+
 -n::
 --dry-run::
With `prune`, do not remove anything; just report what it would
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 9993ded41a..3dab07c829 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -24,6 +24,7 @@ struct add_opts {
int force;
int detach;
int checkout;
+   int keep_locked;
const char *new_branch;
int force_new_branch;
 };
@@ -305,7 +306,15 @@ static int add_worktree(const char *path, const char *refname,
 done:
strbuf_reset();
strbuf_addf(, "%s/locked", sb_repo.buf);
-   unlink_or_warn(sb.buf);
+   if (!ret && opts->keep_locked) {
+   /*
+* Don't keep the confusing "initializing" message
+* after it's already over.
+*/
+   truncate(sb.buf, 0);
+   } else {
+   unlink_or_warn(sb.buf);
+   }
argv_array_clear(_env);
strbuf_release();
strbuf_release();
@@ -328,6 +337,7 @@ static int add(int ac, const char **av, const char *prefix)
   N_("create or reset a branch")),
OPT_BOOL(0, "detach", , N_("detach HEAD at named commit")),
OPT_BOOL(0, "checkout", , N_("populate the new working tree")),
+   OPT_BOOL(0, "lock", _locked, N_("keep the new working tree locked")),
OPT_END()
};
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index b618d6be21..6dce920c03 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -63,6 +63,12 @@ test_expect_success '"add" worktree' '
)
 '
 
+test_expect_success '"add" worktree with lock' '
+   git rev-parse HEAD >expect &&
+   git worktree add --detach --lock here-with-lock master &&
+   test_must_be_empty .git/worktrees/here-with-lock/locked
+'
+
 test_expect_success '"add" worktree from a subdir' '
(
mkdir sub &&
-- 
2.11.0.157.gd943d85



Re: git work trees

2017-04-12 Thread Duy Nguyen
On Tue, Apr 11, 2017 at 10:14 PM, taylor, david  wrote:
> We are using Git in a distributed environment.
>
> In the United States, we have the master repository in one state and a build 
> cluster in a different state.
> In addition to people in the US doing builds, we have people in other 
> countries (Ireland, India, Israel,
> Russia, possibly others) doing builds -- using the build cluster.
>
> The local mirror of the repository is NFS accessible.  The plan is to make 
> builds faster through the use
> of work trees.  The build cluster nodes involved in the build will have a 
> worktree in RAM -- checked out
> for the duration of the build.  Since the worktree is in RAM, it will not be 
> NFS accessible.
>
> [Cloning takes 20+ minutes when the network is unloaded.  Building, with 
> sources NFS mounted, takes
> 5-10 minutes.]

Using worktrees in this scenario kinda defeats the distributed nature
of git. Cloning takes long, yes. But what about just "git pull" (or
"git fetch && git checkout -f" if you want to avoid merging)?

> This presents a few problems.
>
> When we are done with a work tree, we want to clean up -- think: prune.  But, 
> you cannot prune just
> one worktree; you have to prune the set.  And no machine has access to all 
> the worktrees.  So, no
> machine knows which ones are prunable.

By "prune one worktree", did you mean delete one? Or delete a branch
the worktree uses and prune the object database?

> There is no 'lock' option to 'add'.  If someone does a 'prune' after you do 
> an 'add' and before you do a
> 'lock', then your 'add' is undone.
>
> Are there any plans to add a '[--lock]' option to 'add' to create the 
> worktree in the locked state?  And/or
> plans to add a [...] option to prune to say 'prune only this path / 
> these paths'?

So this is "git worktree prune". Adding "worktree add --locked" sounds
reasonable (and quite simple too, because "worktree add" does lock the
worktree at creation time; we just need to stop it from releasing the
lock). I might be able to do it quickly (it does not mean "available
in the next release" though).

If you need to just prune "this path", I think it's the equivalent of
"git worktree remove" (i.e. delete a specific worktree). Work has been
going on for a while to add that command. Maybe it'll be available
later this year.

> If there are no plans, is the above an acceptable interface?  And if we 
> implemented it, would it be looked
> upon favorably?

Speaking of this use case (and this is my own opinion) I think this is
stretching "git worktree" too much. When I created it, I imagined this
functionality to be used by a single person.
-- 
Duy


Re: `git status` output is very misleading after a merge on a "detached HEAD"

2017-04-12 Thread Ævar Arnfjörð Bjarmason
On Wed, Apr 12, 2017 at 3:11 PM, Michael J Gruber  wrote:
> Ævar Arnfjörð Bjarmason venit, vidit, dixit 12.04.2017 14:18:
>> On Wed, Apr 12, 2017 at 7:43 AM, Michael J Gruber  wrote:
>>> Am 11. April 2017 22:40:14 MESZ schrieb "Ævar Arnfjörð Bjarmason" 
>>> :
 On Tue, Apr 11, 2017 at 5:13 PM, Enis Bayramoğlu
> HEAD detached from origin/master 1 commit ago,

 In lieu of that, which would need some of the rev-list machinery to be
 invoked on every git-status, I wonder if just saying "HEAD detached &
 diverged from origin/master" wouldn't be clearer:

 diff --git a/wt-status.c b/wt-status.c
 index 308cf3779e..79c8cfd1cf 100644
 --- a/wt-status.c
 +++ b/wt-status.c
 @@ -1542,7 +1542,7 @@ static void wt_longstatus_print(struct wt_status *s)
if (state.detached_at)
  on_what = _("HEAD detached at ");
else
 -   on_what = _("HEAD detached from ");
 +   on_what = _("HEAD detached & diverged from ");
} else {
branch_name = "";
   on_what = _("Not currently on any branch.");



>>>
>>> No way. That would reduce the information that we give.
>>
>> How does it reduce the information we give? Maybe I've missed
>> something about what "from" means here, as opposed to "at", but it
>> seems to me to mean the same thing as "diverged" by definition, i.e.
>> we detached from the branch but we diverged from it.
>
> No, "at" means we're still at that commit - detached but not diverged.
> "from" means we only started from that commit, but are not at it any more.
>
>> Saying "diverged"
>> just makes it clearer, how does it reduce the information we give?
>
> I misread your patch on my mobile phone, sorry. I thought you meant to
> replace both "at" and "from" by "diverged from" because you considered
> them synonymous.
>
> But your patch touches just the" from" case and emphasizes the "diverge"
> aspect, which is fine, of course.

Ah, that explains the confusion. Yeah saying "diverged" for the "at"
case wouldn't make any sense, there's been no divergence.

>>> Note that the difference between from and at is also: are there commits 
>>> that we could lose when we switch away, that is: that git checkout would 
>>> warn us about?
>>
>> Right, but I don't see how that's in any way conflicting or mutually
>> exclusive with saying before hand that we've diverged from the branch.
>>
>>> Maybe improve the doc instead?
>>
>> Aside from whether my patch makes any sense, the solution to a UX
>> issue really can't be "oh this just needs to be documented". For every
>> user who's confused by some interface we provide a *tiny* minority of
>> them go and exhaustively read the docs for an explanation, will just
>> remain confused.
>>
>> I think saying from v.s. at is way too subtle, I for one have been
>> missing it for years until this thread, that's bad UX, git's also used
>> by a lot of non-native English speakers who may not at all get the
>> subtle difference between at and from in this context, or if they do
>> think the UI is using that subtlety to tell them something.
>
> Well, we have to find the right balance between clarity and brevity - an
> interface that is too chatty is a nightmare. That's why I suggested both
> doc changes and additional information.
>
> What do you think about the ahead/behind info as suggested? That should
> be more informative both qualitatively (something diverged) and
> quantitatively (by how much).

I think it's better UX, but worry about placing something like
revision walking into "git status", that can take a long time. E.g. on
linux.git doing:

git checkout origin/master
git reset --hard v2.6.13

Will yield:

$ time echo "[ahead $(git rev-list origin/master..HEAD|wc -l),
behind $(git rev-list HEAD..origin/master|wc -l)]"
[ahead 1, behind 657045]
real0m7.859s

That's admittedly a very pathological case, and could be solved in
practice by changing the copy so that after e.g. 100 commits we start
saying "over 100 commits ahead..." or something like that.
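The capping idea above can be sketched with plain plumbing. This is only an illustration, not code from the thread: the helper name and the 100-commit cap are assumptions, while `git rev-list --count --max-count` is standard git.

```shell
# Sketch only: cap the revision walk so status-style output stays cheap
# even when HEAD and the upstream have diverged by many commits.
# behind_capped and the 100-commit threshold are illustrative assumptions.
behind_capped() {
	upstream=$1
	# --max-count=101 makes rev-list stop after 101 commits, so a
	# huge divergence costs no more than a 101-commit one
	n=$(git rev-list --count --max-count=101 "HEAD..$upstream")
	if [ "$n" -gt 100 ]
	then
		echo "behind over 100 commits"
	else
		echo "behind $n"
	fi
}
```

In the pathological linux.git case above, this would stop the walk after 101 commits instead of counting all 657045.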


Re: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.

2017-04-12 Thread Duy Nguyen
On Wed, Apr 12, 2017 at 5:30 AM, Kevin Willford  wrote:
> The loss of the skip-worktree bits is part of the problem if you are talking
> about modified files.  The other issue that I was having is when running a
> reset and there were files added in the commit that is being reset, there
> will not be an entry in the index and not a file on disk so the data for
> that file is completely lost at that point.  "status" also doesn't include
> anything about this loss of data.  On modified files status will at least
> have the file as deleted since there is still an index entry but again the
> previous version of the file and it's data is lost.

Well, we could have "deleted" index entries, those marked with
CE_REMOVE. They are never written down to file though, so 'status'
won't benefit from that. Hopefully we can restore the file before the
index file is written down and we really lose skip-worktree bits?

> To me this is totally unexpected behavior, for example if I am on a commit
> where there are only files that where added and run a reset HEAD~1 and
> then a status, it will show a clean working directory. Regardless of
> skip-worktree bits the user needs to have the data in the working directory
> after the reset or data is lost which is always bad.

I agree we no longer have a place to save the skip-worktree bit, we
have to restore the state back as if skip-worktree bit does not exist.
It would be good if we could keep the logic in unpack_trees() though.
For example, if the file is present on disk even if skip-worktree bit
is on, unpack_trees() would abort instead of silently overwriting it.
This is a difference between skip-worktree and assume-unchanged bits.
If you do explicit checkout_entry() you might have to add more checks
to keep behavior consistent.
-- 
Duy


Re: `git status` output is very misleading after a merge on a "detached HEAD"

2017-04-12 Thread Michael J Gruber
Ævar Arnfjörð Bjarmason venit, vidit, dixit 12.04.2017 14:18:
> On Wed, Apr 12, 2017 at 7:43 AM, Michael J Gruber  wrote:
>> Am 11. April 2017 22:40:14 MESZ schrieb "Ævar Arnfjörð Bjarmason" 
>> :
>>> On Tue, Apr 11, 2017 at 5:13 PM, Enis Bayramoğlu
 HEAD detached from origin/master 1 commit ago,
>>>
>>> In lieu of that, which would need some of the rev-list machinery to be
>>> invoked on every git-status, I wonder if just saying "HEAD detached &
>>> diverged from origin/master" wouldn't be clearer:
>>>
>>> diff --git a/wt-status.c b/wt-status.c
>>> index 308cf3779e..79c8cfd1cf 100644
>>> --- a/wt-status.c
>>> +++ b/wt-status.c
>>> @@ -1542,7 +1542,7 @@ static void wt_longstatus_print(struct wt_status *s)
>>>if (state.detached_at)
>>>  on_what = _("HEAD detached at ");
>>>else
>>> -   on_what = _("HEAD detached from ");
>>> +   on_what = _("HEAD detached & diverged from ");
>>>} else {
>>>branch_name = "";
>>>   on_what = _("Not currently on any branch.");
>>>
>>>
>>>
>>
>> No way. That would reduce the information that we give.
> 
> How does it reduce the information we give? Maybe I've missed
> something about what "from" means here, as opposed to "at", but it
> seems to me to mean the same thing as "diverged" by definition, i.e.
> we detached from the branch but we diverged from it.

No, "at" means we're still at that commit - detached but not diverged.
"from" means we only started from that commit, but are not at it any more.

> Saying "diverged"
> just makes it clearer, how does it reduce the information we give?

I misread your patch on my mobile phone, sorry. I thought you meant to
replace both "at" and "from" by "diverged from" because you considered
them synonymous.

But your patch touches just the" from" case and emphasizes the "diverge"
aspect, which is fine, of course.

>> Note that the difference between from and at is also: are there commits that 
>> we could lose when we switch away, that is: that git checkout would warn us 
>> about?
> 
> Right, but I don't see how that's in any way conflicting or mutually
> exclusive with saying before hand that we've diverged from the branch.
> 
>> Maybe improve the doc instead?
> 
> Aside from whether my patch makes any sense, the solution to a UX
> issue really can't be "oh this just needs to be documented". For every
> user who's confused by some interface we provide a *tiny* minority of
> them go and exhaustively read the docs for an explanation, will just
> remain confused.
> 
> I think saying from v.s. at is way too subtle, I for one have been
> missing it for years until this thread, that's bad UX, git's also used
> by a lot of non-native English speakers who may not at all get the
> subtle difference between at and from in this context, or if they do
> think the UI is using that subtlety to tell them something.

Well, we have to find the right balance between clarity and brevity - an
interface that is too chatty is a nightmare. That's why I suggested both
doc changes and additional information.

What do you think about the ahead/behind info as suggested? That should
be more informative both qualitatively (something diverged) and
quantitatively (by how much).

Michael


Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-12 Thread Duy Nguyen
On Wed, Apr 12, 2017 at 8:01 PM, Jeff King  wrote:
> I dunno. Maybe I am missing some subtle case, but it's not clear to me
> what the user would be trying to do by having git stay in the original
> directory.

Not sure if it really happens. But if the worktree is inside the
original cwd and we have to jump to the worktree, we set an empty
prefix, not a "../../../" one. So we can't refer back to files
relative to the original cwd with the prefix. I was kinda hoping
"super prefix" would solve it, but that one seems designed
specifically for submodules.
-- 
Duy


Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-12 Thread Jeff King
On Wed, Apr 12, 2017 at 06:13:49PM +0700, Duy Nguyen wrote:

> > Can't we model this after how setup_git_directory_gently() gives the
> > subcommands a choice?  While the majority of subcommands do not call
> > the _gently() variant and die when we are not in a repository, the
> > rest do use it and continue after learning that they are outside a
> > repository.
> 
> It may work, but we need to be careful because paths as command line
> arguments will not be relative to cwd anymore. If some code assumes
> cwd unchanged, they're in trouble. I guess they're in trouble either
> way because of the "sometimes chdir, sometimes not" current behavior.

Exactly. The code itself must respect "prefix", and if it doesn't, it's
broken. So I don't think a code switch makes any sense here. It's not
the code which needs to care, it's the user who might be relying on the
externally visible behavior.

I can't think of a case where the existing behavior actually makes sense
for the user, though. They've provided a $GIT_WORK_TREE, but somehow
want their out-of-worktree relative paths to still work? What are they
using the paths for? Surely:

  cd /out-of-tree
  echo content >foo
  GIT_DIR=/repo/.git GIT_WORK_TREE=/repo git add foo

shouldn't work?

Likewise reading .mailmap and .gitattributes from /out-of-tree is simply
a bug.

I dunno. Maybe I am missing some subtle case, but it's not clear to me
what the user would be trying to do by having git stay in the original
directory.

-Peff


Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-12 Thread Jeff King
On Wed, Apr 12, 2017 at 01:30:31PM +0700, Duy Nguyen wrote:

> On Tue, Apr 11, 2017 at 12:13 AM, Jeff King  wrote:
> > On Mon, Apr 10, 2017 at 07:01:00PM +0700, Duy Nguyen wrote:
> >> An alternative is, when you have found out you need to read .mailmap,
> >> you call setup_work_tree() then, which prepares the worktree for you
> >> (including moving back to cwd) or dies if worktree does not exist, or
> >> no-op if worktree has already been asked by somebody. Many commands do
> >> lazy worktree initialization this way.
> >
> > I think this is much more than just .mailmap, though. For instance, I
> > have noticed a similar problem with .gitattributes:
> 
> Urgh. assuming that we should not read .gitattributes if there's no
> worktree to read from (similar to the "defaults to .git" situation),
> how about
> 
>  - if mailmap stuff is requested, setup worktree, or die trying
>  - if worktree is detected, but setup code does not jump to it, do it
>  - if no worktree is detected, tell git-log to stop reading .gitattributes
> 
> We probably want some "if no worktree then die()" in .gitattributes
> and .gitignore code, just in case it's incorrectly and accidentally
> executed in an exotic setup.

I didn't check what we do with attributes when there is no worktree. The
behavior you describe above sounds right. But note that in my test we
_do_ have a worktree, but just look in the wrong location. So doing a
chdir to $GIT_WORK_TREE would just work.

-Peff


Re: [PATCH v6] http.postbuffer: allow full range of ssize_t values

2017-04-12 Thread Jeff King
On Tue, Apr 11, 2017 at 07:39:27PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >> The only unresolved issue was whether we can count on curl being new
> >> enough for CURLOPT_POSTFIELDSIZE_LARGE to be present.  I say
> >> "unresolved" but it is resolved in my mind since git doesn't build and
> >> pass tests with such old versions of curl --- what's unresolved is
> >> formalizing what the oldest curl version is that we want to support.
> >> And that doesn't need to hold this patch hostage.
> >
> > It could build on older curl with a minor fix; the regression is in
> > v2.12. So if we did want to continue to support the same versions of
> > curl we did in v2.11, we could apply that fix and then we _would_ care
> > about #ifdef-ing this.
> 
> What would the fix be?  Have a code that notices that the value set
> to http.postbuffer is too large and ignore the request on the other
> side of #ifdef, i.e. when Git is built with older curl that lack
> CURLOPT_POSTFIELDSIZE_LARGE?

The fix I meant there is for a different spot. During the course of the
discussion, somebody noticed that v2.12 does not compile using older
curls:

  http://public-inbox.org/git/20170404133241.ga15...@gevaerts.be/

So _if_ we care about those older curls, then we should consider that a
regression and fix it on the v2.12-maint track.

And likewise, we should not accept this patch into master without a
similar fix. Which is probably, yes, an ifdef for older curl that uses
the non-LARGE version of POSTFIELDSIZE and defines xcurl_off_t() to
check against "long" and complain when it overflows.

-Peff


Re: `git status` output is very misleading after a merge on a "detached HEAD"

2017-04-12 Thread Ævar Arnfjörð Bjarmason
On Wed, Apr 12, 2017 at 7:43 AM, Michael J Gruber  wrote:
> Am 11. April 2017 22:40:14 MESZ schrieb "Ævar Arnfjörð Bjarmason" 
> :
>>On Tue, Apr 11, 2017 at 5:13 PM, Enis Bayramoğlu
>>> HEAD detached from origin/master 1 commit ago,
>>
>>In lieu of that, which would need some of the rev-list machinery to be
>>invoked on every git-status, I wonder if just saying "HEAD detached &
>>diverged from origin/master" wouldn't be clearer:
>>
>>diff --git a/wt-status.c b/wt-status.c
>>index 308cf3779e..79c8cfd1cf 100644
>>--- a/wt-status.c
>>+++ b/wt-status.c
>>@@ -1542,7 +1542,7 @@ static void wt_longstatus_print(struct wt_status *s)
>>if (state.detached_at)
>>  on_what = _("HEAD detached at ");
>>else
>>-   on_what = _("HEAD detached from ");
>>+   on_what = _("HEAD detached & diverged from ");
>>} else {
>>branch_name = "";
>>   on_what = _("Not currently on any branch.");
>>
>>
>>
>
> No way. That would reduce the information that we give.

How does it reduce the information we give? Maybe I've missed
something about what "from" means here, as opposed to "at", but it
seems to me to mean the same thing as "diverged" by definition, i.e.
we detached from the branch but we diverged from it. Saying "diverged"
just makes it clearer, how does it reduce the information we give?

> Note that the difference between from and at is also: are there commits that 
> we could lose when we switch away, that is: that git checkout would warn us 
> about?

Right, but I don't see how that's in any way conflicting or mutually
exclusive with saying before hand that we've diverged from the branch.

> Maybe improve the doc instead?

Aside from whether my patch makes any sense, the solution to a UX
issue really can't be "oh this just needs to be documented". For every
user who's confused by some interface we provide a *tiny* minority of
them go and exhaustively read the docs for an explanation, will just
remain confused.

I think saying from v.s. at is way too subtle, I for one have been
missing it for years until this thread, that's bad UX, git's also used
by a lot of non-native English speakers who may not at all get the
subtle difference between at and from in this context, or if they do
think the UI is using that subtlety to tell them something.


[PATCH v2 1/1] Document how to normalize the line endings

2017-04-12 Thread tboegi
From: Torsten Bögershausen 

The instructions how to normalize the line endings should have been updated
as part of commit 6523728499e 'convert: unify the "auto" handling of CRLF',
(but that part never made it into the commit).

Update the documentation in Documentation/gitattributes.txt
and add a test case in t0025.

Reported by Kristian Adrup
https://github.com/git-for-windows/git/issues/954

Signed-off-by: Torsten Bögershausen 
---
 Documentation/gitattributes.txt |  6 ++
 t/t0025-crlf-auto.sh| 26 ++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 976243a..3b76687 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -227,11 +227,9 @@ From a clean working directory:
 
 -
 $ echo "* text=auto" >.gitattributes
-$ rm .git/index # Remove the index to force Git to
-$ git reset # re-scan the working directory
+$ rm .git/index # Remove the index to re-scan the working directory
+$ git add .
 $ git status# Show files that will be normalized
-$ git add -u
-$ git add .gitattributes
 $ git commit -m "Introduce end-of-line normalization"
 -
 
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index d0bee08..89826c5 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -152,4 +152,30 @@ test_expect_success 'eol=crlf _does_ normalize binary files' '
test -z "$LFwithNULdiff"
 '
 
+test_expect_success 'prepare unnormalized' '
+   > .gitattributes &&
+   git config core.autocrlf false &&
+   printf "LINEONE\nLINETWO\r\n" >mixed &&
+   git add mixed .gitattributes &&
+   git commit -m "Add mixed" &&
+   git ls-files --eol | egrep "i/crlf" &&
+   git ls-files --eol | egrep "i/mixed"
+'
+
+test_expect_success 'normalize unnormalized' '
+   echo "* text=auto" >.gitattributes &&
+   rm .git/index &&
+   git add . &&
+   git commit -m "Introduce end-of-line normalization" &&
+   git ls-files --eol | tr "\\t" " " | sort >act &&
+cat >exp <
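The normalization sequence from the updated documentation can be exercised end-to-end in a throwaway repository. This is a sketch: the temp directory, identity, file name, and commit messages are illustrative; the git commands are the ones the patch documents.

```shell
# Sketch of the documented flow: commit a CRLF file unnormalized,
# then introduce "* text=auto" and re-scan the working directory.
d=$(mktemp -d) && cd "$d"
git init -q
git config core.autocrlf false
printf "LINEONE\r\nLINETWO\r\n" >mixed
git add mixed
git -c user.email=a@b -c user.name=A commit -q -m "Add mixed"

echo "* text=auto" >.gitattributes
rm .git/index   # Remove the index to re-scan the working directory
git add .
git -c user.email=a@b -c user.name=A commit -q -m "Introduce end-of-line normalization"
git ls-files --eol mixed   # the index copy should now show i/lf
```

Before the second commit, `git status` shows "mixed" as modified, which is the "Show files that will be normalized" step in the documentation.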

Re: `git status` output is very misleading after a merge on a "detached HEAD"

2017-04-12 Thread Michael J Gruber
Enis Bayramoğlu venit, vidit, dixit 12.04.2017 08:15:
> On Wed, Apr 12, 2017 at 8:43 AM, Michael J Gruber  wrote:
>> Am 11. April 2017 22:40:14 MESZ schrieb "Ævar Arnfjörð Bjarmason" 
>> :
>>> On Tue, Apr 11, 2017 at 5:13 PM, Enis Bayramoğlu
>>>  wrote:
> Well, what do you suggest as an alternative?
>
> Git tells you that you are in detached state and where you came from
> (detached from).

 I think it'd be best if git status somehow indicated that you're no
 longer at the same commit. Maybe something like:

 $ git status
 HEAD detached from origin/master, no longer at the same commit
 nothing to commit, working directory clean
>>>
>>> I'm not saying this is clear, I didn't know this until I read the code
>>> just now, but for what it's worth it says "detached at" if you're
>>> detached from BRANCH but at the same commit, and "detached from" if
>>> you're now on a different commit.
>>>
>>
>> That's what I explained in my first reply which the OP quoted in a chopped 
>> way.  I think he even misquoted the git output he got.
>>
>> It's the difference between from and at.
> 
> You're right, I hadn't noticed the difference, I really thought I
> copied the git output verbatim from the console, but I must've been
> confused while switching windows. Sorry about that.
> 
> I consider myself fairly fluent in English, but it's not my native
> language. If you think head detached "from" vs. "at" is immediately
> and unambiguously clear about what's going on, then I guess it's not
> worth changing the behavior.

I don't mind trying to make this clearer (which is why I asked for
alternatives), I just don't want to reduce the amount of information.
The man page mentions neither "from" nor "at", so that would be a
starting point.

In fact, I think that "git status [--short]" lacks some of the
information that you only get from, e.g., the git prompt or from "git
checkout" after you switch away. I've set out to amend "git status" with
"in-progress information" (bisecting etc.) in a different thread.

Now for the detached HEAD, I see two variants:

"git branch --list -vv" displays information such as "[origin/next:
ahead 2, behind 3]" for a branch that has diverged from its upstream
"origin/next", and whose upstream has new commits. Mimicking that, git
status could display

HEAD detached from origin/next [ahead 2, behind 3]

either by default or with an extra flag. This requires a "git rev-list
--count --left-right origin/next...HEAD", which is not too bad. The
behind info might be superfluous, but maybe keeping it similar to "git
branch" is beneficial, maybe even more similar:

HEAD detached [from origin/next: ahead 2, behind 3]
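The decoration above can be sketched around the `git rev-list --count --left-right` call just mentioned. The helper name is an assumption for illustration; the plumbing command and its output format (two tab-separated counts) are standard git.

```shell
# Sketch: build the proposed status line for a detached HEAD.
# $1 is the ref HEAD detached from (e.g. origin/next).
# --left-right --count prints two numbers: commits only in $1
# (behind) and commits only in HEAD (ahead), separated by a tab.
detached_status() {
	up=$1
	set -- $(git rev-list --count --left-right "$up...HEAD")
	echo "HEAD detached from $up [ahead $2, behind $1]"
}
```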

"git checkout somewhereElse" displays information such as "Warning: you
are leaving 2 commits behind, not connected to any of your branches" and
lists them (the first up to 4). Mimicking that, git status could display

HEAD detached from origin/next [2 orphans]

but this takes more time to compute, especially when you have a lot of
branches. It's the more relevant information in the sense that it counts
those commits that you would lose (once pruned from the reflog) unless
you bind a ref to them or a child commit.
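The orphan count can likewise be sketched with plumbing. The helper name is an assumption; the `--not --branches` exclusion, which negates every ref under refs/heads/, is exactly the part that gets slower as the number of branches grows, as noted above.

```shell
# Sketch: count commits reachable from the detached HEAD but from no
# branch, i.e. the ones "git checkout" warns about when switching away.
orphan_count() {
	git rev-list --count HEAD --not --branches
}
```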

>>
>>
 or, to be more informative

 HEAD detached from origin/master 1 commit ago,
>>>
>>> In lieu of that, which would need some of the rev-list machinery to be
>>> invoked on every git-status, I wonder if just saying "HEAD detached &
>>> diverged from origin/master" wouldn't be clearer:
>>>
>>> diff --git a/wt-status.c b/wt-status.c
>>> index 308cf3779e..79c8cfd1cf 100644
>>> --- a/wt-status.c
>>> +++ b/wt-status.c
>>> @@ -1542,7 +1542,7 @@ static void wt_longstatus_print(struct wt_status *s)
>>>if (state.detached_at)
>>>  on_what = _("HEAD detached at ");
>>>else
>>> -   on_what = _("HEAD detached from ");
>>> +   on_what = _("HEAD detached & diverged from ");
>>>} else {
>>>branch_name = "";
>>>   on_what = _("Not currently on any branch.");
>>>
>>>
>>>
>>
>> No way. That would reduce the information that we give.
>>
>> Note that the difference between from and at is also: are there commits that 
>> we could lose when we switch away, that is: that git checkout would warn us 
>> about?
>>
>> Maybe improve the doc instead?
>>
>>>
 On Tue, Apr 11, 2017 at 5:55 PM, Michael J Gruber 
>>> wrote:
> Enis Bayramoğlu venit, vidit, dixit 11.04.2017 10:57:
>> I've encountered a very misleading output from `git status`. Here's
>>> a
>> sequence of events that demonstrates the issue:
>>
>> $ git --version
>> git version 2.12.0
>>
>> $ git checkout origin/master
>>

Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-12 Thread Duy Nguyen
On Wed, Apr 12, 2017 at 3:41 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>>> I think this is much more than just .mailmap, though. For instance, I
>>> have noticed a similar problem with .gitattributes:
>>
>> Urgh. assuming that we should not read .gitattributes if there's no
>> worktree to read from (similar to the "defaults to .git" situation),
>> how about
>>
>>  - if mailmap stuff is requested, setup worktree, or die trying
>>  - if worktree is detected, but setup code does not jump to it, do it
>>  - if no worktree is detected, tell git-log to stop reading .gitattributes
>
> My gut reaction is that we are doing something wrong once we start
> saying "if mailmap stuff is requested".  This is not about .mailmap
> but is about how sanely the paths relative to the root of the
> working tree (which includes a path in the index, or comparing
> $commit:$path with $path in the working tree) can be named by any
> subcommand of Git.

I probably should have phrased that as "if any worktree stuff is
requested". I was under the impression that you need to pass a command
line option to use mailmap, but I didn't check the code.

> Can't we model this after how setup_git_directory_gently() gives the
> subcommands a choice?  While the majority of subcommands do not call
> the _gently() variant and die when we are not in a repository, the
> rest do use it and continue after learning that they are outside a
> repository.

It may work, but we need to be careful because paths as command line
arguments will not be relative to cwd anymore. If some code assumes
cwd unchanged, they're in trouble. I guess they're in trouble either
way because of the "sometimes chdir, sometimes not" current behavior.

> Perhaps we want a new bit GOTO_WORK_TREE_ROOT that is orthogonal to
> NEED_WORK_TREE to tell the codepath that calls cmd_foo() to always
> move to the root of the working tree (if there is one), regaredless
> of the behaviour f3bb8b4b84 documents.  I have a strong suspicion
> that we didn't _care_ about a silly use case where GIT_WORK_TREE is
> specified and the command is started from somewhere completely
> unrelated to that location, and the users with such a silly use case
> didn't care either what Git does to the files in the working tree,
> i.e. what you quoted in your previous message
>
> "11. When user's cwd is outside worktree, cwd remains unchanged,
> prefix is NULL."
>
> was not describing the design, but just describing whatever random
> thing the code happened to be doing.

This behavior probably started long before my topic though, mine was
more of documentation, making worktree detection more consistent.

I never said otherwise :) The only thing I was worried was how long it
had behaved that way, long enough that scripts started to depend on
it? If we can get rid of special cases in setup code, I will be the
first one to be happier.
-- 
Duy


Re: Tools that do an automatic fetch defeat "git push --force-with-lease"

2017-04-12 Thread Stefan Haller
Junio C Hamano  wrote:

> On Tue, Apr 11, 2017 at 8:33 AM, Jacob Keller  wrote:
> >
> > If you're already copying sha1s around you could use those as the
> > --force-with-lease=branch:, no?
> >
> > That's better guarantee than just using --force-with-lease alone.
> 
> Absolutely. That was the _only_ way the feature was originally designed
> to be used sensibly. We really shouldn't have added that "lazy" option [...]
> 
> Perhaps we should deprecate that "lazy" feature and remove it over
> time, making sure that everybody feeds the explicit commit object name
> that was recorded by the user somewhere (e.g. the approach to tag the
> commit to record the expected remote tip, which Peff illustrated).

I agree that this is better than giving the user a false sense of
security.

The problem is that manually recording the lease is painful. Like I
illustrated, the assumption that you can "simply" do this:

  (... my working copy is in some random state)

  (... now I decide I want to rewrite history)

  $ git tag lease origin/master
  $ git rebase -i
  $ git push --force-with-lease=master:lease

doesn't hold, because by the time you decide to rewrite the history it's
already too late.

To solve this, I'd have to record the lease every time I pull or push;
this is a lot of work, and easy to forget and error-prone. Hence my
suggestion to have git do this automatically. I'd be interested in your
thoughts about that proposal, Junio; you didn't say anything about that
at all yet.


-- 
Stefan Haller
Berlin, Germany
http://www.haller-berlin.de/


Re: Tools that do an automatic fetch defeat "git push --force-with-lease"

2017-04-12 Thread Stefan Haller
Jeff King  wrote:

> On Tue, Apr 11, 2017 at 02:37:27PM +0200, Stefan Haller wrote:
> 
> > Are you talking about the case where the user doesn't say git pull, but
> > instead says "git fetch && git merge --ff @{u}"? Just so that I
> > understand the concern.
> 
> Yes, that (which is the main way that I merge changes).

OK; in my proposal I already mentioned that a few other commands besides
push and pull may have to update the lease; examples I mentioned were 

   git rebase @{u}
   git reset @{u}

and you add "git merge --ff @{u}" to that list now. There might be
others that we can add to make the feature work better. (But this could
happen incrementally later, as we learn about more use cases.)

> But also what happens with:
> 
>   git merge origin/other-branch
>   git rebase origin/master
> 
> I think we only care when origin/master has independently merged
> other-branch, too. And even though we have taken its commits into
> account, we would fail (because "both sides did the same thing" is
> really out of scope for the concept of a lease). So that's OK.

I think there's nothing special to consider here; "git rebase
origin/master" updates the lease (to origin/master), period. It doesn't
matter whether origin/other-branch was merged before, or whether or not
it was independently merged to origin/master too, or whether our local
branch has cherry-picked all the commits of other-branch instead of
merging it, or whatever. In all these cases, the local branch is "up to
date" with origin/master after the rebase, so it's ok to update the
lease at that point.
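As an illustration, the "up to date with origin/master" condition described above is exactly what `git merge-base --is-ancestor` tests; a wrapper outside of git could use it to decide whether moving the lease is safe. This is a sketch under that assumption, not existing git behavior; the `lease` tag name follows the earlier example in the thread.

```shell
#!/bin/sh
# Sketch only: "the local branch is up to date with upstream" means the
# upstream tip is contained in HEAD, which is what --is-ancestor tests.
# Only then is it safe to move the lease; git itself maintains no such tag.
maybe_update_lease () {
	upstream=$1
	if git merge-base --is-ancestor "$upstream" HEAD
	then
		git tag -f lease "$upstream" >/dev/null
	else
		echo >&2 "$upstream not integrated; lease left alone"
		return 1
	fi
}
```

A rebase/merge/pull wrapper would call `maybe_update_lease origin/master` after the integrating command exits successfully, leaving the lease untouched on failure.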


-- 
Stefan Haller
Berlin, Germany
http://www.haller-berlin.de/


Re: Tools that do an automatic fetch defeat "git push --force-with-lease"

2017-04-12 Thread Stefan Haller
Stefan Haller  wrote:

> Then, every command that either integrates the remote tracking branch
> into the local branch, or updates the remote tracking branch to the
> local branch, will update the value of the "lease" entry. The most
obvious ones are "git pull" and "git push", of course;

I thought a bit more about what this means, concretely. The problem is
that only a *successful* pull must update the lease; an unsuccessful
pull must leave it alone. Pull is either fetch+merge or fetch+rebase,
and if the merge or rebase fails with conflicts, then we can only tell
much later whether the pull was successful; in the case of merge only
after the next commit (success) or after git merge --abort (failure),
and in the case of rebase after the next rebase --continue (success), or
rebase --abort (failure).

To implement this, git pull could set the value "branch.*.pending-lease"
after fetch was successful (but leave "branch.*.lease" alone); then git
merge and git rebase could look for that value, and if successful, set
branch.*.lease to the value of branch.*.pending-lease, and delete
pending-lease. If unsuccessful, they'd just delete the pending-lease
entry. Other commands may also have to delete the pending-lease entry,
e.g. git reset.

I do realize that this is a lot of complexity, and it has the potential
of missing some cases. However, this complexity is also the reason why I
can't build my own wrappers around pull/push to implement the feature
outside of git; alias mypull='git pull && git tag -f lease @{u}' just
doesn't cut it.
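For illustration, the successful-pull half of the proposal can be approximated outside of git. This is a sketch only: the `branch.<name>.lease` config key is part of the proposal above, git itself attaches no meaning to it, and the conflicted merge/rebase cases described earlier are exactly what such a wrapper cannot catch (here sidestepped with --ff-only).

```shell
#!/bin/sh
# Sketch of the proposed bookkeeping: a successful pull moves the lease,
# a failed one leaves it alone.  "branch.<name>.lease" is a hypothetical
# config key from the proposal; git itself does not read it.
mypull () {
	branch=$(git symbolic-ref --short HEAD) || return
	if git pull --ff-only
	then
		git config "branch.$branch.lease" \
			"$(git rev-parse "$branch@{upstream}")"
	else
		echo >&2 "pull did not complete; lease for $branch unchanged"
		return 1
	fi
}

mypush () {
	branch=$(git symbolic-ref --short HEAD) || return
	lease=$(git config "branch.$branch.lease") || {
		echo >&2 "no lease recorded for $branch"
		return 1
	}
	git push --force-with-lease="$branch:$lease" origin "$branch" &&
	git config "branch.$branch.lease" "$(git rev-parse "$branch@{upstream}")"
}
```

Even this much only covers fast-forward pulls; the pending-lease dance for conflicted merges and rebases has no hook points outside of git, which is the point being made above.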


-- 
Stefan Haller
Berlin, Germany
http://www.haller-berlin.de/


Re: [PATCH v2] pathspec: honor `PATHSPEC_PREFIX_ORIGIN` with empty prefix

2017-04-12 Thread Junio C Hamano
Patrick Steinhardt  writes:

> Just a short reminder on this patch, as I haven't seen it or
> v1 being picked up by the What's Cooking reports. Am I simply
> being too eager or was this an oversight?

I have been offline and will be for a few more days; I may
occasionally pop in to give design level guidance on topics, but
won't be reading patches carefully enough to queue a new topic to my
tree.  New topics won't be added to my tree until next week the
earliest.


Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-12 Thread Junio C Hamano
Duy Nguyen  writes:

>> I think this is much more than just .mailmap, though. For instance, I
>> have noticed a similar problem with .gitattributes:
>
> Urgh. Assuming that we should not read .gitattributes if there's no
> worktree to read from (similar to the "defaults to .git" situation),
> how about
>
>  - if mailmap stuff is requested, setup worktree, or die trying
>  - if worktree is detected, but setup code does not jump to it, do it
>  - if no worktree is detected, tell git-log to stop reading .gitattributes

My gut reaction is that we are doing something wrong once we start
saying "if mailmap stuff is requested".  This is not about .mailmap
but is about how sanely the paths relative to the root of the
working tree (which includes a path in the index, or comparing
$commit:$path with $path in the working tree) can be named by any
subcommand of Git.

Can't we model this after how setup_git_directory_gently() gives the
subcommands a choice?  While the majority of subcommands do not call
the _gently() variant and die when we are not in a repository, the
rest do use it and continue after learning that they are outside a
repository.  

Perhaps we want a new bit GOTO_WORK_TREE_ROOT that is orthogonal to
NEED_WORK_TREE to tell the codepath that calls cmd_foo() to always
move to the root of the working tree (if there is one), regardless
of the behaviour f3bb8b4b84 documents.  I have a strong suspicion
that we didn't _care_ about a silly use case where GIT_WORK_TREE is
specified and the command is started from somewhere completely
unrelated to that location, and the users with such a silly use case
didn't care either what Git does to the files in the working tree,
i.e. what you quoted in your previous message

"11. When user's cwd is outside worktree, cwd remains unchanged,
prefix is NULL."

was not describing the design, but just describing whatever random
thing the code happened to be doing.
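For concreteness, the quoted case 11 can be reproduced with a small script; the scratch paths below are illustrative, not from the thread.

```shell
#!/bin/sh
# Reproduce case "11": the repository is found via the environment,
# but the current directory is somewhere entirely unrelated to it.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
mkdir "$tmp/elsewhere"
cd "$tmp/elsewhere"
# cwd remains unchanged and the computed prefix is empty:
GIT_DIR=$tmp/repo/.git GIT_WORK_TREE=$tmp/repo git rev-parse --show-prefix
```

With the work tree discovered only through the environment, `--show-prefix` comes back empty, which is the NULL prefix the thread is discussing.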


Re: [PATCH v2] pathspec: honor `PATHSPEC_PREFIX_ORIGIN` with empty prefix

2017-04-12 Thread Patrick Steinhardt
On Tue, Apr 04, 2017 at 11:16:56AM +0200, Patrick Steinhardt wrote:
> Previous to commit 5d8f084a5 (pathspec: simpler logic to prefix original
> pathspec elements, 2017-01-04), we were always using the computed
> `match` variable to perform pathspec matching whenever
> `PATHSPEC_PREFIX_ORIGIN` is set. This is for example useful when passing
> the parsed pathspecs to other commands, as the computed `match` may
> contain a pathspec relative to the repository root. The commit changed
> this logic to only do so when we do have an actual prefix and when
> literal pathspecs are deactivated.
> 
> But this change may actually break some commands which expect passed
> pathspecs to be relative to the repository root. One such case is `git
> add --patch`, which now fails when using relative paths from a
> subdirectory. For example if executing "git add -p ../foo.c" in a
> subdirectory, the `git-add--interactive` command will directly pass
> "../foo.c" to `git-ls-files`. As ls-files is executed at the
> repository's root, the command will notice that "../foo.c" is outside
> the repository and fail.
> 
> Fix the issue by again using the computed `match` variable when
> `PATHSPEC_PREFIX_ORIGIN` is set and global literal pathspecs are
> deactivated. Note that in contrast to previous behavior, we will now
> always call `prefix_magic` regardless of whether a prefix is actually
> set. But this is the right thing to do: when the `match` variable has
> been resolved to the repository's root, it will be set to an empty
> string. When passing the empty string directly to other commands, it
> will result in a warning regarding deprecated empty pathspecs. By always
> adding the prefix magic, we will end up with at least the string
> ":(prefix:0)" and thus avoid the warning.
> 
> Signed-off-by: Patrick Steinhardt 
> ---

Just a short reminder on this patch, as I haven't seen it or
v1 being picked up by the What's Cooking reports. Am I simply
being too eager or was this an oversight?

Thanks
Patrick

> 
> This is the second version of [1]. It fixes a bug caught by
> Brandon when the pathspec is resolved to the empty string and
> improves the test a bit to actually catch this issue.
> 
> [1]: 
> http://public-inbox.org/git/1556910880cfce391bdca2d8f0cbcb8c71371691.1491206540.git...@pks.im/T/#u
> 
> 
>  pathspec.c |  2 +-
>  t/t3701-add-interactive.sh | 22 ++
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/pathspec.c b/pathspec.c
> index 303efda83..3079a817a 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -505,7 +505,7 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
>* original. Useful for passing to another command.
>*/
>   if ((flags & PATHSPEC_PREFIX_ORIGIN) &&
> - prefixlen && !get_literal_global()) {
> + !get_literal_global()) {
>   struct strbuf sb = STRBUF_INIT;
>  
>   /* Preserve the actual prefix length of each pattern */
> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
> index f9528fa00..2ecb43a61 100755
> --- a/t/t3701-add-interactive.sh
> +++ b/t/t3701-add-interactive.sh
> @@ -436,6 +436,28 @@ test_expect_success 'add -p handles globs' '
>   test_cmp expect actual
>  '
>  
> +test_expect_success 'add -p handles relative paths' '
> + git reset --hard &&
> +
> + echo base >relpath.c &&
> + git add "*.c" &&
> + git commit -m relpath &&
> +
> + echo change >relpath.c &&
> + mkdir -p subdir &&
> + git -C subdir add -p .. 2>error <<-\EOF &&
> + y
> + EOF
> +
> + test_must_be_empty error &&
> +
> + cat >expect <<-\EOF &&
> + relpath.c
> + EOF
> + git diff --cached --name-only >actual &&
> + test_cmp expect actual
> +'
> +
>  test_expect_success 'add -p does not expand argument lists' '
>   git reset --hard &&
>  
> -- 
> 2.12.2


signature.asc
Description: PGP signature


Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-12 Thread Duy Nguyen
On Tue, Apr 11, 2017 at 12:13 AM, Jeff King  wrote:
> On Mon, Apr 10, 2017 at 07:01:00PM +0700, Duy Nguyen wrote:
>> An alternative is, when you have found out you need to read .mailmap,
>> you call setup_work_tree() then, which prepares the worktree for you
>> (including moving back to cwd) or dies if worktree does not exist, or
>> no-op if worktree has already been asked by somebody. Many commands do
>> lazy worktree initialization this way.
>
> I think this is much more than just .mailmap, though. For instance, I
> have noticed a similar problem with .gitattributes:

Urgh. Assuming that we should not read .gitattributes if there's no
worktree to read from (similar to the "defaults to .git" situation),
how about

 - if mailmap stuff is requested, setup worktree, or die trying
 - if worktree is detected, but setup code does not jump to it, do it
 - if no worktree is detected, tell git-log to stop reading .gitattributes

We probably want some "if no worktree then die()" in the .gitattributes
and .gitignore code, just in case it's incorrectly and accidentally
executed in an exotic setup.
-- 
Duy


Re: `git status` output is very misleading after a merge on a "detached HEAD"

2017-04-12 Thread Enis Bayramoğlu
On Wed, Apr 12, 2017 at 8:43 AM, Michael J Gruber  wrote:
> On 11 April 2017 at 22:40:14 CEST, "Ævar Arnfjörð Bjarmason"
> wrote:
>>On Tue, Apr 11, 2017 at 5:13 PM, Enis Bayramoğlu
>> wrote:
 Well, what do you suggest as an alternative?

 Git tells you that you are in detached state and where you came from
 (detached from).
>>>
>>> I think it'd be best if git status somehow indicated that you're no
>>> longer at the same commit. Maybe something like:
>>>
>>> $ git status
>>> HEAD detached from origin/master, no longer at the same commit
>>> nothing to commit, working directory clean
>>
>>I'm not saying this is clear, I didn't know this until I read the code
>>just now, but for what it's worth it says "detached at" if you're
>>detached from BRANCH but at the same commit, and "detached from" if
>>you're now on a different commit.
>>
>
> That's what I explained in my first reply which the OP quoted in a chopped 
> way.  I think he even misquoted the git output he got.
>
> It's the difference between from and at.

You're right, I hadn't noticed the difference, I really thought I
copied the git output verbatim from the console, but I must've been
confused while switching windows. Sorry about that.

I consider myself fairly fluent in English, but it's not my native
language. If you think head detached "from" vs. "at" is immediately
and unambiguously clear about what's going on, then I guess it's not
worth changing the behavior.
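For what it's worth, the at/from distinction is easy to reproduce in a scratch repository:

```shell
#!/bin/sh
# Scratch demo of the "at" vs "from" wording.  LC_ALL=C forces English
# messages so the output is comparable.
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=a@example.com -c user.name=a commit -q --allow-empty -m one
branch=$(git symbolic-ref --short HEAD)
git checkout -q --detach "$branch"
git status | head -n1    # still on the commit we detached at ("detached at")
git -c user.email=a@example.com -c user.name=a commit -q --allow-empty -m two
git status | head -n1    # commit made since detaching ("detached from")
```

As soon as HEAD moves away from the commit it was detached at, the wording switches from "at" to "from" — which is the distinction being discussed.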

>
>
>>> or, to be more informative
>>>
>>> HEAD detached from origin/master 1 commit ago,
>>
>>In lieu of that, which would need some of the rev-list machinery to be
>>invoked on every git-status, I wonder if just saying "HEAD detached &
>>diverged from origin/master" wouldn't be clearer:
>>
>>diff --git a/wt-status.c b/wt-status.c
>>index 308cf3779e..79c8cfd1cf 100644
>>--- a/wt-status.c
>>+++ b/wt-status.c
>>@@ -1542,7 +1542,7 @@ static void wt_longstatus_print(struct wt_status *s)
>>if (state.detached_at)
>>  on_what = _("HEAD detached at ");
>>else
>>-   on_what = _("HEAD detached from ");
>>+   on_what = _("HEAD detached & diverged from ");
>>} else {
>>branch_name = "";
>>   on_what = _("Not currently on any branch.");
>>
>>
>
> No way. That would reduce the information that we give.
>
> Note that the difference between from and at is also: are there commits that 
> we could lose when we switch away, that is: that git checkout would warn us 
> about?
>
> Maybe improve the doc instead?
>
>>
>>> On Tue, Apr 11, 2017 at 5:55 PM, Michael J Gruber 
>>wrote:
 Enis Bayramoğlu venit, vidit, dixit 11.04.2017 10:57:
> I've encountered a very misleading output from `git status`. Here's a
> sequence of events that demonstrates the issue:
>
> $ git --version
> git version 2.12.0
>
> $ git checkout origin/master
>
> $ git status
> HEAD detached from origin/master
> nothing to commit, working directory clean

 Hmm. My Git would display "detached at" here as long as you are on the
 commit that you detached from.

> $ git merge --ff f3515b749be861b57fc70c2341c1234eeb0d5b87
>
> $ git status
> HEAD detached from origin/master
> nothing to commit, working directory clean
>
> $ git rev-parse origin/master
> e1dc1baaadee0f1aef2d5c45d068306025d11f67
>
> $ git rev-parse HEAD
> 786cb6dd09897e0950a2bdc971f0665a059efd33
>
> I think it's extremely misleading that `git status` simply reports
> "HEAD detached from origin/master" while this simply happens to be a
> mildly relevant fact about some past state.
>
> Thanks and regards
>

 Well, what do you suggest as an alternative?

 Git tells you that you are in detached state and where you came from
 (detached from).

 Michael
>

Thanks,
Enis