date:20170123

Re: [PATCH 7/7] completion: recognize more long-options

2017-01-23 Thread Johannes Sixt

If at all possible, please use your real email address as the From 
address. It is pointless to hide behind a fake address because as Git 
contributor you will have to reveal your identity anyway.


Please study item (5) "Sign your work" in 
Documentation/SubmittingPatches and sign off your work.


I'm by no means a bash completion expert, but I have a comment on this 
patch.


Am 22.01.2017 um 23:57 schrieb bitte.keine.werbung.einwer...@googlemail.com:

Recognize several new long-options for bash completion in the following
commands:


AFAIR, it was a deliberate decision that potentially destructive command 
line options are not included in command completions. In the list given, 
I find these:



 - apply: --unsafe-paths
 - reset: --merge --mixed --hard --soft --patch --keep
 - rm: --force


Additionally, these options are only for internal purposes, but I may be 
wrong:



 - archive: --remote= --exec=


-- Hannes

Re: [PATCH] rebase: pass --signoff option to git am

2017-01-23 Thread Giuseppe Bilotta

On Tue, Jan 24, 2017 at 12:27 AM, Junio C Hamano  wrote:
> Giuseppe Bilotta  writes:
>>
>> I'm not sure I follow. If the user doesn't want to signoff during a
>> rebase, they can simply not pass --signoff. If they do, they can not
>> pass it. Am I missing something?
>
> alias.
>
> Which also means that there needs to be --no-signoff option that can
> be given to countermand an earlier --signoff, if a user did
>
> [alias] rb = rebase --signoff
>
> and wants to disable it one time only with
>
> $ git rb --no-signoff

Oh, right, good point. This should be easy, I'll give this a go.

>>> In any case, will queue as-is so that we won't lose the patch while
>>> waiting for people to raise their opinions.
>>
>> Thanks.
>
> Thanks.  The final version would also need tests, so it may be a
> good time to start thinking about what aspect of this feature wants
> to be protected against future breakages.

I have troubles thinking how it could go wrong.  The most obvious
thing I can think of is it could not be remembered after an
interruption+continue. I'll think about this some more.

-- 
Giuseppe "Oblomov" Bilotta

[PATCH] gitk: use right colour for remote refs in the "Tags and heads" dialog

2017-01-23 Thread Paul Wise

Makes it easier to see which refs are local and which refs are remote.
Adds consistency with the remote background colour in the graph display.

Signed-off-by: Paul Wise 
---
 gitk-git/gitk | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gitk-git/gitk b/gitk-git/gitk
index a14d7a16b..14aebc23e 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -3404,6 +3404,8 @@ set rectmask {
 }
 image create bitmap reficon-H -background black -foreground "#00ff00" \
 -data $rectdata -maskdata $rectmask
+image create bitmap reficon-R -background black -foreground "#ffddaa" \
+-data $rectdata -maskdata $rectmask
 image create bitmap reficon-o -background black -foreground "#ff" \
 -data $rectdata -maskdata $rectmask
 
@@ -10022,6 +10024,7 @@ proc sel_reflist {w x y} {
 set n [lindex $ref 0]
 switch -- [lindex $ref 1] {
"H" {selbyid $headids($n)}
+   "R" {selbyid $headids($n)}
"T" {selbyid $tagids($n)}
"o" {selbyid $otherrefids($n)}
 }
@@ -10051,7 +10054,11 @@ proc refill_reflist {} {
 foreach n [array names headids] {
if {[string match $reflistfilter $n]} {
if {[commitinview $headids($n) $curview]} {
-   lappend refs [list $n H]
+   if {[string match "remotes/*" $n]} {
+   lappend refs [list $n R]
+   } else {
+   lappend refs [list $n H]
+   }
} else {
interestedin $headids($n) {run refill_reflist}
}
-- 
2.11.0

Re: [PATCH v3 4/5] show-ref: Detect dangling refs under --verify as well

2017-01-23 Thread Junio C Hamano

Vladimir Panteleev  writes:

> Move detection of dangling refs into show_one, so that they are
> detected when --verify is present as well as when it is absent.
>
> Signed-off-by: Vladimir Panteleev 
> ---
>  builtin/show-ref.c  | 16 
>  t/t1403-show-ref.sh | 22 ++
>  2 files changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/show-ref.c b/builtin/show-ref.c
> index ab8e0dc41..107d05fe0 100644
> --- a/builtin/show-ref.c
> +++ b/builtin/show-ref.c
> @@ -22,6 +22,14 @@ static void show_one(const char *refname, const struct 
> object_id *oid)
>   const char *hex;
>   struct object_id peeled;
>  
> + /* This changes the semantics slightly that even under quiet we
> +  * detect and return error if the repository is corrupt and
> +  * ref points at a nonexistent object.
> +  */

This is my fault from more than 10 years ago, but I think the
comment shouldn't have been here (or at its original location).  It
talks about the behaviour change relative to the previous version
when the comment was added, i.e. cf0adba788 ("Store peeled refs in
packed-refs file.", 2006-11-19).

I'll remove it after the series settles.

Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure

2017-01-23 Thread Junio C Hamano

Brandon Williams  writes:

> ... It seems like breaking the question and answer up
> doesn't buy you much in terms of reducing allocation churn and instead
> complicates the API with needing to keep track of two structures instead
> of a one.

In my mind, the value of having a constant check_attr is primarily
that it gives us a stable pointer to serve as a hashmap key,
i.e. the identifier for each call site, in a later iteration.

Of course, in order to populate the "question" array, we'd need the
interning of attribute names to attr objects, which need to be
protected by mutex, and you would probably not want to do that every
time the control hits the codepath.

But all of the above comes from my intuition, and I'll very much
welcome to be proven wrong with an alternative design, or better
yet, a working code based on an alternative design ;-).

Re: [PATCH] [draft]blame: add --aggregate option

2017-01-23 Thread Edmundo Carmona Antoranz

Developers of the world, rejoice! :-)

Junio, Pranit (and whoever is paying attention to the conversation
that was being held about --tips), here's a draft of what I meant when
I was talking about the option of "aggregating" blame output. I'm not
considering _all_ cases yet, just would like for people to give it a
quick test and tell me if they think it's worth "polishing" it for
inclusion into mainline git.

The output would look like this:

$ ./git blame -L 1,19 -t --aggregate builtin/blame.c
Blaming lines:   0% (19/2974), done.
   cee7f245dc builtin-pickaxe.c (Junio C Hamano   1161298804 -0700)
 1) /*
   31653c1abc builtin-blame.c   (Eugene Letuchy   1235170271 -0800)
 2)  * Blame
   cee7f245dc builtin-pickaxe.c (Junio C Hamano   1161298804 -0700)
 3)  *
   7e6ac6e439 builtin/blame.c   (David Kastrup1398470209 +0200)
 4)  * Copyright (c) 2006, 2014 by its authors
 5)  * See COPYING for licensing conditions
   cee7f245dc builtin-pickaxe.c (Junio C Hamano   1161298804 -0700)
 6)  */
 7)
 8) #include "cache.h"
   fb58c8d507 builtin/blame.c   (Michael Haggerty 1434981785 +0200)
 9) #include "refs.h"
   cee7f245dc builtin-pickaxe.c (Junio C Hamano   1161298804 -0700)
10) #include "builtin.h"
11) #include "blob.h"
12) #include "commit.h"
13) #include "tag.h"
14) #include "tree-walk.h"
15) #include "diff.h"
16) #include "diffcore.h"
17) #include "revision.h"
   717d1462ba builtin-blame.c   (Linus Torvalds   1169976846 -0800)
18) #include "quote.h"
   cee7f245dc builtin-pickaxe.c (Junio C Hamano   1161298804 -0700)
19) #include "xdiff-interface.h"



It can be seen that options like -t still work in aggregation.

In relation to the previous conversation about "tips", I think a
better name could be "hints" and it could be added on top of the
aggregation.

On Mon, Jan 23, 2017 at 8:10 PM, Edmundo Carmona Antoranz
 wrote:
> ---
>  builtin/blame.c | 78 
> +
>  1 file changed, 51 insertions(+), 27 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 126b8c9e5..9e8403303 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -1884,6 +1884,7 @@ static const char *format_time(unsigned long time, 
> const char *tz_str,
>  #define OUTPUT_NO_AUTHOR   0200
>  #define OUTPUT_SHOW_EMAIL  0400
>  #define OUTPUT_LINE_PORCELAIN 01000
> +#define OUTPUT_AGGREGATE  02000
>
>  static void emit_porcelain_details(struct origin *suspect, int repeat)
>  {
> @@ -1931,43 +1932,36 @@ static void emit_porcelain(struct scoreboard *sb, 
> struct blame_entry *ent,
> putchar('\n');
>  }
>
> -static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int 
> opt)
> +/**
> + * Print information about the revision.
> + * This information can be used in either aggregated output
> + * or prepending each line of the content of the file being blamed
> + */
> +static void print_revision_info(char* revision_hex, int revision_length, 
> struct blame_entry* ent,
> +   struct commit* commit, struct commit_info ci, int opt, int 
> show_raw_time)
>  {
> -   int cnt;
> -   const char *cp;
> -   struct origin *suspect = ent->suspect;
> -   struct commit_info ci;
> -   char hex[GIT_SHA1_HEXSZ + 1];
> -   int show_raw_time = !!(opt & OUTPUT_RAW_TIMESTAMP);
> -
> -   get_commit_info(suspect->commit, , 1);
> -   sha1_to_hex_r(hex, suspect->commit->object.oid.hash);
> -
> -   cp = nth_line(sb, ent->lno);
> -   for (cnt = 0; cnt < ent->num_lines; cnt++) {
> -   char ch;
> -   int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? GIT_SHA1_HEXSZ 
> : abbrev;
> -
> -   if (suspect->commit->object.flags & UNINTERESTING) {
> +   if (opt & OUTPUT_AGGREGATE)
> +   printf("\t");
> +   int length = revision_length;
> +   if (commit->object.flags & UNINTERESTING) {
> if (blank_boundary)
> -   memset(hex, ' ', length);
> +   memset(revision_hex, ' ', length);
> else if (!(opt & OUTPUT_ANNOTATE_COMPAT)) {
> length--;
> putchar('^');
> }
> }
>
> -   printf("%.*s", length, hex);
> +   printf("%.*s", length, revision_hex);
> if (opt & OUTPUT_ANNOTATE_COMPAT) {
> const char *name;
> if (opt & OUTPUT_SHOW_EMAIL)
> name = ci.author_mail.buf;
> else
> name = ci.author.buf;
> -   printf("\t(%10s\t%10s\t%d)", name,
> +   printf("\t(%10s\t%10s\t", name,
>format_time(ci.author_time, ci.author_tz.buf,
> -

Re: [PATCH 0/12] reducing resource usage of for_each_alternate_ref

2017-01-23 Thread Jeff King

On Mon, Jan 23, 2017 at 05:33:41PM -0800, Brandon Williams wrote:

> On 01/23, Jeff King wrote:
> > 
> > A brief overview of the patches:
> > 
> >   [01/12]: for_each_alternate_ref: handle failure from real_pathdup()
> >   [02/12]: for_each_alternate_ref: stop trimming trailing slashes
> >   [03/12]: for_each_alternate_ref: use strbuf for path allocation
> > 
> > Bugfixes and cleanups (the first one is actually a recent-ish
> > regression).
> 
> Which is most likely my fault, Sorry! :)
> 
> I think the old behavior was to die and not return NULL.

Yes, it is. :)

But I think it's probably pretty hard to trigger in practice. And on the
plus side, I think the new behavior after my patch is much more sensible
than even the original.

-Peff

[PATCH] [draft]blame: add --aggregate option

2017-01-23 Thread Edmundo Carmona Antoranz

---
 builtin/blame.c | 78 +
 1 file changed, 51 insertions(+), 27 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 126b8c9e5..9e8403303 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1884,6 +1884,7 @@ static const char *format_time(unsigned long time, const 
char *tz_str,
 #define OUTPUT_NO_AUTHOR   0200
 #define OUTPUT_SHOW_EMAIL  0400
 #define OUTPUT_LINE_PORCELAIN 01000
+#define OUTPUT_AGGREGATE  02000
 
 static void emit_porcelain_details(struct origin *suspect, int repeat)
 {
@@ -1931,43 +1932,36 @@ static void emit_porcelain(struct scoreboard *sb, 
struct blame_entry *ent,
putchar('\n');
 }
 
-static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
+/**
+ * Print information about the revision.
+ * This information can be used in either aggregated output
+ * or prepending each line of the content of the file being blamed
+ */
+static void print_revision_info(char* revision_hex, int revision_length, 
struct blame_entry* ent,
+   struct commit* commit, struct commit_info ci, int opt, int 
show_raw_time)
 {
-   int cnt;
-   const char *cp;
-   struct origin *suspect = ent->suspect;
-   struct commit_info ci;
-   char hex[GIT_SHA1_HEXSZ + 1];
-   int show_raw_time = !!(opt & OUTPUT_RAW_TIMESTAMP);
-
-   get_commit_info(suspect->commit, , 1);
-   sha1_to_hex_r(hex, suspect->commit->object.oid.hash);
-
-   cp = nth_line(sb, ent->lno);
-   for (cnt = 0; cnt < ent->num_lines; cnt++) {
-   char ch;
-   int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? GIT_SHA1_HEXSZ : 
abbrev;
-
-   if (suspect->commit->object.flags & UNINTERESTING) {
+   if (opt & OUTPUT_AGGREGATE)
+   printf("\t");
+   int length = revision_length;
+   if (commit->object.flags & UNINTERESTING) {
if (blank_boundary)
-   memset(hex, ' ', length);
+   memset(revision_hex, ' ', length);
else if (!(opt & OUTPUT_ANNOTATE_COMPAT)) {
length--;
putchar('^');
}
}
 
-   printf("%.*s", length, hex);
+   printf("%.*s", length, revision_hex);
if (opt & OUTPUT_ANNOTATE_COMPAT) {
const char *name;
if (opt & OUTPUT_SHOW_EMAIL)
name = ci.author_mail.buf;
else
name = ci.author.buf;
-   printf("\t(%10s\t%10s\t%d)", name,
+   printf("\t(%10s\t%10s\t", name,
   format_time(ci.author_time, ci.author_tz.buf,
-  show_raw_time),
-  ent->lno + 1 + cnt);
+  show_raw_time));
} else {
if (opt & OUTPUT_SHOW_SCORE)
printf(" %*d %02d",
@@ -1975,11 +1969,7 @@ static void emit_other(struct scoreboard *sb, struct 
blame_entry *ent, int opt)
   ent->suspect->refcnt);
if (opt & OUTPUT_SHOW_NAME)
printf(" %-*.*s", longest_file, longest_file,
-  suspect->path);
-   if (opt & OUTPUT_SHOW_NUMBER)
-   printf(" %*d", max_orig_digits,
-  ent->s_lno + 1 + cnt);
-
+  ent->suspect->path);
if (!(opt & OUTPUT_NO_AUTHOR)) {
const char *name;
int pad;
@@ -1994,9 +1984,42 @@ static void emit_other(struct scoreboard *sb, struct 
blame_entry *ent, int opt)
   ci.author_tz.buf,
   show_raw_time));
}
+   }
+   if (opt & OUTPUT_AGGREGATE)
+   printf(")\n");
+}
+
+static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
+{
+   int cnt;
+   const char *cp;
+   struct origin *suspect = ent->suspect;
+   struct commit_info ci;
+   char hex[GIT_SHA1_HEXSZ + 1];
+   int show_raw_time = !!(opt & OUTPUT_RAW_TIMESTAMP);
+   int revision_length = (opt & OUTPUT_LONG_OBJECT_NAME) ? GIT_SHA1_HEXSZ 
: abbrev;
+
+   get_commit_info(suspect->commit, , 1);
+   sha1_to_hex_r(hex, suspect->commit->object.oid.hash);
+
+   if (opt & OUTPUT_AGGREGATE)
+   print_revision_info(hex, revision_length, ent, suspect->commit, 
ci, opt, show_raw_time);
+
+   cp = nth_line(sb,

Re: [PATCH 0/12] reducing resource usage of for_each_alternate_ref

2017-01-23 Thread Brandon Williams

On 01/23, Jeff King wrote:
> 
> A brief overview of the patches:
> 
>   [01/12]: for_each_alternate_ref: handle failure from real_pathdup()
>   [02/12]: for_each_alternate_ref: stop trimming trailing slashes
>   [03/12]: for_each_alternate_ref: use strbuf for path allocation
> 
> Bugfixes and cleanups (the first one is actually a recent-ish
> regression).

Which is most likely my fault, Sorry! :)

I think the old behavior was to die and not return NULL.

-- 
Brandon Williams

Re: [PATCH 3/3] git-prompt.sh: fix for submodule 'dirty' indicator

2017-01-23 Thread brian m. carlson

On Sun, Jan 22, 2017 at 08:30:21PM +0100, Benjamin Fuchs wrote:
> Fixing wrong git diff line.

This patch says 3/3, but I don't see 1 and 2.  Also, this description
doesn't tell me what the problem is, or why this fix is useful.  Such
information helps us down the line when looking at the history, and it
also helps reviewers determine whether your change makes sense.

Right now I can only guess that there's an issue with spaces or slashes
somehow.  You might want to take a look at
Documentation/SubmittingPatches, especially point 2.

> ---
>  contrib/completion/git-prompt.sh | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/contrib/completion/git-prompt.sh 
> b/contrib/completion/git-prompt.sh
> index c44b9a2..43b28e9 100644
> --- a/contrib/completion/git-prompt.sh
> +++ b/contrib/completion/git-prompt.sh
> @@ -306,9 +306,9 @@ __git_ps1_submodule ()
>   local submodule_name="$(basename "$git_dir")"
>   if [ "$submodule_name" != ".git" ] && [ "$submodule_name" != "." ]; then
>   local parent_top="${git_dir%.git*}"
> - local submodule_top="${git_dir#*modules}"
> + local submodule_top="${git_dir#*modules/}"
>   local status=""
> - git diff -C "$parent_top" --no-ext-diff 
> --ignore-submodules=dirty --quiet -- "$submodule_top" 2>/dev/null || 
> status="+"
> + git -C "$parent_top" diff --no-ext-diff 
> --ignore-submodules=dirty --quiet -- "$submodule_top" 2>/dev/null || 
> status="+"
>   printf "$status$submodule_name:"
>   fi
>  }
> @@ -544,7 +544,7 @@ __git_ps1 ()
>  
>   local sub=""
>   if [ -n "${GIT_PS1_SHOWSUBMODULE}" ]; then
> - sub="$(__git_ps1_submodule $g)"
> + sub="$(__git_ps1_submodule "$g")"
>   fi
>  
>   local f="$w$i$s$u"
> -- 
> 2.7.4
> 

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure

2017-01-23 Thread Brandon Williams

On 01/23, Brandon Williams wrote:
> As we discussed off-line I'll also do the rework to break up the
> question and result.  That way two threads can be executing using the
> same attr_check structure.

Thinking about this I don't really see what we would gain by breaking
them up.

Right now most callers have a static attr_check struct which holds the
question and answer (and in my series a buffer of all_attrs used during
the collection process).  If this struct is broken up into question and
answer then the only part of it that can be shared with multiple threads
is the question, which ends up being an array with 2 maybe 3 entries on
average.  The result and the array of all_attrs would then need to be
allocated each time calling into the attribute system since they can't
be shared.  Since this allocation is already going to happen wouldn't it
just make sense to drop the static modifier on the check structure (or
have a per-thread check structure) if you really wanted a particular
function thread safe?  It seems like breaking the question and answer up
doesn't buy you much in terms of reducing allocation churn and instead
complicates the API with needing to keep track of two structures instead
of a one.

-- 
Brandon Williams

[PATCH 11/12] receive-pack: treat namespace .have lines like alternates

2017-01-23 Thread Jeff King

Namely, de-duplicate them. We use the same set as the
alternates, since we call them both ".have" (i.e., there is
no value in showing one versus the other).

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 8f8762e4a..c55e2f993 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -251,8 +251,9 @@ static void show_ref(const char *path, const unsigned char 
*sha1)
 }
 
 static int show_ref_cb(const char *path_full, const struct object_id *oid,
-  int flag, void *unused)
+  int flag, void *data)
 {
+   struct oidset *seen = data;
const char *path = strip_namespace(path_full);
 
if (ref_is_hidden(path, path_full))
@@ -263,8 +264,11 @@ static int show_ref_cb(const char *path_full, const struct 
object_id *oid,
 * refs, so that the client can use them to minimize data
 * transfer but will otherwise ignore them.
 */
-   if (!path)
+   if (!path) {
+   if (oidset_insert(seen, oid))
+   return 0;
path = ".have";
+   }
show_ref(path, oid->hash);
return 0;
 }
@@ -287,7 +291,7 @@ static void write_head_info(void)
 
for_each_alternate_ref(show_one_alternate_ref, );
oidset_clear();
-   for_each_ref(show_ref_cb, NULL);
+   for_each_ref(show_ref_cb, );
if (!sent_capabilities)
show_ref("capabilities^{}", null_sha1);
 
-- 
2.11.0.765.g454d2182f

[PATCH 12/12] receive-pack: avoid duplicates between our refs and alternates

2017-01-23 Thread Jeff King

We de-duplicate ".have" refs among themselves, but never
check if they are duplicates of our local refs. It's not
unreasonable that they would be if we are a "--shared" or
"--reference" clone of a similar repository; we'd have all
the same tags.

We can handle this by inserting our local refs into the
oidset, but obviously not suppressing duplicates (since the
refnames are important).

Note that this also switches the order in which we advertise
refs, processing ours first and then any alternates. The
order shouldn't matter (and arguably showing our refs first
makes more sense).

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c |  4 +++-
 t/t5400-send-pack.sh   | 38 ++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index c55e2f993..bc7ce0ea2 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -268,6 +268,8 @@ static int show_ref_cb(const char *path_full, const struct 
object_id *oid,
if (oidset_insert(seen, oid))
return 0;
path = ".have";
+   } else {
+   oidset_insert(seen, oid);
}
show_ref(path, oid->hash);
return 0;
@@ -289,9 +291,9 @@ static void write_head_info(void)
 {
static struct oidset seen = OIDSET_INIT;
 
+   for_each_ref(show_ref_cb, );
for_each_alternate_ref(show_one_alternate_ref, );
oidset_clear();
-   for_each_ref(show_ref_cb, );
if (!sent_capabilities)
show_ref("capabilities^{}", null_sha1);
 
diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 305ca7a93..3331e0f53 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -255,4 +255,42 @@ test_expect_success 'deny pushing to delete current 
branch' '
)
 '
 
+extract_ref_advertisement () {
+   perl -lne '
+   # \\ is there to skip capabilities after \0
+   /push< ([^\\]+)/ or next;
+   exit 0 if $1 eq "";
+   print $1;
+   '
+}
+
+test_expect_success 'receive-pack de-dupes .have lines' '
+   git init shared &&
+   git -C shared commit --allow-empty -m both &&
+   git clone -s shared fork &&
+   (
+   cd shared &&
+   git checkout -b only-shared &&
+   git commit --allow-empty -m only-shared &&
+   git update-ref refs/heads/foo HEAD
+   ) &&
+
+   # Notable things in this expectation:
+   #  - local refs are not de-duped
+   #  - .have does not duplicate locals
+   #  - .have does not duplicate itself
+   local=$(git -C fork rev-parse HEAD) &&
+   shared=$(git -C shared rev-parse only-shared) &&
+   cat >expect <<-EOF &&
+   $local refs/heads/master
+   $local refs/remotes/origin/HEAD
+   $local refs/remotes/origin/master
+   $shared .have
+   EOF
+
+   GIT_TRACE_PACKET=$(pwd)/trace git push fork HEAD:foo &&
+   extract_ref_advertisement refs &&
+   test_cmp expect refs
+'
+
 test_done
-- 
2.11.0.765.g454d2182f

[PATCH 09/12] receive-pack: use oidset to de-duplicate .have lines

2017-01-23 Thread Jeff King

If you have an alternate object store with a very large
number of refs, the peak memory usage of the sha1_array can
grow high, even if most of them are duplicates that end up
not being printed at all.

The similar for_each_alternate_ref() code-paths in
fetch-pack solve this by using flags in "struct object" to
de-duplicate (and so are relying on obj_hash at the core).

But we don't have a "struct object" at all in this case. We
could call lookup_unknown_object() to get one, but if our
goal is reducing memory footprint, it's not great:

 - an unknown object is as large as the largest object type
   (a commit), which is bigger than an oidset entry

 - we can free the memory after our ref advertisement, but
   "struct object" entries persist forever (and the
   receive-pack may hang around for a long time, as the
   bottleneck is often client upload bandwidth).

So let's use an oidset. Note that unlike a sha1-array it
doesn't sort the output as a side effect. However, our
output is at least stable, because for_each_alternate_ref()
will give us the sha1s in ref-sorted order.

In one particularly pathological case with an alternate that
has 60,000 unique refs out of 80 million total, this reduced
the peak heap usage of "git receive-pack . 
---
 builtin/receive-pack.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index b9f2c0cc5..27bb52988 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -21,6 +21,7 @@
 #include "sigchain.h"
 #include "fsck.h"
 #include "tmp-objdir.h"
+#include "oidset.h"
 
 static const char * const receive_pack_usage[] = {
N_("git receive-pack "),
@@ -271,27 +272,24 @@ static int show_ref_cb(const char *path_full, const 
struct object_id *oid,
return 0;
 }
 
-static int show_one_alternate_sha1(const unsigned char sha1[20], void *unused)
+static void show_one_alternate_ref(const char *refname,
+  const struct object_id *oid,
+  void *data)
 {
-   show_ref(".have", sha1);
-   return 0;
-}
+   struct oidset *seen = data;
 
-static void collect_one_alternate_ref(const char *refname,
- const struct object_id *oid,
- void *data)
-{
-   struct sha1_array *sa = data;
-   sha1_array_append(sa, oid->hash);
+   if (oidset_insert(seen, oid))
+   return;
+
+   show_ref(".have", oid->hash);
 }
 
 static void write_head_info(void)
 {
-   struct sha1_array sa = SHA1_ARRAY_INIT;
+   static struct oidset seen = OIDSET_INIT;
 
-   for_each_alternate_ref(collect_one_alternate_ref, );
-   sha1_array_for_each_unique(, show_one_alternate_sha1, NULL);
-   sha1_array_clear();
+   for_each_alternate_ref(show_one_alternate_ref, );
+   oidset_clear();
for_each_ref(show_ref_cb, NULL);
if (!sent_capabilities)
show_ref("capabilities^{}", null_sha1);
-- 
2.11.0.765.g454d2182f

[PATCH 10/12] receive-pack: fix misleading namespace/.have comment

2017-01-23 Thread Jeff King

The comment claims that we handle alternate ".have" lines
through this function, but that hasn't been the case since
85f251045 (write_head_info(): handle "extra refs" locally,
2012-01-06).

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 27bb52988..8f8762e4a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -261,10 +261,7 @@ static int show_ref_cb(const char *path_full, const struct 
object_id *oid,
/*
 * Advertise refs outside our current namespace as ".have"
 * refs, so that the client can use them to minimize data
-* transfer but will otherwise ignore them. This happens to
-* cover ".have" that are thrown in by add_one_alternate_ref()
-* to mark histories that are complete in our alternates as
-* well.
+* transfer but will otherwise ignore them.
 */
if (!path)
path = ".have";
-- 
2.11.0.765.g454d2182f

[PATCH 08/12] add oidset API

2017-01-23 Thread Jeff King

This is similar to many of our uses of sha1-array, but it
overcomes one limitation of a sha1-array: when you are
de-duplicating a large input with relatively few unique
entries, sha1-array uses 20 bytes per non-unique entry.
Whereas this set will use memory linear in the number of
unique entries (albeit a few more than 20 bytes due to
hashmap overhead).

Signed-off-by: Jeff King 
---
This may be overkill. You can get roughly the same thing by making
actual object structs via lookup_unknown_object(). But see the next
patch for some comments on that.

 Makefile |  1 +
 oidset.c | 49 +
 oidset.h | 45 +
 3 files changed, 95 insertions(+)
 create mode 100644 oidset.c
 create mode 100644 oidset.h

diff --git a/Makefile b/Makefile
index 27afd0f37..e41efc2d8 100644
--- a/Makefile
+++ b/Makefile
@@ -774,6 +774,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += oidset.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/oidset.c b/oidset.c
new file mode 100644
index 0..6094cff8c
--- /dev/null
+++ b/oidset.c
@@ -0,0 +1,49 @@
+#include "cache.h"
+#include "oidset.h"
+
+struct oidset_entry {
+   struct hashmap_entry hash;
+   struct object_id oid;
+};
+
+int oidset_hashcmp(const void *va, const void *vb,
+ const void *vkey)
+{
+   const struct oidset_entry *a = va, *b = vb;
+   const struct object_id *key = vkey;
+   return oidcmp(>oid, key ? key : >oid);
+}
+
+int oidset_contains(const struct oidset *set, const struct object_id *oid)
+{
+   struct hashmap_entry key;
+
+   if (!set->map.cmpfn)
+   return 0;
+
+   hashmap_entry_init(, sha1hash(oid->hash));
+   return !!hashmap_get(>map, , oid);
+}
+
+int oidset_insert(struct oidset *set, const struct object_id *oid)
+{
+   struct oidset_entry *entry;
+
+   if (!set->map.cmpfn)
+   hashmap_init(>map, oidset_hashcmp, 0);
+
+   if (oidset_contains(set, oid))
+   return 1;
+
+   entry = xmalloc(sizeof(*entry));
+   hashmap_entry_init(>hash, sha1hash(oid->hash));
+   oidcpy(>oid, oid);
+
+   hashmap_add(>map, entry);
+   return 0;
+}
+
+void oidset_clear(struct oidset *set)
+{
+   hashmap_free(>map, 1);
+}
diff --git a/oidset.h b/oidset.h
new file mode 100644
index 0..b7eaab5b8
--- /dev/null
+++ b/oidset.h
@@ -0,0 +1,45 @@
+#ifndef OIDSET_H
+#define OIDSET_H
+
+/**
+ * This API is similar to sha1-array, in that it maintains a set of object ids
+ * in a memory-efficient way. The major differences are:
+ *
+ *   1. It uses a hash, so we can do online duplicate removal, rather than
+ *  sort-and-uniq at the end. This can reduce memory footprint if you have
+ *  a large list of oids with many duplicates.
+ *
+ *   2. The per-unique-oid memory footprint is slightly higher due to hash
+ *  table overhead.
+ */
+
+/**
+ * A single oidset; should be zero-initialized (or use OIDSET_INIT).
+ */
+struct oidset {
+   struct hashmap map;
+};
+
+#define OIDSET_INIT { { NULL } }
+
+/**
+ * Returns true iff `set` contains `oid`.
+ */
+int oidset_contains(const struct oidset *set, const struct object_id *oid);
+
+/**
+ * Insert the oid into the set; a copy is made, so "oid" does not need
+ * to persist after this function is called.
+ *
+ * Returns 1 if the oid was already in the set, 0 otherwise. This can be used
+ * to perform an efficient check-and-add.
+ */
+int oidset_insert(struct oidset *set, const struct object_id *oid);
+
+/**
+ * Remove all entries from the oidset, freeing any resources associated with
+ * it.
+ */
+void oidset_clear(struct oidset *set);
+
+#endif /* OIDSET_H */
-- 
2.11.0.765.g454d2182f

[PATCH 07/12] fetch-pack: cache results of for_each_alternate_ref

2017-01-23 Thread Jeff King

We may run for_each_alternate_ref() twice, once in
find_common() and once in everything_local(). This operation
can be expensive, because it involves running a sub-process
which must freshly load all of the alternate's refs from
disk.

Let's cache and reuse the results between the two calls. We
can make some optimizations based on the particular use
pattern in fetch-pack to keep our memory usage down.

The first is that we only care about the sha1s, not the refs
themselves. So it's OK to store only the sha1s, and to
suppress duplicates. The natural fit would therefore be a
sha1_array.

However, sha1_array's de-duplication happens only after it
has read and sorted all entries. It still stores each
duplicate. For an alternate with a large number of refs
pointing to the same commits, this is a needless expense.

Instead, we'd prefer to eliminate duplicates before putting
them in the cache, which implies using a hash. We can
further note that fetch-pack will call parse_object() on
each alternate sha1. We can therefore keep our cache as a
set of pointers to "struct object". That gives us a place to
put our "already seen" bit with an optimized hash lookup.
And as a bonus, the object stores the sha1 for us, so
pointer-to-object is all we need.

There are two extra optimizations I didn't do here:

  - we actually store an array of pointer-to-object.
Technically we could just walk the obj_hash table
looking for entries with the ALTERNATE flag set (because
our use case doesn't care about the order here).

But that hash table may be mostly composed of
non-ALTERNATE entries, so we'd waste time walking over
them. So it would be a slight win in memory use, but a
loss in CPU.

  - the items we pull out of the cache are actual "struct
object"s, but then we feed "obj->sha1" to our
sub-functions, which promptly call parse_object().

This second parse is cheap, because it starts with
lookup_object() and will bail immediately when it sees
we've already parsed the object. We could save the extra
hash lookup, but it would involve refactoring the
functions we call. It may or may not be worth the
trouble.

Signed-off-by: Jeff King 
---
 fetch-pack.c | 52 ++--
 object.h |  2 +-
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 54f84c573..e0f5d5ce8 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -35,6 +35,7 @@ static const char *alternate_shallow_file;
 #define COMMON_REF (1U << 2)
 #define SEEN   (1U << 3)
 #define POPPED (1U << 4)
+#define ALTERNATE  (1U << 5)
 
 static int marked;
 
@@ -67,6 +68,41 @@ static inline void print_verbose(const struct 
fetch_pack_args *args,
fputc('\n', stderr);
 }
 
+struct alternate_object_cache {
+   struct object **items;
+   size_t nr, alloc;
+};
+
+static void cache_one_alternate(const char *refname,
+   const struct object_id *oid,
+   void *vcache)
+{
+   struct alternate_object_cache *cache = vcache;
+   struct object *obj = parse_object(oid->hash);
+
+   if (!obj || (obj->flags & ALTERNATE))
+   return;
+
+   obj->flags |= ALTERNATE;
+   ALLOC_GROW(cache->items, cache->nr + 1, cache->alloc);
+   cache->items[cache->nr++] = obj;
+}
+
+static void for_each_cached_alternate(void (*cb)(struct object *))
+{
+   static int initialized;
+   static struct alternate_object_cache cache;
+   size_t i;
+
+   if (!initialized) {
+   for_each_alternate_ref(cache_one_alternate, );
+   initialized = 1;
+   }
+
+   for (i = 0; i < cache.nr; i++)
+   cb(cache.items[i]);
+}
+
 static void rev_list_push(struct commit *commit, int mark)
 {
if (!(commit->object.flags & mark)) {
@@ -253,11 +289,9 @@ static void send_request(struct fetch_pack_args *args,
write_or_die(fd, buf->buf, buf->len);
 }
 
-static void insert_one_alternate_ref(const char *refname,
-const struct object_id *oid,
-void *unused)
+static void insert_one_alternate_object(struct object *obj)
 {
-   rev_list_insert_ref(NULL, oid->hash);
+   rev_list_insert_ref(NULL, obj->oid.hash);
 }
 
 #define INITIAL_FLUSH 16
@@ -300,7 +334,7 @@ static int find_common(struct fetch_pack_args *args,
marked = 1;
 
for_each_ref(rev_list_insert_ref_oid, NULL);
-   for_each_alternate_ref(insert_one_alternate_ref, NULL);
+   for_each_cached_alternate(insert_one_alternate_object);
 
fetching = 0;
for ( ; refs ; refs = refs->next) {
@@ -621,11 +655,9 @@ static void filter_refs(struct fetch_pack_args *args,
*refs = newlist;
 }
 
-static void mark_alternate_complete(const char *refname,
-   const struct object_id *oid,
-

[PATCH 06/12] clone: disable save_commit_buffer

2017-01-23 Thread Jeff King

Normally git caches the raw commit object contents in
"struct commit". This makes it fast to run parse_commit()
followed by a pretty-print operation.

For commands which don't actually pretty-print the commits,
the caching is wasteful (and may use quite a lot of memory
if git accesses a large number of commits).

For fetching operations like clone, we already disable
save_commit_buffer in everything_local(), where we may
traverse. But clone also parses commits before it even gets
there; it looks at commits reachable from its refs, and from
alternate refs.

In one real-world case with a large number of tags, this
cut about 10MB off of clone's heap usage. Not spectacular,
but there's really no downside.

Signed-off-by: Jeff King 
---
 builtin/clone.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5ef81927a..3fca45e7e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -858,6 +858,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec *refspec;
const char *fetch_pattern;
 
+   save_commit_buffer = 0;
packet_trace_identity("clone");
argc = parse_options(argc, argv, prefix, builtin_clone_options,
 builtin_clone_usage, 0);
-- 
2.11.0.765.g454d2182f

[PATCH 05/12] for_each_alternate_ref: replace transport code with for-each-ref

2017-01-23 Thread Jeff King

The current method for getting the refs from an alternate is
to run upload-pack in the alternate and parse its output
using the normal transport code.  This works and is
reasonably short, but it has a very bad memory footprint
when there are a lot of refs in the alternate. There are two
problems:

  1. It reads in all of the refs before passing any back to
 us. Which means that our peak memory usage has to store
 every ref (including duplicates for peeled variants),
 even if our callback could determine that some are not
 interesting (e.g., because they point to the same sha1
 as another ref).

  2. It allocates a "struct ref" for each one. Among other
 things, this contains 3 separate 20-byte oids, along
 with the name and various pointers.  That can add up,
 especially if the callback is only interested in the
 sha1 (which it can store in a sha1_array as just 20
 bytes).

On a particularly pathological case, where the alternate had
over 80 million refs pointing to only around 60,000 unique
objects, the peak heap usage of "git clone --reference" grew
to over 25GB.

This patch instead calls git-for-each-ref in the alternate
repository, and passes each line to the callback as we read
it. That drops the peak heap of the same command to 50MB.

I considered and rejected a few alternatives.

We could read all of the refs in the alternate using our own
ref code, just as we do with submodules.  However, as memory
footprint is one of the concerns here, we want to avoid
loading those refs into our own memory as a whole.

It's possible that this will be a better technique in the
future when the ref code can more easily iterate without
loading all of packed-refs into memory.

Another option is to keep calling upload-pack, and just
parse its output ourselves in a streaming fashion. Besides
for-each-ref being simpler (we get to define the format
ourselves, and don't have to deal with speaking the git
protocol), it's more flexible for possible future changes.

For instance, it might be useful for the caller to be able
to limit the set of "interesting" alternate refs.  The
motivating example is one where many "forks" of a particular
repository share object storage, and the shared storage has
refs for each fork (which is why so many of the refs are
duplicates; each fork has the same tags).  A plausible
future optimization would be to ask for the alternate refs
for just _one_ fork (if you had some out-of-band way of
knowing which was the most interesting or important for the
current operation).

Similarly, no callbacks actually care about the symref value
of alternate refs, and as before, this patch ignores them
entirely.  However, if we wanted to add them, for-each-ref's
"%(symref)" is going to be more flexible than upload-pack,
because the latter only handles the HEAD symref due to
historical constraints.

There is one potential downside, though: unlike upload-pack,
our for-each-ref command doesn't report the peeled value of
refs. The existing code calls the alternate_ref_fn callback
twice for tags: once for the tag, and once for the peeled
value with the refname set to "ref^{}".

For the callers in fetch-pack, this doesn't matter at all.
We immediately peel each tag down to a commit either way (so
there's a slight improvement, as do not bother passing the
redundant data over the pipe). For the caller in
receive-pack, it means we will not advertise the peeled
values of tags in our alternate. However, we also don't
advertise peeled values for our _own_ tags, so this is
actually making things more consistent.

It's unclear whether receive-pack advertising peeled values
is a win or not. On one hand, giving more information to the
other side may let it omit some objects from the push. On
the other hand, for tags which both sides have, they simply
bloat the advertisement. The upload-pack advertisement of
git.git is about 30% larger than the receive-pack
advertisement due to its peeled information.

This patch omits the peeled information from
for_each_alternate_ref entirely, and leaves it up to the
caller whether they want to dig up the information.

Signed-off-by: Jeff King 
---
I also tried adding "%(*objectname)" to for-each-ref to just
grab the peeled information, but the peel implementation in
ref-filter is _really_ slow. It doesn't use the packed-ref
peel information, and it doesn't recognize duplicates (so in
the 80 million case, it really parses 80 million tags).

 transport.c | 48 ++--
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/transport.c b/transport.c
index 983d8fec1..ef8e09298 100644
--- a/transport.c
+++ b/transport.c
@@ -1199,6 +1199,42 @@ char *transport_anonymize_url(const char *url)
return xstrdup(url);
 }
 
+static void read_alternate_refs(const char *path,
+   alternate_ref_fn *cb,
+   void *data)
+{
+   struct

[PATCH 04/12] for_each_alternate_ref: pass name/oid instead of ref struct

2017-01-23 Thread Jeff King

Breaking down the fields in the interface makes it easier to
change the backend of for_each_alternate_ref to something
that doesn't use "struct ref" internally.

The only field that callers actually look at is the oid,
anyway. The refname is kept in the interface as a plausible
thing for future code to want.

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c |  6 --
 fetch-pack.c   | 12 
 transport.c|  2 +-
 transport.h|  2 +-
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 6b97cbdbe..b9f2c0cc5 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -277,10 +277,12 @@ static int show_one_alternate_sha1(const unsigned char 
sha1[20], void *unused)
return 0;
 }
 
-static void collect_one_alternate_ref(const struct ref *ref, void *data)
+static void collect_one_alternate_ref(const char *refname,
+ const struct object_id *oid,
+ void *data)
 {
struct sha1_array *sa = data;
-   sha1_array_append(sa, ref->old_oid.hash);
+   sha1_array_append(sa, oid->hash);
 }
 
 static void write_head_info(void)
diff --git a/fetch-pack.c b/fetch-pack.c
index 601f0779a..54f84c573 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -253,9 +253,11 @@ static void send_request(struct fetch_pack_args *args,
write_or_die(fd, buf->buf, buf->len);
 }
 
-static void insert_one_alternate_ref(const struct ref *ref, void *unused)
+static void insert_one_alternate_ref(const char *refname,
+const struct object_id *oid,
+void *unused)
 {
-   rev_list_insert_ref(NULL, ref->old_oid.hash);
+   rev_list_insert_ref(NULL, oid->hash);
 }
 
 #define INITIAL_FLUSH 16
@@ -619,9 +621,11 @@ static void filter_refs(struct fetch_pack_args *args,
*refs = newlist;
 }
 
-static void mark_alternate_complete(const struct ref *ref, void *unused)
+static void mark_alternate_complete(const char *refname,
+   const struct object_id *oid,
+   void *unused)
 {
-   mark_complete(ref->old_oid.hash);
+   mark_complete(oid->hash);
 }
 
 static int everything_local(struct fetch_pack_args *args,
diff --git a/transport.c b/transport.c
index 594533d88..983d8fec1 100644
--- a/transport.c
+++ b/transport.c
@@ -1231,7 +1231,7 @@ static int refs_from_alternate_cb(struct 
alternate_object_database *e,
for (extra = transport_get_remote_refs(transport);
 extra;
 extra = extra->next)
-   cb->fn(extra, cb->data);
+   cb->fn(extra->name, >old_oid, cb->data);
transport_disconnect(transport);
 out:
strbuf_release();
diff --git a/transport.h b/transport.h
index 9820f10b8..b7bb07d2c 100644
--- a/transport.h
+++ b/transport.h
@@ -254,6 +254,6 @@ int transport_refs_pushed(struct ref *ref);
 void transport_print_push_status(const char *dest, struct ref *refs,
  int verbose, int porcelain, unsigned int *reject_reasons);
 
-typedef void alternate_ref_fn(const struct ref *, void *);
+typedef void alternate_ref_fn(const char *refname, const struct object_id 
*oid, void *);
 extern void for_each_alternate_ref(alternate_ref_fn, void *);
 #endif
-- 
2.11.0.765.g454d2182f

[PATCH 03/12] for_each_alternate_ref: use strbuf for path allocation

2017-01-23 Thread Jeff King

We have a string with ".../objects" pointing to the
alternate object store, and overwrite bits of it to look at
other paths in the (potential) git repository holding it.
This works because the only path we care about is "refs",
which is shorter than "objects".

Using a strbuf to hold the path lets us get rid of some
magic numbers, and makes it more obvious that the memory
operations are safe.

Signed-off-by: Jeff King 
---
I had thought I was going to need to generate more paths in the
alternate repo, but I ended up not needing to. So this was originally
done to make that easier, but I think it stands on its own as a cleanup.

 transport.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/transport.c b/transport.c
index 6b4b3ed31..594533d88 100644
--- a/transport.c
+++ b/transport.c
@@ -1207,34 +1207,34 @@ struct alternate_refs_data {
 static int refs_from_alternate_cb(struct alternate_object_database *e,
  void *data)
 {
-   char *other;
-   size_t len;
+   struct strbuf path = STRBUF_INIT;
+   size_t base_len;
struct remote *remote;
struct transport *transport;
const struct ref *extra;
struct alternate_refs_data *cb = data;
 
-   other = real_pathdup(e->path);
-   if (!other)
-   return 0;
-   len = strlen(other);
-
-   if (len < 8 || memcmp(other + len - 8, "/objects", 8))
+   if (!strbuf_realpath(, e->path, 0))
+   goto out;
+   if (!strbuf_strip_suffix(, "/objects"))
goto out;
+   base_len = path.len;
+
/* Is this a git repository with refs? */
-   memcpy(other + len - 8, "/refs", 6);
-   if (!is_directory(other))
+   strbuf_addstr(, "/refs");
+   if (!is_directory(path.buf))
goto out;
-   other[len - 8] = '\0';
-   remote = remote_get(other);
-   transport = transport_get(remote, other);
+   strbuf_setlen(, base_len);
+
+   remote = remote_get(path.buf);
+   transport = transport_get(remote, path.buf);
for (extra = transport_get_remote_refs(transport);
 extra;
 extra = extra->next)
cb->fn(extra, cb->data);
transport_disconnect(transport);
 out:
-   free(other);
+   strbuf_release();
return 0;
 }
 
-- 
2.11.0.765.g454d2182f

[PATCH 02/12] for_each_alternate_ref: stop trimming trailing slashes

2017-01-23 Thread Jeff King

The real_pathdup() function will have removed extra slashes
for us already (on top of the normalize_path() done when we
created the alternate_object_database struct in the first
place).

Incidentally, this also fixes the case where the path is
just "/", which would read off the start of the array.
That doesn't seem possible to trigger in practice, though,
as link_alt_odb_entry() blindly eats trailing slashes,
including a bare "/".

Signed-off-by: Jeff King 
---
I think the "/" thing in link_alt_odb_entry() is buggy, and it's an easy
one-liner fix.  But I notice some of the rest of the code isn't ready to
handle "/" (mostly it just duplicates "/" when appending to the path),
so I left it for now (and I doubt anybody cares anyway).

 transport.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/transport.c b/transport.c
index 74d0e45bd..6b4b3ed31 100644
--- a/transport.c
+++ b/transport.c
@@ -1219,8 +1219,6 @@ static int refs_from_alternate_cb(struct 
alternate_object_database *e,
return 0;
len = strlen(other);
 
-   while (other[len-1] == '/')
-   other[--len] = '\0';
if (len < 8 || memcmp(other + len - 8, "/objects", 8))
goto out;
/* Is this a git repository with refs? */
-- 
2.11.0.765.g454d2182f

[PATCH 01/12] for_each_alternate_ref: handle failure from real_pathdup()

2017-01-23 Thread Jeff King

In older versions of git, if real_path() failed to resolve
the alternate object store path, we would die() with an
error. However, since 4ac9006f8 (real_path: have callers use
real_pathdup and strbuf_realpath, 2016-12-12) we use the
real_pathdup() function, which may return NULL. Since we
don't check the return value, we can segfault.

This is hard to trigger in practice, since we check that the
path is accessible before creating the alternate_object_database
struct. But it could be removed racily, or we could see a
transient filesystem error.

We could restore the original behavior by switching back to
xstrdup(real_path()).  However, dying is probably not the
best option here. This whole function is best-effort
already; there might not even be a repository around the
shared objects at all. And if the alternate store has gone
away, there are no objects to show.

So let's just quietly return, as we would if we failed to
open "refs/", or if upload-pack failed to start, etc.

Signed-off-by: Jeff King 
---
 transport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/transport.c b/transport.c
index c86ba2eb8..74d0e45bd 100644
--- a/transport.c
+++ b/transport.c
@@ -1215,6 +1215,8 @@ static int refs_from_alternate_cb(struct 
alternate_object_database *e,
struct alternate_refs_data *cb = data;
 
other = real_pathdup(e->path);
+   if (!other)
+   return 0;
len = strlen(other);
 
while (other[len-1] == '/')
-- 
2.11.0.765.g454d2182f

[PATCH 0/12] reducing resource usage of for_each_alternate_ref

2017-01-23 Thread Jeff King

As I've mentioned before, I have some alternates repositories with
absurd numbers of refs, most of which are duplicates of each other[1].
There are a couple of problems I've seen:

 1. the way that for_each_alternate_ref() works has very high peak memory
usage for this case

 2. the way that receive-pack de-duplicates the refs has high peak memory
usage

 3. we access the alternate refs twice in fetch-pack

This fixes all three, along with a few other minor bugfixes, cleanups,
and optimizations. I've tried to order the series to keep bugfixes and
less-contentious changes near the front.

Just to give some numbers, on my worst-case repository (see [1]), this
drops peak RSS for "git clone --reference" from over 25GB to about 40MB.
Sort of, anyway.  You still pay a big CPU and RSS cost on the process in
the alternates repo that accesses packed-refs, but the 25GB came on top
of that. So this is a first pass at the low-hanging fruit.

I'll be the first to admit that this setup is insane. And some of the
optimizations are tradeoffs that help particularly the case where your
refs aren't unique. But for the most part should help _every_ case. And
in the cases where your refs are unique, either you don't have many (so
the tradeoffs are OK) or you have so many that you are pretty much
screwed no matter what (if your fetch is looking at 30 million unique
ref tips, the object storage is your problem, not looking at the refs).

A brief overview of the patches:

  [01/12]: for_each_alternate_ref: handle failure from real_pathdup()
  [02/12]: for_each_alternate_ref: stop trimming trailing slashes
  [03/12]: for_each_alternate_ref: use strbuf for path allocation

Bugfixes and cleanups (the first one is actually a recent-ish
regression).

  [04/12]: for_each_alternate_ref: pass name/oid instead of ref struct
  [05/12]: for_each_alternate_ref: replace transport code with for-each-ref
  [06/12]: clone: disable save_commit_buffer

This gives the 25GB->40MB benefit. There are a bunch of caveats in
patch 05, but I _think_ it's the right direction for the reasons I
outlined there.

  [07/12]: fetch-pack: cache results of for_each_alternate_ref

Just running for-each-ref in the giant alternates repo is about a
minute of CPU, plus 10GB heap, and fetch-pack wants to do it twice.
This drops it to once.

  [08/12]: add oidset API
  [09/12]: receive-pack: use oidset to de-duplicate .have lines

These give another ~12GB RSS improvement when receive-pack looks at
the alternates in my worst-case repo.

This is less tested than the earlier ones, as we've disabled
receive-pack looking at alternates in production (see [1] below).
I just did them for completeness upstream.

  [10/12]: receive-pack: fix misleading namespace/.have comment
  [11/12]: receive-pack: treat namespace .have lines like alternates
  [12/12]: receive-pack: avoid duplicates between our refs and alternates

These are optimizations to avoid more duplicate .have lines, and
should benefit even non-insane cases. As with 8-12, not as well
tested by me.

 Makefile   |  1 +
 builtin/clone.c|  1 +
 builtin/receive-pack.c | 41 +++-
 fetch-pack.c   | 48 -
 object.h   |  2 +-
 oidset.c   | 49 ++
 oidset.h   | 45 +++
 t/t5400-send-pack.sh   | 38 ++
 transport.c| 72 +++---
 transport.h|  2 +-
 10 files changed, 250 insertions(+), 49 deletions(-)
 create mode 100644 oidset.c
 create mode 100644 oidset.h

-Peff

[1] Background, if you care:

I've mentioned before that GitHub's repository storage uses a
fork-network layout. Each user gets their own "fork" repository, and
it points to "network.git" as shared storage. The network.git
repository has a ref for each fork that is updated periodically via
"git fetch".

So the network.git repo contains O(nr_forks * nr_refs_in_fork) refs.
Quite a few of these point to the same tip sha1s, as each fork has
the same tags. One of the most pathological cases is a popular
public repo that has ~44K tags in it. The network repo has ~80
million refs, of which ~60K are unique. You can imagine that it
takes some time to access the refs. Basically anything that loads
the network.git packed-refs file takes an extra minute of CPU and
needs ~10G RSS to store the internal cache of `packed-refs`.

Most operations don't care about this; they work on the fork repo,
and never look at the network refs. But we _do_ sometimes peek at
alternate refs from receive-pack and fetch-pack to tell the other
side about tips we have.

These optimizations backfire completely in such a setting. Besides
the CPU and memory spikes on the server, even just the unique refs

What's cooking in git.git (Jan 2017, #04; Mon, 23)

2017-01-23 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ad/bisect-terms (2017-01-13) 1 commit
  (merged to 'next' on 2017-01-18 at 9f500d6cf5)
 + Documentation/bisect: improve on (bad|new) and (good|bad)

 Documentation fix.


* bw/read-blob-data-does-not-modify-index-state (2017-01-11) 1 commit
  (merged to 'next' on 2017-01-18 at f33363fb07)
 + index: improve constness for reading blob data

 Code clean-up.


* jk/grep-e-could-be-extended-beyond-posix (2017-01-11) 1 commit
  (merged to 'next' on 2017-01-18 at dd1b9d1fa8)
 + t7810: avoid assumption about invalid regex syntax

 Tighten a test to avoid mistaking an extended ERE regexp engine as
 a PRE regexp engine.


* rh/diff-orderfile-doc (2017-01-15) 2 commits
  (merged to 'next' on 2017-01-18 at cc2af9c628)
 + diff: document the format of the -O (diff.orderFile) file
 + diff: document behavior of relative diff.orderFile

 Documentation fix.


* sb/cd-then-git-can-be-written-as-git-c (2017-01-13) 1 commit
  (merged to 'next' on 2017-01-18 at 8923d2d001)
 + lib-submodule-update.sh: reduce use of subshell by using "git -C"

 Test clean-up.


* sb/submodule-config-tests (2017-01-12) 2 commits
  (merged to 'next' on 2017-01-18 at bd850f6ad3)
 + t7411: test lookup of uninitialized submodules
 + t7411: quote URLs

 Test updates.


* sb/submodule-embed-gitdir (2017-01-12) 1 commit
  (merged to 'next' on 2017-01-18 at 0a5e24a3f0)
 + submodule absorbgitdirs: mention in docstring help

 Help-text fix.


* sb/submodule-init (2017-01-12) 1 commit
  (merged to 'next' on 2017-01-18 at 2e8a38b1cc)
 + submodule update --init: display correct path from submodule

 Error message fix.


* sg/fix-versioncmp-with-common-suffix (2017-01-12) 8 commits
  (merged to 'next' on 2017-01-18 at f79c24a291)
 + versioncmp: generalize version sort suffix reordering
 + versioncmp: factor out helper for suffix matching
 + versioncmp: use earliest-longest contained suffix to determine sorting order
 + versioncmp: cope with common part overlapping with prerelease suffix
 + versioncmp: pass full tagnames to swap_prereleases()
 + t7004-tag: add version sort tests to show prerelease reordering issues
 + t7004-tag: use test_config helper
 + t7004-tag: delete unnecessary tags with test_when_finished

 The prereleaseSuffix feature of version comparison that is used in
 "git tag -l" did not correctly when two or more prereleases for the
 same release were present (e.g. when 2.0, 2.0-beta1, and 2.0-beta2
 are there and the code needs to compare 2.0-beta1 and 2.0-beta2).


* vn/diff-ihc-config (2017-01-12) 1 commit
  (merged to 'next' on 2017-01-18 at ac4915dbe6)
 + diff: add interhunk context config option

 "git diff" learned diff.interHunkContext configuration variable
 that gives the default value for its --inter-hunk-context option.


* ws/request-pull-code-cleanup (2017-01-15) 1 commit
  (merged to 'next' on 2017-01-18 at dfcb1405de)
 + request-pull: drop old USAGE stuff

 Code clean-up.

--
[New Topics]

* sh/grep-tree-obj-tweak-output (2017-01-20) 2 commits
 - grep: use '/' delimiter for paths
 - grep: only add delimiter if there isn't one already

 "git grep", when fed a tree-ish as an input, shows each hit
 prefixed with ":::".  As  is
 almost always either a commit or a tag that points at a commit, the
 early part of the output ":" can be used as the
 name of the blob and given to "git show".  When  is a
 tree given in the extended SHA-1 syntax (e.g. ":", or
 ":"), however, this results in a string that does not
 name a blob (e.g. "::" or "::").
 "git grep" has been taught to be a bit more intelligent about these
 cases and omit a colon (in the former case) or use slash (in the
 latter case) to produce ":" and
 ":/" that can be used as the name of a blob.

 Waiting for the review discussion to settle, followed by a reroll.


* vp/show-ref-verify-head (2017-01-23) 5 commits
 - show-ref: remove dead `if (verify)' check
 - show-ref: detect dangling refs under --verify as well
 - show-ref: move --quiet handling into show_one()
 - show-ref: allow -d to work with --verify
 - show-ref: accept HEAD with --verify

 "git show-ref HEAD" used with "--verify" because the user is not
 interested in seeing refs/remotes/origin/HEAD, and used with
 "--head" because the user does not want HEAD to be filtered out,
 i.e. "git show-ref --head --verify HEAD", did not work as expected.

 Will merge to 'next'.


* bc/use-asciidoctor-opt (2017-01-23) 7 commits
 - Makefile: add a knob to enable the use of Asciidoctor
 -

Re: [PATCH v2 0/5] string-list: make string_list_sort() reentrant

2017-01-23 Thread Jeff King

On Sun, Jan 22, 2017 at 06:47:18PM +0100, René Scharfe wrote:

> Use qsort_s() from C11 Annex K to make string_list_sort() safer, in
> particular when called from parallel threads.
> 
> Changes from v1:
> * Renamed HAVE_QSORT_S to HAVE_ISO_QSORT_S in Makefile to disambiguate.
> * Added basic perf test (patch 3).
> * Converted a second caller to QSORT_S, in ref-filter.c (patch 5).

This looks nicely done overall, and I appreciate the perf test.

The speed looks like a reasonable outcome. I'm torn on the qsort_r()
demo patch. I don't think it looks too bad. OTOH, I don't think I would
want to deal with the opposite-argument-order versions.

Is there any interest in people adding the ISO qsort_s() to their libc
implementations? It seems like it's been a fair number of years by now.

-Peff

Re: [PATCH] rebase: pass --signoff option to git am

2017-01-23 Thread Junio C Hamano

Giuseppe Bilotta  writes:

> On Mon, Jan 23, 2017 at 9:16 PM, Junio C Hamano  wrote:
>> Giuseppe Bilotta  writes:
>>
>>> On Mon, Jan 23, 2017 at 7:13 PM, Junio C Hamano  wrote:

 Should we plan to extend this to the interactive backend that is
 shared between rebase -i and rebase -m, too?  Or is this patch
 already sufficient to cover them?
>>>
>>> AFAIK this is sufficient for both, in the sense that I've used it with
>>> git rebase -i and it works.
>>
>> That is a good news and at the same time a bit awkard one ;-)
>>
>> The mention of "passed to 'git am'" twice in the documentation and
>> help text would lead people to think "rebase -i" would not be
>> affected and (1) would need more work to do so, or (2) the user does
>> not want "rebase -i" to be unaffected for whatever reason, and gets
>> surprised to see that it actually does get affected.
>
> I'm not sure I follow. If the user doesn't want to signoff during a
> rebase, they can simply not pass --signoff. If they do, they can not
> pass it. Am I missing something?

alias.

Which also means that there needs to be --no-signoff option that can
be given to countermand an earlier --signoff, if a user did

[alias] rb = rebase --signoff

and wants to disable it one time only with

$ git rb --no-signoff

>
>> In any case, will queue as-is so that we won't lose the patch while
>> waiting for people to raise their opinions.
>
> Thanks.

Thanks.  The final version would also need tests, so it may be a
good time to start thinking about what aspect of this feature wants
to be protected against future breakages.

[PATCH 5/5] sequencer: allow to --skip current commit

2017-01-23 Thread Giuseppe Bilotta

If a sequencing gets interrupted (by a conflict or an empty commit or
whatever), the user can now opt to just skip it passing the `--skip`
command line option, which acts like a `--continue`, except that the
current commit gets skipped.

Signed-off-by: Giuseppe Bilotta 
---
 Documentation/sequencer.txt | 10 +-
 builtin/revert.c|  7 +--
 sequencer.c | 32 
 sequencer.h |  2 +-
 4 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/Documentation/sequencer.txt b/Documentation/sequencer.txt
index 5747f442f2..095d6cd732 100644
--- a/Documentation/sequencer.txt
+++ b/Documentation/sequencer.txt
@@ -1,7 +1,15 @@
 --continue::
Continue the operation in progress using the information in
'.git/sequencer'.  Can be used to continue after resolving
-   conflicts in a failed cherry-pick or revert.
+   conflicts in a failed cherry-pick or revert.  Use `--skip`
+   instead if the current commit should be ignored.
+
+--skip::
+   Skips the current commit, and then continues the operation
+   in progress using the information in '.git/sequencer'.i
+   Can be used to continue to a cherry-pick or rever that was
+   interrupted by an empty commit, or by a commit that conflicts
+   and for which the resolution is to discard the commit.
 
 --quit::
Forget about the current operation in progress.  Can be used
diff --git a/builtin/revert.c b/builtin/revert.c
index aca8a1d9d0..dece0bebf7 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -79,6 +79,7 @@ static int run_sequencer(int argc, const char **argv, struct 
replay_opts *opts)
struct option base_options[] = {
OPT_CMDMODE(0, "quit", , N_("end revert or cherry-pick 
sequence"), 'q'),
OPT_CMDMODE(0, "continue", , N_("resume revert or 
cherry-pick sequence"), 'c'),
+   OPT_CMDMODE(0, "skip", , N_("resume revert or cherry-pick 
sequence, skipping this commit"), 's'),
OPT_CMDMODE(0, "abort", , N_("cancel revert or cherry-pick 
sequence"), 'a'),
OPT_BOOL('n', "no-commit", >no_commit, N_("don't 
automatically commit")),
OPT_BOOL('e', "edit", >edit, N_("edit the commit 
message")),
@@ -127,6 +128,8 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
this_operation = "--quit";
else if (cmd == 'c')
this_operation = "--continue";
+   else if (cmd == 's')
+   this_operation = "--skip";
else {
assert(cmd == 'a');
this_operation = "--abort";
@@ -188,8 +191,8 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
 
if (cmd == 'q')
return sequencer_remove_state(opts);
-   if (cmd == 'c')
-   return sequencer_continue(opts);
+   if (cmd == 'c' || cmd == 's')
+   return sequencer_continue(opts, cmd);
if (cmd == 'a')
return sequencer_rollback(opts);
return sequencer_pick_revisions(opts);
diff --git a/sequencer.c b/sequencer.c
index 333d9112de..cfe8c06989 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1359,21 +1359,45 @@ static int continue_single_pick(void)
return run_command_v_opt(argv, RUN_GIT_CMD);
 }
 
-int sequencer_continue(struct replay_opts *opts)
+/*
+ * Continue the sequencing, after either committing
+ * (cmd == 'c') or skipping (cmd == 's') the current
+ * commit.
+ */
+int sequencer_continue(struct replay_opts *opts, char cmd)
 {
struct todo_list todo_list = TODO_LIST_INIT;
-   int res;
+   int single, res;
 
if (read_and_refresh_cache(opts))
return -1;
 
-   if (!file_exists(get_todo_path(opts)))
-   return continue_single_pick();
+   if (!file_exists(get_todo_path(opts))) {
+   if (cmd == 'c') {
+   return continue_single_pick();
+   } else {
+   assert(cmd == 's');
+   /* Skipping the only commit is equivalent to an abort */
+   return sequencer_rollback(opts);
+   }
+   }
if (read_populate_opts(opts))
return -1;
if ((res = read_populate_todo(_list, opts)))
goto release_todo_list;
 
+   /* If we were asked to skip this commit, rollback
+* and continue with the next */
+   if (cmd == 's') {
+   if ((res = rollback_single_pick()))
+   goto release_todo_list;
+   discard_cache();
+   if ((res = read_cache()) < 0)
+   goto release_todo_list;
+   printf("index unchanged: %d\n", is_index_unchanged());
+   goto skip_this_commit;
+   }
+
/* check

[PATCH 4/5] cherry-pick: allow skipping only redundant commits

2017-01-23 Thread Giuseppe Bilotta

This allows the preservation of originally empty commits with the
combination of flags --allow-empty --skip-redundant-commits.

Signed-off-by: Giuseppe Bilotta 
---
 Documentation/git-cherry-pick.txt |  8 -
 builtin/revert.c  | 18 +++-
 sequencer.c   | 62 ++-
 sequencer.h   |  1 +
 4 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/Documentation/git-cherry-pick.txt 
b/Documentation/git-cherry-pick.txt
index ffced816d6..147e0cde0c 100644
--- a/Documentation/git-cherry-pick.txt
+++ b/Documentation/git-cherry-pick.txt
@@ -138,9 +138,15 @@ effect to your index in a row.
examine the commit. This option overrides that behavior and
creates an empty commit object.  Implies `--allow-empty`.
 
+--skip-redundant-commits::
+   Redundant commits will be skipped altogether. This does not
+   influence commits that were originally empty (see
+   `--allow-empty` and `--skip-empty`).
+
 --skip-empty::
This option causes empty and redundant cherry-picked commits to
-   be skipped without requesting the user intervention.
+   be skipped without requesting the user intervention. Implies
+   `--skip-redundant-commits`.
 
 --strategy=::
Use the given merge strategy.  Should only be used once.
diff --git a/builtin/revert.c b/builtin/revert.c
index ffdd367f99..aca8a1d9d0 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -102,7 +102,8 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
OPT_BOOL(0, "allow-empty", >allow_empty, 
N_("preserve initially empty commits")),
OPT_BOOL(0, "allow-empty-message", 
>allow_empty_message, N_("allow commits with empty messages")),
OPT_BOOL(0, "keep-redundant-commits", 
>keep_redundant_commits, N_("keep redundant, empty commits")),
-   OPT_BOOL(0, "skip-empty", >skip_empty, N_("skip 
redundant, empty commits")),
+   OPT_BOOL(0, "skip-empty", >skip_empty, N_("skip 
both redundant and initially empty commits")),
+   OPT_BOOL(0, "skip-redundant-commits", 
>skip_redundant_commits, N_("skip redundant commits")),
OPT_END(),
};
options = parse_options_concat(options, cp_extra);
@@ -115,6 +116,9 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
/* implies allow_empty */
if (opts->keep_redundant_commits)
opts->allow_empty = 1;
+   /* implies skip_redundant_commits */
+   if (opts->skip_empty)
+   opts->skip_redundant_commits = 1;
 
/* Check for incompatible command line arguments */
if (cmd) {
@@ -147,6 +151,18 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
"--edit", opts->edit,
NULL);
 
+   if (opts->keep_redundant_commits)
+   verify_opt_compatible(me, "--keep-redundant-commits",
+   "--skip-empty", opts->skip_empty,
+   "--skip-redundant-commits", 
opts->skip_redundant_commits,
+   NULL);
+
+   if (opts->keep_redundant_commits)
+   verify_opt_compatible(me, "--skip-empty",
+   "--allow-empty", opts->allow_empty,
+   "--keep-redundant-commits", 
opts->keep_redundant_commits,
+   NULL);
+
if (cmd) {
opts->revs = NULL;
} else {
diff --git a/sequencer.c b/sequencer.c
index 9c01310162..333d9112de 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -563,40 +563,70 @@ static int allow_or_skip_empty(struct replay_opts *opts, 
struct commit *commit)
 {
int index_unchanged, empty_commit;
 
-   /*
-* Four cases:
+   /* We have four options:
 *
-* (1) we do not allow empty at all and error out;
+* --allow-empty (AE)
+* --keep-redundant-commits (KR)
+* --skip-empty (SE)
+* --skip-redundant-commits (SR)
 *
-* (2) we skip empty commits altogether;
+* Additionally, if KR, then AE. And if SE, then SR.
+* 
+* We have three possible states:
+* Not Empty (NE)
+* Originally Empty (OE)
+* made REdundant (RE) (originally not empty)
 *
-* (3) we allow ones that were initially empty, but
-* forbid the ones that become empty;
+* NE always gets committed. The other two depend on the combination
+* of flags.
 *
-* (4) we allow both.
+*  OE outcome | RE outcome | AE  KR  SE  SR
+* Case 0:  0 (error)0 (error) 0   0   0   0
+* Case 1:  1 (allow)0

[PATCH 2/5] sequencer: save/load all options

2017-01-23 Thread Giuseppe Bilotta

Add the missing replay_opts to save_opts and populate_opts, so that an
interrupted cherry-pick will continue with the same setup it had before
the interruption.

Signed-off-by: Giuseppe Bilotta 
---
 sequencer.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/sequencer.c b/sequencer.c
index 672c81b559..3d2f61c979 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -985,6 +985,14 @@ static int populate_opts_cb(const char *key, const char 
*value, void *data)
opts->signoff = git_config_bool_or_int(key, value, _flag);
else if (!strcmp(key, "options.allow-ff"))
opts->allow_ff = git_config_bool_or_int(key, value, 
_flag);
+   else if (!strcmp(key, "options.rerere-autoupdate"))
+   opts->allow_rerere_auto = git_config_bool_or_int(key, value, 
_flag);
+   else if (!strcmp(key, "options.allow-empty"))
+   opts->allow_empty = git_config_bool_or_int(key, value, 
_flag);
+   else if (!strcmp(key, "options.allow-empty-message"))
+   opts->allow_empty_message = git_config_bool_or_int(key, value, 
_flag);
+   else if (!strcmp(key, "options.keep-redundant-commits"))
+   opts->keep_redundant_commits = git_config_bool_or_int(key, 
value, _flag);
else if (!strcmp(key, "options.mainline"))
opts->mainline = git_config_int(key, value);
else if (!strcmp(key, "options.gpg-sign"))
@@ -1233,6 +1241,14 @@ static int save_opts(struct replay_opts *opts)
res |= git_config_set_in_file_gently(opts_file, 
"options.signoff", "true");
if (opts->allow_ff)
res |= git_config_set_in_file_gently(opts_file, 
"options.allow-ff", "true");
+   if (opts->allow_rerere_auto)
+   res |= git_config_set_in_file_gently(opts_file, 
"options.rerere-autoupdate", "true");
+   if (opts->allow_empty)
+   res |= git_config_set_in_file_gently(opts_file, 
"options.allow-empty", "true");
+   if (opts->allow_empty_message)
+   res |= git_config_set_in_file_gently(opts_file, 
"options.allow-empty-message", "true");
+   if (opts->keep_redundant_commits)
+   res |= git_config_set_in_file_gently(opts_file, 
"options.keep-redundant-commits", "true");
if (opts->mainline) {
struct strbuf buf = STRBUF_INIT;
strbuf_addf(, "%d", opts->mainline);
-- 
2.11.0.616.gd72966cf44.dirty

[PATCH 1/5] sequencer: sort options load/save by struct position

2017-01-23 Thread Giuseppe Bilotta

No functional change. The order in which options are serialized and
reloaded is now the same in which they appear in the replay_opts
structure. This makes it easier to spot when we forget to
serialize/reload an option value.

Signed-off-by: Giuseppe Bilotta 
---
 sequencer.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 9adb7bbf1d..672c81b559 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -975,22 +975,22 @@ static int populate_opts_cb(const char *key, const char 
*value, void *data)
 
if (!value)
error_flag = 0;
-   else if (!strcmp(key, "options.no-commit"))
-   opts->no_commit = git_config_bool_or_int(key, value, 
_flag);
else if (!strcmp(key, "options.edit"))
opts->edit = git_config_bool_or_int(key, value, _flag);
-   else if (!strcmp(key, "options.signoff"))
-   opts->signoff = git_config_bool_or_int(key, value, _flag);
else if (!strcmp(key, "options.record-origin"))
opts->record_origin = git_config_bool_or_int(key, value, 
_flag);
+   else if (!strcmp(key, "options.no-commit"))
+   opts->no_commit = git_config_bool_or_int(key, value, 
_flag);
+   else if (!strcmp(key, "options.signoff"))
+   opts->signoff = git_config_bool_or_int(key, value, _flag);
else if (!strcmp(key, "options.allow-ff"))
opts->allow_ff = git_config_bool_or_int(key, value, 
_flag);
else if (!strcmp(key, "options.mainline"))
opts->mainline = git_config_int(key, value);
-   else if (!strcmp(key, "options.strategy"))
-   git_config_string_dup(>strategy, key, value);
else if (!strcmp(key, "options.gpg-sign"))
git_config_string_dup(>gpg_sign, key, value);
+   else if (!strcmp(key, "options.strategy"))
+   git_config_string_dup(>strategy, key, value);
else if (!strcmp(key, "options.strategy-option")) {
ALLOC_GROW(opts->xopts, opts->xopts_nr + 1, opts->xopts_alloc);
opts->xopts[opts->xopts_nr++] = xstrdup(value);
@@ -1223,14 +1223,14 @@ static int save_opts(struct replay_opts *opts)
const char *opts_file = git_path_opts_file();
int res = 0;
 
-   if (opts->no_commit)
-   res |= git_config_set_in_file_gently(opts_file, 
"options.no-commit", "true");
if (opts->edit)
res |= git_config_set_in_file_gently(opts_file, "options.edit", 
"true");
-   if (opts->signoff)
-   res |= git_config_set_in_file_gently(opts_file, 
"options.signoff", "true");
if (opts->record_origin)
res |= git_config_set_in_file_gently(opts_file, 
"options.record-origin", "true");
+   if (opts->no_commit)
+   res |= git_config_set_in_file_gently(opts_file, 
"options.no-commit", "true");
+   if (opts->signoff)
+   res |= git_config_set_in_file_gently(opts_file, 
"options.signoff", "true");
if (opts->allow_ff)
res |= git_config_set_in_file_gently(opts_file, 
"options.allow-ff", "true");
if (opts->mainline) {
@@ -1239,10 +1239,10 @@ static int save_opts(struct replay_opts *opts)
res |= git_config_set_in_file_gently(opts_file, 
"options.mainline", buf.buf);
strbuf_release();
}
+   if (opts->gpg_sign)
+   res |= git_config_set_in_file_gently(opts_file, 
"options.gpg-sign", opts->gpg_sign);
if (opts->strategy)
res |= git_config_set_in_file_gently(opts_file, 
"options.strategy", opts->strategy);
-   if (opts->gpg_sign)
-   res |= git_config_set_in_file_gently(opts_file, 
"options.gpg-sign", opts->gpg_sign);
if (opts->xopts) {
int i;
for (i = 0; i < opts->xopts_nr; i++)
-- 
2.11.0.616.gd72966cf44.dirty

[PATCH 0/5] sequencer: allow skipping commits

2017-01-23 Thread Giuseppe Bilotta

This series introduces a few options to the sequencer,
to allow skipping unwanted/unnecessary commits.

The first patch is just cleanup. The second fixes a potential issue
about sequencing options not being correctly remembered across
interruptions.

The next two introduce cherry-pick options to skip empty (or only
redundant) commits. The two options are introduced separately because
of the complexity associated with the possible combinations that can be
had.

The last commit allows --skip as a reset + --continue, to quickly skip
the current commit during a failed cherry-pick or revert (for example
because a better version of the commit was already merged).

Giuseppe Bilotta (5):
  sequencer: sort options load/save by struct position
  sequencer: save/load all options
  cherry-pick: option to skip empty commits
  cherry-pick: allow skipping only redundant commits
  sequencer: allow to --skip current commit

 Documentation/git-cherry-pick.txt |  10 +++
 Documentation/sequencer.txt   |  10 ++-
 builtin/revert.c  |  24 +-
 sequencer.c   | 163 ++
 sequencer.h   |   4 +-
 5 files changed, 176 insertions(+), 35 deletions(-)

-- 
2.11.0.616.gd72966cf44.dirty

[PATCH 3/5] cherry-pick: option to skip empty commits

2017-01-23 Thread Giuseppe Bilotta

This allows cherry-picking a set of commits, some of which may be
redundant, without stopping to ask for the user intervention.

Signed-off-by: Giuseppe Bilotta 
---
 Documentation/git-cherry-pick.txt |  4 
 builtin/revert.c  |  1 +
 sequencer.c   | 45 +++
 sequencer.h   |  1 +
 4 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-cherry-pick.txt 
b/Documentation/git-cherry-pick.txt
index d35d771fc8..ffced816d6 100644
--- a/Documentation/git-cherry-pick.txt
+++ b/Documentation/git-cherry-pick.txt
@@ -138,6 +138,10 @@ effect to your index in a row.
examine the commit. This option overrides that behavior and
creates an empty commit object.  Implies `--allow-empty`.
 
+--skip-empty::
+   This option causes empty and redundant cherry-picked commits to
+   be skipped without requesting the user intervention.
+
 --strategy=::
Use the given merge strategy.  Should only be used once.
See the MERGE STRATEGIES section in linkgit:git-merge[1]
diff --git a/builtin/revert.c b/builtin/revert.c
index 4ca5b51544..ffdd367f99 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -102,6 +102,7 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
OPT_BOOL(0, "allow-empty", >allow_empty, 
N_("preserve initially empty commits")),
OPT_BOOL(0, "allow-empty-message", 
>allow_empty_message, N_("allow commits with empty messages")),
OPT_BOOL(0, "keep-redundant-commits", 
>keep_redundant_commits, N_("keep redundant, empty commits")),
+   OPT_BOOL(0, "skip-empty", >skip_empty, N_("skip 
redundant, empty commits")),
OPT_END(),
};
options = parse_options_concat(options, cp_extra);
diff --git a/sequencer.c b/sequencer.c
index 3d2f61c979..9c01310162 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -550,22 +550,32 @@ static int is_original_commit_empty(struct commit *commit)
 
 /*
  * Do we run "git commit" with "--allow-empty"?
+ *
+ * Or do we just skip this empty commit?
+ *
+ * Returns 1 if a commit should be done with --allow-empty,
+ * 0 if a commit should be done without --allow-empty,
+ * 2 if no commit should be done at all (skip empty commit)
+ * negative values in case of error
+ *
  */
-static int allow_empty(struct replay_opts *opts, struct commit *commit)
+static int allow_or_skip_empty(struct replay_opts *opts, struct commit *commit)
 {
int index_unchanged, empty_commit;
 
/*
-* Three cases:
+* Four cases:
 *
-* (1) we do not allow empty at all and error out.
+* (1) we do not allow empty at all and error out;
 *
-* (2) we allow ones that were initially empty, but
+* (2) we skip empty commits altogether;
+*
+* (3) we allow ones that were initially empty, but
 * forbid the ones that become empty;
 *
-* (3) we allow both.
+* (4) we allow both.
 */
-   if (!opts->allow_empty)
+   if (!opts->allow_empty && !opts->skip_empty)
return 0; /* let "git commit" barf as necessary */
 
index_unchanged = is_index_unchanged();
@@ -574,6 +584,9 @@ static int allow_empty(struct replay_opts *opts, struct 
commit *commit)
if (!index_unchanged)
return 0; /* we do not have to say --allow-empty */
 
+   if (opts->skip_empty)
+   return 2;
+
if (opts->keep_redundant_commits)
return 1;
 
@@ -612,7 +625,7 @@ static int do_pick_commit(enum todo_command command, struct 
commit *commit,
const char *base_label, *next_label;
struct commit_message msg = { NULL, NULL, NULL, NULL };
struct strbuf msgbuf = STRBUF_INIT;
-   int res, unborn = 0, allow;
+   int res = 0, unborn = 0, allow;
 
if (opts->no_commit) {
/*
@@ -771,12 +784,13 @@ static int do_pick_commit(enum todo_command command, 
struct commit *commit,
goto leave;
}
 
-   allow = allow_empty(opts, commit);
+   allow = allow_or_skip_empty(opts, commit);
if (allow < 0) {
res = allow;
goto leave;
}
-   if (!opts->no_commit)
+   /* allow == 2 means skip this commit */
+   if (allow != 2 && !opts->no_commit)
res = run_git_commit(opts->edit ? NULL : git_path_merge_msg(),
 opts, allow, opts->edit, 0, 0);
 
@@ -993,6 +1007,8 @@ static int populate_opts_cb(const char *key, const char 
*value, void *data)
opts->allow_empty_message = git_config_bool_or_int(key, value, 
_flag);
else if (!strcmp(key, "options.keep-redundant-commits"))
opts->keep_redundant_commits

Re: [PATCH] rebase: pass --signoff option to git am

2017-01-23 Thread Giuseppe Bilotta

On Mon, Jan 23, 2017 at 9:16 PM, Junio C Hamano  wrote:
> Giuseppe Bilotta  writes:
>
>> On Mon, Jan 23, 2017 at 7:13 PM, Junio C Hamano  wrote:
>>>
>>> Should we plan to extend this to the interactive backend that is
>>> shared between rebase -i and rebase -m, too?  Or is this patch
>>> already sufficient to cover them?
>>
>> AFAIK this is sufficient for both, in the sense that I've used it with
>> git rebase -i and it works.
>
> That is a good news and at the same time a bit awkard one ;-)
>
> The mention of "passed to 'git am'" twice in the documentation and
> help text would lead people to think "rebase -i" would not be
> affected and (1) would need more work to do so, or (2) the user does
> not want "rebase -i" to be unaffected for whatever reason, and gets
> surprised to see that it actually does get affected.

I'm not sure I follow. If the user doesn't want to signoff during a
rebase, they can simply not pass --signoff. If they do, they can not
pass it. Am I missing something?

> In any case, will queue as-is so that we won't lose the patch while
> waiting for people to raise their opinions.

Thanks.

-- 
Giuseppe "Oblomov" Bilotta

Re: [RFC PATCH] Option to allow cherry-pick to skip empty commits

2017-01-23 Thread Giuseppe Bilotta

On Mon, Jan 23, 2017 at 9:10 PM, Junio C Hamano  wrote:
> Giuseppe Bilotta  writes:
>
>> ... I still don't see how to force a complete reread of the index
>> after running a git reset (which I need for the --skip command), ...
>
> Do you mean discard_index() or discard_cache() followed by
> read_index() or read_cache(), or do you mean something more
> elaborate?

Apparently discard_cache() followed by read_cache() is exactly what I
needed. New patchset incoming 8-)




-- 
Giuseppe "Oblomov" Bilotta

Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure

2017-01-23 Thread Brandon Williams

On 01/23, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > The last big hurdle towards a thread-safe API for the attribute system
> > is the reliance on a global attribute stack that is modified during each
> > call into the attribute system.
> 
> The same comment as 22/27 applies here.  
> 
> It is not an immediate problem we need to solve in the scope of this
> series, in the sense that a Big Subsystem Lock for the attribute
> subsystem around git_check_attr() function can make it thread-safe.
> 
> But if we want to make it truly threadable without a Big Subsystem
> Lock, this and the other one would need to become per-thread at
> least.  I think the check_all_attrs scoreboard, which is the topic
> of 22/27, should become per git_check_attr() invocation (immediately
> before making a call to collect_some_attrs(), prepare an array with
> map.size elements and use that as a scoreboard, for example).  I do
> not think we can be sure that the "slimmed down attr stack" 15/27
> envisions would help performance without benchmarking, but if it
> does, then the "attr stack that holds entries that are relevant to
> the current query" would have to become per  pair, as
> two threads may be executing the same codepath looking for the same
> set of attributes (i.e. sharing a single attr_check instance), but
> working on two different parts of a tree structure.
> 
> > This patch removes this global stack and instead a stack is stored
> > locally in each attr_check instance.  This opens up the opportunity for
> > future optimizations to customize the attribute stack for the attributes
> > that a particular attr_check struct is interested in.
> 
> This is still true.  But two threads hitting the same attr_check
> would make the stack thrash between the paths they are working on to
> hurt performance once we go multi-threaded.
> 
> Perhaps, provided if the "slimmed down attr stack" is indeed a good
> idea, we should keep the global hashmap that holds everything we
> read from .gitattributes tree-wide (i.e. as in your v1), _and_
> introduce a mechanism to keep the slimmed down version that is
> relevant to check[] for each thread somehow.

Sounds good,  I'll reintroduce the hashmap of stacks that I had in v1
and instead make the all_attrs array that is used the in collection
process allocated at invocation time.  That will cause a bit of
allocation churn but in reality shouldn't make that much of an impact.

As we discussed off-line I'll also do the rework to break up the
question and result.  That way two threads can be executing using the
same attr_check structure.

-- 
Brandon Williams

Re: [PATCH v2 25/27] attr: store attribute stack in attr_check structure

2017-01-23 Thread Junio C Hamano

Brandon Williams  writes:

> The last big hurdle towards a thread-safe API for the attribute system
> is the reliance on a global attribute stack that is modified during each
> call into the attribute system.

The same comment as 22/27 applies here.  

It is not an immediate problem we need to solve in the scope of this
series, in the sense that a Big Subsystem Lock for the attribute
subsystem around git_check_attr() function can make it thread-safe.

But if we want to make it truly threadable without a Big Subsystem
Lock, this and the other one would need to become per-thread at
least.  I think the check_all_attrs scoreboard, which is the topic
of 22/27, should become per git_check_attr() invocation (immediately
before making a call to collect_some_attrs(), prepare an array with
map.size elements and use that as a scoreboard, for example).  I do
not think we can be sure that the "slimmed down attr stack" 15/27
envisions would help performance without benchmarking, but if it
does, then the "attr stack that holds entries that are relevant to
the current query" would have to become per  pair, as
two threads may be executing the same codepath looking for the same
set of attributes (i.e. sharing a single attr_check instance), but
working on two different parts of a tree structure.

> This patch removes this global stack and instead a stack is stored
> locally in each attr_check instance.  This opens up the opportunity for
> future optimizations to customize the attribute stack for the attributes
> that a particular attr_check struct is interested in.

This is still true.  But two threads hitting the same attr_check
would make the stack thrash between the paths they are working on to
hurt performance once we go multi-threaded.

Perhaps, provided if the "slimmed down attr stack" is indeed a good
idea, we should keep the global hashmap that holds everything we
read from .gitattributes tree-wide (i.e. as in your v1), _and_
introduce a mechanism to keep the slimmed down version that is
relevant to check[] for each thread somehow.

Re: [PATCH v2 22/27] attr: eliminate global check_all_attr array

2017-01-23 Thread Junio C Hamano

Brandon Williams  writes:

> Currently there is a reliance on 'check_all_attr' which is a global
> array of 'attr_check_item' items which is used to store the value of
> each attribute during the collection process.
>
> This patch eliminates this global and instead creates an array per
> 'attr_check' instance which is then used in the attribute collection
> process.  This brings the attribute system one step closer to being
> thread-safe.

Hmph, how close is "closer"?  

My understanding of this is that a codepath that has a single
"attr_check" can be executing simultaneously by multiple threads,
and "attr_check" is meant to contain read-only stuff sharable by
them.  Unless this check_all_attr is tied to the attr_result (which
in turn is tied to each invocation and typically is on stack), the
resulting code would not be safe, right?

[PATCH v2 27/27] attr: reformat git_attr_set_direction() function

2017-01-23 Thread Brandon Williams

Move the 'git_attr_set_direction()' up to be closer to the variables
that it modifies as well as a small formatting by renaming the variable
'new' to 'new_direction' so that it is more descriptive.

Update the comment about how 'direction' is used to read the state of
the world.  It should be noted that callers of
'git_attr_set_direction()' should ensure that other threads are not
making calls into the attribute system until after the call to
'git_attr_set_direction()' completes.  This function essentially acts as
reset button for the attribute system and should be handled with care.

Signed-off-by: Brandon Williams 
---
 attr.c | 49 -
 attr.h |  3 ++-
 2 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/attr.c b/attr.c
index c2ea5cb29..f35c1107f 100644
--- a/attr.c
+++ b/attr.c
@@ -578,26 +578,30 @@ static struct attr_stack *read_attr_from_array(const char 
**list)
 }
 
 /*
- * NEEDSWORK: these two are tricky.  The callers assume there is a
- * single, system-wide global state "where we read attributes from?"
- * and when the state is flipped by calling git_attr_set_direction(),
- * attr_stack is discarded so that subsequent attr_check will lazily
- * read from the right place.  And they do not know or care who called
- * by them uses the attribute subsystem, hence have no knowledge of
- * existing git_attr_check instances or future ones that will be
- * created).
- *
- * Probably we need a thread_local that holds these two variables,
- * and a list of git_attr_check instances (which need to be maintained
- * by hooking into git_attr_check_alloc(), git_attr_check_initl(), and
- * git_attr_check_clear().  Then git_attr_set_direction() updates the
- * fields in that thread_local for these two variables, iterate over
- * all the active git_attr_check instances and discard the attr_stack
- * they hold.  Yuck, but it sounds doable.
+ * Callers into the attribute system assume there is a single, system-wide
+ * global state where attributes are read from and when the state is flipped by
+ * calling git_attr_set_direction(), the stack frames that have been
+ * constructed need to be discarded so so that subsequent calls into the
+ * attribute system will lazily read from the right place.  Since changing
+ * direction causes a global paradigm shift, it should not ever be called while
+ * another thread could potentially be calling into the attribute system.
  */
 static enum git_attr_direction direction;
 static struct index_state *use_index;
 
+void git_attr_set_direction(enum git_attr_direction new_direction,
+   struct index_state *istate)
+{
+   if (is_bare_repository() && new_direction != GIT_ATTR_INDEX)
+   die("BUG: non-INDEX attr direction in a bare repo");
+
+   if (new_direction != direction)
+   drop_attr_stack();
+
+   direction = new_direction;
+   use_index = istate;
+}
+
 static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 {
FILE *fp = fopen(path, "r");
@@ -1132,19 +1136,6 @@ void attr_check_free(struct attr_check *check)
}
 }
 
-void git_attr_set_direction(enum git_attr_direction new, struct index_state 
*istate)
-{
-   enum git_attr_direction old = direction;
-
-   if (is_bare_repository() && new != GIT_ATTR_INDEX)
-   die("BUG: non-INDEX attr direction in a bare repo");
-
-   direction = new;
-   if (new != old)
-   drop_attr_stack();
-   use_index = istate;
-}
-
 void attr_start(void)
 {
 #ifndef NO_PTHREADS
diff --git a/attr.h b/attr.h
index da7c3a229..62dbcb6b8 100644
--- a/attr.h
+++ b/attr.h
@@ -76,7 +76,8 @@ enum git_attr_direction {
GIT_ATTR_CHECKOUT,
GIT_ATTR_INDEX
 };
-void git_attr_set_direction(enum git_attr_direction, struct index_state *);
+void git_attr_set_direction(enum git_attr_direction new_direction,
+   struct index_state *istate);
 
 extern void attr_start(void);
 
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 26/27] attr: push the bare repo check into read_attr()

2017-01-23 Thread Brandon Williams

Push the bare repository check into the 'read_attr()' function.  This
avoids needing to have extra logic which creates an empty stack frame
when inside a bare repo as a similar bit of logic already exists in the
'read_attr()' function.

Signed-off-by: Brandon Williams 
---
 attr.c | 114 +++--
 1 file changed, 54 insertions(+), 60 deletions(-)

diff --git a/attr.c b/attr.c
index d64d1959e..c2ea5cb29 100644
--- a/attr.c
+++ b/attr.c
@@ -648,25 +648,28 @@ static struct attr_stack *read_attr_from_index(const char 
*path, int macro_ok)
 
 static struct attr_stack *read_attr(const char *path, int macro_ok)
 {
-   struct attr_stack *res;
+   struct attr_stack *res = NULL;
 
-   if (direction == GIT_ATTR_CHECKOUT) {
+   if (direction == GIT_ATTR_INDEX) {
res = read_attr_from_index(path, macro_ok);
-   if (!res)
-   res = read_attr_from_file(path, macro_ok);
-   }
-   else if (direction == GIT_ATTR_CHECKIN) {
-   res = read_attr_from_file(path, macro_ok);
-   if (!res)
-   /*
-* There is no checked out .gitattributes file there, 
but
-* we might have it in the index.  We allow operation 
in a
-* sparsely checked out work tree, so read from it.
-*/
+   } else if (!is_bare_repository()) {
+   if (direction == GIT_ATTR_CHECKOUT) {
res = read_attr_from_index(path, macro_ok);
+   if (!res)
+   res = read_attr_from_file(path, macro_ok);
+   } else if (direction == GIT_ATTR_CHECKIN) {
+   res = read_attr_from_file(path, macro_ok);
+   if (!res)
+   /*
+* There is no checked out .gitattributes file
+* there, but we might have it in the index.
+* We allow operation in a sparsely checked out
+* work tree, so read from it.
+*/
+   res = read_attr_from_index(path, macro_ok);
+   }
}
-   else
-   res = read_attr_from_index(path, macro_ok);
+
if (!res)
res = xcalloc(1, sizeof(*res));
return res;
@@ -758,10 +761,7 @@ static void bootstrap_attr_stack(struct attr_stack **stack)
}
 
/* root directory */
-   if (!is_bare_repository() || direction == GIT_ATTR_INDEX)
-   e = read_attr(GITATTRIBUTES_FILE, 1);
-   else
-   e = xcalloc(1, sizeof(struct attr_stack));
+   e = read_attr(GITATTRIBUTES_FILE, 1);
push_stack(stack, e, xstrdup(""), 0);
 
/* info frame */
@@ -778,6 +778,7 @@ static void prepare_attr_stack(const char *path, int dirlen,
   struct attr_stack **stack)
 {
struct attr_stack *info;
+   struct strbuf pathbuf = STRBUF_INIT;
 
/*
 * At the bottom of the attribute stack is the built-in
@@ -824,54 +825,47 @@ static void prepare_attr_stack(const char *path, int 
dirlen,
}
 
/*
-* Read from parent directories and push them down
+* bootstrap_attr_stack() should have added, and the
+* above loop should have stopped before popping, the
+* root element whose attr_stack->origin is set to an
+* empty string.
 */
-   if (!is_bare_repository() || direction == GIT_ATTR_INDEX) {
-   /*
-* bootstrap_attr_stack() should have added, and the
-* above loop should have stopped before popping, the
-* root element whose attr_stack->origin is set to an
-* empty string.
-*/
-   struct strbuf pathbuf = STRBUF_INIT;
-
-   assert((*stack)->origin);
-   strbuf_addstr(, (*stack)->origin);
-   /* Build up to the directory 'path' is in */
-   while (pathbuf.len < dirlen) {
-   size_t len = pathbuf.len;
-   struct attr_stack *next;
-   char *origin;
-
-   /* Skip path-separator */
-   if (len < dirlen && is_dir_sep(path[len]))
-   len++;
-   /* Find the end of the next component */
-   while (len < dirlen && !is_dir_sep(path[len]))
-   len++;
-
-   if (pathbuf.len > 0)
-   strbuf_addch(, '/');
-   strbuf_add(, path + pathbuf.len,
-  (len - pathbuf.len));
-   strbuf_addf(, "/%s", GITATTRIBUTES_FILE);
-
-

[PATCH v2 25/27] attr: store attribute stack in attr_check structure

2017-01-23 Thread Brandon Williams

The last big hurdle towards a thread-safe API for the attribute system
is the reliance on a global attribute stack that is modified during each
call into the attribute system.

This patch removes this global stack and instead a stack is stored
locally in each attr_check instance.  This opens up the opportunity for
future optimizations to customize the attribute stack for the attributes
that a particular attr_check struct is interested in.

One caveat with pushing the attribute stack into the attr_check
structure is that the attribute system now needs to keep track of all
active attr_check instances.  Due to the direction mechanism the stack
needs to be dropped when the direction is switched.  In order to ensure
correctness when the direction is changed the attribute system needs to
iterate through all active attr_check instances and drop each of their
stacks.

Signed-off-by: Brandon Williams 
---
 attr.c | 277 -
 attr.h |   3 +
 2 files changed, 193 insertions(+), 87 deletions(-)

diff --git a/attr.c b/attr.c
index 95456503e..d64d1959e 100644
--- a/attr.c
+++ b/attr.c
@@ -434,17 +434,16 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
  * .gitignore file and info/excludes file as a fallback.
  */
 
-/* NEEDSWORK: This will become per git_attr_check */
-static struct attr_stack {
+struct attr_stack {
struct attr_stack *prev;
char *origin;
size_t originlen;
unsigned num_matches;
unsigned alloc;
struct match_attr **attrs;
-} *attr_stack;
+};
 
-static void free_attr_elem(struct attr_stack *e)
+static void attr_stack_free(struct attr_stack *e)
 {
int i;
free(e->origin);
@@ -467,6 +466,85 @@ static void free_attr_elem(struct attr_stack *e)
free(e);
 }
 
+/* List of all attr_check structs; access should be surrounded by mutex */
+static struct check_vector {
+   size_t nr;
+   size_t alloc;
+   struct attr_check **checks;
+#ifndef NO_PTHREADS
+   pthread_mutex_t mutex;
+#endif
+} check_vector;
+
+static inline void vector_lock(void)
+{
+#ifndef NO_PTHREADS
+   pthread_mutex_lock(_vector.mutex);
+#endif
+}
+
+static inline void vector_unlock(void)
+{
+#ifndef NO_PTHREADS
+   pthread_mutex_unlock(_vector.mutex);
+#endif
+}
+
+static void check_vector_add(struct attr_check *c)
+{
+   vector_lock();
+
+   ALLOC_GROW(check_vector.checks,
+  check_vector.nr + 1,
+  check_vector.alloc);
+   check_vector.checks[check_vector.nr++] = c;
+
+   vector_unlock();
+}
+
+static void check_vector_remove(struct attr_check *check)
+{
+   int i;
+
+   vector_lock();
+
+   /* Find entry */
+   for (i = 0; i < check_vector.nr; i++)
+   if (check_vector.checks[i] == check)
+   break;
+
+   if (i >= check_vector.nr)
+   die("BUG: no entry found");
+
+   /* shift entries over */
+   for (; i < check_vector.nr - 1; i++)
+   check_vector.checks[i] = check_vector.checks[i + 1];
+
+   check_vector.nr--;
+
+   vector_unlock();
+}
+
+/* Iterate through all attr_check instances and drop their stacks */
+static void drop_attr_stack(void)
+{
+   int i;
+
+   vector_lock();
+
+   for (i = 0; i < check_vector.nr; i++) {
+   struct attr_stack **stack = _vector.checks[i]->stack;
+
+   while (*stack) {
+   struct attr_stack *elem = *stack;
+   *stack = elem->prev;
+   attr_stack_free(elem);
+   }
+   }
+
+   vector_unlock();
+}
+
 static const char *builtin_attr[] = {
"[attr]binary -diff -merge -text",
NULL,
@@ -621,15 +699,6 @@ static void debug_set(const char *what, const char *match, 
struct git_attr *attr
 #define debug_set(a,b,c,d) do { ; } while (0)
 #endif /* DEBUG_ATTR */
 
-static void drop_attr_stack(void)
-{
-   while (attr_stack) {
-   struct attr_stack *elem = attr_stack;
-   attr_stack = elem->prev;
-   free_attr_elem(elem);
-   }
-}
-
 static const char *git_etc_gitattributes(void)
 {
static const char *system_wide;
@@ -638,6 +707,14 @@ static const char *git_etc_gitattributes(void)
return system_wide;
 }
 
+static const char *get_home_gitattributes(void)
+{
+   if (!git_attributes_file)
+   git_attributes_file = xdg_config_home("attributes");
+
+   return git_attributes_file;
+}
+
 static int git_attr_system(void)
 {
return !git_env_bool("GIT_ATTR_NOSYSTEM", 0);
@@ -657,47 +734,50 @@ static void push_stack(struct attr_stack **attr_stack_p,
}
 }
 
-static void bootstrap_attr_stack(void)
+static void bootstrap_attr_stack(struct attr_stack **stack)
 {
-   struct attr_stack *elem;
+   struct attr_stack *e;
 
-   if (attr_stack)
+   if (*stack)

[PATCH v2 19/27] attr: pass struct attr_check to collect_some_attrs

2017-01-23 Thread Brandon Williams

The old callchain used to take an array of attr_check_item items.
Instead pass the 'attr_check' container object to 'collect_some_attrs()'
and access the fields in the data structure directly.

Signed-off-by: Brandon Williams 
---
 attr.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/attr.c b/attr.c
index da727e3fd..e58fa340c 100644
--- a/attr.c
+++ b/attr.c
@@ -777,9 +777,7 @@ static int macroexpand_one(int nr, int rem)
  * check_all_attr. If num is non-zero, only attributes in check[] are
  * collected. Otherwise all attributes are collected.
  */
-static void collect_some_attrs(const char *path, int num,
-  struct attr_check_item *check)
-
+static void collect_some_attrs(const char *path, struct attr_check *check)
 {
struct attr_stack *stk;
int i, pathlen, rem, dirlen;
@@ -802,17 +800,18 @@ static void collect_some_attrs(const char *path, int num,
prepare_attr_stack(path, dirlen);
for (i = 0; i < attr_nr; i++)
check_all_attr[i].value = ATTR__UNKNOWN;
-   if (num && !cannot_trust_maybe_real) {
+   if (check->check_nr && !cannot_trust_maybe_real) {
rem = 0;
-   for (i = 0; i < num; i++) {
-   if (!check[i].attr->maybe_real) {
+   for (i = 0; i < check->check_nr; i++) {
+   const struct git_attr *a = check->check[i].attr;
+   if (!a->maybe_real) {
struct attr_check_item *c;
-   c = check_all_attr + check[i].attr->attr_nr;
+   c = check_all_attr + a->attr_nr;
c->value = ATTR__UNSET;
rem++;
}
}
-   if (rem == num)
+   if (rem == check->check_nr)
return;
}
 
@@ -821,18 +820,17 @@ static void collect_some_attrs(const char *path, int num,
rem = fill(path, pathlen, basename_offset, stk, rem);
 }
 
-static int git_check_attrs(const char *path, int num,
-  struct attr_check_item *check)
+int git_check_attr(const char *path, struct attr_check *check)
 {
int i;
 
-   collect_some_attrs(path, num, check);
+   collect_some_attrs(path, check);
 
-   for (i = 0; i < num; i++) {
-   const char *value = 
check_all_attr[check[i].attr->attr_nr].value;
+   for (i = 0; i < check->check_nr; i++) {
+   const char *value = 
check_all_attr[check->check[i].attr->attr_nr].value;
if (value == ATTR__UNKNOWN)
value = ATTR__UNSET;
-   check[i].value = value;
+   check->check[i].value = value;
}
 
return 0;
@@ -843,7 +841,7 @@ void git_all_attrs(const char *path, struct attr_check 
*check)
int i;
 
attr_check_reset(check);
-   collect_some_attrs(path, check->check_nr, check->check);
+   collect_some_attrs(path, check);
 
for (i = 0; i < attr_nr; i++) {
const char *name = check_all_attr[i].attr->name;
@@ -856,11 +854,6 @@ void git_all_attrs(const char *path, struct attr_check 
*check)
}
 }
 
-int git_check_attr(const char *path, struct attr_check *check)
-{
-   return git_check_attrs(path, check->check_nr, check->check);
-}
-
 struct attr_check *attr_check_alloc(void)
 {
return xcalloc(1, sizeof(struct attr_check));
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 18/27] attr: retire git_check_attrs() API

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Since nobody uses the old API, make it file-scope static, and update
the documentation to describe the new API.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 Documentation/technical/api-gitattributes.txt | 86 +--
 attr.c|  3 +-
 attr.h|  1 -
 3 files changed, 58 insertions(+), 32 deletions(-)

diff --git a/Documentation/technical/api-gitattributes.txt 
b/Documentation/technical/api-gitattributes.txt
index 260266867..82f5130e7 100644
--- a/Documentation/technical/api-gitattributes.txt
+++ b/Documentation/technical/api-gitattributes.txt
@@ -16,10 +16,15 @@ Data Structure
of no interest to the calling programs.  The name of the
attribute can be retrieved by calling `git_attr_name()`.
 
-`struct git_attr_check`::
+`struct attr_check_item`::
 
-   This structure represents a set of attributes to check in a call
-   to `git_check_attr()` function, and receives the results.
+   This structure represents one attribute and its value.
+
+`struct attr_check`::
+
+   This structure represents a collection of `attr_check_item`.
+   It is passed to `git_check_attr()` function, specifying the
+   attributes to check, and receives their values.
 
 
 Attribute Values
@@ -27,7 +32,7 @@ Attribute Values
 
 An attribute for a path can be in one of four states: Set, Unset,
 Unspecified or set to a string, and `.value` member of `struct
-git_attr_check` records it.  There are three macros to check these:
+attr_check_item` records it.  There are three macros to check these:
 
 `ATTR_TRUE()`::
 
@@ -48,49 +53,51 @@ value of the attribute for the path.
 Querying Specific Attributes
 
 
-* Prepare an array of `struct git_attr_check` to define the list of
-  attributes you would want to check.  To populate this array, you would
-  need to define necessary attributes by calling `git_attr()` function.
+* Prepare `struct attr_check` using attr_check_initl()
+  function, enumerating the names of attributes whose values you are
+  interested in, terminated with a NULL pointer.  Alternatively, an
+  empty `struct attr_check` can be prepared by calling
+  `attr_check_alloc()` function and then attributes you want to
+  ask about can be added to it with `attr_check_append()`
+  function.
 
 * Call `git_check_attr()` to check the attributes for the path.
 
-* Inspect `git_attr_check` structure to see how each of the attribute in
-  the array is defined for the path.
+* Inspect `attr_check` structure to see how each of the
+  attribute in the array is defined for the path.
 
 
 Example
 ---
 
-To see how attributes "crlf" and "indent" are set for different paths.
+To see how attributes "crlf" and "ident" are set for different paths.
 
-. Prepare an array of `struct git_attr_check` with two elements (because
-  we are checking two attributes).  Initialize their `attr` member with
-  pointers to `struct git_attr` obtained by calling `git_attr()`:
+. Prepare a `struct attr_check` with two elements (because
+  we are checking two attributes):
 
 
-static struct git_attr_check check[2];
+static struct attr_check *check;
 static void setup_check(void)
 {
-   if (check[0].attr)
+   if (check)
return; /* already done */
-   check[0].attr = git_attr("crlf");
-   check[1].attr = git_attr("ident");
+   check = attr_check_initl("crlf", "ident", NULL);
 }
 
 
-. Call `git_check_attr()` with the prepared array of `struct git_attr_check`:
+. Call `git_check_attr()` with the prepared `struct attr_check`:
 
 
const char *path;
 
setup_check();
-   git_check_attr(path, ARRAY_SIZE(check), check);
+   git_check_attr(path, check);
 
 
-. Act on `.value` member of the result, left in `check[]`:
+. Act on `.value` member of the result, left in `check->check[]`:
 
 
-   const char *value = check[0].value;
+   const char *value = check->check[0].value;
 
if (ATTR_TRUE(value)) {
The attribute is Set, by listing only the name of the
@@ -109,20 +116,39 @@ static void setup_check(void)
}
 
 
+To see how attributes in argv[] are set for different paths, only
+the first step in the above would be different.
+
+
+static struct attr_check *check;
+static void setup_check(const char **argv)
+{
+   check = attr_check_alloc();
+   while (*argv) {
+   struct git_attr *attr = git_attr(*argv);
+   attr_check_append(check, attr);
+   argv++;
+   }
+}
+
+
 
 Querying All Attributes
 ---
 
 To get the values of all attributes associated with a file:
 
-* Call `git_all_attrs()`, which returns an array

[PATCH v2 06/27] attr.c: mark where #if DEBUG ends more clearly

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 9bdf87a6f..17297fffe 100644
--- a/attr.c
+++ b/attr.c
@@ -469,7 +469,7 @@ static void debug_set(const char *what, const char *match, 
struct git_attr *attr
 #define debug_push(a) do { ; } while (0)
 #define debug_pop(a) do { ; } while (0)
 #define debug_set(a,b,c,d) do { ; } while (0)
-#endif
+#endif /* DEBUG_ATTR */
 
 static void drop_attr_stack(void)
 {
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 17/27] attr: convert git_check_attrs() callers to use the new API

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

The remaining callers are all simple "I have N attributes I am
interested in.  I'll ask about them with various paths one by one".

After this step, no caller to git_check_attrs() remains.  After
removing it, we can extend "struct attr_check" struct with data
that can be used in optimizing the query for the specific N
attributes it contains.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 archive.c  | 24 ++--
 builtin/pack-objects.c | 19 +--
 convert.c  | 17 ++---
 ll-merge.c | 33 ++---
 userdiff.c | 19 ---
 ws.c   | 19 ++-
 6 files changed, 45 insertions(+), 86 deletions(-)

diff --git a/archive.c b/archive.c
index b76bd4691..3591f7d55 100644
--- a/archive.c
+++ b/archive.c
@@ -87,19 +87,6 @@ void *sha1_file_to_archive(const struct archiver_args *args,
return buffer;
 }
 
-static void setup_archive_check(struct attr_check_item *check)
-{
-   static struct git_attr *attr_export_ignore;
-   static struct git_attr *attr_export_subst;
-
-   if (!attr_export_ignore) {
-   attr_export_ignore = git_attr("export-ignore");
-   attr_export_subst = git_attr("export-subst");
-   }
-   check[0].attr = attr_export_ignore;
-   check[1].attr = attr_export_subst;
-}
-
 struct directory {
struct directory *up;
struct object_id oid;
@@ -120,10 +107,10 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
void *context)
 {
static struct strbuf path = STRBUF_INIT;
+   static struct attr_check *check;
struct archiver_context *c = context;
struct archiver_args *args = c->args;
write_archive_entry_fn_t write_entry = c->write_entry;
-   struct attr_check_item check[2];
const char *path_without_prefix;
int err;
 
@@ -137,11 +124,12 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
strbuf_addch(, '/');
path_without_prefix = path.buf + args->baselen;
 
-   setup_archive_check(check);
-   if (!git_check_attrs(path_without_prefix, ARRAY_SIZE(check), check)) {
-   if (ATTR_TRUE(check[0].value))
+   if (!check)
+   check = attr_check_initl("export-ignore", "export-subst", NULL);
+   if (!git_check_attr(path_without_prefix, check)) {
+   if (ATTR_TRUE(check->check[0].value))
return 0;
-   args->convert = ATTR_TRUE(check[1].value);
+   args->convert = ATTR_TRUE(check->check[1].value);
}
 
if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8b8fbd814..ff8b3c12d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -894,24 +894,15 @@ static void write_pack_file(void)
written, nr_result);
 }
 
-static void setup_delta_attr_check(struct attr_check_item *check)
-{
-   static struct git_attr *attr_delta;
-
-   if (!attr_delta)
-   attr_delta = git_attr("delta");
-
-   check[0].attr = attr_delta;
-}
-
 static int no_try_delta(const char *path)
 {
-   struct attr_check_item check[1];
+   static struct attr_check *check;
 
-   setup_delta_attr_check(check);
-   if (git_check_attrs(path, ARRAY_SIZE(check), check))
+   if (!check)
+   check = attr_check_initl("delta", NULL);
+   if (git_check_attr(path, check))
return 0;
-   if (ATTR_FALSE(check->value))
+   if (ATTR_FALSE(check->check[0].value))
return 1;
return 0;
 }
diff --git a/convert.c b/convert.c
index 1b9829279..affd8ce9b 100644
--- a/convert.c
+++ b/convert.c
@@ -1085,24 +1085,19 @@ struct conv_attrs {
int ident;
 };
 
-static const char *conv_attr_name[] = {
-   "crlf", "ident", "filter", "eol", "text",
-};
-#define NUM_CONV_ATTRS ARRAY_SIZE(conv_attr_name)
-
 static void convert_attrs(struct conv_attrs *ca, const char *path)
 {
-   int i;
-   static struct attr_check_item ccheck[NUM_CONV_ATTRS];
+   static struct attr_check *check;
 
-   if (!ccheck[0].attr) {
-   for (i = 0; i < NUM_CONV_ATTRS; i++)
-   ccheck[i].attr = git_attr(conv_attr_name[i]);
+   if (!check) {
+   check = attr_check_initl("crlf", "ident", "filter",
+"eol", "text", NULL);
user_convert_tail = _convert;
git_config(read_convert_config, NULL);
}
 
-   if (!git_check_attrs(path, NUM_CONV_ATTRS, ccheck)) {
+   if (!git_check_attr(path, check)) {
+   struct attr_check_item *ccheck = check->check;

[PATCH v2 15/27] attr: (re)introduce git_check_attr() and struct attr_check

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

A common pattern to check N attributes for many paths is to

 (1) prepare an array A of N attr_check_item items;
 (2) call git_attr() to intern the N attribute names and fill A;
 (3) repeatedly call git_check_attrs() for path with N and A;

A look-up for these N attributes for a single path P scans the
entire attr_stack, starting from the .git/info/attributes file and
then .gitattributes file in the directory the path P is in, going
upwards to find .gitattributes file found in parent directories.

An earlier commit 06a604e6 (attr: avoid heavy work when we know the
specified attr is not defined, 2014-12-28) tried to optimize out
this scanning for one trivial special case: when the attribute being
sought is known not to exist, we do not have to scan for it.  While
this may be a cheap and effective heuristic, it would not work well
when N is (much) more than 1.

What we would want is a more customized way to skip irrelevant
entries in the attribute stack, and the definition of irrelevance
is tied to the set of attributes passed to git_check_attrs() call,
i.e. the set of attributes being sought.  The data necessary for
this optimization needs to live alongside the set of attributes, but
a simple array of git_attr_check_elem simply does not have any place
for that.

Introduce "struct attr_check" that contains N, the number of
attributes being sought, and A, the array that holds N
attr_check_item items, and a function git_check_attr() that
takes a path P and this structure as its parameters.  This structure
can later be extended to hold extra data necessary for optimization.

Also, to make it easier to write the first two steps in common
cases, introduce git_attr_check_initl() helper function, which takes
a NULL-terminated list of attribute names and initialize this
structure.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 74 ++
 attr.h | 17 +++
 2 files changed, 91 insertions(+)

diff --git a/attr.c b/attr.c
index 2f180d609..be9e398e9 100644
--- a/attr.c
+++ b/attr.c
@@ -865,6 +865,80 @@ int git_all_attrs(const char *path, int *num, struct 
attr_check_item **check)
return 0;
 }
 
+struct attr_check *attr_check_alloc(void)
+{
+   return xcalloc(1, sizeof(struct attr_check));
+}
+
+int git_check_attr(const char *path, struct attr_check *check)
+{
+   return git_check_attrs(path, check->check_nr, check->check);
+}
+
+struct attr_check *attr_check_initl(const char *one, ...)
+{
+   struct attr_check *check;
+   int cnt;
+   va_list params;
+   const char *param;
+
+   va_start(params, one);
+   for (cnt = 1; (param = va_arg(params, const char *)) != NULL; cnt++)
+   ;
+   va_end(params);
+
+   check = attr_check_alloc();
+   check->check_nr = cnt;
+   check->check_alloc = cnt;
+   check->check = xcalloc(cnt, sizeof(struct attr_check_item));
+
+   check->check[0].attr = git_attr(one);
+   va_start(params, one);
+   for (cnt = 1; cnt < check->check_nr; cnt++) {
+   struct git_attr *attr;
+   param = va_arg(params, const char *);
+   if (!param)
+   die("BUG: counted %d != ended at %d",
+   check->check_nr, cnt);
+   attr = git_attr(param);
+   if (!attr)
+   die("BUG: %s: not a valid attribute name", param);
+   check->check[cnt].attr = attr;
+   }
+   va_end(params);
+   return check;
+}
+
+struct attr_check_item *attr_check_append(struct attr_check *check,
+ const struct git_attr *attr)
+{
+   struct attr_check_item *item;
+
+   ALLOC_GROW(check->check, check->check_nr + 1, check->check_alloc);
+   item = >check[check->check_nr++];
+   item->attr = attr;
+   return item;
+}
+
+void attr_check_reset(struct attr_check *check)
+{
+   check->check_nr = 0;
+}
+
+void attr_check_clear(struct attr_check *check)
+{
+   free(check->check);
+   check->check = NULL;
+   check->check_alloc = 0;
+   check->check_nr = 0;
+}
+
+void attr_check_free(struct attr_check *check)
+{
+   attr_check_clear(check);
+   free(check);
+}
+
 void git_attr_set_direction(enum git_attr_direction new, struct index_state 
*istate)
 {
enum git_attr_direction old = direction;
diff --git a/attr.h b/attr.h
index efc7bb3b3..459347f4b 100644
--- a/attr.h
+++ b/attr.h
@@ -29,6 +29,22 @@ struct attr_check_item {
const char *value;
 };
 
+struct attr_check {
+   int check_nr;
+   int check_alloc;
+   struct attr_check_item *check;
+};
+
+extern struct attr_check *attr_check_alloc(void);
+extern struct attr_check *attr_check_initl(const char *, ...);
+
+extern struct attr_check_item

[PATCH v2 08/27] attr.c: tighten constness around "git_attr" structure

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

It holds an interned string, and git_attr_name() is a way to peek
into it.  Make sure the involved pointer types are pointer-to-const.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 2 +-
 attr.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/attr.c b/attr.c
index e42f931b3..f7cf7ae30 100644
--- a/attr.c
+++ b/attr.c
@@ -43,7 +43,7 @@ static int cannot_trust_maybe_real;
 static struct git_attr_check *check_all_attr;
 static struct git_attr *(git_attr_hash[HASHSIZE]);
 
-char *git_attr_name(struct git_attr *attr)
+const char *git_attr_name(const struct git_attr *attr)
 {
return attr->name;
 }
diff --git a/attr.h b/attr.h
index 8b08d33af..00d7a662c 100644
--- a/attr.h
+++ b/attr.h
@@ -25,7 +25,7 @@ extern const char git_attr__false[];
  * Unset one is returned as NULL.
  */
 struct git_attr_check {
-   struct git_attr *attr;
+   const struct git_attr *attr;
const char *value;
 };
 
@@ -34,7 +34,7 @@ struct git_attr_check {
  * return value is a pointer to a null-delimited string that is part
  * of the internal data structure; it should not be modified or freed.
  */
-char *git_attr_name(struct git_attr *);
+extern const char *git_attr_name(const struct git_attr *);
 
 int git_check_attr(const char *path, int, struct git_attr_check *);
 
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 21/27] attr: use hashmap for attribute dictionary

2017-01-23 Thread Brandon Williams

The current implementation of the attribute dictionary uses a custom
hashtable.  This modernizes the dictionary by converting it to the builtin
'hashmap' structure.

Also, in order to enable a threaded API in the future add an
accompanying mutex which must be acquired prior to accessing the
dictionary of interned attributes.

Signed-off-by: Brandon Williams 
---
 attr.c| 173 +++---
 attr.h|   2 +
 common-main.c |   3 +
 3 files changed, 133 insertions(+), 45 deletions(-)

diff --git a/attr.c b/attr.c
index 5399e1cb3..d2ece4eba 100644
--- a/attr.c
+++ b/attr.c
@@ -14,6 +14,7 @@
 #include "dir.h"
 #include "utf8.h"
 #include "quote.h"
+#include "thread-utils.h"
 
 const char git_attr__true[] = "(builtin)true";
 const char git_attr__false[] = "\0(builtin)false";
@@ -23,28 +24,17 @@ static const char git_attr__unknown[] = "(builtin)unknown";
 #define ATTR__UNSET NULL
 #define ATTR__UNKNOWN git_attr__unknown
 
-/* This is a randomly chosen prime. */
-#define HASHSIZE 257
-
 #ifndef DEBUG_ATTR
 #define DEBUG_ATTR 0
 #endif
 
-/*
- * NEEDSWORK: the global dictionary of the interned attributes
- * must stay a singleton even after we become thread-ready.
- * Access to these must be surrounded with mutex when it happens.
- */
 struct git_attr {
-   struct git_attr *next;
-   unsigned h;
-   int attr_nr;
+   int attr_nr; /* unique attribute number */
int maybe_macro;
int maybe_real;
-   char name[FLEX_ARRAY];
+   char name[FLEX_ARRAY]; /* attribute name */
 };
 static int attr_nr;
-static struct git_attr *(git_attr_hash[HASHSIZE]);
 
 /*
  * NEEDSWORK: maybe-real, maybe-macro are not property of
@@ -63,15 +53,94 @@ const char *git_attr_name(const struct git_attr *attr)
return attr->name;
 }
 
-static unsigned hash_name(const char *name, int namelen)
+struct attr_hashmap {
+   struct hashmap map;
+#ifndef NO_PTHREADS
+   pthread_mutex_t mutex;
+#endif
+};
+
+static inline void hashmap_lock(struct attr_hashmap *map)
+{
+#ifndef NO_PTHREADS
+   pthread_mutex_lock(>mutex);
+#endif
+}
+
+static inline void hashmap_unlock(struct attr_hashmap *map)
 {
-   unsigned val = 0, c;
+#ifndef NO_PTHREADS
+   pthread_mutex_unlock(>mutex);
+#endif
+}
 
-   while (namelen--) {
-   c = *name++;
-   val = ((val << 7) | (val >> 22)) ^ c;
-   }
-   return val;
+/*
+ * The global dictionary of all interned attributes.  This
+ * is a singleton object which is shared between threads.
+ * Access to this dictionary must be surrounded with a mutex.
+ */
+static struct attr_hashmap g_attr_hashmap;
+
+/* The container for objects stored in "struct attr_hashmap" */
+struct attr_hash_entry {
+   struct hashmap_entry ent; /* must be the first member! */
+   const char *key; /* the key; memory should be owned by value */
+   size_t keylen; /* length of the key */
+   void *value; /* the stored value */
+};
+
+/* attr_hashmap comparison function */
+static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
+  const struct attr_hash_entry *b,
+  void *unused)
+{
+   return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
+}
+
+/* Initialize an 'attr_hashmap' object */
+static void attr_hashmap_init(struct attr_hashmap *map)
+{
+   hashmap_init(>map, (hashmap_cmp_fn) attr_hash_entry_cmp, 0);
+}
+
+/*
+ * Retrieve the 'value' stored in a hashmap given the provided 'key'.
+ * If there is no matching entry, return NULL.
+ */
+static void *attr_hashmap_get(struct attr_hashmap *map,
+ const char *key, size_t keylen)
+{
+   struct attr_hash_entry k;
+   struct attr_hash_entry *e;
+
+   if (!map->map.tablesize)
+   attr_hashmap_init(map);
+
+   hashmap_entry_init(, memhash(key, keylen));
+   k.key = key;
+   k.keylen = keylen;
+   e = hashmap_get(>map, , NULL);
+
+   return e ? e->value : NULL;
+}
+
+/* Add 'value' to a hashmap based on the provided 'key'. */
+static void attr_hashmap_add(struct attr_hashmap *map,
+const char *key, size_t keylen,
+void *value)
+{
+   struct attr_hash_entry *e;
+
+   if (!map->map.tablesize)
+   attr_hashmap_init(map);
+
+   e = xmalloc(sizeof(struct attr_hash_entry));
+   hashmap_entry_init(e, memhash(key, keylen));
+   e->key = key;
+   e->keylen = keylen;
+   e->value = value;
+
+   hashmap_add(>map, e);
 }
 
 static int attr_name_valid(const char *name, size_t namelen)
@@ -103,37 +172,44 @@ static void report_invalid_attr(const char *name, size_t 
len,
strbuf_release();
 }
 
-static struct git_attr *git_attr_internal(const char *name, int len)
+/*
+ * Given a 'name', lookup and return the corresponding attribute in the global
+ * dictionary.  If no entry is

[PATCH v2 23/27] attr: remove maybe-real, maybe-macro from git_attr

2017-01-23 Thread Brandon Williams

Whether or not a git attribute is real or a macro isn't a property of
the attribute but rather it depends on the attribute stack (which
.gitattribute files were read).

This patch removes the 'maybe_real' and 'maybe_macro' fields in a
git_attr and instead adds the 'macro' field to a attr_check_item.  The
'macro' indicates (if non-NULL) that a particular attribute is a macro
for the given attribute stack.  It's populated, through a quick scan of
the attribute stack, with the match_attr that corresponds to the macro's
definition.  This way the attribute stack only needs to be scanned a
single time prior to attribute collection instead of each time a macro
needs to be expanded.

Signed-off-by: Brandon Williams 
---
 attr.c | 69 ++
 attr.h |  6 ++
 2 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/attr.c b/attr.c
index 6d2468790..ed9ba3756 100644
--- a/attr.c
+++ b/attr.c
@@ -30,20 +30,9 @@ static const char git_attr__unknown[] = "(builtin)unknown";
 
 struct git_attr {
int attr_nr; /* unique attribute number */
-   int maybe_macro;
-   int maybe_real;
char name[FLEX_ARRAY]; /* attribute name */
 };
 
-/*
- * NEEDSWORK: maybe-real, maybe-macro are not property of
- * an attribute, as it depends on what .gitattributes are
- * read.  Once we introduce per git_attr_check attr_stack
- * and check_all_attr, the optimization based on them will
- * become unnecessary and can go away.  So is this variable.
- */
-static int cannot_trust_maybe_real;
-
 const char *git_attr_name(const struct git_attr *attr)
 {
return attr->name;
@@ -182,6 +171,7 @@ static void all_attrs_init(struct attr_hashmap *map, struct 
attr_check *check)
 */
for (i = 0; i < check->all_attrs_nr; i++) {
check->all_attrs[i].value = ATTR__UNKNOWN;
+   check->all_attrs[i].macro = NULL;
}
 }
 
@@ -233,8 +223,6 @@ static struct git_attr *git_attr_internal(const char *name, 
int namelen)
if (!a) {
FLEX_ALLOC_MEM(a, name, name, namelen);
a->attr_nr = g_attr_hashmap.map.size;
-   a->maybe_real = 0;
-   a->maybe_macro = 0;
 
attr_hashmap_add(_attr_hashmap, a->name, namelen, a);
assert(a->attr_nr == (g_attr_hashmap.map.size - 1));
@@ -397,7 +385,6 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
  (is_macro ? 0 : namelen + 1));
if (is_macro) {
res->u.attr = git_attr_internal(name, namelen);
-   res->u.attr->maybe_macro = 1;
} else {
char *p = (char *)&(res->state[num_attr]);
memcpy(p, name, namelen);
@@ -418,10 +405,6 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
/* Second pass to fill the attr_states */
for (cp = states, i = 0; *cp; i++) {
cp = parse_attr(src, lineno, cp, &(res->state[i]));
-   if (!is_macro)
-   res->state[i].attr->maybe_real = 1;
-   if (res->state[i].attr->maybe_macro)
-   cannot_trust_maybe_real = 1;
}
 
strbuf_release();
@@ -826,7 +809,7 @@ static int path_matches(const char *pathname, int pathlen,
 static int macroexpand_one(struct attr_check_item *all_attrs, int nr, int rem);
 
 static int fill_one(const char *what, struct attr_check_item *all_attrs,
-   struct match_attr *a, int rem)
+   const struct match_attr *a, int rem)
 {
int i;
 
@@ -867,24 +850,34 @@ static int fill(const char *path, int pathlen, int 
basename_offset,
 
 static int macroexpand_one(struct attr_check_item *all_attrs, int nr, int rem)
 {
-   struct attr_stack *stk;
-   int i;
+   const struct attr_check_item *item = _attrs[nr];
 
-   if (all_attrs[nr].value != ATTR__TRUE ||
-   !all_attrs[nr].attr->maybe_macro)
+   if (item->macro && item->value == ATTR__TRUE)
+   return fill_one("expand", all_attrs, item->macro, rem);
+   else
return rem;
+}
 
-   for (stk = attr_stack; stk; stk = stk->prev) {
-   for (i = stk->num_matches - 1; 0 <= i; i--) {
-   struct match_attr *ma = stk->attrs[i];
-   if (!ma->is_macro)
-   continue;
-   if (ma->u.attr->attr_nr == nr)
-   return fill_one("expand", all_attrs, ma, rem);
+/*
+ * Marks the attributes which are macros based on the attribute stack.
+ * This prevents having to search through the attribute stack each time
+ * a macro needs to be expanded during the fill stage.
+ */
+static void determine_macros(struct attr_check_item *all_attrs,
+const struct attr_stack *stack)
+{
+   for (; stack; stack = stack->prev) {
+

[PATCH v2 16/27] attr: convert git_all_attrs() to use "struct attr_check"

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

This updates the other two ways the attribute check is done via an
array of "struct attr_check_item" elements.  These two niches
appear only in "git check-attr".

 * The caller does not know offhand what attributes it wants to ask
   about and cannot use attr_check_initl() to prepare the
   attr_check structure.

 * The caller may not know what attributes it wants to ask at all,
   and instead wants to learn everything that the given path has.

Such a caller can call attr_check_alloc() to allocate an empty
attr_check, and then call attr_check_append() to add attribute names
one by one.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c   | 38 -
 attr.h   |  9 +++-
 builtin/check-attr.c | 60 ++--
 3 files changed, 47 insertions(+), 60 deletions(-)

diff --git a/attr.c b/attr.c
index be9e398e9..d2eaa0410 100644
--- a/attr.c
+++ b/attr.c
@@ -837,42 +837,32 @@ int git_check_attrs(const char *path, int num, struct 
attr_check_item *check)
return 0;
 }
 
-int git_all_attrs(const char *path, int *num, struct attr_check_item **check)
+void git_all_attrs(const char *path, struct attr_check *check)
 {
-   int i, count, j;
+   int i;
 
-   collect_some_attrs(path, 0, NULL);
+   attr_check_reset(check);
+   collect_some_attrs(path, check->check_nr, check->check);
 
-   /* Count the number of attributes that are set. */
-   count = 0;
-   for (i = 0; i < attr_nr; i++) {
-   const char *value = check_all_attr[i].value;
-   if (value != ATTR__UNSET && value != ATTR__UNKNOWN)
-   ++count;
-   }
-   *num = count;
-   ALLOC_ARRAY(*check, count);
-   j = 0;
for (i = 0; i < attr_nr; i++) {
+   const char *name = check_all_attr[i].attr->name;
const char *value = check_all_attr[i].value;
-   if (value != ATTR__UNSET && value != ATTR__UNKNOWN) {
-   (*check)[j].attr = check_all_attr[i].attr;
-   (*check)[j].value = value;
-   ++j;
-   }
+   struct attr_check_item *item;
+   if (value == ATTR__UNSET || value == ATTR__UNKNOWN)
+   continue;
+   item = attr_check_append(check, git_attr(name));
+   item->value = value;
}
-
-   return 0;
 }
 
-struct attr_check *attr_check_alloc(void)
+int git_check_attr(const char *path, struct attr_check *check)
 {
-   return xcalloc(1, sizeof(struct attr_check));
+   return git_check_attrs(path, check->check_nr, check->check);
 }
 
-int git_check_attr(const char *path, struct attr_check *check)
+struct attr_check *attr_check_alloc(void)
 {
-   return git_check_attrs(path, check->check_nr, check->check);
+   return xcalloc(1, sizeof(struct attr_check));
 }
 
 struct attr_check *attr_check_initl(const char *one, ...)
diff --git a/attr.h b/attr.h
index 459347f4b..971bb9a38 100644
--- a/attr.h
+++ b/attr.h
@@ -56,13 +56,10 @@ int git_check_attrs(const char *path, int, struct 
attr_check_item *);
 extern int git_check_attr(const char *path, struct attr_check *check);
 
 /*
- * Retrieve all attributes that apply to the specified path.  *num
- * will be set to the number of attributes on the path; **check will
- * be set to point at a newly-allocated array of git_attr_check
- * objects describing the attributes and their values.  *check must be
- * free()ed by the caller.
+ * Retrieve all attributes that apply to the specified path.
+ * check holds the attributes and their values.
  */
-int git_all_attrs(const char *path, int *num, struct attr_check_item **check);
+void git_all_attrs(const char *path, struct attr_check *check);
 
 enum git_attr_direction {
GIT_ATTR_CHECKIN,
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index 889264a5b..3d4704be5 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -24,12 +24,13 @@ static const struct option check_attr_options[] = {
OPT_END()
 };
 
-static void output_attr(int cnt, struct attr_check_item *check,
-   const char *file)
+static void output_attr(struct attr_check *check, const char *file)
 {
int j;
+   int cnt = check->check_nr;
+
for (j = 0; j < cnt; j++) {
-   const char *value = check[j].value;
+   const char *value = check->check[j].value;
 
if (ATTR_TRUE(value))
value = "set";
@@ -42,36 +43,38 @@ static void output_attr(int cnt, struct attr_check_item 
*check,
printf("%s%c" /* path */
   "%s%c" /* attrname */
   "%s%c" /* attrvalue */,
-

[PATCH v2 22/27] attr: eliminate global check_all_attr array

2017-01-23 Thread Brandon Williams

Currently there is a reliance on 'check_all_attr' which is a global
array of 'attr_check_item' items which is used to store the value of
each attribute during the collection process.

This patch eliminates this global and instead creates an array per
'attr_check' instance which is then used in the attribute collection
process.  This brings the attribute system one step closer to being
thread-safe.

Signed-off-by: Brandon Williams 
---
 attr.c | 114 +++--
 attr.h |   2 ++
 2 files changed, 78 insertions(+), 38 deletions(-)

diff --git a/attr.c b/attr.c
index d2ece4eba..6d2468790 100644
--- a/attr.c
+++ b/attr.c
@@ -34,7 +34,6 @@ struct git_attr {
int maybe_real;
char name[FLEX_ARRAY]; /* attribute name */
 };
-static int attr_nr;
 
 /*
  * NEEDSWORK: maybe-real, maybe-macro are not property of
@@ -45,9 +44,6 @@ static int attr_nr;
  */
 static int cannot_trust_maybe_real;
 
-/* NEEDSWORK: This will become per git_attr_check */
-static struct attr_check_item *check_all_attr;
-
 const char *git_attr_name(const struct git_attr *attr)
 {
return attr->name;
@@ -143,6 +139,52 @@ static void attr_hashmap_add(struct attr_hashmap *map,
hashmap_add(>map, e);
 }
 
+/*
+ * Reallocate and reinitialize the array of all attributes (which is used in
+ * the attribute collection process) in 'check' based on the global dictionary
+ * of attributes.
+ */
+static void all_attrs_init(struct attr_hashmap *map, struct attr_check *check)
+{
+   int i;
+
+   hashmap_lock(map);
+
+   if (map->map.size < check->all_attrs_nr)
+   die("BUG: interned attributes shouldn't be deleted");
+
+   /*
+* If the number of attributes in the global dictionary has increased
+* (or this attr_check instance doesn't have an initialized all_attrs
+* field), reallocate the provided attr_check instance's all_attrs
+* field and fill each entry with its corresponding git_attr.
+*/
+   if (map->map.size != check->all_attrs_nr) {
+   struct attr_hash_entry *e;
+   struct hashmap_iter iter;
+   hashmap_iter_init(>map, );
+
+   REALLOC_ARRAY(check->all_attrs, map->map.size);
+   check->all_attrs_nr = map->map.size;
+
+   while ((e = hashmap_iter_next())) {
+   const struct git_attr *a = e->value;
+   check->all_attrs[a->attr_nr].attr = a;
+   }
+   }
+
+   hashmap_unlock(map);
+
+   /*
+* Re-initialize every entry in check->all_attrs.
+* This re-initialization can live outside of the locked region since
+* the attribute dictionary is no longer being accessed.
+*/
+   for (i = 0; i < check->all_attrs_nr; i++) {
+   check->all_attrs[i].value = ATTR__UNKNOWN;
+   }
+}
+
 static int attr_name_valid(const char *name, size_t namelen)
 {
/*
@@ -196,16 +238,6 @@ static struct git_attr *git_attr_internal(const char 
*name, int namelen)
 
attr_hashmap_add(_attr_hashmap, a->name, namelen, a);
assert(a->attr_nr == (g_attr_hashmap.map.size - 1));
-
-   /*
-* NEEDSWORK: per git_attr_check check_all_attr
-* will be initialized a lot more lazily, not
-* like this, and not here.
-*/
-   REALLOC_ARRAY(check_all_attr, ++attr_nr);
-   check_all_attr[a->attr_nr].attr = a;
-   check_all_attr[a->attr_nr].value = ATTR__UNKNOWN;
-   assert(a->attr_nr == (attr_nr - 1));
}
 
hashmap_unlock(_attr_hashmap);
@@ -791,16 +823,16 @@ static int path_matches(const char *pathname, int pathlen,
  pattern, prefix, pat->patternlen, pat->flags);
 }
 
-static int macroexpand_one(int attr_nr, int rem);
+static int macroexpand_one(struct attr_check_item *all_attrs, int nr, int rem);
 
-static int fill_one(const char *what, struct match_attr *a, int rem)
+static int fill_one(const char *what, struct attr_check_item *all_attrs,
+   struct match_attr *a, int rem)
 {
-   struct attr_check_item *check = check_all_attr;
int i;
 
-   for (i = a->num_attr - 1; 0 < rem && 0 <= i; i--) {
+   for (i = a->num_attr - 1; rem > 0 && i >= 0; i--) {
struct git_attr *attr = a->state[i].attr;
-   const char **n = &(check[attr->attr_nr].value);
+   const char **n = &(all_attrs[attr->attr_nr].value);
const char *v = a->state[i].setto;
 
if (*n == ATTR__UNKNOWN) {
@@ -809,14 +841,15 @@ static int fill_one(const char *what, struct match_attr 
*a, int rem)
  attr, v);
*n = v;
rem--;
-   rem = macroexpand_one(attr->attr_nr, rem);
+

[PATCH v2 24/27] attr: tighten const correctness with git_attr and match_attr

2017-01-23 Thread Brandon Williams

Signed-off-by: Brandon Williams 
---
 attr.c   | 14 +++---
 attr.h   |  2 +-
 builtin/check-attr.c |  3 ++-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/attr.c b/attr.c
index ed9ba3756..95456503e 100644
--- a/attr.c
+++ b/attr.c
@@ -209,7 +209,7 @@ static void report_invalid_attr(const char *name, size_t 
len,
  * dictionary.  If no entry is found, create a new attribute and store it in
  * the dictionary.
  */
-static struct git_attr *git_attr_internal(const char *name, int namelen)
+static const struct git_attr *git_attr_internal(const char *name, int namelen)
 {
struct git_attr *a;
 
@@ -233,14 +233,14 @@ static struct git_attr *git_attr_internal(const char 
*name, int namelen)
return a;
 }
 
-struct git_attr *git_attr(const char *name)
+const struct git_attr *git_attr(const char *name)
 {
return git_attr_internal(name, strlen(name));
 }
 
 /* What does a matched pattern decide? */
 struct attr_state {
-   struct git_attr *attr;
+   const struct git_attr *attr;
const char *setto;
 };
 
@@ -267,7 +267,7 @@ struct pattern {
 struct match_attr {
union {
struct pattern pat;
-   struct git_attr *attr;
+   const struct git_attr *attr;
} u;
char is_macro;
unsigned num_attr;
@@ -814,7 +814,7 @@ static int fill_one(const char *what, struct 
attr_check_item *all_attrs,
int i;
 
for (i = a->num_attr - 1; rem > 0 && i >= 0; i--) {
-   struct git_attr *attr = a->state[i].attr;
+   const struct git_attr *attr = a->state[i].attr;
const char **n = &(all_attrs[attr->attr_nr].value);
const char *v = a->state[i].setto;
 
@@ -838,7 +838,7 @@ static int fill(const char *path, int pathlen, int 
basename_offset,
const char *base = stk->origin ? stk->origin : "";
 
for (i = stk->num_matches - 1; 0 < rem && 0 <= i; i--) {
-   struct match_attr *a = stk->attrs[i];
+   const struct match_attr *a = stk->attrs[i];
if (a->is_macro)
continue;
if (path_matches(path, pathlen, basename_offset,
@@ -988,7 +988,7 @@ struct attr_check *attr_check_initl(const char *one, ...)
check->check[0].attr = git_attr(one);
va_start(params, one);
for (cnt = 1; cnt < check->check_nr; cnt++) {
-   struct git_attr *attr;
+   const struct git_attr *attr;
param = va_arg(params, const char *);
if (!param)
die("BUG: counted %d != ended at %d",
diff --git a/attr.h b/attr.h
index f40524875..9b4dc07d8 100644
--- a/attr.h
+++ b/attr.h
@@ -8,7 +8,7 @@ struct git_attr;
  * Given a string, return the gitattribute object that
  * corresponds to it.
  */
-struct git_attr *git_attr(const char *);
+const struct git_attr *git_attr(const char *);
 
 /* Internal use */
 extern const char git_attr__true[];
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index 3d4704be5..cc6caf7ac 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -166,7 +166,8 @@ int cmd_check_attr(int argc, const char **argv, const char 
*prefix)
check = attr_check_alloc();
if (!all_attrs) {
for (i = 0; i < cnt; i++) {
-   struct git_attr *a = git_attr(argv[i]);
+   const struct git_attr *a = git_attr(argv[i]);
+
if (!a)
return error("%s: not a valid attribute name",
 argv[i]);
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 20/27] attr: change validity check for attribute names to use positive logic

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Convert 'invalid_attr_name()' to 'attr_name_valid()' and use positive
logic for the return value.  In addition create a helper function that
prints out an error message when an invalid attribute name is used.

We could later update the message to exactly spell out what the
rules for a good attribute name are, etc.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/attr.c b/attr.c
index e58fa340c..5399e1cb3 100644
--- a/attr.c
+++ b/attr.c
@@ -74,23 +74,33 @@ static unsigned hash_name(const char *name, int namelen)
return val;
 }
 
-static int invalid_attr_name(const char *name, int namelen)
+static int attr_name_valid(const char *name, size_t namelen)
 {
/*
 * Attribute name cannot begin with '-' and must consist of
 * characters from [-A-Za-z0-9_.].
 */
if (namelen <= 0 || *name == '-')
-   return -1;
+   return 0;
while (namelen--) {
char ch = *name++;
if (! (ch == '-' || ch == '.' || ch == '_' ||
   ('0' <= ch && ch <= '9') ||
   ('a' <= ch && ch <= 'z') ||
   ('A' <= ch && ch <= 'Z')) )
-   return -1;
+   return 0;
}
-   return 0;
+   return 1;
+}
+
+static void report_invalid_attr(const char *name, size_t len,
+   const char *src, int lineno)
+{
+   struct strbuf err = STRBUF_INIT;
+   strbuf_addf(, _("%.*s is not a valid attribute name"),
+   (int) len, name);
+   fprintf(stderr, "%s: %s:%d\n", err.buf, src, lineno);
+   strbuf_release();
 }
 
 static struct git_attr *git_attr_internal(const char *name, int len)
@@ -105,7 +115,7 @@ static struct git_attr *git_attr_internal(const char *name, 
int len)
return a;
}
 
-   if (invalid_attr_name(name, len))
+   if (!attr_name_valid(name, len))
return NULL;
 
FLEX_ALLOC_MEM(a, name, name, len);
@@ -196,17 +206,15 @@ static const char *parse_attr(const char *src, int 
lineno, const char *cp,
cp++;
len--;
}
-   if (invalid_attr_name(cp, len)) {
-   fprintf(stderr,
-   "%.*s is not a valid attribute name: %s:%d\n",
-   len, cp, src, lineno);
+   if (!attr_name_valid(cp, len)) {
+   report_invalid_attr(cp, len, src, lineno);
return NULL;
}
} else {
/*
 * As this function is always called twice, once with
 * e == NULL in the first pass and then e != NULL in
-* the second pass, no need for invalid_attr_name()
+* the second pass, no need for attr_name_valid()
 * check here.
 */
if (*cp == '-' || *cp == '!') {
@@ -258,10 +266,8 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
name += strlen(ATTRIBUTE_MACRO_PREFIX);
name += strspn(name, blank);
namelen = strcspn(name, blank);
-   if (invalid_attr_name(name, namelen)) {
-   fprintf(stderr,
-   "%.*s is not a valid attribute name: %s:%d\n",
-   namelen, name, src, lineno);
+   if (!attr_name_valid(name, namelen)) {
+   report_invalid_attr(name, namelen, src, lineno);
goto fail_return;
}
}
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 14/27] attr: rename function and struct related to checking attributes

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

The traditional API to check attributes is to prepare an N-element
array of "struct git_attr_check" and pass N and the array to the
function "git_check_attr()" as arguments.

In preparation to revamp the API to pass a single structure, in
which these N elements are held, rename the type used for these
individual array elements to "struct attr_check_item" and rename
the function to "git_check_attrs()".

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 archive.c  |  6 +++---
 attr.c | 12 ++--
 attr.h |  8 
 builtin/check-attr.c   | 19 ++-
 builtin/pack-objects.c |  6 +++---
 convert.c  | 12 ++--
 ll-merge.c | 10 +-
 userdiff.c |  4 ++--
 ws.c   |  6 +++---
 9 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/archive.c b/archive.c
index 01751e574..b76bd4691 100644
--- a/archive.c
+++ b/archive.c
@@ -87,7 +87,7 @@ void *sha1_file_to_archive(const struct archiver_args *args,
return buffer;
 }
 
-static void setup_archive_check(struct git_attr_check *check)
+static void setup_archive_check(struct attr_check_item *check)
 {
static struct git_attr *attr_export_ignore;
static struct git_attr *attr_export_subst;
@@ -123,7 +123,7 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
struct archiver_context *c = context;
struct archiver_args *args = c->args;
write_archive_entry_fn_t write_entry = c->write_entry;
-   struct git_attr_check check[2];
+   struct attr_check_item check[2];
const char *path_without_prefix;
int err;
 
@@ -138,7 +138,7 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
path_without_prefix = path.buf + args->baselen;
 
setup_archive_check(check);
-   if (!git_check_attr(path_without_prefix, ARRAY_SIZE(check), check)) {
+   if (!git_check_attrs(path_without_prefix, ARRAY_SIZE(check), check)) {
if (ATTR_TRUE(check[0].value))
return 0;
args->convert = ATTR_TRUE(check[1].value);
diff --git a/attr.c b/attr.c
index 50e5ee393..2f180d609 100644
--- a/attr.c
+++ b/attr.c
@@ -56,7 +56,7 @@ static struct git_attr *(git_attr_hash[HASHSIZE]);
 static int cannot_trust_maybe_real;
 
 /* NEEDSWORK: This will become per git_attr_check */
-static struct git_attr_check *check_all_attr;
+static struct attr_check_item *check_all_attr;
 
 const char *git_attr_name(const struct git_attr *attr)
 {
@@ -713,7 +713,7 @@ static int macroexpand_one(int attr_nr, int rem);
 
 static int fill_one(const char *what, struct match_attr *a, int rem)
 {
-   struct git_attr_check *check = check_all_attr;
+   struct attr_check_item *check = check_all_attr;
int i;
 
for (i = a->num_attr - 1; 0 < rem && 0 <= i; i--) {
@@ -778,7 +778,7 @@ static int macroexpand_one(int nr, int rem)
  * collected. Otherwise all attributes are collected.
  */
 static void collect_some_attrs(const char *path, int num,
-  struct git_attr_check *check)
+  struct attr_check_item *check)
 
 {
struct attr_stack *stk;
@@ -806,7 +806,7 @@ static void collect_some_attrs(const char *path, int num,
rem = 0;
for (i = 0; i < num; i++) {
if (!check[i].attr->maybe_real) {
-   struct git_attr_check *c;
+   struct attr_check_item *c;
c = check_all_attr + check[i].attr->attr_nr;
c->value = ATTR__UNSET;
rem++;
@@ -821,7 +821,7 @@ static void collect_some_attrs(const char *path, int num,
rem = fill(path, pathlen, basename_offset, stk, rem);
 }
 
-int git_check_attr(const char *path, int num, struct git_attr_check *check)
+int git_check_attrs(const char *path, int num, struct attr_check_item *check)
 {
int i;
 
@@ -837,7 +837,7 @@ int git_check_attr(const char *path, int num, struct 
git_attr_check *check)
return 0;
 }
 
-int git_all_attrs(const char *path, int *num, struct git_attr_check **check)
+int git_all_attrs(const char *path, int *num, struct attr_check_item **check)
 {
int i, count, j;
 
diff --git a/attr.h b/attr.h
index 00d7a662c..efc7bb3b3 100644
--- a/attr.h
+++ b/attr.h
@@ -20,11 +20,11 @@ extern const char git_attr__false[];
 #define ATTR_UNSET(v) ((v) == NULL)
 
 /*
- * Send one or more git_attr_check to git_check_attr(), and
+ * Send one or more git_attr_check to git_check_attrs(), and
  * each 'value' member tells what its value is.
  * Unset one is returned as NULL.
  */
-struct git_attr_check {
+struct attr_check_item {

[PATCH v2 11/27] attr.c: add push_stack() helper

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

There are too many repetitious "I have this new attr_stack element;
push it at the top of the stack" sequence.  The new helper function
push_stack() gives us a way to express what is going on at these
places, and as a side effect, halves the number of times we mention
the attr_stack global variable.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 71 +++---
 1 file changed, 33 insertions(+), 38 deletions(-)

diff --git a/attr.c b/attr.c
index e1c630f79..8026d68bd 100644
--- a/attr.c
+++ b/attr.c
@@ -510,6 +510,18 @@ static int git_attr_system(void)
 
 static GIT_PATH_FUNC(git_path_info_attributes, INFOATTRIBUTES_FILE)
 
+static void push_stack(struct attr_stack **attr_stack_p,
+  struct attr_stack *elem, char *origin, size_t originlen)
+{
+   if (elem) {
+   elem->origin = origin;
+   if (origin)
+   elem->originlen = originlen;
+   elem->prev = *attr_stack_p;
+   *attr_stack_p = elem;
+   }
+}
+
 static void bootstrap_attr_stack(void)
 {
struct attr_stack *elem;
@@ -517,37 +529,23 @@ static void bootstrap_attr_stack(void)
if (attr_stack)
return;
 
-   elem = read_attr_from_array(builtin_attr);
-   elem->origin = NULL;
-   elem->prev = attr_stack;
-   attr_stack = elem;
-
-   if (git_attr_system()) {
-   elem = read_attr_from_file(git_etc_gitattributes(), 1);
-   if (elem) {
-   elem->origin = NULL;
-   elem->prev = attr_stack;
-   attr_stack = elem;
-   }
-   }
+   push_stack(_stack, read_attr_from_array(builtin_attr), NULL, 0);
+
+   if (git_attr_system())
+   push_stack(_stack,
+  read_attr_from_file(git_etc_gitattributes(), 1),
+  NULL, 0);
 
if (!git_attributes_file)
git_attributes_file = xdg_config_home("attributes");
-   if (git_attributes_file) {
-   elem = read_attr_from_file(git_attributes_file, 1);
-   if (elem) {
-   elem->origin = NULL;
-   elem->prev = attr_stack;
-   attr_stack = elem;
-   }
-   }
+   if (git_attributes_file)
+   push_stack(_stack,
+  read_attr_from_file(git_attributes_file, 1),
+  NULL, 0);
 
if (!is_bare_repository() || direction == GIT_ATTR_INDEX) {
elem = read_attr(GITATTRIBUTES_FILE, 1);
-   elem->origin = xstrdup("");
-   elem->originlen = 0;
-   elem->prev = attr_stack;
-   attr_stack = elem;
+   push_stack(_stack, elem, xstrdup(""), 0);
debug_push(elem);
}
 
@@ -558,15 +556,12 @@ static void bootstrap_attr_stack(void)
 
if (!elem)
elem = xcalloc(1, sizeof(*elem));
-   elem->origin = NULL;
-   elem->prev = attr_stack;
-   attr_stack = elem;
+   push_stack(_stack, elem, NULL, 0);
 }
 
 static void prepare_attr_stack(const char *path, int dirlen)
 {
struct attr_stack *elem, *info;
-   int len;
const char *cp;
 
/*
@@ -626,20 +621,21 @@ static void prepare_attr_stack(const char *path, int 
dirlen)
 
assert(attr_stack->origin);
while (1) {
-   len = strlen(attr_stack->origin);
+   size_t len = strlen(attr_stack->origin);
+   char *origin;
+
if (dirlen <= len)
break;
cp = memchr(path + len + 1, '/', dirlen - len - 1);
if (!cp)
cp = path + dirlen;
-   strbuf_add(, path, cp - path);
-   strbuf_addch(, '/');
-   strbuf_addstr(, GITATTRIBUTES_FILE);
+   strbuf_addf(,
+   "%.*s/%s", (int)(cp - path), path,
+   GITATTRIBUTES_FILE);
elem = read_attr(pathbuf.buf, 0);
strbuf_setlen(, cp - path);
-   elem->origin = strbuf_detach(, 
>originlen);
-   elem->prev = attr_stack;
-   attr_stack = elem;
+   origin = strbuf_detach(, );
+   push_stack(_stack, elem, origin, len);
debug_push(elem);
}
 
@@ -649,8 +645,7 @@ static void prepare_attr_stack(const char *path, int dirlen)
/*
 * Finally push the "info" one at the top of the stack.

[PATCH v2 02/27] attr.c: use strchrnul() to scan for one line

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/attr.c b/attr.c
index 1fcf042b8..04d24334e 100644
--- a/attr.c
+++ b/attr.c
@@ -402,8 +402,8 @@ static struct attr_stack *read_attr_from_index(const char 
*path, int macro_ok)
for (sp = buf; *sp; ) {
char *ep;
int more;
-   for (ep = sp; *ep && *ep != '\n'; ep++)
-   ;
+
+   ep = strchrnul(sp, '\n');
more = (*ep == '\n');
*ep = '\0';
handle_attr_line(res, sp, path, ++lineno, macro_ok);
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 00/27] Revamp the attribute system; another round

2017-01-23 Thread Brandon Williams

Changes in v2:
* surround the mutex initializer calls by #ifdef
* mark file-local symbol static
* handling of attribute stacks.  Instead of storing each stack frame in a
  hashmap, there is a stack per attr_check instance.  This will allow for
  easier optimizing of the stack in future patches as well as eliminates the
  potential for memory to grow unbounded.  This is also more inline with the
  original vision of the attribute system refactor.

Brandon Williams (8):
  attr: pass struct attr_check to collect_some_attrs
  attr: use hashmap for attribute dictionary
  attr: eliminate global check_all_attr array
  attr: remove maybe-real, maybe-macro from git_attr
  attr: tighten const correctness with git_attr and match_attr
  attr: store attribute stack in attr_check structure
  attr: push the bare repo check into read_attr()
  attr: reformat git_attr_set_direction() function

Junio C Hamano (17):
  commit.c: use strchrnul() to scan for one line
  attr.c: use strchrnul() to scan for one line
  attr.c: update a stale comment on "struct match_attr"
  attr.c: explain the lack of attr-name syntax check in parse_attr()
  attr.c: complete a sentence in a comment
  attr.c: mark where #if DEBUG ends more clearly
  attr.c: simplify macroexpand_one()
  attr.c: tighten constness around "git_attr" structure
  attr.c: plug small leak in parse_attr_line()
  attr.c: add push_stack() helper
  attr.c: outline the future plans by heavily commenting
  attr: rename function and struct related to checking attributes
  attr: (re)introduce git_check_attr() and struct attr_check
  attr: convert git_all_attrs() to use "struct attr_check"
  attr: convert git_check_attrs() callers to use the new API
  attr: retire git_check_attrs() API
  attr: change validity check for attribute names to use positive logic

Nguyễn Thái Ngọc Duy (1):
  attr: support quoting pathname patterns in C style

Stefan Beller (1):
  Documentation: fix a typo

 Documentation/gitattributes.txt   |  10 +-
 Documentation/technical/api-gitattributes.txt |  86 ++-
 archive.c |  24 +-
 attr.c| 854 ++
 attr.h|  53 +-
 builtin/check-attr.c  |  66 +-
 builtin/pack-objects.c|  19 +-
 commit.c  |   3 +-
 common-main.c |   3 +
 convert.c |  25 +-
 ll-merge.c|  33 +-
 t/t0003-attributes.sh |  26 +
 userdiff.c|  19 +-
 ws.c  |  19 +-
 14 files changed, 800 insertions(+), 440 deletions(-)

-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 10/27] attr: support quoting pathname patterns in C style

2017-01-23 Thread Brandon Williams

From: Nguyễn Thái Ngọc Duy 

Full pattern must be quoted. So 'pat"t"ern attr' will give exactly
'pat"t"ern', not 'pattern'. Also clarify that leading whitespaces are
not part of the pattern and document comment syntax.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 Documentation/gitattributes.txt |  8 +---
 attr.c  | 15 +--
 t/t0003-attributes.sh   | 26 ++
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e0b66c122..3173dee7e 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -21,9 +21,11 @@ Each line in `gitattributes` file is of form:
pattern attr1 attr2 ...
 
 That is, a pattern followed by an attributes list,
-separated by whitespaces.  When the pattern matches the
-path in question, the attributes listed on the line are given to
-the path.
+separated by whitespaces. Leading and trailing whitespaces are
+ignored. Lines that begin with '#' are ignored. Patterns
+that begin with a double quote are quoted in C style.
+When the pattern matches the path in question, the attributes
+listed on the line are given to the path.
 
 Each attribute can be in one of these states for a given path:
 
diff --git a/attr.c b/attr.c
index d180c7833..e1c630f79 100644
--- a/attr.c
+++ b/attr.c
@@ -13,6 +13,7 @@
 #include "attr.h"
 #include "dir.h"
 #include "utf8.h"
+#include "quote.h"
 
 const char git_attr__true[] = "(builtin)true";
 const char git_attr__false[] = "\0(builtin)false";
@@ -212,12 +213,21 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
const char *cp, *name, *states;
struct match_attr *res = NULL;
int is_macro;
+   struct strbuf pattern = STRBUF_INIT;
 
cp = line + strspn(line, blank);
if (!*cp || *cp == '#')
return NULL;
name = cp;
-   namelen = strcspn(name, blank);
+
+   if (*cp == '"' && !unquote_c_style(, name, )) {
+   name = pattern.buf;
+   namelen = pattern.len;
+   } else {
+   namelen = strcspn(name, blank);
+   states = name + namelen;
+   }
+
if (strlen(ATTRIBUTE_MACRO_PREFIX) < namelen &&
starts_with(name, ATTRIBUTE_MACRO_PREFIX)) {
if (!macro_ok) {
@@ -239,7 +249,6 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
else
is_macro = 0;
 
-   states = name + namelen;
states += strspn(states, blank);
 
/* First pass to count the attr_states */
@@ -282,9 +291,11 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
cannot_trust_maybe_real = 1;
}
 
+   strbuf_release();
return res;
 
 fail_return:
+   strbuf_release();
free(res);
return NULL;
 }
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index f0fbb4255..f19ae4f8c 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -13,10 +13,31 @@ attr_check () {
test_line_count = 0 err
 }
 
+attr_check_quote () {
+
+   path="$1"
+   quoted_path="$2"
+   expect="$3"
+
+   git check-attr test -- "$path" >actual &&
+   echo "\"$quoted_path\": test: $expect" >expect &&
+   test_cmp expect actual
+
+}
+
+test_expect_success 'open-quoted pathname' '
+   echo "\"a test=a" >.gitattributes &&
+   test_must_fail attr_check a a
+'
+
+
 test_expect_success 'setup' '
mkdir -p a/b/d a/c b &&
(
echo "[attr]notest !test"
+   echo "\" d \"   test=d"
+   echo " etest=e"
+   echo " e\"  test=e"
echo "f test=f"
echo "a/i test=a/i"
echo "onoff test -test"
@@ -69,6 +90,11 @@ test_expect_success 'command line checks' '
 '
 
 test_expect_success 'attribute test' '
+
+   attr_check " d " d &&
+   attr_check e e &&
+   attr_check_quote e\" e\\\" e &&
+
attr_check f f &&
attr_check a/f f &&
attr_check a/c/f f &&
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 12/27] Documentation: fix a typo

2017-01-23 Thread Brandon Williams

From: Stefan Beller 

Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 Documentation/gitattributes.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 3173dee7e..a53d093ca 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -88,7 +88,7 @@ is either not set or empty, $HOME/.config/git/attributes is 
used instead.
 Attributes for all users on a system should be placed in the
 `$(prefix)/etc/gitattributes` file.
 
-Sometimes you would need to override an setting of an attribute
+Sometimes you would need to override a setting of an attribute
 for a path to `Unspecified` state.  This can be done by listing
 the name of the attribute prefixed with an exclamation point `!`.
 
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 13/27] attr.c: outline the future plans by heavily commenting

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 8026d68bd..50e5ee393 100644
--- a/attr.c
+++ b/attr.c
@@ -30,6 +30,11 @@ static const char git_attr__unknown[] = "(builtin)unknown";
 #define DEBUG_ATTR 0
 #endif
 
+/*
+ * NEEDSWORK: the global dictionary of the interned attributes
+ * must stay a singleton even after we become thread-ready.
+ * Access to these must be surrounded with mutex when it happens.
+ */
 struct git_attr {
struct git_attr *next;
unsigned h;
@@ -39,10 +44,19 @@ struct git_attr {
char name[FLEX_ARRAY];
 };
 static int attr_nr;
+static struct git_attr *(git_attr_hash[HASHSIZE]);
+
+/*
+ * NEEDSWORK: maybe-real, maybe-macro are not property of
+ * an attribute, as it depends on what .gitattributes are
+ * read.  Once we introduce per git_attr_check attr_stack
+ * and check_all_attr, the optimization based on them will
+ * become unnecessary and can go away.  So is this variable.
+ */
 static int cannot_trust_maybe_real;
 
+/* NEEDSWORK: This will become per git_attr_check */
 static struct git_attr_check *check_all_attr;
-static struct git_attr *(git_attr_hash[HASHSIZE]);
 
 const char *git_attr_name(const struct git_attr *attr)
 {
@@ -102,6 +116,11 @@ static struct git_attr *git_attr_internal(const char 
*name, int len)
a->maybe_real = 0;
git_attr_hash[pos] = a;
 
+   /*
+* NEEDSWORK: per git_attr_check check_all_attr
+* will be initialized a lot more lazily, not
+* like this, and not here.
+*/
REALLOC_ARRAY(check_all_attr, attr_nr);
check_all_attr[a->attr_nr].attr = a;
check_all_attr[a->attr_nr].value = ATTR__UNKNOWN;
@@ -318,6 +337,7 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
  * .gitignore file and info/excludes file as a fallback.
  */
 
+/* NEEDSWORK: This will become per git_attr_check */
 static struct attr_stack {
struct attr_stack *prev;
char *origin;
@@ -382,6 +402,24 @@ static struct attr_stack *read_attr_from_array(const char 
**list)
return res;
 }
 
+/*
+ * NEEDSWORK: these two are tricky.  The callers assume there is a
+ * single, system-wide global state "where we read attributes from?"
+ * and when the state is flipped by calling git_attr_set_direction(),
+ * attr_stack is discarded so that subsequent attr_check will lazily
+ * read from the right place.  And they do not know or care who called
+ * by them uses the attribute subsystem, hence have no knowledge of
+ * existing git_attr_check instances or future ones that will be
+ * created).
+ *
+ * Probably we need a thread_local that holds these two variables,
+ * and a list of git_attr_check instances (which need to be maintained
+ * by hooking into git_attr_check_alloc(), git_attr_check_initl(), and
+ * git_attr_check_clear().  Then git_attr_set_direction() updates the
+ * fields in that thread_local for these two variables, iterate over
+ * all the active git_attr_check instances and discard the attr_stack
+ * they hold.  Yuck, but it sounds doable.
+ */
 static enum git_attr_direction direction;
 static struct index_state *use_index;
 
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 09/27] attr.c: plug small leak in parse_attr_line()

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

If any error is noticed after the match_attr structure is allocated,
we shouldn't just return NULL from this function.

Add a fail_return label that frees the allocated structure and
returns NULL, and consistently jump there when we want to return
NULL after cleaning up.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/attr.c b/attr.c
index f7cf7ae30..d180c7833 100644
--- a/attr.c
+++ b/attr.c
@@ -223,7 +223,7 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
if (!macro_ok) {
fprintf(stderr, "%s not allowed: %s:%d\n",
name, src, lineno);
-   return NULL;
+   goto fail_return;
}
is_macro = 1;
name += strlen(ATTRIBUTE_MACRO_PREFIX);
@@ -233,7 +233,7 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
fprintf(stderr,
"%.*s is not a valid attribute name: %s:%d\n",
namelen, name, src, lineno);
-   return NULL;
+   goto fail_return;
}
}
else
@@ -246,7 +246,7 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
for (cp = states, num_attr = 0; *cp; num_attr++) {
cp = parse_attr(src, lineno, cp, NULL);
if (!cp)
-   return NULL;
+   goto fail_return;
}
 
res = xcalloc(1,
@@ -267,7 +267,7 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
if (res->u.pat.flags & EXC_FLAG_NEGATIVE) {
warning(_("Negative patterns are ignored in git 
attributes\n"
  "Use '\\!' for literal leading 
exclamation."));
-   return NULL;
+   goto fail_return;
}
}
res->is_macro = is_macro;
@@ -283,6 +283,10 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
}
 
return res;
+
+fail_return:
+   free(res);
+   return NULL;
 }
 
 /*
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 04/27] attr.c: explain the lack of attr-name syntax check in parse_attr()

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/attr.c b/attr.c
index 007f1a299..6b55a57ef 100644
--- a/attr.c
+++ b/attr.c
@@ -183,6 +183,12 @@ static const char *parse_attr(const char *src, int lineno, 
const char *cp,
return NULL;
}
} else {
+   /*
+* As this function is always called twice, once with
+* e == NULL in the first pass and then e != NULL in
+* the second pass, no need for invalid_attr_name()
+* check here.
+*/
if (*cp == '-' || *cp == '!') {
e->setto = (*cp == '-') ? ATTR__FALSE : ATTR__UNSET;
cp++;
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 05/27] attr.c: complete a sentence in a comment

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/attr.c b/attr.c
index 6b55a57ef..9bdf87a6f 100644
--- a/attr.c
+++ b/attr.c
@@ -300,7 +300,7 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
  * directory (again, reading the file from top to bottom) down to the
  * current directory, and then scan the list backwards to find the first match.
  * This is exactly the same as what is_excluded() does in dir.c to deal with
- * .gitignore
+ * .gitignore file and info/excludes file as a fallback.
  */
 
 static struct attr_stack {
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 01/27] commit.c: use strchrnul() to scan for one line

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 commit.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 2cf85158b..0c4ee3de4 100644
--- a/commit.c
+++ b/commit.c
@@ -415,8 +415,7 @@ int find_commit_subject(const char *commit_buffer, const 
char **subject)
p++;
if (*p) {
p = skip_blank_lines(p + 2);
-   for (eol = p; *eol && *eol != '\n'; eol++)
-   ; /* do nothing */
+   eol = strchrnul(p, '\n');
} else
eol = p;
 
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 07/27] attr.c: simplify macroexpand_one()

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

The double-loop wants to do an early return immediately when one
matching macro is found.  Eliminate the extra variable 'a' used for
that purpose and rewrite the "assign the found item to 'a' to make
it non-NULL and force the loop(s) to terminate" with a direct return
from there.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/attr.c b/attr.c
index 17297fffe..e42f931b3 100644
--- a/attr.c
+++ b/attr.c
@@ -705,24 +705,21 @@ static int fill(const char *path, int pathlen, int 
basename_offset,
 static int macroexpand_one(int nr, int rem)
 {
struct attr_stack *stk;
-   struct match_attr *a = NULL;
int i;
 
if (check_all_attr[nr].value != ATTR__TRUE ||
!check_all_attr[nr].attr->maybe_macro)
return rem;
 
-   for (stk = attr_stack; !a && stk; stk = stk->prev)
-   for (i = stk->num_matches - 1; !a && 0 <= i; i--) {
+   for (stk = attr_stack; stk; stk = stk->prev) {
+   for (i = stk->num_matches - 1; 0 <= i; i--) {
struct match_attr *ma = stk->attrs[i];
if (!ma->is_macro)
continue;
if (ma->u.attr->attr_nr == nr)
-   a = ma;
+   return fill_one("expand", ma, rem);
}
-
-   if (a)
-   rem = fill_one("expand", a, rem);
+   }
 
return rem;
 }
-- 
2.11.0.483.g087da7b7c-goog

[PATCH v2 03/27] attr.c: update a stale comment on "struct match_attr"

2017-01-23 Thread Brandon Williams

From: Junio C Hamano 

When 82dce998 (attr: more matching optimizations from .gitignore,
2012-10-15) changed a pointer to a string "*pattern" into an
embedded "struct pattern" in struct match_attr, it forgot to update
the comment that describes the structure.

Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Brandon Williams 
---
 attr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/attr.c b/attr.c
index 04d24334e..007f1a299 100644
--- a/attr.c
+++ b/attr.c
@@ -131,9 +131,8 @@ struct pattern {
  * If is_macro is true, then u.attr is a pointer to the git_attr being
  * defined.
  *
- * If is_macro is false, then u.pattern points at the filename pattern
- * to which the rule applies.  (The memory pointed to is part of the
- * memory block allocated for the match_attr instance.)
+ * If is_macro is false, then u.pat is the filename pattern to which the
+ * rule applies.
  *
  * In either case, num_attr is the number of attributes affected by
  * this rule, and state is an array listing them.  The attributes are
-- 
2.11.0.483.g087da7b7c-goog

Re: [PATCH] rebase: pass --signoff option to git am

2017-01-23 Thread Junio C Hamano

Giuseppe Bilotta  writes:

> On Mon, Jan 23, 2017 at 7:13 PM, Junio C Hamano  wrote:
>>
>> Should we plan to extend this to the interactive backend that is
>> shared between rebase -i and rebase -m, too?  Or is this patch
>> already sufficient to cover them?
>
> AFAIK this is sufficient for both, in the sense that I've used it with
> git rebase -i and it works.

That is a good news and at the same time a bit awkard one ;-)  

The mention of "passed to 'git am'" twice in the documentation and
help text would lead people to think "rebase -i" would not be
affected and (1) would need more work to do so, or (2) the user does
not want "rebase -i" to be unaffected for whatever reason, and gets
surprised to see that it actually does get affected.

In any case, will queue as-is so that we won't lose the patch while
waiting for people to raise their opinions.

Thanks.

Re: [RFC PATCH] Option to allow cherry-pick to skip empty commits

2017-01-23 Thread Junio C Hamano

Giuseppe Bilotta  writes:

> ... I still don't see how to force a complete reread of the index
> after running a git reset (which I need for the --skip command), ...

Do you mean discard_index() or discard_cache() followed by
read_index() or read_cache(), or do you mean something more
elaborate?

Re: [RFC PATCH] Option to allow cherry-pick to skip empty commits

2017-01-23 Thread Giuseppe Bilotta

On Mon, Jan 23, 2017 at 7:15 PM, Junio C Hamano  wrote:
> Giuseppe Bilotta  writes:
>
>> By the way, I noticed going over the code that the -allow options are
>> not stored, so that in case of interruption they will be reset, is
>> this intentional or a bug?
>
> I do not know offhand, but given the history of the two commands, I
> would guess it was a bug simply overlooked when people bolted "a
> series of commits" mode onto these commands that originally worked
> only on a single commit.

It seems like a bug to me. I'll prepare a new series that also fixes
this. I still don't see how to force a complete reread of the index
after running a git reset (which I need for the --skip command), but
maybe I'll add a WIP of what I have for --skip so far and people can
comment on that.

-- 
Giuseppe "Oblomov" Bilotta

Re: [PATCH v3 0/5] show-ref: Allow -d, --head to work with --verify

2017-01-23 Thread Junio C Hamano

Vladimir Panteleev  writes:

> Third iteration, according to Junio's comments. This time we keep
> show_ref and show_one separate, accept HEAD with --verify even without
> --head, and add tests for dangling ref validation with --verify.

I am no longer a neutral judge but to me the resulting code looks a
lot more naturally structured than the original.

Thanks.  Will queue.

Re: [PATCH] rebase: pass --signoff option to git am

2017-01-23 Thread Giuseppe Bilotta

On Mon, Jan 23, 2017 at 7:13 PM, Junio C Hamano  wrote:
>
> Should we plan to extend this to the interactive backend that is
> shared between rebase -i and rebase -m, too?  Or is this patch
> already sufficient to cover them?

AFAIK this is sufficient for both, in the sense that I've used it with
git rebase -i and it works.

Re: [PATCH v3 08/21] Documentation/git-update-index: talk about core.splitIndex config var

2017-01-23 Thread Junio C Hamano

Christian Couder  writes:

>>> Perhaps we should declare that config will be the one and only way
>>> in the future and start deprecating the command line option way.
>>> That will remove the need for two to interact with each other.
>
> That would be my preferred solution as I think it is the simplest in
> the end for users.
> Also, as Duy wrote above, one can always use something like "git -c
> core.splitIndex= ...", which by the way can work for any command, not
> just "update-index".
>
> Anyway we would have to introduce core.splitIndex first, and it
> wouldn't make sense to have a different behavior between
> --[no-]split-index and --[no-]untracked-cache in the meantime before
> they are deprecated and eventually removed.
>
> So let's just go with the implementation in this series, which is
> similar to --[no-]untracked-cache, and let's plan to deprecate
> --[no-]split-index and --[no-]untracked-cache in a later patch series.

Sounds like we have a plan.  Thanks.

Re: [PATCH v1 0/2] urlmatch: allow regexp-based matches

2017-01-23 Thread Junio C Hamano

Patrick Steinhardt  writes:

> This patch is mostly a request for comments. The use case is to
> be able to configure an HTTP proxy for all subdomains of a
> certain domain where there are hundreds of subdomains. The most
> flexible way I could imagine was by using regular expressions for
> the matching, which is how I implemented it for now. So users can
> now create a configuration key like
> `http.?http://.*\\.example\\.com.*` to apply settings for all
> subdomains of `example.com`.

While reading 2/2, I got an impression that this is "too" flexible
and possibly operates at a wrong level.  I would have expected that
the wildcarding to be limited to the host part only and hook into
match_urls(), allowing the users of the new feature to still take
advantage of the existing support of "http://m...@example.com; that
limits the match to the case that the connection is authenticated
for a user, for example, by newly allowing "http://me@*.example.com;
or something like that.

Because you cannot have a literal '*' in your hostname, I would
imagine that supporting a match pattern "http://me@*.example.com;
would be already backward compatible without requiring a leading
question-mark.

I also personally would prefer these textual matching to be done
with glob not with regexp, by the way, as the above description of
mine shows.

Thanks.

Re: [RFC] Case insensitive Git attributes

2017-01-23 Thread Junio C Hamano

Junio C Hamano  writes:

> So you are worried about the case where somebody on a case
> insensitive but case preserving system would do
>
> $ edit file.txt
> $ edit .gitattributes
> $ git add file.txt .gitattributes
>
> and adds "*.TXT   someattr=true" to the attributes file, which
> would set someattr to true on his system for file.txt, but when the
> result is checked out on a case sensitive system, it would behave
> differently because "*.TXT" does not match "file.txt"?
>
> How do other systems address it?  Your Java, Ruby, etc. sources may
> refer to another file with "import" and the derivation of the file
> names from class names or package names would have the same issue,
> isn't it?  Do they have an option that lets you say
>
> Even though the import statements may say "import a.b.C", we
> know that the source tarball was prepared on a case insensitive
> system, and I want you to look for a/b/C.java and a/b/c.java and
> use what was found.
>
> or something like that?  Same for anything that records other
> filenames in the content to refer to them, like "Makefile".
>
> My knee jerk reaction to that is that .gitattributes and .gitignore
> should not be instructed to go case insensitive on case sensitive
> systems.  If the system is case insensitive but case preserving,
> it probably would make sense not to do case insensitive matching,
> which would prevent the issue from happening in the first place.

Sorry, but there is a slight leap in the above that makes it hard to
track my thought, so let me clarify a bit.  

In the above, I am guessing the answer to the "How do other systems
address it?" question to be "nothing".  And that leads to the
conclusion that it is better to do "nothing on case sensitive
systems, and probably become evem more strict on case insensitive
but case preserving systems", because that will give us a chance to
expose the problem earlier, hopefully even on the originating
system.

Re: [RFC] Case insensitive Git attributes

2017-01-23 Thread Junio C Hamano

Lars Schneider  writes:

> Problem:
> Git attributes for path names are generally case sensitive. However, on 
> a case insensitive file system (e.g. macOS/Windows) they appear to be
> case insensitive (`*.bar` would match `foo.bar` and `foo.BAR`). That 
> works great until a Git users joins the party with a case sensitive file 
> system. For this Git user the attributes pattern only matches files with
> the exact case (*.bar` would match only `foo.bar`).
>
> Question/Proposal:
> Could we introduce some flag to signal that certain attribute patterns
> are always case insensitive? 
>
> Thread:
> http://public-inbox.org/git/c83be22d-eac8-49e2-aee3-22d4a99ae...@gmail.com/#t

Thanks for a pointer.

So you are worried about the case where somebody on a case
insensitive but case preserving system would do

$ edit file.txt
$ edit .gitattributes
$ git add file.txt .gitattributes

and adds "*.TXT someattr=true" to the attributes file, which
would set someattr to true on his system for file.txt, but when the
result is checked out on a case sensitive system, it would behave
differently because "*.TXT" does not match "file.txt"?

How do other systems address it?  Your Java, Ruby, etc. sources may
refer to another file with "import" and the derivation of the file
names from class names or package names would have the same issue,
isn't it?  Do they have an option that lets you say

Even though the import statements may say "import a.b.C", we
know that the source tarball was prepared on a case insensitive
system, and I want you to look for a/b/C.java and a/b/c.java and
use what was found.

or something like that?  Same for anything that records other
filenames in the content to refer to them, like "Makefile".

My knee jerk reaction to that is that .gitattributes and .gitignore
should not be instructed to go case insensitive on case sensitive
systems.  If the system is case insensitive but case preserving,
it probably would make sense not to do case insensitive matching,
which would prevent the issue from happening in the first place.

Re: [DEMO][PATCH v2 6/5] compat: add a qsort_s() implementation based on GNU's qsort_r(1)

2017-01-23 Thread Junio C Hamano

René Scharfe  writes:

> Implement qsort_s() as a wrapper to the GNU version of qsort_r(1) and
> use it on Linux.  Performance increases slightly:
>
> Test HEAD^ HEAD
> 
> 0071.2: sort(1)  0.10(0.20+0.02)   0.10(0.21+0.01) +0.0%
> 0071.3: string_list_sort()   0.17(0.15+0.01)   0.16(0.15+0.01) -5.9%
>
> Additionally the unstripped size of compat/qsort_s.o falls from 24576
> to 16544 bytes in my build.
>
> IMHO these savings aren't worth the increased complexity of having to
> support two implementations.

I do worry about having to support more implementations in the
future that have different function signature for the comparison
callbacks, which will make things ugly, but this addition alone
doesn't look too bad to me.

Thanks.

> diff --git a/compat/qsort_s.c b/compat/qsort_s.c
> index 52d1f0a73d..763ee1faae 100644
> --- a/compat/qsort_s.c
> +++ b/compat/qsort_s.c
> @@ -1,5 +1,21 @@
>  #include "../git-compat-util.h"
>  
> +#if defined HAVE_GNU_QSORT_R
> +
> +int git_qsort_s(void *b, size_t n, size_t s,
> + int (*cmp)(const void *, const void *, void *), void *ctx)
> +{
> + if (!n)
> + return 0;
> + if (!b || !cmp)
> + return -1;
> +
> + qsort_r(b, n, s, cmp, ctx);
> + return 0;
> +}
> +
> +#else
> +
>  /*
>   * A merge sort implementation, simplified from the qsort implementation
>   * by Mike Haertel, which is a part of the GNU C Library.
> @@ -67,3 +83,5 @@ int git_qsort_s(void *b, size_t n, size_t s,
>   }
>   return 0;
>  }
> +
> +#endif
> diff --git a/config.mak.uname b/config.mak.uname
> index 447f36ac2e..a1858f54ff 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -37,6 +37,7 @@ ifeq ($(uname_S),Linux)
>   NEEDS_LIBRT = YesPlease
>   HAVE_GETDELIM = YesPlease
>   SANE_TEXT_GREP=-a
> + HAVE_GNU_QSORT_R = YesPlease
>  endif
>  ifeq ($(uname_S),GNU/kFreeBSD)
>   HAVE_ALLOCA_H = YesPlease

Re: [PATCH v2 0/7] Macros for Asciidoctor support

2017-01-23 Thread Junio C Hamano

"brian m. carlson"  writes:

> There are two major processors of AsciiDoc: AsciiDoc itself, and Asciidoctor.
> Both have advantages and disadvantages, but traditionally the documentation 
> has
> been built with AsciiDoc, leading to some surprising breakage when building 
> with
> Asciidoctor.  Partially, this is due to the need to specify a significant 
> number
> of macros on the command line when building with Asciidoctor.
>
> This series cleans up some issues building the documentation with Asciidoctor
> and provides two knobs, USE_ASCIIDOCTOR, which controls building with
> Asciidoctor, and ASCIIDOCTOR_EXTENSIONS_LAB, which controls the location of 
> the
> Asciidoctor Extensions Lab, which is necessary to expand the linkgit macro.
>
> The need for the extensions could be replaced with a small amount of Ruby 
> code,
> if that's considered desirable.  Previous opinions on doing so were negative,
> however.
>
> In the process, I found several issues with cat-texi.perl, which have been
> fixed.  It has also been modernized to use strict, warnings, and lexical file
> handles.  I also made an attempt to produce more diffable texi files; I may
> follow up with additional series along this line to make the documentation 
> build
> reproducibly.

Thanks.  We'd probably want INSTALL to talk about Asciidoctor once
this matures, as it is very simple requirement for the builder to
have to just set USE_ASCIIDOCTOR, but the version requirement and
stuff might be still confusing.

Re: [PATCH v3 14/21] read-cache: touch shared index files when used

2017-01-23 Thread Junio C Hamano

Christian Couder  writes:

> Also in general the split-index mode is useful when you often write
> new indexes, and in this case shared index files that are used will
> often be freshened, so the risk of deleting interesting shared index
> files should be low.
> ...
>> Not that I think freshening would actually fail in a repository
>> where you can actually write into to update the index or its refs to
>> make a difference (iow, even if we make it die() loudly when shared
>> index cannot be "touched" because we are paranoid, no real life
>> usage will trigger that die(), and if a repository does trigger the
>> die(), I think you would really want to know about it).
>
> As I wrote above, I think if we can actually write the shared index
> file but its freshening fails, it probably means that the shared index
> file has been removed behind us, and this case is equivalent as when
> loose files have been removed behind us.

OK, so it is unlikely to happen, and when it happens it leads to a
catastrophic failure---do we just ignore or do we report an error?

Re: [PATCH 3/3] stash: support filename argument

2017-01-23 Thread Junio C Hamano

Thomas Gummerer  writes:

> diff --git a/git-stash.sh b/git-stash.sh
> index d6b4ae3290..7dcce629bd 100755
> --- a/git-stash.sh
> +++ b/git-stash.sh
> @@ -41,7 +41,7 @@ no_changes () {
>  untracked_files () {
>   excl_opt=--exclude-standard
>   test "$untracked" = "all" && excl_opt=
> - git ls-files -o -z $excl_opt
> + git ls-files -o -z $excl_opt -- $1

Does $1 need to be quoted to prevent it from split at $IFS?

> @@ -56,6 +56,23 @@ clear_stash () {
>  }
>  
>  create_stash () {
> + files=
> + while test $# != 0
> + do
> + case "$1" in
> + --)
> + shift
> + break
> + ;;
> + --files)
> + ;;
> + *)
> + files="$1 $files"
> + ;;

Hmph.  What is this "no-op" option about?  Did you mean to say
something like this instead?

case "$1" in
...
--file)
case $# in
1)
die "--file needs a pathspec" ;;
*)
shift
files="$files$1 " ;;
esac
;;

Another thing I noticed.  We won't support filenames with embedded
$IFS characters at all?

I somehow had an impression that the script was carefully done
(e.g. by using -z option where appropriate) to add such a
limitation.

Perhaps we have broken it over time and it no longer matters
(i.e. there already may be existing breakages), but this troubles
me somehow.

By the way, in addition to "push" thing that corrects the argument
convention by requiring "-m" before the message, we need to correct
create_stash that is used internally from "stash push" somehow?

Re: [RFC] Case insensitive Git attributes

2017-01-23 Thread Lars Schneider

> On 23 Jan 2017, at 19:35, Junio C Hamano  wrote:
> 
> Dakota Hawkins  writes:
> 
>> Apologies for the delayed bump. I think because we're talking about
>> affecting the behavior of .gitattributes that it would be better to
>> have a distinct .gitattributes option, whether or not you also have a
>> similar config option.
> 
> As I know I am on the To: line of the message I am responding to,
> let me quicly let you know that I won't be responding to this thread
> for a while as I don't recall what the discussion was about.  I will
> after I'll dig and find out what the thread was about but it won't
> happen immediately.  Sorry about that.

Problem:
Git attributes for path names are generally case sensitive. However, on 
a case insensitive file system (e.g. macOS/Windows) they appear to be
case insensitive (`*.bar` would match `foo.bar` and `foo.BAR`). That 
works great until a Git users joins the party with a case sensitive file 
system. For this Git user the attributes pattern only matches files with
the exact case (*.bar` would match only `foo.bar`).

Question/Proposal:
Could we introduce some flag to signal that certain attribute patterns
are always case insensitive? 

Thread:
http://public-inbox.org/git/c83be22d-eac8-49e2-aee3-22d4a99ae...@gmail.com/#t

Cheers,
Lars

Re: [PATCH v2] travis-ci: fix Perforce install on macOS

2017-01-23 Thread Junio C Hamano

Lars Schneider  writes:

> The `perforce` and `perforce-server` package were moved from brew [1][2]
> to cask [3]. Teach TravisCI the new location.
>
> Perforce updates their binaries without version bumps. That made the
> brew install (legitimately!) fail due to checksum mismatches. The
> workaround is not necessary anymore as Cask [4] allows to disable the
> checksum test for individual formulas.
>
> [1] 
> https://github.com/Homebrew/homebrew-binary/commit/1394e42de04d07445f82f9512627e864ff4ca4c6
> [2] 
> https://github.com/Homebrew/homebrew-binary/commit/f8da22d6b8dbcfcfdb2dfa9ac1a5e5d8e05aac2b
> [3] https://github.com/caskroom/homebrew-cask/pull/29180
> [4] https://caskroom.github.io/
>
> Signed-off-by: Lars Schneider 
> ---
>
> Hi,
>
> this small update removes one more unnecessary line and makes the
> formula name lower case (no functional reason - just looks better ;-).
>
> @Junio: Do you prefer such a v2 as "--in-reply-to" to v1 or as separate
> thread? What eases your workflow?

It does not make that much of a difference if the second one comes
within a few days since the first one but in general it probably
would help if the follow-up is threaded _and_ made it clear upfront
that if somebody is reading v2 then v1 can be ignored ;-).

Thanks.

Re: [PATCH] blame: add option to print tips (--tips)

2017-01-23 Thread Junio C Hamano

Pranit Bauva  writes:

> We can probably make it useful with some extended efforts. I use
> git-blame and I sometimes find that I don't need things like the name
> of the author, time, timezone and not even the file name and I have to
> use a bigger terminal. If we could somehow remove those fields then
> maybe this would be a useful feature.

I admit that I didn't recall the option until somebody else told me,
but I think "blame -s" or something like that for that purpose ;-)

Re: [PATCH v1] travis-ci: fix Perforce install on macOS

2017-01-23 Thread Lars Schneider


> On 23 Jan 2017, at 19:22, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> Could you fast track the patch to `maint` if it works without trouble on
>> `next` (as it should!)?
>> 
>> Notes:
>>Base Commit: 787f75f056 (master)
> 
> I do not think there is any difference between 'maint' and 'master'
> for this file right now, but I still would have appreciated if this
> line also said 'maint', not 'master', if you want it to go to
> 'maint' eventually ;-) As https://travis-ci.org/git/git/builds seems
> to be doing 'pu', too, hopefully we'll catch any issues there first.

You're right! My own automation betrayed me :-)

I made a tiny change, though. Can you queue v2?
http://public-inbox.org/git/2017015550.28422-1-larsxschnei...@gmail.com/

Thanks,
Lars

Re: [RFC] Case insensitive Git attributes

2017-01-23 Thread Junio C Hamano

Dakota Hawkins  writes:

> Apologies for the delayed bump. I think because we're talking about
> affecting the behavior of .gitattributes that it would be better to
> have a distinct .gitattributes option, whether or not you also have a
> similar config option.

As I know I am on the To: line of the message I am responding to,
let me quicly let you know that I won't be responding to this thread
for a while as I don't recall what the discussion was about.  I will
after I'll dig and find out what the thread was about but it won't
happen immediately.  Sorry about that.

Re: [PATCH 2/3] stash: introduce push verb

2017-01-23 Thread Junio C Hamano

Thomas Gummerer  writes:

> + stash_msg="$*"
> +
> + if test -z stash_msg

A dollar-sign is missing here, I think.

> + then
> + push_stash $push_options
> + else
> + push_stash $push_options -m "$stash_msg"
> + fi
> +}

Re: [PATCH v1] travis-ci: fix Perforce install on macOS

2017-01-23 Thread Junio C Hamano

larsxschnei...@gmail.com writes:

> Could you fast track the patch to `maint` if it works without trouble on
> `next` (as it should!)?
>
> Notes:
> Base Commit: 787f75f056 (master)

I do not think there is any difference between 'maint' and 'master'
for this file right now, but I still would have appreciated if this
line also said 'maint', not 'master', if you want it to go to
'maint' eventually ;-) As https://travis-ci.org/git/git/builds seems
to be doing 'pu', too, hopefully we'll catch any issues there first.

Thanks.

> Diff on Web: https://github.com/larsxschneider/git/commit/ec7106339d
> Checkout:git fetch https://github.com/larsxschneider/git 
> travisci/brew-perforce-fix-v1 && git checkout ec7106339d
>
>  .travis.yml | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index 3843967a69..c6ba8c8ec5 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -75,20 +75,13 @@ before_install:
>popd
>;;
>  osx)
> -  brew_force_set_latest_binary_hash () {
> -FORMULA=$1
> -SHA=$(brew fetch --force $FORMULA 2>&1 | grep ^SHA256: | cut -d ' ' 
> -f 2)
> -sed -E -i.bak "s/sha256 \"[0-9a-f]{64}\"/sha256 \"$SHA\"/g" \
> -  "$(brew --repository homebrew/homebrew-binary)/$FORMULA.rb"
> -  }
>brew update --quiet
>brew tap homebrew/binary --quiet
> -  brew_force_set_latest_binary_hash perforce
> -  brew_force_set_latest_binary_hash perforce-server
># Uncomment this if you want to run perf tests:
># brew install gnu-time
> -  brew install git-lfs perforce-server perforce gettext
> +  brew install git-lfs gettext
>brew link --force gettext
> +  brew install Caskroom/cask/perforce
>;;
>  esac;
>  echo "$(tput setaf 6)Perforce Server Version$(tput sgr0)";
> --
> 2.11.0

Re: [RFC PATCH] Option to allow cherry-pick to skip empty commits

2017-01-23 Thread Junio C Hamano

Giuseppe Bilotta  writes:

> By the way, I noticed going over the code that the -allow options are
> not stored, so that in case of interruption they will be reset, is
> this intentional or a bug?

I do not know offhand, but given the history of the two commands, I
would guess it was a bug simply overlooked when people bolted "a
series of commits" mode onto these commands that originally worked
only on a single commit.

Re: [PATCH v3 14/21] read-cache: touch shared index files when used

2017-01-23 Thread Christian Couder

On Thu, Jan 19, 2017 at 8:00 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Mon, Jan 9, 2017 at 9:34 PM, Junio C Hamano  wrote:
>>> Duy Nguyen  writes:
>>>
 On Sun, Jan 8, 2017 at 4:46 AM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> So what should we do if freshen_file() returns 0 which means that the
>> freshening failed?
>
> You tell me ;-)  as you are the one who is proposing this feature.

Before the above question I already had given my opinion about what we
should do.

There are the following cases:

- Freshening failed because the shared index file does not exist
anymore. In this case it could have been removed for a good reason
(for example maybe the user wants to remove all the shared index
files), or it could be a bug somewhere else. Anyway we cannot know and
the user will get an error if the shared index file that disappeared
is read from afterwards, so I don't think we need to warn or do
anything.

- Freshening failed because the mtime of the shared index cannot be
changed. You already in a previous email wrote that we shoudn't warn
if the file system is read only, and I agree with that, as anyway if
the file system is read only, the shared index file cannot be deleted,
so there is no risk from the current user. Also the split index mode
is useful to speed up index writing at the cost of making index
reading a little slower, so its use in a read only mode should not be
the primary way it is used.

So my opinion is that there are good reasons not to do anything if
freshening fails.

 My answer is, we are not worse than freshening loose objects case
 (especially since I took the idea from there).
>>>
>>> I do not think so, unfortunately.  Loose object files with stale
>>> timestamps are not removed as long as objects are still reachable.

As the current index is read, which freshens its shared index file,
before a new index is created, the number of shared index files
doesn't go below 2. This can be seen in the tests changed in patch
19/21. So the risk of deleting interesting shared index files is quite
low in my opinion.

Also in general the split-index mode is useful when you often write
new indexes, and in this case shared index files that are used will
often be freshened, so the risk of deleting interesting shared index
files should be low.

>> But there are plenty of unreachable loose objects, added in index,
>> then got replaced with new versions. cache-tree can create loose trees
>> too and it's been run more often, behind user's back, to take
>> advantage of the shortcut in unpack-trees.
>
> I am not sure if I follow.  Aren't objects reachable from the
> cache-tree in the index protected from gc?
>
> Not that I think freshening would actually fail in a repository
> where you can actually write into to update the index or its refs to
> make a difference (iow, even if we make it die() loudly when shared
> index cannot be "touched" because we are paranoid, no real life
> usage will trigger that die(), and if a repository does trigger the
> die(), I think you would really want to know about it).

As I wrote above, I think if we can actually write the shared index
file but its freshening fails, it probably means that the shared index
file has been removed behind us, and this case is equivalent as when
loose files have been removed behind us.

Re: [PATCH] rebase: pass --signoff option to git am

2017-01-23 Thread Junio C Hamano

Giuseppe Bilotta  writes:

> Signed-off-by: Giuseppe Bilotta 
> ---
>  Documentation/git-rebase.txt | 5 +
>  git-rebase.sh| 3 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)

Should we plan to extend this to the interactive backend that is
shared between rebase -i and rebase -m, too?  Or is this patch
already sufficient to cover them?

> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 67d48e6883..e6f0b93337 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -385,6 +385,11 @@ have the long commit hash prepended to the format.
>   Recreate merge commits instead of flattening the history by replaying
>   commits a merge commit introduces. Merge conflict resolutions or manual
>   amendments to merge commits are not preserved.
> +
> +--signoff::
> + This flag is passed to 'git am' to sign off all the rebased
> + commits (see linkgit:git-am[1]).
> +
>  +
>  This uses the `--interactive` machinery internally, but combining it
>  with the `--interactive` option explicitly is generally not a good
> diff --git a/git-rebase.sh b/git-rebase.sh
> index 48d7c5ded4..e468a061f9 100755
> --- a/git-rebase.sh
> +++ b/git-rebase.sh
> @@ -34,6 +34,7 @@
>  autosquash move commits that begin with squash!/fixup! under -i
>  committer-date-is-author-date! passed to 'git am'
>  ignore-date!   passed to 'git am'
> +signoff!   passed to 'git am'
>  whitespace=!   passed to 'git apply'
>  ignore-whitespace! passed to 'git apply'
>  C=!passed to 'git apply'
> @@ -321,7 +322,7 @@ run_pre_rebase_hook ()
>   --ignore-whitespace)
>   git_am_opt="$git_am_opt $1"
>   ;;
> - --committer-date-is-author-date|--ignore-date)
> + --committer-date-is-author-date|--ignore-date|--signoff)
>   git_am_opt="$git_am_opt $1"
>   force_rebase=t
>   ;;

Re: [PATCH 25/27] attr: store attribute stacks in hashmap

2017-01-23 Thread Brandon Williams

On 01/18, Brandon Williams wrote:
> On 01/13, Junio C Hamano wrote:
> > Brandon Williams  writes:
> > 
> > > The last big hurdle towards a thread-safe API for the attribute system
> > > is the reliance on a global attribute stack that is modified during each
> > > call into the attribute system.
> > >
> > > This patch removes this global stack and instead a stack is retrieved or
> > > constructed locally.  Since each of these stacks is only used as a
> > > read-only structure once constructed, they can be stored in a hashmap
> > > and shared between threads.
> > 
> > Very good.
> > 
> > The reason why the original code used a stack was because it wanted
> > to keep only the info read from releavant files in-core, discarding
> > ones from files no-longer relevant (because the traversal switched
> > to another subdirectory of the same parent directory), to avoid the
> > memory consumption grow unbounded.  It probably was a premature
> > "optimization" that we can do without, so keeping everything we have
> > read so far in a hashmap (which is my understanding of what is going
> > on in this patch) is probably OK.
> > 
> > I suspect that this hashmap may eventually need to become per
> > attr_check if we want to follow through the optimization envisioned
> > by patch 15/27.
> > 
> > Inside fill(), path_matches() is called for the number of match_attr
> > in the entire attribute stack but it is wasteful to check if the
> > path matches with the a.u.pat if none of the a.state[] entries talk
> > about attributes and macros that are eventually get used by the
> > caller of check_attr().  By introducing a wrapping structure, 15/27
> > wanted to make sure that we have a place to store a "reduced"
> > attribute stack that is kept per attr_check that has only entries
> > from the files that talk about the attributes the particular
> > attr_check wants to learn about.
> > 
> > I need to think about this a bit more, but I do not offhand think
> > that it makes future such enhancement to make it per-check harder to
> > move from a global stack to a global hashmap, i.e. the above is not
> > an objection to this step.
> 
> If we want to continue through and do the optimization you originally
> envisioned then I may need to rethink this patch.  One thing we did talk
> about offline was doing another check prior to the path_match() function
> call which looks through the list of state structs to see if one of
> those states would actually have an affect on the array being used to
> collect attributes.  Though that may be an optimization which can be
> done in addition to creating a reduced stack.
> 
> The one difficulty (which you pointed out in comment form) is if we have
> a reduced attribute stack that is stored per attr_check then handling
> the cleanup when the direction is changed may be messy.

After thinking about this some more I'm going to redo this patch in the
series and instead of storing all of the frames in a shared hashmap,
we'll have an attribute stack stored per attr_check instance like you
originally envisioned.  I think that having a hashmap of all the stack
frames may make it more difficult to do optimizations in the future.  At
least this way (simply pushing the stack into the attr_check) makes it
more straight forward to do optimizations and doesn't have the potential
for memory to grow unbounded.

I'll try to get out a v2 later today.

-- 
Brandon Williams

[PATCH v3 5/5] show-ref: Remove dead `if (verify)' check

2017-01-23 Thread Vladimir Panteleev

As show_ref is only ever called on the path where --verify is not
specified, `verify' can never possibly be true here.

Signed-off-by: Vladimir Panteleev 
---
 builtin/show-ref.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 107d05fe0..2dfcb5634 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -73,9 +73,6 @@ static int show_ref(const char *refname, const struct 
object_id *oid,
continue;
if (len == reflen)
goto match;
-   /* "--verify" requires an exact match */
-   if (verify)
-   continue;
if (refname[reflen - len - 1] == '/')
goto match;
}
-- 
2.11.0

[PATCH v3 2/5] show-ref: Allow -d to work with --verify

2017-01-23 Thread Vladimir Panteleev

Move handling of -d into show_one, so that it takes effect when
--verify is present as well as when it is absent. This is useful when
the user wishes to avoid the costly iteration of refs.

Signed-off-by: Vladimir Panteleev 
---
 builtin/show-ref.c  | 23 ---
 t/t1403-show-ref.sh |  9 +
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 0e53e3da4..a72a626b1 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -19,19 +19,27 @@ static const char *exclude_existing_arg;
 
 static void show_one(const char *refname, const struct object_id *oid)
 {
-   const char *hex = find_unique_abbrev(oid->hash, abbrev);
+   const char *hex;
+   struct object_id peeled;
+
+   hex = find_unique_abbrev(oid->hash, abbrev);
if (hash_only)
printf("%s\n", hex);
else
printf("%s %s\n", hex, refname);
+
+   if (!deref_tags)
+   return;
+
+   if (!peel_ref(refname, peeled.hash)) {
+   hex = find_unique_abbrev(peeled.hash, abbrev);
+   printf("%s %s^{}\n", hex, refname);
+   }
 }
 
 static int show_ref(const char *refname, const struct object_id *oid,
int flag, void *cbdata)
 {
-   const char *hex;
-   struct object_id peeled;
-
if (show_head && !strcmp(refname, "HEAD"))
goto match;
 
@@ -79,13 +87,6 @@ static int show_ref(const char *refname, const struct 
object_id *oid,
 
show_one(refname, oid);
 
-   if (!deref_tags)
-   return 0;
-
-   if (!peel_ref(refname, peeled.hash)) {
-   hex = find_unique_abbrev(peeled.hash, abbrev);
-   printf("%s %s^{}\n", hex, refname);
-   }
return 0;
 }
 
diff --git a/t/t1403-show-ref.sh b/t/t1403-show-ref.sh
index 5932beada..c6872bd96 100755
--- a/t/t1403-show-ref.sh
+++ b/t/t1403-show-ref.sh
@@ -97,6 +97,9 @@ test_expect_success 'show-ref -d' '
git show-ref -d refs/tags/A refs/tags/C >actual &&
test_cmp expect actual &&
 
+   git show-ref --verify -d refs/tags/A refs/tags/C >actual &&
+   test_cmp expect actual &&
+
echo $(git rev-parse refs/heads/master) refs/heads/master >expect &&
git show-ref -d master >actual &&
test_cmp expect actual &&
@@ -116,6 +119,12 @@ test_expect_success 'show-ref -d' '
test_cmp expect actual &&
 
test_must_fail git show-ref -d --verify heads/master >actual &&
+   test_cmp expect actual &&
+
+   test_must_fail git show-ref --verify -d A C >actual &&
+   test_cmp expect actual &&
+
+   test_must_fail git show-ref --verify -d tags/A tags/C >actual &&
test_cmp expect actual
 
 '
-- 
2.11.0

[PATCH v3 3/5] show-ref: Move --quiet handling into show_one

2017-01-23 Thread Vladimir Panteleev

Do the same with --quiet as was done with -d, to remove the need to
perform this check at show_one's call site from the --verify branch.

Signed-off-by: Vladimir Panteleev 
---
 builtin/show-ref.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index a72a626b1..ab8e0dc41 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -22,6 +22,9 @@ static void show_one(const char *refname, const struct 
object_id *oid)
const char *hex;
struct object_id peeled;
 
+   if (quiet)
+   return;
+
hex = find_unique_abbrev(oid->hash, abbrev);
if (hash_only)
printf("%s\n", hex);
@@ -82,9 +85,6 @@ static int show_ref(const char *refname, const struct 
object_id *oid,
die("git show-ref: bad ref %s (%s)", refname,
oid_to_hex(oid));
 
-   if (quiet)
-   return 0;
-
show_one(refname, oid);
 
return 0;
@@ -205,8 +205,7 @@ int cmd_show_ref(int argc, const char **argv, const char 
*prefix)
 
if ((starts_with(*pattern, "refs/") || 
!strcmp(*pattern, "HEAD")) &&
!read_ref(*pattern, oid.hash)) {
-   if (!quiet)
-   show_one(*pattern, );
+   show_one(*pattern, );
}
else if (!quiet)
die("'%s' - not a valid ref", *pattern);
-- 
2.11.0

[PATCH v3 1/5] show-ref: Accept HEAD with --verify

2017-01-23 Thread Vladimir Panteleev

Previously, when --verify was specified, show-ref would use a separate
code path which did not handle HEAD and treated it as an invalid
ref. Thus, "git show-ref --verify HEAD" (where "--verify" is used
because the user is not interested in seeing refs/remotes/origin/HEAD)
did not work as expected.

Instead of insisting that the input begins with "refs/", allow "HEAD"
as well in the codepath that handles "--verify", so that all valid
full refnames including HEAD are passed to the same output machinery.

Signed-off-by: Vladimir Panteleev 
---
 builtin/show-ref.c  |  2 +-
 t/t1403-show-ref.sh | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index 6d4e66900..0e53e3da4 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -202,7 +202,7 @@ int cmd_show_ref(int argc, const char **argv, const char 
*prefix)
while (*pattern) {
struct object_id oid;
 
-   if (starts_with(*pattern, "refs/") &&
+   if ((starts_with(*pattern, "refs/") || 
!strcmp(*pattern, "HEAD")) &&
!read_ref(*pattern, oid.hash)) {
if (!quiet)
show_one(*pattern, );
diff --git a/t/t1403-show-ref.sh b/t/t1403-show-ref.sh
index 7e10bcfe3..5932beada 100755
--- a/t/t1403-show-ref.sh
+++ b/t/t1403-show-ref.sh
@@ -164,4 +164,15 @@ test_expect_success 'show-ref --heads, --tags, --head, 
pattern' '
test_cmp expect actual
 '
 
+test_expect_success 'show-ref --verify HEAD' '
+   echo $(git rev-parse HEAD) HEAD >expect &&
+   git show-ref --verify HEAD >actual &&
+   test_cmp expect actual &&
+
+   >expect &&
+
+   git show-ref --verify -q HEAD >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.11.0

[PATCH v3 4/5] show-ref: Detect dangling refs under --verify as well

2017-01-23 Thread Vladimir Panteleev

Move detection of dangling refs into show_one, so that they are
detected when --verify is present as well as when it is absent.

Signed-off-by: Vladimir Panteleev 
---
 builtin/show-ref.c  | 16 
 t/t1403-show-ref.sh | 22 ++
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/builtin/show-ref.c b/builtin/show-ref.c
index ab8e0dc41..107d05fe0 100644
--- a/builtin/show-ref.c
+++ b/builtin/show-ref.c
@@ -22,6 +22,14 @@ static void show_one(const char *refname, const struct 
object_id *oid)
const char *hex;
struct object_id peeled;
 
+   /* This changes the semantics slightly that even under quiet we
+* detect and return error if the repository is corrupt and
+* ref points at a nonexistent object.
+*/
+   if (!has_sha1_file(oid->hash))
+   die("git show-ref: bad ref %s (%s)", refname,
+   oid_to_hex(oid));
+
if (quiet)
return;
 
@@ -77,14 +85,6 @@ static int show_ref(const char *refname, const struct 
object_id *oid,
 match:
found_match++;
 
-   /* This changes the semantics slightly that even under quiet we
-* detect and return error if the repository is corrupt and
-* ref points at a nonexistent object.
-*/
-   if (!has_sha1_file(oid->hash))
-   die("git show-ref: bad ref %s (%s)", refname,
-   oid_to_hex(oid));
-
show_one(refname, oid);
 
return 0;
diff --git a/t/t1403-show-ref.sh b/t/t1403-show-ref.sh
index c6872bd96..30354fd26 100755
--- a/t/t1403-show-ref.sh
+++ b/t/t1403-show-ref.sh
@@ -184,4 +184,26 @@ test_expect_success 'show-ref --verify HEAD' '
test_cmp expect actual
 '
 
+test_expect_success 'show-ref --verify with dangling ref' '
+   sha1_file() {
+   echo "$*" | sed "s#..#.git/objects/&/#"
+   } &&
+
+   remove_object() {
+   file=$(sha1_file "$*") &&
+   test -e "$file" &&
+   rm -f "$file"
+   } &&
+
+   test_when_finished "rm -rf dangling" &&
+   (
+   git init dangling &&
+   cd dangling &&
+   test_commit dangling &&
+   sha=$(git rev-parse refs/tags/dangling) &&
+   remove_object $sha &&
+   test_must_fail git show-ref --verify refs/tags/dangling
+   )
+'
+
 test_done
-- 
2.11.0

[PATCH v3 0/5] show-ref: Allow -d, --head to work with --verify

2017-01-23 Thread Vladimir Panteleev

Third iteration, according to Junio's comments. This time we keep
show_ref and show_one separate, accept HEAD with --verify even without
--head, and add tests for dangling ref validation with --verify.

Re: [PATCH] blame: add option to print tips (--tips)

2017-01-23 Thread Pranit Bauva

Hey Junio,

On Mon, Jan 23, 2017 at 5:05 AM, Junio C Hamano  wrote:
> That is too early to tell.  At this point we only know there are me
> who won't use it and you who will, among all the other people in the
> world.

We can probably make it useful with some extended efforts. I use
git-blame and I sometimes find that I don't need things like the name
of the author, time, timezone and not even the file name and I have to
use a bigger terminal. If we could somehow remove those fields then
maybe this would be a useful feature.

Idea: Make git-blame understand `format`

git-log has a format option in which we can configure what all things
we need and what we don't. We could probably do the same here also.
After carefully using the format specification with git-log each
person can get a good look at however he seems to view. But I am not
sure whether this is worth the effort. I personally find this `format`
feature useful.

Regards,
Pranit Bauva

Re: [PATCH v3 08/21] Documentation/git-update-index: talk about core.splitIndex config var

2017-01-23 Thread Christian Couder

On Mon, Jan 9, 2017 at 12:18 PM, Duy Nguyen  wrote:
> On Sun, Jan 8, 2017 at 4:38 AM, Junio C Hamano  wrote:
>> Christian Couder  writes:
>>
>>> It feels strange that when I do things one way, you suggest another
>>> way, and the next time in a similar situation when I do things the way
>>> you suggested previously, then you suggest the way I did it initially
>>> the first time...
>>
>> Perhaps because neither is quite satisfactory and I am being forced
>> to choose between the two unsatifactory designs?  In any case, I
>> mostly am and was pointing out the issues and not saying the other
>> one is the most preferred solution in these threads.
>>
>> I think there should just be one authoritative source of the truth,
>
> Either that, or we make sure all sources of truth are consistent. In
> this case, 'update --split-index' could update core.splitIndex if it
> finds that the config tells a different story. As a plumbing though, I
> rather leave update-index do simple things, even if it means the user
> has to clean up after it (or before it) with "git config -unset
> core.splitIndex". Another option is refuse to execute --split-index in
> the presence of (conflicting) core.splitIndex. We kind of force the
> user to keep all sources of truth consistent this way while leaving a
> back door ("git -c core.splitIndex= update-index") for those who need
> tools to recover from a bad case.
>
>> and I have always thought it should be the bit set in the index file
>> when the command line option is used, because that was the way the
>> feature was introduced first and I am superstitious about end-user
>> inertia.  And from that point of view, no matter how you make this
>> new "config" thing interact with it, it would always give a strange
>> and unsatifactory end-user experience, at least to me.
>>
>> Perhaps we should declare that config will be the one and only way
>> in the future and start deprecating the command line option way.
>> That will remove the need for two to interact with each other.

That would be my preferred solution as I think it is the simplest in
the end for users.
Also, as Duy wrote above, one can always use something like "git -c
core.splitIndex= ...", which by the way can work for any command, not
just "update-index".

Anyway we would have to introduce core.splitIndex first, and it
wouldn't make sense to have a different behavior between
--[no-]split-index and --[no-]untracked-cache in the meantime before
they are deprecated and eventually removed.

So let's just go with the implementation in this series, which is
similar to --[no-]untracked-cache, and let's plan to deprecate
--[no-]split-index and --[no-]untracked-cache in a later patch series.

Re: sparse checkout : ignore paths

2017-01-23 Thread Johannes Schindelin

Hi,

On Mon, 23 Jan 2017, Tushar Kapila wrote:

> When we clone/ pull with : config core.sparsecheckout true
> 
> We can specify paths to include. Would be good to explicitly specify
> paths to exclude too.

That is already possible by using "negative patterns", i.e. patterns
preceded by an exclamation point.

Ciao,
Johannes

GSoC 2017: application open, deadline = February 9, 2017

2017-01-23 Thread Matthieu Moy

Hi,

The Google Summer of Code 2017 program is launched
(https://summerofcode.withgoogle.com/).

Last year, Pranit Bauva worked on porting "git bisect" from shell to C,
mentored by Christian and Lars (I didn't follow closely, but essentially
many preparatory steps, cleanups and tests were merged in master, and
patches starting the real port are still queued in pu). The org admins
were Peff and I.

The application deadline is February 9, 2017, i.e. a bit more than 2
weeks from now. So, we should decide quickly whether or not to
participate, and if so work on the application.

On my side, I've been struggling to find some time budget to allocate to
Git since last year and I couldn't even keep up with discussions on this
list :-(. Last summer, I thought that being an org co-admin would help,
but it didn't. So, as much as possible, I'd like to avoid being an org
admin this year. It's not a lot of work (much, much less than being a
mentor!), but if I manage to get some time to work for Git, I'd rather
do that on coding and reviewing this year.

So, a bunch of unanswered questions:

* Does the git.git community want to participate in GSoC this year?
  Actually, I have mixed feelings about this: I do like GSoC, but it
  seems we lack reviewer time more than coder time, and GSoC does not
  make it better. OTOH, a GSoC participant is a potential reviewer in
  the long run ...

* Does the libgit2 community want to participate in GSoC? If so, we
  should clarify the application process. Last year, I wrote (too late):

  > It's essentially too late to change this for this year, but next
  > time we should discuss earlier about how we want to organize this
  > git.git/libgit2 thing. For example, I think it would make little sense
  > to have a git.git microproject and then apply for a libgit2 project
  > since we have very different ways of interacting. And honestly, right
  > now the application is really git.git-centric so I don't think it
  > attracts students towards libgit2. So, if you want to attract more
  > students, we should work on that.

If the answer to one of the above question is yes, then:

* Who's willing to mentor? and to admin?

* We need to write the application, i.e. essentially polish and update
  the text here: https://git.github.io/SoC-2016-Org-Application/ and
  update the list of project ideas and microprojects :
  https://git.github.io/SoC-2017-Ideas/
  https://git.github.io/SoC-2016-Microprojects/

Cheers,

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

1 2 >

1 - 100 of 108 matches

Mail list logo