Re: [PATCH 1/5] add SWAP macro

2017-05-01 Thread René Scharfe
Am 30.04.2017 um 05:11 schrieb Jeff King:
> On Sat, Apr 29, 2017 at 08:16:17PM +0200, René Scharfe wrote:
> 
>>> I dunno. I could go either way. Or we could leave it as-is, and let
>>> valgrind find the problem. That has zero run-time cost, but of course
>>> nobody bothers to run valgrind outside of the test suite, so the inputs
>>> are not usually very exotic.
>>
>> It would be  problematic on platforms where memcpy has to erase the
>> destination before writing new values (I don't know any example).
>>
>> We could use two temporary buffers.  The object code is the same with
>> GCC around 5 and Clang 3.2 or higher -- at least for prio-queue.c.
> 
> Hmm, yeah, that's an easy solution that covers the overlap case, too. If
> the generated code is the same, that seems like it might not be bad (I
> am a little sad how complex this simple swap operation is getting,
> though).

Patch below.

All is well if the compiler can (and is allowed to) see through the
memcpy calls, otherwise we get a performance hit and this change makes
that slow path slower.  FWIW, I didn't see any slowdown in our perf
tests, though.

It's not too late to switch to a macro that takes a type parameter, as
Dscho suggested earlier:

  #define swap_t(T, a, b) do {T t_ = (a); (a) = (b); (b) = t_;} while (0)

Much simpler and with full type check, but harder to use (needs correct
type parameter and its other parameters must not have side effects).

-- >8 --
Subject: [PATCH] avoid calling memcpy on overlapping objects in SWAP

Copy both objects into their own buffers before swapping to make sure we
don't call memcpy with overlapping memory ranges, as that's undefined.
Compilers that inline the memcpy calls optimize the extra operations
away.  That allows calls like SWAP(x, x) without ill effect and without
extra checks.

Signed-off-by: Rene Scharfe 
---
 git-compat-util.h | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index bd04564..a843f04 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -528,13 +528,15 @@ static inline int ends_with(const char *str, const char 
*suffix)
 }
 
 #define SWAP(a, b) do {\
-   void *_swap_a_ptr = &(a);   \
-   void *_swap_b_ptr = &(b);   \
-   unsigned char _swap_buffer[sizeof(a)];  \
-   memcpy(_swap_buffer, _swap_a_ptr, sizeof(a));   \
-   memcpy(_swap_a_ptr, _swap_b_ptr, sizeof(a) +\
-  BUILD_ASSERT_OR_ZERO(sizeof(a) == sizeof(b)));   \
-   memcpy(_swap_b_ptr, _swap_buffer, sizeof(a));   \
+   void *swap_a_ptr_ = &(a);   \
+   void *swap_b_ptr_ = &(b);   \
+   unsigned char swap_a_buffer_[sizeof(a)];\
+   unsigned char swap_b_buffer_[sizeof(a) +\
+   BUILD_ASSERT_OR_ZERO(sizeof(a) == sizeof(b))];  \
+   memcpy(swap_a_buffer_, swap_a_ptr_, sizeof(a)); \
+   memcpy(swap_b_buffer_, swap_b_ptr_, sizeof(a)); \
+   memcpy(swap_a_ptr_, swap_b_buffer_, sizeof(a)); \
+   memcpy(swap_b_ptr_, swap_a_buffer_, sizeof(a)); \
 } while (0)
 
 #if defined(NO_MMAP) || defined(USE_WIN32_MMAP)
-- 
2.1.4


Re: [PATCH 1/5] submodule_move_head: fix leak of struct child_process

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 8:07 PM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>> On 05/01, Stefan Beller wrote:
>>> While fixing the leak of `cp`, reuse it instead of having to declare
>>> another struct child_process.
>>>
>>> Signed-off-by: Stefan Beller 
>>
>> This shouldn't be needed as 'finish_command' does the cleanup for you by
>> calling 'child_prcoess_clear()'.
>
> Yes, indeed.
>
> It might not hurt to update the code so that the cp is reused in the
> second run instead of using a separate instance cp1, but that is a
> topic separate from fixing a non-existing leak.

ok, I'll drop the patch and may send another patch later for re-using cp

Thanks,
Stefan


Re: [RFC] [GSoC] Port git-add--interactive.perl:status_cmd to C

2017-05-01 Thread Junio C Hamano
Junio C Hamano  writes:

> "Daniel Ferreira (theiostream)"  writes:
>
>> Reproducing either of these comparisons "natively" would simply
>> require running run_diff_index() or run_diff_files() with
>> DIFF_FORMAT_NUMSTAT and tweaking diff_flush() to format appropriately
>> for the "interactive--add case".
>
> A more usual way to drive the diff machinery and react to its
> findings is to use DIFF_FORMAT_CALLBACK interface, and walk the
> collected diff_queue to work on the paths discovered.  The way
> wt-status.c builds "Changes to be committed" list out of the
> "diff-index --cached" it internally runs and "Changes not staged for
> commit" out of the "diff-files" it internally runs would be a good
> example to study.

Ahh, you'd also want to show the numstat equivalent and eventually
you'd need to spool the actual textual diff in-core, so that you can
present hunks to the user and have them accepted/rejected
interactively. CALLBACK interface is not a good match for that task.

Of course, using a temporary file to buffer output from diff would
be an obvious and simple workaround (and this is not a performance
critical part of the system---we are talking about end user choosing
hunks interactively after all).  A cleaner may be to spawn diff-index
and diff-files and read from them via pipe into core.



Re: [PATCH 2/5] submodule_move_head: prepare env for child correctly

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 7:04 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> We forgot to prepare the submodule env, which is only a problem for
>> nested submodules. See 2e5d6503bd (ls-files: fix recurse-submodules
>> with nested submodules, 2017-04-13) for further explanation.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>
> Sounds good (if only because this makes it similar to other
> codepaths).
>
> Is this something whose breakage before the patch is easily
> demonstrated with a test?

I'll try to come up with a test in a reroll.

Thanks,
Stefan


Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 6:36 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> This applies to origin/master.
>>
>> For better readability and understandability for newcomers it is a good idea
>> to not offer 2 APIs doing the same thing with on being the #define of the 
>> other.
>>
>> In the long run we may want to drop the macros guarded by
>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.
>
> Why?  Why should we keep typing _index, when most of the time we
> are given _the_ index and working on it?

As someone knowledgeable with the code base you know that the cache_*
and index_* functions only differ by an index argument. A newcomer may not
know this, so they wonder why we have (A) so many functions [and which is the
right function to use]; it is an issue of ease of use of the code base.

Anything you do In submodule land today needs to spawn new processes in
the submodule. This is cumbersome and not performant. So in the far future
we may want to have an abstraction of a repo (B), i.e. all repository state in
one struct/class. That way we can open a submodule in-process and perform
the required actions without spawning a process.

The road to (B) is a long road, but we have to art somewhere. And this seemed
like a good place by introducing a dedicated argument for the
repository. In a follow
up in the future we may want to replace _index by "the_main_repo.its_index"
and then could also run the commands on other (submodule) indexes. But more
importantly, all these commands would operate on a repository object.

In such a far future we would have functions like the cmd_* functions
that would take a repository object instead of doing its setup discovery
on their own.

Another reason may be its current velocity (or absence of it) w.r.t. to these
functions, such that fewer merge conflicts may arise.

---
This discussion is similar to the "free memory at the end of cmd_*" discussion,
as it aims to make code reusable, and accepting a minor drawback for it.
Typing "the_index" re-enforces the object thinking model and may have people
start on thinking if they would like to declare yet another global variable.

Thanks,
Stefan


Re: [PATCH v2 09/25] mailinfo & mailsplit: check for EOF while parsing

2017-05-01 Thread Junio C Hamano
Johannes Schindelin  writes:

> While POSIX states that it is okay to pass EOF to isspace() (and it seems
> to be implied that EOF should *not* be treated as whitespace), and also to
> pass EOF to ungetc() (which seems to be intended to fail without buffering
> the character), it is much better to handle these cases explicitly. Not
> only does it reduce head-scratching (and helps static analysis avoid
> reporting false positives), it also lets us handle files containing
> nothing but whitespace by erroring out.
>
> Reported via Coverity.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/mailsplit.c |  3 ++-
>  mailinfo.c  | 15 +++
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> index 30681681c13..9b3efc8e987 100644
> --- a/builtin/mailsplit.c
> +++ b/builtin/mailsplit.c
> @@ -233,7 +233,8 @@ static int split_mbox(const char *file, const char *dir, 
> int allow_bare,
>   do {
>   peek = fgetc(f);
>   } while (isspace(peek));
> - ungetc(peek, f);
> + if (peek != EOF)
> + ungetc(peek, f);

I agree more with the first sentence in the proposed log message
than what this code actually does.  I.e. breaking upon seeing an EOF
explicitly would be nice, just like the change to mailinfo.c in this
patch we see below.

> @@ -1094,14 +1097,18 @@ int mailinfo(struct mailinfo *mi, const char *msg, 
> const char *patch)
>   return -1;
>   }
>  
> - mi->p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->p_hdr_data)));
> - mi->s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->s_hdr_data)));
> -
>   do {
>   peek = fgetc(mi->input);
> + if (peek == EOF) {
> + fclose(cmitmsg);
> + return error("empty patch: '%s'", patch);
> + }
>   } while (isspace(peek));
>   ungetc(peek, mi->input);

The handling of EOF is improved, but is it correct to move the
allocation of p/s_hdr_data down?

Among the two callers, builtin/am.c just dies when it sees
mailinfo() returns an error, but builtin/mailinfo.c tries to be
nicer and calls clear_mailinfo().  Wouldn't this make that codepath
dereference a NULL pointer?

I think the moral of the story is that people tend to get sloppier
when doing "while we are at it" than the main task, and a reviewer
needs to be more careful while reviewing the "while we are at it"
part of the change than the primary thing a patch wants to do ;-)

> + mi->p_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->p_hdr_data)));
> + mi->s_hdr_data = xcalloc(MAX_HDR_PARSED, sizeof(*(mi->s_hdr_data)));
> +
>   /* process the email header */
>   while (read_one_header_line(, mi->input))
>   check_header(mi, , mi->p_hdr_data, 1);


Re: [PATCH 0/5] Some submodule bugfixes and "reattaching detached HEADs"

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 6:35 PM, Junio C Hamano  wrote:
> Brandon Williams  writes:
>
>> I don't know why submodules were originally designed to be in a
>> detached HEAD state but I much prefer working on branches (as I'm sure
>> many other developers do) so the prospect of this becoming the norm is
>> exciting! :D
>
> The reason is because the superproject already records where the
> HEAD in the submodule should be, when any of its commits is checked
> out.  The tip of a branch (which one???)

The one as configured (submodule.NAME.branch

>  in a submodule may match
> that commit, in which case there is nothing gained by having you on
> that branch,

Being on a branch has some advantages, e.g. easier pushing, not
worrying about losing commits due to gc, an easier "name" in a literal sense.


>  or it may not match that commit, in which case it is
> unclear what should happen.

Yes, I anticipate this to be the main point of discussion.

>  Leaving your submodule on the branch
> would mean the state of your tree as a whole does not match what the
> checkout operation in the superproject wanted to create.  Resetting
> the branch would mean you may lose the history of the branch.

or telling the user via die(), that there is a mismatch.
(you may want to run git submodule update --remote to fix the situation)

> Thinking of the detached HEAD state as an implicit unnamed branch
> that matches the state the superproject checkout expects was
> conceptually one of the cleanest choices.

Assuming this is the cleanest design, we may want to change the
message of git-status, then.
Unlike the scary detached HEAD message (incl long hint), we may just
want to say

$ git status
In submodule 'foo'
You're detached exactly as the superprojects wants you to be.
Nothing to worry.



> But all of the above concentrates on what should happen immediately
> after you do a checkout in the superproject, and it would be OK for
> a sight-seer.  Once you want to work in the submodules to build on
> their histories, not being on a branch does become awkward.  For one
> thing, after you are done with the work in your submodule, you would
> want to update the superproject and make a commit that records the
> commit in the submodule, and then you would want to publish both the
> superproject and the submodule because both of them are now
> updated.  The commit in the superproject may be on the branch that
> is currently checked out, but we don't know what branch the commit
> in the submoudule should go.
>
> The submodule..branch configuration may be a good source to
> learn that piece of information,

Glad we agree up to this point.

> but it does not fully solve the
> issue.  It is OK for the tip of that branch to be at or behind the
> commit the superproject records, but it is unclear what should
> happen if the local tip of that branch is ahead of the commit in the
> superproject when checkout happens.

right. It is still unclear to me as well. I'll have to think about the
various modes of operation.

> By the way, how does this topic work with the various checkout modes
> that can be specified with submodule..update?

This series currently does not touch git-submodule, but in principle
we could just run "submodule--helper reattach-HEAD" after any operation
and then see if we can attach a HEAD (likely for "update=checkout",
but in  "merge" we may want to fast-forward the branch, and in "rebase"
we might want to reset (noff) to the tip.

I'll think about this more.
Thanks,
Stefan


[PATCH v3 3/6] rebase -i: add short command-name in --autosquash

2017-05-01 Thread Liam Beguin
teach `git rebase -i` to recognise short command-names when using the
'--autosquash' option. This allows commit with titles beginning with
"s! ..." and "f! ..." to be treated the same way as "squash! ..." and
"fixup! ..." respectively.

Signed-off-by: Liam Beguin 
---
 Documentation/git-rebase.txt | 2 ++
 git-rebase--interactive.sh   | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 53f4e14a..3e49d8b046ca 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -437,6 +437,8 @@ without an explicit `--interactive`.
commit from `pick` to `squash` (or `fixup`).  Ignores subsequent
"fixup! " or "squash! " after the first, in case you referred to an
earlier fixup/squash with `git commit --fixup/--squash`.
+   Note that their short counterparts, namely "s! ..." and "f! ..."
+   behave the same way.
 +
 This option is only valid when the `--interactive` option is used.
 +
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 4fa621062cdf..61450064c5c4 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -790,7 +790,7 @@ rearrange_squash () {
do
test -z "${format}" || message=$(git log -n 1 --format="%s" 
${sha1})
case "$message" in
-   "squash! "*|"fixup! "*)
+   "squash! "*|"s! "*|"fixup! "*|"f! "*)
action="${message%%!*}"
rest=$message
prefix=
@@ -798,7 +798,7 @@ rearrange_squash () {
while :
do
case "$rest" in
-   "squash! "*|"fixup! "*)
+   "squash! "*|"s! "*|"fixup! "*|"f! "*)
prefix="$prefix${rest%%!*},"
rest="${rest#*! }"
;;
-- 
2.9.3



[PATCH v3 0/6] rebase -i: add config to abbreviate command-names

2017-05-01 Thread Liam Beguin
Add the 'rebase.abbreviateCommands' configuration option to allow
`git rebase -i` to default to the single-letter command-names in
the todo list.

Using single-letter command-names can present two benefits.
First, it makes it easier to change the action since you only need to
replace a single character (i.e.: in vim "r" instead of
"ciw").
Second, using this with a large enough value of 'core.abbrev' enables the
lines of the todo list to remain aligned making the files easier to
read.

Changes from v1 to v2:
 - Improve Documentation and commit message

Changes from v2 to v3:
 - Transform a single patch into a series
 - change option name from 'rebase.abbrevCmd' to 'rebase.abbreviateCommands'
 - abbreviate all commands (not just pick)
 - teach `git rebase -i --autosquash` to recognise single-letter command-names
 - move rebase configuration documentation to Documentation/rebase-config.txt
 - update Documentation to use the preferred naming for the todo list
 - update Documentation and commit messages according to feedback

Liam Beguin (6):
  rebase -i: add abbreviated command-names handling
  rebase -i: add abbreviate_commands function
  rebase -i: add short command-name in --autosquash
  Documentation: move rebase.* config variables to a separate
rebase-config.txt
  Documentation: use prefered name for the 'todo list' script
  Documentation: document the rebase.abbreviateCommands option

 Documentation/config.txt| 31 +---
 Documentation/git-rebase.txt| 21 +++-
 Documentation/rebase-config.txt | 53 +
 git-rebase--interactive.sh  | 24 
 4 files changed, 78 insertions(+), 52 deletions(-)
 create mode 100644 Documentation/rebase-config.txt

-- 
2.9.3



[PATCH v3 1/6] rebase -i: add abbreviated command-names handling

2017-05-01 Thread Liam Beguin
make sure 'add_exec_commands' and 'transform_todo_ids' also understand
the abbreviated versions of the command-names.

Signed-off-by: Liam Beguin 
---
 git-rebase--interactive.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 2c9c0165b5ab..9b8a030ff045 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -754,7 +754,7 @@ transform_todo_ids () {
while read -r command rest
do
case "$command" in
-   "$comment_char"* | exec)
+   "$comment_char"* |x|exec)
# Be careful for oddball commands like 'exec'
# that do not have a SHA-1 at the beginning of $rest.
;;
@@ -871,7 +871,7 @@ add_exec_commands () {
while read -r insn rest
do
case $insn in
-   pick)
+   p|pick)
test -n "$first" ||
printf "%s" "$cmd"
;;
-- 
2.9.3



[PATCH v3 2/6] rebase -i: add abbreviate_commands function

2017-05-01 Thread Liam Beguin
Once the rest of the processing is done, the `abbreviate_commands`
function is called. If the 'rebase.abbreviateCommands' option is set to
true, the function will replace each command-name by its abbreviated
form.

Signed-off-by: Liam Beguin 
---
 git-rebase--interactive.sh | 16 
 1 file changed, 17 insertions(+)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 9b8a030ff045..4fa621062cdf 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -884,6 +884,20 @@ add_exec_commands () {
mv "$1.new" "$1"
 }
 
+abbreviate_commands () {
+   test "$(git config --bool rebase.abbreviateCommands)" = true || return
+
+   while read -r command rest
+   do
+   case $command in
+   x|exec) command=x ;;
+   *)  command=${command%${command#?}} ;;
+   esac
+   printf "%s\n" "$command $rest"
+   done <"$1" >"$1.new" &&
+   mv -f "$1.new" "$1"
+}
+
 # Check if the SHA-1 passed as an argument is a
 # correct one, if not then print $2 in "$todo".badsha
 # $1: the SHA-1 to test
@@ -1143,6 +1158,7 @@ edit-todo)
git stripspace --strip-comments <"$todo" >"$todo".new
mv -f "$todo".new "$todo"
collapse_todo_ids
+   abbreviate_commands "$todo"
append_todo_help
gettext "
 You are editing the todo file of an ongoing interactive rebase.
@@ -1281,6 +1297,7 @@ fi
 test -s "$todo" || echo noop >> "$todo"
 test -n "$autosquash" && rearrange_squash "$todo"
 test -n "$cmd" && add_exec_commands "$todo"
+abbreviate_commands "$todo"
 
 todocount=$(git stripspace --strip-comments <"$todo" | wc -l)
 todocount=${todocount##* }
-- 
2.9.3



[PATCH v3 4/6] Documentation: move rebase.* config variables to a separate rebase-config.txt

2017-05-01 Thread Liam Beguin
Move configuration variables to a separate file in order to remove
duplicates, and include it in config.txt and git-rebase.txt.
The new descriptions are taken from config.txt as they are more verbose.

Signed-off-by: Liam Beguin 
---
 Documentation/config.txt| 31 +--
 Documentation/git-rebase.txt| 19 +--
 Documentation/rebase-config.txt | 30 ++
 3 files changed, 32 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/rebase-config.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d5155..6b647c504e8f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2583,36 +2583,7 @@ push.recurseSubmodules::
is retained. You may override this configuration at time of push by
specifying '--recurse-submodules=check|on-demand|no'.
 
-rebase.stat::
-   Whether to show a diffstat of what changed upstream since the last
-   rebase. False by default.
-
-rebase.autoSquash::
-   If set to true enable `--autosquash` option by default.
-
-rebase.autoStash::
-   When set to true, automatically create a temporary stash
-   before the operation begins, and apply it after the operation
-   ends.  This means that you can run rebase on a dirty worktree.
-   However, use with care: the final stash application after a
-   successful rebase might result in non-trivial conflicts.
-   Defaults to false.
-
-rebase.missingCommitsCheck::
-   If set to "warn", git rebase -i will print a warning if some
-   commits are removed (e.g. a line was deleted), however the
-   rebase will still proceed. If set to "error", it will print
-   the previous warning and stop the rebase, 'git rebase
-   --edit-todo' can then be used to correct the error. If set to
-   "ignore", no checking is done.
-   To drop a commit without warning or error, use the `drop`
-   command in the todo-list.
-   Defaults to "ignore".
-
-rebase.instructionFormat::
-   A format string, as specified in linkgit:git-log[1], to be used for
-   the instruction list during an interactive rebase.  The format will 
automatically
-   have the long commit hash prepended to the format.
+include::rebase-config.txt[]
 
 receive.advertiseAtomic::
By default, git-receive-pack will advertise the atomic push
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 3e49d8b046ca..702a46adfa18 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -203,24 +203,7 @@ Alternatively, you can undo the 'git rebase' with
 CONFIGURATION
 -
 
-rebase.stat::
-   Whether to show a diffstat of what changed upstream since the last
-   rebase. False by default.
-
-rebase.autoSquash::
-   If set to true enable `--autosquash` option by default.
-
-rebase.autoStash::
-   If set to true enable `--autostash` option by default.
-
-rebase.missingCommitsCheck::
-   If set to "warn", print warnings about removed commits in
-   interactive mode. If set to "error", print the warnings and
-   stop the rebase. If set to "ignore", no checking is
-   done. "ignore" by default.
-
-rebase.instructionFormat::
-   Custom commit list format to use during an `--interactive` rebase.
+include::rebase-config.txt[]
 
 OPTIONS
 ---
diff --git a/Documentation/rebase-config.txt b/Documentation/rebase-config.txt
new file mode 100644
index ..71872131
--- /dev/null
+++ b/Documentation/rebase-config.txt
@@ -0,0 +1,30 @@
+rebase.stat::
+   Whether to show a diffstat of what changed upstream since the last
+   rebase. False by default.
+
+rebase.autoSquash::
+   If set to true enable `--autosquash` option by default.
+
+rebase.autoStash::
+   When set to true, automatically create a temporary stash
+   before the operation begins, and apply it after the operation
+   ends.  This means that you can run rebase on a dirty worktree.
+   However, use with care: the final stash application after a
+   successful rebase might result in non-trivial conflicts.
+   Defaults to false.
+
+rebase.missingCommitsCheck::
+   If set to "warn", git rebase -i will print a warning if some
+   commits are removed (e.g. a line was deleted), however the
+   rebase will still proceed. If set to "error", it will print
+   the previous warning and stop the rebase, 'git rebase
+   --edit-todo' can then be used to correct the error. If set to
+   "ignore", no checking is done.
+   To drop a commit without warning or error, use the `drop`
+   command in the todo-list.
+   Defaults to "ignore".
+
+rebase.instructionFormat::
+   A format string, as specified in linkgit:git-log[1], to be used for
+   the instruction list during an interactive rebase.  The format will 
automatically
+   have the long commit hash 

[PATCH v3 5/6] Documentation: use preferred name for the 'todo list' script

2017-05-01 Thread Liam Beguin
Use "todo list" instead of "instruction list" or "todo-list" to
reduce further confusion regarding the name of this script.

Signed-off-by: Liam Beguin 
---
 Documentation/rebase-config.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/rebase-config.txt b/Documentation/rebase-config.txt
index 71872131..a9b1d496e63a 100644
--- a/Documentation/rebase-config.txt
+++ b/Documentation/rebase-config.txt
@@ -21,10 +21,10 @@ rebase.missingCommitsCheck::
--edit-todo' can then be used to correct the error. If set to
"ignore", no checking is done.
To drop a commit without warning or error, use the `drop`
-   command in the todo-list.
+   command in the todo list.
Defaults to "ignore".
 
 rebase.instructionFormat::
A format string, as specified in linkgit:git-log[1], to be used for
-   the instruction list during an interactive rebase.  The format will 
automatically
+   the todo list during an interactive rebase.  The format will 
automatically
have the long commit hash prepended to the format.
-- 
2.9.3



[PATCH v3 6/6] Documentation: document the rebase.abbreviateCommands option

2017-05-01 Thread Liam Beguin
Signed-off-by: Liam Beguin 
---
 Documentation/rebase-config.txt | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/Documentation/rebase-config.txt b/Documentation/rebase-config.txt
index a9b1d496e63a..0f29b7d0b89a 100644
--- a/Documentation/rebase-config.txt
+++ b/Documentation/rebase-config.txt
@@ -28,3 +28,26 @@ rebase.instructionFormat::
A format string, as specified in linkgit:git-log[1], to be used for
the todo list during an interactive rebase.  The format will 
automatically
have the long commit hash prepended to the format.
+
+rebase.abbreviateCommands::
+   If set to true, `git rebase -i` will abbreviate all command-names in the
+   todo list resulting in something like this:
+---
+   p ae6c43a first commit title
+   f 0694310 fixup! first commit title
+   p bf25ea8 second commit title
+   s e8fbbfd squash! second commit title
+   ...
+---
+   instead of:
+---
+   pick ae6c43a first commit title
+   fixup 0694310 fixup! first commit title
+   pick bf25ea8 second commit title
+   squash e8fbbfd squash! second commit title
+   ...
+---
+   As shown above, using single-letter command-names better aligns the
+   todo list when full names have different lengths. Additionally, combined
+   with a large enough value of 'core.abbrev' (say 12), the todo list is
+   guaranteed to be fully aligned.
-- 
2.9.3



Re: [PATCH v2 14/25] setup_discovered_git_dir(): help static analysis

2017-05-01 Thread Junio C Hamano
Johannes Schindelin  writes:

> Coverity reported a memory leak in this function. However, it can only
> be called once, as setup_git_directory() changes global state and hence
> is not reentrant.
>
> Mark the variable as static to indicate that this is a singleton.
>
> Signed-off-by: Johannes Schindelin 
> ---

Does something different from what is explained above.  Rebase gotcha?

>  setup.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index 0320a9ad14c..12efca85a41 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -703,11 +703,16 @@ static const char *setup_discovered_git_dir(const char 
> *gitdir,
>  
>   /* --work-tree is set without --git-dir; use discovered one */
>   if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
> + char *p = NULL;
> + const char *ret;
> +
>   if (offset != cwd->len && !is_absolute_path(gitdir))
> - gitdir = real_pathdup(gitdir, 1);
> + gitdir = p = real_pathdup(gitdir, 1);
>   if (chdir(cwd->buf))
>   die_errno("Could not come back to cwd");
> - return setup_explicit_git_dir(gitdir, cwd, nongit_ok);
> + ret = setup_explicit_git_dir(gitdir, cwd, nongit_ok);
> + free(p);
> + return ret;
>   }
>  
>   /* #16.2, #17.2, #20.2, #21.2, #24, #25, #28, #29 (see t1510) */


Re: [PATCH v2 23/25] name-rev: avoid leaking memory in the `deref` case

2017-05-01 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> When the `name_rev()` function is asked to dereference the tip name, it
>> allocates memory. But when it turns out that another tip already
>> described the commit better than the current one, we forgot to release
>> the memory.
>
> Very well explained.
>>
>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>> index 92a5d8a5d26..a4ce73fb1e9 100644
>> --- a/builtin/name-rev.c
>> +++ b/builtin/name-rev.c
>> @@ -28,6 +28,7 @@ static void name_rev(struct commit *commit,
>>  struct rev_name *name = (struct rev_name *)commit->util;
>>  struct commit_list *parents;
>>  int parent_number = 1;
>> +char *p = NULL;
>>  
>>  parse_commit(commit);
>>  
>> @@ -35,7 +36,7 @@ static void name_rev(struct commit *commit,
>>  return;
>>  
>>  if (deref) {
>> -tip_name = xstrfmt("%s^0", tip_name);
>> +tip_name = p = xstrfmt("%s^0", tip_name);

I'll rename 'p' to 'to_free' while queuing, though.  Without a
descriptive name, it was confusing to view while resolving conflicts
with another in-flight topic.


Re: [PATCH v2 23/25] name-rev: avoid leaking memory in the `deref` case

2017-05-01 Thread Junio C Hamano
Johannes Schindelin  writes:

> When the `name_rev()` function is asked to dereference the tip name, it
> allocates memory. But when it turns out that another tip already
> described the commit better than the current one, we forgot to release
> the memory.

Very well explained.

>
> Pointed out by Coverity.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/name-rev.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 92a5d8a5d26..a4ce73fb1e9 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -28,6 +28,7 @@ static void name_rev(struct commit *commit,
>   struct rev_name *name = (struct rev_name *)commit->util;
>   struct commit_list *parents;
>   int parent_number = 1;
> + char *p = NULL;
>  
>   parse_commit(commit);
>  
> @@ -35,7 +36,7 @@ static void name_rev(struct commit *commit,
>   return;
>  
>   if (deref) {
> - tip_name = xstrfmt("%s^0", tip_name);
> + tip_name = p = xstrfmt("%s^0", tip_name);
>  
>   if (generation)
>   die("generation: %d, but deref?", generation);
> @@ -53,8 +54,10 @@ static void name_rev(struct commit *commit,
>   name->taggerdate = taggerdate;
>   name->generation = generation;
>   name->distance = distance;
> - } else
> + } else {
> + free(p);
>   return;
> + }
>  
>   for (parents = commit->parents;
>   parents;


Re: [PATCH v2 24/25] show_worktree(): plug memory leak

2017-05-01 Thread Junio C Hamano
Johannes Schindelin  writes:

> The buffer allocated by shorten_unambiguous_ref() needs to be released.

Yes.  Looks good.

>
> Discovered by Coverity.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/worktree.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 1722a9bdc2a..ff5dfd2b102 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -414,9 +414,11 @@ static void show_worktree(struct worktree *wt, int 
> path_maxlen, int abbrev_len)
>   find_unique_abbrev(wt->head_sha1, 
> DEFAULT_ABBREV));
>   if (wt->is_detached)
>   strbuf_addstr(, "(detached HEAD)");
> - else if (wt->head_ref)
> - strbuf_addf(, "[%s]", 
> shorten_unambiguous_ref(wt->head_ref, 0));
> - else
> + else if (wt->head_ref) {
> + char *ref = shorten_unambiguous_ref(wt->head_ref, 0);
> + strbuf_addf(, "[%s]", ref);
> + free(ref);
> + } else
>   strbuf_addstr(, "(error)");
>   }
>   printf("%s\n", sb.buf);


Re: [PATCH v2 25/25] submodule_uses_worktrees(): plug memory leak

2017-05-01 Thread Junio C Hamano
Johannes Schindelin  writes:

> There is really no reason why we would need to hold onto the allocated
> string longer than necessary.

Yup.  The longer we make the duration between the allocation and the
standard release, the more likely future code would add early returns
that forget to release the memory.

Looks correct; will queue.


> Reported by Coverity.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  worktree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/worktree.c b/worktree.c
> index bae787cf8d7..89a81b13de3 100644
> --- a/worktree.c
> +++ b/worktree.c
> @@ -399,6 +399,7 @@ int submodule_uses_worktrees(const char *path)
>  
>   /* The env would be set for the superproject. */
>   get_common_dir_noenv(, submodule_gitdir);
> + free(submodule_gitdir);
>  
>   /*
>* The check below is only known to be good for repository format
> @@ -418,7 +419,6 @@ int submodule_uses_worktrees(const char *path)
>   /* See if there is any file inside the worktrees directory. */
>   dir = opendir(sb.buf);
>   strbuf_release();
> - free(submodule_gitdir);
>  
>   if (!dir)
>   return 0;


Re: [PATCH] credential doc: make multiple-helper behavior more prominent (Re: [PATCH] clone: handle empty config values in -c)

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 05:30:10PM -0700, Jonathan Nieder wrote:

> > @@ -162,6 +152,16 @@ helper::
> > shell (so, for example, setting this to `foo --option=bar` will execute
> > `git credential-foo --option=bar` via the shell. See the manual of
> > specific helpers for examples of their use.
> > ++
> > +If there are multiple instances of the `credential.helper` configuration
> > +variable, each helper will be tried in turn, and may provide a username,
> > +password, or nothing. Once Git has acquired both a username and a
> > +password, no more helpers will be tried.
> > ++
> > +If `credential.helper` is configured to the empty string, this resets
> > +the helper list to empty (so you may override a helper set by a
> > +lower-priority config file by configuring the empty-string helper,
> 
> It's not necessarily obvious to a new user what "lower-priority" means.
> 
> Since this text is an example, maybe it should say something like "so,
> for example, you can override a helper set in /etc/gitconfig by
> configuring the empty-string helper followed by whatever set of
> helpers you would like in ~/.gitconfig".
> 
> That's orthogonal to this patch but it should be straightforward to
> roll in if it makes sense.

I'm not sure we want to get into explaining last-one-wins versus lists
versus list-resets for each option. The "FILES" section of git-config(1)
does cover this, though I could well believe it doesn't go into enough
detail or is too hard to find (my mind is sufficiently poisoned that it
all makes sense to me, but that says little about an average Git user).

The phrase "lower-priority config file" is used for push.gpgSign, too
(though there it really does seem like repeating things that are true
for all the config).

-Peff


Re: [PATCH] credential doc: make multiple-helper behavior more prominent (Re: [PATCH] clone: handle empty config values in -c)

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 05:21:14PM -0700, Jonathan Nieder wrote:

> Subject: credential doc: make multiple-helper behavior more prominent
> 
> Git's configuration system works by reading multiple configuration
> files in order, from general to specific:
> 
>  - first, the system configuration /etc/gitconfig
>  - then the user's configuration (~/.gitconfig or ~/.config/git/config)
>  - then the repository configuration (.git/config)
> 
> For single-valued configuration items, the latest value wins.  For
> multi-valued configuration items, values accumulate in that order.
> 
> For example, this allows setting a credential helper globally in
> ~/.gitconfig that git will try to use in all repositories, regardless
> of whether they additionally provide another helper.  This is usually
> a nice thing --- e.g. I can install helpers to use my OS keychain and
> to cache credentials for a short period of time globally.
> 
> Sometimes people want to be able to override an inherited setting.
> For the credential.helper setting, this is done by setting the
> configuration item to empty before giving it a new value.  This is
> already documented by the documentation is hard to find ---
> git-config(1) says to look at gitcredentials(7) and the config
> reference in gitcredentials(7) doesn't mention this issue.
> 
> Move the documentation to the config reference to make it easier to
> find.

I know I am the last person in the world to criticize somebody for being
too verbose in a commit message, but... :)

I think this background about how multi-values are handled isn't that
useful here, because we're not changing anything to do with that. We're
just moving the documentation around. So I think a more compelling
explanation would focus on the documentation locations. Like:

  The behavior of multiple credential.helper config values is explained
  in the rather-large "AVOIDING REPETITION" section of
  gitcredentials(7), but not mentions at all in the "CONFIGURATION
  OPTIONS" section. That's OK if the user is reading the manpage from
  top to bottom, but users often don't do that. The entry for
  credential.helper in git-config(1) points them to gitcredentials(7),
  which would make it reasonable for them to skip straight to the
  "CONFIGURATION OPTIONS" section.

With or without the suggested commit message, this looks like an
improvement to me.

> > After reading this I'm still a little fuzzy on why the empty helper line
> > is needed to avoid using the credential helper from /etc/gitconfig.
> 
> See "git help credentials":
> 
>If there are multiple instances of the credential.helper configuration
>variable, each helper will be tried in turn, and may provide a
>username, password, or nothing. Once Git has acquired both a username
>and a password, no more helpers will be tried.
> 
>If credential.helper is configured to the empty string, this resets the
>helper list to empty (so you may override a helper set by a
>lower-priority config file by configuring the empty-string helper,
>followed by whatever set of helpers you would like).
> 
> That's a bit obscure, though --- I didn't find it when I looked in "git
> help config".  How about this patch?

So I think this looks fine, but I wonder if we should discuss multi-vars
some in Documentation/config.txt. It would be really nice if we could
claim "the usual behavior for multi-vars is to reset the list upon
seeing an empty entry". But probably we should make sure more of them
handle that before making such a claim. :)

-Peff


Re: [PATCH 1/5] submodule_move_head: fix leak of struct child_process

2017-05-01 Thread Junio C Hamano
Brandon Williams  writes:

> On 05/01, Stefan Beller wrote:
>> While fixing the leak of `cp`, reuse it instead of having to declare
>> another struct child_process.
>> 
>> Signed-off-by: Stefan Beller 
>
> This shouldn't be needed as 'finish_command' does the cleanup for you by
> calling 'child_prcoess_clear()'.

Yes, indeed.  

It might not hurt to update the code so that the cp is reused in the
second run instead of using a separate instance cp1, but that is a
topic separate from fixing a non-existing leak.

>> ---
>>  submodule.c | 15 +--
>>  1 file changed, 9 insertions(+), 6 deletions(-)
>> 
>> diff --git a/submodule.c b/submodule.c
>> index d3299e29c0..cd098cf12b 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1466,17 +1466,19 @@ int submodule_move_head(const char *path,
>>  goto out;
>>  }
>>  
>> +child_process_clear();
>> +
>>  if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
>>  if (new) {
>> -struct child_process cp1 = CHILD_PROCESS_INIT;
>> +child_process_init();
>>  /* also set the HEAD accordingly */
>> -cp1.git_cmd = 1;
>> -cp1.no_stdin = 1;
>> -cp1.dir = path;
>> +cp.git_cmd = 1;
>> +cp.no_stdin = 1;
>> +cp.dir = path;
>>  
>> -argv_array_pushl(, "update-ref", "HEAD", new, 
>> NULL);
>> +argv_array_pushl(, "update-ref", "HEAD", new, 
>> NULL);
>>  
>> -if (run_command()) {
>> +if (run_command()) {
>>  ret = -1;
>>  goto out;
>>  }
>> @@ -1492,6 +1494,7 @@ int submodule_move_head(const char *path,
>>  }
>>  }
>>  out:
>> +child_process_clear();
>>  return ret;
>>  }
>>  
>> -- 
>> 2.13.0.rc1.1.gbc33f0f778
>> 


Re: [PATCH] clone: handle empty config values in -c

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 05:05:15PM -0700, Jonathan Nieder wrote:

> "git clone --config" uses the following incantation to add an item to
> a config file, instead of replacing an existing value:
> 
>   git_config_set_multivar_gently(key, value, "^$", 0)
> 
> As long as no existing value matches the regex ^$, that works as
> intended and adds to the config.  When a value is empty, though, it
> replaces the existing value.
> [...]
> ---
> Thoughts?

Yeah, I think this is the exact reason we introduced CONFIG_REGEX_NONE
in the first place. At the time we fixed "config --add", but "clone -c"
needs the same treatment.

Grepping around, it looks like we should probably be using this in other
places, too:

  - writing fetch refspecs in clone

  - throughout remote.c for urls and refspecs

I don't think an empty variable has meaning in those places, so probably
nobody really cares. But passing CONFIG_REGEX_NONE seems to better match
the original intent. And in the long run I think we probably ought to
make an empty "remote.foo.url" mean "reset the url list to empty", as
we've started to do with other multivars (like credential helpers).

So your patch looks fine, but I'd be pleased if you wanted to take it
further and eradicate this "^$" anti-pattern through the code base.

-Peff

PS I notice that the documentation for "config --add" explicitly
   mentions that it behaves like "^$". This isn't accurate anymore. We
   should probably update that. I wondered if callers of git-config
   would need some way to specify "--replace-all --no-value" or
   something, but that is precisely what "--add" is. So I think it's
   just a documentation problem.


Re: [PATCH v2] send-email: new options to walkaround email server limits

2017-05-01 Thread Junio C Hamano
xiaoqiang zhao  writes:

> Some email server(e.g. smtp.163.com) limits a fixed number emails to
> be send per session(connection) and this will lead to a send faliure.
>
> With --batch-size= option, an auto reconnection will occur when
> number of sent email reaches  and the problem is solved.
>
> --relogin-delay option will make some delay between two successive
> email server login.

Here is how I would have written the above..

send-email: --batch-size to work around some SMTP server limit

Some email servers (e.g. smtp.163.com) limit the number emails to be
sent per session(connection) and this will lead to a faliure when
sending many messages.

Teach send-email to disconnect after sending a number of messages
(configurable via the --batch-size= option), wait for a few
seconds (configurable via the --relogin-delay= option) and
reconnect, to work around such a limit.


But I am having a huge problem seeing how this patch is correct.  It
always is troubling to see a patch that makes the behaviour of a
program change even when the optional feature it implements is not
being used at all.  Why does it even have to touch smtp_auth_maybe?
Why does the updated smtp_auth_maybe have to do quite different
things even when batch-size is not defined from the original?
What is that new "Auth use saved password. \n" message about?

After reading the problem description in the proposed log message,
the most natural update to achieve the stated goal is to add code to
the loop that has the only caller to send_message() function, I
would think.  The loop goes over the input files and prepares the
variables used in send_message() and have the function send a single
message, initializing $smtp as necessary but otherwise reusing $smtp
the previous round has prepared.  So just after $message_id is
undefed in the loop, I expected that you would count "number of
messages sent so far during this session", and when that number
exceeds the batch size, disconnect $smtp and unset the variable,
and sleep for a bit, without having to change anything else.

Puzzled.

>  sub smtp_auth_maybe {
> - if (!defined $smtp_authuser || $auth) {
> + if (!defined $smtp_authuser || $num_sent != 0) {
>   return 1;
>   }
>  
> + if ($auth && $num_sent == 0) {
> + print "Auth use saved password. \n";
> + return !!$smtp->auth($smtp_authuser, $smtp_authpass);
> + }
> +
>   # Workaround AUTH PLAIN/LOGIN interaction defect
>   # with Authen::SASL::Cyrus
>   eval {
> @@ -1187,6 +1201,7 @@ sub smtp_auth_maybe {
>   'password' => $smtp_authpass
>   }, sub {
>   my $cred = shift;
> + $smtp_authpass = $cred->{'password'};
>  
>   if ($smtp_auth) {
>   my $sasl = Authen::SASL->new(
> @@ -1442,6 +1457,15 @@ EOF
>   }
>   }
>  
> + $num_sent++;
> + if ($num_sent == $batch_size) {
> + $smtp->quit;
> + $smtp = undef;
> + $num_sent = 0;
> + print "Reconnect SMTP server required. \n";
> + sleep($relogin_delay);
> + }
> +
>   return 1;
>  }


Re: [PATCH 2/5] submodule_move_head: prepare env for child correctly

2017-05-01 Thread Junio C Hamano
Stefan Beller  writes:

> We forgot to prepare the submodule env, which is only a problem for
> nested submodules. See 2e5d6503bd (ls-files: fix recurse-submodules
> with nested submodules, 2017-04-13) for further explanation.
>
> Signed-off-by: Stefan Beller 
> ---

Sounds good (if only because this makes it similar to other
codepaths).

Is this something whose breakage before the patch is easily
demonstrated with a test?

>  submodule.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/submodule.c b/submodule.c
> index cd098cf12b..c7a7a33916 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1476,6 +1476,7 @@ int submodule_move_head(const char *path,
>   cp.no_stdin = 1;
>   cp.dir = path;
>  
> + prepare_submodule_repo_env(_array);
>   argv_array_pushl(, "update-ref", "HEAD", new, 
> NULL);
>  
>   if (run_command()) {


Re: Proposal for missing blob support in Git repos

2017-05-01 Thread Junio C Hamano
Jonathan Tan  writes:

> On 05/01/2017 04:29 PM, Junio C Hamano wrote:
>> Jonathan Tan  writes:
>>
>>> Thanks for your comments. If you're referring to the codepath
>>> involving write_sha1_file() (for example, builtin/hash-object ->
>>> index_fd or builtin/unpack-objects), that is fine because
>>> write_sha1_file() invokes freshen_packed_object() and
>>> freshen_loose_object() directly to check if the object already exists
>>> (and thus does not invoke the new mechanism in this patch).
>>
>> Is that a good thing, though?  It means that you an attacker can
>> feed one version to the remote object store your "grab blob" hook
>> gets the blobs from, and have you add a colliding object locally,
>> and the usual "are we recording the same object as existing one?"
>> check is bypassed.
>
> If I understand this correctly, what you mean is the situation where
> the hook adds an object to the local repo, overriding another object
> of the same name?

No.  

write_sha1_file() pays attention to objects already in the local
object store to avoid hash collisions that can be used to replace a
known-to-be-good object and that is done as a security measure.
What I am reading in your response was that this new mechanism
bypasses that, and I was wondering if that is a good thing.



Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

2017-05-01 Thread Junio C Hamano
Stefan Beller  writes:

> This applies to origin/master.
>
> For better readability and understandability for newcomers it is a good idea
> to not offer 2 APIs doing the same thing with on being the #define of the 
> other.
>
> In the long run we may want to drop the macros guarded by
> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.

Why?  Why should we keep typing _index, when most of the time we
are given _the_ index and working on it?


Re: [PATCH 0/5] Some submodule bugfixes and "reattaching detached HEADs"

2017-05-01 Thread Junio C Hamano
Brandon Williams  writes:

> I don't know why submodules were originally designed to be in a
> detached HEAD state but I much prefer working on branches (as I'm sure
> many other developers do) so the prospect of this becoming the norm is
> exciting! :D

The reason is because the superproject already records where the
HEAD in the submodule should be, when any of its commits is checked
out.  The tip of a branch (which one???) in a submodule may match
that commit, in which case there is nothing gained by having you on
that branch, or it may not match that commit, in which case it is
unclear what should happen.  Leaving your submodule on the branch
would mean the state of your tree as a whole does not match what the
checkout operation in the superproject wanted to create.  Resetting
the branch would mean you may lose the history of the branch.

Thinking of the detached HEAD state as an implicit unnamed branch
that matches the state the superproject checkout expects was
conceptually one of the cleanest choices.

But all of the above concentrates on what should happen immediately
after you do a checkout in the superproject, and it would be OK for
a sight-seer.  Once you want to work in the submodules to build on
their histories, not being on a branch does become awkward.  For one
thing, after you are done with the work in your submodule, you would
want to update the superproject and make a commit that records the
commit in the submodule, and then you would want to publish both the
superproject and the submodule because both of them are now
updated.  The commit in the superproject may be on the branch that
is currently checked out, but we don't know what branch the commit
in the submoudule should go.

The submodule..branch configuration may be a good source to
learn that piece of information, but it does not fully solve the
issue.  It is OK for the tip of that branch to be at or behind the
commit the superproject records, but it is unclear what should
happen if the local tip of that branch is ahead of the commit in the
superproject when checkout happens.

By the way, how does this topic work with the various checkout modes
that can be specified with submodule..update?




Re: [PATCH v2 5/6] submodule: improve submodule_has_commits

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 6:02 PM, Brandon Williams  wrote:
> Teach 'submodule_has_commits()' to ensure that if a commit exists in a
> submodule, that it is also reachable from a ref.
>
> This is a preparatory step prior to merging the logic which checks for
> changed submodules when fetching or pushing.
>
> Change-Id: I4fed2acfa7e69a5fbbca534df165671e77a90f22
> Signed-off-by: Brandon Williams 
> ---
>  submodule.c | 34 ++
>  1 file changed, 34 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index 3bcf44521..057695e64 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -644,10 +644,44 @@ static int submodule_has_commits(const char *path, 
> struct oid_array *commits)
>  {
> int has_commit = 1;
>
> +   /*
> +* Perform a cheap, but incorrect check for the existance of 
> 'commits'.
> +* This is done by adding the submodule's object store to the in-core
> +* object store, and then querying for each commit's existance.  If we
> +* do not have the commit object anywhere, there is no chance we have
> +* it in the object store of the correct submodule and have it
> +* reachable from a ref, so we can fail early without spawning 
> rev-list
> +* which is expensive.
> +*/
> if (add_submodule_odb(path))
> return 0;

Thanks for the comment!

>
> oid_array_for_each_unique(commits, check_has_commit, _commit);
> +
> +   if (has_commit) {
> +   /*
> +* Even if the submodule is checked out and the commit is
> +* present, make sure it exists in the submodule's object 
> store
> +* and that it is reachable from a ref.
> +*/
> +   struct child_process cp = CHILD_PROCESS_INIT;
> +   struct strbuf out = STRBUF_INIT;
> +
> +   argv_array_pushl(, "rev-list", "-n", "1", NULL);
> +   oid_array_for_each_unique(commits, append_oid_to_argv, 
> );
> +   argv_array_pushl(, "--not", "--all", NULL);
> +
> +   prepare_submodule_repo_env(_array);
> +   cp.git_cmd = 1;
> +   cp.no_stdin = 1;
> +   cp.dir = path;
> +
> +   if (capture_command(, , GIT_MAX_HEXSZ + 1) || out.len)

eh, I gave too much and self-contradicting feedback here earlier,
ideally I'd like to review this to be similar as:

if (capture_command(, , GIT_MAX_HEXSZ + 1)
die("cannot capture git-rev-list in submodule '%s', sub->path);

if (out.len)
has_commit = 0;

instead as that does not have a silent error. (though it errs
on the safe side, so maybe it is not to bad.)

I could understand if the callers do not want to have
`submodule_has_commits` die()-ing on them, so maybe

if (capture_command(, , GIT_MAX_HEXSZ + 1) {
warning("cannot capture git-rev-list in submodule '%s', sub->path);
has_commit = -1;
/* this would require auditing all callers and handling -1 though */
}

if (out.len)
has_commit = 0;

As the comment eludes, we'd then have
 0 -> has no commits
 1 -> has commits
-1 -> error

So to group (error || has_no_commits), we could write

if (submodule_has_commits(..) <= 0)

which is awkward. So maybe we can rename the function
to misses_submodule_commits instead, as then we could
flip the return value as well and have

 0 -> has commits
 1 -> has no commits
-1 -> error

and the lazy invoker could just go with

if (!misses_submodule_commits(..))
proceed();
else
die("missing submodule commits or errors; I don't care");

whereas the careful invoker could go with

switch (misses_submodule_commits(..)) {
case 0:
proceed(); break;
case 1:
pull_magic_trick(); break;
case -1:
make_errors_go_away_and_retry(); break;
}



---
On the longer term plan:
As you wrote about costs. Maybe instead of invoking rev-list,
we could try to have this in-core as a first try-out for
"classified-repos", looking at refs.h there is e.g.

int for_each_ref_submodule(const char *submodule_path,
  each_ref_fn fn, void *cb_data);

which we could use to obtain all submodule refs and then
use the revision walking machinery to find out ourselves if
we have or do not have the commits. (As we loaded the
odb of the submodule, this would *just work*, building one
kludgy hack upon the next.)

Thanks,
Stefan


Re: [PATCH v2 1/6] submodule: rename add_sha1_to_array

2017-05-01 Thread Brandon Williams
On 05/01, Stefan Beller wrote:
> On Mon, May 1, 2017 at 6:02 PM, Brandon Williams  wrote:
> 
> > Change-Id: Ia6d15f34cee4d0dc32f7a475c69f4cb3aa8ce5bf
> 
> Uh?

Oops! I forgot to run it through my scrubbing script...

> 
> Maybe another side project for the long todo list: get git-trailers into 
> shape,
> such that it can be configured to yell at you upon formatting the patch or
> sending email when a given trailer occurs.
> 
> Thanks,
> Stefan

-- 
Brandon Williams


Re: [PATCH v2 1/6] submodule: rename add_sha1_to_array

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 6:02 PM, Brandon Williams  wrote:

> Change-Id: Ia6d15f34cee4d0dc32f7a475c69f4cb3aa8ce5bf

Uh?

Maybe another side project for the long todo list: get git-trailers into shape,
such that it can be configured to yell at you upon formatting the patch or
sending email when a given trailer occurs.

Thanks,
Stefan


[PATCH v2 6/6] submodule: refactor logic to determine changed submodules

2017-05-01 Thread Brandon Williams
There are currently two instances (fetch and push) where we want to
determine if submodules have changed given some revision specification.
These two instances don't use the same logic to generate a list of
changed submodules and as a result there is a fair amount of code
duplication.

This patch refactors these two code paths such that they both use the
same logic to generate a list of changed submodules.  This also makes it
easier for future callers to be able to reuse this logic as they only
need to create an argv_array with the revision specification to be using
during the revision walk.

Change-Id: Ie2381bd3eaecf5cec62e127df470f27a44ea47da
Signed-off-by: Brandon Williams 
---
 submodule.c | 247 ++--
 1 file changed, 105 insertions(+), 142 deletions(-)

diff --git a/submodule.c b/submodule.c
index 057695e64..7eaa3d384 100644
--- a/submodule.c
+++ b/submodule.c
@@ -617,6 +617,94 @@ const struct submodule *submodule_from_ce(const struct 
cache_entry *ce)
return submodule_from_path(null_sha1, ce->name);
 }
 
+static struct oid_array *submodule_commits(struct string_list *submodules,
+  const char *path)
+{
+   struct string_list_item *item;
+
+   item = string_list_insert(submodules, path);
+   if (item->util)
+   return (struct oid_array *) item->util;
+
+   /* NEEDSWORK: should we have oid_array_init()? */
+   item->util = xcalloc(1, sizeof(struct oid_array));
+   return (struct oid_array *) item->util;
+}
+
+static void collect_changed_submodules_cb(struct diff_queue_struct *q,
+ struct diff_options *options,
+ void *data)
+{
+   int i;
+   struct string_list *changed = data;
+
+   for (i = 0; i < q->nr; i++) {
+   struct diff_filepair *p = q->queue[i];
+   struct oid_array *commits;
+   if (!S_ISGITLINK(p->two->mode))
+   continue;
+
+   if (S_ISGITLINK(p->one->mode)) {
+   /*
+* NEEDSWORK: We should honor the name configured in
+* the .gitmodules file of the commit we are examining
+* here to be able to correctly follow submodules
+* being moved around.
+*/
+   commits = submodule_commits(changed, p->two->path);
+   oid_array_append(commits, >two->oid);
+   } else {
+   /* Submodule is new or was moved here */
+   /*
+* NEEDSWORK: When the .git directories of submodules
+* live inside the superprojects .git directory some
+* day we should fetch new submodules directly into
+* that location too when config or options request
+* that so they can be checked out from there.
+*/
+   continue;
+   }
+   }
+}
+
+/*
+ * Collect the paths of submodules in 'changed' which have changed based on
+ * the revisions as specified in 'argv'.  Each entry in 'changed' will also
+ * have a corresponding 'struct oid_array' (in the 'util' field) which lists
+ * what the submodule pointers were updated to during the change.
+ */
+static void collect_changed_submodules(struct string_list *changed,
+  struct argv_array *argv)
+{
+   struct rev_info rev;
+   const struct commit *commit;
+
+   init_revisions(, NULL);
+   setup_revisions(argv->argc, argv->argv, , NULL);
+   if (prepare_revision_walk())
+   die("revision walk setup failed");
+
+   while ((commit = get_revision())) {
+   struct rev_info diff_rev;
+
+   init_revisions(_rev, NULL);
+   diff_rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
+   diff_rev.diffopt.format_callback = 
collect_changed_submodules_cb;
+   diff_rev.diffopt.format_callback_data = changed;
+   diff_tree_combined_merge(commit, 1, _rev);
+   }
+
+   reset_revision_walk();
+}
+
+static void free_submodules_oids(struct string_list *submodules)
+{
+   struct string_list_item *item;
+   for_each_string_list_item(item, submodules)
+   oid_array_clear((struct oid_array *) item->util);
+   string_list_clear(submodules, 1);
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
  int flags, void *cb_data)
 {
@@ -729,92 +817,31 @@ static int submodule_needs_pushing(const char *path, 
struct oid_array *commits)
return 0;
 }
 
-static struct oid_array *submodule_commits(struct string_list *submodules,
-   const char *path)
-{

[PATCH v2 4/6] submodule: change string_list changed_submodule_paths

2017-05-01 Thread Brandon Williams
Eliminate a call to 'xstrdup()' by changing the string_list
'changed_submodule_paths' to duplicated strings added to it.

Change-Id: Id4b53837a6e209c0c0837c9f5ba06c70df2ffe06
Signed-off-by: Brandon Williams 
---
 submodule.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index 7baa28ae0..3bcf44521 100644
--- a/submodule.c
+++ b/submodule.c
@@ -20,7 +20,7 @@
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int parallel_jobs = 1;
-static struct string_list changed_submodule_paths = STRING_LIST_INIT_NODUP;
+static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -939,7 +939,7 @@ static void submodule_collect_changed_cb(struct 
diff_queue_struct *q,
struct string_list_item *path;
path = 
unsorted_string_list_lookup(_submodule_paths, p->two->path);
if (!path && !is_submodule_commit_present(p->two->path, 
p->two->oid.hash))
-   string_list_append(_submodule_paths, 
xstrdup(p->two->path));
+   string_list_append(_submodule_paths, 
p->two->path);
} else {
/* Submodule is new or was moved here */
/* NEEDSWORK: When the .git directories of submodules
-- 
2.13.0.rc1.294.g07d810a77f-goog



[PATCH v2 2/6] submodule: rename free_submodules_sha1s

2017-05-01 Thread Brandon Williams
Rename 'free_submodules_sha1s()' to 'free_submodules_oids()' since the
function frees a 'struct string_list' which has a 'struct oid_array'
stored in the 'util' field.

Change-Id: I0c52fa3af1b1492b196bcf52f8b8cc8f5daf085d
Signed-off-by: Brandon Williams 
---
 submodule.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index be0f5d847..46abd52b1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -738,7 +738,7 @@ static void find_unpushed_submodule_commits(struct commit 
*commit,
diff_tree_combined_merge(commit, 1, );
 }
 
-static void free_submodules_sha1s(struct string_list *submodules)
+static void free_submodules_oids(struct string_list *submodules)
 {
struct string_list_item *item;
for_each_string_list_item(item, submodules)
@@ -779,7 +779,8 @@ int find_unpushed_submodules(struct oid_array *commits,
if (submodule_needs_pushing(submodule->string, commits))
string_list_insert(needs_pushing, submodule->string);
}
-   free_submodules_sha1s();
+
+   free_submodules_oids();
 
return needs_pushing->nr;
 }
-- 
2.13.0.rc1.294.g07d810a77f-goog



[PATCH v2 3/6] submodule: remove add_oid_to_argv

2017-05-01 Thread Brandon Williams
The function 'add_oid_to_argv()' provides the same functionality as
'append_oid_to_argv()'.  Remove this duplicate function and instead use
'append_oid_to_argv()' where 'add_oid_to_argv()' was previously used.

Change-Id: Id0abea012707460cb7000df761e6557ba5cd88d9
Signed-off-by: Brandon Williams 
---
 submodule.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/submodule.c b/submodule.c
index 46abd52b1..7baa28ae0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -970,12 +970,6 @@ void check_for_new_submodule_commits(struct object_id *oid)
oid_array_append(_tips_after_fetch, oid);
 }
 
-static int add_oid_to_argv(const struct object_id *oid, void *data)
-{
-   argv_array_push(data, oid_to_hex(oid));
-   return 0;
-}
-
 static void calculate_changed_submodule_paths(void)
 {
struct rev_info rev;
@@ -989,10 +983,10 @@ static void calculate_changed_submodule_paths(void)
init_revisions(, NULL);
argv_array_push(, "--"); /* argv[0] program name */
oid_array_for_each_unique(_tips_after_fetch,
-  add_oid_to_argv, );
+  append_oid_to_argv, );
argv_array_push(, "--not");
oid_array_for_each_unique(_tips_before_fetch,
-  add_oid_to_argv, );
+  append_oid_to_argv, );
setup_revisions(argv.argc, argv.argv, , NULL);
if (prepare_revision_walk())
die("revision walk setup failed");
-- 
2.13.0.rc1.294.g07d810a77f-goog



[PATCH v2 1/6] submodule: rename add_sha1_to_array

2017-05-01 Thread Brandon Williams
Rename 'add_sha1_to_array()' to 'append_oid_to_array()' to more
accurately describe what the function does since it handles 'struct
object_id' and not sha1 character arrays.

Change-Id: Ia6d15f34cee4d0dc32f7a475c69f4cb3aa8ce5bf
Signed-off-by: Brandon Williams 
---
 submodule.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/submodule.c b/submodule.c
index d3299e29c..be0f5d847 100644
--- a/submodule.c
+++ b/submodule.c
@@ -951,17 +951,18 @@ static void submodule_collect_changed_cb(struct 
diff_queue_struct *q,
}
 }
 
-static int add_sha1_to_array(const char *ref, const struct object_id *oid,
-int flags, void *data)
+static int append_oid_to_array(const char *ref, const struct object_id *oid,
+  int flags, void *data)
 {
-   oid_array_append(data, oid);
+   struct oid_array *array = data;
+   oid_array_append(array, oid);
return 0;
 }
 
 void check_for_new_submodule_commits(struct object_id *oid)
 {
if (!initialized_fetch_ref_tips) {
-   for_each_ref(add_sha1_to_array, _tips_before_fetch);
+   for_each_ref(append_oid_to_array, _tips_before_fetch);
initialized_fetch_ref_tips = 1;
}
 
-- 
2.13.0.rc1.294.g07d810a77f-goog



[PATCH v2 5/6] submodule: improve submodule_has_commits

2017-05-01 Thread Brandon Williams
Teach 'submodule_has_commits()' to ensure that if a commit exists in a
submodule, that it is also reachable from a ref.

This is a preparatory step prior to merging the logic which checks for
changed submodules when fetching or pushing.

Change-Id: I4fed2acfa7e69a5fbbca534df165671e77a90f22
Signed-off-by: Brandon Williams 
---
 submodule.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/submodule.c b/submodule.c
index 3bcf44521..057695e64 100644
--- a/submodule.c
+++ b/submodule.c
@@ -644,10 +644,44 @@ static int submodule_has_commits(const char *path, struct 
oid_array *commits)
 {
int has_commit = 1;
 
+   /*
+* Perform a cheap, but incorrect check for the existance of 'commits'.
+* This is done by adding the submodule's object store to the in-core
+* object store, and then querying for each commit's existance.  If we
+* do not have the commit object anywhere, there is no chance we have
+* it in the object store of the correct submodule and have it
+* reachable from a ref, so we can fail early without spawning rev-list
+* which is expensive.
+*/
if (add_submodule_odb(path))
return 0;
 
oid_array_for_each_unique(commits, check_has_commit, _commit);
+
+   if (has_commit) {
+   /*
+* Even if the submodule is checked out and the commit is
+* present, make sure it exists in the submodule's object store
+* and that it is reachable from a ref.
+*/
+   struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf out = STRBUF_INIT;
+
+   argv_array_pushl(, "rev-list", "-n", "1", NULL);
+   oid_array_for_each_unique(commits, append_oid_to_argv, 
);
+   argv_array_pushl(, "--not", "--all", NULL);
+
+   prepare_submodule_repo_env(_array);
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+
+   if (capture_command(, , GIT_MAX_HEXSZ + 1) || out.len)
+   has_commit = 0;
+
+   strbuf_release();
+   }
+
return has_commit;
 }
 
-- 
2.13.0.rc1.294.g07d810a77f-goog



[PATCH v2 0/6] changed submodules

2017-05-01 Thread Brandon Williams
Changes in v2:
- few tweaks in the commit messages of a couple of the commits
- use 'GIT_MAX_HEXSZ + 1' as the hint size in [5/6]
- added a comment in [5/6] better explaining the rational for having a quick,
  incorrect check for the existence of a commit in a submodule.

Brandon Williams (6):
  submodule: rename add_sha1_to_array
  submodule: rename free_submodules_sha1s
  submodule: remove add_oid_to_argv
  submodule: change string_list changed_submodule_paths
  submodule: improve submodule_has_commits
  submodule: refactor logic to determine changed submodules

 submodule.c | 305 +---
 1 file changed, 149 insertions(+), 156 deletions(-)

-- 
2.13.0.rc1.294.g07d810a77f-goog



Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-05-01 Thread Junio C Hamano
Samuel Lijin  writes:

> On Fri, Mar 31, 2017 at 10:52 AM, Junio C Hamano  wrote:
>
>> It might not be a bad idea to teach "blame" not to pay attention to
>> any path that is marked as "-diff" (e.g. binary files) when trying
>> to see if remaining contents appeared by borrowing from them.  We do
>> not have that heuristics (yet).
>
> Could you elaborate on this? Do you mean to tell diffcore-rename to
> ignore diff_filespec objects if they're binary?

No and yes ;-).  I do not think it is a good idea to unconditionally
ignore binary in diffcore-rename.

But when we know that the rename detection is called from inside
blame.c, where by definition we would be digging line-oriented
contents, there is no point checking if the contents we are looking
for came from an existing binary file.


Re: Proposal for missing blob support in Git repos

2017-05-01 Thread Brandon Williams
On 05/01, Jonathan Tan wrote:
> On 05/01/2017 04:29 PM, Junio C Hamano wrote:
> >Jonathan Tan  writes:
> >
> >>Thanks for your comments. If you're referring to the codepath
> >>involving write_sha1_file() (for example, builtin/hash-object ->
> >>index_fd or builtin/unpack-objects), that is fine because
> >>write_sha1_file() invokes freshen_packed_object() and
> >>freshen_loose_object() directly to check if the object already exists
> >>(and thus does not invoke the new mechanism in this patch).
> >
> >Is that a good thing, though?  It means that you an attacker can
> >feed one version to the remote object store your "grab blob" hook
> >gets the blobs from, and have you add a colliding object locally,
> >and the usual "are we recording the same object as existing one?"
> >check is bypassed.
> 
> If I understand this correctly, what you mean is the situation where
> the hook adds an object to the local repo, overriding another object
> of the same name? If yes, I think that is the nature of executing an
> arbitrary command. If we really want to avoid that, we could drop
> the hook functionality (and instead, for example, provide the URL of
> a Git repo instead from which we can communicate using a new
> fetch-blob protocol), although that would reduce the usefulness of
> this, especially during the transition period in which we don't have
> any sort of batching of requests.

If I understand correctly this is where we aim to be once all is said
and done.  I guess the question is what are we willing to do during the
transition phase.

-- 
Brandon Williams


Re: Proposal for missing blob support in Git repos

2017-05-01 Thread Jonathan Tan

On 05/01/2017 04:29 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


Thanks for your comments. If you're referring to the codepath
involving write_sha1_file() (for example, builtin/hash-object ->
index_fd or builtin/unpack-objects), that is fine because
write_sha1_file() invokes freshen_packed_object() and
freshen_loose_object() directly to check if the object already exists
(and thus does not invoke the new mechanism in this patch).


Is that a good thing, though?  It means that you an attacker can
feed one version to the remote object store your "grab blob" hook
gets the blobs from, and have you add a colliding object locally,
and the usual "are we recording the same object as existing one?"
check is bypassed.


If I understand this correctly, what you mean is the situation where the 
hook adds an object to the local repo, overriding another object of the 
same name? If yes, I think that is the nature of executing an arbitrary 
command. If we really want to avoid that, we could drop the hook 
functionality (and instead, for example, provide the URL of a Git repo 
instead from which we can communicate using a new fetch-blob protocol), 
although that would reduce the usefulness of this, especially during the 
transition period in which we don't have any sort of batching of requests.


Re: [RFC] [GSoC] Port git-add--interactive.perl:status_cmd to C

2017-05-01 Thread Junio C Hamano
"Daniel Ferreira (theiostream)"  writes:

> Reproducing either of these comparisons "natively" would simply
> require running run_diff_index() or run_diff_files() with
> DIFF_FORMAT_NUMSTAT and tweaking diff_flush() to format appropriately
> for the "interactive--add case".

A more usual way to drive the diff machinery and react to its
findings is to use DIFF_FORMAT_CALLBACK interface, and walk the
collected diff_queue to work on the paths discovered.  The way
wt-status.c builds "Changes to be committed" list out of the
"diff-index --cached" it internally runs and "Changes not staged for
commit" out of the "diff-files" it internally runs would be a good
example to study.


Re: [PATCH] credential doc: make multiple-helper behavior more prominent (Re: [PATCH] clone: handle empty config values in -c)

2017-05-01 Thread Jonathan Nieder
Starting out the reviews:

Jonathan Nieder wrote:

[...]
> configuration item to empty before giving it a new value.  This is
> already documented by the documentation is hard to find ---
 ^^
s/by/but/

Sorry for the confusion.

[...]
> +++ b/Documentation/gitcredentials.txt
[...]
> @@ -162,6 +152,16 @@ helper::
>   shell (so, for example, setting this to `foo --option=bar` will execute
>   `git credential-foo --option=bar` via the shell. See the manual of
>   specific helpers for examples of their use.
> ++
> +If there are multiple instances of the `credential.helper` configuration
> +variable, each helper will be tried in turn, and may provide a username,
> +password, or nothing. Once Git has acquired both a username and a
> +password, no more helpers will be tried.
> ++
> +If `credential.helper` is configured to the empty string, this resets
> +the helper list to empty (so you may override a helper set by a
> +lower-priority config file by configuring the empty-string helper,

It's not necessarily obvious to a new user what "lower-priority" means.

Since this text is an example, maybe it should say something like "so,
for example, you can override a helper set in /etc/gitconfig by
configuring the empty-string helper followed by whatever set of
helpers you would like in ~/.gitconfig".

That's orthogonal to this patch but it should be straightforward to
roll in if it makes sense.

Thanks,
Jonathan


Re: Bug Report: .gitignore behavior is not matching in git clean and git status

2017-05-01 Thread Junio C Hamano
Samuel Lijin  writes:

> After some more digging (and familiarizing myself with the
> behind-the-scenes logic) the issue is that dir.c has this implicit
> assumption that a directory which contains only untracked and ignored
> files should itself be considered untracked. While that works fine for
> use cases where we're asking if a directory should be added to the git
> database, that decidedly does not make sense when we're asking if a
> directory can be removed from the working tree.

Thanks for digging.

> I'm not sure where to proceed from here. I see two ways forward: one,
> builtin/clean.c can collect ignored files when it calls
> dir.c:fill_directory(), and then clean -d can prune out directories
> that contain ignored files; two, path_treatment can learn about
> untracked directories which contain excluded (ignored) files.

My gut feeling is that the former approach would be of lesser
impact.  Directory A/ can be removed "clean" without "-x" when there
is nothing tracked in there in the index and there is no ignored
paths.  Having zero untracked files there or one or more untracked
file there do not matter---they are all subject to removal by
"clean".


Re: [PATCH] credential doc: make multiple-helper behavior more prominent (Re: [PATCH] clone: handle empty config values in -c)

2017-05-01 Thread Brandon Williams
On 05/01, Jonathan Nieder wrote:
> Subject: credential doc: make multiple-helper behavior more prominent
> 
> Git's configuration system works by reading multiple configuration
> files in order, from general to specific:
> 
>  - first, the system configuration /etc/gitconfig
>  - then the user's configuration (~/.gitconfig or ~/.config/git/config)
>  - then the repository configuration (.git/config)
> 
> For single-valued configuration items, the latest value wins.  For
> multi-valued configuration items, values accumulate in that order.
> 
> For example, this allows setting a credential helper globally in
> ~/.gitconfig that git will try to use in all repositories, regardless
> of whether they additionally provide another helper.  This is usually
> a nice thing --- e.g. I can install helpers to use my OS keychain and
> to cache credentials for a short period of time globally.
> 
> Sometimes people want to be able to override an inherited setting.
> For the credential.helper setting, this is done by setting the
> configuration item to empty before giving it a new value.  This is
> already documented by the documentation is hard to find ---
> git-config(1) says to look at gitcredentials(7) and the config
> reference in gitcredentials(7) doesn't mention this issue.
> 
> Move the documentation to the config reference to make it easier to
> find.
> 
> Signed-off-by: Jonathan Nieder 
> ---
> Brandon Williams wrote:
> 
> >> Noticed while trying to set credential.helper during a clone to use a
> >> specific helper without inheriting from ~/.gitconfig and
> >> /etc/gitconfig.  That is, I ran
> >>
> >>git clone -c credential.helper= \
> >>-c credential.helper=myhelper \
> >>https://example.com/repo
> >>
> >> intending to produce the configuration
> >>
> >>[credential]
> >>helper =
> >>helper = myhelper
> >>
> >> Without this patch, the 'helper =' line is not included and the
> >> credential helper from /etc/gitconfig gets used.
> >>
> >> Signed-off-by: Jonathan Nieder 
> >> ---
> >> Thoughts?
> >
> > After reading this I'm still a little fuzzy on why the empty helper line
> > is needed to avoid using the credential helper from /etc/gitconfig.
> 
> See "git help credentials":
> 
>If there are multiple instances of the credential.helper configuration
>variable, each helper will be tried in turn, and may provide a
>username, password, or nothing. Once Git has acquired both a username
>and a password, no more helpers will be tried.
> 
>If credential.helper is configured to the empty string, this resets the
>helper list to empty (so you may override a helper set by a
>lower-priority config file by configuring the empty-string helper,
>followed by whatever set of helpers you would like).
> 
> That's a bit obscure, though --- I didn't find it when I looked in "git
> help config".  How about this patch?
> 
> Tested using 'make -C Documentation gitcredentials.7'.
> 
>  Documentation/gitcredentials.txt | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/gitcredentials.txt 
> b/Documentation/gitcredentials.txt
> index f3a75d1ce1..f970196bc1 100644
> --- a/Documentation/gitcredentials.txt
> +++ b/Documentation/gitcredentials.txt
> @@ -101,16 +101,6 @@ $ git help credential-foo
>  $ git config --global credential.helper foo
>  ---
>  
> -If there are multiple instances of the `credential.helper` configuration
> -variable, each helper will be tried in turn, and may provide a username,
> -password, or nothing. Once Git has acquired both a username and a
> -password, no more helpers will be tried.
> -
> -If `credential.helper` is configured to the empty string, this resets
> -the helper list to empty (so you may override a helper set by a
> -lower-priority config file by configuring the empty-string helper,
> -followed by whatever set of helpers you would like).
> -
>  
>  CREDENTIAL CONTEXTS
>  ---
> @@ -162,6 +152,16 @@ helper::
>   shell (so, for example, setting this to `foo --option=bar` will execute
>   `git credential-foo --option=bar` via the shell. See the manual of
>   specific helpers for examples of their use.
> ++
> +If there are multiple instances of the `credential.helper` configuration
> +variable, each helper will be tried in turn, and may provide a username,
> +password, or nothing. Once Git has acquired both a username and a
> +password, no more helpers will be tried.
> ++
> +If `credential.helper` is configured to the empty string, this resets
> +the helper list to empty (so you may override a helper set by a
> +lower-priority config file by configuring the empty-string helper,
> +followed by whatever set of helpers you would like).

Thanks, this clears up the confusion.

>  
>  username::
>  
> -- 
> 2.13.0.rc1.294.g07d810a77f

[PATCH] credential doc: make multiple-helper behavior more prominent (Re: [PATCH] clone: handle empty config values in -c)

2017-05-01 Thread Jonathan Nieder
Subject: credential doc: make multiple-helper behavior more prominent

Git's configuration system works by reading multiple configuration
files in order, from general to specific:

 - first, the system configuration /etc/gitconfig
 - then the user's configuration (~/.gitconfig or ~/.config/git/config)
 - then the repository configuration (.git/config)

For single-valued configuration items, the latest value wins.  For
multi-valued configuration items, values accumulate in that order.

For example, this allows setting a credential helper globally in
~/.gitconfig that git will try to use in all repositories, regardless
of whether they additionally provide another helper.  This is usually
a nice thing --- e.g. I can install helpers to use my OS keychain and
to cache credentials for a short period of time globally.

Sometimes people want to be able to override an inherited setting.
For the credential.helper setting, this is done by setting the
configuration item to empty before giving it a new value.  This is
already documented by the documentation is hard to find ---
git-config(1) says to look at gitcredentials(7) and the config
reference in gitcredentials(7) doesn't mention this issue.

Move the documentation to the config reference to make it easier to
find.

Signed-off-by: Jonathan Nieder 
---
Brandon Williams wrote:

>> Noticed while trying to set credential.helper during a clone to use a
>> specific helper without inheriting from ~/.gitconfig and
>> /etc/gitconfig.  That is, I ran
>>
>>  git clone -c credential.helper= \
>>  -c credential.helper=myhelper \
>>  https://example.com/repo
>>
>> intending to produce the configuration
>>
>>  [credential]
>>  helper =
>>  helper = myhelper
>>
>> Without this patch, the 'helper =' line is not included and the
>> credential helper from /etc/gitconfig gets used.
>>
>> Signed-off-by: Jonathan Nieder 
>> ---
>> Thoughts?
>
> After reading this I'm still a little fuzzy on why the empty helper line
> is needed to avoid using the credential helper from /etc/gitconfig.

See "git help credentials":

   If there are multiple instances of the credential.helper configuration
   variable, each helper will be tried in turn, and may provide a
   username, password, or nothing. Once Git has acquired both a username
   and a password, no more helpers will be tried.

   If credential.helper is configured to the empty string, this resets the
   helper list to empty (so you may override a helper set by a
   lower-priority config file by configuring the empty-string helper,
   followed by whatever set of helpers you would like).

That's a bit obscure, though --- I didn't find it when I looked in "git
help config".  How about this patch?

Tested using 'make -C Documentation gitcredentials.7'.

 Documentation/gitcredentials.txt | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/Documentation/gitcredentials.txt b/Documentation/gitcredentials.txt
index f3a75d1ce1..f970196bc1 100644
--- a/Documentation/gitcredentials.txt
+++ b/Documentation/gitcredentials.txt
@@ -101,16 +101,6 @@ $ git help credential-foo
 $ git config --global credential.helper foo
 ---
 
-If there are multiple instances of the `credential.helper` configuration
-variable, each helper will be tried in turn, and may provide a username,
-password, or nothing. Once Git has acquired both a username and a
-password, no more helpers will be tried.
-
-If `credential.helper` is configured to the empty string, this resets
-the helper list to empty (so you may override a helper set by a
-lower-priority config file by configuring the empty-string helper,
-followed by whatever set of helpers you would like).
-
 
 CREDENTIAL CONTEXTS
 ---
@@ -162,6 +152,16 @@ helper::
shell (so, for example, setting this to `foo --option=bar` will execute
`git credential-foo --option=bar` via the shell. See the manual of
specific helpers for examples of their use.
++
+If there are multiple instances of the `credential.helper` configuration
+variable, each helper will be tried in turn, and may provide a username,
+password, or nothing. Once Git has acquired both a username and a
+password, no more helpers will be tried.
++
+If `credential.helper` is configured to the empty string, this resets
+the helper list to empty (so you may override a helper set by a
+lower-priority config file by configuring the empty-string helper,
+followed by whatever set of helpers you would like).
 
 username::
 
-- 
2.13.0.rc1.294.g07d810a77f



Re: [PATCH] clone: handle empty config values in -c

2017-05-01 Thread Brandon Williams
On 05/01, Jonathan Nieder wrote:
> "git clone --config" uses the following incantation to add an item to
> a config file, instead of replacing an existing value:
> 
>   git_config_set_multivar_gently(key, value, "^$", 0)
> 
> As long as no existing value matches the regex ^$, that works as
> intended and adds to the config.  When a value is empty, though, it
> replaces the existing value.
> 
> Noticed while trying to set credential.helper during a clone to use a
> specific helper without inheriting from ~/.gitconfig and
> /etc/gitconfig.  That is, I ran
> 
>   git clone -c credential.helper= \
>   -c credential.helper=myhelper \
>   https://example.com/repo
> 
> intending to produce the configuration
> 
>   [credential]
>   helper =
>   helper = myhelper
> 
> Without this patch, the 'helper =' line is not included and the
> credential helper from /etc/gitconfig gets used.
> 
> Signed-off-by: Jonathan Nieder 
> ---
> Thoughts?

After reading this I'm still a little fuzzy on why the empty helper line
is needed to avoid using the credential helper from /etc/gitconfig.

> 
> Thanks,
> Jonathan
> 
>  builtin/clone.c | 4 +++-
>  t/t5611-clone-config.sh | 8 
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index de85b85254..a6ae7d6180 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -773,7 +773,9 @@ static int checkout(int submodule_progress)
>  
>  static int write_one_config(const char *key, const char *value, void *data)
>  {
> - return git_config_set_multivar_gently(key, value ? value : "true", 
> "^$", 0);
> + return git_config_set_multivar_gently(key,
> +   value ? value : "true",
> +   CONFIG_REGEX_NONE, 0);
>  }
>  
>  static void write_config(struct string_list *config)
> diff --git a/t/t5611-clone-config.sh b/t/t5611-clone-config.sh
> index e4850b778c..39329eb7a8 100755
> --- a/t/t5611-clone-config.sh
> +++ b/t/t5611-clone-config.sh
> @@ -19,6 +19,14 @@ test_expect_success 'clone -c can set multi-keys' '
>   test_cmp expect actual
>  '
>  
> +test_expect_success 'clone -c can set multi-keys, including some empty' '
> + rm -rf child &&
> + git clone -c credential.helper= -c credential.helper=hi . child &&
> + printf "%s\n" "" hi >expect &&
> + git --git-dir=child/.git config --get-all credential.helper >actual &&
> + test_cmp expect actual
> +'
> +
>  test_expect_success 'clone -c without a value is boolean true' '
>   rm -rf child &&
>   git clone -c core.foo . child &&
> -- 
> 2.13.0.rc1.294.g07d810a77f
> 

-- 
Brandon Williams


[PATCH] clone: handle empty config values in -c

2017-05-01 Thread Jonathan Nieder
"git clone --config" uses the following incantation to add an item to
a config file, instead of replacing an existing value:

git_config_set_multivar_gently(key, value, "^$", 0)

As long as no existing value matches the regex ^$, that works as
intended and adds to the config.  When a value is empty, though, it
replaces the existing value.

Noticed while trying to set credential.helper during a clone to use a
specific helper without inheriting from ~/.gitconfig and
/etc/gitconfig.  That is, I ran

git clone -c credential.helper= \
-c credential.helper=myhelper \
https://example.com/repo

intending to produce the configuration

[credential]
helper =
helper = myhelper

Without this patch, the 'helper =' line is not included and the
credential helper from /etc/gitconfig gets used.

Signed-off-by: Jonathan Nieder 
---
Thoughts?

Thanks,
Jonathan

 builtin/clone.c | 4 +++-
 t/t5611-clone-config.sh | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index de85b85254..a6ae7d6180 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -773,7 +773,9 @@ static int checkout(int submodule_progress)
 
 static int write_one_config(const char *key, const char *value, void *data)
 {
-   return git_config_set_multivar_gently(key, value ? value : "true", 
"^$", 0);
+   return git_config_set_multivar_gently(key,
+ value ? value : "true",
+ CONFIG_REGEX_NONE, 0);
 }
 
 static void write_config(struct string_list *config)
diff --git a/t/t5611-clone-config.sh b/t/t5611-clone-config.sh
index e4850b778c..39329eb7a8 100755
--- a/t/t5611-clone-config.sh
+++ b/t/t5611-clone-config.sh
@@ -19,6 +19,14 @@ test_expect_success 'clone -c can set multi-keys' '
test_cmp expect actual
 '
 
+test_expect_success 'clone -c can set multi-keys, including some empty' '
+   rm -rf child &&
+   git clone -c credential.helper= -c credential.helper=hi . child &&
+   printf "%s\n" "" hi >expect &&
+   git --git-dir=child/.git config --get-all credential.helper >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'clone -c without a value is boolean true' '
rm -rf child &&
git clone -c core.foo . child &&
-- 
2.13.0.rc1.294.g07d810a77f



Re: [PATCH v2 53/53] object: convert parse_object* to take struct object_id

2017-05-01 Thread Jonathan Tan

On 04/30/2017 07:29 PM, brian m. carlson wrote:

Make parse_object, parse_object_or_die, and parse_object_buffer take a
pointer to struct object_id.  Remove the temporary variables inserted
earlier, since they are no longer necessary.  Transform all of the
callers using the following semantic patch:


I've finished my review of the 53 patches and it all looks fine to me.


Re: git loses a commit after reordering.

2017-05-01 Thread Junio C Hamano
Nikita Orlov  writes:

>>On Sun, 30 Apr 2017, 1:56 +03:00 from Kevin Daudt :
>>Not sure if this is the case here, but it at least confirms that rebase
>>--preserve-merges was not meant to reorder commits.
>>
>>See [this][1] thread for more background on this limitation.
>>
>>[0]: https://git-scm.com/docs/git-rebase#_bugs
>>[1]: 
>>https://public-inbox.org/git/1mtveu4.19lvgi1c0hmham%25li...@haller-berlin.de/
>
> This is it. As I understand git-rebase--helper is an attempt to fix the bug.
> But it's still in development, isn't it?

The impression I have been getting is that "rebase -p" was abandoned
by its author, and the recent rebase--helper work is solely for
re-implementing the performance sensitive part of "-i" (interactive).


Re: Proposal for missing blob support in Git repos

2017-05-01 Thread Junio C Hamano
Jonathan Tan  writes:

> Thanks for your comments. If you're referring to the codepath
> involving write_sha1_file() (for example, builtin/hash-object ->
> index_fd or builtin/unpack-objects), that is fine because
> write_sha1_file() invokes freshen_packed_object() and
> freshen_loose_object() directly to check if the object already exists
> (and thus does not invoke the new mechanism in this patch).

Is that a good thing, though?  It means that you an attacker can
feed one version to the remote object store your "grab blob" hook
gets the blobs from, and have you add a colliding object locally,
and the usual "are we recording the same object as existing one?"
check is bypassed.



Re: [PATCHv2 0/3] Make diff plumbing commands respect the indentHeuristic.

2017-05-01 Thread Junio C Hamano
Jeff King  writes:

> PS Outside of our test scripts, I'd probably just have written:
>
>  perl -lpe 'print "extra line" if $. == 2'
>
>I think we have traditionally preferred sed/awk to perl, but given
>the heavy use of vanilla perl elsewhere in the test suite, I think
>that is maybe just superstition at this point.

I would have avoided sed with 'a', 'i' and 'c' in one-liners myself,
and a Perl script like the above would probably have been my choice.




Re: [PATCH v2 39/53] refs/files-backend: convert many internals to struct object_id

2017-05-01 Thread Jonathan Tan

On 04/30/2017 07:29 PM, brian m. carlson wrote:

@@ -195,27 +195,18 @@ static const char PACKED_REFS_HEADER[] =
  * Return a pointer to the refname within the line (null-terminated),
  * or NULL if there was a problem.
  */
-static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
+static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
 {
const char *ref;

-   /*
-* 42: the answer to everything.
-*
-* In this case, it happens to be the answer to
-*  40 (length of sha1 hex representation)
-*  +1 (space in between hex and name)
-*  +1 (newline at the end of the line)
-*/
-   if (line->len <= 42)
+   if (!line->len)
return NULL;


I would omit this check - parse_oid_hex already exits early if the first 
character is NUL. (The existing code makes a bit more sense, in that it 
avoids checking the first few characters if we already know a bit more 
about the string.)




-   if (get_sha1_hex(line->buf, sha1) < 0)
+   if (parse_oid_hex(line->buf, oid, ) < 0)
return NULL;
-   if (!isspace(line->buf[40]))
+   if (!isspace(*ref++))
return NULL;

-   ref = line->buf + 41;
if (isspace(*ref))
return NULL;



Looks fine, up to here.

(Also, you requested extra-careful review for this patch, but this patch 
seems mostly mechanical to me.)


Re: What's cooking in git.git (May 2017, #01; Mon, 1)

2017-05-01 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> * ab/grep-pcre-v2 (2017-04-25) 20 commits
>>  - SQUASH???
>>  - Makefile & configure: make PCRE v2 the default PCRE implementation
>>  - grep: remove support for concurrent use of both PCRE v1 & v2
>>  - grep: add support for PCRE v2
>>  - grep: add support for the PCRE v1 JIT API
>>  - perf: add a performance comparison test of grep -E and -P
>>  - grep: change the internal PCRE code & header names to be PCRE1
>>  - grep: change the internal PCRE macro names to be PCRE1
>>  - test-lib: rename the LIBPCRE prerequisite to PCRE
>>  - grep: make grep.patternType=[pcre|pcre1] a synonym for "perl"
>>  - grep & rev-list doc: stop promising libpcre for --perl-regexp
>>  - log: add -P as a synonym for --perl-regexp
>>  - log: add exhaustive tests for pattern style options & config
>>  - grep: add a test for backreferences in PCRE patterns
>>  - Makefile & configure: reword outdated comment about PCRE
>>  - grep: remove redundant `regflags &= ~REG_EXTENDED` assignments
>>  - grep: remove redundant regflags assignment under PCRE
>>  - grep: submodule-related case statements should die if new fields are added
>>  - grep: add tests for grep pattern types being passed to submodules
>>  - grep: amend submodule recursion test in preparation for rx engine testing
>>
>>  PCRE2, which has an API different from and incompatible with PCRE,
>>  can now be chosen to support "grep -P -e ''" and friends.
>
> That squash looks good to me.

Thanks.

That is not a particulary helpful comment, by the way.  I can help
topics by contributors by queuing emergency fix at the tip to make
ones that do not build correctly buildable and testable (which is
what the "SQUASH???" commits are about), but I'd rather not see me
forced to find among 19 commits which one is broken and needs the
hotfix squashed in myself.

>> * ab/grep-threading-cleanup (2017-04-16) 8 commits
>>  - grep: given --threads with NO_PTHREADS=YesPlease, warn
>>  - pack-objects: fix buggy warning about threads under NO_PTHREADS=YesPlease
>>  - pack-object & index-pack: add test for --threads warning under NO_PTHREADS
>>  - tests: add a PTHREADS prerequisite
>>  - grep: skip pthreads overhead when using one thread
>>  - grep: don't redundantly compile throwaway patterns under threading
>>  - grep: add tests for --threads=N and grep.threads
>>  - grep: assert that threading is enabled when calling grep_{lock,unlock}
>>
>>  Code cleanup.
>>
>>  Needs review.
>
> Between these two series there's 27 patches, and I understand it's a
> bit of a PITA to review/get comments on it.
>
> Anything I should be doing differently here other than just waiting
> for 2.13 to come out so they can be cooked further & merged down to
> next & then master if there's no objections?

There are topics that need fresh eyes to be reviewed by other
contributors, so perhaps you can help unblock them by reviewing,
while they pick lints from yours?


[RFC] [GSoC] Port git-add--interactive.perl:status_cmd to C

2017-05-01 Thread Daniel Ferreira (theiostream)
Hey there,

So, in the GSoC proposal I sent about porting git-add--interactive to
C[1], I expected I would be able to do a couple of small patches to
git-add to familiarize myself with the Git API and have a better clue
of how the porting process would go by. Due to the unexpected size my
microproject ended up taking (though in a good way, I think) I chose
to focus on that and ended up not sending those anticipated patches.
However, I *did* have time to study how I'd go for the port more
closely, and I'd like some comments on what you think of it. Be I
accepted or not into GSoC some days from now, I think the topic might
interest others that might want to tackle this task.

As I went through the code, I grew to believe that a good way to make
a step-by-step conversion might be to get add--interactive "commands"
(as in status_cmd, update_cmd, add_untracked_cmd etc.) ported one at a
time over a set of patch series. For each series, one X_cmd Perl
subroutine would become a one-line call to
"git-add-interactive--helper X " (a builtin). This would make
understanding a function as complex as list_and_choose() to being
ported to C way easier: for each command, a new functionality would be
added on-demand instead of having the whole function sent over at
once. Afterwards, a final series would likely set up the "interactive
main loop" in C and retire the Perl script.

I tried to apply this philosophy to porting `status_cmd` to C, and
easily got most of the configuration/color logic into C by mimicking
what builtins like clean.c do.

The next issue I came up with (and do not yet have a solution) is how
reproduce git-add--interactive's list_modified(). The Perl script runs
git-diff-index and git-diff-files, does some clever string parsing of
their outputs and "merges" the output of both in a single table to
display the number of added/removed lines in a given file BOTH between
HEAD+index and index+tree.

Reproducing either of these comparisons "natively" would simply
require running run_diff_index() or run_diff_files() with
DIFF_FORMAT_NUMSTAT and tweaking diff_flush() to format appropriately
for the "interactive--add case". Furthermore, the current output
prefix options would manage to cover the eventual case of `update_cmd`
selecting files. To output both comparisons, however, I can't come up
with any simple solution. Would you have any ideas?

-- Daniel.

[1]: 
https://public-inbox.org/git/caea2_rjef4vjgcaux8a1kwh1-vxllmv1--vjf9wieqof+gv...@mail.gmail.com/


Re: [PATCH v2 11/53] fast-import: convert to struct object_id

2017-05-01 Thread Jonathan Tan

On 05/01/2017 03:27 PM, Jeff King wrote:

On Mon, May 01, 2017 at 03:07:22PM -0700, Jonathan Tan wrote:


@@ -2298,8 +2296,12 @@ static uintmax_t do_change_note_fanout(
 static uintmax_t change_note_fanout(struct tree_entry *root,
unsigned char fanout)
 {
-   char hex_sha1[40], path[60];
-   return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
+   /*
+* The size of path is due to one slash between every two hex digits,
+* plus the terminating NUL.
+*/
+   char hex_oid[GIT_MAX_HEXSZ], path[GIT_MAX_HEXSZ * 3 / 2];


If your comment is correct, shouldn't the size of path be 61 (that is, add
"+ 1")? I took a look at do_change_note_fanout() and your comment seems
correct.


If you have 40 hex digits, then you have 20 hex pairs. But delimiting
them all takes only 19 slashes, since they only go in between pairs[1].

So the fully expanded formula is:

  GIT_MAX_HEXSZ +   (1) actual hex bytes
  (GIT_MAX_HEXSZ / 2) - 1 + (2) internal slashes between pairs
  1 (3) NUL terminator


Ah...yes, you're right. (And I checked that do_change_note_fanout() does 
place the slashes only between characters, not before or after them.)




which simplifies to 3/2 GIT_MAX_HEXSZ. It may be better to write it out
(the compiler can simplify) or explain that in the comment, though. It
took me a minute to figure out that it was correct, too.

-Peff

[1] This is sort of a reverse-fencepost error:
https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error



Re: [PATCH v2 11/53] fast-import: convert to struct object_id

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 03:07:22PM -0700, Jonathan Tan wrote:

> > @@ -2298,8 +2296,12 @@ static uintmax_t do_change_note_fanout(
> >  static uintmax_t change_note_fanout(struct tree_entry *root,
> > unsigned char fanout)
> >  {
> > -   char hex_sha1[40], path[60];
> > -   return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
> > +   /*
> > +* The size of path is due to one slash between every two hex digits,
> > +* plus the terminating NUL.
> > +*/
> > +   char hex_oid[GIT_MAX_HEXSZ], path[GIT_MAX_HEXSZ * 3 / 2];
> 
> If your comment is correct, shouldn't the size of path be 61 (that is, add
> "+ 1")? I took a look at do_change_note_fanout() and your comment seems
> correct.

If you have 40 hex digits, then you have 20 hex pairs. But delimiting
them all takes only 19 slashes, since they only go in between pairs[1].

So the fully expanded formula is:

  GIT_MAX_HEXSZ +   (1) actual hex bytes
  (GIT_MAX_HEXSZ / 2) - 1 + (2) internal slashes between pairs
  1 (3) NUL terminator

which simplifies to 3/2 GIT_MAX_HEXSZ. It may be better to write it out
(the compiler can simplify) or explain that in the comment, though. It
took me a minute to figure out that it was correct, too.

-Peff

[1] This is sort of a reverse-fencepost error:
https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error


Re: [PATCHv3 3/4] diff: enable indent heuristic by default

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 06:13:44PM -0400, Marc Branchaud wrote:

> From: Stefan Beller 
> 
> The feature was included in v2.11 (released 2016-11-29) and we got no
> negative feedback. Quite the opposite, all feedback we got was positive.
> 
> Turn it on by default. Users who dislike the feature can turn it off
> by setting diff.indentHeuristic (which also configures plumbing commands,
> see prior patches).
> 
> The change to t/t4051-diff-function-context.sh is needed because the
> heuristic shifts the changed hunk in the patch.  To get the same result
> regardless of the heuristic configuration, we modify the test file
> differently:  We insert a completely new line after line 2, instead of
> simply duplicating it.
> 
> Helped-by: Jeff King 
> Signed-off-by: Stefan Beller 
> Signed-off-by: Marc Branchaud 
> 
> ---
> 
> Tested the sed "2a" command's escaping in both Ubuntu 17.04 and FreeBSD 10.3.
> Threw in a little indenting so that it isn't too ugly.

I think that should be fine. The indenting will go into the output, but
we really don't care _what_ that line is, just as long as it's new
content.

> diff --git a/t/t4061-diff-indent.sh b/t/t4061-diff-indent.sh
> index 13d3dc96a..56d7d7760 100755
> --- a/t/t4061-diff-indent.sh
> +++ b/t/t4061-diff-indent.sh
> @@ -153,7 +153,7 @@ test_expect_success 'prepare' '
>  '
>  
>  test_expect_success 'diff: ugly spaces' '
> - git diff old new -- spaces.txt >out &&
> + git diff --no-indent-heuristic old new -- spaces.txt >out &&
>   compare_diff spaces-expect out
>  '

I guess one could argue that most of t4061 should be rewritten. This
fixes the failures that the "ugly" version is no longer the default. But
the tests checking that diff.indentHeuristic=true works are basically
doing nothing (the interesting thing after this patch is that setting it
to false goes back to the ugly output).

I dunno. If we drop the config and options as experiments, then the
whole script basically goes away in favor of checking that the test case
has the non-ugly output. If that's our endgame, it may not be worth
spending too much time refactoring it. But if we're going to keep the
config indefinitely, maybe we should make sure it works.

-Peff


Re: [PATCHv3 0/4] Make diff plumbing commands respect the indentHeuristic.

2017-05-01 Thread Stefan Beller
> Changes since v2:
>
>   Patch 1/4 : Unchanged.
>
>   Patch 2/4 : Mentioned how the new behaviour matches the diff Porcelain.
>
>   Patch 3/4 : Updated the tests.

Thanks for picking up that patch and carrying it further!
The whole series looks good to me.

Thanks,
Stefan


[PATCHv3 3/4] diff: enable indent heuristic by default

2017-05-01 Thread Marc Branchaud
From: Stefan Beller 

The feature was included in v2.11 (released 2016-11-29) and we got no
negative feedback. Quite the opposite, all feedback we got was positive.

Turn it on by default. Users who dislike the feature can turn it off
by setting diff.indentHeuristic (which also configures plumbing commands,
see prior patches).

The change to t/t4051-diff-function-context.sh is needed because the
heuristic shifts the changed hunk in the patch.  To get the same result
regardless of the heuristic configuration, we modify the test file
differently:  We insert a completely new line after line 2, instead of
simply duplicating it.

Helped-by: Jeff King 
Signed-off-by: Stefan Beller 
Signed-off-by: Marc Branchaud 

---

Tested the sed "2a" command's escaping in both Ubuntu 17.04 and FreeBSD 10.3.
Threw in a little indenting so that it isn't too ugly.

 diff.c   | 2 +-
 t/t4051-diff-function-context.sh | 3 ++-
 t/t4061-diff-indent.sh   | 6 +++---
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index da96577ea..2c47ccb4a 100644
--- a/diff.c
+++ b/diff.c
@@ -27,7 +27,7 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_indent_heuristic; /* experimental */
+static int diff_indent_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
diff --git a/t/t4051-diff-function-context.sh b/t/t4051-diff-function-context.sh
index 6154acb45..3e6b485ec 100755
--- a/t/t4051-diff-function-context.sh
+++ b/t/t4051-diff-function-context.sh
@@ -72,7 +72,8 @@ test_expect_success 'setup' '
 
# overlap function context of 1st change and -u context of 2nd change
grep -v "delete me from hello" <"$dir/hello.c" >file.c &&
-   sed 2p <"$dir/dummy.c" >>file.c &&
+   sed "2a\\
+extra line" <"$dir/dummy.c" >>file.c &&
commit_and_tag changed_hello_dummy file.c &&
 
git checkout initial &&
diff --git a/t/t4061-diff-indent.sh b/t/t4061-diff-indent.sh
index 13d3dc96a..56d7d7760 100755
--- a/t/t4061-diff-indent.sh
+++ b/t/t4061-diff-indent.sh
@@ -153,7 +153,7 @@ test_expect_success 'prepare' '
 '
 
 test_expect_success 'diff: ugly spaces' '
-   git diff old new -- spaces.txt >out &&
+   git diff --no-indent-heuristic old new -- spaces.txt >out &&
compare_diff spaces-expect out
 '
 
@@ -183,7 +183,7 @@ test_expect_success 'diff: --indent-heuristic with 
--histogram' '
 '
 
 test_expect_success 'diff: ugly functions' '
-   git diff old new -- functions.c >out &&
+   git diff --no-indent-heuristic old new -- functions.c >out &&
compare_diff functions-expect out
 '
 
@@ -193,7 +193,7 @@ test_expect_success 'diff: nice functions with 
--indent-heuristic' '
 '
 
 test_expect_success 'blame: ugly spaces' '
-   git blame old..new -- spaces.txt >out-blame &&
+   git blame --no-indent-heuristic old..new -- spaces.txt >out-blame &&
compare_blame spaces-expect out-blame
 '
 
-- 
2.13.0.rc1.15.gf67d331ad



[PATCHv3 4/4] add--interactive: drop diff.indentHeuristic handling

2017-05-01 Thread Marc Branchaud
From: Jeff King 

Now that diff.indentHeuristic is handled automatically by the plumbing
commands, there's no need to propagate it manually.

Signed-off-by: Jeff King 
Signed-off-by: Marc Branchaud 
---
 git-add--interactive.perl | 4 
 1 file changed, 4 deletions(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 709a5f6ce..79d675b5b 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -46,7 +46,6 @@ my ($diff_new_color) =
 my $normal_color = $repo->get_color("", "reset");
 
 my $diff_algorithm = $repo->config('diff.algorithm');
-my $diff_indent_heuristic = $repo->config_bool('diff.indentheuristic');
 my $diff_filter = $repo->config('interactive.difffilter');
 
 my $use_readkey = 0;
@@ -730,9 +729,6 @@ sub parse_diff {
if (defined $diff_algorithm) {
splice @diff_cmd, 1, 0, "--diff-algorithm=${diff_algorithm}";
}
-   if ($diff_indent_heuristic) {
-   splice @diff_cmd, 1, 0, "--indent-heuristic";
-   }
if (defined $patch_mode_revision) {
push @diff_cmd, get_diff_reference($patch_mode_revision);
}
-- 
2.13.0.rc1.15.gf67d331ad



[PATCHv3 0/4] Make diff plumbing commands respect the indentHeuristic.

2017-05-01 Thread Marc Branchaud
On 2017-04-29 09:14 AM, Jeff King wrote:
> On Sat, Apr 29, 2017 at 08:40:52AM -0400, Jeff King wrote:
> 
>> On Fri, Apr 28, 2017 at 06:33:12PM -0400, Marc Branchaud wrote:
>>
>>> v2: Fixed up the commit messages and added tests.
>>>
>>> Marc Branchaud (2):
>>>   diff: make the indent heuristic part of diff's basic configuration
>>>   diff: have the diff-* builtins configure diff before initializing
>>> revisions
>>>
>>> Stefan Beller (1):
>>>   diff: enable indent heuristic by default
>>
>> Thanks, these look fine to me. I'd like to get an ACK from Michael, in
>> case he had some other reason for omitting them from git_diff_ui_config
>> (from my recollection, it's probably just a mix of conservatism and
>> following what the compaction heuristic had done).
> 
> Sorry, I spoke too soon. The third one needs a few test adjustments
> squashed in to pass the tests.

Doh!  That'll teach me to try to do this stuff at the end of a Friday...

One more try, then:

Changes since v2:

  Patch 1/4 : Unchanged.

  Patch 2/4 : Mentioned how the new behaviour matches the diff Porcelain.

  Patch 3/4 : Updated the tests.

  Patch 4/4 : (New) Jeff's add--interactive patch.

Thanks for all the help, Jeff!

M.


Jeff King (1):
  add--interactive: drop diff.indentHeuristic handling

Marc Branchaud (2):
  diff: make the indent heuristic part of diff's basic configuration
  diff: have the diff-* builtins configure diff before initializing
revisions

Stefan Beller (1):
  diff: enable indent heuristic by default

 builtin/diff-files.c |  2 +-
 builtin/diff-index.c |  2 +-
 builtin/diff-tree.c  |  2 +-
 diff.c   |  8 ++---
 git-add--interactive.perl|  4 ---
 t/t4051-diff-function-context.sh |  3 +-
 t/t4061-diff-indent.sh   | 72 ++--
 7 files changed, 78 insertions(+), 15 deletions(-)

-- 
2.13.0.rc1.15.gf67d331ad



[PATCHv3 1/4] diff: make the indent heuristic part of diff's basic configuration

2017-05-01 Thread Marc Branchaud
This heuristic was originally introduced as an experimental feature,
and therefore part of the UI configuration.

But the user often sees diffs generated by plumbing commands like
diff-tree.  Moving the indent heuristic into diff's basic configuration
prepares the way for diff plumbing commands to respect the setting.

The heuristic itself merely makes the diffs more aesthetically
pleasing, without changing their correctness.  Scripts that rely on
the diff plumbing commands should not care whether or not the heuristic
is employed.

Signed-off-by: Marc Branchaud 
---
 diff.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 11eef1c85..da96577ea 100644
--- a/diff.c
+++ b/diff.c
@@ -290,9 +290,6 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
return 0;
}
 
-   if (git_diff_heuristic_config(var, value, cb) < 0)
-   return -1;
-
if (!strcmp(var, "diff.wserrorhighlight")) {
int val = parse_ws_error_highlight(value);
if (val < 0)
@@ -351,6 +348,9 @@ int git_diff_basic_config(const char *var, const char 
*value, void *cb)
if (starts_with(var, "submodule."))
return parse_submodule_config_option(var, value);
 
+   if (git_diff_heuristic_config(var, value, cb) < 0)
+   return -1;
+
return git_default_config(var, value, cb);
 }
 
-- 
2.13.0.rc1.15.gf67d331ad



[PATCHv3 2/4] diff: have the diff-* builtins configure diff before initializing revisions

2017-05-01 Thread Marc Branchaud
This matches how the diff Porcelain works.  It makes the plumbing commands
respect diff's configuration options, such as indentHeuristic, because
init_revisions() calls diff_setup() which fills in the diff_options struct.

Signed-off-by: Marc Branchaud 
---
 builtin/diff-files.c   |  2 +-
 builtin/diff-index.c   |  2 +-
 builtin/diff-tree.c|  2 +-
 t/t4061-diff-indent.sh | 66 ++
 4 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 15c61fd8d..a572da9d5 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -20,9 +20,9 @@ int cmd_diff_files(int argc, const char **argv, const char 
*prefix)
int result;
unsigned options = 0;
 
+   git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
init_revisions(, prefix);
gitmodules_config();
-   git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
rev.abbrev = 0;
precompose_argv(argc, argv);
 
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 1af373d00..f084826a2 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -17,9 +17,9 @@ int cmd_diff_index(int argc, const char **argv, const char 
*prefix)
int i;
int result;
 
+   git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
init_revisions(, prefix);
gitmodules_config();
-   git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
rev.abbrev = 0;
precompose_argv(argc, argv);
 
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 326f88b65..36a3a1976 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -105,9 +105,9 @@ int cmd_diff_tree(int argc, const char **argv, const char 
*prefix)
struct setup_revision_opt s_r_opt;
int read_stdin = 0;
 
+   git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
init_revisions(opt, prefix);
gitmodules_config();
-   git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
opt->abbrev = 0;
opt->diff = 1;
opt->disable_stdin = 1;
diff --git a/t/t4061-diff-indent.sh b/t/t4061-diff-indent.sh
index 556450609..13d3dc96a 100755
--- a/t/t4061-diff-indent.sh
+++ b/t/t4061-diff-indent.sh
@@ -213,4 +213,70 @@ test_expect_success 'blame: --no-indent-heuristic 
overrides config' '
compare_blame spaces-expect out-blame2
 '
 
+test_expect_success 'diff-tree: nice spaces with --indent-heuristic' '
+   git diff-tree --indent-heuristic -p old new -- spaces.txt 
>out-diff-tree-compacted &&
+   compare_diff spaces-compacted-expect out-diff-tree-compacted
+'
+
+test_expect_success 'diff-tree: nice spaces with diff.indentHeuristic' '
+   git -c diff.indentHeuristic=true diff-tree -p old new -- spaces.txt 
>out-diff-tree-compacted2 &&
+   compare_diff spaces-compacted-expect out-diff-tree-compacted2
+'
+
+test_expect_success 'diff-tree: --no-indent-heuristic overrides config' '
+   git -c diff.indentHeuristic=true diff-tree --no-indent-heuristic -p old 
new -- spaces.txt >out-diff-tree &&
+   compare_diff spaces-expect out-diff-tree
+'
+
+test_expect_success 'diff-index: nice spaces with --indent-heuristic' '
+   git checkout -B diff-index &&
+   git reset --soft HEAD~ &&
+   git diff-index --indent-heuristic -p old -- spaces.txt 
>out-diff-index-compacted &&
+   compare_diff spaces-compacted-expect out-diff-index-compacted &&
+   git checkout -f master
+'
+
+test_expect_success 'diff-index: nice spaces with diff.indentHeuristic' '
+   git checkout -B diff-index &&
+   git reset --soft HEAD~ &&
+   git -c diff.indentHeuristic=true diff-index -p old -- spaces.txt 
>out-diff-index-compacted2 &&
+   compare_diff spaces-compacted-expect out-diff-index-compacted2 &&
+   git checkout -f master
+'
+
+test_expect_success 'diff-index: --no-indent-heuristic overrides config' '
+   git checkout -B diff-index &&
+   git reset --soft HEAD~ &&
+   git -c diff.indentHeuristic=true diff-index --no-indent-heuristic -p 
old -- spaces.txt >out-diff-index &&
+   compare_diff spaces-expect out-diff-index &&
+   git checkout -f master
+'
+
+test_expect_success 'diff-files: nice spaces with diff.indentHeuristic' '
+   git checkout -B diff-files &&
+   git reset HEAD~ &&
+   git -c diff.indentHeuristic=true diff-files -p spaces.txt 
>out-diff-files-raw &&
+   grep -v index out-diff-files-raw >out-diff-files-compacted &&
+   compare_diff spaces-compacted-expect out-diff-files-compacted &&
+   git checkout -f master
+'
+
+test_expect_success 'diff-files: nice spaces with diff.indentHeuristic' '
+   git checkout -B diff-files &&
+   git reset HEAD~ &&
+   git -c diff.indentHeuristic=true diff-files -p spaces.txt 
>out-diff-files-raw2 &&
+   grep -v index out-diff-files-raw2 

Re: [PATCH v2 11/53] fast-import: convert to struct object_id

2017-05-01 Thread Jonathan Tan

On 04/30/2017 07:29 PM, brian m. carlson wrote:

@@ -391,10 +391,8 @@ static void write_branch_report(FILE *rpt, struct branch 
*b)
fputc('\n', rpt);

fprintf(rpt, "  tip commit  : %s\n", oid_to_hex(>oid));
-   fprintf(rpt, "  old tree: %s\n",
-   oid_to_hex(>branch_tree.versions[0].oid));
-   fprintf(rpt, "  cur tree: %s\n",
-   oid_to_hex(>branch_tree.versions[1].oid));
+   fprintf(rpt, "  old tree: %s\n", 
oid_to_hex(>branch_tree.versions[0].oid));
+   fprintf(rpt, "  cur tree: %s\n", 
oid_to_hex(>branch_tree.versions[1].oid));
fprintf(rpt, "  commit clock: %" PRIuMAX "\n", b->last_commit);

fputs("  last pack   : ", rpt);


These look like unnecessary line rewrappings.


@@ -2298,8 +2296,12 @@ static uintmax_t do_change_note_fanout(
 static uintmax_t change_note_fanout(struct tree_entry *root,
unsigned char fanout)
 {
-   char hex_sha1[40], path[60];
-   return do_change_note_fanout(root, root, hex_sha1, 0, path, 0, fanout);
+   /*
+* The size of path is due to one slash between every two hex digits,
+* plus the terminating NUL.
+*/
+   char hex_oid[GIT_MAX_HEXSZ], path[GIT_MAX_HEXSZ * 3 / 2];


If your comment is correct, shouldn't the size of path be 61 (that is, 
add "+ 1")? I took a look at do_change_note_fanout() and your comment 
seems correct.



+   return do_change_note_fanout(root, root, hex_oid, 0, path, 0, fanout);
 }

 /*


Other than these, patches 10 and 11 look fine.


Re: [PATCH v2 09/53] builtin/rev-parse: convert to struct object_id

2017-05-01 Thread Jonathan Tan

On 04/30/2017 07:29 PM, brian m. carlson wrote:

@@ -340,7 +340,7 @@ static int try_parent_shorthands(const char *arg)
}

if (include_rev)
-   show_rev(NORMAL, sha1, arg);
+   show_rev(NORMAL, , arg);
for (parents = commit->parents, parent_number = 1;
 parents;
 parents = parents->next, parent_number++) {
@@ -352,7 +352,7 @@ static int try_parent_shorthands(const char *arg)
if (symbolic)
name = xstrfmt("%s^%d", arg, parent_number);
show_rev(include_parents ? NORMAL : REVERSED,
-parents->item->object.oid.hash, name);
+   & parents->item->object.oid, name);


No space after ampersand.

Up to (and including) this patch, looks good. I didn't notice any 
changes other than introductions of parse_oid_hex() and mechanical changes.


Re: [PATCH] small memory leak fix

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 01:50:57PM -0700, Stefan Beller wrote:

> > diff --git a/remote.c b/remote.c
> > index 9f83fe2c4..2f8cb35a3 100644
> > --- a/remote.c
> > +++ b/remote.c
> > @@ -742,6 +742,8 @@ int for_each_remote(each_remote_fn fn, void *priv)
> >  r->push = parse_push_refspec(r->push_refspec_nr,
> >  r->push_refspec);
> >  result = fn(r, priv);
> > + free_refspecs(r->push, r->push_refspec_nr);
> > + free_refspecs(r->fetch, r->fetch_refspec_nr);
> 
> After freeing the refspec, r->push/fetch still points to
> the (now free'd) memory. We'd want to reset it to NULL as well,
> such that e.g. in this function
> 
> if (!r->fetch)
> ...
> 
> still works.
> 
> After reading this, I think we'd rather want to keep the fetch/push refspec
> around for the next access of the struct remote, and only free the memory
> when the remote itself is freed?
> 
> That however is a problem as we never free them, they are globals :(

Yeah, I think the point is that the whole "remotes" array is a
program-length global that never goes away (and must not, because after
read_config() sets the "loaded" flag, we would never reload it).

The "fetch" and "push" bits are lazily parsed from the refspec strings,
but are intended to have the same lifetime. Unlike the rest of it, we
_could_ drop them after use and then lazy-parse them again.

But note that we call an arbitrary callback in this function. What
expectations does it have for the lifetimes? Do any of the callbacks
record pointers to the refspecs? Or for that mater, the patch as shown
frees the refspecs even if we didn't just lazily allocate them in this
function (e.g., if we did so in remote_get_1()).

So I don't think freeing them is safe unless we do a complete audit of
access to those refspecs. And it's probably not worth the trouble; these
should just follow the same global-until-exit allocation scheme as the
rest of "struct remote".

-Peff


Re: [PATCH] cache-tree: reject entries with null sha1

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 11:00:58PM +0200, René Scharfe wrote:

> Am 01.05.2017 um 21:22 schrieb Jeff King:
> > On Mon, May 01, 2017 at 01:23:28PM +0200, René Scharfe wrote:
> > > I can only get gcc and clang to call memcpy instead of inlining it by
> > > specifying -fno-builtin.  Do you use that option?  If yes, why?  (Just
> > > curious.)
> > 
> > I do my normal edit-compile cycles with -O0 because it's fast, and
> > because it makes debugging much easier.
> 
> GCC and Clang still inline memcpy with -O0 alone (at least the versions
> I tested).

I just checked the assembler output and confirmed that is the case for
my builds, even with -O0. Which probably explains why I wasn't able to
replicate Duy's valgrind error in the first place.

-Peff


Re: [PATCH v2 00/53] object_id part 8

2017-05-01 Thread Stefan Beller
On Sun, Apr 30, 2017 at 7:28 PM, brian m. carlson
 wrote:
> This is the eighth series of patches to convert unsigned char [20] to
> struct object_id.  This series converts lookup_commit, lookup_blob,
> lookup_tree, lookup_tag, and finally parse_object to struct object_id.
>
> A small number of functions have temporaries inserted during the
> conversion in order to allow conversion of functions that still need to
> take unsigned char *; they are removed either later in the series or
> will be in a future series.
>
> This series can be fetched from the object-id-part8 branch from either
> of the follwing:
>
> https://github.com/bk2204/git
> https://git.crustytoothpaste.net/git/bmc/git.git
>
> Changes from v1:
> * Rebase on master.  This led to a conflict with the ref-cache changes in 
> patch
>   39.  Extra-careful review here would be welcome.
> * Undo the needless line rewrapping.
> * Fix the commit message typo.
> * Use GIT_MAX_RAWSZ instead of struct object_id for the pack checksum.

Thanks,

this addresses all reviewer concerns from last round.
I'll give my eyes some rest before reviewing it unless
someone else reviews this series as I find these large
cocci patches very tiring. It is like being a goal keeper
of the winning team in soccer, where a lot of the time
nothing really happens, but then there are these tiny
critical sections that need high concentration.

Thanks,
Stefan


Re: [PATCH] cache-tree: reject entries with null sha1

2017-05-01 Thread René Scharfe

Am 01.05.2017 um 21:22 schrieb Jeff King:

On Mon, May 01, 2017 at 01:23:28PM +0200, René Scharfe wrote:

I can only get gcc and clang to call memcpy instead of inlining it by
specifying -fno-builtin.  Do you use that option?  If yes, why?  (Just
curious.)


I do my normal edit-compile cycles with -O0 because it's fast, and
because it makes debugging much easier.


GCC and Clang still inline memcpy with -O0 alone (at least the versions
I tested).

René


Re: [PATCH] small memory leak fix

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 1:40 PM, David CARLIER  wrote:
> Hi this is my first submission, a memory leak found in the maint branch 
> (might be in master as well).
>
> Kind regards.

Hi and welcome to Git!

> From d98f3944780730447f111a4178c9d99f5110c260 Mon Sep 17 00:00:00 2001
> From: David Carlier 
> Date: Mon, 1 May 2017 21:14:09 +0100
> Subject: [PATCH] memory leaks fixes for remote lists.

We usually word the subject line as "area: what we do"
(C.f. $ git log --oneline --no-merges -- remote.c)
Maybe:

remote.c: plug memleak in for_each_remote

>
> both push and fetch lists were not freed thus
> using free_refspecs usage.
>
> Signed-off-by: David Carlier 
> ---
> remote.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/remote.c b/remote.c
> index 9f83fe2c4..2f8cb35a3 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -742,6 +742,8 @@ int for_each_remote(each_remote_fn fn, void *priv)
>  r->push = parse_push_refspec(r->push_refspec_nr,
>  r->push_refspec);
>  result = fn(r, priv);
> + free_refspecs(r->push, r->push_refspec_nr);
> + free_refspecs(r->fetch, r->fetch_refspec_nr);

After freeing the refspec, r->push/fetch still points to
the (now free'd) memory. We'd want to reset it to NULL as well,
such that e.g. in this function

if (!r->fetch)
...

still works.

After reading this, I think we'd rather want to keep the fetch/push refspec
around for the next access of the struct remote, and only free the memory
when the remote itself is freed?

That however is a problem as we never free them, they are globals :(

Thanks,
Stefan


[PATCH] small memory leak fix

2017-05-01 Thread David CARLIER
Hi this is my first submission, a memory leak found in the maint branch (might 
be in master as well).

Kind regards.

0001-memory-leaks-fixes-for-remote-lists.patch
Description: Binary data


Re: What's cooking in git.git (May 2017, #01; Mon, 1)

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 04:25:14PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > * ab/grep-pcre-v2 (2017-04-25) 20 commits
> [...]
> > * ab/grep-threading-cleanup (2017-04-16) 8 commits
> [...]
> >
> >  Needs review.
> 
> Between these two series there's 27 patches, and I understand it's a
> bit of a PITA to review/get comments on it.
> 
> Anything I should be doing differently here other than just waiting
> for 2.13 to come out so they can be cooked further & merged down to
> next & then master if there's no objections?

These are both on my todo list to review. That's not a promise of
timeliness, but at least they didn't get dropped into a void. :) If you
haven't heard anything post-2.13, I think reposting them then would be
reasonable.

-Peff


Re: [PATCH] cache-tree: reject entries with null sha1

2017-05-01 Thread Jeff King
On Mon, May 01, 2017 at 01:23:28PM +0200, René Scharfe wrote:

> Am 24.04.2017 um 12:39 schrieb Duy Nguyen:
> > BTW, I ran t7009 with valgrind and it reported this. Is it something
> > we should be worried about? I vaguely recall you're doing something
> > with prio-queue...
> > 
> > ==4246== Source and destination overlap in memcpy(0x5952990, 0x5952990, 16)
> > ==4246==at 0x4C2EACD: memcpy@@GLIBC_2.14 (in
> > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> > ==4246==by 0x545D05: swap (prio-queue.c:15)
> > ==4246==by 0x545D72: prio_queue_reverse (prio-queue.c:25)
> > ==4246==by 0x4CBC0C: sort_in_topological_order (commit.c:723)
> > ==4246==by 0x574C97: prepare_revision_walk (revision.c:2858)
> > ==4246==by 0x48A2BA: cmd_rev_list (rev-list.c:385)
> > ==4246==by 0x405A6F: run_builtin (git.c:371)
> > ==4246==by 0x405CDC: handle_builtin (git.c:572)
> > ==4246==by 0x405E51: run_argv (git.c:624)
> > ==4246==by 0x405FF3: cmd_main (git.c:701)
> > ==4246==by 0x4A48CE: main (common-main.c:43)
> 
> I can only get gcc and clang to call memcpy instead of inlining it by
> specifying -fno-builtin.  Do you use that option?  If yes, why?  (Just
> curious.)

I do my normal edit-compile cycles with -O0 because it's fast, and
because it makes debugging much easier.

> But I can't get Valgrind to report overlapping (nicely explained in
> http://valgrind.org/docs/manual/mc-manual.html#mc-manual.overlap, by
> the way), not for t7009 and not for the short test program at the
> bottom.  Do you set flags in GIT_VALGRIND_OPTIONS or use a special
> version of Valgrind?  I use valgrind-3.12.0.SVN from Debian testing.

I saw it with 3.12.0-1.1 on Debian unstable.

-Peff


Re: Proposal for missing blob support in Git repos

2017-05-01 Thread Jonathan Tan

On 04/30/2017 08:57 PM, Junio C Hamano wrote:

One thing I wonder is what the performance impact of a change like
this to the codepath that wants to see if an object does _not_ exist
in the repository.  When creating a new object by hashing raw data,
we see if an object with the same name already exists before writing
the compressed loose object out (or comparing the payload to detect
hash collision).  With a "missing blob" support, we'd essentially
spawn an extra process every time we want to create a new blob
locally, and most of the time that is done only to hear the external
command to say "no, we've never heard of such an object", with a
possibly large latency.

If we do not have to worry about that (or if it is no use to worry
about it, because we cannot avoid it if we wanted to do the lazy
loading of objects from elsewhere), then the patch presented here
looked like a sensible first step towards the stated goal.

Thanks.


Thanks for your comments. If you're referring to the codepath involving 
write_sha1_file() (for example, builtin/hash-object -> index_fd or 
builtin/unpack-objects), that is fine because write_sha1_file() invokes 
freshen_packed_object() and freshen_loose_object() directly to check if 
the object already exists (and thus does not invoke the new mechanism in 
this patch).


Having said that, looking at other parts of the fetching mechanism, 
there are a few calls to has_sha1_file() and others that might need to 
be checked. (We have already discussed one - the one in rev-list when 
invoked to check connectivity.) I could take a look at that, but was 
hoping for discussion on what I've sent so far (so that I know that I'm 
on the right track, and because it somewhat works, albeit slowly).


[PATCH 4/5] cache.h: drop read_cache_preload(pathspec)

2017-05-01 Thread Stefan Beller
coccinelle patch:
@@ expression E; @@
-read_cache_preload(E)
+read_index_preload(_index, E)

Additionally manual editing:
* drop the define from cache.h.
* builtin/{commit,describe}.c were not picked up as we have NULL and
  the address of an expression. Converted them manually.
* builtin/diff{-files,-index}.c error messages converted as well.

Signed-off-by: Stefan Beller 
---
 builtin/checkout.c | 4 ++--
 builtin/commit.c   | 4 ++--
 builtin/describe.c | 2 +-
 builtin/diff-files.c   | 4 ++--
 builtin/diff-index.c   | 4 ++--
 builtin/diff.c | 8 
 builtin/update-index.c | 2 +-
 cache.h| 1 -
 8 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0aac616ad6..2328a475ea 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -295,7 +295,7 @@ static int checkout_paths(const struct checkout_opts *opts,
lock_file = xcalloc(1, sizeof(struct lock_file));
 
hold_locked_index(lock_file, LOCK_DIE_ON_ERROR);
-   if (read_cache_preload(>pathspec) < 0)
+   if (read_index_preload(_index, >pathspec) < 0)
return error(_("index file corrupt"));
 
if (opts->source_tree)
@@ -488,7 +488,7 @@ static int merge_working_tree(const struct checkout_opts 
*opts,
struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 
hold_locked_index(lock_file, LOCK_DIE_ON_ERROR);
-   if (read_cache_preload(NULL) < 0)
+   if (read_index_preload(_index, NULL) < 0)
return error(_("index file corrupt"));
 
resolve_undo_clear();
diff --git a/builtin/commit.c b/builtin/commit.c
index 65a04ac199..687e7c8a3a 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -346,7 +346,7 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
   PATHSPEC_PREFER_FULL,
   prefix, argv);
 
-   if (read_cache_preload() < 0)
+   if (read_index_preload(_index, ) < 0)
die(_("index file corrupt"));
 
if (interactive) {
@@ -1377,7 +1377,7 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
   PATHSPEC_PREFER_FULL,
   prefix, argv);
 
-   read_cache_preload();
+   read_index_preload(_index, );
refresh_index(_index, REFRESH_QUIET|REFRESH_UNMERGED, , 
NULL, NULL);
 
fd = hold_locked_index(_lock, 0);
diff --git a/builtin/describe.c b/builtin/describe.c
index a5cd8c513f..0229458ac6 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -531,7 +531,7 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
static struct lock_file index_lock;
int fd;
 
-   read_cache_preload(NULL);
+   read_index_preload(_index, NULL);
refresh_index(_index, 
REFRESH_QUIET|REFRESH_UNMERGED,
  NULL, NULL, NULL);
fd = hold_locked_index(_lock, 0);
diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 15c61fd8d1..d400d8c1fc 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -62,8 +62,8 @@ int cmd_diff_files(int argc, const char **argv, const char 
*prefix)
(rev.diffopt.output_format & DIFF_FORMAT_PATCH))
rev.combine_merges = rev.dense_combined_merges = 1;
 
-   if (read_cache_preload() < 0) {
-   perror("read_cache_preload");
+   if (read_index_preload(_index, ) < 0) {
+   perror("read_index_preload");
return -1;
}
result = run_diff_files(, options);
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 49fd64d4ce..3fbe33a90a 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -44,8 +44,8 @@ int cmd_diff_index(int argc, const char **argv, const char 
*prefix)
usage(diff_cache_usage);
if (!cached) {
setup_work_tree();
-   if (read_cache_preload() < 0) {
-   perror("read_cache_preload");
+   if (read_index_preload(_index, ) < 0) {
+   perror("read_index_preload");
return -1;
}
} else if (read_index(_index) < 0) {
diff --git a/builtin/diff.c b/builtin/diff.c
index ed9edb2d0c..0ae33bce2b 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -144,8 +144,8 @@ static int builtin_diff_index(struct rev_info *revs,
usage(builtin_diff_usage);
if (!cached) {
setup_work_tree();
-   if (read_cache_preload(>diffopt.pathspec) < 0) {
-   perror("read_cache_preload");
+   if (read_index_preload(_index, >diffopt.pathspec) < 
0) {
+   perror("read_index_preload");
return -1;
}
} else if 

[PATCH 1/5] cache.h: drop read_cache()

2017-05-01 Thread Stefan Beller
This patch is produced via coccinelle using this semantic patch:

@@ @@
-read_cache()
+read_index(_index)

Additional manual editing:
* drop define in cache.h
* a comment in builtin/check-ignore.c and read-cache.c were
  converted
* builtin/diff.c: fix error message referencing the function.

Signed-off-by: Stefan Beller 
---
 apply.c  |  2 +-
 builtin/add.c|  4 ++--
 builtin/am.c |  2 +-
 builtin/blame.c  |  4 ++--
 builtin/check-attr.c |  2 +-
 builtin/check-ignore.c   |  4 ++--
 builtin/checkout-index.c |  2 +-
 builtin/clean.c  |  2 +-
 builtin/commit.c |  4 ++--
 builtin/diff-index.c |  2 +-
 builtin/diff.c   |  6 +++---
 builtin/fsck.c   |  2 +-
 builtin/grep.c   |  2 +-
 builtin/ls-files.c   |  2 +-
 builtin/merge-index.c|  2 +-
 builtin/mv.c |  2 +-
 builtin/reset.c  |  2 +-
 builtin/rev-parse.c  |  2 +-
 builtin/rm.c |  2 +-
 builtin/submodule--helper.c  |  2 +-
 builtin/update-index.c   |  2 +-
 cache.h  |  1 -
 check-racy.c |  2 +-
 diff.c   |  2 +-
 merge-recursive.c|  2 +-
 merge.c  |  2 +-
 read-cache.c |  2 +-
 rerere.c |  6 +++---
 revision.c   |  4 ++--
 sequencer.c  |  6 +++---
 sha1_name.c  |  2 +-
 submodule.c  |  4 ++--
 t/helper/test-dump-cache-tree.c  |  2 +-
 t/helper/test-dump-untracked-cache.c |  2 +-
 t/helper/test-lazy-init-name-hash.c  | 10 +-
 t/helper/test-read-cache.c   |  2 +-
 t/helper/test-scrap-cache-tree.c |  2 +-
 37 files changed, 52 insertions(+), 53 deletions(-)

diff --git a/apply.c b/apply.c
index e6dbab26ad..82701d6580 100644
--- a/apply.c
+++ b/apply.c
@@ -3987,7 +3987,7 @@ static int read_apply_cache(struct apply_state *state)
if (state->index_file)
return read_cache_from(state->index_file);
else
-   return read_cache();
+   return read_index(_index);
 }
 
 /* This function tries to read the object name from the current index */
diff --git a/builtin/add.c b/builtin/add.c
index 9f53f020d0..ab6d04e8db 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -205,7 +205,7 @@ static int edit_patch(int argc, const char **argv, const 
char *prefix)
 
git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 
-   if (read_cache() < 0)
+   if (read_index(_index) < 0)
die(_("Could not read the index"));
 
init_revisions(, prefix);
@@ -376,7 +376,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
return 0;
}
 
-   if (read_cache() < 0)
+   if (read_index(_index) < 0)
die(_("index file corrupt"));
 
/*
diff --git a/builtin/am.c b/builtin/am.c
index a95dd8b4e6..4405d7307c 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1633,7 +1633,7 @@ static int fall_back_threeway(const struct am_state 
*state, const char *index_pa
say(state, stdout, _("Falling back to patching base and 3-way 
merge..."));
 
discard_cache();
-   read_cache();
+   read_index(_index);
 
/*
 * This is not so wrong. Depending on which base we picked, orig_tree
diff --git a/builtin/blame.c b/builtin/blame.c
index 07506a3e45..59955208fd 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2313,7 +2313,7 @@ static struct commit *fake_working_tree_commit(struct 
diff_options *opt,
unsigned mode;
struct strbuf msg = STRBUF_INIT;
 
-   read_cache();
+   read_index(_index);
time();
commit = alloc_commit_node();
commit->object.parsed = 1;
@@ -2395,7 +2395,7 @@ static struct commit *fake_working_tree_commit(struct 
diff_options *opt,
 * want to run "diff-index --cached".
 */
discard_cache();
-   read_cache();
+   read_index(_index);
 
len = strlen(path);
if (!mode) {
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index 4d01ca0c8b..9cc3675d62 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -114,7 +114,7 @@ int cmd_check_attr(int argc, const char **argv, const char 
*prefix)
argc = parse_options(argc, argv, prefix, check_attr_options,
 check_attr_usage, PARSE_OPT_KEEP_DASHDASH);
 
-   if (read_cache() < 0) {
+   if (read_index(_index) < 0) {
die("invalid cache");
}
 
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c

[PATCH 2/5] cache.h: drop active_* macros

2017-05-01 Thread Stefan Beller
Based on the coccinelle patch:

@@ @@
-active_cache
+the_index.cache
@@ @@
-active_nr
+the_index.cache_nr
@@ @@
-active_alloc
+the_index.cache_alloc
@@ @@
-active_cache_changed
+the_index.cache_changed
@@ @@
-active_cache_tree
+the_index.cache_tree

Additional manual editing:
* drop the macros from cache.h
* fix the whitespace issue that apply complained about in
  builtin/checkout.c
* builtin/commit.c had an occurrence of active_cache_tree->sha1, which
  was not picked up with the coccinelle patch.
* diff.c and t/t2107-update-index-basic.sh referenced
  active_cache[_changed] in comments, fix them up.
* ative_nr was referenced in comments in read-cache.c and
  builtin/update-index.c, fix them.

Signed-off-by: Stefan Beller 
---
 apply.c  |  6 ++---
 builtin/add.c|  6 ++---
 builtin/am.c |  6 ++---
 builtin/blame.c  |  6 ++---
 builtin/checkout-index.c |  8 +++
 builtin/checkout.c   | 49 
 builtin/commit.c | 20 
 builtin/fsck.c   | 12 +-
 builtin/grep.c   |  8 +++
 builtin/ls-files.c   | 36 ++---
 builtin/merge-index.c| 10 
 builtin/merge.c  | 12 +-
 builtin/mv.c | 10 
 builtin/read-tree.c  |  2 +-
 builtin/rm.c | 16 ++---
 builtin/submodule--helper.c  |  8 +++
 builtin/update-index.c   | 48 ---
 cache.h  |  6 -
 check-racy.c |  4 ++--
 diff-lib.c   |  6 ++---
 diff.c   |  8 +++
 dir.c| 20 
 merge-recursive.c| 28 +++
 pathspec.c   | 14 ++--
 read-cache.c |  2 +-
 rerere.c | 26 ++---
 revision.c   | 18 +++
 sequencer.c  | 19 
 sha1_name.c  | 14 ++--
 submodule.c  | 12 +-
 t/helper/test-dump-cache-tree.c  |  2 +-
 t/helper/test-scrap-cache-tree.c |  2 +-
 t/t2107-update-index-basic.sh|  2 +-
 tree.c   |  8 +++
 wt-status.c  | 12 +-
 35 files changed, 232 insertions(+), 234 deletions(-)

diff --git a/apply.c b/apply.c
index 82701d6580..ae1b659b68 100644
--- a/apply.c
+++ b/apply.c
@@ -3499,7 +3499,7 @@ static int load_current(struct apply_state *state,
pos = cache_name_pos(name, strlen(name));
if (pos < 0)
return error(_("%s: does not exist in index"), name);
-   ce = active_cache[pos];
+   ce = the_index.cache[pos];
if (lstat(name, )) {
if (errno != ENOENT)
return error_errno("%s", name);
@@ -3671,7 +3671,7 @@ static int check_preimage(struct apply_state *state,
goto is_new;
return error(_("%s: does not exist in index"), 
old_name);
}
-   *ce = active_cache[pos];
+   *ce = the_index.cache[pos];
if (stat_ret < 0) {
if (checkout_target(_index, *ce, st))
return -1;
@@ -4001,7 +4001,7 @@ static int get_current_oid(struct apply_state *state, 
const char *path,
pos = cache_name_pos(path, strlen(path));
if (pos < 0)
return -1;
-   oidcpy(oid, _cache[pos]->oid);
+   oidcpy(oid, _index.cache[pos]->oid);
return 0;
 }
 
diff --git a/builtin/add.c b/builtin/add.c
index ab6d04e8db..f6d71b10d0 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -34,8 +34,8 @@ static void chmod_pathspec(struct pathspec *pathspec, int 
force_mode)
 {
int i;
 
-   for (i = 0; i < active_nr; i++) {
-   struct cache_entry *ce = active_cache[i];
+   for (i = 0; i < the_index.cache_nr; i++) {
+   struct cache_entry *ce = the_index.cache[i];
 
if (pathspec && !ce_path_match(ce, pathspec, NULL))
continue;
@@ -458,7 +458,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
unplug_bulk_checkin();
 
 finish:
-   if (active_cache_changed) {
+   if (the_index.cache_changed) {
if (write_locked_index(_index, _file, COMMIT_LOCK))
die(_("Unable to write new index file"));
}
diff --git a/builtin/am.c b/builtin/am.c
index 4405d7307c..c6a679d8e1 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1155,12 +1155,12 @@ static int index_has_changes(struct strbuf *sb)
diff_flush();
return DIFF_OPT_TST(, 

[PATCH 5/5] cache.h: drop read_cache_unmerged()

2017-05-01 Thread Stefan Beller
@@ @@
-read_cache_unmerged()
+read_index_unmerged(_index)

Additionally drop the define from cache.h manually.

Signed-off-by: Stefan Beller 
---
 builtin/am.c| 2 +-
 builtin/merge.c | 2 +-
 builtin/pull.c  | 2 +-
 builtin/read-tree.c | 2 +-
 builtin/reset.c | 2 +-
 cache.h | 1 -
 sequencer.c | 2 +-
 7 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index cb3e4dff63..bb0927fbcc 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2053,7 +2053,7 @@ static int clean_index(const struct object_id *head, 
const struct object_id *rem
if (!remote_tree)
return error(_("Could not parse object '%s'."), 
oid_to_hex(remote));
 
-   read_cache_unmerged();
+   read_index_unmerged(_index);
 
if (fast_forward_to(head_tree, head_tree, 1))
return -1;
diff --git a/builtin/merge.c b/builtin/merge.c
index 4d4c56050c..c27c806ac1 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1170,7 +1170,7 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
goto done;
}
 
-   if (read_cache_unmerged())
+   if (read_index_unmerged(_index))
die_resolve_conflict("merge");
 
if (file_exists(git_path_merge_head())) {
diff --git a/builtin/pull.c b/builtin/pull.c
index dd1a4a94e4..42578cee05 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -788,7 +788,7 @@ int cmd_pull(int argc, const char **argv, const char 
*prefix)
 
git_config(git_pull_config, NULL);
 
-   if (read_cache_unmerged())
+   if (read_index_unmerged(_index))
die_resolve_conflict("pull");
 
if (file_exists(git_path_merge_head()))
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index f997814933..0bcf021ead 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -195,7 +195,7 @@ int cmd_read_tree(int argc, const char **argv, const char 
*unused_prefix)
 */
 
if (opts.reset || opts.merge || opts.prefix) {
-   if (read_cache_unmerged() && (opts.prefix || opts.merge))
+   if (read_index_unmerged(_index) && (opts.prefix || 
opts.merge))
die("You need to resolve your current index first");
stage = opts.merge = 1;
}
diff --git a/builtin/reset.c b/builtin/reset.c
index 03c5498d6e..4a4eb723dd 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -66,7 +66,7 @@ static int reset_index(const struct object_id *oid, int 
reset_type, int quiet)
opts.reset = 1;
}
 
-   read_cache_unmerged();
+   read_index_unmerged(_index);
 
if (reset_type == KEEP) {
struct object_id head_oid;
diff --git a/cache.h b/cache.h
index a66ae97fb7..9b94339573 100644
--- a/cache.h
+++ b/cache.h
@@ -355,7 +355,6 @@ extern void free_name_hash(struct index_state *istate);
 
 #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
 #define is_cache_unborn() is_index_unborn(_index)
-#define read_cache_unmerged() read_index_unmerged(_index)
 #define discard_cache() discard_index(_index)
 #define unmerged_cache() unmerged_index(_index)
 #define cache_name_pos(name, namelen) 
index_name_pos(_index,(name),(namelen))
diff --git a/sequencer.c b/sequencer.c
index 9409b65aaa..f20e05fe60 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -349,7 +349,7 @@ static struct tree *empty_tree(void)
 
 static int error_dirty_index(struct replay_opts *opts)
 {
-   if (read_cache_unmerged())
+   if (read_index_unmerged(_index))
return error_resolve_conflict(_(action_name(opts)));
 
error(_("your local changes would be overwritten by %s."),
-- 
2.13.0.rc1.1.gbc33f0f778



[PATCH 3/5] cache.h: drop read_cache_from

2017-05-01 Thread Stefan Beller
coccinelle patch:
@@ expression E; @@
-read_cache_from(E)
+read_index_from(_index, E)

additionally drop the define from cache.h manually

Signed-off-by: Stefan Beller 
---
 apply.c  | 2 +-
 builtin/am.c | 4 ++--
 builtin/commit.c | 6 +++---
 cache.h  | 1 -
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/apply.c b/apply.c
index ae1b659b68..159e039a18 100644
--- a/apply.c
+++ b/apply.c
@@ -3985,7 +3985,7 @@ static int check_patch_list(struct apply_state *state, 
struct patch *patch)
 static int read_apply_cache(struct apply_state *state)
 {
if (state->index_file)
-   return read_cache_from(state->index_file);
+   return read_index_from(_index, state->index_file);
else
return read_index(_index);
 }
diff --git a/builtin/am.c b/builtin/am.c
index c6a679d8e1..cb3e4dff63 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1557,7 +1557,7 @@ static int run_apply(const struct am_state *state, const 
char *index_file)
if (index_file) {
/* Reload index as apply_all_patches() will have modified it. */
discard_cache();
-   read_cache_from(index_file);
+   read_index_from(_index, index_file);
}
 
return 0;
@@ -1600,7 +1600,7 @@ static int fall_back_threeway(const struct am_state 
*state, const char *index_pa
return error("could not build fake ancestor");
 
discard_cache();
-   read_cache_from(index_path);
+   read_index_from(_index, index_path);
 
if (write_index_as_tree(orig_tree.hash, _index, index_path, 0, 
NULL))
return error(_("Repository lacks necessary blobs to fall back 
on 3-way merge."));
diff --git a/builtin/commit.c b/builtin/commit.c
index 01d298c836..65a04ac199 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -370,7 +370,7 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
unsetenv(INDEX_ENVIRONMENT);
 
discard_cache();
-   read_cache_from(get_lock_file_path(_lock));
+   read_index_from(_index, get_lock_file_path(_lock));
if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) {
if (reopen_lock_file(_lock) < 0)
die(_("unable to write index file"));
@@ -489,7 +489,7 @@ static const char *prepare_index(int argc, const char 
**argv, const char *prefix
 
discard_cache();
ret = get_lock_file_path(_lock);
-   read_cache_from(ret);
+   read_index_from(_index, ret);
return ret;
 }
 
@@ -949,7 +949,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
 * the editor and after we invoke run_status above.
 */
discard_cache();
-   read_cache_from(index_file);
+   read_index_from(_index, index_file);
if (update_main_cache_tree(0)) {
error(_("Error building trees"));
return 0;
diff --git a/cache.h b/cache.h
index 4e913d1346..6abf48dcc3 100644
--- a/cache.h
+++ b/cache.h
@@ -354,7 +354,6 @@ extern void free_name_hash(struct index_state *istate);
 
 
 #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
-#define read_cache_from(path) read_index_from(_index, (path))
 #define read_cache_preload(pathspec) read_index_preload(_index, (pathspec))
 #define is_cache_unborn() is_index_unborn(_index)
 #define read_cache_unmerged() read_index_unmerged(_index)
-- 
2.13.0.rc1.1.gbc33f0f778



[PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

2017-05-01 Thread Stefan Beller
This applies to origin/master.

For better readability and understandability for newcomers it is a good idea
to not offer 2 APIs doing the same thing with on being the #define of the other.

In the long run we may want to drop the macros guarded by
NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.

My main reason for this patch is to try out coccinelle as well as a
discussion I had off list about maintainability of software.

I just made these patches and wonder if now is a good time to pull through and
convert the rest as well?

Thanks,
Stefan

Stefan Beller (5):
  cache.h: drop read_cache()
  cache.h: drop active_* macros
  cache.h: drop read_cache_from
  cache.h: drop read_cache_preload(pathspec)
  cache.h: drop read_cache_unmerged()

 apply.c  | 10 +++
 builtin/add.c| 10 +++
 builtin/am.c | 14 +-
 builtin/blame.c  | 10 +++
 builtin/check-attr.c |  2 +-
 builtin/check-ignore.c   |  4 +--
 builtin/checkout-index.c | 10 +++
 builtin/checkout.c   | 53 ++--
 builtin/clean.c  |  2 +-
 builtin/commit.c | 32 +++---
 builtin/describe.c   |  2 +-
 builtin/diff-files.c |  4 +--
 builtin/diff-index.c |  6 ++--
 builtin/diff.c   | 14 +-
 builtin/fsck.c   | 14 +-
 builtin/grep.c   | 10 +++
 builtin/ls-files.c   | 38 +-
 builtin/merge-index.c| 12 
 builtin/merge.c  | 14 +-
 builtin/mv.c | 12 
 builtin/pull.c   |  2 +-
 builtin/read-tree.c  |  4 +--
 builtin/reset.c  |  4 +--
 builtin/rev-parse.c  |  2 +-
 builtin/rm.c | 18 ++--
 builtin/submodule--helper.c  | 10 +++
 builtin/update-index.c   | 52 ++-
 cache.h  | 10 ---
 check-racy.c |  6 ++--
 diff-lib.c   |  6 ++--
 diff.c   | 10 +++
 dir.c| 20 +++---
 merge-recursive.c| 30 ++--
 merge.c  |  2 +-
 pathspec.c   | 14 +-
 read-cache.c |  4 +--
 rerere.c | 32 +++---
 revision.c   | 22 +++
 sequencer.c  | 27 +-
 sha1_name.c  | 16 +--
 submodule.c  | 16 +--
 t/helper/test-dump-cache-tree.c  |  4 +--
 t/helper/test-dump-untracked-cache.c |  2 +-
 t/helper/test-lazy-init-name-hash.c  | 10 +++
 t/helper/test-read-cache.c   |  2 +-
 t/helper/test-scrap-cache-tree.c |  4 +--
 t/t2107-update-index-basic.sh|  2 +-
 tree.c   |  8 +++---
 wt-status.c  | 12 
 49 files changed, 309 insertions(+), 315 deletions(-)

-- 
2.13.0.rc1.1.gbc33f0f778



Re: [PATCH 07/26] http-backend: avoid memory leaks

2017-05-01 Thread Johannes Schindelin
Hi Junio,

On Sun, 30 Apr 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> Hmph.  I find a "leak" of a resource acquired inside the main
> >> function and not released when the main function leaves a lot less
> >> interesting than the other ones this series covers.
> >
> > Ah, I missed that this falls squarely into the "one-shot programs are
> > allowed to be sloppy in their memory management, essentially using
> > exit() as garbage collector" category.
> 
> I actually think it was a good intention of yours and I would agree
> with the patch.  The automated tools that find and complain about
> these issues do not know about our local house rules like "do not
> bother freeing immediately before exit" and will keep notifying us
> of leaks.

In Coverity, I marked them as "Intentional", so that they can be dropped
from the default view.

Unfortunately, the verdict "Intentional" does not percolate to other
projects, so I will probably have to do the entire shebang in the `git`
project again.

Of course, we could also decide that we want to shut up static analyzers
by actually *not* leaking memory, by *not* using exit() as our garbage
collector in one-shot commands, with the added benefit of making the code
easier to libify/reuse.

But I fear that I distract the discussion away from a more focused
attention to the actual patch series, which is actually still in flight
and in actual need for review, as I actually want to get those patches
integrated into git.git.

Ciao,
Dscho


Re: [RFC PATCH 4/5] submodule--helper: reattach-HEAD

2017-05-01 Thread Stefan Beller
On Mon, May 1, 2017 at 11:36 AM, Brandon Williams  wrote:
>   if (flags & REATTACH_HEAD_DIE_ON_ERROR)
> die(...);
>
>   return -1;
>
> It just feels weird to me to have the inverted logic, that's my opinion
> though.

Yeah, me too. But my feelings were not as important as staying
under 80 characters a line. And as the error message is longer than
the "return -1", I wanted to have it not indented yet another level.

>
>> + }
>> +
>> + if (!sub->branch) {
>> + if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
>> + return -1;
>> + die(_("no branch configured to follow for submodule '%s'"),
>> + sub->path);
>> + }
>> +
>> + /* lookup branch value in .gitmodules */
>> + if (strcmp(".", sub->branch)) {
>> + branch = sub->branch;
>> + } else {
>> + /* special care for '.': Is the superproject on a branch? */
>> + struct object_id oid;
>> + branch = resolve_refdup("HEAD", 0, oid.hash, NULL);
>> + if (!branch) {
>> + if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
>> + return -1;
>> + die(_("Not on any branch, but submodule configured to 
>> follow superprojects branch"));
>> + }
>> + }
>> +
>> + if (!strcmp("HEAD", branch))
>> + return 0;
>
> So this is the case where the superproject is in a detached-HEAD state?

and the submodule config is branch='.'

> In that case then we don't need to continue because if the superproject
> isn't on a branch, then there isn't a reason the submodule should be on
> a branch.

agreed.


Re: [RFC PATCH 4/5] submodule--helper: reattach-HEAD

2017-05-01 Thread Brandon Williams
On 05/01, Stefan Beller wrote:
> Add a new command, that reattaches a detached HEAD to its configured
> branch in a submodule.
> 
> In a later patch we will teach git-submodule as well as other submodule
> worktree manipulators (git reset/checkout --recurse-submodules) to not
> end up in a detached HEAD state in the submodules.
> 
> Signed-off-by: Stefan Beller 
>  
> +int reattach_HEAD(const char *submodule_path, int flags)
> +{
> + const char *branch;
> + struct object_id sub_head_object;
> + struct object_id sub_branch_object;
> +
> + const struct submodule *sub = submodule_from_path(null_sha1, 
> submodule_path);
> +
> + if (!sub) {
> + if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
> + return -1;
> + die(_("no submodule mapping found in .gitmodules for path 
> '%s'"),
> + submodule_path);
It may be a bit clearer to do the following:

  if (flags & REATTACH_HEAD_DIE_ON_ERROR)
die(...);

  return -1;

It just feels weird to me to have the inverted logic, that's my opinion
though.

> + }
> +
> + if (!sub->branch) {
> + if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
> + return -1;
> + die(_("no branch configured to follow for submodule '%s'"),
> + sub->path);
> + }
> +
> + /* lookup branch value in .gitmodules */
> + if (strcmp(".", sub->branch)) {
> + branch = sub->branch;
> + } else {
> + /* special care for '.': Is the superproject on a branch? */
> + struct object_id oid;
> + branch = resolve_refdup("HEAD", 0, oid.hash, NULL);
> + if (!branch) {
> + if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
> + return -1;
> + die(_("Not on any branch, but submodule configured to 
> follow superprojects branch"));
> + }
> + }
> +
> + if (!strcmp("HEAD", branch))
> + return 0;

So this is the case where the superproject is in a detached-HEAD state?
In that case then we don't need to continue because if the superproject
isn't on a branch, then there isn't a reason the submodule should be on
a branch.

> +
> + resolve_gitlink_ref(sub->path, "HEAD", sub_head_object.hash);
> + resolve_gitlink_ref(sub->path, branch, sub_branch_object.hash);
> +
> + if (!oidcmp(_head_object, _branch_object)) {
> + struct child_process cp = CHILD_PROCESS_INIT;
> + struct strbuf reason = STRBUF_INIT;
> +
> + cp.dir = sub->path;
> + prepare_submodule_repo_env(_array);
> +
> + strbuf_addf(, "reattach HEAD to %s", branch);
> + argv_array_pushl(, "git", "symbolic-ref",
> +  "-m", reason.buf, "HEAD", branch, NULL);
> + if (run_command()) {
> + if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
> + return -1;
> + die(_("could not run symbolic-ref in submodule '%s'"),
> + sub->path);
> + }
> + strbuf_release();
> + return 0;
> + } else {
> + fprintf(stderr, "not reattaching HEAD, object ids differ:\n"
> + "HEAD is %s\n branch is %s",
> + oid_to_hex(_head_object),
> + oid_to_hex(_branch_object));
> + return 1;
> + }
> +}
> +
>  static int find_first_merges(struct object_array *result, const char *path,
>   struct commit *a, struct commit *b)
>  {
> diff --git a/submodule.h b/submodule.h
> index 1277480add..f7bb565a6d 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -119,6 +119,16 @@ extern int submodule_move_head(const char *path,
>   */
>  extern void prepare_submodule_repo_env(struct argv_array *out);
>  
> +/*
> + * Attach the submodules HEAD to its configured branch if the underlying
> + * object id are the same. Returns
> + * -  0 when the branch could be reattached.
> + * - +1 when the branch could not be reattached due to object name mismatch
> + * - <0 when an error occurred (e.g. missing .gitmodules file)
> + */
> +#define REATTACH_HEAD_DIE_ON_ERROR (1<<0)
> +extern int reattach_HEAD(const char *submodule_path, int flags);
> +
>  #define ABSORB_GITDIR_RECURSE_SUBMODULES (1<<0)
>  extern void absorb_git_dir_into_superproject(const char *prefix,
>const char *path,
> -- 
> 2.13.0.rc1.1.gbc33f0f778
> 

-- 
Brandon Williams


Re: [PATCH 0/5] Some submodule bugfixes and "reattaching detached HEADs"

2017-05-01 Thread Brandon Williams
On 05/01, Stefan Beller wrote:
> The first three patches fix bugs in existing submodule code,
> sort of independent from the last 2 patches, which are RFCs.
> 
> 
> 
> In submodules as of today you always end up with a detached HEAD,
> which may be scary to people used to working on branches. (I myself
> enjoy working with a detached HEAD).
> 
> The detached HEAD in a submodule is the correct in the submodule-as-a-lib
> case, but in case you actively want to work inside submodules, you
> may want to have proper branch support, e.g. when typing:
> 
> git checkout --recuse-submodules master
> 
> you may want to have the submodules on the master branch as well.

I don't know why submodules were originally designed to be in a
detached HEAD state but I much prefer working on branches (as I'm sure
many other developers do) so the prospect of this becoming the norm is
exciting! :D
> 
> There are a couple of challenges though:
> * We'd probably want to pay attention to the branch configuration
>   (.gitmodules submodule.NAME.branch to be "." or "foo")

Yes, I agree.  If we have a particular name of a branch configured to be
tracked, then we should respect that and try to attach HEAD onto that
branch name.

> * What if the submodule does not have a master branch?
>   Jonathan proposed off list that we may just want to create the
>   branch in the submodule. This is not implemented in this series.

I think it would be fine to create the branch, just as long as it
doesn't already exist as I can't think of a negative impact of this
(aside from maybe having more branches in the submodules after a
prolonged time period?)

> * In case the branch exists in the submodule and doesn't point at the
>   the object as recorded in the superproject, we may either want to 
>   update the branch to point at the superproject record or we want to
>   reject the "reattaching HEAD". Patch 4 provides the later.

The later seems like the most correct thing to do.  It would be
unfortunate if I had done some work on top of the master branch in a
submodule (which wasn't reflected in the superproject), then done a
'checkout master' in the super project only to go back to the submodule
and find that all my work is gone! (Well not gone, but you'd have to go
into the scary reflog!)  It just makes sense to leave HEAD in a detached
state here as it prevents loss of work.

> Any other ideas on why this could be a bad idea or corner cases?
> 
> Thanks,
> Stefan
> 
> Stefan Beller (5):
>   submodule_move_head: fix leak of struct child_process
>   submodule_move_head: prepare env for child correctly
>   submodule: avoid auto-discovery in new working tree manipulator code
>   submodule--helper: reattach-HEAD
>   submodule_move_head: reattach submodule HEAD to branch if possible.
> 
>  builtin/submodule--helper.c | 12 ++
>  submodule.c | 93 
> -
>  submodule.h | 10 +
>  3 files changed, 106 insertions(+), 9 deletions(-)
> 
> -- 
> 2.13.0.rc1.1.gbc33f0f778
> 

-- 
Brandon Williams


Re: [PATCH 1/5] submodule_move_head: fix leak of struct child_process

2017-05-01 Thread Brandon Williams
On 05/01, Stefan Beller wrote:
> While fixing the leak of `cp`, reuse it instead of having to declare
> another struct child_process.
> 
> Signed-off-by: Stefan Beller 

This shouldn't be needed as 'finish_command' does the cleanup for you by
calling 'child_prcoess_clear()'.


> ---
>  submodule.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/submodule.c b/submodule.c
> index d3299e29c0..cd098cf12b 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1466,17 +1466,19 @@ int submodule_move_head(const char *path,
>   goto out;
>   }
>  
> + child_process_clear();
> +
>   if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
>   if (new) {
> - struct child_process cp1 = CHILD_PROCESS_INIT;
> + child_process_init();
>   /* also set the HEAD accordingly */
> - cp1.git_cmd = 1;
> - cp1.no_stdin = 1;
> - cp1.dir = path;
> + cp.git_cmd = 1;
> + cp.no_stdin = 1;
> + cp.dir = path;
>  
> - argv_array_pushl(, "update-ref", "HEAD", new, 
> NULL);
> + argv_array_pushl(, "update-ref", "HEAD", new, 
> NULL);
>  
> - if (run_command()) {
> + if (run_command()) {
>   ret = -1;
>   goto out;
>   }
> @@ -1492,6 +1494,7 @@ int submodule_move_head(const char *path,
>   }
>   }
>  out:
> + child_process_clear();
>   return ret;
>  }
>  
> -- 
> 2.13.0.rc1.1.gbc33f0f778
> 

-- 
Brandon Williams


[PATCH 2/5] submodule_move_head: prepare env for child correctly

2017-05-01 Thread Stefan Beller
We forgot to prepare the submodule env, which is only a problem for
nested submodules. See 2e5d6503bd (ls-files: fix recurse-submodules
with nested submodules, 2017-04-13) for further explanation.

Signed-off-by: Stefan Beller 
---
 submodule.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/submodule.c b/submodule.c
index cd098cf12b..c7a7a33916 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1476,6 +1476,7 @@ int submodule_move_head(const char *path,
cp.no_stdin = 1;
cp.dir = path;
 
+   prepare_submodule_repo_env(_array);
argv_array_pushl(, "update-ref", "HEAD", new, 
NULL);
 
if (run_command()) {
-- 
2.13.0.rc1.1.gbc33f0f778



[RFC PATCH 4/5] submodule--helper: reattach-HEAD

2017-05-01 Thread Stefan Beller
Add a new command, that reattaches a detached HEAD to its configured
branch in a submodule.

In a later patch we will teach git-submodule as well as other submodule
worktree manipulators (git reset/checkout --recurse-submodules) to not
end up in a detached HEAD state in the submodules.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 12 
 submodule.c | 69 +
 submodule.h | 10 +++
 3 files changed, 91 insertions(+)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 36e4231821..645ceac999 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1196,6 +1196,17 @@ static int is_active(int argc, const char **argv, const 
char *prefix)
return !is_submodule_initialized(argv[1]);
 }
 
+static int cmd_reattach_HEAD(int argc, const char **argv, const char *prefix)
+{
+   if (argc != 2)
+   die("submodule--helper reattach-HEAD takes exactly 1 argument");
+
+   gitmodules_config();
+   git_config(submodule_config, NULL);
+
+   return reattach_HEAD(argv[1], REATTACH_HEAD_DIE_ON_ERROR);
+}
+
 #define SUPPORT_SUPER_PREFIX (1<<0)
 
 struct cmd_struct {
@@ -1217,6 +1228,7 @@ static struct cmd_struct commands[] = {
{"push-check", push_check, 0},
{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
{"is-active", is_active, 0},
+   {"reattach-HEAD", cmd_reattach_HEAD, 0}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/submodule.c b/submodule.c
index df03691199..4e74e38829 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1499,6 +1499,75 @@ int submodule_move_head(const char *path,
return ret;
 }
 
+int reattach_HEAD(const char *submodule_path, int flags)
+{
+   const char *branch;
+   struct object_id sub_head_object;
+   struct object_id sub_branch_object;
+
+   const struct submodule *sub = submodule_from_path(null_sha1, 
submodule_path);
+
+   if (!sub) {
+   if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
+   return -1;
+   die(_("no submodule mapping found in .gitmodules for path 
'%s'"),
+   submodule_path);
+   }
+
+   if (!sub->branch) {
+   if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
+   return -1;
+   die(_("no branch configured to follow for submodule '%s'"),
+   sub->path);
+   }
+
+   /* lookup branch value in .gitmodules */
+   if (strcmp(".", sub->branch)) {
+   branch = sub->branch;
+   } else {
+   /* special care for '.': Is the superproject on a branch? */
+   struct object_id oid;
+   branch = resolve_refdup("HEAD", 0, oid.hash, NULL);
+   if (!branch) {
+   if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
+   return -1;
+   die(_("Not on any branch, but submodule configured to 
follow superprojects branch"));
+   }
+   }
+
+   if (!strcmp("HEAD", branch))
+   return 0;
+
+   resolve_gitlink_ref(sub->path, "HEAD", sub_head_object.hash);
+   resolve_gitlink_ref(sub->path, branch, sub_branch_object.hash);
+
+   if (!oidcmp(_head_object, _branch_object)) {
+   struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf reason = STRBUF_INIT;
+
+   cp.dir = sub->path;
+   prepare_submodule_repo_env(_array);
+
+   strbuf_addf(, "reattach HEAD to %s", branch);
+   argv_array_pushl(, "git", "symbolic-ref",
+"-m", reason.buf, "HEAD", branch, NULL);
+   if (run_command()) {
+   if (!(flags & REATTACH_HEAD_DIE_ON_ERROR))
+   return -1;
+   die(_("could not run symbolic-ref in submodule '%s'"),
+   sub->path);
+   }
+   strbuf_release();
+   return 0;
+   } else {
+   fprintf(stderr, "not reattaching HEAD, object ids differ:\n"
+   "HEAD is %s\n branch is %s",
+   oid_to_hex(_head_object),
+   oid_to_hex(_branch_object));
+   return 1;
+   }
+}
+
 static int find_first_merges(struct object_array *result, const char *path,
struct commit *a, struct commit *b)
 {
diff --git a/submodule.h b/submodule.h
index 1277480add..f7bb565a6d 100644
--- a/submodule.h
+++ b/submodule.h
@@ -119,6 +119,16 @@ extern int submodule_move_head(const char *path,
  */
 extern void prepare_submodule_repo_env(struct argv_array *out);
 
+/*
+ * Attach the submodules HEAD to its configured branch if the underlying
+ * object id are the same. Returns
+ * -  

[PATCH 1/5] submodule_move_head: fix leak of struct child_process

2017-05-01 Thread Stefan Beller
While fixing the leak of `cp`, reuse it instead of having to declare
another struct child_process.

Signed-off-by: Stefan Beller 
---
 submodule.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/submodule.c b/submodule.c
index d3299e29c0..cd098cf12b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1466,17 +1466,19 @@ int submodule_move_head(const char *path,
goto out;
}
 
+   child_process_clear();
+
if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
if (new) {
-   struct child_process cp1 = CHILD_PROCESS_INIT;
+   child_process_init();
/* also set the HEAD accordingly */
-   cp1.git_cmd = 1;
-   cp1.no_stdin = 1;
-   cp1.dir = path;
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
 
-   argv_array_pushl(, "update-ref", "HEAD", new, 
NULL);
+   argv_array_pushl(, "update-ref", "HEAD", new, 
NULL);
 
-   if (run_command()) {
+   if (run_command()) {
ret = -1;
goto out;
}
@@ -1492,6 +1494,7 @@ int submodule_move_head(const char *path,
}
}
 out:
+   child_process_clear();
return ret;
 }
 
-- 
2.13.0.rc1.1.gbc33f0f778



[RFC PATCH 5/5] submodule_move_head: reattach submodule HEAD to branch if possible.

2017-05-01 Thread Stefan Beller
Side note: We also want to call this from git submodule update

Signed-off-by: Stefan Beller 
---
 submodule.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/submodule.c b/submodule.c
index 4e74e38829..66651ffa34 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1483,6 +1483,8 @@ int submodule_move_head(const char *path,
ret = -1;
goto out;
}
+
+   reattach_HEAD(path, 0);
} else {
struct strbuf sb = STRBUF_INIT;
 
-- 
2.13.0.rc1.1.gbc33f0f778



[PATCH 3/5] submodule: avoid auto-discovery in new working tree manipulator code

2017-05-01 Thread Stefan Beller
All commands that are run in a submodule, are run in a correct setup,
there is no need to prepare the environment without setting the GIT_DIR
variable. By setting the GIT_DIR variable we fix issues as discussed in
10f5c52656 (submodule: avoid auto-discovery in
prepare_submodule_repo_env(), 2016-09-01)

Signed-off-by: Stefan Beller 
---
 submodule.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index c7a7a33916..df03691199 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1363,7 +1363,7 @@ static int submodule_has_dirty_index(const struct 
submodule *sub)
 {
struct child_process cp = CHILD_PROCESS_INIT;
 
-   prepare_submodule_repo_env_no_git_dir(_array);
+   prepare_submodule_repo_env(_array);
 
cp.git_cmd = 1;
argv_array_pushl(, "diff-index", "--quiet",
@@ -1380,7 +1380,7 @@ static int submodule_has_dirty_index(const struct 
submodule *sub)
 static void submodule_reset_index(const char *path)
 {
struct child_process cp = CHILD_PROCESS_INIT;
-   prepare_submodule_repo_env_no_git_dir(_array);
+   prepare_submodule_repo_env(_array);
 
cp.git_cmd = 1;
cp.no_stdin = 1;
@@ -1438,7 +1438,7 @@ int submodule_move_head(const char *path,
}
}
 
-   prepare_submodule_repo_env_no_git_dir(_array);
+   prepare_submodule_repo_env(_array);
 
cp.git_cmd = 1;
cp.no_stdin = 1;
-- 
2.13.0.rc1.1.gbc33f0f778



[PATCH 0/5] Some submodule bugfixes and "reattaching detached HEADs"

2017-05-01 Thread Stefan Beller
The first three patches fix bugs in existing submodule code,
sort of independent from the last 2 patches, which are RFCs.



In submodules as of today you always end up with a detached HEAD,
which may be scary to people used to working on branches. (I myself
enjoy working with a detached HEAD).

The detached HEAD in a submodule is the correct in the submodule-as-a-lib
case, but in case you actively want to work inside submodules, you
may want to have proper branch support, e.g. when typing:

git checkout --recuse-submodules master

you may want to have the submodules on the master branch as well.

There are a couple of challenges though:
* We'd probably want to pay attention to the branch configuration
  (.gitmodules submodule.NAME.branch to be "." or "foo")
* What if the submodule does not have a master branch?
  Jonathan proposed off list that we may just want to create the
  branch in the submodule. This is not implemented in this series.
* In case the branch exists in the submodule and doesn't point at the
  the object as recorded in the superproject, we may either want to 
  update the branch to point at the superproject record or we want to
  reject the "reattaching HEAD". Patch 4 provides the later.

Any other ideas on why this could be a bad idea or corner cases?

Thanks,
Stefan

Stefan Beller (5):
  submodule_move_head: fix leak of struct child_process
  submodule_move_head: prepare env for child correctly
  submodule: avoid auto-discovery in new working tree manipulator code
  submodule--helper: reattach-HEAD
  submodule_move_head: reattach submodule HEAD to branch if possible.

 builtin/submodule--helper.c | 12 ++
 submodule.c | 93 -
 submodule.h | 10 +
 3 files changed, 106 insertions(+), 9 deletions(-)

-- 
2.13.0.rc1.1.gbc33f0f778



Re: Terrible bad performance for it blame --date=iso -C -C master --

2017-05-01 Thread Samuel Lijin
On Fri, Mar 31, 2017 at 10:52 AM, Junio C Hamano  wrote:
> "Ulrich Windl"  writes:
>
>> I was running "vc-annotate" in Emacs for a file from a large
>> repository (>4 files, a big percentage being binary, about 10
>> commits). For the first file the result was presented rather soon, but
>> for a second file the command did not finish even after about 10
>> minutes!
>>
>> The file in question is a rather short text file (124 kB), and
>> according to git log it has one commit.
>>
>> While being bored, I did an strace of the command to find out that a
>> huge number of files is inspected.
>
> With -C -C the user (vc-annotate?) is asking to inspect huge number
> of files, to find if the contents of the file (except for the part
> that came from its own previous version) came from other existing
> files.  So this is very much expected.
>
> It might not be a bad idea to teach "blame" not to pay attention to
> any path that is marked as "-diff" (e.g. binary files) when trying
> to see if remaining contents appeared by borrowing from them.  We do
> not have that heuristics (yet).

Could you elaborate on this? Do you mean to tell diffcore-rename to
ignore diff_filespec objects if they're binary?


Re: What's cooking in git.git (May 2017, #01; Mon, 1)

2017-05-01 Thread Ævar Arnfjörð Bjarmason
On Mon, May 1, 2017 at 6:21 PM, Brandon Williams  wrote:
> On 05/01, Ęvar Arnfjörš Bjarmason wrote:
>> On Mon, May 1, 2017 at 7:35 AM, Junio C Hamano  wrote:
>> > * ab/clone-no-tags (2017-05-01) 3 commits
>> >   (merged to 'next' on 2017-04-30 at 601649896a)
>> >  + tests: rename a test having to do with shallow submodules
>> >  + clone: add a --no-tags option to clone without tags
>> >  + tests: change "cd ... && git fetch" to "cd &&\n\tgit fetch"
>> >
>> >  "git clone" learned the "--no-tags" option not to fetch all tags
>> >  initially, and also set up the tagopt not to follow any tags in
>> >  subsequent fetches.
>> >
>> >  Will cook in 'next'.
>>
>> Thanks for trimming off the top 2 patches. I've dropped those myself,
>> if someone (Brandon || Stefan) more interested in working on
>> submodules wants to pick them up that would be neat, but I don't need
>> it myself & doing it differently than the existing submodule options
>> would take too much of my time.
>
> Yeah we can add it to our backlog, though I'm not sure how quickly we'll
> be able to get to it.  If you end up needing this sooner just let me
> know.

I don't need it at all, I just thought it was prudent to add the
submodule part of passing clone options / config when adding a new
option since there were some existing options that did that.

Since there's outstanding debate about the interface for that sort of
thing I'll just leave that to people who care more about submodules if
they'd like to pursue it.


Re: Bug: Git rename does not work if folder naming was changed from lower to upper case on OSX

2017-05-01 Thread Kevin Daudt
On Sun, Apr 30, 2017 at 06:47:29PM -0700, Junio C Hamano wrote:
> Kevin Daudt  writes:
> 
> > Note that git does not store that files are renamed. So a remove + add
> > is the same as a rename in git. Only git status shows it when you for
> > example use git mv directly, but this information is lost on commit.
> 
> Are you sure about the last sentence?  I do not recall teaching
> anything special to "git status".  If "git status" says A was
> created by renaming B, and if you committed that state was-is, "git
> show" on the result commit _should_ say the same thing, I would
> think.

Yes, of course, you are right. Git status also detects a file being
renamed instead of relying on it being recorded.
> 
> > Instead of storing it get relies on detecting what (parts of ) files got
> > renamed, copied etc.
> 
> That statement I belieave is true.

Why are you reserved here?

Kevin.


Re: [PATCH 5/6] submodule: improve submodule_has_commits

2017-05-01 Thread Brandon Williams
On 05/01, Stefan Beller wrote:
> On Sun, Apr 30, 2017 at 4:14 PM, Brandon Williams  wrote:
> 
> > This hunk of logic is essentially a copy and paste from elsewhere in the
> > file.  Essentially both code paths were essentially doing the same thing
> > (checking if a submodule has a commit) but one of the code paths included
> > this logic to ensure that it was reachable from a ref in the submodule.
> > In order to merge the two code paths, I included that logic here.
> 
> yes, I get that. The question is whether the other code path omitted the check
> intentionally or just missed it (Is it a bug or feature to have this
> check in that code
> path now)?

I'd take a look at Junio's response.  This check more than likely needs
to be included.

-- 
Brandon Williams


Re: [PATCH 5/6] submodule: improve submodule_has_commits

2017-05-01 Thread Stefan Beller
On Sun, Apr 30, 2017 at 4:14 PM, Brandon Williams  wrote:

> This hunk of logic is essentially a copy and paste from elsewhere in the
> file.  Essentially both code paths were essentially doing the same thing
> (checking if a submodule has a commit) but one of the code paths included
> this logic to ensure that it was reachable from a ref in the submodule.
> In order to merge the two code paths, I included that logic here.

yes, I get that. The question is whether the other code path omitted the check
intentionally or just missed it (Is it a bug or feature to have this
check in that code
path now)?


Re: [PATCH 6/6] submodule: refactor logic to determine changed submodules

2017-05-01 Thread Brandon Williams
On 04/28, Stefan Beller wrote:
> + Heiko, who touched the pushing code end of last year.
> 
> On Fri, Apr 28, 2017 at 4:54 PM, Brandon Williams  wrote:
> > There are currently two instances (fetch and push) where we want to
> > determine if submodules have changed given some revision specification.
> > These two instances don't use the same logic to generate a list of
> > changed submodules and as a result there is a fair amount of code
> > duplication.
> >
> > This patch refactors these two code paths such that they both use the
> > same logic to generate a list of changed submodules.  This also makes it
> > easier for future callers to be able to reuse this logic as they only
> > need to create an argv_array with the revision specification to be using
> > during the revision walk.
> 
> Thanks for doing some refactoring. :)
> 
> > -static struct oid_array *submodule_commits(struct string_list *submodules,
> > -   const char *path)
> > ...
> 
> > -static void free_submodules_oids(struct string_list *submodules)
> > -{
> > ...
> 
> These are just moved north, no change in code.
> If you want to be extra nice to reviewers this could have been an extra patch,
> as it makes reviewing easier if you just have to look at the lines of code 
> with
> one goal ("shuffling code around, no change intended" vs "make a change
> to improve things with no bad side effects")

Yeah I suppose, I was having a difficult time breaking this largest
patch into multiple.

> 
> > +
> > +static void collect_changed_submodules_cb(struct diff_queue_struct *q,
> > + struct diff_options *options,
> > + void *data)
> > +{
> 
> This one combines both collect_submodules_from_diff and
> submodule_collect_changed_cb ?

Yep.

> 
> > @@ -921,61 +948,6 @@ int push_unpushed_submodules(struct oid_array *commits,
> > return ret;
> >  }
> >
> > -static int is_submodule_commit_present(const char *path, unsigned char 
> > sha1[20])
> > -{
> > -   int is_present = 0;
> > -   if (!add_submodule_odb(path) && lookup_commit_reference(sha1)) {
> > -   /* Even if the submodule is checked out and the commit is
> > -* present, make sure it is reachable from a ref. */
> > -   struct child_process cp = CHILD_PROCESS_INIT;
> > -   const char *argv[] = {"rev-list", "-n", "1", NULL, "--not", 
> > "--all", NULL};
> > -   struct strbuf buf = STRBUF_INIT;
> > -
> > -   argv[3] = sha1_to_hex(sha1);
> > -   cp.argv = argv;
> > -   prepare_submodule_repo_env(_array);
> > -   cp.git_cmd = 1;
> > -   cp.no_stdin = 1;
> > -   cp.dir = path;
> > -   if (!capture_command(, , 1024) && !buf.len)
> > -   is_present = 1;
> 
> Oh, I see where the nit in patch 5/6 is coming from. Another note
> on that: The hint is way off. The hint should be on the order of
> GIT_SHA1_HEXSZ

Yeah agreed.  I make this change in the earlier patch.

> 
> >  int find_unpushed_submodules(struct oid_array *commits,
> > const char *remotes_name, struct string_list *needs_pushing)
> > ...
> 
> >  static void calculate_changed_submodule_paths(void)
> > ...
> 
> These are both nicely clean now.
> 
> Thanks,
> Stefan

-- 
Brandon Williams


Re: [PATCH 5/6] submodule: improve submodule_has_commits

2017-05-01 Thread Brandon Williams
On 04/30, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > oid_array_for_each_unique(commits, check_has_commit, _commit);
> > +
> > +   if (has_commit) {
> > +   /*
> > +* Even if the submodule is checked out and the commit is
> > +* present, make sure it is reachable from a ref.
> > +*/
> > +   struct child_process cp = CHILD_PROCESS_INIT;
> > +   struct strbuf out = STRBUF_INIT;
> > +
> > +   argv_array_pushl(, "rev-list", "-n", "1", NULL);
> > +   oid_array_for_each_unique(commits, append_oid_to_argv, 
> > );
> > +   argv_array_pushl(, "--not", "--all", NULL);
> > +
> > +   prepare_submodule_repo_env(_array);
> > +   cp.git_cmd = 1;
> > +   cp.no_stdin = 1;
> > +   cp.dir = path;
> > +
> > +   if (capture_command(, , 1024) || out.len)
> > +   has_commit = 0;
> > +
> > +   strbuf_release();
> > +   }
> > +
> > return has_commit;
> >  }
> 
> The "check-has-commit" we see in the pre-context is "we contaminated
> our in-core object store by tentatively borrowing from submodule's
> object store---now do we see these commits in our in-core view?"
> Which is a wrong thing to do from two separate point of view.  Even
> though the commit in question may be visible in our contaminated
> view, there is no guarantee that the commit exists in the object
> store of the correct submodule.  And of course the commit may exist
> but may not be anchored by any ref.
> 
> This patch fixes the latter, and if we remove that check-has-commit
> call before it, we can fix the former at the same time.

I noticed this when cleaning up this code but was unsure if I should
drop the "check-has-commit" bit.

> 
> There is value in leaving the check-has-commit code if we anticipate
> that we would very often have to say "no, the submodule does not
> have these commits"---a cheap but wrong check it does can be used as
> an optimization.  If we do not have the commit object anywhere,
> there is no chance we have it in the object store of the correct
> submodule and have it reachable from a ref, so we can fail without
> spawning rev-list which is expensive.

Mostly because it gave the code a way to fail quickly, of course I'm
making the assumption that polluting the object store than then checking
it is quicker than launching a child process (though I guess most things
are cheaper than launching a process ;)

-- 
Brandon Williams


Re: [PATCH 4/6] submodule: change string_list changed_submodule_paths

2017-05-01 Thread Brandon Williams
On 04/30, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Eliminate a call to 'xstrdup()' by changing the string_list
> > 'changed_submodule_paths' to duplicated strings added to it.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  submodule.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/submodule.c b/submodule.c
> > index 7baa28ae0..3bcf44521 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -20,7 +20,7 @@
> >  static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
> >  static int config_update_recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
> >  static int parallel_jobs = 1;
> > -static struct string_list changed_submodule_paths = STRING_LIST_INIT_NODUP;
> > +static struct string_list changed_submodule_paths = STRING_LIST_INIT_DUP;
> >  static int initialized_fetch_ref_tips;
> >  static struct oid_array ref_tips_before_fetch;
> >  static struct oid_array ref_tips_after_fetch;
> > @@ -939,7 +939,7 @@ static void submodule_collect_changed_cb(struct 
> > diff_queue_struct *q,
> > struct string_list_item *path;
> > path = 
> > unsorted_string_list_lookup(_submodule_paths, p->two->path);
> > if (!path && !is_submodule_commit_present(p->two->path, 
> > p->two->oid.hash))
> > -   string_list_append(_submodule_paths, 
> > xstrdup(p->two->path));
> > +   string_list_append(_submodule_paths, 
> > p->two->path);
> 
> I notice that "path" is not used at all, and other users of this
> string list do not even bother using a variable, i.e.
> 
>   if (!unsorted_string_list_lookup(_submodule_paths, ...))
> 
> In fact, it might be even better to use a hashmap for this instead?
> 
> The call to string_list_clear() onthis list tells it to free the
> util field, and I went to see what we are storing in the util field,
> but it seems that it is freeing NULLs, which is somewhat misleading
> and time-wasting on the code readers.  Using hashmap may also clear
> this up.
> 
> But all of the above are not within the scope of this topic ;-)

All good good points which hopefully are resolved in a later patch.  This
patch exists mainly to factor out the change to make the string_list
duplicate the strings added to it as opposed to have that change seem
random and out of place in a later patch.

Most of this logic itself is removed entirely later once it is unified
with the other code path which checked for changed submodules.  This may
be out of context on this particular patch, but eventually an oid_array
is stored in the 'util' field of the string list items.  This oid_array
holds all of the changes to the submodule pointers.  This allows us to
then batch check for the existence of all the submodule commits instead
of checking each change individually (which is what this code path
currently does).

-- 
Brandon Williams


  1   2   >