Re: how does "clone --filter=sparse:path" work?

2018-11-08 Thread Jeff Hostetler




On 11/8/2018 12:07 AM, Jeff King wrote:

[cc-ing people who worked on or seem interested in partial-clone
filters]

I've been exploring the partial-clone options a bit, and took a look at
the "sparse:path" option. I had some confusion initially, because I
expected it to work something like this:

   git clone --filter=sparse:path=Documentation .

But it really doesn't take an in-repo path. You have to specify a path
to a file that contains .gitignore patterns. Except they're actually
_inverted_ patterns (i.e., what to include). Which confused me again,
but I guess makes sense if these are meant to be adapted from
sparse-checkout files.
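
(For example, a hypothetical profile selecting two directories would
contain include patterns in sparse-checkout syntax, e.g.:

   /Documentation/
   /t/

rather than ignore-style exclusions.)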

So my first question is: how is this meant to be used?

I guess the idea is that somebody (the repo admin?) makes a set of
pre-made profiles, with each profile mentioning some subset of paths.
And then when you clone, you say, "I want the foo profile". How is that
profile stored and accessed?


Yes, the basic idea was if you had a large tree and various groups
of users that tended to only need their group's portion of the tree,
then you could (or your repo admin could) create several profiles.
And users could use a profile to request just the subset of trees and
blobs they need.

During a clone/fetch users could ask for the profile by OID or by
<ref>:<path>.  It would be a local/custom detail where the repo admin
collects the various profiles.  (Or at least, that was the thought.)
Likewise, a user could create their own profile(s) by committing a
sparse-checkout file spec to a personal branch.  Again, a local
convention.




If it's a blob in the repository, I think you can use something like
"--filter=sparse:oid=profiles:foo". And then after cloning, you'd
do "git cat-file blob profiles:foo >.git/sparse-checkout" or similar.
That seems like something that could be wrapped up in a single clone
option, but I can't find one; I won't be surprised if it simply hasn't
happened yet.


Right, TBD.
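
For concreteness, the end-to-end flow sketched above might look like
this (a sketch only, assuming the repo admin has committed a profile
blob at the hypothetical ref:path "profiles:frontend"):

   # fetch only the subset of trees/blobs selected by the profile
   git clone --filter=sparse:oid=profiles:frontend $url repo
   cd repo

   # reuse the same profile for the sparse checkout itself
   git cat-file blob profiles:frontend >.git/info/sparse-checkout
   git config core.sparseCheckout true
   git read-tree -mu HEAD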




But if it's a path, then what? I'd imagine that you'd "somehow" get a
copy of the sparse profile you want out-of-band. And then you'd say "I
want to clone with the profile in this file", and then copy it into the
sparse-checkout file to do the checkout.

But the sparse-checkout file in the filter is not expanded locally, with
patterns sent over the wire. The _filename_ is sent over the wire, and
then upload-pack expands it. So you can't specify an arbitrary set of
patterns; you have to know about the profile names (how?) on the server.
That's not very flexible, though I imagine it may make certain things
easier on the server (e.g., if you pack in such a way to efficiently
serve only particular profiles).

But I'm also concerned it's dangerous. We're reading gitignore patterns
from an arbitrary name on the server's filesystem, with that name coming
from an untrusted client. So I can do:

   git clone --filter=sparse:path=/etc/passwd $remote

and the server will read /etc/passwd. There's probably a lot of mischief
you can get up to with that. Sadly reading /proc/self/mem doesn't work,
because the gitignore code fstat()s the file to find out how much to
read, and the st_size there is 0. But there are probably others
(/proc/kcore is a fun one, but nobody is running their git server as
root, right?).

Below is a proof of concept script that uses this as an oracle to
explore the filesystem, as well as individual lines of files.

Should we simply be disallowing sparse:path filters over upload-pack?


The option to allow an absolute path over the wire probably needs more
thought as you suggest.

Having it in the traverse code was useful for local testing in the
client.

But mainly I was thinking of a use case on the client of the form:

git rev-list
   --objects
   --filter=sparse:path=.git/info/sparse-checkout
   --missing=print
   <commit>

and get a list of the blobs that you don't have and would need before
you could check out <commit> using the current sparse-checkout
definition.  You could then have a pre-checkout hook that would bulk
fetch them before starting the actual checkout, since that would be
more efficient than demand-loading blobs individually during the
checkout.  There's more work to do in this area, but that was the idea.
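
A rough sketch of that hook idea (hypothetical; the bulk-fetch step
has no porcelain yet, as noted):

   # list the missing blobs that checking out <commit> would need;
   # --missing=print prefixes each missing OID with '?'
   git rev-list --objects \
      --filter=sparse:path=.git/info/sparse-checkout \
      --missing=print <commit> |
   sed -n 's/^?//p' >missing-oids

   # hand missing-oids to whatever bulk-fetch mechanism is available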

But back to your point, yes, I think we should restrict this over the
wire.


Thanks,
Jeff






-Peff

-- >8 --
# Set this to host:repo.git to see a real cross-machine connection (which makes
# it more obvious which side is reading which files).  For a simulated local
# one, we'll use --no-local to make sure we really run upload-pack.
SERVER=server.git

# Do these steps manually on the remote side if you're trying it cross-server.
case "$SERVER" in
*:*)
;;
*)
# Imagine a server with a repository users can push to, with filters enabled.
git init -q --bare $SERVER
git -C $SERVER config uploadpack.allowfilter true

# Imagine the server has a file outside of the repo we're interested in.
echo foo 

Re: [PATCH 0/1] commit-reach: fix first-parent heuristic

2018-10-19 Thread Jeff Hostetler




On 10/18/2018 10:31 PM, Junio C Hamano wrote:

"Derrick Stolee via GitGitGadget"  writes:


I originally reported this fix [1] after playing around with the trace2
series for measuring performance. Since trace2 isn't merging quickly, I
pulled the performance fix patch out and am sending it on its own. The only
difference here is that we don't have the tracing to verify the performance
fix in the test script.


Thanks for sending this separately.  What's the current status of
the tracing patch series, by the way?



I'm still working on it. I hope to reroll and submit a new version
next week.  We are currently dogfooding a version of it with our gvfs
users.

Jeff


[PATCH v3 2/2] mingw: fix mingw_open_append to work with named pipes

2018-09-11 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 compat/mingw.c                | 36 +++++++++++++++++++++++++++++++++---
 t/t0051-windows-named-pipe.sh |  2 +-
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 858ca14a57..18caf21969 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -341,6 +341,19 @@ int mingw_mkdir(const char *path, int mode)
return ret;
 }
 
+/*
+ * Calling CreateFile() using FILE_APPEND_DATA and without FILE_WRITE_DATA
+ * is documented in [1] as opening a writable file handle in append mode.
+ * (It is believed that) this is atomic since it is maintained by the
+ * kernel unlike the O_APPEND flag which is racily maintained by the CRT.
+ *
+ * [1] 
https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants
+ *
+ * This trick does not appear to work for named pipes.  Instead it creates
+ * a named pipe client handle that cannot be written to.  Callers should
+ * just use the regular _wopen() for them.  (And since client handle gets
+ * bound to a unique server handle, it isn't really an issue.)
+ */
 static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 {
HANDLE handle;
@@ -360,10 +373,12 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
if (handle == INVALID_HANDLE_VALUE)
return errno = err_win_to_posix(GetLastError()), -1;
+
/*
 * No O_APPEND here, because the CRT uses it only to reset the
-* file pointer to EOF on write(); but that is not necessary
-* for a file created with FILE_APPEND_DATA.
+* file pointer to EOF before each write(); but that is not
+* necessary (and may lead to races) for a file created with
+* FILE_APPEND_DATA.
 */
fd = _open_osfhandle((intptr_t)handle, O_BINARY);
if (fd < 0)
@@ -371,6 +386,21 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
return fd;
 }
 
+/*
+ * Does the pathname map to the local named pipe filesystem?
+ * That is, does it have a "//./pipe/" prefix?
+ */
+static int is_local_named_pipe_path(const char *filename)
+{
+   return (is_dir_sep(filename[0]) &&
+   is_dir_sep(filename[1]) &&
+   filename[2] == '.'  &&
+   is_dir_sep(filename[3]) &&
+   !strncasecmp(filename+4, "pipe", 4) &&
+   is_dir_sep(filename[8]) &&
+   filename[9]);
+}
+
 int mingw_open (const char *filename, int oflags, ...)
 {
typedef int (*open_fn_t)(wchar_t const *wfilename, int oflags, ...);
@@ -387,7 +417,7 @@ int mingw_open (const char *filename, int oflags, ...)
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
 
-   if (oflags & O_APPEND)
+   if ((oflags & O_APPEND) && !is_local_named_pipe_path(filename))
open_fn = mingw_open_append;
else
open_fn = _wopen;
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
index e3c36341a0..10ac92d225 100755
--- a/t/t0051-windows-named-pipe.sh
+++ b/t/t0051-windows-named-pipe.sh
@@ -4,7 +4,7 @@ test_description='Windows named pipes'
 
 . ./test-lib.sh
 
-test_expect_failure MINGW 'o_append write to named pipe' '
+test_expect_success MINGW 'o_append write to named pipe' '
GIT_TRACE="$(pwd)/expect" git status >/dev/null 2>&1 &&
{ test-tool windows-named-pipe t0051 >actual 2>&1 & } &&
pid=$! &&
-- 
gitgitgadget


[PATCH v3 0/2] Fixup for js/mingw-o-append

2018-09-11 Thread Jeff Hostetler via GitGitGadget
The recent mingw O_APPEND change breaks writing to named pipes on
Windows. The first commit adds a new test to confirm the breakage and the
second commit fixes the problem. These could be squashed together or we can
just keep the fix and omit the test if that would be better.

d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND

The new mingw_open_append() routine successfully opens the client side of
the named pipe, but the first write() to it fails with EBADF. Adding the
FILE_WRITE_DATA corrects the problem.

 Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>

Cc: j6t@kdbg.org
Cc: johannes.schinde...@gmx.de
Cc: gitster@pobox.com
Cc: p...@peff.net

Jeff Hostetler (2):
  t0051: test GIT_TRACE to a windows named pipe
  mingw: fix mingw_open_append to work with named pipes

 Makefile   |  1 +
 compat/mingw.c | 36 +--
 t/helper/test-tool.c   |  3 ++
 t/helper/test-tool.h   |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++
 t/t0051-windows-named-pipe.sh  | 17 +++
 6 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh


base-commit: d641097589160eb795127d8dbcb14c877c217b60
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-35/jeffhostetler/fixup-mingw-o-append-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/35

Range-diff vs v2:

 1:  ecb30eb47c = 1:  ecb30eb47c t0051: test GIT_TRACE to a windows named pipe
 2:  f0361dd306 ! 2:  5592300ca5 mingw: fix mingw_open_append to work with named pipes
 @@ -46,22 +46,20 @@
return fd;
   }
   
 -+#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))
  +/*
  + * Does the pathname map to the local named pipe filesystem?
  + * That is, does it have a "//./pipe/" prefix?
  + */
 -+static int mingw_is_local_named_pipe_path(const char *filename)
 ++static int is_local_named_pipe_path(const char *filename)
  +{
 -+ return (IS_SBS(filename[0]) &&
 -+ IS_SBS(filename[1]) &&
 ++ return (is_dir_sep(filename[0]) &&
 ++ is_dir_sep(filename[1]) &&
  + filename[2] == '.'  &&
 -+ IS_SBS(filename[3]) &&
 ++ is_dir_sep(filename[3]) &&
  + !strncasecmp(filename+4, "pipe", 4) &&
 -+ IS_SBS(filename[8]) &&
 ++ is_dir_sep(filename[8]) &&
  + filename[9]);
  +}
 -+#undef IS_SBS
  +
   int mingw_open (const char *filename, int oflags, ...)
   {
 @@ -71,7 +69,7 @@
filename = "nul";
   
  - if (oflags & O_APPEND)
 -+ if ((oflags & O_APPEND) && !mingw_is_local_named_pipe_path(filename))
 ++ if ((oflags & O_APPEND) && !is_local_named_pipe_path(filename))
open_fn = mingw_open_append;
else
open_fn = _wopen;

-- 
gitgitgadget


[PATCH v3 1/2] t0051: test GIT_TRACE to a windows named pipe

2018-09-11 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Create a test-tool helper to create the server side of
a windows named pipe, wait for a client connection, and
copy data written to the pipe to stdout.

Create t0051 test to route GIT_TRACE output of a command
to a named pipe using the above test-tool helper.

Signed-off-by: Jeff Hostetler 
---
 Makefile   |  1 +
 t/helper/test-tool.c   |  3 ++
 t/helper/test-tool.h   |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++
 t/t0051-windows-named-pipe.sh  | 17 +++
 5 files changed, 96 insertions(+)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh

diff --git a/Makefile b/Makefile
index e4b503d259..b1b934d295 100644
--- a/Makefile
+++ b/Makefile
@@ -731,6 +731,7 @@ TEST_BUILTINS_OBJS += test-submodule-config.o
 TEST_BUILTINS_OBJS += test-subprocess.o
 TEST_BUILTINS_OBJS += test-urlmatch-normalization.o
 TEST_BUILTINS_OBJS += test-wildmatch.o
+TEST_BUILTINS_OBJS += test-windows-named-pipe.o
 TEST_BUILTINS_OBJS += test-write-cache.o
 
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 805a45de9c..7a9764cd5c 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -41,6 +41,9 @@ static struct test_cmd cmds[] = {
{ "subprocess", cmd__subprocess },
{ "urlmatch-normalization", cmd__urlmatch_normalization },
{ "wildmatch", cmd__wildmatch },
+#ifdef GIT_WINDOWS_NATIVE
+   { "windows-named-pipe", cmd__windows_named_pipe },
+#endif
{ "write-cache", cmd__write_cache },
 };
 
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7116ddfb94..01c34fe5e7 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,9 @@ int cmd__submodule_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__wildmatch(int argc, const char **argv);
+#ifdef GIT_WINDOWS_NATIVE
+int cmd__windows_named_pipe(int argc, const char **argv);
+#endif
 int cmd__write_cache(int argc, const char **argv);
 
 #endif
diff --git a/t/helper/test-windows-named-pipe.c b/t/helper/test-windows-named-pipe.c
new file mode 100644
index 00..b4b752b01a
--- /dev/null
+++ b/t/helper/test-windows-named-pipe.c
@@ -0,0 +1,72 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+
+#ifdef GIT_WINDOWS_NATIVE
+static const char *usage_string = "<pipe-name>";
+
+#define TEST_BUFSIZE (4096)
+
+int cmd__windows_named_pipe(int argc, const char **argv)
+{
+   const char *filename;
+   struct strbuf pathname = STRBUF_INIT;
+   int err;
+   HANDLE h;
+   BOOL connected;
+   char buf[TEST_BUFSIZE + 1];
+
+   if (argc < 2)
+   goto print_usage;
+   filename = argv[1];
+   if (strchr(filename, '/') || strchr(filename, '\\'))
+   goto print_usage;
+   strbuf_addf(&pathname, "//./pipe/%s", filename);
+
+   /*
+* Create a single instance of the server side of the named pipe.
+* This will allow exactly one client instance to connect to it.
+*/
+   h = CreateNamedPipeA(
+   pathname.buf,
+   PIPE_ACCESS_INBOUND | FILE_FLAG_FIRST_PIPE_INSTANCE,
+   PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
+   PIPE_UNLIMITED_INSTANCES,
+   TEST_BUFSIZE, TEST_BUFSIZE, 0, NULL);
+   if (h == INVALID_HANDLE_VALUE) {
+   err = err_win_to_posix(GetLastError());
+   fprintf(stderr, "CreateNamedPipe failed: %s\n",
+   strerror(err));
+   return err;
+   }
+
+   connected = ConnectNamedPipe(h, NULL)
+   ? TRUE
+   : (GetLastError() == ERROR_PIPE_CONNECTED);
+   if (!connected) {
+   err = err_win_to_posix(GetLastError());
+   fprintf(stderr, "ConnectNamedPipe failed: %s\n",
+   strerror(err));
+   CloseHandle(h);
+   return err;
+   }
+
+   while (1) {
+   DWORD nbr;
+   BOOL success = ReadFile(h, buf, TEST_BUFSIZE, &nbr, NULL);
+   if (!success || nbr == 0)
+   break;
+   buf[nbr] = 0;
+
+   write(1, buf, nbr);
+   }
+
+   DisconnectNamedPipe(h);
+   CloseHandle(h);
+   return 0;
+
+print_usage:
+   fprintf(stderr, "usage: %s %s\n", argv[0], usage_string);
+   return 1;
+}
+#endif
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
new file mode 100755
index 00..e3c36341a0
--- /dev/null
+++ b/t/t0051-windows-named-pipe.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description='Windows named pipes'
+
+. ./test-lib.sh
+
+test_expe

Re: [PATCH v2 2/2] mingw: fix mingw_open_append to work with named pipes

2018-09-11 Thread Jeff Hostetler




On 9/10/2018 6:00 PM, Junio C Hamano wrote:

Johannes Sixt  writes:


   +#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))


I think you already have mingw_is_dir_sep() and its shorter alias
is_dir_sep() available to you.


good catch. thanks.



+/*
+ * Does the pathname map to the local named pipe filesystem?
+ * That is, does it have a "//./pipe/" prefix?
+ */
+static int mingw_is_local_named_pipe_path(const char *filename)


There is no need to prefix mingw_ to this function that is file
local static.  Isn't is_local_named_pipe() descriptive and unique
enough?


right. will do.



+{
+   return (IS_SBS(filename[0]) &&
+   IS_SBS(filename[1]) &&
+   filename[2] == '.'  &&
+   IS_SBS(filename[3]) &&
+   !strncasecmp(filename+4, "pipe", 4) &&
+   IS_SBS(filename[8]) &&
+   filename[9]);
+}
+#undef IS_SBS


It is kind-of surprising that there hasn't been any existing need
for a helper function that would allow us to write this function
like so:

static int is_local_named_pipe(const char *path)
{
return path_is_in_directory(path, "//./pipe/");
}

Not a suggestion to add such a thing; as long as we know there is no
other codepath that would benefit from having one, a generalization
like that can and should wait.


Yeah, I don't think we need something that general just yet.  Named
pipes exist in a special namespace using the UNC/network-share syntax
(rather than the DOS drive-letter syntax), and we don't do much with
UNC paths yet.

Perhaps, later we could have something to try splitting a UNC
path into <server>, <volume>, and <path> and then have
is_local_named_pipe() verify the first 2 are as expected.
Or have a function like is_path_in_volume(path, server, volume)
that does the tests without cutting up strings and allocating.
We could do either, but I don't think we need to be that general
yet.

Thanks,
Jeff



Re: [PATCH v2 2/2] mingw: fix mingw_open_append to work with named pipes

2018-09-10 Thread Jeff Hostetler




On 9/10/2018 3:45 PM, Johannes Sixt wrote:

On 9/10/2018 7:05 PM, Jeff Hostetler via GitGitGadget wrote:

diff --git a/compat/mingw.c b/compat/mingw.c
index 858ca14a57..f87376b26a 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -341,6 +341,19 @@ int mingw_mkdir(const char *path, int mode)
   return ret;
   }
+/*
+ * Calling CreateFile() using FILE_APPEND_DATA and without FILE_WRITE_DATA
+ * is documented in [1] as opening a writable file handle in append mode.
+ * (It is believed that) this is atomic since it is maintained by the
+ * kernel unlike the O_APPEND flag which is racily maintained by the CRT.
+ *
+ * [1] https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants
+ *
+ * This trick does not appear to work for named pipes.  Instead it creates
+ * a named pipe client handle that cannot be written to.  Callers should
+ * just use the regular _wopen() for them.  (And since client handle gets
+ * bound to a unique server handle, it isn't really an issue.)
+ */
   static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
   {
   HANDLE handle;
@@ -360,10 +373,12 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
   NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
   if (handle == INVALID_HANDLE_VALUE)
   return errno = err_win_to_posix(GetLastError()), -1;
+
   /*
    * No O_APPEND here, because the CRT uses it only to reset the
- * file pointer to EOF on write(); but that is not necessary
- * for a file created with FILE_APPEND_DATA.
+ * file pointer to EOF before each write(); but that is not
+ * necessary (and may lead to races) for a file created with
+ * FILE_APPEND_DATA.
    */
   fd = _open_osfhandle((intptr_t)handle, O_BINARY);
   if (fd < 0)
@@ -371,6 +386,23 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
   return fd;
   }
+#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))
+/*
+ * Does the pathname map to the local named pipe filesystem?
+ * That is, does it have a "//./pipe/" prefix?
+ */
+static int mingw_is_local_named_pipe_path(const char *filename)
+{
+    return (IS_SBS(filename[0]) &&
+    IS_SBS(filename[1]) &&
+    filename[2] == '.'  &&
+    IS_SBS(filename[3]) &&
+    !strncasecmp(filename+4, "pipe", 4) &&
+    IS_SBS(filename[8]) &&
+    filename[9]);
+}
+#undef IS_SBS
+
   int mingw_open (const char *filename, int oflags, ...)
   {
   typedef int (*open_fn_t)(wchar_t const *wfilename, int oflags, ...);
@@ -387,7 +419,7 @@ int mingw_open (const char *filename, int oflags, ...)
   if (filename && !strcmp(filename, "/dev/null"))
   filename = "nul";
-    if (oflags & O_APPEND)
+    if ((oflags & O_APPEND) && !mingw_is_local_named_pipe_path(filename))
   open_fn = mingw_open_append;
   else
   open_fn = _wopen;


This looks reasonable.


Thanks for the review.



I wonder which part of the code uses local named pipes. Is it downstream 
in Git for Windows or one of the topics in flight?


-- Hannes


I'm wanting to use them as a tracing target option in my trace2 series
currently in progress.

Jeff


[PATCH v2 2/2] mingw: fix mingw_open_append to work with named pipes

2018-09-10 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 compat/mingw.c                | 38 +++++++++++++++++++++++++++++++++++---
 t/t0051-windows-named-pipe.sh |  2 +-
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 858ca14a57..f87376b26a 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -341,6 +341,19 @@ int mingw_mkdir(const char *path, int mode)
return ret;
 }
 
+/*
+ * Calling CreateFile() using FILE_APPEND_DATA and without FILE_WRITE_DATA
+ * is documented in [1] as opening a writable file handle in append mode.
+ * (It is believed that) this is atomic since it is maintained by the
+ * kernel unlike the O_APPEND flag which is racily maintained by the CRT.
+ *
+ * [1] 
https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants
+ *
+ * This trick does not appear to work for named pipes.  Instead it creates
+ * a named pipe client handle that cannot be written to.  Callers should
+ * just use the regular _wopen() for them.  (And since client handle gets
+ * bound to a unique server handle, it isn't really an issue.)
+ */
 static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 {
HANDLE handle;
@@ -360,10 +373,12 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
if (handle == INVALID_HANDLE_VALUE)
return errno = err_win_to_posix(GetLastError()), -1;
+
/*
 * No O_APPEND here, because the CRT uses it only to reset the
-* file pointer to EOF on write(); but that is not necessary
-* for a file created with FILE_APPEND_DATA.
+* file pointer to EOF before each write(); but that is not
+* necessary (and may lead to races) for a file created with
+* FILE_APPEND_DATA.
 */
fd = _open_osfhandle((intptr_t)handle, O_BINARY);
if (fd < 0)
@@ -371,6 +386,23 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
return fd;
 }
 
+#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))
+/*
+ * Does the pathname map to the local named pipe filesystem?
+ * That is, does it have a "//./pipe/" prefix?
+ */
+static int mingw_is_local_named_pipe_path(const char *filename)
+{
+   return (IS_SBS(filename[0]) &&
+   IS_SBS(filename[1]) &&
+   filename[2] == '.'  &&
+   IS_SBS(filename[3]) &&
+   !strncasecmp(filename+4, "pipe", 4) &&
+   IS_SBS(filename[8]) &&
+   filename[9]);
+}
+#undef IS_SBS
+
 int mingw_open (const char *filename, int oflags, ...)
 {
typedef int (*open_fn_t)(wchar_t const *wfilename, int oflags, ...);
@@ -387,7 +419,7 @@ int mingw_open (const char *filename, int oflags, ...)
if (filename && !strcmp(filename, "/dev/null"))
filename = "nul";
 
-   if (oflags & O_APPEND)
+   if ((oflags & O_APPEND) && !mingw_is_local_named_pipe_path(filename))
open_fn = mingw_open_append;
else
open_fn = _wopen;
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
index e3c36341a0..10ac92d225 100755
--- a/t/t0051-windows-named-pipe.sh
+++ b/t/t0051-windows-named-pipe.sh
@@ -4,7 +4,7 @@ test_description='Windows named pipes'
 
 . ./test-lib.sh
 
-test_expect_failure MINGW 'o_append write to named pipe' '
+test_expect_success MINGW 'o_append write to named pipe' '
GIT_TRACE="$(pwd)/expect" git status >/dev/null 2>&1 &&
{ test-tool windows-named-pipe t0051 >actual 2>&1 & } &&
pid=$! &&
-- 
gitgitgadget


[PATCH v2 0/2] Fixup for js/mingw-o-append

2018-09-10 Thread Jeff Hostetler via GitGitGadget
The recent mingw O_APPEND change breaks writing to named pipes on
Windows. The first commit adds a new test to confirm the breakage and the
second commit fixes the problem. These could be squashed together or we can
just keep the fix and omit the test if that would be better.

d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND

The new mingw_open_append() routine successfully opens the client side of
the named pipe, but the first write() to it fails with EBADF. Adding the
FILE_WRITE_DATA corrects the problem.

 Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>

Cc: j6t@kdbg.org
Cc: johannes.schinde...@gmx.de
Cc: gitster@pobox.com
Cc: p...@peff.net

Jeff Hostetler (2):
  t0051: test GIT_TRACE to a windows named pipe
  mingw: fix mingw_open_append to work with named pipes

 Makefile   |  1 +
 compat/mingw.c | 38 ++--
 t/helper/test-tool.c   |  3 ++
 t/helper/test-tool.h   |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++
 t/t0051-windows-named-pipe.sh  | 17 +++
 6 files changed, 131 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh


base-commit: d641097589160eb795127d8dbcb14c877c217b60
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-35/jeffhostetler/fixup-mingw-o-append-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/35

Range-diff vs v1:

 1:  03453cb521 ! 1:  ecb30eb47c t0051: test GIT_TRACE to a windows named pipe
 @@ -140,7 +140,7 @@
  +
  +. ./test-lib.sh
  +
 -+test_expect_success MINGW 'o_append write to named pipe' '
 ++test_expect_failure MINGW 'o_append write to named pipe' '
  + GIT_TRACE="$(pwd)/expect" git status >/dev/null 2>&1 &&
  + { test-tool windows-named-pipe t0051 >actual 2>&1 & } &&
  + pid=$! &&
 2:  f433937d55 < -:  ---------- mingw: fix mingw_open_append to work with named pipes
 -:  ---------- > 2:  f0361dd306 mingw: fix mingw_open_append to work with named pipes

-- 
gitgitgadget


[PATCH v2 1/2] t0051: test GIT_TRACE to a windows named pipe

2018-09-10 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Create a test-tool helper to create the server side of
a windows named pipe, wait for a client connection, and
copy data written to the pipe to stdout.

Create t0051 test to route GIT_TRACE output of a command
to a named pipe using the above test-tool helper.

Signed-off-by: Jeff Hostetler 
---
 Makefile   |  1 +
 t/helper/test-tool.c   |  3 ++
 t/helper/test-tool.h   |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++
 t/t0051-windows-named-pipe.sh  | 17 +++
 5 files changed, 96 insertions(+)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh

diff --git a/Makefile b/Makefile
index e4b503d259..b1b934d295 100644
--- a/Makefile
+++ b/Makefile
@@ -731,6 +731,7 @@ TEST_BUILTINS_OBJS += test-submodule-config.o
 TEST_BUILTINS_OBJS += test-subprocess.o
 TEST_BUILTINS_OBJS += test-urlmatch-normalization.o
 TEST_BUILTINS_OBJS += test-wildmatch.o
+TEST_BUILTINS_OBJS += test-windows-named-pipe.o
 TEST_BUILTINS_OBJS += test-write-cache.o
 
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 805a45de9c..7a9764cd5c 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -41,6 +41,9 @@ static struct test_cmd cmds[] = {
{ "subprocess", cmd__subprocess },
{ "urlmatch-normalization", cmd__urlmatch_normalization },
{ "wildmatch", cmd__wildmatch },
+#ifdef GIT_WINDOWS_NATIVE
+   { "windows-named-pipe", cmd__windows_named_pipe },
+#endif
{ "write-cache", cmd__write_cache },
 };
 
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7116ddfb94..01c34fe5e7 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,9 @@ int cmd__submodule_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__wildmatch(int argc, const char **argv);
+#ifdef GIT_WINDOWS_NATIVE
+int cmd__windows_named_pipe(int argc, const char **argv);
+#endif
 int cmd__write_cache(int argc, const char **argv);
 
 #endif
diff --git a/t/helper/test-windows-named-pipe.c b/t/helper/test-windows-named-pipe.c
new file mode 100644
index 00..b4b752b01a
--- /dev/null
+++ b/t/helper/test-windows-named-pipe.c
@@ -0,0 +1,72 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+
+#ifdef GIT_WINDOWS_NATIVE
+static const char *usage_string = "<pipe-name>";
+
+#define TEST_BUFSIZE (4096)
+
+int cmd__windows_named_pipe(int argc, const char **argv)
+{
+   const char *filename;
+   struct strbuf pathname = STRBUF_INIT;
+   int err;
+   HANDLE h;
+   BOOL connected;
+   char buf[TEST_BUFSIZE + 1];
+
+   if (argc < 2)
+   goto print_usage;
+   filename = argv[1];
+   if (strchr(filename, '/') || strchr(filename, '\\'))
+   goto print_usage;
+   strbuf_addf(&pathname, "//./pipe/%s", filename);
+
+   /*
+* Create a single instance of the server side of the named pipe.
+* This will allow exactly one client instance to connect to it.
+*/
+   h = CreateNamedPipeA(
+   pathname.buf,
+   PIPE_ACCESS_INBOUND | FILE_FLAG_FIRST_PIPE_INSTANCE,
+   PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
+   PIPE_UNLIMITED_INSTANCES,
+   TEST_BUFSIZE, TEST_BUFSIZE, 0, NULL);
+   if (h == INVALID_HANDLE_VALUE) {
+   err = err_win_to_posix(GetLastError());
+   fprintf(stderr, "CreateNamedPipe failed: %s\n",
+   strerror(err));
+   return err;
+   }
+
+   connected = ConnectNamedPipe(h, NULL)
+   ? TRUE
+   : (GetLastError() == ERROR_PIPE_CONNECTED);
+   if (!connected) {
+   err = err_win_to_posix(GetLastError());
+   fprintf(stderr, "ConnectNamedPipe failed: %s\n",
+   strerror(err));
+   CloseHandle(h);
+   return err;
+   }
+
+   while (1) {
+   DWORD nbr;
+   BOOL success = ReadFile(h, buf, TEST_BUFSIZE, &nbr, NULL);
+   if (!success || nbr == 0)
+   break;
+   buf[nbr] = 0;
+
+   write(1, buf, nbr);
+   }
+
+   DisconnectNamedPipe(h);
+   CloseHandle(h);
+   return 0;
+
+print_usage:
+   fprintf(stderr, "usage: %s %s\n", argv[0], usage_string);
+   return 1;
+}
+#endif
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
new file mode 100755
index 00..e3c36341a0
--- /dev/null
+++ b/t/t0051-windows-named-pipe.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description='Windows named pipes'
+
+. ./test-lib.sh
+
+test_expe

Re: [PATCH 2/2] mingw: fix mingw_open_append to work with named pipes

2018-09-10 Thread Jeff Hostetler




On 9/10/2018 12:42 PM, Junio C Hamano wrote:

Jeff Hostetler  writes:


Yeah, this whole thing is a little under-documented for my tastes.
Let's leave it as you have it.  I'll re-roll with a fix to route
named pipes to the existing _wopen() code.


OK, I have these two patches in 'next', but let's postpone it for
now.  Windows port will be built with extra topics over what we have
at the release anyway, so your improved version may become part of
that and then can come back to benefit us in the next cycle ;-)

Thanks.



That's fine.  This can easily wait until after 2.19.0.

Thanks
Jeff


Re: [PATCH 2/2] mingw: fix mingw_open_append to work with named pipes

2018-09-10 Thread Jeff Hostetler




On 9/8/2018 2:31 PM, Johannes Sixt wrote:

On 9/8/2018 11:26 AM, Johannes Sixt wrote:

On 9/7/2018 8:19 PM, Jeff Hostetler via GitGitGadget wrote:

diff --git a/compat/mingw.c b/compat/mingw.c
index 858ca14a57..ef03bbe5d2 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -355,7 +355,7 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)

   * FILE_SHARE_WRITE is required to permit child processes
   * to append to the file.
   */
-    handle = CreateFileW(wfilename, FILE_APPEND_DATA,
+    handle = CreateFileW(wfilename, FILE_WRITE_DATA | FILE_APPEND_DATA,
  FILE_SHARE_WRITE | FILE_SHARE_READ,
  NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
  if (handle == INVALID_HANDLE_VALUE)



I did not go with this version because the documentation 
https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants 
says:


FILE_APPEND_DATA: For a file object, the right to append data to the 
file. (For local files, write operations will not overwrite existing 
data if this flag is specified without FILE_WRITE_DATA.) [...]


which could be interpreted as: Only if FILE_WRITE_DATA is not set, we 
have the guarantee that existing data in local files is not 
overwritten, i.e., new data is appended atomically.


Is this interpretation too narrow and we do get atomicity even when 
FILE_WRITE_DATA is set?


Here is are some comments on stackoverflow which let me think that 
FILE_APPEND_DATA with FILE_WRITE_DATA is no longer atomic:


https://stackoverflow.com/questions/20093571/difference-between-file-write-data-and-file-append-data#comment29995346_20108249 



-- Hannes


Yeah, this whole thing is a little under-documented for my tastes.
Let's leave it as you have it.  I'll re-roll with a fix to route
named pipes to the existing _wopen() code.

thanks
Jeff


Re: [PATCH 1/2] t0051: test GIT_TRACE to a windows named pipe

2018-09-10 Thread Jeff Hostetler




On 9/9/2018 3:28 AM, Sebastian Schuberth wrote:

On 9/7/2018 8:19 PM, Jeff Hostetler via GitGitGadget wrote:


+test_expect_success MINGW 'o_append write to named pipe' '


Shouldn't this be "test_expect_failure" here, and then be changed to 
"test_expect_success" by your second patch?





yes. thanks!
Jeff


Re: [PATCH 0/2] Fixup for js/mingw-o-append

2018-09-07 Thread Jeff Hostetler

GitGitGadget botched the CCs when I submitted this.
Replying here to add them.

Sorry,
Jeff

https://github.com/gitgitgadget/gitgitgadget/issues/35


On 9/7/2018 2:19 PM, Jeff Hostetler via GitGitGadget wrote:

The recent mingw O_APPEND change breaks writing to named pipes on
Windows. The first commit adds a new test to confirm the breakage and the
second commit fixes the problem. These could be squashed together or we can
just keep the fix and omit the test if that would be better.

d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND

The new mingw_open_append() routine successfully opens the client side of
the named pipe, but the first write() to it fails with EBADF. Adding the
FILE_WRITE_DATA corrects the problem.

  Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>

Cc: j6t@kdbg.org
Cc: johannes.schinde...@gmx.de
Cc: gitster@pobox.com
Cc: p...@peff.net

Jeff Hostetler (2):
   t0051: test GIT_TRACE to a windows named pipe
   mingw: fix mingw_open_append to work with named pipes

  Makefile   |  1 +
  compat/mingw.c |  2 +-
  t/helper/test-tool.c   |  3 ++
  t/helper/test-tool.h   |  3 ++
  t/helper/test-windows-named-pipe.c | 72 ++
  t/t0051-windows-named-pipe.sh  | 17 +++
  6 files changed, 97 insertions(+), 1 deletion(-)
  create mode 100644 t/helper/test-windows-named-pipe.c
  create mode 100755 t/t0051-windows-named-pipe.sh


base-commit: d641097589160eb795127d8dbcb14c877c217b60
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-35/jeffhostetler/fixup-mingw-o-append-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/35



[PATCH 1/2] t0051: test GIT_TRACE to a windows named pipe

2018-09-07 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Create a test-tool helper to create the server side of
a windows named pipe, wait for a client connection, and
copy data written to the pipe to stdout.

Create t0051 test to route GIT_TRACE output of a command
to a named pipe using the above test-tool helper.

Signed-off-by: Jeff Hostetler 
---
 Makefile   |  1 +
 t/helper/test-tool.c   |  3 ++
 t/helper/test-tool.h   |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++
 t/t0051-windows-named-pipe.sh  | 17 +++
 5 files changed, 96 insertions(+)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh

diff --git a/Makefile b/Makefile
index e4b503d259..b1b934d295 100644
--- a/Makefile
+++ b/Makefile
@@ -731,6 +731,7 @@ TEST_BUILTINS_OBJS += test-submodule-config.o
 TEST_BUILTINS_OBJS += test-subprocess.o
 TEST_BUILTINS_OBJS += test-urlmatch-normalization.o
 TEST_BUILTINS_OBJS += test-wildmatch.o
+TEST_BUILTINS_OBJS += test-windows-named-pipe.o
 TEST_BUILTINS_OBJS += test-write-cache.o
 
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 805a45de9c..7a9764cd5c 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -41,6 +41,9 @@ static struct test_cmd cmds[] = {
{ "subprocess", cmd__subprocess },
{ "urlmatch-normalization", cmd__urlmatch_normalization },
{ "wildmatch", cmd__wildmatch },
+#ifdef GIT_WINDOWS_NATIVE
+   { "windows-named-pipe", cmd__windows_named_pipe },
+#endif
{ "write-cache", cmd__write_cache },
 };
 
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7116ddfb94..01c34fe5e7 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,9 @@ int cmd__submodule_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__wildmatch(int argc, const char **argv);
+#ifdef GIT_WINDOWS_NATIVE
+int cmd__windows_named_pipe(int argc, const char **argv);
+#endif
 int cmd__write_cache(int argc, const char **argv);
 
 #endif
diff --git a/t/helper/test-windows-named-pipe.c b/t/helper/test-windows-named-pipe.c
new file mode 100644
index 00..b4b752b01a
--- /dev/null
+++ b/t/helper/test-windows-named-pipe.c
@@ -0,0 +1,72 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+
+#ifdef GIT_WINDOWS_NATIVE
+static const char *usage_string = "<pipe-name>";
+
+#define TEST_BUFSIZE (4096)
+
+int cmd__windows_named_pipe(int argc, const char **argv)
+{
+   const char *filename;
+   struct strbuf pathname = STRBUF_INIT;
+   int err;
+   HANDLE h;
+   BOOL connected;
+   char buf[TEST_BUFSIZE + 1];
+
+   if (argc < 2)
+   goto print_usage;
+   filename = argv[1];
+   if (strchr(filename, '/') || strchr(filename, '\\'))
+   goto print_usage;
+   strbuf_addf(&pathname, "//./pipe/%s", filename);
+
+   /*
+* Create a single instance of the server side of the named pipe.
+* This will allow exactly one client instance to connect to it.
+*/
+   h = CreateNamedPipeA(
+   pathname.buf,
+   PIPE_ACCESS_INBOUND | FILE_FLAG_FIRST_PIPE_INSTANCE,
+   PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
+   PIPE_UNLIMITED_INSTANCES,
+   TEST_BUFSIZE, TEST_BUFSIZE, 0, NULL);
+   if (h == INVALID_HANDLE_VALUE) {
+   err = err_win_to_posix(GetLastError());
+   fprintf(stderr, "CreateNamedPipe failed: %s\n",
+   strerror(err));
+   return err;
+   }
+
+   connected = ConnectNamedPipe(h, NULL)
+   ? TRUE
+   : (GetLastError() == ERROR_PIPE_CONNECTED);
+   if (!connected) {
+   err = err_win_to_posix(GetLastError());
+   fprintf(stderr, "ConnectNamedPipe failed: %s\n",
+   strerror(err));
+   CloseHandle(h);
+   return err;
+   }
+
+   while (1) {
+   DWORD nbr;
+   BOOL success = ReadFile(h, buf, TEST_BUFSIZE, &nbr, NULL);
+   if (!success || nbr == 0)
+   break;
+   buf[nbr] = 0;
+
+   write(1, buf, nbr);
+   }
+
+   DisconnectNamedPipe(h);
+   CloseHandle(h);
+   return 0;
+
+print_usage:
+   fprintf(stderr, "usage: %s %s\n", argv[0], usage_string);
+   return 1;
+}
+#endif
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
new file mode 100755
index 00..10ac92d225
--- /dev/null
+++ b/t/t0051-windows-named-pipe.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description='Windows named pipes'
+
+. ./test-lib.sh
+
+test_expe

[PATCH 2/2] mingw: fix mingw_open_append to work with named pipes

2018-09-07 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 compat/mingw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 858ca14a57..ef03bbe5d2 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -355,7 +355,7 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 * FILE_SHARE_WRITE is required to permit child processes
 * to append to the file.
 */
-   handle = CreateFileW(wfilename, FILE_APPEND_DATA,
+   handle = CreateFileW(wfilename, FILE_WRITE_DATA | FILE_APPEND_DATA,
FILE_SHARE_WRITE | FILE_SHARE_READ,
NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
if (handle == INVALID_HANDLE_VALUE)
-- 
gitgitgadget


[PATCH 0/2] Fixup for js/mingw-o-append

2018-09-07 Thread Jeff Hostetler via GitGitGadget
The recent mingw O_APPEND change breaks writing to named pipes on
Windows. The first commit adds a new test to confirm the breakage and the
second commit fixes the problem. These could be squashed together or we can
just keep the fix and omit the test if that would be better.

d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND

The new mingw_open_append() routine successfully opens the client side of
the named pipe, but the first write() to it fails with EBADF. Adding the
FILE_WRITE_DATA corrects the problem.

 Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>

Cc: j6t@kdbg.org
Cc: johannes.schinde...@gmx.de
Cc: gitster@pobox.com
Cc: p...@peff.net

Jeff Hostetler (2):
  t0051: test GIT_TRACE to a windows named pipe
  mingw: fix mingw_open_append to work with named pipes

 Makefile   |  1 +
 compat/mingw.c |  2 +-
 t/helper/test-tool.c   |  3 ++
 t/helper/test-tool.h   |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++
 t/t0051-windows-named-pipe.sh  | 17 +++
 6 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh


base-commit: d641097589160eb795127d8dbcb14c877c217b60
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-35/jeffhostetler/fixup-mingw-o-append-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/35
-- 
gitgitgadget


Re: [PATCH 1/8] trace2: create new combined trace facility

2018-09-05 Thread Jeff Hostetler




On 9/4/2018 6:30 PM, Junio C Hamano wrote:

Stefan Beller  writes:


The API defines both fixed-field and printf-style functions.

The trace2 performance tracing includes thread-specific function
nesting and timings.


So this only adds the new API, and we need to merge the TRACE
into the TRACE2 later?


If this is a rhetorical question implying that it would be best if
the existing trace() were rewritten to be built on top of trace2()
while building this series, especially before adding new callsites
that directly use trace2(), I may share that feeling.  I haven't
studied this new round deeply enough to see how realistic it would
be, though.




I wanted to come up with a unified API that we liked and was
sufficient to handle the default-key, performance-key,
the new event-key (currently supporting JSON output), and any
other formats/variants that we want (protobufs, syslog, etc).

And hopefully get some agreement on it and see what else we want
from it.

And then look at converting the trace_printf() and trace_performance()
calls to trace2.  Clearly, I could just replace the existing
printf-style calls with trace2_printf calls, but I thought it would be better to
look at them and promote them to higher-level functions.  For example,
the trace_argv_printf() calls are generally used to dump the command
line arguments for the current process or spawned child processes.
I have trace2_start() and trace2_child_start() that captures the
argv and additional information about it.  (The "why" if you will.)
So the trace_argv_* forms can go away.

Likewise, all of the trace_performance() and trace_performance_since()
can be converted to trace2_region_enter/_leave calls.  And those forms
can be removed from trace.[ch].

The net-net is that trace.[ch] shrinks in a short sequence of commits
on top of my initial trace2 commit in a reroll of this patch series.
(and replacing some of the demonstration commits in V1)

Then I'll address the trace_printf_key() forms, since they write
to alternate stream.

Then I can delete trace.[ch].  And we can consider renaming
trace2_* functions and/or GIT_TR2_* env vars if we want.

I wanted to avoid rewriting trace.[ch] just to delete them later
in the same patch series.

Jeff


Re: [PATCH 1/8] trace2: create new combined trace facility

2018-09-05 Thread Jeff Hostetler




On 9/4/2018 6:12 PM, Stefan Beller wrote:

Create GIT_TR2 trace-key to replace GIT_TRACE, GIT_TR2_PERFORMANCE to
replace GIT_TRACE_PERFORMANCE, and a new trace-key GIT_TR2_EVENT to
generate JSON data for telemetry purposes.  Other structured formats
can easily be added later using this new model.


So the idea is to use the GIT_TR2 instead of GIT_TRACE and we
get the same output as well as a new form of structured logging here?
(Then GIT_TRACE could be retired, and we'd use the new API to add
more events, which are also more structured as the API allows for more
than just a string printed?)


Define a higher-level event API that selectively writes to all of the
new GIT_TR2_* targets (depending on event type) without needing to call
different trace_printf*() or trace_performance_*() routines.

The API defines both fixed-field and printf-style functions.

The trace2 performance tracing includes thread-specific function
nesting and timings.


So this only adds the new API, and we need to merge the TRACE
into the TRACE2 later?


+++ b/trace2.c
@@ -0,0 +1,1592 @@

[...]

+
+/*
+ * TODO remove this section header
+ */


Yes, please.


+/*
+ * Compute a "unique" session id (SID) for the current process.  All events
+ * from this process will have this label.  If we were started by another
+ * git instance, use our parent's SID as a prefix and count the number of
+ * nested git processes.  (This lets us track parent/child relationships
+ * even if there is an intermediate shell process.)


How does this work with threading? From this description we can have
two threads starting new child processes and they have the same ID
(-2)


Threads are not involved here.

A git process computes its own unique session id.  It is constructed
from { [<parent-sid>/], <start-time>, <pid> }.
So in the following example, fetch spawned rev-list and gc.  (I've
stripped out fields irrelevant to this discussion.)

"sid":"1536153920286494-12592",
"argv":["C:\\work\\gfw\\git.exe","--exec-path=.","fetch","gh"]

"sid":"1536153920286494-12592/1536153925520530-23024",
"argv":["git","rev-list","--objects","--stdin", ...]

"sid":"1536153920286494-12592/1536153926081533-23992",
"argv":["git","gc","--auto"]

So 2 child processes simultaneously spawned from 2 threads in the
top-level git command, would still have unique SIDs since their PIDs
are unique over the time interval of their execution.

In the above example, if rev-list spawned a child git process, that
child's SID would have 3 components (the prefix that it inherited
plus its own time and pid):
1536153920286494-12592/1536153925520530-23024/<time>-<pid>

The above model works even if there is a shell command between the
top-level git command and the child git processes.
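
A toy shell model of that SID construction (illustration only;
"date +%s%N" is GNU-specific, and the real code uses getnanotime()
and getpid()):

   compute_sid () {
      us_now=$(( $(date +%s%N) / 1000 )) &&
      if test -n "$SLOG_PARENT_SID"
      then
         SID="$SLOG_PARENT_SID/$us_now-$$"
      else
         SID="$us_now-$$"
      fi &&
      # children (even across an intermediate shell) inherit this
      export SLOG_PARENT_SID="$SID"
   }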




+
+/*
+ * TODO remove this section header
+ */


This looks like a reoccurring pattern. Did you have a tool
that needs these? Does the tool want to enforce some level of
documentation on each function?


No, this is just a visual separator that helps me see the different sections
as different sections in the editor and helps me group like items.
I might change that to a group-level comment that describes the
overall concept/purpose of the section.  Either way I'll get rid of
the stars.




+
+/*
+ * Each thread (including the main thread) has a stack of nested regions.
+ * This is used to indent messages and provide nested perf times.  The
+ * limit here is for simplicity.  Note that the region stack is per-thread
+ * and not per-trace_key.


What are the (nested) regions used for? I would imagine recursive
operations, e.g. unpacking trees? Where am I supposed to read
up on those?


Unpacking trees would be a good usage. Duy did something similar in a
recent patch series.

Nested regions are intended for perf times.  The 8th patch in
this series demonstrates adding trace2_region_enter() and _leave()
calls inside read_directory_recursive().  The second column here
shows the elapsed time between the _enter and _leave.  (Again, I've
stripped out fields not currently relevant.)

| region_enter |   | status_untracked
| region_enter |   | ..read_directory_recursive:
| region_enter |   | read_directory_recursive:.github/
| region_leave |  0.95 | read_directory_recursive:.github/
| region_enter |   | read_directory_recursive:block-sha1/
| region_leave |  0.86 | read_directory_recursive:block-sha1/
| region_enter |   | read_directory_recursive:builtin/
| region_leave |  0.000716 | read_directory_recursive:builtin/
| region_enter |   | read_directory_recursive:ci/
| region_enter |   | ..read_directory_recursive:ci/util/
| region_leave |  0.87 | ..read_directory_recursive:ci/util/
| region_leave |  0.000250 | 

[PATCH 4/8] trace2: demonstrate trace2 child process classification

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Classify editor, pager, and sub-process child processes.
The former two can be used to identify interactive commands,
for example.

Signed-off-by: Jeff Hostetler 
---
 editor.c  | 1 +
 pager.c   | 1 +
 sub-process.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/editor.c b/editor.c
index 9a9b4e12d1..29707de198 100644
--- a/editor.c
+++ b/editor.c
@@ -66,6 +66,7 @@ int launch_editor(const char *path, struct strbuf *buffer, const char *const *env)
p.argv = args;
p.env = env;
p.use_shell = 1;
+   p.trace2_child_class = "editor";
if (start_command() < 0)
return error("unable to start editor '%s'", editor);
 
diff --git a/pager.c b/pager.c
index a768797fcf..4168460ae9 100644
--- a/pager.c
+++ b/pager.c
@@ -100,6 +100,7 @@ void prepare_pager_args(struct child_process *pager_process, const char *pager)
	argv_array_push(&pager_process->args, pager);
pager_process->use_shell = 1;
	setup_pager_env(&pager_process->env_array);
+   pager_process->trace2_child_class = "pager";
 }
 
 void setup_pager(void)
diff --git a/sub-process.c b/sub-process.c
index 8d2a1707cf..3f4af93555 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -88,6 +88,7 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
process->out = -1;
process->clean_on_exit = 1;
process->clean_on_exit_handler = subprocess_exit_handler;
+   process->trace2_child_class = "subprocess";
 
err = start_command(process);
if (err) {
-- 
gitgitgadget



[PATCH 1/8] trace2: create new combined trace facility

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Create trace2 API allowing event-based tracing.  This will hopefully
eventually replace the existing trace API.

Create GIT_TR2 trace-key to replace GIT_TRACE, GIT_TR2_PERFORMANCE to
replace GIT_TRACE_PERFORMANCE, and a new trace-key GIT_TR2_EVENT to
generate JSON data for telemetry purposes.  Other structured formats
can easily be added later using this new model.

Define a higher-level event API that selectively writes to all of the
new GIT_TR2_* targets (depending on event type) without needing to call
different trace_printf*() or trace_performance_*() routines.

The API defines both fixed-field and printf-style functions.

The trace2 performance tracing includes thread-specific function
nesting and timings.

Signed-off-by: Jeff Hostetler 
---
 Makefile |1 +
 cache.h  |1 +
 trace2.c | 1592 ++
 trace2.h |  214 
 4 files changed, 1808 insertions(+)
 create mode 100644 trace2.c
 create mode 100644 trace2.h

diff --git a/Makefile b/Makefile
index 5a969f5830..88ae7afdd4 100644
--- a/Makefile
+++ b/Makefile
@@ -974,6 +974,7 @@ LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
 LIB_OBJS += tmp-objdir.o
 LIB_OBJS += trace.o
+LIB_OBJS += trace2.o
 LIB_OBJS += trailer.o
 LIB_OBJS += transport.o
 LIB_OBJS += transport-helper.o
diff --git a/cache.h b/cache.h
index 4d014541ab..38f0cd2ba0 100644
--- a/cache.h
+++ b/cache.h
@@ -9,6 +9,7 @@
 #include "gettext.h"
 #include "convert.h"
 #include "trace.h"
+#include "trace2.h"
 #include "string-list.h"
 #include "pack-revindex.h"
 #include "hash.h"
diff --git a/trace2.c b/trace2.c
new file mode 100644
index 00..13d5c85366
--- /dev/null
+++ b/trace2.c
@@ -0,0 +1,1592 @@
+#include "cache.h"
+#include "config.h"
+#include "json-writer.h"
+#include "quote.h"
+#include "run-command.h"
+#include "sigchain.h"
+#include "thread-utils.h"
+#include "version.h"
+
+/*
+ * TODO remove this section header
+ */
+
+static struct strbuf tr2sid_buf = STRBUF_INIT;
+static int tr2sid_nr_git_parents;
+
+/*
+ * Compute a "unique" session id (SID) for the current process.  All events
+ * from this process will have this label.  If we were started by another
+ * git instance, use our parent's SID as a prefix and count the number of
+ * nested git processes.  (This lets us track parent/child relationships
+ * even if there is an intermediate shell process.)
+ */
+static void tr2sid_compute(void)
+{
+   uint64_t us_now;
+   const char *parent_sid;
+
+   if (tr2sid_buf.len)
+   return;
+
+   parent_sid = getenv("SLOG_PARENT_SID");
+   if (parent_sid && *parent_sid) {
+   const char *p;
+   for (p = parent_sid; *p; p++)
+   if (*p == '/')
+   tr2sid_nr_git_parents++;
+
+   strbuf_addstr(&tr2sid_buf, parent_sid);
+   strbuf_addch(&tr2sid_buf, '/');
+   tr2sid_nr_git_parents++;
+   }
+
+   us_now = getnanotime() / 1000;
+   strbuf_addf(&tr2sid_buf, "%"PRIuMAX"-%"PRIdMAX,
+   (uintmax_t)us_now, (intmax_t)getpid());
+
+   setenv("SLOG_PARENT_SID", tr2sid_buf.buf, 1);
+}
+
+/*
+ * TODO remove this section header
+ */
+
+/*
+ * Each thread (including the main thread) has a stack of nested regions.
+ * This is used to indent messages and provide nested perf times.  The
+ * limit here is for simplicity.  Note that the region stack is per-thread
+ * and not per-trace_key.
+ */
+#define TR2_MAX_REGION_NESTING (100)
+#define TR2_INDENT (2)
+#define TR2_INDENT_LENGTH(ctx) (((ctx)->nr_open_regions - 1) * TR2_INDENT)
+
+#define TR2_MAX_THREAD_NAME (24)
+
+struct tr2tls_thread_ctx {
+   struct strbuf thread_name;
+   uint64_t us_start[TR2_MAX_REGION_NESTING];
+   int nr_open_regions;
+   int thread_id;
+};
+
+static struct tr2tls_thread_ctx *tr2tls_thread_main;
+
+static pthread_mutex_t tr2tls_mutex;
+static pthread_key_t tr2tls_key;
+static int tr2tls_initialized;
+
+static struct strbuf tr2_dots = STRBUF_INIT;
+
+static int tr2_next_child_id;
+static int tr2_next_thread_id;
+
+/*
+ * Create TLS data for the current thread.  This gives us a place to
+ * put per-thread data, such as thread start time, function nesting
+ * and a per-thread label for our messages.
+ *
+ * We assume the first thread is "main".  Other threads are given
+ * non-zero thread-ids to help distinguish messages from concurrent
+ * threads.
+ *
+ * Truncate the thread name if necessary to help with column align

[PATCH 5/8] trace2: demonstrate instrumenting do_read_index

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 read-cache.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..7a31ac4da8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1867,6 +1867,8 @@ static void tweak_split_index(struct index_state *istate)
 
 static void post_read_index_from(struct index_state *istate)
 {
+   trace2_data_intmax("index", "cache_nr", istate->cache_nr);
+
check_ce_order(istate);
tweak_untracked_cache(istate);
tweak_split_index(istate);
@@ -2012,7 +2014,9 @@ int read_index_from(struct index_state *istate, const char *path,
if (istate->initialized)
return istate->cache_nr;
 
+   trace2_region_enter("do_read_index");
ret = do_read_index(istate, path, 0);
+   trace2_region_leave("do_read_index");
trace_performance_since(start, "read cache %s", path);
 
split_index = istate->split_index;
@@ -2028,7 +2032,9 @@ int read_index_from(struct index_state *istate, const char *path,
 
base_oid_hex = oid_to_hex(&split_index->base_oid);
base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex);
+   trace2_region_enter("do_read_index");
ret = do_read_index(split_index->base, base_path, 1);
+   trace2_region_leave("do_read_index");
if (oidcmp(&split_index->base_oid, &split_index->base->oid))
die("broken index, expect %s in %s, got %s",
base_oid_hex, base_path,
oid_to_hex(&split_index->base->oid));
-- 
gitgitgadget



[PATCH 0/8] WIP: trace2: a new trace facility

2018-08-31 Thread Jeff Hostetler via GitGitGadget
This patch series contains a new trace2 facility that hopefully addresses
the recent trace- and structured-logging-related discussions. The intent is
to eventually replace the existing trace_ routines (or to route them to the
new trace2_ routines) as time permits.

This draft adds new trace2_ calls and leaves most of the original trace_
calls in place. Subsequent drafts will address this.

This version should be considered a replacement for my earlier structured
logging patch series [1].

It addresses Jonathan Nieder's, Ben Peart's, Peff's, and Junio's comments
[2,3,4,5] about merging the existing tracing routines, ease of use,
progressive logging rather than logging at the end of the program, hiding
all JSON details inside the trace2_ routines, and leaving an opening for
additional formats (protobuf or whatever).

It also adds a nested performance tracing feature similar to Duy's
suggestion in [6]. This version adds per-thread nesting and marks each event
with a thread name.
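
As a rough sketch of how these calls fit together (the names are the ones
demonstrated in the patches below; the final signatures may still change),
a caller nests timed regions and attaches data events like so:

    /* illustrative only -- not code from this series */
    void example(struct index_state *istate)
    {
            trace2_region_enter("example");   /* push a timed region */
            trace2_data_intmax("index", "cache_nr", istate->cache_nr);
            trace2_region_leave("example");   /* pop and emit elapsed time */
    }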

[1] 
https://public-inbox.org/git/20180713165621.52017-1-...@jeffhostetler.com/
[2] 
https://public-inbox.org/git/20180821044724.ga219...@aiede.svl.corp.google.com/
[3] 
https://public-inbox.org/git/13302a8c-a114-c3a7-65df-55f47f902...@gmail.com/
[4] 
https://public-inbox.org/git/20180814195456.ge28...@sigill.intra.peff.net/
[5] https://public-inbox.org/git/xmqqeff0zn53@gitster-ct.c.googlers.com/
[6] https://public-inbox.org/git/20180818144128.19361-2-pclo...@gmail.com/

Cc: gitster@pobox.com
Cc: peff@peff.net
Cc: peartben@gmail.com
Cc: jrnieder@gmail.com
Cc: pclo...@gmail.com

Jeff Hostetler (8):
  trace2: create new combined trace facility
  trace2: add trace2 to main
  trace2: demonstrate trace2 regions in wt-status
  trace2: demonstrate trace2 child process classification
  trace2: demonstrate instrumenting do_read_index
  trace2: demonstrate instrumenting threaded preload_index
  trace2: demonstrate setting sub-command parameter in checkout
  trace2: demonstrate use of regions in read_directory_recursive

 Makefile   |1 +
 builtin/checkout.c |5 +
 cache.h|1 +
 compat/mingw.h |3 +-
 dir.c  |3 +
 editor.c   |1 +
 git-compat-util.h  |7 +
 git.c  |9 +-
 pager.c|1 +
 preload-index.c|   10 +
 read-cache.c   |6 +
 repository.c   |2 +
 run-command.c  |8 +-
 run-command.h  |5 +
 sub-process.c  |1 +
 trace2.c   | 1592 
 trace2.h   |  214 ++
 usage.c|5 +
 wt-status.c|   14 +-
 19 files changed, 1882 insertions(+), 6 deletions(-)
 create mode 100644 trace2.c
 create mode 100644 trace2.h


base-commit: 2f743933341f27603550fbf383a34dfcfd38
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-29%2Fjeffhostetler%2Fml-trace2-v0-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-29/jeffhostetler/ml-trace2-v0-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/29
-- 
gitgitgadget


[PATCH 6/8] trace2: demonstrate instrumenting threaded preload_index

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Add trace2_region_enter() and _leave() around the entire preload_index()
call.  Also add thread-specific events in the preload_thread() threadproc.

Signed-off-by: Jeff Hostetler 
---
 preload-index.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/preload-index.c b/preload-index.c
index 71cd2437a3..2483d92c61 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -40,10 +40,14 @@ static void *preload_thread(void *_data)
struct cache_entry **cep = index->cache + p->offset;
struct cache_def cache = CACHE_DEF_INIT;
 
+   trace2_thread_start("preload_thread");
+
nr = p->nr;
if (nr + p->offset > index->cache_nr)
nr = index->cache_nr - p->offset;
 
+   trace2_printf("preload {offset %7d}{count %7d}", p->offset, nr);
+
do {
struct cache_entry *ce = *cep++;
struct stat st;
@@ -70,6 +74,9 @@ static void *preload_thread(void *_data)
mark_fsmonitor_valid(ce);
} while (--nr > 0);
cache_def_clear();
+
+   trace2_thread_exit();
+
return NULL;
 }
 
@@ -118,6 +125,9 @@ int read_index_preload(struct index_state *index,
 {
int retval = read_index(index);
 
+   trace2_region_enter("preload_index");
preload_index(index, pathspec);
+   trace2_region_leave("preload_index");
+
return retval;
 }
-- 
gitgitgadget



[PATCH 3/8] trace2: demonstrate trace2 regions in wt-status

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Add trace2_region_enter() and trace2_region_leave() calls around the
various phases of a status scan.  This gives elapsed time for each
phase in the GIT_TR2_PERFORMANCE trace target.

Signed-off-by: Jeff Hostetler 
---
 wt-status.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 5ffab61015..9e37a73c73 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -726,13 +726,23 @@ static void wt_status_collect_untracked(struct wt_status *s)
 
 void wt_status_collect(struct wt_status *s)
 {
+   trace2_region_enter("status_worktrees");
wt_status_collect_changes_worktree(s);
+   trace2_region_leave("status_worktrees");
 
-   if (s->is_initial)
+   if (s->is_initial) {
+   trace2_region_enter("status_initial");
wt_status_collect_changes_initial(s);
-   else
+   trace2_region_leave("status_initial");
+   } else {
+   trace2_region_enter("status_index");
wt_status_collect_changes_index(s);
+   trace2_region_leave("status_index");
+   }
+
+   trace2_region_enter("status_untracked");
wt_status_collect_untracked(s);
+   trace2_region_leave("status_untracked");
 }
 
 static void wt_longstatus_print_unmerged(struct wt_status *s)
-- 
gitgitgadget



[PATCH 7/8] trace2: demonstrate setting sub-command parameter in checkout

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Add trace2_param() events in checkout to report whether the command
is switching branches or just checking out a file.

Signed-off-by: Jeff Hostetler 
---
 builtin/checkout.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29ef50013d..0934587781 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -251,6 +251,8 @@ static int checkout_paths(const struct checkout_opts *opts,
int errs = 0;
struct lock_file lock_file = LOCK_INIT;
 
+   trace2_param("subcommand", (opts->patch_mode ? "patch" : "path"));
+
if (opts->track != BRANCH_TRACK_UNSPECIFIED)
die(_("'%s' cannot be used with updating paths"), "--track");
 
@@ -828,6 +830,9 @@ static int switch_branches(const struct checkout_opts *opts,
void *path_to_free;
struct object_id rev;
int flag, writeout_error = 0;
+
+   trace2_param("subcommand", "switch");
+
memset(&old_branch_info, 0, sizeof(old_branch_info));
old_branch_info.path = path_to_free = resolve_refdup("HEAD", 0, &rev, &flag);
if (old_branch_info.path)
-- 
gitgitgadget



[PATCH 8/8] trace2: demonstrate use of regions in read_directory_recursive

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Add trace2_region_enter() and _leave() calls inside read_directory_recursive()
to show nesting and per-level timing.

TODO Drop this commit or ifdef the calls.  It generates too much data to
be in the production version.  It is included in this patch series for
illustration purposes only.  It is typical of what a developer might do
during a perf investigation.
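
One way to "ifdef the calls", as suggested above, would be a compile-time
guard; a minimal sketch (the TRACE2_DEBUG_REGIONS macro name is an
assumption for illustration, not something defined by this series):

    #ifdef TRACE2_DEBUG_REGIONS
            trace2_region_enter("read_directory_recursive:%.*s", baselen, base);
    #endif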

Signed-off-by: Jeff Hostetler 
---
 dir.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/dir.c b/dir.c
index aceb0d4869..c7c0654da5 100644
--- a/dir.c
+++ b/dir.c
@@ -1951,6 +1951,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 
strbuf_add(&path, base, baselen);
 
+   trace2_region_enter("read_directory_recursive:%.*s", baselen, base);
+
if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only))
goto out;
 
@@ -2042,6 +2044,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
close_cached_dir(&cdir);
  out:
strbuf_release(&path);
+   trace2_region_leave("read_directory_recursive:%.*s", baselen, base);
 
return dir_state;
 }
-- 
gitgitgadget


[PATCH 2/8] trace2: add trace2 to main

2018-08-31 Thread Jeff Hostetler via GitGitGadget
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 compat/mingw.h| 3 +--
 git-compat-util.h | 7 +++
 git.c | 9 -
 repository.c  | 2 ++
 run-command.c | 8 +++-
 run-command.h | 5 +
 usage.c   | 5 +
 7 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.h b/compat/mingw.h
index 571019d0bd..606402faeb 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -144,8 +144,7 @@ static inline int fcntl(int fd, int cmd, ...)
errno = EINVAL;
return -1;
 }
-/* bash cannot reliably detect negative return codes as failure */
-#define exit(code) exit((code) & 0xff)
+
 #define sigemptyset(x) (void)0
 static inline int sigaddset(sigset_t *set, int signum)
 { return 0; }
diff --git a/git-compat-util.h b/git-compat-util.h
index 5f2e90932f..c0901d9ec6 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1219,6 +1219,13 @@ static inline int is_missing_file_error(int errno_)
 
 extern int cmd_main(int, const char **);
 
+/*
+ * Intercept all calls to exit() and route them to trace2 to
+ * optionally emit a message before calling the real exit().
+ */
+int trace2_exit_fl(const char *file, int line, int code);
+#define exit(code) exit(trace2_exit_fl(__FILE__, __LINE__, (code)))
+
 /*
  * You can mark a stack variable with UNLEAK(var) to avoid it being
  * reported as a leak by tools like LSAN or valgrind. The argument
diff --git a/git.c b/git.c
index c27c38738b..cc56279a8c 100644
--- a/git.c
+++ b/git.c
@@ -331,6 +331,8 @@ static int handle_alias(int *argcp, const char ***argv)
argv_array_push(&child.args, alias_string + 1);
argv_array_pushv(&child.args, (*argv) + 1);
 
+   trace2_alias(alias_command, child.args.argv);
+
ret = run_command(&child);
if (ret >= 0)   /* normal exit */
exit(ret);
@@ -365,6 +367,8 @@ static int handle_alias(int *argcp, const char ***argv)
/* insert after command name */
memcpy(new_argv + count, *argv + 1, sizeof(char *) * *argcp);
 
+   trace2_alias(alias_command, new_argv);
+
*argv = new_argv;
*argcp += count - 1;
 
@@ -413,6 +417,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
setup_work_tree();
 
trace_argv_printf(argv, "trace: built-in: git");
+   trace2_command(p->cmd);
 
validate_cache_entries(&the_index);
status = p->fn(argc, argv, prefix);
@@ -719,6 +724,8 @@ int cmd_main(int argc, const char **argv)
cmd = slash + 1;
}
 
+   trace2_start(argv);
+
trace_command_performance(argv);
 
/*
@@ -782,5 +789,5 @@ int cmd_main(int argc, const char **argv)
fprintf(stderr, _("failed to run command '%s': %s\n"),
cmd, strerror(errno));
 
-   return 1;
+   return trace2_exit(1);
 }
diff --git a/repository.c b/repository.c
index 5dd1486718..c169f61ccd 100644
--- a/repository.c
+++ b/repository.c
@@ -113,6 +113,8 @@ out:
 void repo_set_worktree(struct repository *repo, const char *path)
 {
repo->worktree = real_pathdup(path, 1);
+
+   trace2_worktree(repo->worktree);
 }
 
 static int read_and_verify_repository_format(struct repository_format *format,
diff --git a/run-command.c b/run-command.c
index 84b883c213..e833d9a277 100644
--- a/run-command.c
+++ b/run-command.c
@@ -706,6 +706,7 @@ fail_pipe:
cmd->err = fderr[0];
}
 
+   trace2_child_start(cmd);
trace_run_command(cmd);
 
fflush(NULL);
@@ -911,6 +912,8 @@ fail_pipe:
 #endif
 
if (cmd->pid < 0) {
+   trace2_child_exit(cmd, -1);
+
if (need_in)
close_pair(fdin);
else if (cmd->in)
@@ -949,13 +952,16 @@ fail_pipe:
 int finish_command(struct child_process *cmd)
 {
int ret = wait_or_whine(cmd->pid, cmd->argv[0], 0);
+   trace2_child_exit(cmd, ret);
child_process_clear(cmd);
return ret;
 }
 
 int finish_command_in_signal(struct child_process *cmd)
 {
-   return wait_or_whine(cmd->pid, cmd->argv[0], 1);
+   int ret = wait_or_whine(cmd->pid, cmd->argv[0], 1);
+   trace2_child_exit(cmd, ret);
+   return ret;
 }
 
 
diff --git a/run-command.h b/run-command.h
index 3932420ec8..a91206b08c 100644
--- a/run-command.h
+++ b/run-command.h
@@ -12,6 +12,11 @@ struct child_process {
struct argv_array args;
struct argv_array env_array;
pid_t pid;
+
+   int trace2_child_id;
+   uint64_t trace2_child_us_start;
+   const char *trace2_child_class;
+
/*
 * Using .in, .out, .err:
 * - Specify 0 for no redirections (child inherits stdin, stdout,
diff --git a/usage.c b/usage.c
index cc803336bd..1838c46d20 100644
--- a/usage.c
+++ b/usage.c

Re: [PATCH v1 00/25] RFC: structured logging

2018-08-28 Thread Jeff Hostetler




On 8/28/2018 1:38 PM, Junio C Hamano wrote:

g...@jeffhostetler.com writes:


From: Jeff Hostetler 

This RFC patch series adds structured logging to git.  The motivation,
...

Jeff Hostetler (25):
   structured-logging: design document
   structured-logging: add STRUCTURED_LOGGING=1 to Makefile
   structured-logging: add structured logging framework
   structured-logging: add session-id to log events
   structured-logging: set sub_command field for branch command
   structured-logging: set sub_command field for checkout command
   structured-logging: t0420 basic tests
   structured-logging: add detail-event facility
   structured-logging: add detail-event for lazy_init_name_hash
   structured-logging: add timer facility
   structured-logging: add timer around do_read_index
   structured-logging: add timer around do_write_index
   structured-logging: add timer around wt-status functions
   structured-logging: add timer around preload_index
   structured-logging: t0420 tests for timers
   structured-logging: add aux-data facility
   structured-logging: add aux-data for index size
   structured-logging: add aux-data for size of sparse-checkout file
   structured-logging: t0420 tests for aux-data
   structured-logging: add structured logging to remote-curl
   structured-logging: add detail-events for child processes
   structured-logging: add child process classification
   structured-logging: t0420 tests for child process detail events
   structured-logging: t0420 tests for interactive child_summary
   structured-logging: add config data facility



I noticed that Travis job has been failing with a trivially fixable
failure, so I'll push out today's 'pu' with the attached applied on
top.  This may become inapplicable to the code when issues raised in
recent reviews are addressed, though.

  structured-logging.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/structured-logging.c b/structured-logging.c
index 0e3f79ee48..78abcd2e59 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -593,8 +593,7 @@ void slog_set_command_name(const char *command_name)
 * the cmd_<command>() and/or it may be too early to force a
 * lazy load.
 */
-   if (my__command_name)
-   free(my__command_name);
+   free(my__command_name);
my__command_name = xstrdup(command_name);
  }
  
@@ -606,8 +605,7 @@ void slog_set_sub_command_name(const char *sub_command_name)

 * the cmd_<command>() and/or it may be too early to force a
 * lazy load.
 */
-   if (my__sub_command_name)
-   free(my__sub_command_name);
+   free(my__sub_command_name);
my__sub_command_name = xstrdup(sub_command_name);
  }
  



Sorry about that.

Let me withdraw the current series.  I'm working on a new version that
addresses the comments on the mailing list.  It combines my logging
with a variation on the nested perf logging that Duy suggested and
the consolidation that you were talking about last week.

Jeff



Re: [PATCH v7 09/16] fetch: support filters

2018-08-20 Thread Jeff Hostetler




On 8/19/2018 7:24 AM, Duy Nguyen wrote:

On Fri, Dec 8, 2017 at 5:00 PM Jeff Hostetler  wrote:


From: Jeff Hostetler 

Teach fetch to support filters. This is only allowed for the remote
configured in extensions.partialcloneremote.

Signed-off-by: Jonathan Tan 
---
  builtin/fetch.c   | 23 +--
  connected.c   |  2 ++
  remote-curl.c |  6 ++
  t/t5500-fetch-pack.sh | 36 
  4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1b1f039..14aab71 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -18,6 +18,7 @@
  #include "argv-array.h"
  #include "utf8.h"
  #include "packfile.h"
+#include "list-objects-filter-options.h"

  static const char * const builtin_fetch_usage[] = {
 N_("git fetch [] [ [...]]"),
@@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND;
  static int shown_url = 0;
  static int refmap_alloc, refmap_nr;
  static const char **refmap_array;
+static struct list_objects_filter_options filter_options;

  static int git_fetch_config(const char *k, const char *v, void *cb)
  {
@@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
 TRANSPORT_FAMILY_IPV4),
 OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"),
 TRANSPORT_FAMILY_IPV6),
OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),


Documentation is missing. Please add something to git-fetch.txt (or
fetch-options.txt) about this option. I would make a patch but I don't
know enough about this to write and I'm in the middle of something
else.



Documentation for --filter= (and --no-filter) were added to
rev-list-options.txt.  The opening paragraph talks about --objects
which would need to be omitted for fetch and clone, but the rest
of that text is relevant.

I'll push up a patch shortly.

Thanks
Jeff


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff Hostetler




On 8/14/2018 2:44 PM, Stefan Beller wrote:

On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen  wrote:


On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  wrote:

I'm looking at adding code to my SLOG (better name suggestions welcome)
patch series to eventually replace the existing git_trace facility.


Complement maybe. Replace, please no. I'd rather not stare at json messages.


 From the sidelines: We'd only need one logging infrastructure in place, as the
formatting would be done as a later step? For local operations we'd certainly
find better formatting than json, and we figured that we might end up desiring
ProtocolBuffers[1] instead of JSon, so if it would be easy to change
the output of
the structured logging easily that would be great.

But AFAICT these series are all about putting the sampling points into the
code base, so formatting would be orthogonal to it?

Stefan

[1] https://developers.google.com/protocol-buffers/



Last time I checked, protocol-buffers has a C++ binding but not
a C binding.

I've not had a chance to use pbuffers, so I have to ask what advantages
would they have over JSON or some other similar self-describing format?
And/or would it be possible for you to tail the json log file and
convert it to whatever format you preferred?

It seems like the important thing is to capture structured data
(whatever the format) to disk first.

Jeff


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff Hostetler




On 8/13/2018 6:41 PM, Junio C Hamano wrote:

Jeff King  writes:


I can buy the argument that it's nice to have some form of profiling
that works everywhere, even if it's lowest-common-denominator. I just
wonder if we could be investing effort into tooling around existing
solutions that will end up more powerful and flexible in the long run.


Another thing I noticed is that the codepaths we would find
interesting to annotate with trace_performance_* stuff often
overlaps with the "slog" thing.  If the latter aims to eventually
replace GIT_TRACE (and if not, I suspect there is not much point
adding it in the first place), perhaps we can extend it to also
cover the need of these trace_performance_* calls, so that we do not
have to carry three different tracing mechanisms.



I'm looking at adding code to my SLOG (better name suggestions welcome)
patch series to eventually replace the existing git_trace facility.
And I would like to have a set of nested messages like Duy has proposed
be a part of that.

In an independent effort I've found the nested messages being very
helpful in certain contexts.  They are not a replacement for the
various platform tools, like PerfView and friends as discussed earlier
on this thread, but then again I can ask a customer to turn a knob and
run it again and send me the output and hopefully get a rough idea of
the problem -- without having them install a bunch of perf tools.

Jeff



Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jeff Hostetler




On 8/13/2018 2:14 PM, Matthew DeVore wrote:

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore 
---
  Documentation/rev-list-options.txt |  3 ++
  list-objects-filter-options.c  |  4 +++
  list-objects-filter-options.h  |  1 +
  list-objects-filter.c  | 50 ++
  t/t5317-pack-objects-filter-objects.sh | 27 ++
  t/t5616-partial-clone.sh   | 27 ++
  t/t6112-rev-list-filters-objects.sh| 13 +++
  7 files changed, 125 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..9e351ec2a 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,9 @@ specification contained in <path>.
A debug option to help with future "partial clone" development.
This option specifies how missing objects are handled.
  +
+The form '--filter=tree:<depth>' omits all blobs and trees deeper than
+<depth> from the root tree. Currently, only <depth>=0 is supported.
++
  The form '--missing=error' requests that rev-list stop with an error if
  a missing object is encountered.  This is the default action.
  +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
return 0;
}
  
+	} else if (!strcmp(arg, "tree:0")) {

+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
} else if (skip_prefix(arg, "sparse:oid=", )) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
return d;
  }
  
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+   struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+   enum list_objects_filter_situation filter_situation,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_trees_none_data *filter_data = filter_data_;
+
+   switch (filter_situation) {
+   default:
+   die("unknown filter_situation");
+   return LOFR_ZERO;
+
+   case LOFS_BEGIN_TREE:
+   case LOFS_BLOB:
+   if (filter_data->omits)
+   oidset_insert(filter_data->omits, &obj->oid);
+   return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+   case LOFS_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   }
+}


There are a couple of options here:
[] If really want to omit all trees and blobs (and we DO NOT want
   the oidset of everything omitted), then we might be able to
   shortcut the traversal and speed things up.

   {} add a LOFR_SKIP_TREE bit to list_objects_filter_result
   {} test this bit process_tree() and avoid the init_tree_desc() and
  the while loop and some adjacent setup/tear-down code.
   {} make this filter something like:

case LOFS_BEGIN_TREE:
if (filter_data->omits) {
oidset_insert(filter_data->omits, &obj->oid);
return LOFR_MARK_SEEN; /* ... (hard omit) */
} else
return LOFR_SKIP_TREE;

Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Jeff Hostetler




On 8/13/2018 3:02 PM, Johannes Sixt wrote:

The Windows CRT implements O_APPEND "manually": on write() calls, the
file pointer is set to EOF before the data is written. Clearly, this is
not atomic. And in fact, this is the root cause of failures observed in
t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh, where
different processes write to the same trace file simultaneously; it also
occurred in t5400-send-pack.sh, but there it was worked around in
71406ed4d6 ("t5400: avoid concurrent writes into a trace file",
2017-05-18).

Fortunately, Windows does support atomic O_APPEND semantics using the
file access mode FILE_APPEND_DATA. Provide an implementation that does.

This implementation is minimal in such a way that it only implements
the open modes that are actually used in the Git code base. Emulation
for other modes can be added as necessary later. To become aware of
the necessity early, the unusual error ENOSYS is reported if an
unsupported mode is encountered.

Diagnosed-by: Johannes Schindelin 
Helped-by: Jeff Hostetler 
Signed-off-by: Johannes Sixt 
---
  compat/mingw.c | 41 +++--
  1 file changed, 39 insertions(+), 2 deletions(-)



[...]

This looks good.  Thanks for following up on this.

Jeff


Re: [PATCH v2 5/5] list-objects-filter: implement filter tree:none

2018-08-13 Thread Jeff Hostetler




On 8/10/2018 7:06 PM, Matthew DeVore wrote:

Teach list-objects the "tree:none" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:none - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

Signed-off-by: Matthew DeVore 
---
  Documentation/rev-list-options.txt |  2 ++
  list-objects-filter-options.c  |  4 +++
  list-objects-filter-options.h  |  1 +
  list-objects-filter.c  | 49 +++---
  t/t5317-pack-objects-filter-objects.sh | 27 ++
  t/t5616-partial-clone.sh   | 27 ++
  t/t6112-rev-list-filters-objects.sh| 13 +++
  7 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..68b4b9552 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,8 @@ specification contained in <path>.
A debug option to help with future "partial clone" development.
This option specifies how missing objects are handled.
  +
+The form '--filter=tree:none' omits all blobs and trees.
++
  The form '--missing=error' requests that rev-list stop with an error if
  a missing object is encountered.  This is the default action.
  +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..523cb00a0 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
return 0;
}
  
+	} else if (!strcmp(arg, "tree:none")) {

+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
} else if (skip_prefix(arg, "sparse:oid=", )) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..22c894093 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -26,38 +26,45 @@
  #define FILTER_SHOWN_BUT_REVISIT (1<<21)
  
   /*
- * A filter for list-objects to omit ALL blobs from the traversal.
- * And to OPTIONALLY collect a list of the omitted OIDs.
+ * A filter for list-objects to omit ALL blobs from the traversal, and possibly
+ * trees as well.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
   */
-struct filter_blobs_none_data {
+struct filter_none_of_type_data {
+   /* blobs are always omitted */
+   unsigned omit_trees : 1;
struct oidset *omits;
  };
  


I'm not sure I'd convert the existing filter types.
When I created this file, I created a set of function pairs
for each filter type:
filter_<type>() and filter_<type>__init()

with the latter being added to the s_filters[] array and created
a choice enum having corresponding values
LOFC_<TYPE>

Here you're adding a new _init() and LOFC_ key, but mapping both
the original "blob:none" and the new "tree:none" to a combined
filter function and blends these 2 modes.

Style-wise, I'd keep the original filters as they were and add a
new function pair for the new tree:none filter.  Then you can
simplify the logic inside your new filter.  For example, in your
filter "filter_data->omit_trees" will always be true, so you can
just do the "if (filter_data->omits) oidset_insert(...); return _SEEN"
and not have the fallthru stuff -- or get rid of the asserts() and put
the case labels together.

One of the things I wanted to do (when I found some free time) was to
add a "tree:none" and maybe a "tree:root" filter.  (The latter only
including the root trees associated with the fetched commits, since
there are/were some places where we implicitly also load the root tree
when loading the commit object.)  So in that vein, it might be that we
would want a "tree:" filter instead with 0 = none and 1 = root.
I wasn't ready to propose that when I did the filtering, but I had that
in mind.  (And is partially why I suggest keeping your new filter
independent of the existing ones.)
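
For illustration only, the pairing convention described above looks
roughly like this (simplified stand-in types, not git's actual
list-objects-filter.c declarations):

    typedef int (*filter_fn)(void *filter_data);

    static int filter_trees_none(void *filter_data)
    {
            return 0; /* decide the fate of one object here */
    }

    static void *filter_trees_none__init(filter_fn *fn_out)
    {
            *fn_out = filter_trees_none;
            return NULL; /* per-filter data would be allocated here */
    }

    /* s_filters[] then maps each LOFC_<TYPE> value to its __init() */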

Jeff



-static enum list_objects_filter_result filter_blobs_none(
+static enum list_objects_filter_result filter_none_of_type(
 

Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

2018-08-13 Thread Jeff Hostetler




On 8/12/2018 5:07 AM, Nguyễn Thái Ngọc Duy wrote:

Paths that only differ in case work fine in case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".

This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.

This patch is tested with vim-colorschemes repository on a JFS partition
with case insensitive support on Linux. This repository has two files
darkBlue.vim and darkblue.vim.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
  v4 removes nr_duplicates (and fixes that false warning Szeder
  reported). It also hints about case insensitivity as a cause of
  problem because it's most likely the case when this warning shows up.

  builtin/clone.c  |  1 +
  cache.h  |  1 +
  entry.c  | 28 
  t/t5601-clone.sh |  8 +++-
  unpack-trees.c   | 28 
  unpack-trees.h   |  1 +
  6 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5c439f1394..0702b0e9d0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -747,6 +747,7 @@ static int checkout(int submodule_progress)
memset(&opts, 0, sizeof opts);
opts.update = 1;
opts.merge = 1;
+   opts.clone = 1;
opts.fn = oneway_merge;
opts.verbose_update = (option_verbosity >= 0);
opts.src_index = &the_index;
diff --git a/cache.h b/cache.h
index 8b447652a7..6d6138f4f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,7 @@ struct checkout {
unsigned force:1,
 quiet:1,
 not_new:1,
+clone:1,
 refresh_cache:1;
  };
  #define CHECKOUT_INIT { NULL, "" }
diff --git a/entry.c b/entry.c
index b5d1d3cf23..c70340df8e 100644
--- a/entry.c
+++ b/entry.c
@@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
return lstat(path, st);
  }
  
+static void mark_colliding_entries(const struct checkout *state,
+  struct cache_entry *ce, struct stat *st)
+{
+   int i;
+
+   ce->ce_flags |= CE_MATCHED;
+
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
+   for (i = 0; i < state->istate->cache_nr; i++) {
+   struct cache_entry *dup = state->istate->cache[i];
+
+   if (dup == ce)
+   break;
+
+   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+   continue;
+
+   if (dup->ce_stat_data.sd_ino == st->st_ino) {
+   dup->ce_flags |= CE_MATCHED;
+   break;
+   }
+   }
+#endif
+}
+
  /*
   * Write the contents from ce out to the working tree.
   *
@@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce,
return -1;
}
  
+		if (state->clone)
+   mark_colliding_entries(state, ce, &st);
+
/*
 * We unlink the old file, to get the new one with the
 * right permissions (including umask, which is nasty
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 0b62037744..f2eb73bc74 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' '
git hash-object -w -t tree --stdin) &&
c=$(git commit-tree -m bogus $t) &&
git update-ref refs/heads/bogus $c &&
-   git clone -b bogus . bogus
+   git clone -b bogus . bogus 2>warning
)
  '
  
+test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
+   grep X icasefs/warning &&
+   grep x icasefs/warning &&
+   test_i18ngrep "the following paths have collided" icasefs/warning
+'
+
  partial_clone () {
   SERVER="$1" &&
   URL="$2" &&
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..443df048ef 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o)
state.refresh_cache = 1;
state.istate = index;
  
+	if (o->clone) {
+   state.clone = 1;
+   for (i = 0; i < index->cache_nr; i++)
+   index->cache[i]->ce_flags &= ~CE_MATCHED;
+   }
+
progress = get_progress(o);
  
  	if (o->update)

@@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o)
errs |= finish_delayed_checkout(&state);
if (o->update)

Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff Hostetler




On 8/10/2018 12:51 PM, Jeff Hostetler wrote:



On 8/10/2018 12:15 PM, Johannes Sixt wrote:

On 09.08.2018 at 19:35, Johannes Schindelin via GitGitGadget wrote:
I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.


Thanks for digging down to the root cause. As has been said, the 
behavior of O_APPEND is well-defined under POSIX, but last time I 
looked for equivalent feature on Windows, I did not find any.


Last time was when I worked around the same failure in 
t5503-tagfollow.sh in my private builds:


https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 



It is basically the same as Peff suggests: log only one side of the 
fetch.


As this buglet looks like a recurring theme, a proper fix is 
preferable over repeated work-arounds. To me it looks like we need 
some sort of locking on Windows. Unless your friends at Microsoft have 
an ace in their sleeves that lets us have atomic O_APPEND the POSIX 
way...


The official Windows CRT for open() does document an O_APPEND, but after
looking at the distributed source, I'm not sure I trust it.

I have not looked at the MSYS version of the CRT yet so I cannot comment
on it.

I did find that the following code does what we want.  Notice the use of
FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
are very vague here.)


d'oh

s/GENERIC_READ/GENERIC_WRITE/



Before we try a locking solution, perhaps we should tweak mingw_open()
to use something like this to get a proper HANDLE directly and then use
_open_osfhandle() to get a "fd" for it.

Jeff

-

#include <windows.h>

int main()
{
 HANDLE h = CreateFileW(L"foo.txt",
     FILE_APPEND_DATA,
     FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
     NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

 const char * buf = "this is a test\n";
 DWORD len = strlen(buf);
 DWORD nbw;

 WriteFile(h, buf, len, &nbw, NULL);
 CloseHandle(h);

 return 0;
}
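
Extending that, the mingw_open() tweak mentioned above might look roughly
like the following sketch (assumptions: error handling elided, and the
open_append name and O_CREAT translation are illustrative, untested):

#include <windows.h>
#include <io.h>
#include <fcntl.h>

/* sketch: open a path for atomic append and wrap it in a CRT fd */
static int open_append(const wchar_t *wname, int oflags)
{
 HANDLE h = CreateFileW(wname,
     FILE_APPEND_DATA,
     FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
     NULL,
     (oflags & O_CREAT) ? OPEN_ALWAYS : OPEN_EXISTING,
     FILE_ATTRIBUTE_NORMAL, NULL);

 if (h == INVALID_HANDLE_VALUE)
     return -1;

 /* hand the HANDLE to the CRT so callers can keep using write() */
 return _open_osfhandle((intptr_t)h, oflags);
}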


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff Hostetler




On 8/10/2018 12:15 PM, Johannes Sixt wrote:

On 09.08.2018 at 19:35, Johannes Schindelin via GitGitGadget wrote:
I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.


Thanks for digging down to the root cause. As has been said, the 
behavior of O_APPEND is well-defined under POSIX, but last time I looked 
for equivalent feature on Windows, I did not find any.


Last time was when I worked around the same failure in 
t5503-tagfollow.sh in my private builds:


https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742

It is basically the same as Peff suggests: log only one side of the fetch.

As this buglet looks like a recurring theme, a proper fix is 
preferable over repeated work-arounds. To me it looks like we need some 
sort of locking on Windows. Unless your friends at Microsoft have an ace 
in their sleeves that lets us have atomic O_APPEND the POSIX way...


The official Windows CRT for open() does document an O_APPEND, but after
looking at the distributed source, I'm not sure I trust it.

I have not looked at the MSYS version of the CRT yet so I cannot comment
on it.

I did find that the following code does what we want.  Notice the use of
FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
are very vague here.)

Before we try a locking solution, perhaps we should tweak mingw_open()
to use something like this to get a proper HANDLE directly and then use
_open_osfhandle() to get a "fd" for it.

Jeff

-

#include <windows.h>

int main()
{
HANDLE h = CreateFileW(L"foo.txt",
FILE_APPEND_DATA,
FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

const char * buf = "this is a test\n";
DWORD len = strlen(buf);
DWORD nbw;

WriteFile(h, buf, len, &nbw, NULL);
CloseHandle(h);

return 0;
}


Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems

2018-08-09 Thread Jeff Hostetler




On 8/9/2018 10:23 AM, Jeff King wrote:

On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote:


If we have an equivalence-class hashmap and feed it inodes (or again,
some system equivalent) as the keys, we should get buckets of
collisions.


I guess one way to get "some system equivalent" that can be used as
the last resort, when there absolutely is no inum equivalent, is to
rehash the working tree file that shouldn't be there when we detect
a collision.

If we found that there is something when we tried to write out
"Foo.txt", if we open "Foo.txt" on the working tree and hash-object
it, we should find the matching blob somewhere in the index _before_
"Foo.txt".  On a case-insensitive filesytem, it may well be
"foo.txt", but we do not even have to know "foo.txt" and "Foo.txt"
only differ in case.


Clever. You might still run into false positives when there is
duplicated content in the repository (especially, say, zero-length
files).  But the fact that you only do the hashing on known duplicates
helps with that.



I worry that the false positives make this a non-starter.  I mean, if
clone creates files 'A' and 'B' (both equal) and then tries to create
'b', would the collision code reports that 'b' collided with 'A' because
that was the first OID match?  Ideally with this scheme we'd have to
search the entire index prior to 'b' and then report that 'b' collided
with either 'A' or 'B'.  Neither message instills confidence.  And
there's no way to prefer answer 'B' over 'A' without using knowledge
of the FS name mangling/aliasing rules -- unless we want to just assume
ignore-case for this iteration.


One of the things I did like about the equivalence-class approach is
that it can be done in a single linear pass in the worst case. Whereas
anything that searches when we see a collision is quite likely to be
quadratic. But as I said before, it may not be worth worrying too much
about that for an error code path where we expect the number of
collisions to be small.

-Peff



Sorry to be paranoid, but I have an index with 3.5M entries, the word
"quadratic" rings all kinds of alarm bells for me.  :-)

Granted, we expect the number of collisions to be small, but searching
back for each collision over the already-populated portion of the index
could be expensive.

Jeff



Re: [PATCH v1 01/25] structured-logging: design document

2018-08-09 Thread Jeff Hostetler




On 8/3/2018 11:26 AM, Ben Peart wrote:



On 7/13/2018 12:55 PM, g...@jeffhostetler.com wrote:

From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
  Documentation/technical/structured-logging.txt | 816 +

  1 file changed, 816 insertions(+)
  create mode 100644 Documentation/technical/structured-logging.txt

diff --git a/Documentation/technical/structured-logging.txt b/Documentation/technical/structured-logging.txt
new file mode 100644
index 000..794c614
--- /dev/null
+++ b/Documentation/technical/structured-logging.txt
@@ -0,0 +1,816 @@
+Structured Logging
+==================
+
+Structured Logging (SLOG) is an optional feature to allow Git to
+generate structured log data for executed commands.  This includes
+command line arguments, command run times, error codes and messages,
+child process information, time spent in various critical functions,
+and repository data-shape information.  Data is written to a target
+log file in JSON[1,2,3] format.
+
+SLOG is disabled by default.  Several steps are required to enable it:
+
+1. Add the compile-time flag "STRUCTURED_LOGGING=1" when building git
+   to include the SLOG routines in the git executable.
+


Is the intent to remove this compile-time flag before this is merged? 
With it off by default in builds, the audience for this is limited to 
those who build their own/custom versions of git. I can see other 
organizations wanting to use this that don't have a custom fork of git 
they build and install on their users' machines. 


Like the GIT_TRACE mechanism today, I think this should be compiled in 
but turned off via the default settings by default.


I would like to get rid of this compile-time flag and just have it
be available to those who want to use it.  And defaulted to off.
But I wasn't sure what kind of reaction or level of interest this
feature would receive from the mailing list.



+2. Set "slog.*" config settings[5] to enable SLOG in your repo.
+
+
+Motivation
+==========
+
+Git users may be faced with scenarios that are surprisingly slow or
+produce unexpected results.  And Git developers may have difficulty
+reproducing these experiences.  Structured logging allows users to
+provide developers with additional usage, performance and error data
+that can help diagnose and debug issues.
+
+Many Git hosting providers and users with many developers have bespoke
+efforts to help troubleshoot problems; for example, command wrappers,
+custom pre- and post-command hooks, and custom instrumentation of Git
+code.  These are inefficient and/or difficult to maintain.  The goal
+of SLOG is to provide this data as efficiently as possible.
+
+And having structured rather than free format log data, will help
+developers with their analysis.
+
+
+Background (Git Merge 2018 Barcelona)
+=
+
+Performance and/or error logging was discussed during the contributor's
+summit in Barcelona.  Here are the relevant notes from the meeting
+minutes[6].
+
+> Performance misc (Ævar)
+> ---
+> [...]
+>  - central error reporting for git
+>    - `git status` logging
+>    - git config that collects data, pushes to known endpoint with 
`git push`

+>    - pre_command and post_command hooks, for logs
+>    - `gvfs diagnose` that looks at packfiles, etc
+>    - detect BSODs, etc
+>    - Dropbox writes out json with index properties and command-line
+>    information for status/fetch/push, fork/execs external tool 
to upload

+>    - windows trace facility; would be nice to have cross-platform
+>    - would hosting providers care?
+>    - zipfile of logs to give when debugging
+>    - sanitizing data is harder
+>    - more in a company setting
+>    - fileshare to upload zipfile
+>    - most of the errors are proxy when they shouldn't, wrong proxy, 
proxy

+>    specific to particular URL; so upload endpoint wouldn't work
+>    - GIT_TRACE is supposed to be that (for proxy)
+>    - but we need more trace variables
+>    - series to make tracing cheaper
+>    - except that curl selects the proxy
+>    - trace should have an API, so it can call an executable
+>    - dump to .git/traces/... and everything else happens externally
+>    - tools like visual studio can't set GIT_TRACE, so
+>    - sourcetree has seen user environments where commands just take 
forever
+>    - third-party tools like perf/strace - could we be better 
leveraging those?

+>    - distribute turn-key solution to handout to collect more data?
+


While it makes sense to have clear goals in the design document, the 
motivation and background sections feel somehow out of place.  I'd 
recommend you clearly articulate the design goals and drop the 
background data that led you to the goals.


good point.  thanks.



+
+A Quick Example
+===
+
+Note: JSON pretty-printing is enabled in all of the exa

Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems

2018-08-08 Thread Jeff Hostetler




On 8/7/2018 3:31 PM, Junio C Hamano wrote:

Nguyễn Thái Ngọc Duy   writes:


  One nice thing about this is we don't need platform specific code for
  detecting the duplicate entries. I think ce_match_stat() works even
  on Windows. And it's now equally expensive on all platforms :D


ce_match_stat() may not be a very good measure to see if two paths
refer to the same file, though.  After a fresh checkout, I would not
be surprised if two completely unrelated paths have the same size
and have same mtime/ctime.  In its original use case, i.e. "I have
one specific path in mind and took a snapshot of its metadata
earlier.  Is it still the same, or has it been touched?", that may
be sufficient to detect that the path has been _modified_, but
without reliable inum, it may be a poor measure to say two paths
refer to the same.


I agree with Junio on this one.  The mtime values are sloppy at best.
On FAT file systems, they have 2 second resolution.  Even NTFS IIRC
has only 100ns resolution, so we might get a lot of false matches
using this technique, right?

It might be better to build an equivalence-class hash-map for the
colliding entries.  Compute a "normalized" version of the pathname
(such as convert to lowercase, strip final-dots/spaces, strip the
digits following the tilde of a shortname, etc. for the Mac's UTF-isms).
Then when you rescan the index entries to find the matches, apply the
equivalence operator on the pathname and do the hashmap lookup.
When you find a match, you have a "potential" collider pair (I say
potential only because of the ambiguity of shortnames).  Then we
can use inum/file-index/whatever to see if they actually collide.
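
To make that concrete, a minimal sketch of such a normalization (the
folding rules shown are illustrative assumptions; a real version would
also need the Mac UTF handling and shortname digits):

    #include <ctype.h>
    #include <string.h>

    /* sketch: fold a path into its equivalence-class key, in place */
    static void normalize_for_collision(char *name)
    {
            size_t i, len = strlen(name);

            while (len && (name[len - 1] == '.' || name[len - 1] == ' '))
                    name[--len] = '\0';     /* strip final dots/spaces */

            for (i = 0; i < len; i++)
                    name[i] = tolower((unsigned char)name[i]);  /* case fold */
    }

Entries whose keys match share a bucket, and only those pairs need the
inum/file-index check.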


Jeff


Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems

2018-08-05 Thread Jeff Hostetler




On 8/3/2018 2:53 PM, Jeff King wrote:

On Fri, Aug 03, 2018 at 02:23:17PM -0400, Jeff Hostetler wrote:


Maybe. It might not work as ino_t. Or it might be expensive to get.  Or
maybe it's simply impossible. I don't know much about Windows. Some
searching implies that NTFS does have a "file index" concept which is
supposed to be unique.


This is hard and/or expensive on Windows.  Yes, you can get the
"file index" values for an open file handle with a cost similar to
an fstat().  Unfortunately, the FindFirst/FindNext routines (equivalent
to the opendir/readdir routines), don't give you that data.  So we'd
have to scan the directory and then open and stat each file.  This is
terribly expensive on Windows -- and the reason we have the fscache
layer (in the GfW version) to intercept the lstat() calls whenever
possible.


I think that high cost might be OK for our purposes here. This code
would _only_ kick in during a clone, and then only on the error path
once we knew we had a collision during the checkout step.



Good point.

I've confirmed that the "file index" values can be used to determine
whether 2 path names are equivalent under NTFS for case variation,
final-dot/space, and short-names vs long-names.
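
For reference, that check can be done with the Win32
GetFileInformationByHandle() call; a rough sketch (error handling
trimmed; not the exact code used for the confirmation above):

    #include <windows.h>

    /* sketch: do two names refer to the same file? 1/0, -1 on error */
    static int same_file_index(const wchar_t *a, const wchar_t *b)
    {
            BY_HANDLE_FILE_INFORMATION ia, ib;
            int ret = -1;
            HANDLE ha = CreateFileW(a, 0, FILE_SHARE_READ, NULL,
                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
            HANDLE hb = CreateFileW(b, 0, FILE_SHARE_READ, NULL,
                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

            if (ha != INVALID_HANDLE_VALUE && hb != INVALID_HANDLE_VALUE &&
                GetFileInformationByHandle(ha, &ia) &&
                GetFileInformationByHandle(hb, &ib))
                    ret = ia.dwVolumeSerialNumber == ib.dwVolumeSerialNumber &&
                          ia.nFileIndexHigh == ib.nFileIndexHigh &&
                          ia.nFileIndexLow == ib.nFileIndexLow;

            if (ha != INVALID_HANDLE_VALUE)
                    CloseHandle(ha);
            if (hb != INVALID_HANDLE_VALUE)
                    CloseHandle(hb);
            return ret;
    }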

I ran out of time this morning to search the directory for equivalent 
paths.  I'll look at that shortly.


Jeff


Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems

2018-08-03 Thread Jeff Hostetler




On 8/2/2018 5:28 PM, Jeff King wrote:

On Thu, Aug 02, 2018 at 02:14:30PM -0700, Junio C Hamano wrote:


Jeff King  writes:


I also wonder if Windows could return some other file-unique identifier
that would work in place of an inode here. That would be pretty easy to
swap in via an #ifdef's helper function. I'd be OK shipping without that
and letting Windows folks fill it in later (as long as we do not do
anything too stupid until then, like claim all of the inode==0 files are
the same).


Yeah, but such a useful file-unique identifier would probably be
used in place of inum in their (l)stat emulation already, if exists,
no?


Maybe. It might not work as ino_t. Or it might be expensive to get.  Or
maybe it's simply impossible. I don't know much about Windows. Some
searching implies that NTFS does have a "file index" concept which is
supposed to be unique.


This is hard and/or expensive on Windows.  Yes, you can get the
"file index" values for an open file handle with a cost similar to
an fstat().  Unfortunately, the FindFirst/FindNext routines (equivalent
to the opendir/readdir routines), don't give you that data.  So we'd
have to scan the directory and then open and stat each file.  This is
terribly expensive on Windows -- and the reason we have the fscache
layer (in the GfW version) to intercept the lstat() calls whenever
possible.

It might be possible to use the NTFS Master File Table to discover
this (very big handwave), but I would need to do a little digging.

This would all be NTFS specific.  FAT and other volume types would not
be covered.

Another thing to keep in mind is that the collision could be because
of case folding (or other such nonsense) on a directory in the path.
I mean, if someone on Linux builds a commit containing:

a/b/c/D/e/foo.txt
a/b/c/d/e/foo.txt

we'll get a similar collision as if one of them were spelled "FOO.txt".

Also, do we need to worry about hard-links or symlinks here?
If checkout populates symlinks, then you might have another collision
opportunity.  For example:

a/b/c/D/e/foo.txt
a/link -> ./b/c/d
a/link/e/foo.txt

Also, some platforms (like the Mac) allow directory hard-links.
Granted, Git doesn't create hard-links during checkout, but the
user might.

I'm sure there are other edge cases here that make reporting
difficult; these are just a few I thought of.  I guess what I'm
trying to say is that as a first step just report that you found
a collision -- without trying to identify the set existing objects
that it collided with.



At any rate, until we have an actual plan for Windows, I think it would
make sense only to split the cases into "has working inodes" and
"other", and make sure "other" does something sensible in the meantime
(like mention the conflict, but skip trying to list duplicates).


Yes, this should be split.  Do the "easy" Linux version first.
Keep in mind that there may also be a different solution for the Mac.


When somebody wants to work on Windows support, then we can figure out
if it just needs to wrap the "get unique identifier" operation, or if it
would use a totally different algorithm.

-Peff



Jeff


Re: Git clone and case sensitivity

2018-07-31 Thread Jeff Hostetler




On 7/29/2018 5:28 AM, Jeff King wrote:

On Sun, Jul 29, 2018 at 07:26:41AM +0200, Duy Nguyen wrote:


strcasecmp() will only catch a subset of the cases. We really need to
follow the same folding rules that the filesystem would.


True. But that's how we handle case insensitivity internally. If a
filesytem has more sophisticated folding rules then git will not work
well on that one anyway.


Hrm. Yeah, I guess that's the best we can do for the actual in-memory
checks. Everything else depends on doing an actual filesystem operation,
and our icase stuff kicks in way before then. I was mostly thinking of
HFS+ utf8 normalization weirdness, but I guess people are accustomed to
that by now.


For the case of clone, I actually wonder if we could detect during the
checkout step that a file already exists. Since we know that the
directory we started with was empty, then if it does, either:

   - there's some funny case-folding going on that means two paths in the
 repository map to the same name in the filesystem; or

   - somebody else is writing to the directory at the same time as us


This is exactly what my first patch does (minus the sparse checkout
part).


Right, sorry, I should have read that one more carefully.


But without knowing the exact folding rules, I don't think we can
locate this "somebody else" who wrote the first path. So if N paths
are treated the same by this filesystem, we could only report N-1 of
them.

If we want to report just one path when this happens though, then this
works quite well.


Hmm. Since most such systems are case-preserving, would it be possible
to report the name of the existing file? Doing it via opendir/readdir is
hacky, and anyway puts the burden on us to find the matching name. Doing
it via fstat() on the opened file doesn't work because at that point the
filesystem has resolved the name to an inode.

So yeah, perhaps strcasecmp() is the best we can do (I do agree that
being able to mention all of the conflicting names is a benefit).

I guess we should be using fspathcmp(), though, in case it later learns
to be smarter.

-Peff



As has already been mentioned, this gets into weird territory really
fast, between case folding, final space/dot on windows, utf8 NFC/NFD
weirdness on the mac, utf8 invisible chars on the mac, long/short names
on windows, and etc.

And that's just for filenames.  Things really get weird if directory
names have these ambiguities.

Perhaps just print the problematic paths (where the collision is
detected) and let the user decide how to correct them.

Perhaps we could have a separate tool that could scan the index or
commit for potential conflicts and warn about them in advance (granted, it
might not be perfect and may report a few false positives).

Forcing them into a sparse-checkout situation might be over their
skill level.

Jeff
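
A sketch of that advance-warning scan, assuming git's in-memory index and
the existing fspathcmp() helper (which is strcasecmp-based when
core.ignorecase is set).  It is quadratic and would miss folding rules
fancier than ASCII case, so treat it as a shape, not a tool:

#include "cache.h"
#include "dir.h"

static void report_potential_collisions(struct index_state *istate)
{
	unsigned int i, j;

	for (i = 0; i < istate->cache_nr; i++)
		for (j = i + 1; j < istate->cache_nr; j++)
			if (!fspathcmp(istate->cache[i]->name,
				       istate->cache[j]->name))
				warning("potential collision: '%s' vs '%s'",
					istate->cache[i]->name,
					istate->cache[j]->name);
}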


Re: [PATCH v1 03/25] structured-logging: add structured logging framework

2018-07-27 Thread Jeff Hostetler




On 7/26/2018 5:09 AM, SZEDER Gábor wrote:



+void slog_set_command_name(const char *command_name)
+{
+   /*
+* Capture the command name even if logging is not enabled
+* because we don't know if the config has been loaded yet by
+* the cmd_<command>() and/or it may be too early to force a
+* lazy load.
+*/
+   if (my__command_name)
+   free(my__command_name);
+   my__command_name = xstrdup(command_name);
+}
+
+void slog_set_sub_command_name(const char *sub_command_name)
+{
+   /*
+* Capture the sub-command name even if logging is not enabled
+* because we don't know if the config has been loaded yet by
+* the cmd_<command>() and/or it may be too early to force a
+* lazy load.
+*/
+   if (my__sub_command_name)
+   free(my__sub_command_name);


Please drop the condition in these two functions; free() handles NULL
arguments just fine.


sure.



(Sidenote: what's the deal with these 'my__' prefixes anyway?)



simply a way to identify file-scope variables and distinguish
them from local variables.

Jeff


Re: [PATCH v1] config.c: fix msvc compile error

2018-07-25 Thread Jeff Hostetler




On 7/24/2018 3:56 PM, Taylor Blau wrote:

On Tue, Jul 24, 2018 at 03:30:10PM +, g...@jeffhostetler.com wrote:

From: Jeff Hostetler 

In commit fb0dc3bac135e9f6243bd6d293e8c9293c73b9cd code was added
to builtin/config.c to define a new function and a forward declaration
for an array of unknown size.  This causes a compile error under MSVC.


Thanks for spending time fixing that. fb0dc3bac1 (builtin/config.c:
support `--type=<type>` as preferred alias for `--<type>`, 2018-04-18)
is from me, so I owe you an extra thanks for patching up mistakes :-).

As others have noted in this thread, another patch was sent into similar
effect, which has already been queued, and I agree that we should prefer
that, since it's further along.


Thanks,
Taylor



Yes, the other version is further along.  Let's take it.
Jeff


Re: [PATCH v1] msvc: fix non-standard escape sequence in source

2018-07-25 Thread Jeff Hostetler




On 7/24/2018 2:13 PM, Beat Bolli wrote:

Hi Jeff

On 24.07.18 16:42, g...@jeffhostetler.com wrote:

From: Jeff Hostetler 

Replace non-standard "\e" escape sequence with "\x1B".


This was already fixed in <20180708144342.11922-4-dev+...@drbeat.li>.

Cheers,
Beat



Thanks for the pointer.  I see that commit is in 'next'.
I was only looking in 'master'.

Thanks,
Jeff



Re: Git 2.18: RUNTIME_PREFIX... is it working?

2018-07-20 Thread Jeff Hostetler




On 7/19/2018 6:26 PM, brian m. carlson wrote:

On Wed, Jul 18, 2018 at 02:18:44PM +0200, Johannes Schindelin wrote:

On Sat, 14 Jul 2018, brian m. carlson wrote:

I will say that at cPanel, we have a configuration where end users can
end up inside a mount namespace without /proc (depending on the
preferences of the administrator).  However, it's easy enough for us to
simply build without RUNTIME_PREFIX if necessary.

If we turn it on by default, it would be nice if we documented (maybe in
the Makefile) that it requires /proc on Linux for the benefit of other
people who might be in a similar situation.


Is there *really* no other way on Linux to figure out the absolute path of
the current executable than to open a pseudo file in the `/proc` file
system?


Nope, not that I'm aware of.  You have to read the destination of
the /proc/PID/exe symlink.



Getting the full path of the current executable is a very Windows thing.
On most Unix-based systems it just isn't possible (even if Linux does
have the /proc thing).  Think about hard-links, for example.  There just
isn't a single canonical pathname for an inode.

Jeff
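
For reference, the /proc dance being discussed is essentially the
following, and it fails in exactly the no-/proc mount-namespace case
described above (Linux-only sketch):

#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* Resolve the running executable's path; returns 0 on success. */
static int get_exe_path(char *buf, size_t len)
{
	ssize_t n = readlink("/proc/self/exe", buf, len - 1);

	if (n < 0)
		return -1;	/* e.g. /proc is not mounted here */
	buf[n] = '\0';
	return 0;
}

int main(void)
{
	char path[PATH_MAX];

	if (!get_exe_path(path, sizeof(path)))
		printf("running from: %s\n", path);
	return 0;
}

And even when it works, it reports *a* path to the inode, not a
canonical one, which is the point above about hard-links.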


Re: [RFC] push: add documentation on push v2

2018-07-20 Thread Jeff Hostetler




On 7/18/2018 1:15 PM, Brandon Williams wrote:

On 07/18, Stefan Beller wrote:

On Wed, Jul 18, 2018 at 6:31 AM Derrick Stolee  wrote:


On 7/17/2018 7:25 PM, Stefan Beller wrote:

On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams  wrote:

Signed-off-by: Brandon Williams 
---

Since introducing protocol v2 and enabling fetch I've been thinking
about what its inverse 'push' would look like.  After talking with a
number of people I have a longish list of things that could be done to
improve push and I think I've been able to distill the core features we
want in push v2.

It would be nice to know which things you want to improve.


Hopefully we can also get others to chime in with things they don't like
about the existing protocol. What pain points exist, and what can we do
to improve at the transport layer before considering new functionality?


Another thing that I realized last night was the possibility to chunk requests.
The web of today is driven by lots of small http(s) requests. I know our server
team fights with the internal tools all the time because the communication
involved in git-fetch is usually a large http request (large packfile).
So it would be nice to have the possibility of chunking the request.
But I think that can be added as a capability? (Not sure how)


Fetch and push requests/responses are already "chunked" when using the
http transport.  So I'm not sure what you mean by adding a capability
because the protocol doesn't care about which transport you're using.
This is of course unless you're talking about a different "chunking"
from what it means to chunk an http request/response.



Internally, we've talked about wanting to have resumable pushes and
fetches.  I realize this is difficult to do when the server is
replicated and the repeated request might be talking to a different
server instance.  And there's a problem with temp files littering the
server as it waits for the repeated attempt.  But still, the packfile
sent/received can be large and connections do get dropped.

That is, if we think about sending 1 large packfile and just using a
byte-range-like approach to resuming the transfer.

If we allowed the request to send a series of packfiles, with each
"chunk" being self-contained and usable.  So if a push connection was
dropped the server could apply the successfully received packfile(s)
(add the received objects and update the refs to the commits received so
far).  And ignore the interrupted and unreceived packfile(s) and let the
client retry later.  When/if the client retried the push, it would
renegotiate haves/wants and send a new series of packfile(s).  With the
assumption being that the server would have updated refs from the
earlier aborted push, so the packfile(s) computed for the second attempt
would not repeat the content successfully transmitted in the first
attempt.

This would require that the client build an ordered set of packfiles
from oldest to newest so that the server can apply them in-order and
the graph remain connected.  That may be outside your scope here.

Also, we might have to add a few messages to the protocol after the
negotiation, for the client to say that it is going to send the push
content in 'n' packfiles and send 'n' messages with the intermediate
ref values being updated in each packfile.

Just thinking out loud here.
Jeff


Re: [PATCH v1 00/25] RFC: structured logging

2018-07-16 Thread Jeff Hostetler




On 7/13/2018 2:51 PM, David Lang wrote:
Please make an option for git to write these logs to syslog, not just a 
local file. Every modern syslog daemon has lots of tools to be able to 
deal with json messages well.


David Lang


That is certainly possible and we can easily add it in a later draft,
but for now I'd like to stay platform-neutral and just log events to a
file.

My main goal right now is to get consensus on the basic structured 
logging framework -- the shape of the SLOG API, the event message

format, and etc.

Thanks,
Jeff


Re: [PATCH v8 2/2] json-writer: t0019: add perl unit test

2018-06-11 Thread Jeff Hostetler




On 6/7/2018 1:13 PM, Eric Sunshine wrote:

On Thu, Jun 7, 2018 at 10:12 AM,   wrote:

Test json-writer output using perl script.

Signed-off-by: Jeff Hostetler 
---
diff --git a/t/t0019/parse_json.perl b/t/t0019/parse_json.perl
@@ -0,0 +1,52 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+use JSON;


This new script is going to have to be protected by a prerequisite
since the JSON module is not part of the standard Perl installation,
thus will not necessarily be installed everywhere (it isn't on any of
my machines, for instance). Something like:

 test_lazy_prereq PERLJSON '
 perl -MJSON -e "exit 0"
 '

which would be used like this:

 test_expect_success PERLJSON 'parse JSON using Perl' '
 ...
 '



will do.  thanks!
Jeff


Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-11 Thread Jeff Hostetler




On 6/8/2018 4:07 PM, René Scharfe wrote:

On 07.06.2018 at 16:12, g...@jeffhostetler.com wrote:

From: Jeff Hostetler 

[...]

+   if (jw->first_stack.buf[jw->first_stack.len - 1] == '1')
+   jw->first_stack.buf[jw->first_stack.len - 1] = '0';
+   else
+   strbuf_addch(&jw->json, ',');


You only need a binary flag to indicate if a comma is needed, not a
stack.  We need a comma at the current level if and only if we already
wrote a child item.  If we finish a level then we do need a comma at the
previous level because we just wrote a sub-object or sub-array there.
And that should cover all cases.  Am I missing something?

I get a sense of déjà vu, by the way. :)

[...]

Yes, your way is simpler.  I'll update and re-roll.
(And yes, we did discuss this earlier.  I found a problem with
my first version where it wouldn't handle empty sub-items, but
you found the missing piece.)

Thanks
Jeff
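
For the archive, here is the single-flag scheme René describes, sketched
against git's strbuf API (illustrative; the actual reroll may differ in
detail).  The flag means "the next item at the current level needs a
leading comma"; opening a level clears it, writing or closing anything
sets it:

#include "cache.h"

struct jw_sketch {
	struct strbuf json;
	unsigned need_comma : 1;
};

static void maybe_comma(struct jw_sketch *jw)
{
	if (jw->need_comma)
		strbuf_addch(&jw->json, ',');
}

static void open_level(struct jw_sketch *jw, char ch)	/* '{' or '[' */
{
	maybe_comma(jw);
	strbuf_addch(&jw->json, ch);
	jw->need_comma = 0;	/* the new level starts out empty */
}

static void write_value(struct jw_sketch *jw, const char *val)
{
	maybe_comma(jw);
	strbuf_addstr(&jw->json, val);
	jw->need_comma = 1;
}

static void close_level(struct jw_sketch *jw, char ch)	/* '}' or ']' */
{
	strbuf_addch(&jw->json, ch);
	jw->need_comma = 1;	/* the enclosing level now has a child */
}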




Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-11 Thread Jeff Hostetler




On 6/8/2018 2:05 AM, René Scharfe wrote:

On 07.06.2018 at 16:12, g...@jeffhostetler.com wrote:

From: Jeff Hostetler 

Add a series of jw_ routines and "struct json_writer" structure to compose

[...]

  TEST_PROGRAMS_NEED_X += test-index-version
+TEST_PROGRAMS_NEED_X += test-json-writer
  TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
  TEST_PROGRAMS_NEED_X += test-line-buffer
  TEST_PROGRAMS_NEED_X += test-match-trees


This doesn't apply cleanly on master, because most test helpers have
been combined into a single binary to reduce their disk footprint and
link times.  See efd71f8913 (t/helper: add an empty test-tool program)
for the overall rationale.

test-json-writer could become a built-in as well, I think.  You can see
e.g. in c932a5ff28 (t/helper: merge test-string-list into test-tool)
what needs to be done to convert a helper.

René



You're right, the test helper framework changed since I started this
patch series.  I was trying to keep the same parent commit as my V1
to make it easier to compare, but that's not working out so well.

I'll move it forward to the current master and fix it up.

Thanks,
Jeff
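
For anyone following along, the conversion c932a5ff28 describes boils
down to turning the helper's main() into a cmd__*() entry point and
registering it in the test-tool command table, roughly (trimmed to the
relevant pieces):

/* t/helper/test-tool.h */
int cmd__json_writer(int argc, const char **argv);

/* t/helper/test-tool.c */
struct test_cmd {
	const char *name;
	int (*fn)(int argc, const char **argv);
};

static struct test_cmd cmds[] = {
	/* ... other helpers ... */
	{ "json-writer", cmd__json_writer },
};

Tests then invoke "test-tool json-writer ..." instead of a standalone
binary.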


Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-11 Thread Jeff Hostetler




On 6/8/2018 4:32 PM, René Scharfe wrote:

On 07.06.2018 at 16:12, g...@jeffhostetler.com wrote:

   Makefile|   2 +
   json-writer.c   | 419 
   json-writer.h   | 113 +
   t/helper/test-json-writer.c | 572 

   t/t0019-json-writer.sh  | 236 ++
   5 files changed, 1342 insertions(+)
   create mode 100644 json-writer.c
   create mode 100644 json-writer.h
   create mode 100644 t/helper/test-json-writer.c
   create mode 100755 t/t0019-json-writer.sh


$ git grep 'static inline' '*json*'
json-writer.c:static inline void indent_pretty(struct json_writer *jw)
json-writer.c:static inline void begin(struct json_writer *jw, char ch_open, 
int pretty)
json-writer.c:static inline void assert_in_object(const struct json_writer *jw, 
const char *key)
json-writer.c:static inline void assert_in_array(const struct json_writer *jw)
json-writer.c:static inline void maybe_add_comma(struct json_writer *jw)
json-writer.c:static inline void fmt_double(struct json_writer *jw, int 
precision,
json-writer.c:static inline void object_common(struct json_writer *jw, const 
char *key)
json-writer.c:static inline void array_common(struct json_writer *jw)
json-writer.c:static inline void assert_is_terminated(const struct json_writer 
*jw)
t/helper/test-json-writer.c:static inline int scripted(int argc, const char 
**argv)
t/helper/test-json-writer.c:static inline int my_usage(void)

Do you need all those inline keywords?  I'd rather leave the decision
about inlining to the compiler and (via optimization flags) the user
as much as possible.  Not a biggie, but the high frequency of that
word made me blink..

René



I was just trying to match the patterns I found elsewhere in the code.

$ grep "static inline" *.c builtin/*.c | wc -l
111

But yeah, it's no big deal.  I can remove them.

Jeff


Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-08 Thread Jeff Hostetler




On 6/7/2018 1:24 PM, Eric Sunshine wrote:

On Thu, Jun 7, 2018 at 10:12 AM,   wrote:

Add a series of jw_ routines and "struct json_writer" structure to compose
JSON data.  The resulting string data can then be output by commands wanting
to support a JSON output format.
[...]
Signed-off-by: Jeff Hostetler 
---
diff --git a/t/t0019-json-writer.sh b/t/t0019-json-writer.sh
@@ -0,0 +1,236 @@
+test_expect_success 'simple object' '
+   cat >expect <<-\EOF &&
+   {"a":"abc","b":42,"c":3.14,"d":true,"e":false,"f":null}
+   EOF
+   test-json-writer >actual \
+   @object \
+   @object-string a abc \
+   @object-int b 42 \
+   @object-double c 2 3.140 \
+   @object-true d \
+   @object-false e \
+   @object-null f \
+   @end &&
+   test_cmp expect actual
+'


To make it easier on people writing these tests, it might be nice for
this to be less noisy by getting rid of "@" and "\". To get rid of
"\", the test program could grab its script commands from stdin (one
instruction per line) rather than from argv[]. For instance:

 test-json-writer >actual <<-\EOF &&
 object
 object-string a abc
 ...
 end
 EOF

Not a big deal, and certainly not worth a re-roll.



I hadn't thought about doing it that way.  Might be a little easier
to use.  Let me take a look and see if it would be much work to switch.

Thanks
Jeff


Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff Hostetler




On 6/6/2018 5:03 PM, Jeff King wrote:

On Wed, Jun 06, 2018 at 01:10:52PM -0400, Todd Zullinger wrote:


g...@jeffhostetler.com wrote:

+# As a sanity check, ask Python to parse our generated JSON.  Let Python
+# recursively dump the resulting dictionary in sorted order.  Confirm that
+# that matches our expectations.
+test_expect_success PYTHON 'parse JSON using Python' '

[...]

+   python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual &&


Would this be better using $PYTHON_PATH rather than
hard-coding python as the command?


Probably. We may want to go the same route as we did for perl in
a0e0ec9f7d (t: provide a perl() function which uses $PERL_PATH,
2013-10-28) so that test writers don't have to remember this.

That said, I wonder if it would be hard to simply do the python bits
here in perl. This is the first use of python in our test scripts (and
really the only user in the whole code base outside of a few fringe
commands). Leaving aside any perl vs python flame-war, I think there's
value in keeping the number of languages limited when there's not a
compelling reason to do otherwise.



Perl works too.  (I don't know perl, but I'm told it does work for
stuff like this. :-)

I'll take a stab at it.

Jeff





Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test

2018-06-06 Thread Jeff Hostetler




On 6/6/2018 1:10 PM, Todd Zullinger wrote:

g...@jeffhostetler.com wrote:

+# As a sanity check, ask Python to parse our generated JSON.  Let Python
+# recursively dump the resulting dictionary in sorted order.  Confirm that
+# that matches our expectations.
+test_expect_success PYTHON 'parse JSON using Python' '

[...]

+   python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual &&


Would this be better using $PYTHON_PATH rather than
hard-coding python as the command?



good catch!
thanks
Jeff


Re: [PATCH v4] json_writer: new routines to create data in JSON format

2018-06-04 Thread Jeff Hostetler




On 6/2/2018 1:13 AM, Jeff King wrote:

On Sat, Jun 02, 2018 at 06:41:06AM +0200, Duy Nguyen wrote:


On Mon, Mar 26, 2018 at 4:31 PM,   wrote:

+static inline void assert_in_array(const struct json_writer *jw)
+{
+   if (!jw->open_stack.len)
+   die("json-writer: array: missing jw_array_begin()");


When you reroll, please consider marking all these string for
translation with _(), unless these are for machine consumption.


Actually, these are all basically asserts, I think. So ideally they'd be
BUG()s and not translated.


Yes, these are basically asserts.  I'll convert them to BUG's and send
up another version this week.

Thanks
Jeff



Re: [PATCH v2 1/5] core.aheadbehind: add new config setting

2018-04-03 Thread Jeff Hostetler



On 4/3/2018 7:39 AM, Derrick Stolee wrote:

On 4/3/2018 6:18 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Apr 03 2018, Lars Schneider wrote:

What is the state of this series? I can't find it in git/git nor in
git-for-windows/git. I think Stolee mentioned the config in
his Git Merge talk [1] and I was about to test it/roll it out :-)

It's in the gvfs branch of g...@github.com:Microsoft/git.git, i.e. it's
not in Git for Windows, but used in Microsoft's own in-house version
used for Windows.git.


Thanks for adding me to CC. I mentioned it in my talk because that was one thing we 
shipped internally as a "quick fix" until we could do the right thing.

If I remember correctly, Jeff abandoned shipping this upstream because it did 
have the feel of a hack and we wanted to see if users used the config setting 
or really cared about the output values. We saw fast adoption of the feature 
and even turned the config setting on automatically in the following version of 
GVFS.


I may be misunderstanding this feature, but my impression was that it
was a kludge as a workaround until the commit graph code landed, because
once we have that then surely we can just cheaply report the actual (or
approximate?) number in the common case, but of course it may still be
slow if your commit graph file is out of date.


Right, the only thing in master are the changes to take the new
command line option and to alter the output of status.  We did not
reach consensus on the need for the config setting and/or whether
it should be in "core." or "status." or another namespace and/or
how it should work.

And yes, it was also seen as a hack (just turn it off) until the
client-side commit graph was ready (at least for interactive use).
Because there are callers that don't need the answer (regardless
of whether it is cheap to compute) and so the explicit command line
arg limitation is sufficient for them.


This part is in upstream master:
commit 4094e47fd2c49fcdbd0152d20ed4d610d72680d7
Merge: c710d182ea f39a757dd9
Author: Junio C Hamano <gits...@pobox.com>
Date:   Thu Mar 8 12:36:24 2018 -0800

Merge branch 'jh/status-no-ahead-behind'



These parts are in the 'gvfs' branch in the g...@github.com:Microsoft/git.git 
repo:

commit 039f65946968fa654a9c3bca27a4f4e93c1c9381
Author: Jeff Hostetler <jeffh...@microsoft.com>
Date:   Wed Jan 10 13:50:24 2018 -0500

status: add warning when a/b calculation takes too long for long/normal format


commit 0d6756f06d0ad6f1fdc8dba0ead7911e411c9704
Author: Jeff Hostetler <jeffh...@microsoft.com>
Date:   Mon Feb 5 09:44:04 2018 -0500

status: ignore status.aheadbehind in porcelain formats


Teach porcelain V[12] formats to ignore the status.aheadbehind
config setting. They only respect the --[no-]ahead-behind
command line argument.  This is for backwards compatibility
with existing scripts.

commit 0dd122d6cd43106a5928587d768a7381cfe9e7a3
Author: Jeff Hostetler <jeffh...@microsoft.com>
Date:   Tue Jan 9 14:16:07 2018 -0500

status: add status.aheadbehind setting

Add "status.aheadbehind" config setting to change the default

behavior of ALL git status formats.

Hope this helps,
Jeff




Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-30 Thread Jeff Hostetler



On 3/30/2018 4:38 PM, Junio C Hamano wrote:


* jh/json-writer (2018-03-28) 1 commit
  - json_writer: new routines to create data in JSON format

  Is this ready for 'next'?


Yes, I believe it is.  This has the V6 fixup for the HEREDOC
with leading whitespace, so I think we're good.

Thanks
Jeff



Re: [RFC/PATCH] upload-pack: disable object filtering when disabled by config

2018-03-29 Thread Jeff Hostetler



On 3/28/2018 6:12 PM, Junio C Hamano wrote:

Jonathan Nieder  writes:


When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12,
2017-12-08), it was guarded by the uploadpack.allowFilter config item
to allow server operators to control when they start supporting it.

That config item didn't go far enough, though: it controls whether the
'filter' capability is advertised, but if a (custom) client ignores
the capability advertisement and passes a filter specification anyway,
the server would handle that despite allowFilter being false.

This is particularly significant if a security bug is discovered in
this new experimental partial clone code.  Installations without
uploadpack.allowFilter ought not to be affected since they don't
intend to support partial clone, but they would be swept up into being
vulnerable.

Simplify and limit the attack surface by making uploadpack.allowFilter
disable the feature, not just the advertisement of it.

NEEDSWORK: tests

Signed-off-by: Jonathan Nieder 
---
Noticed while reviewing the corresponding JGit code.

If this change seems like a good idea, I can add tests and re-send it
for real.


Yup.  The names of the static variables tell almost the whole story
to convince me why this is a good change ;-).  It is not about
advertising a feature alone, but if the feature is actually enabled,
so advertisement and handling of the feature should be either both
enabled or disabled.

Thanks.



I agree.  Thanks.


Re: [PATCH v4] json_writer: new routines to create data in JSON format

2018-03-27 Thread Jeff Hostetler



On 3/27/2018 11:45 AM, Ramsay Jones wrote:



On 27/03/18 04:18, Ramsay Jones wrote:

On 26/03/18 15:31, g...@jeffhostetler.com wrote:

From: Jeff Hostetler <jeffh...@microsoft.com>


[snip]

Thanks, this version fixes all issues I had (with the compilation
and sparse warnings).

[Was using UINT64_C(0x) a problem on windows?]


BTW, I forgot to mention that you had some whitespace problems
with this patch, viz:


I ran "make sparse" on this and it did not complain about any of this.

What command do you run to get these messages?
I know they appear as red in diffs (and I usually double check that),
but I had not seen a command to complain about them like this.



   $ git log --oneline -1 --check master-json
   ab643d838 (master-json) json_writer: new routines to create data in JSON 
format
   t/helper/test-json-writer.c:280: trailing whitespace.
   + */
   t/t0019-json-writer.sh:179: indent with spaces.
   +"g": 0,
   t/t0019-json-writer.sh:180: indent with spaces.
   +"h": 1
   $


the leading spaces are required in this case.
the pretty json output contains 8 spaces for that sub-structure not a tab.
is there a preferred way to denote this in the test script?

Jeff




Re: Fwd: New Defects reported by Coverity Scan for git

2018-03-27 Thread Jeff Hostetler



On 3/26/2018 7:39 PM, Stefan Beller wrote:

coverity scan failed for the last couple month (since Nov 20th)
without me noticing, I plan on running it again nightly for the
Git project.

Anyway, here are issues that piled up (in origin/pu) since then.

Stefan


-- Forwarded message --

[...]


*** CID 1433539:  Null pointer dereferences  (FORWARD_NULL)
/t/helper/test-json-writer.c: 278 in scripted()
272 struct json_writer jw = JSON_WRITER_INIT;
273 int k;
274
275 if (!strcmp(argv[0], "@object"))
276 jw_object_begin(&jw);
277 else if (!strcmp(argv[0], "@array"))

 CID 1433539:  Null pointer dereferences  (FORWARD_NULL)
 Passing "&jw" to "jw_array_begin", which dereferences null "jw.levels".

278 jw_array_begin(&jw);
279 else
280 die("first script term must be '@object' or
'@array': '%s'", argv[0]);
281
282 for (k = 1; k < argc; k++) {
283 const char *a_k = argv[k];

** CID 1433538:  Null pointer dereferences  (FORWARD_NULL)



The "jw.levels" field has been removed in the json-writer V4 reroll,
so this isn't an issue going forward.

Thanks,
Jeff


Re: [PATCH v4] json_writer: new routines to create data in JSON format

2018-03-27 Thread Jeff Hostetler



On 3/26/2018 11:18 PM, Ramsay Jones wrote:

On 26/03/18 15:31, g...@jeffhostetler.com wrote:

From: Jeff Hostetler <jeffh...@microsoft.com>

[snip]

Thanks, this version fixes all issues I had (with the compilation
and sparse warnings).

[Was using UINT64_C(0x) a problem on windows?]


Thanks for the confirmation.

I was building on Linux.  I haven't tried using UINT64_C()
for anything, but I'll keep that in mind next time.

Thanks,
Jeff




Re: [RFC PATCH 1/1] json-writer: incorrect format specifier

2018-03-27 Thread Jeff Hostetler



On 3/26/2018 11:26 PM, Ramsay Jones wrote:

On 26/03/18 18:04, Junio C Hamano wrote:

Ramsay Jones  writes:

[...]

I must confess to not having given any thought to the wider
implications of the code. I don't really know what this code
is going to be used for. [Although I did shudder when I read
some mention of a 'universal interchange format' - I still
have nightmares about XML :-D ]

[...]

My current goals are to add telemetry in a friendly way and
have events written in JSON to some audit destination.
Something like:

{ "argv":["./git","status"],
  "pid":84941,
  "exit-code":0,
  "elapsed-time":0.011121,
  "version":"2.16.2.5.g71445db.dirty",
  ... }

Later, we could add a JSON formatter to a command like "status"
and then do things like:

$ git status --json | python '... json.load ...'

and eliminate the need to write custom parsers for normal
or porcelain formats.  There are other commands that could
be similarly adapted and save callers a lot of screen-scraping
code.  But that is later.

Thanks,
Jeff
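
As a concrete (and purely illustrative) sketch, emitting the event above
with the jw_ API -- signatures shifted between rerolls, so this follows
one plausible shape of the API and may not match any single version
exactly:

#include "cache.h"
#include "json-writer.h"

static void emit_telemetry_event(void)
{
	struct json_writer jw = JSON_WRITER_INIT;

	jw_object_begin(&jw, 0);	/* 0 = compact, not pretty */
	jw_object_inline_begin_array(&jw, "argv");
	jw_array_string(&jw, "./git");
	jw_array_string(&jw, "status");
	jw_end(&jw);			/* close "argv" */
	jw_object_intmax(&jw, "pid", 84941);
	jw_object_intmax(&jw, "exit-code", 0);
	jw_object_double(&jw, "elapsed-time", 6, 0.011121);
	jw_object_string(&jw, "version", "2.16.2.5.g71445db.dirty");
	jw_end(&jw);			/* close the event object */

	fprintf(stderr, "%s\n", jw.json.buf);
	jw_release(&jw);
}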


Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-27 Thread Jeff Hostetler



On 3/27/2018 1:07 AM, Junio C Hamano wrote:

Jeff Hostetler <g...@jeffhostetler.com> writes:

[...]

So I would think it is most sensible to have double, uintmax_t and
intmax_t variants.  If you do not care about the extra value range
that unsigned integral types afford, a single intmax_t variant would
also be fine.


I'll reroll with just the double and intmax_t variants.
Thanks for the feedback and sorry for all the noise.

Jeff



Re: Including object type and size in object id (Re: Git Merge contributor summit notes)

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 5:00 PM, Jonathan Nieder wrote:

Jeff Hostetler wrote:
[long quote snipped]


While we are converting to a new hash function, it would be nice
if we could add a couple of fields to the end of the OID:  the object
type and the raw uncompressed object size.

It would be nice if we could extend the OID to include 6 bytes of data
(4 or 8 bits for the type and the rest for the raw object size), and
just say that an OID is a {hash,type,size} tuple.

There are lots of places where we open an object to see what type it is
or how big it is.  This requires uncompressing/undeltafying the object
(or at least decoding enough to get the header).  In the case of missing
objects (partial clone or a gvfs-like projection) it requires either
dynamically fetching the object or asking an object-size-server for the
data.

All of these cases could be eliminated if the type/size were available
in the OID.


This implies a limit on the object size (e.g. 5 bytes in your
example).  What happens when someone wants to encode an object larger
than that limit?


I could say add a full uint64 to the tail end of the hash, but
we currently don't handle blobs/objects larger than 4GB right now
anyway, right?

5 bytes for the size is just a compromise -- 1TB blobs would be
terrible to think about...
 


This also decreases the number of bits available for the hash, but
that shouldn't be a big issue.


I was suggesting extending the OIDs by 6 bytes while we are changing
the hash function.


Aside from those two, I don't see any downsides.  It would mean that
tree objects contain information about the sizes of blobs contained
there, which helps with virtual file systems.  It's also possible to
do that without putting the size in the object id, but maybe having it
in the object id is simpler.

Will think more about this.

Thanks for the idea,
Jonathan



Thanks
Jeff
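
Purely to make the byte counts concrete, one possible layout for such a
{hash, type, size} OID -- the field widths are the ones floated above,
nothing here is an agreed-on format:

#include <stdint.h>

struct ext_oid {
	unsigned char hash[32];	/* NewHash value */
	unsigned char type;	/* commit/tree/blob/tag */
	unsigned char size[5];	/* 40-bit big-endian raw size, so < 1TB */
};

static void set_size(struct ext_oid *oid, uint64_t sz)
{
	int i;

	for (i = 4; i >= 0; i--) {
		oid->size[i] = sz & 0xff;
		sz >>= 8;
	}
}

static uint64_t get_size(const struct ext_oid *oid)
{
	uint64_t sz = 0;
	int i;

	for (i = 0; i < 5; i++)
		sz = (sz << 8) | oid->size[i];
	return sz;
}

Anything over 2^40-1 bytes would then need an escape (say, a reserved
all-ones size meaning "look it up"), which is one of the costs of baking
the size into the name.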



Re: Git Merge contributor summit notes

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 1:56 PM, Stefan Beller wrote:

On Mon, Mar 26, 2018 at 10:33 AM Jeff Hostetler <g...@jeffhostetler.com>
wrote:




On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote:


On Sat, Mar 10 2018, Alex Vandiver wrote:


New hash (Stefan, etc)
--
   - discussed on the mailing list
   - actual plan checked in to

Documentation/technical/hash-function-transition.txt

   - lots of work renaming
   - any actual work with the transition plan?
   - local conversion first; fetch/push have translation table
   - like git-svn
   - also modified pack and index format to have lookup/translation

efficiently

   - brian's series to eliminate SHA1 strings from the codebase
   - testsuite is not working well because hardcoded SHA1 values
   - flip a bit in the sha1 computation and see what breaks in the

testsuite

   - will also need a way to do the conversion itself; traverse and

write out new version

   - without that, can start new repos, but not work on old ones
   - on-disk formats will need to change -- something to keep in mind

with new index work

   - documentation describes packfile and index formats
   - what time frame are we talking?
   - public perception question
   - signing commits doesn't help (just signs commit object) unless you

"recursive sign"

   - switched to SHA1dc; we detect and reject known collision technique
   - do it now because it takes too long if we start when the collision

drops

   - always call it "new hash" to reduce bikeshedding
   - is translation table a backdoor? has it been reviewed by crypto

folks?

 - no, but everything gets translated
   - meant to avoid a flag day for entire repositories
   - linus can decide to upgrade to newhash; if pushes to server that

is not newhash aware, that's fine

   - will need a wire protocol change
   - v2 might add a capability for newhash
   - "now that you mention md5, it's a good idea"
   - can use md5 to test the conversion
   - is there a technical reason for why not /n/ hashes?
   - the slow step goes away as people converge to the new hash
   - beneficial to make up some fake hash function for testing
   - is there a plan on how we decide which hash function?
   - trust junio to merge commits when appropriate
   - conservancy committee explicitly does not make code decisions
   - waiting will just give better data
   - some hash functions are in silicon (e.g. microsoft cares)
   - any movement in libgit2 / jgit?
 - basic stuff for libgit2; same testsuite problems
 - no work in jgit
   - most optimistic forecast?
 - could be done in 1-2y
   - submodules with one hash function?
 - unable to convert project unless all submodules are converted
 - OO-ing is not a prereq


Late reply, but one thing I brought up at the time is that we'll want to
keep this code around even after the NewHash migration at least for
testing purposes, should we ever need to move to NewNewHash.

It occurred to me recently that once we have such a layer it could be
(ab)used with some relatively minor changes to do any arbitrary
local-to-remote object content translation, unless I've missed something
(but I just re-read hash-function-transition.txt now...).

E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
remote server so that you upload a GPG encrypted version of all your
blobs, and have your trees reference those blobs.

Because we'd be doing arbitrary translations for all of
commits/trees/blobs this could go further than other bolted-on
encryption solutions for Git. E.g. paths in trees could be encrypted
too, as well as all the content of the commit object that isn't parent
info & the like (but that would have different hashes).

Basically clean/smudge filters on steroids, but for every object in the
repo. Anyone who got a hold of it would still see the shape of the repo
& approximate content size, but other than that it wouldn't be more info
than they'd get via `fast-export --anonymize` now.

I mainly find it interesting because it presents an intersection between a
feature we might want to offer anyway, and something that would stress
the hash transition codepath going forward, to make sure it hasn't all
bitrotted by the time we'll need NewHash->NewNewHash.

Git hosting providers would hate it, but they should probably be
charging users by how much Michael Haggerty's git-sizer tool hates their
repo anyway :)




While we are converting to a new hash function, it would be nice
if we could add a couple of fields to the end of the OID:  the object
type and the raw uncompressed object size.


This would allow to craft invalid OIDs, i.e. the correct hash value with
the wrong object type. (This is different field of "invalid" compared to
today, where we either have or do not have the object named by the
hash value. If we don't have it, it may be just unknown to us, but not
"wrong".)


An invalid OID (such as a wrong ob

Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 2:00 PM, Junio C Hamano wrote:

Jeff Hostetler <g...@jeffhostetler.com> writes:


I am concerned that the above compiler error message says that uintmax_t
is defined as an "unsigned long" (which is defined as *at least* 32 bits,
but not necessarily 64).  But a uint64_t is defined as an "unsigned long long"
and guaranteed as a 64 bit value.


On a platform whose uintmax_t is u32, is it realistic to expect that
we would be able to use u64, even if we explicitly ask for it, in
the first place?

In other words, on a platform that handles uint64_t, I would expect
uintmax_t to be wide enough to hold an uint64_t value without
truncation.



I was just going by what the reported compiler error message was.
It said that "unsigned long" didn't match the uint64_t variable.
And that made me nervous.

If all of the platforms we build on define uintmax_t >= 64 bits,
then it doesn't matter.

If we do have a platform where uintmax_t is u32, then we'll have a
lot more breakage than in just the new function I added.

Thanks,
Jeff


Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 1:57 PM, Junio C Hamano wrote:

Jeff Hostetler <g...@jeffhostetler.com> writes:


I defined that routine to take a uint64_t because I wanted to
pass a nanosecond value received from getnanotime() and that's
what it returns.


Hmph, but the target format does not have different representation
of inttypes in different sizes, no?

I personally doubt that we would benefit from having a group of
functions (i.e. format_int{8,16,32,64}_to_json()) that callers have
to choose from, depending on the exact size of the integer they want
to serialize.  The de-serializing side would be the same story.

Even if the variable a potential caller of the formatter passes is a sized
type that is different from uintmax_t, the caller shouldn't have to
add an extra cast.

Am I missing some obvious merit for having these separate functions
for explicit sizes?



I did the uint64_t for the unsigned ns times.

I did the other one for the usual signed ints.

I could convert them both to a single signed 64 bit typed function
if we only want to have one function.

Jeff



Re: [RFC PATCH 1/1] json-writer: incorrect format specifier

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 1:04 PM, Junio C Hamano wrote:

Ramsay Jones  writes:


@@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char 
*key, uint64_t value)
maybe_add_comma(jw);
  
	append_quoted_string(&jw->json, key);

-   strbuf_addf(&jw->json, ":%"PRIuMAX, value);
+   strbuf_addf(&jw->json, ":%"PRIu64, value);


In this code-base, that would normally be written as:

strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t) value);


heh, I should learn not to reply in a hurry, just before
going out ...

I had not noticed that 'value' was declared with a 'sized type'
of uint64_t, so using PRIu64 should be fine.


But why is this codepath using a sized type in the first place?  It
is not like it wants to read/write a fixed binary file format---it
just wants to use an integer type that is wide enough to handle any
inttype the platform uses, for which uintmax_t would be a more
appropriate type, no?



[Somehow the conversation forked and this compiler warning
appeared in both the json-writer and the rebase-interactive
threads.  I'm copying here the response that I already made
on the latter.]


I defined that routine to take a uint64_t because I wanted to
pass a nanosecond value received from getnanotime() and that's
what it returns.

My preference would be to change the PRIuMAX to PRIu64, but there
aren't any other references in the code to that symbol and I didn't
want to start a new trend here.

I am concerned that the above compiler error message says that uintmax_t
is defined as an "unsigned long" (which is defined as *at least* 32 bits,
but not necessarily 64).  But a uint64_t is defined as an "unsigned long long"
and guaranteed as a 64 bit value.

So while I'm not really worried about 128 bit integers right now, I'm
more concerned about 32 bit compilers truncating that value without any
warnings.

Jeff


Re: [RFC PATCH v2 1/1] json-writer: add cast to uintmax_t

2018-03-26 Thread Jeff Hostetler



On 3/24/2018 2:38 PM, Wink Saville wrote:

Correct a compile error on Mac OSX by adding a cast to uintmax_t
in calls to strbuf_addf.

Helped-by: Ramsay Jones 
Tested-by: travis-ci
Signed-off-by: Wink Saville 
---
  json-writer.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/json-writer.c b/json-writer.c
index 89a6abb57..1f40482ff 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char 
*key, uint64_t value)
maybe_add_comma(jw);
  
	append_quoted_string(&jw->json, key);

-   strbuf_addf(&jw->json, ":%"PRIuMAX, value);
+   strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t)value);
  }
  
  void jw_object_double(struct json_writer *jw, const char *fmt,

@@ -225,7 +225,7 @@ void jw_array_uint64(struct json_writer *jw, uint64_t value)
assert_in_array(jw);
maybe_add_comma(jw);
  
-	strbuf_addf(&jw->json, "%"PRIuMAX, value);

+   strbuf_addf(&jw->json, "%"PRIuMAX, (uintmax_t)value);
  }
  
  void jw_array_double(struct json_writer *jw, const char *fmt, double value)




FYI.  I included and squashed this change into V4 of my json-writer series.

Jeff



Re: Git Merge contributor summit notes

2018-03-26 Thread Jeff Hostetler



On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote:


On Sat, Mar 10 2018, Alex Vandiver wrote:


New hash (Stefan, etc)
--
  - discussed on the mailing list
  - actual plan checked in to 
Documentation/technical/hash-function-transition.txt
  - lots of work renaming
  - any actual work with the transition plan?
  - local conversion first; fetch/push have translation table
  - like git-svn
  - also modified pack and index format to have lookup/translation efficiently
  - brian's series to eliminate SHA1 strings from the codebase
  - testsuite is not working well because hardcoded SHA1 values
  - flip a bit in the sha1 computation and see what breaks in the testsuite
  - will also need a way to do the conversion itself; traverse and write out 
new version
  - without that, can start new repos, but not work on old ones
  - on-disk formats will need to change -- something to keep in mind with new 
index work
  - documentation describes packfile and index formats
  - what time frame are we talking?
  - public perception question
  - signing commits doesn't help (just signs commit object) unless you "recursive 
sign"
  - switched to SHA1dc; we detect and reject known collision technique
  - do it now because it takes too long if we start when the collision drops
  - always call it "new hash" to reduce bikeshedding
  - is translation table a backdoor? has it been reviewed by crypto folks?
- no, but everything gets translated
  - meant to avoid a flag day for entire repositories
  - linus can decide to upgrade to newhash; if pushes to server that is not 
newhash aware, that's fine
  - will need a wire protocol change
  - v2 might add a capability for newhash
  - "now that you mention md5, it's a good idea"
  - can use md5 to test the conversion
  - is there a technical reason for why not /n/ hashes?
  - the slow step goes away as people converge to the new hash
  - beneficial to make up some fake hash function for testing
  - is there a plan on how we decide which hash function?
  - trust junio to merge commits when appropriate
  - conservancy committee explicitly does not make code decisions
  - waiting will just give better data
  - some hash functions are in silicon (e.g. microsoft cares)
  - any movement in libgit2 / jgit?
- basic stuff for libgit2; same testsuite problems
- no work in jgit
  - most optimistic forecast?
- could be done in 1-2y
  - submodules with one hash function?
- unable to convert project unless all submodules are converted
- OO-ing is not a prereq


Late reply, but one thing I brought up at the time is that we'll want to
keep this code around even after the NewHash migration at least for
testing purposes, should we ever need to move to NewNewHash.

It occurred to me recently that once we have such a layer it could be
(ab)used with some relatively minor changes to do any arbitrary
local-to-remote object content translation, unless I've missed something
(but I just re-read hash-function-transition.txt now...).

E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
remote server so that you upload a GPG encrypted version of all your
blobs, and have your trees reference those blobs.

Because we'd be doing arbitrary translations for all of
commits/trees/blobs this could go further than other bolted-on
encryption solutions for Git. E.g. paths in trees could be encrypted
too, as well as all the content of the commit object that isn't parent
info & the like (but that would have different hashes).

Basically clean/smudge filters on steroids, but for every object in the
repo. Anyone who got a hold of it would still see the shape of the repo
& approximate content size, but other than that it wouldn't be more info
than they'd get via `fast-export --anonymize` now.

I mainly find it interesting because it presents an intersection between a
feature we might want to offer anyway, and something that would stress
the hash transition codepath going forward, to make sure it hasn't all
bitrotted by the time we'll need NewHash->NewNewHash.

Git hosting providers would hate it, but they should probably be
charging users by how much Michael Haggerty's git-sizer tool hates their
repo anyway :)



While we are converting to a new hash function, it would be nice
if we could add a couple of fields to the end of the OID:  the object
type and the raw uncompressed object size.

It would be nice if we could extend the OID to include 6 bytes of data
(4 or 8 bits for the type and the rest for the raw object size), and
just say that an OID is a {hash,type,size} tuple.

There are lots of places where we open an object to see what type it is
or how big it is.  This requires uncompressing/undeltafying the object
(or at least decoding enough to get the header).  In the case of missing
objects (partial clone or a gvfs-like projection) it requires either
dynamically fetching the object or asking an object-size-server for the
data.

All of these cases could be 
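
For reference, the header decode referred to above: a loose object is a
zlib stream whose payload starts with "<type> <size>\0", so getting type
and size means inflating only the first few bytes.  A sketch assuming
zlib, with error handling trimmed:

#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* 'buf'/'len' hold the raw deflated file; type_out needs >= 16 bytes. */
static int read_loose_header(const unsigned char *buf, unsigned long len,
			     char *type_out, unsigned long *size_out)
{
	unsigned char hdr[64];
	z_stream s;

	memset(&s, 0, sizeof(s));
	if (inflateInit(&s) != Z_OK)
		return -1;
	s.next_in = (unsigned char *)buf;
	s.avail_in = len;
	s.next_out = hdr;
	s.avail_out = sizeof(hdr) - 1;
	inflate(&s, Z_SYNC_FLUSH);	/* a few dozen bytes is plenty */
	inflateEnd(&s);
	hdr[sizeof(hdr) - 1 - s.avail_out] = '\0';

	if (sscanf((char *)hdr, "%15s %lu", type_out, size_out) != 2)
		return -1;
	return 0;
}

Packed/deltified objects need more work than this, which is exactly why
having the type and size in the OID itself is attractive.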

Re: [RFC PATCH v5 0/8] rebase-interactive

2018-03-26 Thread Jeff Hostetler



On 3/26/2018 11:56 AM, Junio C Hamano wrote:

Wink Saville  writes:


json-writer.c:123:38:  error:  format specifies type 'uintmax_t' (aka
'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned
long long') [-Werror,-Wformat]

 strbuf_addf(&jw->json, ":%"PRIuMAX, value);
  ~~ ^
json-writer.c:228:37:  error:  format specifies type 'uintmax_t' (aka
'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned
long long') [-Werror,-Wformat] [0m

 strbuf_addf(&jw->json, "%"PRIuMAX, value);
  ~~ ^
2 errors generated.
make: *** [json-writer.o] Error 1
make: *** Waiting for unfinished jobs


For whatever reason, our codebase seems to shy away from PRIu64,
even though there are liberal uses of PRIu32.  Showing the value
casted to uintmax_t with PRIuMAX seems to be our preferred way to
say "We cannot say how wide this type is on different platforms, and
are playing safe by using widest-possible int type" (e.g. showing a
pid_t value from daemon.c).

In this codepath, the actual values are specified to be uint64_t, so
the use of PRIu64 may be OK, but I have to wonder why the codepath
is not dealing with uintmax_t in the first place.  When even larger
than present archs are prevalent in N years and 64-bit starts to
feel a tad small (like we feel for 16-bit ints these days), it will
feel a bit silly to have a subsystem that is limited to such a
"fixed and a tad small these days" type and pretend it to be a
generic serializer, I suspect.



I defined that routine to take a uint64_t because I wanted to
pass a nanosecond value received from getnanotime() and that's
what it returns.

My preference would be to change the PRIuMAX to PRIu64, but there
aren't any other references in the code to that symbol and I didn't
want to start a new trend here.

I am concerned that the above compiler error message says that uintmax_t
is defined as an "unsigned long" (which is defined as *at least* 32 bits,
but not necessarily 64).  But a uint64_t is defined as an "unsigned long long"
and guaranteed as a 64 bit value.

So while I'm not really worried about 128 bit integers right now, I'm
more concerned about 32 bit compilers truncating that value without any
warnings.

Jeff



Re: [RFC PATCH 1/1] json-writer: incorrect format specifier

2018-03-24 Thread Jeff Hostetler



On 3/24/2018 1:37 AM, Wink Saville wrote:

In routines jw_object_uint64 and jw_object_double strbuf_addf is
invoked with strbuf_addf(>json, ":%"PRIuMAX, value) where value
is a uint64_t. This causes a compile error on OSX.

The correct format specifier is PRIu64 instead of PRIuMax.

Signed-off-by: Wink Saville 


That's odd.

A grep on the Git source tree did not find a "PRIu64" symbol.
Searching public-inbox only found one message [1] talking about it
(other than the ones associated with your messages here).

I have to wonder if that is defined in a OSX header file and
you're getting it from there [2].  (I don't have a MAC in front of
me, so I can't verify what's in that header.)  But [2] defines
PRIuMAX as PRIu64, so we shouldn't need to make that change in
json-writer -- unless something is getting lost in the #ifdefs.

Could you double check this in the header files on your system?
Any chance you are doing a 32-bit build?

Thanks
Jeff

[1] 
https://public-inbox.org/git/mwhpr21mb0478181ae0b64901da2c07cdf4...@mwhpr21mb0478.namprd21.prod.outlook.com/raw


[2] https://opensource.apple.com/source/gcc/gcc-926/inttypes.h.auto.html



---
  json-writer.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/json-writer.c b/json-writer.c
index 89a6abb57..04045448a 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char 
*key, uint64_t value)
maybe_add_comma(jw);
  
	append_quoted_string(&jw->json, key);

-   strbuf_addf(&jw->json, ":%"PRIuMAX, value);
+   strbuf_addf(&jw->json, ":%"PRIu64, value);
  }
  
  void jw_object_double(struct json_writer *jw, const char *fmt,

@@ -225,7 +225,7 @@ void jw_array_uint64(struct json_writer *jw, uint64_t value)
assert_in_array(jw);
maybe_add_comma(jw);
  
-	strbuf_addf(&jw->json, "%"PRIuMAX, value);

+   strbuf_addf(&jw->json, "%"PRIu64, value);
  }
  
  void jw_array_double(struct json_writer *jw, const char *fmt, double value)




Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jeff Hostetler



On 3/23/2018 4:11 PM, René Scharfe wrote:

On 23.03.2018 at 20:55, Jeff Hostetler wrote:

+struct json_writer_level
+{
+    unsigned level_is_array : 1;
+    unsigned level_is_empty : 1;
+};
+
+struct json_writer
+{
+    struct json_writer_level *levels;
+    int nr, alloc;
+    struct strbuf json;
+};


A simpler and probably more compact representation of is_array would
be a strbuf with one char per level, e.g. '[' for an array and '{'
for an object (or ']' and '}').

I don't understand the need to track emptiness per level.  Only the
top level array/object can ever be empty, can it?


My expectation was that any sub-object or sub-array could be empty.
That is, this should be valid (and the JSON parser in Python allows):

      {"a":{}, "b":[], "c":[[]], "d":[{}]}


Sure, but the emptiness of finished arrays and objects doesn't matter
for the purposes of error checking, comma setting or closing.  At most
one of them is empty *and* unclosed while writing the overall JSON
object -- the last one opened:

{
{"a":{
{"a":{}, "b":[
{"a":{}, "b":[], "c":[
{"a":{}, "b":[], "c":[[
{"a":{}, "b":[], "c":[[]], "d":[
{"a":{}, "b":[], "c":[[]], "d":[{

Any of the earlier written arrays/objects are either closed or contain
at least a half-done sub-array/object, which makes them non-empty.

René



good point.  i'll revisit.  thanks.
Jeff
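
René's one-char-per-level idea, sketched out (later rounds of the series
did grow an open_stack strbuf along these lines, so this is close to,
but not exactly, what was written):

#include "cache.h"

struct jw_sketch {
	struct strbuf json;
	struct strbuf open_stack;	/* "{[{" = object -> array -> object */
};

static void open_level(struct jw_sketch *jw, char ch)	/* '{' or '[' */
{
	strbuf_addch(&jw->json, ch);
	strbuf_addch(&jw->open_stack, ch);
}

static int in_array(const struct jw_sketch *jw)
{
	return jw->open_stack.len &&
	       jw->open_stack.buf[jw->open_stack.len - 1] == '[';
}

static void close_level(struct jw_sketch *jw)
{
	strbuf_addch(&jw->json, in_array(jw) ? ']' : '}');
	strbuf_setlen(&jw->open_stack, jw->open_stack.len - 1);
}

Combined with René's emptiness observation, a single need_comma bit
alongside this covers the comma logic; no per-level emptiness stack is
required.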


Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jeff Hostetler



On 3/23/2018 2:01 PM, René Scharfe wrote:

On 21.03.2018 at 20:28, g...@jeffhostetler.com wrote:

From: Jeff Hostetler <jeffh...@microsoft.com>

Add basic routines to generate data in JSON format.

Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>
---
   Makefile|   2 +
   json-writer.c   | 321 +
   json-writer.h   |  86 +
   t/helper/test-json-writer.c | 420 

   t/t0019-json-writer.sh  | 102 +++
   5 files changed, 931 insertions(+)
   create mode 100644 json-writer.c
   create mode 100644 json-writer.h
   create mode 100644 t/helper/test-json-writer.c
   create mode 100755 t/t0019-json-writer.sh

diff --git a/Makefile b/Makefile
index 1a9b23b..57f58e6 100644
--- a/Makefile
+++ b/Makefile
@@ -662,6 +662,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh
   TEST_PROGRAMS_NEED_X += test-genrandom
   TEST_PROGRAMS_NEED_X += test-hashmap
   TEST_PROGRAMS_NEED_X += test-index-version
+TEST_PROGRAMS_NEED_X += test-json-writer
   TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
   TEST_PROGRAMS_NEED_X += test-line-buffer
   TEST_PROGRAMS_NEED_X += test-match-trees
@@ -815,6 +816,7 @@ LIB_OBJS += hashmap.o
   LIB_OBJS += help.o
   LIB_OBJS += hex.o
   LIB_OBJS += ident.o
+LIB_OBJS += json-writer.o
   LIB_OBJS += kwset.o
   LIB_OBJS += levenshtein.o
   LIB_OBJS += line-log.o
diff --git a/json-writer.c b/json-writer.c
new file mode 100644
index 000..89a6abb
--- /dev/null
+++ b/json-writer.c
@@ -0,0 +1,321 @@
+#include "cache.h"
+#include "json-writer.h"
+
+static char g_ch_open[2]  = { '{', '[' };
+static char g_ch_close[2] = { '}', ']' };
+
+/*
+ * Append JSON-quoted version of the given string to 'out'.
+ */
+static void append_quoted_string(struct strbuf *out, const char *in)
+{
+   strbuf_addch(out, '"');
+   for (/**/; *in; in++) {
+   unsigned char c = (unsigned char)*in;
+   if (c == '"')
+   strbuf_add(out, "\\\"", 2);
+   else if (c == '\\')
strbuf_add(out, "\\\\", 2);
+   else if (c == '\n')
+   strbuf_add(out, "\\n", 2);
+   else if (c == '\r')
+   strbuf_add(out, "\\r", 2);
+   else if (c == '\t')
+   strbuf_add(out, "\\t", 2);
+   else if (c == '\f')
+   strbuf_add(out, "\\f", 2);
+   else if (c == '\b')
+   strbuf_add(out, "\\b", 2);


Using strbuf_addstr() here would result in the same object code (its
strlen() call is inlined for constants) and avoid having to specify
the redundant length 2.


sure. thanks.



+   else if (c < 0x20)
+   strbuf_addf(out, "\\u%04x", c);
+   else
+   strbuf_addch(out, c);
+   }
+   strbuf_addch(out, '"');
+}
+

[...]

+
+void jw_object_double(struct json_writer *jw, const char *fmt,
+ const char *key, double value)
+{
+   assert_in_object(jw, key);
+   maybe_add_comma(jw);
+
+   if (!fmt || !*fmt)
+   fmt = "%f";
+
+   append_quoted_string(&jw->json, key);
+   strbuf_addch(&jw->json, ':');
+   strbuf_addf(&jw->json, fmt, value);


Hmm.  Can compilers check such a caller-supplied format matches the
value's type?  (I don't know how to specify a format attribute for
GCC and Clang for this function.)


I'll look into this.  thanks.



+}

[...]

+
+void jw_object_sub(struct json_writer *jw, const char *key,
+  const struct json_writer *value)
+{
+   assert_is_terminated(value);
+
+   assert_in_object(jw, key);
+   maybe_add_comma(jw);
+
+   append_quoted_string(&jw->json, key);
+   strbuf_addch(&jw->json, ':');
+   strbuf_addstr(&jw->json, value->json.buf);


strbuf_addbuf() would be a better fit here -- it avoids a strlen() call
and NUL handling issues.


sure. thanks. i always forget about _addbuf().



+}
+
+void jw_object_inline_begin_array(struct json_writer *jw, const char *key)
+{
+   assert_in_object(jw, key);
+   maybe_add_comma(jw);
+
+   append_quoted_string(&jw->json, key);
+   strbuf_addch(&jw->json, ':');
+
+   jw_array_begin(jw);
+}


Those duplicate calls in the last ten functions feel mind-numbing.  A
helper function for adding comma, key and colon might be a good idea.


I'll see if I can add another macro to do the dirty work here.

[...]


diff --git a/json-writer.h b/json-writer.h
new file mode 100644
index 000..ad38c95
--- /dev/null
+++ b/json-writer.h
@@ -0,0 +1,86 @@
+#ifndef JSON_WRITER_H
+#define JSON_WRITER_H
+
+/*
+ * JSON data structures are defined at:
+ *  http://json.org/
+ *  http://www.ietf.org/rfc/rfc7159.txt
+ *
+ * 

Re: [PATCH v3] routines to generate JSON data

2018-03-23 Thread Jeff Hostetler



On 3/23/2018 2:14 PM, Ramsay Jones wrote:



On 23/03/18 16:29, g...@jeffhostetler.com wrote:

From: Jeff Hostetler <jeffh...@microsoft.com>

This is version 3 of my JSON data format routines.


I have not looked at v3 yet - the patch below is against
the version in 'pu' @3284f940c (presumably that would be v2).



I built my v3 on Linux and didn't see any warnings.  I'll add
your extra CFLAGS and fix anything that I find and send up a v4
shortly.

Thanks.
Jeff


The version in 'pu' broke my build on Linux, but not on
cygwin - which was a bit of a surprise. That is, until I
saw the warnings and remembered that I have this in my
config.mak on Linux, but not on cygwin:

   ifneq ($(CC),clang)
   CFLAGS += -Wold-style-declaration
   CFLAGS += -Wno-pointer-to-int-cast
   CFLAGS += -Wsystem-headers
   endif

... and the warnings were:

   $ diff nout pout
   1c1
   < GIT_VERSION = 2.17.0.rc1.317.g4a561d2cc
   ---
   > GIT_VERSION = 2.17.0.rc1.445.g3284f940c
   29a30
   > CC commit-graph.o
   73a75,87
   > CC json-writer.o
   > json-writer.c:53:1: warning: ‘inline’ is not at beginning of declaration 
[-Wold-style-declaration]
   >  static void inline assert_in_object(const struct json_writer *jw, const 
char *key)
   >  ^
   > json-writer.c:64:1: warning: ‘inline’ is not at beginning of declaration 
[-Wold-style-declaration]
   >  static void inline assert_in_array(const struct json_writer *jw)
   >  ^
   > json-writer.c:75:1: warning: ‘inline’ is not at beginning of declaration 
[-Wold-style-declaration]
   >  static void inline maybe_add_comma(struct json_writer *jw)
   >  ^
   > json-writer.c:87:1: warning: ‘inline’ is not at beginning of declaration 
[-Wold-style-declaration]
   >  static void inline assert_is_terminated(const struct json_writer *jw)
   >  ^
   83a98
   > CC ls-refs.o

   ...
   $

The '-Wold-style-declaration' gcc warning flag is not a standard
project flag, and I can't quite remember why I have it set, so I
guess you could just ignore it. However, all other 'static inline'
functions in the project have the inline keyword before the return
type, so ... ;-)

Also, sparse spewed 40 warnings for t/helper/test-json-writer.c,
which were mainly about file-local symbols, but had a couple of
'constant is so large ...', like so:

   $ grep warning psp-out | head -8
   t/helper/test-json-writer.c:45:46: warning: constant 0xffffffffffffffff is
so big it is unsigned long
   t/helper/test-json-writer.c:108:40: warning: constant 0xffffffffffffffff is
so big it is unsigned long
   t/helper/test-json-writer.c:4:12: warning: symbol 'expect_obj1' was not 
declared. Should it be static?
   t/helper/test-json-writer.c:5:12: warning: symbol 'expect_obj2' was not 
declared. Should it be static?
   t/helper/test-json-writer.c:6:12: warning: symbol 'expect_obj3' was not 
declared. Should it be static?
   t/helper/test-json-writer.c:7:12: warning: symbol 'expect_obj4' was not 
declared. Should it be static?
   t/helper/test-json-writer.c:8:12: warning: symbol 'expect_obj5' was not 
declared. Should it be static?
   t/helper/test-json-writer.c:10:20: warning: symbol 'obj1' was not declared. 
Should it be static?
   $

I decided to use the UINT64_C(v) macro from <stdint.h>, which is
a C99 feature, and will (hopefully) not be a problem.
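
For illustration, the difference the macro makes:

    #include <stdint.h>

    /* guaranteed to have type uint64_t on every platform, rather than
     * whatever "unsigned long" happens to be */
    uint64_t v = UINT64_C(0xffffffffffffffff);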

ATB,
Ramsay Jones

-- >8 --
Subject: [PATCH] json-writer: fix up gcc and sparse warnings

Signed-off-by: Ramsay Jones <ram...@ramsayjones.plus.com>
---
  json-writer.c   |  8 ++---
  t/helper/test-json-writer.c | 80 ++---
  2 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/json-writer.c b/json-writer.c
index 89a6abb57..ba0365d20 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -50,7 +50,7 @@ static inline void begin(struct json_writer *jw, int is_array)
  /*
   * Assert that we have an open object at this level.
   */
-static void inline assert_in_object(const struct json_writer *jw, const char 
*key)
+static inline void assert_in_object(const struct json_writer *jw, const char 
*key)
  {
if (!jw->nr)
BUG("object: missing jw_object_begin(): '%s'", key);
@@ -61,7 +61,7 @@ static void inline assert_in_object(const struct json_writer 
*jw, const char *ke
  /*
   * Assert that we have an open array at this level.
   */
-static void inline assert_in_array(const struct json_writer *jw)
+static inline void assert_in_array(const struct json_writer *jw)
  {
if (!jw->nr)
BUG("array: missing jw_begin()");
@@ -72,7 +72,7 @@ static void inline assert_in_array(const struct json_writer 
*jw)
  /*
   * Add comma if we have already seen a member at this level.
   */
-static void inline maybe_add_comma(struct json_writer *jw)
+static inline void maybe_add_comma(struct json_writer *jw)
  {
if (jw->levels[jw->nr - 1].level_is_empty)
jw->levels[jw->

Re: [PATCH v3] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jeff Hostetler



On 3/23/2018 1:18 PM, Jonathan Nieder wrote:

g...@jeffhostetler.com wrote:


From: Jeff Hostetler <jeffh...@microsoft.com>

Add basic routines to generate data in JSON format.

Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>


If I understand the cover letter correctly, this is a JSON-like
format, not actual JSON.  Am I understanding correctly?

What are the requirements for consuming this output?  Will a typical
JSON library be able to handle it without trouble?  If not, is there
some subset of cases where a typical JSON library is able to handle it
without trouble?


I'll add text to the commit message and the .h file explaining
the Unicode limitation in the next reroll.



Can you say a little about the motivation here?  (I know you already
put some of that in the cover letter, but since that doesn't become
part of permanent history, it doesn't help the audience that matters.)


I'll add a note about that to the commit message.
Thanks.



This would also be easier to review if there were an application of it
in the same series.  It's fine to send an RFC like this without such
an application, but I think we should hold off on merging it until we
have one.  Having an application makes review much easier since we can
see how the API works in practice, how well the approach fits what
users need, etc.


That's fine.  I'm planning to push up another patch series next week
that builds upon this, so hopefully that will help.



Thanks and hope that helps,
Jonathan


Thanks,
Jeff



Re: [PATCH v2] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jeff Hostetler



On 3/21/2018 5:25 PM, Junio C Hamano wrote:

g...@jeffhostetler.com writes:


From: Jeff Hostetler <jeffh...@microsoft.com>

Add basic routines to generate data in JSON format.


And the point of having capability to write JSON data in our
codebase is...?


diff --git a/json-writer.c b/json-writer.c
new file mode 100644
index 000..89a6abb
--- /dev/null
+++ b/json-writer.c
@@ -0,0 +1,321 @@
+#include "cache.h"
+#include "json-writer.h"
+
+static char g_ch_open[2]  = { '{', '[' };
+static char g_ch_close[2] = { '}', ']' };


What's "g_" prefix?


Global.

Sorry, very old habits.


+
+/*
+ * Append JSON-quoted version of the given string to 'out'.
+ */
+static void append_quoted_string(struct strbuf *out, const char *in)
+{
+   strbuf_addch(out, '"');
+   for (/**/; *in; in++) {
+   unsigned char c = (unsigned char)*in;


It is clear enough to lose /**/, i.e.

for (; *in; in++) {

but for this one. I wonder if

unsigned char c;
strbuf_addch(out, '"');
while ((c = *in++) != '\0') {
...

is easier to follow, though.


either way is fine.  will fix.




+static inline void begin(struct json_writer *jw, int is_array)
+{
+   ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc);
+
+   jw->levels[jw->nr].level_is_array = !!is_array;
+   jw->levels[jw->nr].level_is_empty = 1;


An element of this array is a struct that represents a level, and
everybody who accesses an element of that type knows it is talking
about a level by the field that has the array being named as
.levels[] (also [*1*]).  In such a context, it is a bit too loud to
name the fields with level_$blah.  IOW,

struct json_writer_level
{
unsigned is_array : 1;
unsigned is_empty : 1;
};


makes sense.  will fix.


+struct json_writer_level
+{
+   unsigned level_is_array : 1;
+   unsigned level_is_empty : 1;
+};
+
+struct json_writer
+{
+   struct json_writer_level *levels;
+   int nr, alloc;
+   struct strbuf json;
+};


[Footnote]

*1* I personally prefer to call an array of things "thing[]", not
 "things[]", because then you can refer to an individual element
 e.g. "thing[4]" and read it as "the fourth thing".

 Unless the code often treats an array as a whole, that is, in
 which case, things[] is OK as you'll be calling the whole thing
 with the plural name (e.g. call that function and give all the
 things by passing things[]).

 In this case, one level instance is an element of a stack, and
 the code would be accessing one level at a time most of the
 time, so "writer.level[4].is_empty" would read more naturally
 than "writer.levels[4].level_is_empty".


yeah, that makes sense.

Thanks
Jeff




Re: [PATCH 0/2] routines to generate JSON data

2018-03-20 Thread Jeff Hostetler



On 3/20/2018 1:42 AM, Jeff King wrote:

On Mon, Mar 19, 2018 at 06:19:26AM -0400, Jeff Hostetler wrote:


To make the above work, I think you'd have to store a little more state.
E.g., the "array_append" functions check "out->len" to see if they need
to add a separating comma. That wouldn't work if we might be part of a
nested array. So I think you'd need a context struct like:

struct json_writer {
  int first_item;
  struct strbuf out;
};
#define JSON_WRITER_INIT { 1, STRBUF_INIT }

to store the state and the output. As a bonus, you could also use it to
store some other sanity checks (e.g., keep a "depth" counter and BUG()
when somebody tries to access the finished strbuf with a hanging-open
object or array).


Yeah, I thought about that, but I think it gets more complex than that.
I'd need a stack of "first_item" values.  Or maybe the _begin() needs to
increment a depth and set first_item and the _end() needs to always
unset first_item.  I'll look at this again.


I think you may be able to get by with just unsetting first_item for any
"end". Because as you "pop" to whatever data structure is holding
whatever has ended, you know it's no longer the first item (whatever
just ended was there before it).

I admit I haven't thought too hard on it, though, so maybe I'm missing
something.


I'll take a look.  Thanks.
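
A rough sketch of the bookkeeping being discussed; the helper names are
invented here, and a fixed-depth stack stands in for the dynamically
grown one in the series:

    struct json_writer {
        struct strbuf json;
        int depth;
        unsigned char first_item[32];   /* one flag per nesting level */
    };

    static void level_begin(struct json_writer *jw, char open)
    {
        strbuf_addch(&jw->json, open);
        jw->first_item[++jw->depth] = 1;
    }

    static void level_end(struct json_writer *jw, char close)
    {
        strbuf_addch(&jw->json, close);
        /* whatever just ended counts as an item in the enclosing level */
        jw->first_item[--jw->depth] = 0;
    }

    static void maybe_comma(struct json_writer *jw)
    {
        if (jw->first_item[jw->depth])
            jw->first_item[jw->depth] = 0;
        else
            strbuf_addch(&jw->json, ',');
    }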

 

The thing I liked about the bottom-up construction is that it is easier
to collect multiple sets in parallel and combine them during the final
roll-up.  With the in-line nesting, you're tempted to try to construct
the resulting JSON in a single series and that may not fit what the code
is trying to do.  For example, if I wanted to collect an array of error
messages as they are generated and an array of argv arrays and any alias
expansions, then put together a final JSON string containing them and
the final exit code, I'd need to build it in parts.  I can build these
parts in pieces of JSON and combine them at the end -- or build up other
similar data structures (string arrays, lists, or whatever) and then
have a JSON conversion step.  But we can make it work both ways, I just
wanted to keep it simpler.
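
For concreteness, a sketch of that bottom-up assembly built around
jw_object_sub() from this series; the begin/end/append helper names are
assumed from the API quoted elsewhere in the thread:

    struct json_writer errors = JSON_WRITER_INIT;
    struct json_writer result = JSON_WRITER_INIT;

    jw_array_begin(&errors);
    jw_array_append_string(&errors, "could not lock index");
    jw_array_end(&errors);

    jw_object_begin(&result);
    jw_object_sub(&result, "errors", &errors); /* splice in finished array */
    jw_object_append_int(&result, "exit_code", 1);
    jw_object_end(&result);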


Yeah, I agree that kind of bottom-up construction would be nice for some
cases. I'm mostly worried about inefficiency copying the strings over
and over as we build up the final output.  Maybe that's premature
worrying, though.

If the first_item thing isn't too painful, then it might be nice to have
both approaches available.


True.

 

In general I'd really prefer to keep the shell script as the driver for
the tests, and have t/helper programs just be conduits. E.g., something
like:

cat >expect <<-\EOF &&
{"key": "value", "foo": 42}
EOF
test-json-writer >actual \
  object_begin \
  object_append_string key value \
  object_append_int foo 42 \
  object_end &&
test_cmp expect actual

It's a bit tedious (though fairly mechanical) to expose the API in this
way, but it makes it much easier to debug, modify, or add tests later
on (for example, I had to modify the C program to find out that my
append example above wouldn't work).


Yeah, I wasn't sure if such a simple api required exposing all that
machinery to the shell or not.  And the api is fairly self-contained
and not depending on a lot of disk/repo setup or anything, so my tests
would be essentially static WRT everything else.

With my t0019 script you should have been able use -x -v to see what
was failing.


I was able to run the test-helper directly. The tricky thing is that I
had to write new C code to test my theory about how the API worked.
Admittedly that's not something most people would do regularly, but I
often seem to end up doing that kind of probing and debugging. Many
times I've found the more generic t/helper programs useful.

I also wonder if various parts of the system embrace JSON, if we'd want
to have a tool for generating it as part of other tests (e.g., to create
"expect" files).


Ok, let me see what I can come up with.

Thanks
Jeff



Re: [PATCH 0/2] routines to generate JSON data

2018-03-19 Thread Jeff Hostetler



On 3/17/2018 3:38 AM, Jacob Keller wrote:

On Fri, Mar 16, 2018 at 2:18 PM, Jeff King  wrote:

   3. Some other similar format. YAML comes to mind. Last time I looked
  (quite a while ago), it seemed insanely complex, but I think you
  could implement only a reasonable subset. OTOH, I think the tools
  ecosystem for parsing JSON (e.g., jq) is much better.



I would personally avoid YAML. It's "easier" for humans to read/parse,
but honestly JSON is already simple enough and anyone who writes C or
javascript can likely parse and hand-write JSON anyways. YAML lacks
built-in parsers for most languages, whereas many scripting languages
already have JSON parsing built in, or have more easily attainable
libraries available. In contrast, the YAML libraries are much more
complex and less likely to be available.

That's just my own experience at $dayjob though.


Agreed.  I just looked at the spec for it and I think it would be
harder for us to be assured we are generating valid output with
leading whitespace being significant (without a lot more inspection
of the strings being passed down to us).

Jeff



Re: [PATCH 0/2] routines to generate JSON data

2018-03-19 Thread Jeff Hostetler



On 3/16/2018 5:18 PM, Jeff King wrote:

On Fri, Mar 16, 2018 at 07:40:55PM +0000, g...@jeffhostetler.com wrote:


[...]

I really like the idea of being able to send our machine-readable output
in some "standard" syntax for which people may already have parsers. But
one big hangup with JSON is that it assumes all strings are UTF-8. That
may be OK for telemetry data, but it would probably lead to problems for
something like status porcelain, since Git's view of paths is just a
string of bytes (not to mention possible uses elsewhere like author
names, subject lines, etc).

[...]

I'll come back to the UTF-8/YAML questions in a separate response.



Documentation for the new API is given in json-writer.h at the bottom of
the first patch.


The API generally looks pleasant, but the nesting surprised me.  E.g.,
I'd have expected:

   jw_array_begin(out);
   jw_array_begin(out);
   jw_array_append_int(out, 42);
   jw_array_end(out);
   jw_array_end(out);

to result in an array containing an array containing an integer. But
array_begin() actually resets the strbuf, so you can't build up nested
items like this internally. Ditto for objects within objects. You have
to use two separate strbufs and copy the data an extra time.

To make the above work, I think you'd have to store a little more state.
E.g., the "array_append" functions check "out->len" to see if they need
to add a separating comma. That wouldn't work if we might be part of a
nested array. So I think you'd need a context struct like:

   struct json_writer {
 int first_item;
 struct strbuf out;
   };
   #define JSON_WRITER_INIT { 1, STRBUF_INIT }

to store the state and the output. As a bonus, you could also use it to
store some other sanity checks (e.g., keep a "depth" counter and BUG()
when somebody tries to access the finished strbuf with a hanging-open
object or array).


Yeah, I thought about that, but I think it gets more complex than that.
I'd need a stack of "first_item" values.  Or maybe the _begin() needs to
increment a depth and set first_item and the _end() needs to always
unset first_item.  I'll look at this again.

The thing I liked about the bottom-up construction is that it is easier
to collect multiple sets in parallel and combine them during the final
roll-up.  With the in-line nesting, you're tempted to try to construct
the resulting JSON in a single series and that may not fit what the code
is trying to do.  For example, if I wanted to collect an array of error
messages as they are generated and an array of argv arrays and any alias
expansions, then put together a final JSON string containing them and
the final exit code, I'd need to build it in parts.  I can build these
parts in pieces of JSON and combine them at the end -- or build up other
similar data structures (string arrays, lists, or whatever) and then
have a JSON conversion step.  But we can make it work both ways, I just
wanted to keep it simpler.



I wasn't sure how to unit test the API from a shell script, so I added a
helper command that does most of the work in the second patch.


In general I'd really prefer to keep the shell script as the driver for
the tests, and have t/helper programs just be conduits. E.g., something
like:

   cat >expect <<-\EOF &&
   {"key": "value", "foo": 42}
   EOF
   test-json-writer >actual \
 object_begin \
 object_append_string key value \
 object_append_int foo 42 \
 object_end &&
   test_cmp expect actual

It's a bit tedious (though fairly mechanical) to expose the API in this
way, but it makes it much easier to debug, modify, or add tests later
on (for example, I had to modify the C program to find out that my
append example above wouldn't work).


Yeah, I wasn't sure if such a simple api required exposing all that
machinery to the shell or not.  And the api is fairly self-contained
and not depending on a lot of disk/repo setup or anything, so my tests
would be essentially static WRT everything else.

With my t0019 script you should have been able use -x -v to see what
was failing.



-Peff



thanks for the quick review
Jeff


Re: [PATCH 04/36] t/helper: merge test-lazy-init-name-hash into test-tool

2018-03-19 Thread Jeff Hostetler



On 3/18/2018 4:47 AM, Eric Sunshine wrote:

On Sun, Mar 18, 2018 at 4:25 AM, Duy Nguyen  wrote:

On Sun, Mar 18, 2018 at 3:11 AM, Eric Sunshine  wrote:

On Sat, Mar 17, 2018 at 3:53 AM, Nguyễn Thái Ngọc Duy  wrote:

-extern int test_lazy_init_name_hash(struct index_state *istate, int 
try_threaded);
+extern int lazy_init_name_hash_for_testing(struct index_state *istate, int 
try_threaded);


I get why you renamed this since the "main" function in the test
program wants to be called 'test_lazy_init_name_hash'...


diff --git a/t/helper/test-lazy-init-name-hash.c 
b/t/helper/test-lazy-init-name-hash.c
@@ -9,6 +10,9 @@ static int perf;
+static int (*init_name_hash)(struct index_state *istate, int try_threaded) =
+   lazy_init_name_hash_for_testing;
+
@@ -33,9 +37,9 @@ static void dump_run(void)
 if (single) {
-   test_lazy_init_name_hash(&the_index, 0);
+   init_name_hash(&the_index, 0);


... but I'm having trouble understanding why this indirection through
'init_name_hash' is used rather than just calling
lazy_init_name_hash_for_testing() directly. Am I missing something
obvious or is 'init_name_hash' just an unneeded artifact of an earlier
iteration before the rename in cache.{c,h}?


Nope. It just feels too long to me and because we're already in the
test I don't feel the need to repeat _for_testing everywhere. If it's
confusing, I'll remove init_name_hash.


Without an explanatory in-code comment, I'd guess that someone coming
across this in the future will also stumble over it just like I did in
the review.


I agree with Eric, this indirection seems unnecessary and confusing.
Generally, when I see function pointers initialized like that, I
think that there are plans for additional providers/drivers for that
functionality, but I don't see that here.  And it seems odd.



What if, instead, you rename test_lazy_init_name_hash() to
lazy_init_name_hash_test(), shifting 'test' from the front to the back
of the name? That way, the name remains the same length at the call
sites in the test helper, and you don't have to introduce the
confusing, otherwise unneeded 'init_name_hash'.



I see 2 problems.
1. The test function in name-hash.c that needs access to private data.
I'm not a fan of the "_for_testing" suffix, but I'm open.  I might
either leave it as is, or make it a "TEST_" or "test__" prefix (as
a guide for people used to languages with namespaces).

2. The new name for cmd_main().  There's no real reason it needs to
   be "test_*", right?   Your cmds[] maps the command line string to a
   function, but it could be anything.  That is, "cmd_main()" could be
   renamed "cmd__lazy_init_name_hash()".  Then you can conceptually
   reserve the "cmd__" prefix throughout t/helper for each test handler.
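
A sketch of what that convention might look like in a multiplexed
test-tool; the table layout here is an assumption:

    struct test_cmd {
        const char *name;
        int (*fn)(int argc, const char **argv);
    };

    int cmd__lazy_init_name_hash(int argc, const char **argv);

    static struct test_cmd cmds[] = {
        { "lazy-init-name-hash", cmd__lazy_init_name_hash },
    };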

Just a thought,
Thanks,
Jeff



Re: What's cooking in git.git (Feb 2018, #03; Wed, 21)

2018-02-27 Thread Jeff Hostetler



On 2/21/2018 7:31 PM, Junio C Hamano wrote:


* jh/status-no-ahead-behind (2018-01-24) 4 commits
  - status: support --no-ahead-behind in long format
  - status: update short status to respect --no-ahead-behind
  - status: add --[no-]ahead-behind to status and commit for V2 format.
  - stat_tracking_info: return +1 when branches not equal

  "git status" can spend a lot of cycles to compute the relation
  between the current branch and its upstream, which can now be
  disabled with "--no-ahead-behind" option.

  At v5; is this ready for 'next'?


yes, i believe so.

thanks
jeff



Re: What's cooking in git.git (Feb 2018, #01; Wed, 7)

2018-02-12 Thread Jeff Hostetler


* jh/status-no-ahead-behind (2018-01-24) 4 commits
  - status: support --no-ahead-behind in long format
  - status: update short status to respect --no-ahead-behind
  - status: add --[no-]ahead-behind to status and commit for V2 format.
  - stat_tracking_info: return +1 when branches not equal

  "git status" can spend a lot of cycles to compute the relation
  between the current branch and its upstream, which can now be
  disabled with "--no-ahead-behind" option.

  At v5; is this ready for 'next'?



I believe so.  I don't recall any further discussions on it.

The only open question was the idea of trying to walk 100 or so
commits and see if one was "close" to being an ancestor of the
other, but we thought that Stolee's graph cache would be a better
solution for that.

So, yes, I think it is ready for 'next'.

Thanks,
Jeff



Re: partial fetch

2018-02-12 Thread Jeff Hostetler



On 2/12/2018 11:24 AM, Basin Ilya wrote:

Hi.
In 2017 a set of patches titled "add object filtering for partial fetch" was 
accepted. Is it what I think it is? Will we be able to download only a subdirectory from a
large project?



yes, that is the goal.
there are several caveats, but yes, that is the goal.

Jeff


Re: [RFC PATCH 000/194] Moving global state into the repository object

2018-02-07 Thread Jeff Hostetler



On 2/5/2018 6:51 PM, Stefan Beller wrote:

This series moves a lot of global state into the repository struct.
It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0)
It can be found at https://github.com/stefanbeller/git/tree/object-store

Motivation for this series:
* Easier to reason about code when all state is stored in one place,
   for example for multithreading
* Easier to move to processing submodules in-process instead of
   calling a processes for the submodule handling.
   The last patch demonstrates this.

[...]

Very nice.  My eyes glazed over a few times, but I like the
direction you're heading here.

Thanks for tackling this!
Jeff




Re: [PATCH v2 00/27] protocol version 2

2018-02-01 Thread Jeff Hostetler



On 1/25/2018 6:58 PM, Brandon Williams wrote:

Changes in v2:
  * Added documentation for fetch
  * changes #defines for state variables to be enums
  * couple code changes to pkt-line functions and documentation
  * Added unit tests for the git-serve binary as well as for ls-refs

[...]


This looks really nice.  I'm eager to get this in so we can do some
additional commands to help make partial clone more efficient.

Thanks,
Jeff



Re: [PATCH 12/26] ls-refs: introduce ls-refs server command

2018-02-01 Thread Jeff Hostetler



On 1/2/2018 7:18 PM, Brandon Williams wrote:

Introduce the ls-refs server command.  In protocol v2, the ls-refs
command is used to request the ref advertisement from the server.  Since
it is a command which can be requested (as opposed to mandatory in v1),
a client can send a number of parameters in its request to limit the ref
advertisement based on provided ref-patterns.

Signed-off-by: Brandon Williams 
---
  Documentation/technical/protocol-v2.txt | 26 +
  Makefile|  1 +
  ls-refs.c   | 97 +
  ls-refs.h   |  9 +++
  serve.c |  2 +
  5 files changed, 135 insertions(+)
  create mode 100644 ls-refs.c
  create mode 100644 ls-refs.h

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index b87ba3816..5f4d0e719 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -89,3 +89,29 @@ terminate the connection.
  Commands are the core actions that a client wants to perform (fetch, push,
  etc).  Each command will be provided with a list capabilities and
  arguments as requested by a client.
+
+ Ls-refs
+--------
+
+Ls-refs is the command used to request a reference advertisement in v2.
+Unlike the current reference advertisement, ls-refs takes in parameters
+which can be used to limit the refs sent from the server.
+
+Ls-refs takes in the following parameters wrapped in packet-lines:
+
+  symrefs: In addition to the object pointed by it, show the underlying
+  ref pointed by it when showing a symbolic ref.
+  peel: Show peeled tags.
+  ref-pattern <pattern>: When specified, only references matching the
+given patterns are displayed.
+
+The output of ls-refs is as follows:
+
+output = *ref
+flush-pkt
+ref = PKT-LINE((tip | peeled) LF)
+tip = obj-id SP refname (SP symref-target)
+peeled = obj-id SP refname "^{}"
+
+symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF)
+shallow = PKT-LINE("shallow" SP obj-id LF)


Do you want to talk about ordering requirements on this?
I think packed-refs has one, but I'm not sure it matters here
whether the client or server sorts it.

Are there any provisions for compressing the refnames, like in the
reftable spec or in index-v4 ?

It doesn't need to be in the initial version.  Just asking.  We could
always add a "ls-refs-2" command that builds upon this.

Thanks,
Jeff


Re: [PATCH 11/26] serve: introduce git-serve

2018-02-01 Thread Jeff Hostetler



On 2/1/2018 1:57 PM, Stefan Beller wrote:

On Thu, Feb 1, 2018 at 10:48 AM, Jeff Hostetler <g...@jeffhostetler.com> wrote:



On 1/2/2018 7:18 PM, Brandon Williams wrote:


Introduce git-serve, the base server for protocol version 2.

[...]

+ Special Packets
+----------------
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message



Previously, a 0001 pkt-line meant that there was 1 byte of data
following, right?


No, the length includes the length field itself, so 0005 would indicate that
there is one byte following (+4 bytes of "0005" included)
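
A worked example of the framing: the four hex digits count themselves,
so a one-byte payload goes out as "0005" plus that byte.  A sketch:

    #include <stdio.h>
    #include <string.h>

    /* write_pkt_line(stdout, "a") emits "0005a" */
    static void write_pkt_line(FILE *out, const char *payload)
    {
        fprintf(out, "%04x", (unsigned)(strlen(payload) + 4));
        fputs(payload, out);
    }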


d'oh.  right.  thanks!


Should we also consider increasing the pkt-line limit to 5 hex-digits
while we're at it ?   That would let us have 1MB buffers if that would
help with large packfiles.


AFAICT there is a static allocation of one pkt-line (of maximum size),
such that the code can read in a full packet and then process it.
If we'd increase the packet size we'd need the static buffer to be 1MB,
which sounds good for my developer machine. But I suspect it may be
too much for people using git on embedded devices?


I got burned by that static buffer once upon a time when I wanted
to have 2 streams going at the same time.  Hopefully, we can move
that into the new reader structure at some point (if it isn't already).



pack files larger than 64k are put into multiple pkt-lines, which is
not a big deal, as the overhead of 4bytes per 64k is negligible.
(also there is progress information in the side channel, which
would come in as a special packet in between real packets,
such that every 64k transmitted you can update your progress
meter; Not sure I feel strongly on fewer progress updates)


  Granted, we're throttled by the network,
so it might not matter.  Would it be interesting to have a 5 digit
prefix with parts of the high bits of first digit being flags ?
Or is this too radical of a change?


What would the flags be for?

As an alternative we could put the channel number in one byte,
such that we can have a side channel not just while streaming the
pack but all the time. (Again, not sure if that buys a lot for us)



Delimiters like the 0001 and the side channel are a couple of
ideas, but I was just thinking out loud.  And right, I'm not sure
it gets us much right now.

Jeff


Re: [PATCH 11/26] serve: introduce git-serve

2018-02-01 Thread Jeff Hostetler



On 1/2/2018 7:18 PM, Brandon Williams wrote:

Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git's current
wire protocol.  The intention is that it will be a simpler, less
wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial
ref advertisement.  In its place a server will export a list of
capabilities and commands which it supports in a capability
advertisement.  A client can then request that a particular command be
executed by providing a number of capabilities and command specific
parameters.  At the completion of a command, a client can request that
another command be executed or can terminate the connection by sending a
flush packet.

Signed-off-by: Brandon Williams 
---
  .gitignore  |   1 +
  Documentation/technical/protocol-v2.txt |  91 
  Makefile|   2 +
  builtin.h   |   1 +
  builtin/serve.c |  30 
  git.c   |   1 +
  serve.c | 239 
  serve.h |  15 ++
  8 files changed, 380 insertions(+)
  create mode 100644 Documentation/technical/protocol-v2.txt
  create mode 100644 builtin/serve.c
  create mode 100644 serve.c
  create mode 100644 serve.h

diff --git a/.gitignore b/.gitignore
index 833ef3b0b..2d0450c26 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,6 +140,7 @@
  /git-rm
  /git-send-email
  /git-send-pack
+/git-serve
  /git-sh-i18n
  /git-sh-i18n--envsubst
  /git-sh-setup
diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 0..b87ba3816
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,91 @@
+ Git Wire Protocol, Version 2
+==============================
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+supported by a single service.
+  * Easily extendable as capabilities are moved into their own section
+of the protocol, no longer being hidden behind a NUL byte and
+limited by the size of a pkt-line (as there will be a single
+capability per pkt-line).
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+
+ Detailed Design
+================
+
+A client can request to speak protocol v2 by sending `version=2` in the
+side-channel `GIT_PROTOCOL` in the initial request to the server.
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will be advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Special Packets
+----------------
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message


Previously, a 0001 pkt-line meant that there was 1 byte of data
following, right?  Does this change that and/or prevent 1 byte
packets?  (Not sure if it is likely, but the odd-tail of a packfile
might get sent in a 0001 line, right?)  Or is it that 0001 is only
special during the V2 negotiation stuff, but not during the packfile
transmission?

(I'm not against having this delimiter -- I think it is useful, but
just curious if will cause problems elsewhere.)

Should we also consider increasing the pkt-line limit to 5 hex-digits
while we're at it ?   That would let us have 1MB buffers if that would
help with large packfiles.  Granted, we're throttled by the network,
so it might not matter.  Would it be interesting to have a 5 digit
prefix with parts of the high bits of first digit being flags ?
Or is this too radical of a change?



+
+ Capability Advertisement
+-------------------------
+
+A server which decides to communicate (based on a request from a client)
+using protocol version 2, notifies the client by sending a version string
+in its initial response followed by an advertisement of its capabilities.
+Each capability is a key with an optional value.  Clients must ignore all
+unknown keys.  Semantics of unknown values are left to the definition of
+each key.  Some capabilities will describe commands which can be requested
+to be executed by the client.
+
+capability-advertisement = protocol-version
+  

Re: [RFC PATCH 0/1] Implement CMake build

2018-01-26 Thread Jeff Hostetler



On 1/25/2018 7:21 PM, Isaac Hier wrote:

Hi Jeff,

I have been looking at the build generator, which looks promising, but
I have one concern. Assuming I can generate a CMakeLists.txt that
appropriately updates the library sources, etc. how do you suggest I
handle new portability macros? For example, assume someone adds a
macro HAVE_X to indicate the availability of some platform-specific
function x. In the current Makefile, a comment would be added to the
top indicating when HAVE_X or NO_X should be set, and that option
would toggle the HAVE_X C macro. But CMake can test for the
availability of x, which is one of the main motives for adding a CMake
build. The current build generator uses the output of make, so all it
would know is whether or not HAVE_X is defined on the platform that
ran the Makefile, but not the entire list of platforms that git
supports.

Bottom line: should I add the portability tests as they are now,
without accounting for future portability macros? One good alternative
might be to suggest the authors of new portability macros include a
small sample C program to test it. That would allow me to easily patch
the CMake tests whenever that came up. In a best case scenario, a
practice could be established to write the test in a specific
directory with a certain name so that I could automatically update the
CMake tests from the build generator.
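
A sketch of what such a probe might look like; the macro name and the
convention are hypothetical:

    /* probe for a hypothetical HAVE_STRDUP: if this compiles and links
     * on the target platform, HAVE_STRDUP can be set */
    #include <string.h>
    #include <stdlib.h>

    int main(void)
    {
        char *p = strdup("probe");
        free(p);
        return 0;
    }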

Thanks for the help,

Isaac


It's been years since I've used cmake as anything other than
a casual (downstream) consumer, so I'm not sure I can answer
your questions.

The vcxproj target we have is a bit of a hack to automatically
capture the set of source files and target libraries and executables.
We don't try to capture the spirit of all of the HAVE_ and NO_ flags
when we build the *.vcxproj files.  And we make some assumptions in
the generation template for the usual VC/VS settings.  But then
Windows is a single target and we don't have to worry about some
things (like whether or not qsort is present).

I don't want to discourage you from attempting this.  (And I realize
that my initial response might have given that impression -- I mainly
wanted to say that we don't currently have a problem on Windows with
the current Makefile situation.)

A full cmake system would let us simplify some things, but it also
complicates some things.  So we might be trading one set of problems
for another.  For example, the libgit2 project uses cmake.  Not to
pick on them, but when I look at it, I see a lot of the same issues
(but perhaps with better syntax than the makefile).

https://github.com/libgit2/libgit2/blob/master/CMakeLists.txt

As to your portability test questions, I'm afraid I don't know.
Sorry,
Jeff




Re: [RFC PATCH 0/1] Implement CMake build

2018-01-24 Thread Jeff Hostetler



On 1/24/2018 2:59 PM, Isaac Hier wrote:

Jeff, no worries, fair enough. I know https://github.com/grpc/grpc
uses a shared file to generate code for several build systems instead
of maintaining them individually. I plan on doing the work anyway just
because I have my own reasons to use CMake in Git (for packaging in
https://github.com/ruslo/hunter is my main motive here). Whether or
not it is maintained upstream is not a real concern for me at the
moment.

[...]

I'll see how the Windows build currently works and if that makes
sense, maybe I'll try using that build generator here too.

Thanks for the feedback,

Isaac


Look at the "vcxproj:" target in config.mak.uname (in the GfW repo).

Jeff


Re: [RFC PATCH 0/1] Implement CMake build

2018-01-24 Thread Jeff Hostetler



On 1/22/2018 7:16 PM, Isaac Hier wrote:

This patch adds a mostly complete (aside from building tests, documentation,
installation, etc.) CMake build to the git project. I am not sure how much
interest there is in a CMake build, so please send me feedback one way or
another. Personally, I believe CMake will help with Windows builds and is
somewhat easier to read than a Makefile. I considered adding this to the
contrib directory, but CMakeLists.txt almost always reside in the original
directories, and I'm not sure how wise it would be to do otherwise. If you are
interested in a CMake build, I would be more than happy to finish up the work
here. I decided to wait until the issue was discussed here before finishing
the final parts of the build.


On Windows, we use "bash" and "make" from the Git-for-Windows SDK installation
(which gives us a bash shell and most of the usual Unix command line tools) and
the main "Makefile".  We do need a special section in the "config.mak.uname" 
file
to set some platform compiler options and etc., but that is small enough.

Johannes and I recently added a few new options to let Windows build Git from
the command line with either GCC or MSVC and to synthesize MSVS solution (.sln)
and project (.vcxproj) files to allow you to work with the full MSVS IDE and
full intellisense.  And if necessary download and build third-party libraries
not normally present on a Windows machine.  Most of this work is Windows 
specific
and may not yet be upstream.   See GfW [1] and VCPKG [2].

The synthesized solution and project files are automatically generated, so we
do not have to separately track changes in the Makefile to the various file 
lists.
These should be treated as read-only and re-generated in response to changes in
the Makefile.  Using the solution/project files, we can completely build Git in
the IDE or a command prompt and without the SDK.  This further simplifies things
for Windows developers.

So given that, I don't see a need to replace the main Makefile on Windows.

Sorry,
Jeff

[1] https://github.com/git-for-windows/git
[2] https://github.com/Microsoft/vcpkg


Re: What's cooking in git.git (Jan 2018, #03; Tue, 23)

2018-01-24 Thread Jeff Hostetler



On 1/23/2018 5:39 PM, Junio C Hamano wrote:

[Stalled]

* jh/status-no-ahead-behind (2018-01-04) 4 commits
  - status: support --no-ahead-behind in long format
  - status: update short status to respect --no-ahead-behind
  - status: add --[no-]ahead-behind to status and commit for V2 format.
  - stat_tracking_info: return +1 when branches not equal

  Expecting a reroll to finalize the topic.


I sent a V5 (1/9/2018) of this series that removed the config setting.
It looks like your branch on GH only has the V4 version.


WRT the reroll:
There was a long discussion on a limited calculation (option (2)).
While I appreciate how that would be nice to have (for some simple
cases -- like my current feature branch relative to my upstream),
I think that that is something we could add later if we want.

(*) This patch gives a simple on/off.  This has value by itself for
certain situations.

(*) To implement a limit would involve hacking the merge-base
calculation which gets into graph shape questions that I don't
think we want to get into.  Several messages in the thread talk
about some example shapes, but I don't think we have consensus on
what we want here.

(*) Later this week Stolee will be submitting a patch series that add
a client-side commit graph cache.  This allows fast calculations
and includes generation numbers.  I'd rather wait until we have
this in and then see if we want/need the limited option.

Thanks,
Jeff


Re: [PATCH v2 2/2] sha1_file: improve sha1_file_name() perfs

2018-01-17 Thread Jeff Hostetler



On 1/17/2018 12:54 PM, Christian Couder wrote:

As sha1_file_name() could be performance sensitive, let's
try to make it faster by seeding the initial buffer size
to avoid any need to realloc and by using strbuf_addstr()
and strbuf_addch() instead of strbuf_addf().

Helped-by: Derrick Stolee <sto...@gmail.com>
Helped-by: Jeff Hostetler <g...@jeffhostetler.com>
Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
---
  sha1_file.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index f66c21b2da..1a94716962 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -323,8 +323,14 @@ static void fill_sha1_path(struct strbuf *buf, const 
unsigned char *sha1)
  
  void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)

  {
-   strbuf_addf(buf, "%s/", get_object_directory());
+   const char *obj_dir = get_object_directory();
+   size_t extra = strlen(obj_dir) + 1 + GIT_MAX_HEXSZ;


Very minor nit.  Should this be "+3" rather than "+1"?
One for the slash after obj_dir, one for the slash between
"xx/y[38]", and one for the trailing NUL.

  
+	if (extra > strbuf_avail(buf))

+   strbuf_grow(buf, extra);
+
+   strbuf_addstr(buf, obj_dir);
+   strbuf_addch(buf, '/');
fill_sha1_path(buf, sha1);
  }
  



Re: [PATCH] sha1_file: remove static strbuf from sha1_file_name()

2018-01-16 Thread Jeff Hostetler



On 1/16/2018 9:01 AM, Derrick Stolee wrote:

On 1/16/2018 2:18 AM, Christian Couder wrote:

Using a static buffer in sha1_file_name() is error prone
and the performance improvements it gives are not needed
in most of the callers.

So let's get rid of this static buffer and, if necessary
or helpful, let's use one in the caller.


First: this is a good change for preventing bugs in the future. Do not let my 
next thought deter you from making this change.

Second: I wonder if there is any perf hit now that we are allocating buffers 
much more often. Also, how often does get_object_directory() change? In some
cases we could cache the buffer and only append the parts for the loose object 
(and not reallocate because the filenames will have equal length).

I'm concerned about the perf implications when inspecting many loose objects 
(100k+) but these code paths seem to be involved with more substantial work, 
such as opening and parsing the objects, so keeping a buffer in-memory is 
probably unnecessary.
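
A sketch of that caching idea (not the patch under review), assuming
get_object_directory() stays stable for the life of the process, which
is exactly the caveat above:

    static const char *sha1_file_name_cached(const unsigned char *sha1)
    {
        static struct strbuf buf = STRBUF_INIT;
        static size_t dirlen;

        if (!dirlen) {
            strbuf_addf(&buf, "%s/", get_object_directory());
            dirlen = buf.len;
        }
        strbuf_setlen(&buf, dirlen);    /* drop the previous file part */
        fill_sha1_path(&buf, sha1);
        return buf.buf;
    }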


---
  cache.h   |  8 +++-
  http-walker.c |  6 --
  http.c    | 16 ++--
  sha1_file.c   | 38 +-
  4 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index d8b975a571..6db565408e 100644
--- a/cache.h
+++ b/cache.h
@@ -957,12 +957,10 @@ extern void check_repository_format(void);
  #define TYPE_CHANGED    0x0040
  /*
- * Return the name of the file in the local object database that would
- * be used to store a loose object with the specified sha1.  The
- * return value is a pointer to a statically allocated buffer that is
- * overwritten each time the function is called.
+ * Put in `buf` the name of the file in the local object database that
+ * would be used to store a loose object with the specified sha1.
   */
-extern const char *sha1_file_name(const unsigned char *sha1);
+extern void sha1_file_name(struct strbuf *buf, const unsigned char *sha1);
  /*
   * Return an abbreviated sha1 unique within this repository's object database.
diff --git a/http-walker.c b/http-walker.c
index 1ae8363de2..07c2b1af82 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -544,8 +544,10 @@ static int fetch_object(struct walker *walker, unsigned 
char *sha1)
  } else if (hashcmp(obj_req->sha1, req->real_sha1)) {
  ret = error("File %s has bad hash", hex);
  } else if (req->rename < 0) {
-    ret = error("unable to write sha1 filename %s",
-    sha1_file_name(req->sha1));
+    struct strbuf buf = STRBUF_INIT;
+    sha1_file_name(&buf, req->sha1);
+    ret = error("unable to write sha1 filename %s", buf.buf);
+    strbuf_release(&buf);
  }
  release_http_object_request(req);
diff --git a/http.c b/http.c
index 5977712712..5979305bc9 100644
--- a/http.c
+++ b/http.c
@@ -2168,7 +2168,7 @@ struct http_object_request *new_http_object_request(const 
char *base_url,
  unsigned char *sha1)
  {
  char *hex = sha1_to_hex(sha1);
-    const char *filename;
+    struct strbuf filename = STRBUF_INIT;
  char prevfile[PATH_MAX];
  int prevlocal;
  char prev_buf[PREV_BUF_SIZE];
@@ -2180,14 +2180,15 @@ struct http_object_request 
*new_http_object_request(const char *base_url,
  hashcpy(freq->sha1, sha1);
  freq->localfile = -1;
-    filename = sha1_file_name(sha1);
+    sha1_file_name(&filename, sha1);
  snprintf(freq->tmpfile, sizeof(freq->tmpfile),
- "%s.temp", filename);
+ "%s.temp", filename.buf);
-    snprintf(prevfile, sizeof(prevfile), "%s.prev", filename);
+    snprintf(prevfile, sizeof(prevfile), "%s.prev", filename.buf);
  unlink_or_warn(prevfile);
  rename(freq->tmpfile, prevfile);
  unlink_or_warn(freq->tmpfile);
+    strbuf_release(&filename);
  if (freq->localfile != -1)
  error("fd leakage in start: %d", freq->localfile);
@@ -2302,6 +2303,7 @@ void process_http_object_request(struct 
http_object_request *freq)
  int finish_http_object_request(struct http_object_request *freq)
  {
  struct stat st;
+    struct strbuf filename = STRBUF_INIT;
  close(freq->localfile);
  freq->localfile = -1;
@@ -2327,8 +2329,10 @@ int finish_http_object_request(struct 
http_object_request *freq)
  unlink_or_warn(freq->tmpfile);
  return -1;
  }
-    freq->rename =
-    finalize_object_file(freq->tmpfile, sha1_file_name(freq->sha1));
+
+    sha1_file_name(&filename, freq->sha1);
+    freq->rename = finalize_object_file(freq->tmpfile, filename.buf);
+    strbuf_release(&filename);
  return freq->rename;
  }
diff --git a/sha1_file.c b/sha1_file.c
index 3da70ac650..f66c21b2da 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -321,15 +321,11 @@ static void fill_sha1_path(struct strbuf *buf, const 
unsigned char *sha1)
  }
  }
-const char *sha1_file_name(const unsigned char *sha1)
+void sha1_file_name(struct strbuf *buf, const unsigned char *sha1)
  {
-    static struct strbuf buf = STRBUF_INIT;
-
-    strbuf_reset(&buf);
-    strbuf_addf(&buf, 

Re: What's cooking in git.git (Jan 2018, #02; Tue, 9)

2018-01-10 Thread Jeff Hostetler



On 1/10/2018 2:57 PM, Junio C Hamano wrote:

Jeff Hostetler <g...@jeffhostetler.com> writes:


On 1/9/2018 6:33 PM, Junio C Hamano wrote:

--
[Cooking]


* jh/fsck-promisors (2017-12-08) 10 commits

[...]


* jh/partial-clone (2017-12-08) 13 commits

[...]

Parts 2 and 3 of partial clone have been simmering
for a while now.  I was wondering if there were any
more comments or questions on them.  I don't recall
any existing issues.


Me neither.

I do not mind merging them to 'next' during the feature freeze, but
we won't be merging them to 'master' soon anyway, and I'd like to
see us concentrate more on finding and fixing regressions on the
'master' front for now.


I didn't want to rush this -- just check to see if there was
any thing that I should focus on while waiting for 2.16 to ship
and settle down.

Leave this where it is now if you want and we can merge it in later.

Thanks
Jeff

 


Re: What's cooking in git.git (Jan 2018, #02; Tue, 9)

2018-01-10 Thread Jeff Hostetler



On 1/9/2018 6:33 PM, Junio C Hamano wrote:

--
[Cooking]


* jh/fsck-promisors (2017-12-08) 10 commits

[...]


* jh/partial-clone (2017-12-08) 13 commits

[...]

Parts 2 and 3 of partial clone have been simmering
for a while now.  I was wondering if there were any
more comments or questions on them.  I don't recall
any existing issues.

Thanks
Jeff




