Re: how does "clone --filter=sparse:path" work?
On 11/8/2018 12:07 AM, Jeff King wrote:
> [cc-ing people who worked on or seem interested in partial-clone filters]
>
> I've been exploring the partial-clone options a bit, and took a look at
> the "sparse:path" option. I had some confusion initially, because I
> expected it to work something like this:
>
>   git clone --filter=sparse:path=Documentation .
>
> But it really doesn't take an in-repo path. You have to specify a path
> to a file that contains .gitignore patterns. Except they're actually
> _inverted_ patterns (i.e., what to include). Which confused me again,
> but I guess makes sense if these are meant to be adapted from
> sparse-checkout files.
>
> So my first question is: how is this meant to be used? I guess the idea
> is that somebody (the repo admin?) makes a set of pre-made profiles,
> with each profile mentioning some subset of paths. And then when you
> clone, you say, "I want the foo profile". How is that profile stored
> and accessed?

Yes, the basic idea was that if you had a large tree and various groups
of users that tended to only need their group's portion of the tree,
then you (or your repo admin) could create several profiles, and users
could use a profile to request just the subset of trees and blobs they
need. During a clone/fetch, users could ask for the profile by OID or
by <ref>:<path>. It would be a local/custom detail where the repo admin
collects the various profiles. (Or at least, that was the thought.)
Likewise, a user could create their own profile(s) by committing a
sparse-checkout file spec to a personal branch. Again, a local
convention.

> If it's a blob in the repository, I think you can use something like
> "--filter=sparse:oid=profiles:foo". And then after cloning, you'd do
> "git cat-file blob profiles:foo >.git/sparse-checkout" or similar. That
> seems like something that could be wrapped up in a single clone option,
> but I can't find one; I won't be surprised if it simply hasn't happened
> yet.

Right, TBD.

> But if it's a path, then what? I'd imagine that you'd "somehow" get a
> copy of the sparse profile you want out-of-band. And then you'd say "I
> want to clone with the profile in this file", and then copy it into the
> sparse-checkout file to do the checkout. But the sparse-checkout file
> in the filter is not expanded locally, with patterns sent over the
> wire. The _filename_ is sent over the wire, and then upload-pack
> expands it. So you can't specify an arbitrary set of patterns; you have
> to know about the profile names (how?) on the server.
>
> That's not very flexible, though I imagine it may make certain things
> easier on the server (e.g., if you pack in such a way to efficiently
> serve only particular profiles). But I'm also concerned it's dangerous.
> We're reading gitignore patterns from an arbitrary name on the server's
> filesystem, with that name coming from an untrusted client. So I can do:
>
>   git clone --filter=sparse:path=/etc/passwd $remote
>
> and the server will read /etc/passwd. There's probably a lot of
> mischief you can get up to with that. Sadly reading /proc/self/mem
> doesn't work, because the gitignore code fstat()s the file to find out
> how much to read, and the st_size there is 0. But there are probably
> others (/proc/kcore is a fun one, but nobody is running their git
> server as root, right?).
>
> Below is a proof of concept script that uses this as an oracle to
> explore the filesystem, as well as individual lines of files. Should we
> simply be disallowing sparse:path filters over upload-pack?

The option to allow an absolute path over the wire probably needs more
thought, as you suggest. Having it in the traverse code was useful for
local testing in the client. But mainly I was thinking of a use case on
the client of the form:

  git rev-list --objects --filter=sparse:path=.git/sparse-checkout --missing=print

which would give a list of the blobs that you don't have and would need
before you could check out using the current sparse-checkout definition.
You could then have a pre-checkout hook that would bulk-fetch them
before starting the actual checkout, since that would be more efficient
than demand-loading blobs individually during the checkout. There's more
work to do in this area, but that was the idea.

But back to your point: yes, I think we should restrict this over the
wire.

Thanks,
Jeff

> -Peff
>
> -- >8 --
> # Set this to host:repo.git to see a real cross-machine connection
> # (which makes it more obvious which side is reading which files). For
> # a simulated local one, we'll use --no-local to make sure we really
> # run upload-pack.
> SERVER=server.git
>
> # Do these steps manually on the remote side if you're trying it
> # cross-server.
> case "$SERVER" in
> *:*)
>         ;;
> *)
>         # Imagine a server with a repository users can push to, with
>         # filters enabled.
>         git init -q --bare $SERVER
>         git -C $SERVER config uploadpack.allowfilter true
>
>         # Imagine the server has a file outside of the repo we're
>         # interested in.
>         echo foo
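Spelled out end to end, the sparse:oid profile workflow discussed in this
exchange might look like the transcript below. This is an illustrative
sketch, not a recipe from the thread: the "profiles" ref and "foo" profile
name come from the message, but the remote URL is a placeholder, and the
copy-into-place steps are manual precisely because the single clone option
mentioned above does not exist yet.

```
# admin: publish each group's sparse-checkout file as a blob reachable
# from a conventional ref, e.g. "profiles"

# user: clone, fetching only the objects matched by the "foo" profile
git clone --filter=sparse:oid=profiles:foo $URL repo
cd repo

# apply the same profile locally before checking out sparsely
git cat-file blob profiles:foo >.git/info/sparse-checkout
git config core.sparseCheckout true
git read-tree -mu HEAD
```

Note that nothing ties the filter used at clone time to the local
sparse-checkout definition; keeping the two in sync is the user's problem,
which is part of what the "single clone option" idea would solve.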
Re: [PATCH 0/1] commit-reach: fix first-parent heuristic
On 10/18/2018 10:31 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" writes:
>
>> I originally reported this fix [1] after playing around with the
>> trace2 series for measuring performance. Since trace2 isn't merging
>> quickly, I pulled the performance fix patch out and am sending it on
>> its own. The only difference here is that we don't have the tracing
>> to verify the performance fix in the test script.
>
> Thanks for sending this separately. What's the current status of the
> tracing patch series, by the way?

I'm still working on it. I hope to reroll and submit a new version next
week. We are currently dogfooding a version of it with our GVFS users.

Jeff
[PATCH v3 2/2] mingw: fix mingw_open_append to work with named pipes
From: Jeff Hostetler

Signed-off-by: Jeff Hostetler
---
 compat/mingw.c                | 36 +++++++++++++++++++++++++++++++---
 t/t0051-windows-named-pipe.sh |  2 +-
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 858ca14a57..18caf21969 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -341,6 +341,19 @@ int mingw_mkdir(const char *path, int mode)
 	return ret;
 }
 
+/*
+ * Calling CreateFile() using FILE_APPEND_DATA and without FILE_WRITE_DATA
+ * is documented in [1] as opening a writable file handle in append mode.
+ * (It is believed that) this is atomic since it is maintained by the
+ * kernel unlike the O_APPEND flag which is racily maintained by the CRT.
+ *
+ * [1] https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants
+ *
+ * This trick does not appear to work for named pipes. Instead it creates
+ * a named pipe client handle that cannot be written to. Callers should
+ * just use the regular _wopen() for them. (And since client handle gets
+ * bound to a unique server handle, it isn't really an issue.)
+ */
 static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 {
 	HANDLE handle;
@@ -360,10 +373,12 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 			NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
 	if (handle == INVALID_HANDLE_VALUE)
 		return errno = err_win_to_posix(GetLastError()), -1;
+
 	/*
 	 * No O_APPEND here, because the CRT uses it only to reset the
-	 * file pointer to EOF on write(); but that is not necessary
-	 * for a file created with FILE_APPEND_DATA.
+	 * file pointer to EOF before each write(); but that is not
+	 * necessary (and may lead to races) for a file created with
+	 * FILE_APPEND_DATA.
 	 */
 	fd = _open_osfhandle((intptr_t)handle, O_BINARY);
 	if (fd < 0)
@@ -371,6 +386,21 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 	return fd;
 }
 
+/*
+ * Does the pathname map to the local named pipe filesystem?
+ * That is, does it have a "//./pipe/" prefix?
+ */
+static int is_local_named_pipe_path(const char *filename)
+{
+	return (is_dir_sep(filename[0]) &&
+		is_dir_sep(filename[1]) &&
+		filename[2] == '.' &&
+		is_dir_sep(filename[3]) &&
+		!strncasecmp(filename+4, "pipe", 4) &&
+		is_dir_sep(filename[8]) &&
+		filename[9]);
+}
+
 int mingw_open (const char *filename, int oflags, ...)
 {
 	typedef int (*open_fn_t)(wchar_t const *wfilename, int oflags, ...);
@@ -387,7 +417,7 @@ int mingw_open (const char *filename, int oflags, ...)
 	if (filename && !strcmp(filename, "/dev/null"))
 		filename = "nul";
 
-	if (oflags & O_APPEND)
+	if ((oflags & O_APPEND) && !is_local_named_pipe_path(filename))
 		open_fn = mingw_open_append;
 	else
 		open_fn = _wopen;
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
index e3c36341a0..10ac92d225 100755
--- a/t/t0051-windows-named-pipe.sh
+++ b/t/t0051-windows-named-pipe.sh
@@ -4,7 +4,7 @@ test_description='Windows named pipes'
 
 . ./test-lib.sh
 
-test_expect_failure MINGW 'o_append write to named pipe' '
+test_expect_success MINGW 'o_append write to named pipe' '
 	GIT_TRACE="$(pwd)/expect" git status >/dev/null 2>&1 &&
 	{ test-tool windows-named-pipe t0051 >actual 2>&1 & } &&
 	pid=$! &&
-- 
gitgitgadget
[PATCH v3 0/2] Fixup for js/mingw-o-append
The recent mingw O_APPEND change breaks writing to named pipes on
Windows. The first commit adds a new test to confirm the breakage and
the second commit fixes the problem. These could be squashed together,
or we can just keep the fix and omit the test if that would be better.

    d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND

The new mingw_open_append() routine successfully opens the client side
of the named pipe, but the first write() to it fails with EBADF. Adding
the FILE_WRITE_DATA corrects the problem.

Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>

Cc: j6t@kdbg.org
Cc: johannes.schinde...@gmx.de
Cc: gitster@pobox.com
Cc: p...@peff.net

Jeff Hostetler (2):
  t0051: test GIT_TRACE to a windows named pipe
  mingw: fix mingw_open_append to work with named pipes

 Makefile                           |  1 +
 compat/mingw.c                     | 36 +++++++++++++++++++++++++++---
 t/helper/test-tool.c               |  3 ++
 t/helper/test-tool.h               |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++++++++++++++++++++++++++++++
 t/t0051-windows-named-pipe.sh      | 17 +++++++
 6 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh

base-commit: d641097589160eb795127d8dbcb14c877c217b60
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-35/jeffhostetler/fixup-mingw-o-append-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/35

Range-diff vs v2:

 1:  ecb30eb47c = 1:  ecb30eb47c t0051: test GIT_TRACE to a windows named pipe
 2:  f0361dd306 ! 2:  5592300ca5 mingw: fix mingw_open_append to work with named pipes
     @@ -46,22 +46,20 @@
      	return fd;
       }
      
     -+#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))
      +/*
      + * Does the pathname map to the local named pipe filesystem?
      + * That is, does it have a "//./pipe/" prefix?
      + */
     -+static int mingw_is_local_named_pipe_path(const char *filename)
     ++static int is_local_named_pipe_path(const char *filename)
      +{
     -+	return (IS_SBS(filename[0]) &&
     -+		IS_SBS(filename[1]) &&
     ++	return (is_dir_sep(filename[0]) &&
     ++		is_dir_sep(filename[1]) &&
      +		filename[2] == '.' &&
     -+		IS_SBS(filename[3]) &&
     ++		is_dir_sep(filename[3]) &&
      +		!strncasecmp(filename+4, "pipe", 4) &&
     -+		IS_SBS(filename[8]) &&
     ++		is_dir_sep(filename[8]) &&
      +		filename[9]);
      +}
     -+#undef IS_SBS
      +
       int mingw_open (const char *filename, int oflags, ...)
       {
     @@ -71,7 +69,7 @@
      		filename = "nul";
      
     -	if (oflags & O_APPEND)
     -+	if ((oflags & O_APPEND) && !mingw_is_local_named_pipe_path(filename))
     ++	if ((oflags & O_APPEND) && !is_local_named_pipe_path(filename))
      		open_fn = mingw_open_append;
      	else
      		open_fn = _wopen;

-- 
gitgitgadget
[PATCH v3 1/2] t0051: test GIT_TRACE to a windows named pipe
From: Jeff Hostetler

Create a test-tool helper to create the server side of a windows named
pipe, wait for a client connection, and copy data written to the pipe
to stdout.

Create t0051 test to route GIT_TRACE output of a command to a named
pipe using the above test-tool helper.

Signed-off-by: Jeff Hostetler
---
 Makefile                           |  1 +
 t/helper/test-tool.c               |  3 ++
 t/helper/test-tool.h               |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++++++++++++++++++++++++++++++
 t/t0051-windows-named-pipe.sh      | 17 +++++++
 5 files changed, 96 insertions(+)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh

diff --git a/Makefile b/Makefile
index e4b503d259..b1b934d295 100644
--- a/Makefile
+++ b/Makefile
@@ -731,6 +731,7 @@ TEST_BUILTINS_OBJS += test-submodule-config.o
 TEST_BUILTINS_OBJS += test-subprocess.o
 TEST_BUILTINS_OBJS += test-urlmatch-normalization.o
 TEST_BUILTINS_OBJS += test-wildmatch.o
+TEST_BUILTINS_OBJS += test-windows-named-pipe.o
 TEST_BUILTINS_OBJS += test-write-cache.o
 
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 805a45de9c..7a9764cd5c 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -41,6 +41,9 @@ static struct test_cmd cmds[] = {
 	{ "subprocess", cmd__subprocess },
 	{ "urlmatch-normalization", cmd__urlmatch_normalization },
 	{ "wildmatch", cmd__wildmatch },
+#ifdef GIT_WINDOWS_NATIVE
+	{ "windows-named-pipe", cmd__windows_named_pipe },
+#endif
 	{ "write-cache", cmd__write_cache },
 };
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7116ddfb94..01c34fe5e7 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,9 @@ int cmd__submodule_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__wildmatch(int argc, const char **argv);
+#ifdef GIT_WINDOWS_NATIVE
+int cmd__windows_named_pipe(int argc, const char **argv);
+#endif
 int cmd__write_cache(int argc, const char **argv);
 
 #endif
diff --git a/t/helper/test-windows-named-pipe.c b/t/helper/test-windows-named-pipe.c
new file mode 100644
index 0000000000..b4b752b01a
--- /dev/null
+++ b/t/helper/test-windows-named-pipe.c
@@ -0,0 +1,72 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+
+#ifdef GIT_WINDOWS_NATIVE
+static const char *usage_string = "<pipe-name>";
+
+#define TEST_BUFSIZE (4096)
+
+int cmd__windows_named_pipe(int argc, const char **argv)
+{
+	const char *filename;
+	struct strbuf pathname = STRBUF_INIT;
+	int err;
+	HANDLE h;
+	BOOL connected;
+	char buf[TEST_BUFSIZE + 1];
+
+	if (argc < 2)
+		goto print_usage;
+	filename = argv[1];
+	if (strchr(filename, '/') || strchr(filename, '\\'))
+		goto print_usage;
+	strbuf_addf(&pathname, "//./pipe/%s", filename);
+
+	/*
+	 * Create a single instance of the server side of the named pipe.
+	 * This will allow exactly one client instance to connect to it.
+	 */
+	h = CreateNamedPipeA(
+		pathname.buf,
+		PIPE_ACCESS_INBOUND | FILE_FLAG_FIRST_PIPE_INSTANCE,
+		PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
+		PIPE_UNLIMITED_INSTANCES,
+		TEST_BUFSIZE, TEST_BUFSIZE, 0, NULL);
+	if (h == INVALID_HANDLE_VALUE) {
+		err = err_win_to_posix(GetLastError());
+		fprintf(stderr, "CreateNamedPipe failed: %s\n",
+			strerror(err));
+		return err;
+	}
+
+	connected = ConnectNamedPipe(h, NULL)
+		? TRUE
+		: (GetLastError() == ERROR_PIPE_CONNECTED);
+	if (!connected) {
+		err = err_win_to_posix(GetLastError());
+		fprintf(stderr, "ConnectNamedPipe failed: %s\n",
+			strerror(err));
+		CloseHandle(h);
+		return err;
+	}
+
+	while (1) {
+		DWORD nbr;
+		BOOL success = ReadFile(h, buf, TEST_BUFSIZE, &nbr, NULL);
+		if (!success || nbr == 0)
+			break;
+		buf[nbr] = 0;
+
+		write(1, buf, nbr);
+	}
+
+	DisconnectNamedPipe(h);
+	CloseHandle(h);
+	return 0;
+
+print_usage:
+	fprintf(stderr, "usage: %s %s\n", argv[0], usage_string);
+	return 1;
+}
+#endif
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
new file mode 100755
index 0000000000..e3c36341a0
--- /dev/null
+++ b/t/t0051-windows-named-pipe.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description='Windows named pipes'
+
+. ./test-lib.sh
+
+test_expe
Re: [PATCH v2 2/2] mingw: fix mingw_open_append to work with named pipes
On 9/10/2018 6:00 PM, Junio C Hamano wrote:
> Johannes Sixt writes:
>
>>> +#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))
>
> I think you already have mingw_is_dir_sep() and its shorter alias
> is_dir_sep() available to you.

good catch. thanks.

>>> +/*
>>> + * Does the pathname map to the local named pipe filesystem?
>>> + * That is, does it have a "//./pipe/" prefix?
>>> + */
>>> +static int mingw_is_local_named_pipe_path(const char *filename)
>
> There is no need to prefix mingw_ to this function that is file local
> static. Isn't is_local_named_pipe() descriptive and unique enough?

right. will do.

>>> +{
>>> +	return (IS_SBS(filename[0]) &&
>>> +		IS_SBS(filename[1]) &&
>>> +		filename[2] == '.' &&
>>> +		IS_SBS(filename[3]) &&
>>> +		!strncasecmp(filename+4, "pipe", 4) &&
>>> +		IS_SBS(filename[8]) &&
>>> +		filename[9]);
>>> +}
>>> +#undef IS_SBS
>
> It is kind-of surprising that there hasn't been any existing need for
> a helper function that would allow us to write this function like so:
>
>     static int is_local_named_pipe(const char *path)
>     {
>         return path_is_in_directory(path, "//./pipe/");
>     }
>
> Not a suggestion to add such a thing; as long as we know there is no
> other codepath that would benefit from having one, a generalization
> like that can and should wait.

Yeah, I don't think we need something that general just yet.

Named pipes exist in a special namespace using the UNC/network-share
syntax (rather than the DOS drive-letter syntax), and we don't do much
with UNC paths yet. Perhaps later we could have something to try
splitting a UNC path into <server>, <volume>, and <path> and then have
is_local_named_pipe() verify the first 2 are as expected. Or have a
function like is_path_in_volume(path, server, volume) that does the
tests without cutting up strings and allocating. We could do either,
but I don't think we need to be that general yet.

Thanks,
Jeff
Re: [PATCH v2 2/2] mingw: fix mingw_open_append to work with named pipes
On 9/10/2018 3:45 PM, Johannes Sixt wrote:
> On 10.09.2018 19:05, Jeff Hostetler via GitGitGadget wrote:
>> diff --git a/compat/mingw.c b/compat/mingw.c
>> index 858ca14a57..f87376b26a 100644
>> --- a/compat/mingw.c
>> +++ b/compat/mingw.c
>> @@ -341,6 +341,19 @@ int mingw_mkdir(const char *path, int mode)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Calling CreateFile() using FILE_APPEND_DATA and without FILE_WRITE_DATA
>> + * is documented in [1] as opening a writable file handle in append mode.
>> + * (It is believed that) this is atomic since it is maintained by the
>> + * kernel unlike the O_APPEND flag which is racily maintained by the CRT.
>> + *
>> + * [1] https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants
>> + *
>> + * This trick does not appear to work for named pipes. Instead it creates
>> + * a named pipe client handle that cannot be written to. Callers should
>> + * just use the regular _wopen() for them. (And since client handle gets
>> + * bound to a unique server handle, it isn't really an issue.)
>> + */
>>  static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
>>  {
>>  	HANDLE handle;
>> @@ -360,10 +373,12 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
>>  			NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
>>  	if (handle == INVALID_HANDLE_VALUE)
>>  		return errno = err_win_to_posix(GetLastError()), -1;
>> +
>>  	/*
>>  	 * No O_APPEND here, because the CRT uses it only to reset the
>> -	 * file pointer to EOF on write(); but that is not necessary
>> -	 * for a file created with FILE_APPEND_DATA.
>> +	 * file pointer to EOF before each write(); but that is not
>> +	 * necessary (and may lead to races) for a file created with
>> +	 * FILE_APPEND_DATA.
>>  	 */
>>  	fd = _open_osfhandle((intptr_t)handle, O_BINARY);
>>  	if (fd < 0)
>> @@ -371,6 +386,23 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
>>  	return fd;
>>  }
>>  
>> +#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))
>> +/*
>> + * Does the pathname map to the local named pipe filesystem?
>> + * That is, does it have a "//./pipe/" prefix?
>> + */
>> +static int mingw_is_local_named_pipe_path(const char *filename)
>> +{
>> +	return (IS_SBS(filename[0]) &&
>> +		IS_SBS(filename[1]) &&
>> +		filename[2] == '.' &&
>> +		IS_SBS(filename[3]) &&
>> +		!strncasecmp(filename+4, "pipe", 4) &&
>> +		IS_SBS(filename[8]) &&
>> +		filename[9]);
>> +}
>> +#undef IS_SBS
>> +
>>  int mingw_open (const char *filename, int oflags, ...)
>>  {
>>  	typedef int (*open_fn_t)(wchar_t const *wfilename, int oflags, ...);
>> @@ -387,7 +419,7 @@ int mingw_open (const char *filename, int oflags, ...)
>>  	if (filename && !strcmp(filename, "/dev/null"))
>>  		filename = "nul";
>>  
>> -	if (oflags & O_APPEND)
>> +	if ((oflags & O_APPEND) && !mingw_is_local_named_pipe_path(filename))
>>  		open_fn = mingw_open_append;
>>  	else
>>  		open_fn = _wopen;
>
> This looks reasonable.

Thanks for the review.

> I wonder which part of the code uses local named pipes. Is it
> downstream in Git for Windows or one of the topics in flight?
>
> -- Hannes

I'm wanting to use them as a tracing target option in my trace2 series
currently in progress.

Jeff
[PATCH v2 2/2] mingw: fix mingw_open_append to work with named pipes
From: Jeff Hostetler

Signed-off-by: Jeff Hostetler
---
 compat/mingw.c                | 38 +++++++++++++++++++++++++++++++++---
 t/t0051-windows-named-pipe.sh |  2 +-
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 858ca14a57..f87376b26a 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -341,6 +341,19 @@ int mingw_mkdir(const char *path, int mode)
 	return ret;
 }
 
+/*
+ * Calling CreateFile() using FILE_APPEND_DATA and without FILE_WRITE_DATA
+ * is documented in [1] as opening a writable file handle in append mode.
+ * (It is believed that) this is atomic since it is maintained by the
+ * kernel unlike the O_APPEND flag which is racily maintained by the CRT.
+ *
+ * [1] https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants
+ *
+ * This trick does not appear to work for named pipes. Instead it creates
+ * a named pipe client handle that cannot be written to. Callers should
+ * just use the regular _wopen() for them. (And since client handle gets
+ * bound to a unique server handle, it isn't really an issue.)
+ */
 static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 {
 	HANDLE handle;
@@ -360,10 +373,12 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 			NULL, create, FILE_ATTRIBUTE_NORMAL, NULL);
 	if (handle == INVALID_HANDLE_VALUE)
 		return errno = err_win_to_posix(GetLastError()), -1;
+
 	/*
 	 * No O_APPEND here, because the CRT uses it only to reset the
-	 * file pointer to EOF on write(); but that is not necessary
-	 * for a file created with FILE_APPEND_DATA.
+	 * file pointer to EOF before each write(); but that is not
+	 * necessary (and may lead to races) for a file created with
+	 * FILE_APPEND_DATA.
 	 */
 	fd = _open_osfhandle((intptr_t)handle, O_BINARY);
 	if (fd < 0)
@@ -371,6 +386,23 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
 	return fd;
 }
 
+#define IS_SBS(ch) (((ch) == '/') || ((ch) == '\\'))
+/*
+ * Does the pathname map to the local named pipe filesystem?
+ * That is, does it have a "//./pipe/" prefix?
+ */
+static int mingw_is_local_named_pipe_path(const char *filename)
+{
+	return (IS_SBS(filename[0]) &&
+		IS_SBS(filename[1]) &&
+		filename[2] == '.' &&
+		IS_SBS(filename[3]) &&
+		!strncasecmp(filename+4, "pipe", 4) &&
+		IS_SBS(filename[8]) &&
+		filename[9]);
+}
+#undef IS_SBS
+
 int mingw_open (const char *filename, int oflags, ...)
 {
 	typedef int (*open_fn_t)(wchar_t const *wfilename, int oflags, ...);
@@ -387,7 +419,7 @@ int mingw_open (const char *filename, int oflags, ...)
 	if (filename && !strcmp(filename, "/dev/null"))
 		filename = "nul";
 
-	if (oflags & O_APPEND)
+	if ((oflags & O_APPEND) && !mingw_is_local_named_pipe_path(filename))
 		open_fn = mingw_open_append;
 	else
 		open_fn = _wopen;
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
index e3c36341a0..10ac92d225 100755
--- a/t/t0051-windows-named-pipe.sh
+++ b/t/t0051-windows-named-pipe.sh
@@ -4,7 +4,7 @@ test_description='Windows named pipes'
 
 . ./test-lib.sh
 
-test_expect_failure MINGW 'o_append write to named pipe' '
+test_expect_success MINGW 'o_append write to named pipe' '
 	GIT_TRACE="$(pwd)/expect" git status >/dev/null 2>&1 &&
 	{ test-tool windows-named-pipe t0051 >actual 2>&1 & } &&
 	pid=$! &&
-- 
gitgitgadget
[PATCH v2 0/2] Fixup for js/mingw-o-append
The recent mingw O_APPEND change breaks writing to named pipes on
Windows. The first commit adds a new test to confirm the breakage and
the second commit fixes the problem. These could be squashed together,
or we can just keep the fix and omit the test if that would be better.

    d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND

The new mingw_open_append() routine successfully opens the client side
of the named pipe, but the first write() to it fails with EBADF. Adding
the FILE_WRITE_DATA corrects the problem.

Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com>

Cc: j6t@kdbg.org
Cc: johannes.schinde...@gmx.de
Cc: gitster@pobox.com
Cc: p...@peff.net

Jeff Hostetler (2):
  t0051: test GIT_TRACE to a windows named pipe
  mingw: fix mingw_open_append to work with named pipes

 Makefile                           |  1 +
 compat/mingw.c                     | 38 ++++++++++++++++++++++++++---
 t/helper/test-tool.c               |  3 ++
 t/helper/test-tool.h               |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++++++++++++++++++++++++++++++
 t/t0051-windows-named-pipe.sh      | 17 +++++++
 6 files changed, 131 insertions(+), 3 deletions(-)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh

base-commit: d641097589160eb795127d8dbcb14c877c217b60
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-35/jeffhostetler/fixup-mingw-o-append-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/35

Range-diff vs v1:

 1:  03453cb521 ! 1:  ecb30eb47c t0051: test GIT_TRACE to a windows named pipe
     @@ -140,7 +140,7 @@
      +
      +. ./test-lib.sh
      +
     -+test_expect_success MINGW 'o_append write to named pipe' '
     ++test_expect_failure MINGW 'o_append write to named pipe' '
      +	GIT_TRACE="$(pwd)/expect" git status >/dev/null 2>&1 &&
      +	{ test-tool windows-named-pipe t0051 >actual 2>&1 & } &&
      +	pid=$! &&
 2:  f433937d55 < -:  ---------- mingw: fix mingw_open_append to work with named pipes
 -:  ---------- > 2:  f0361dd306 mingw: fix mingw_open_append to work with named pipes

-- 
gitgitgadget
[PATCH v2 1/2] t0051: test GIT_TRACE to a windows named pipe
From: Jeff Hostetler

Create a test-tool helper to create the server side of a windows named
pipe, wait for a client connection, and copy data written to the pipe
to stdout.

Create t0051 test to route GIT_TRACE output of a command to a named
pipe using the above test-tool helper.

Signed-off-by: Jeff Hostetler
---
 Makefile                           |  1 +
 t/helper/test-tool.c               |  3 ++
 t/helper/test-tool.h               |  3 ++
 t/helper/test-windows-named-pipe.c | 72 ++++++++++++++++++++++++++++++
 t/t0051-windows-named-pipe.sh      | 17 +++++++
 5 files changed, 96 insertions(+)
 create mode 100644 t/helper/test-windows-named-pipe.c
 create mode 100755 t/t0051-windows-named-pipe.sh

diff --git a/Makefile b/Makefile
index e4b503d259..b1b934d295 100644
--- a/Makefile
+++ b/Makefile
@@ -731,6 +731,7 @@ TEST_BUILTINS_OBJS += test-submodule-config.o
 TEST_BUILTINS_OBJS += test-subprocess.o
 TEST_BUILTINS_OBJS += test-urlmatch-normalization.o
 TEST_BUILTINS_OBJS += test-wildmatch.o
+TEST_BUILTINS_OBJS += test-windows-named-pipe.o
 TEST_BUILTINS_OBJS += test-write-cache.o
 
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 805a45de9c..7a9764cd5c 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -41,6 +41,9 @@ static struct test_cmd cmds[] = {
 	{ "subprocess", cmd__subprocess },
 	{ "urlmatch-normalization", cmd__urlmatch_normalization },
 	{ "wildmatch", cmd__wildmatch },
+#ifdef GIT_WINDOWS_NATIVE
+	{ "windows-named-pipe", cmd__windows_named_pipe },
+#endif
 	{ "write-cache", cmd__write_cache },
 };
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7116ddfb94..01c34fe5e7 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -35,6 +35,9 @@ int cmd__submodule_config(int argc, const char **argv);
 int cmd__subprocess(int argc, const char **argv);
 int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__wildmatch(int argc, const char **argv);
+#ifdef GIT_WINDOWS_NATIVE
+int cmd__windows_named_pipe(int argc, const char **argv);
+#endif
 int cmd__write_cache(int argc, const char **argv);
 
 #endif
diff --git a/t/helper/test-windows-named-pipe.c b/t/helper/test-windows-named-pipe.c
new file mode 100644
index 0000000000..b4b752b01a
--- /dev/null
+++ b/t/helper/test-windows-named-pipe.c
@@ -0,0 +1,72 @@
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "strbuf.h"
+
+#ifdef GIT_WINDOWS_NATIVE
+static const char *usage_string = "<pipe-name>";
+
+#define TEST_BUFSIZE (4096)
+
+int cmd__windows_named_pipe(int argc, const char **argv)
+{
+	const char *filename;
+	struct strbuf pathname = STRBUF_INIT;
+	int err;
+	HANDLE h;
+	BOOL connected;
+	char buf[TEST_BUFSIZE + 1];
+
+	if (argc < 2)
+		goto print_usage;
+	filename = argv[1];
+	if (strchr(filename, '/') || strchr(filename, '\\'))
+		goto print_usage;
+	strbuf_addf(&pathname, "//./pipe/%s", filename);
+
+	/*
+	 * Create a single instance of the server side of the named pipe.
+	 * This will allow exactly one client instance to connect to it.
+	 */
+	h = CreateNamedPipeA(
+		pathname.buf,
+		PIPE_ACCESS_INBOUND | FILE_FLAG_FIRST_PIPE_INSTANCE,
+		PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
+		PIPE_UNLIMITED_INSTANCES,
+		TEST_BUFSIZE, TEST_BUFSIZE, 0, NULL);
+	if (h == INVALID_HANDLE_VALUE) {
+		err = err_win_to_posix(GetLastError());
+		fprintf(stderr, "CreateNamedPipe failed: %s\n",
+			strerror(err));
+		return err;
+	}
+
+	connected = ConnectNamedPipe(h, NULL)
+		? TRUE
+		: (GetLastError() == ERROR_PIPE_CONNECTED);
+	if (!connected) {
+		err = err_win_to_posix(GetLastError());
+		fprintf(stderr, "ConnectNamedPipe failed: %s\n",
+			strerror(err));
+		CloseHandle(h);
+		return err;
+	}
+
+	while (1) {
+		DWORD nbr;
+		BOOL success = ReadFile(h, buf, TEST_BUFSIZE, &nbr, NULL);
+		if (!success || nbr == 0)
+			break;
+		buf[nbr] = 0;
+
+		write(1, buf, nbr);
+	}
+
+	DisconnectNamedPipe(h);
+	CloseHandle(h);
+	return 0;
+
+print_usage:
+	fprintf(stderr, "usage: %s %s\n", argv[0], usage_string);
+	return 1;
+}
+#endif
diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh
new file mode 100755
index 0000000000..e3c36341a0
--- /dev/null
+++ b/t/t0051-windows-named-pipe.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description='Windows named pipes'
+
+. ./test-lib.sh
+
+test_expe
Re: [PATCH 2/2] mingw: fix mingw_open_append to work with named pipes
On 9/10/2018 12:42 PM, Junio C Hamano wrote:
> Jeff Hostetler writes:
>
>> Yeah, this whole thing is a little under-documented for my tastes.
>> Let's leave it as you have it. I'll re-roll with a fix to route named
>> pipes to the existing _wopen() code.
>
> OK, I have these two patches in 'next', but let's postpone it for now.
> The Windows port will be built with extra topics over what we have at
> the release anyway, so your improved version may become part of that
> and then can come back to benefit us in the next cycle ;-)
>
> Thanks.

That's fine. This can easily wait until after 2.19.0.

Thanks,
Jeff
Re: [PATCH 2/2] mingw: fix mingw_open_append to work with named pipes
On 9/8/2018 2:31 PM, Johannes Sixt wrote:
> On 08.09.2018 11:26, Johannes Sixt wrote:
>> On 07.09.2018 20:19, Jeff Hostetler via GitGitGadget wrote:
>>> diff --git a/compat/mingw.c b/compat/mingw.c
>>> index 858ca14a57..ef03bbe5d2 100644
>>> --- a/compat/mingw.c
>>> +++ b/compat/mingw.c
>>> @@ -355,7 +355,7 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...)
>>>  	 * FILE_SHARE_WRITE is required to permit child processes
>>>  	 * to append to the file.
>>>  	 */
>>> -	handle = CreateFileW(wfilename, FILE_APPEND_DATA,
>>> +	handle = CreateFileW(wfilename, FILE_WRITE_DATA | FILE_APPEND_DATA,
>>>  			FILE_SHARE_WRITE | FILE_SHARE_READ, NULL,
>>>  			create, FILE_ATTRIBUTE_NORMAL, NULL);
>>>  	if (handle == INVALID_HANDLE_VALUE)
>>
>> I did not go with this version because the documentation
>> https://docs.microsoft.com/en-us/windows/desktop/fileio/file-access-rights-constants
>> says:
>>
>>     FILE_APPEND_DATA: For a file object, the right to append data to
>>     the file. (For local files, write operations will not overwrite
>>     existing data if this flag is specified without FILE_WRITE_DATA.)
>>     [...]
>>
>> which could be interpreted as: Only if FILE_WRITE_DATA is not set, we
>> have the guarantee that existing data in local files is not
>> overwritten, i.e., new data is appended atomically. Is this
>> interpretation too narrow and we do get atomicity even when
>> FILE_WRITE_DATA is set?
>
> Here are some comments on stackoverflow which let me think that
> FILE_APPEND_DATA with FILE_WRITE_DATA is no longer atomic:
>
> https://stackoverflow.com/questions/20093571/difference-between-file-write-data-and-file-append-data#comment29995346_20108249
>
> -- Hannes

Yeah, this whole thing is a little under-documented for my tastes.
Let's leave it as you have it. I'll re-roll with a fix to route named
pipes to the existing _wopen() code.

thanks
Jeff
Re: [PATCH 1/2] t0051: test GIT_TRACE to a windows named pipe
On 9/9/2018 3:28 AM, Sebastian Schuberth wrote: On 9/7/2018 8:19 PM, Jeff Hostetler via GitGitGadget wrote: +test_expect_success MINGW 'o_append write to named pipe' ' Shouldn't this be "test_expect_failure" here, and then be changed to "test_expect_success" by your second patch? yes. thanks! Jeff
Re: [PATCH 0/2] Fixup for js/mingw-o-append
GitGitGadget botched the CCs when I submitted this. Replying here to add them. Sorry, Jeff https://github.com/gitgitgadget/gitgitgadget/issues/35 On 9/7/2018 2:19 PM, Jeff Hostetler via GitGitGadget wrote: The recent mingw O_APPEND change breaks writing to named pipes on Windows. The first commit adds a new test to confirm the breakage and the second commit fixes the problem. These could be squashed together, or we can just keep the fix and omit the test if that would be better. d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND The new mingw_open_append() routine successfully opens the client side of the named pipe, but the first write() to it fails with EBADF. Adding the FILE_WRITE_DATA flag corrects the problem. Signed-off-by: Jeff Hostetler jeffh...@microsoft.com [jeffh...@microsoft.com] Cc: j6t@kdbg.org Cc: johannes.schinde...@gmx.de Cc: gitster@pobox.com Cc: p...@peff.net Jeff Hostetler (2): t0051: test GIT_TRACE to a windows named pipe mingw: fix mingw_open_append to work with named pipes Makefile | 1 + compat/mingw.c | 2 +- t/helper/test-tool.c | 3 ++ t/helper/test-tool.h | 3 ++ t/helper/test-windows-named-pipe.c | 72 ++ t/t0051-windows-named-pipe.sh | 17 +++ 6 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 t/helper/test-windows-named-pipe.c create mode 100755 t/t0051-windows-named-pipe.sh base-commit: d641097589160eb795127d8dbcb14c877c217b60 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-35/jeffhostetler/fixup-mingw-o-append-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/35
[PATCH 1/2] t0051: test GIT_TRACE to a windows named pipe
From: Jeff Hostetler Create a test-tool helper to create the server side of a windows named pipe, wait for a client connection, and copy data written to the pipe to stdout. Create t0051 test to route GIT_TRACE output of a command to a named pipe using the above test-tool helper. Signed-off-by: Jeff Hostetler --- Makefile | 1 + t/helper/test-tool.c | 3 ++ t/helper/test-tool.h | 3 ++ t/helper/test-windows-named-pipe.c | 72 ++ t/t0051-windows-named-pipe.sh | 17 +++ 5 files changed, 96 insertions(+) create mode 100644 t/helper/test-windows-named-pipe.c create mode 100755 t/t0051-windows-named-pipe.sh diff --git a/Makefile b/Makefile index e4b503d259..b1b934d295 100644 --- a/Makefile +++ b/Makefile @@ -731,6 +731,7 @@ TEST_BUILTINS_OBJS += test-submodule-config.o TEST_BUILTINS_OBJS += test-subprocess.o TEST_BUILTINS_OBJS += test-urlmatch-normalization.o TEST_BUILTINS_OBJS += test-wildmatch.o +TEST_BUILTINS_OBJS += test-windows-named-pipe.o TEST_BUILTINS_OBJS += test-write-cache.o TEST_PROGRAMS_NEED_X += test-dump-fsmonitor diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 805a45de9c..7a9764cd5c 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -41,6 +41,9 @@ static struct test_cmd cmds[] = { { "subprocess", cmd__subprocess }, { "urlmatch-normalization", cmd__urlmatch_normalization }, { "wildmatch", cmd__wildmatch }, +#ifdef GIT_WINDOWS_NATIVE + { "windows-named-pipe", cmd__windows_named_pipe }, +#endif { "write-cache", cmd__write_cache }, }; diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 7116ddfb94..01c34fe5e7 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -35,6 +35,9 @@ int cmd__submodule_config(int argc, const char **argv); int cmd__subprocess(int argc, const char **argv); int cmd__urlmatch_normalization(int argc, const char **argv); int cmd__wildmatch(int argc, const char **argv); +#ifdef GIT_WINDOWS_NATIVE +int cmd__windows_named_pipe(int argc, const char **argv); +#endif int cmd__write_cache(int argc, 
const char **argv); #endif diff --git a/t/helper/test-windows-named-pipe.c b/t/helper/test-windows-named-pipe.c new file mode 100644 index 00..b4b752b01a --- /dev/null +++ b/t/helper/test-windows-named-pipe.c @@ -0,0 +1,72 @@ +#include "test-tool.h" +#include "git-compat-util.h" +#include "strbuf.h" + +#ifdef GIT_WINDOWS_NATIVE +static const char *usage_string = "<pipe-filename>"; + +#define TEST_BUFSIZE (4096) + +int cmd__windows_named_pipe(int argc, const char **argv) +{ + const char *filename; + struct strbuf pathname = STRBUF_INIT; + int err; + HANDLE h; + BOOL connected; + char buf[TEST_BUFSIZE + 1]; + + if (argc < 2) + goto print_usage; + filename = argv[1]; + if (strchr(filename, '/') || strchr(filename, '\\')) + goto print_usage; + strbuf_addf(&pathname, "//./pipe/%s", filename); + + /* +* Create a single instance of the server side of the named pipe. +* This will allow exactly one client instance to connect to it. +*/ + h = CreateNamedPipeA( + pathname.buf, + PIPE_ACCESS_INBOUND | FILE_FLAG_FIRST_PIPE_INSTANCE, + PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT, + PIPE_UNLIMITED_INSTANCES, + TEST_BUFSIZE, TEST_BUFSIZE, 0, NULL); + if (h == INVALID_HANDLE_VALUE) { + err = err_win_to_posix(GetLastError()); + fprintf(stderr, "CreateNamedPipe failed: %s\n", + strerror(err)); + return err; + } + + connected = ConnectNamedPipe(h, NULL) + ? 
TRUE + : (GetLastError() == ERROR_PIPE_CONNECTED); + if (!connected) { + err = err_win_to_posix(GetLastError()); + fprintf(stderr, "ConnectNamedPipe failed: %s\n", + strerror(err)); + CloseHandle(h); + return err; + } + + while (1) { + DWORD nbr; + BOOL success = ReadFile(h, buf, TEST_BUFSIZE, &nbr, NULL); + if (!success || nbr == 0) + break; + buf[nbr] = 0; + + write(1, buf, nbr); + } + + DisconnectNamedPipe(h); + CloseHandle(h); + return 0; + +print_usage: + fprintf(stderr, "usage: %s %s\n", argv[0], usage_string); + return 1; +} +#endif diff --git a/t/t0051-windows-named-pipe.sh b/t/t0051-windows-named-pipe.sh new file mode 100755 index 00..10ac92d225 --- /dev/null +++ b/t/t0051-windows-named-pipe.sh @@ -0,0 +1,17 @@ +#!/bin/sh + +test_description='Windows named pipes' + +. ./test-lib.sh + +test_expe
[PATCH 2/2] mingw: fix mingw_open_append to work with named pipes
From: Jeff Hostetler Signed-off-by: Jeff Hostetler --- compat/mingw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/compat/mingw.c b/compat/mingw.c index 858ca14a57..ef03bbe5d2 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -355,7 +355,7 @@ static int mingw_open_append(wchar_t const *wfilename, int oflags, ...) * FILE_SHARE_WRITE is required to permit child processes * to append to the file. */ - handle = CreateFileW(wfilename, FILE_APPEND_DATA, + handle = CreateFileW(wfilename, FILE_WRITE_DATA | FILE_APPEND_DATA, FILE_SHARE_WRITE | FILE_SHARE_READ, NULL, create, FILE_ATTRIBUTE_NORMAL, NULL); if (handle == INVALID_HANDLE_VALUE) -- gitgitgadget
[PATCH 0/2] Fixup for js/mingw-o-append
The recent mingw O_APPEND change breaks writing to named pipes on Windows. The first commit adds a new test to confirm the breakage and the second commit fixes the problem. These could be squashed together, or we can just keep the fix and omit the test if that would be better. d641097589 (js/mingw-o-append) mingw: enable atomic O_APPEND The new mingw_open_append() routine successfully opens the client side of the named pipe, but the first write() to it fails with EBADF. Adding the FILE_WRITE_DATA flag corrects the problem. Signed-off-by: Jeff Hostetler jeffh...@microsoft.com [jeffh...@microsoft.com] Cc: j6t@kdbg.org Cc: johannes.schinde...@gmx.de Cc: gitster@pobox.com Cc: p...@peff.net Jeff Hostetler (2): t0051: test GIT_TRACE to a windows named pipe mingw: fix mingw_open_append to work with named pipes Makefile | 1 + compat/mingw.c | 2 +- t/helper/test-tool.c | 3 ++ t/helper/test-tool.h | 3 ++ t/helper/test-windows-named-pipe.c | 72 ++ t/t0051-windows-named-pipe.sh | 17 +++ 6 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 t/helper/test-windows-named-pipe.c create mode 100755 t/t0051-windows-named-pipe.sh base-commit: d641097589160eb795127d8dbcb14c877c217b60 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-35%2Fjeffhostetler%2Ffixup-mingw-o-append-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-35/jeffhostetler/fixup-mingw-o-append-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/35 -- gitgitgadget
Re: [PATCH 1/8] trace2: create new combined trace facility
On 9/4/2018 6:30 PM, Junio C Hamano wrote: Stefan Beller writes: The API defines both fixed-field and printf-style functions. The trace2 performance tracing includes thread-specific function nesting and timings. So this only adds the new API, and we need to merge the TRACE into the TRACE2 later? If this is a rhetorical question implying that it would be best if the existing trace() were rewritten to be built on top of trace2() while building this series, especially before adding new callsites that directly use trace2(), I may share that feeling. I haven't studied this new round deeply enough to see how realistic it would be, though. I wanted to come up with a unified API that we liked and was sufficient to handle the default-key, performance-key, the new event-key (currently supporting JSON output), and any other formats/variants that we want (protobufs, syslog, etc). And hopefully get some agreement on it and see what else we want from it. And then look at converting the trace_printf() and trace_performance() calls to trace2. Clearly, I could just replace the existing printf-style calls with trace2_printf's, but I thought it would be better to look at them and promote them to higher-level functions. For example, the trace_argv_printf() calls are generally used to dump the command line arguments for the current process or spawned child processes. I have trace2_start() and trace2_child_start() that capture the argv and additional information about it. (The "why", if you will.) So the trace_argv_* forms can go away. Likewise, all of the trace_performance() and trace_performance_since() calls can be converted to trace2_region_enter/_leave calls. And those forms can be removed from trace.[ch]. The net-net is that trace.[ch] shrinks in a short sequence of commits on top of my initial trace2 commit in a reroll of this patch series. (and replacing some of the demonstration commits in V1) Then I'll address the trace_printf_key() forms, since they write to an alternate stream. 
Then I can delete trace.[ch]. And we can consider renaming trace2_* functions and/or GIT_TR2_* env vars if we want. I wanted to avoid rewriting trace.[ch] just to delete them later in the same patch series. Jeff
Re: [PATCH 1/8] trace2: create new combined trace facility
On 9/4/2018 6:12 PM, Stefan Beller wrote: Create GIT_TR2 trace-key to replace GIT_TRACE, GIT_TR2_PERFORMANCE to replace GIT_TRACE_PERFORMANCE, and a new trace-key GIT_TR2_EVENT to generate JSON data for telemetry purposes. Other structured formats can easily be added later using this new model. So the idea is to use the GIT_TR2 instead of GIT_TRACE and we get the same output as well as a new form of structured logging here? (Then GIT_TRACE could be retired, and we'd use the new API to add more events, which are also more structured as the API allows for more than just a string printed?) Define a higher-level event API that selectively writes to all of the new GIT_TR2_* targets (depending on event type) without needing to call different trace_printf*() or trace_performance_*() routines. The API defines both fixed-field and printf-style functions. The trace2 performance tracing includes thread-specific function nesting and timings. So this only adds the new API, and we need to merge the TRACE into the TRACE2 later? +++ b/trace2.c @@ -0,0 +1,1592 @@ [...] + +/* + * TODO remove this section header + */ Yes, please. +/* + * Compute a "unique" session id (SID) for the current process. All events + * from this process will have this label. If we were started by another + * git instance, use our parent's SID as a prefix and count the number of + * nested git processes. (This lets us track parent/child relationships + * even if there is an intermediate shell process.) How does this work with threading? From this description we can have two threads starting new child processes and they have the same ID (-2) Threads are not involved here. A git process computes its own unique session id. It is constructed from { [parent-sid/], start-time-in-microseconds, pid }. So in the following example, fetch spawned rev-list and gc. (I've stripped out fields irrelevant to this discussion.) 
"sid":"1536153920286494-12592", "argv":["C:\\work\\gfw\\git.exe","--exec-path=.","fetch","gh"] "sid":"1536153920286494-12592/1536153925520530-23024", "argv":["git","rev-list","--objects","--stdin", ...] "sid":"1536153920286494-12592/1536153926081533-23992", "argv":["git","gc","--auto"] So 2 child processes simultaneously spawned from 2 threads in the top-level git command, would still have unique SIDs since their PIDs are unique over the time interval of their execution. In the above example, if rev-list spawned a child git process, that child's SID would have 3 components (the prefix that it inherited plus its own time and pid): 1536153920286494-12592/1536153925520530-23024/- The above model works even if there is a shell command between the top-level git command and the child git processes. + +/* + * TODO remove this section header + */ This looks like a reoccurring pattern. Did you have a tool that needs these? Does the tool want to enforce some level of documentation on each function? No, this is just an that helps me see the different sections as different sections in the editor and helps me group like items. I might change that to a group-level comment that describes the overall concept/purpose of the section. Either way I'll get rid of the stars. + +/* + * Each thread (including the main thread) has a stack of nested regions. + * This is used to indent messages and provide nested perf times. The + * limit here is for simplicity. Note that the region stack is per-thread + * and not per-trace_key. What are the (nested) regions used for? I would imagine recursive operations, e.g. unpacking trees? Where am I supposed to read up on those? Unpacking trees would be a good usage. Duy did something similar in a recent patch series. Nested regions are intended for perf times. The 8th patch in this series demonstrates adding trace2_region_enter() and _leave() calls inside read_directory_recursive(). 
The second column here shows the elapsed time between the _enter and _leave. (Again, I've stripped out fields not currently relevant.)

| region_enter |          | status_untracked
| region_enter |          | ..read_directory_recursive:
| region_enter |          | read_directory_recursive:.github/
| region_leave | 0.95     | read_directory_recursive:.github/
| region_enter |          | read_directory_recursive:block-sha1/
| region_leave | 0.86     | read_directory_recursive:block-sha1/
| region_enter |          | read_directory_recursive:builtin/
| region_leave | 0.000716 | read_directory_recursive:builtin/
| region_enter |          | read_directory_recursive:ci/
| region_enter |          | ..read_directory_recursive:ci/util/
| region_leave | 0.87     | ..read_directory_recursive:ci/util/
| region_leave | 0.000250 |
[PATCH 4/8] trace2: demonstrate trace2 child process classification
From: Jeff Hostetler Classify editor, pager, and sub-process child processes. The former two can be used to identify interactive commands, for example. Signed-off-by: Jeff Hostetler --- editor.c | 1 + pager.c | 1 + sub-process.c | 1 + 3 files changed, 3 insertions(+) diff --git a/editor.c b/editor.c index 9a9b4e12d1..29707de198 100644 --- a/editor.c +++ b/editor.c @@ -66,6 +66,7 @@ int launch_editor(const char *path, struct strbuf *buffer, const char *const *en p.argv = args; p.env = env; p.use_shell = 1; + p.trace2_child_class = "editor"; if (start_command(&p) < 0) return error("unable to start editor '%s'", editor); diff --git a/pager.c b/pager.c index a768797fcf..4168460ae9 100644 --- a/pager.c +++ b/pager.c @@ -100,6 +100,7 @@ void prepare_pager_args(struct child_process *pager_process, const char *pager) argv_array_push(&pager_process->args, pager); pager_process->use_shell = 1; setup_pager_env(&pager_process->env_array); + pager_process->trace2_child_class = "pager"; } void setup_pager(void) diff --git a/sub-process.c b/sub-process.c index 8d2a1707cf..3f4af93555 100644 --- a/sub-process.c +++ b/sub-process.c @@ -88,6 +88,7 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co process->out = -1; process->clean_on_exit = 1; process->clean_on_exit_handler = subprocess_exit_handler; + process->trace2_child_class = "subprocess"; err = start_command(process); if (err) { -- gitgitgadget
[PATCH 1/8] trace2: create new combined trace facility
From: Jeff Hostetler Create trace2 API allowing event-based tracing. This will hopefully eventually replace the existing trace API. Create GIT_TR2 trace-key to replace GIT_TRACE, GIT_TR2_PERFORMANCE to replace GIT_TRACE_PERFORMANCE, and a new trace-key GIT_TR2_EVENT to generate JSON data for telemetry purposes. Other structured formats can easily be added later using this new model. Define a higher-level event API that selectively writes to all of the new GIT_TR2_* targets (depending on event type) without needing to call different trace_printf*() or trace_performance_*() routines. The API defines both fixed-field and printf-style functions. The trace2 performance tracing includes thread-specific function nesting and timings. Signed-off-by: Jeff Hostetler --- Makefile |1 + cache.h |1 + trace2.c | 1592 ++ trace2.h | 214 4 files changed, 1808 insertions(+) create mode 100644 trace2.c create mode 100644 trace2.h diff --git a/Makefile b/Makefile index 5a969f5830..88ae7afdd4 100644 --- a/Makefile +++ b/Makefile @@ -974,6 +974,7 @@ LIB_OBJS += tag.o LIB_OBJS += tempfile.o LIB_OBJS += tmp-objdir.o LIB_OBJS += trace.o +LIB_OBJS += trace2.o LIB_OBJS += trailer.o LIB_OBJS += transport.o LIB_OBJS += transport-helper.o diff --git a/cache.h b/cache.h index 4d014541ab..38f0cd2ba0 100644 --- a/cache.h +++ b/cache.h @@ -9,6 +9,7 @@ #include "gettext.h" #include "convert.h" #include "trace.h" +#include "trace2.h" #include "string-list.h" #include "pack-revindex.h" #include "hash.h" diff --git a/trace2.c b/trace2.c new file mode 100644 index 00..13d5c85366 --- /dev/null +++ b/trace2.c @@ -0,0 +1,1592 @@ +#include "cache.h" +#include "config.h" +#include "json-writer.h" +#include "quote.h" +#include "run-command.h" +#include "sigchain.h" +#include "thread-utils.h" +#include "version.h" + +/* + * TODO remove this section header + */ + +static struct strbuf tr2sid_buf = STRBUF_INIT; +static int tr2sid_nr_git_parents; + +/* + * Compute a "unique" session id (SID) for the 
current process. All events + * from this process will have this label. If we were started by another + * git instance, use our parent's SID as a prefix and count the number of + * nested git processes. (This lets us track parent/child relationships + * even if there is an intermediate shell process.) + */ +static void tr2sid_compute(void) +{ + uint64_t us_now; + const char *parent_sid; + + if (tr2sid_buf.len) + return; + + parent_sid = getenv("SLOG_PARENT_SID"); + if (parent_sid && *parent_sid) { + const char *p; + for (p = parent_sid; *p; p++) + if (*p == '/') + tr2sid_nr_git_parents++; + + strbuf_addstr(&tr2sid_buf, parent_sid); + strbuf_addch(&tr2sid_buf, '/'); + tr2sid_nr_git_parents++; + } + + us_now = getnanotime() / 1000; + strbuf_addf(&tr2sid_buf, "%"PRIuMAX"-%"PRIdMAX, + (uintmax_t)us_now, (intmax_t)getpid()); + + setenv("SLOG_PARENT_SID", tr2sid_buf.buf, 1); +} + +/* + * TODO remove this section header + */ + +/* + * Each thread (including the main thread) has a stack of nested regions. + * This is used to indent messages and provide nested perf times. The + * limit here is for simplicity. Note that the region stack is per-thread + * and not per-trace_key. + */ +#define TR2_MAX_REGION_NESTING (100) +#define TR2_INDENT (2) +#define TR2_INDENT_LENGTH(ctx) (((ctx)->nr_open_regions - 1) * TR2_INDENT) + +#define TR2_MAX_THREAD_NAME (24) + +struct tr2tls_thread_ctx { + struct strbuf thread_name; + uint64_t us_start[TR2_MAX_REGION_NESTING]; + int nr_open_regions; + int thread_id; +}; + +static struct tr2tls_thread_ctx *tr2tls_thread_main; + +static pthread_mutex_t tr2tls_mutex; +static pthread_key_t tr2tls_key; +static int tr2tls_initialized; + +static struct strbuf tr2_dots = STRBUF_INIT; + +static int tr2_next_child_id; +static int tr2_next_thread_id; + +/* + * Create TLS data for the current thread. This gives us a place to + * put per-thread data, such as thread start time, function nesting + * and a per-thread label for our messages. 
+ * + * We assume the first thread is "main". Other threads are given + * non-zero thread-ids to help distinguish messages from concurrent + * threads. + * + * Truncate the thread name if necessary to help with column align
[PATCH 5/8] trace2: demonstrate instrumenting do_read_index
From: Jeff Hostetler Signed-off-by: Jeff Hostetler --- read-cache.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/read-cache.c b/read-cache.c index 7b1354d759..7a31ac4da8 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1867,6 +1867,8 @@ static void tweak_split_index(struct index_state *istate) static void post_read_index_from(struct index_state *istate) { + trace2_data_intmax("index", "cache_nr", istate->cache_nr); + check_ce_order(istate); tweak_untracked_cache(istate); tweak_split_index(istate); @@ -2012,7 +2014,9 @@ int read_index_from(struct index_state *istate, const char *path, if (istate->initialized) return istate->cache_nr; + trace2_region_enter("do_read_index"); ret = do_read_index(istate, path, 0); + trace2_region_leave("do_read_index"); trace_performance_since(start, "read cache %s", path); split_index = istate->split_index; @@ -2028,7 +2032,9 @@ int read_index_from(struct index_state *istate, const char *path, base_oid_hex = oid_to_hex(&split_index->base_oid); base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_oid_hex); + trace2_region_enter("do_read_index"); ret = do_read_index(split_index->base, base_path, 1); + trace2_region_leave("do_read_index"); if (oidcmp(&split_index->base_oid, &split_index->base->oid)) die("broken index, expect %s in %s, got %s", base_oid_hex, base_path, -- gitgitgadget
[PATCH 0/8] WIP: trace2: a new trace facility
This patch series contains a new trace2 facility that hopefully addresses the recent trace- and structured-logging-related discussions. The intent is to eventually replace the existing trace_ routines (or to route them to the new trace2_ routines) as time permits. This draft adds new trace2_ calls and leaves most of the original trace_ calls in place. Subsequent drafts will address this. This version should be considered a replacement for my earlier structured logging patch series [1]. It addresses Jonathan Nieder's, Ben Peart's, Peff's, and Junio's comments [2,3,4,5] about merging the existing tracing routines, ease of use, progressive logging rather than logging at the end of the program, hiding all JSON details inside the trace2_ routines, and leaving an opening for additional formats (protobuf or whatever). It also adds a nested performance tracing feature similar to Duy's suggestion in [6]. This version adds per-thread nesting and marks each event with a thread name. [1] https://public-inbox.org/git/20180713165621.52017-1-...@jeffhostetler.com/ [2] https://public-inbox.org/git/20180821044724.ga219...@aiede.svl.corp.google.com/ [3] https://public-inbox.org/git/13302a8c-a114-c3a7-65df-55f47f902...@gmail.com/ [4] https://public-inbox.org/git/20180814195456.ge28...@sigill.intra.peff.net/ [5] https://public-inbox.org/git/xmqqeff0zn53@gitster-ct.c.googlers.com/ [6] https://public-inbox.org/git/20180818144128.19361-2-pclo...@gmail.com/ Cc: gitster@pobox.com Cc: peff@peff.net Cc: peartben@gmail.com Cc: jrnieder@gmail.com Cc: pclo...@gmail.com Jeff Hostetler (8): trace2: create new combined trace facility trace2: add trace2 to main trace2: demonstrate trace2 regions in wt-status trace2: demonstrate trace2 child process classification trace2: demonstrate instrumenting do_read_index trace2: demonstrate instrumenting threaded preload_index trace2: demonstrate setting sub-command parameter in checkout trace2: demonstrate use of regions in read_directory_recursive Makefile |1 + 
builtin/checkout.c |5 + cache.h|1 + compat/mingw.h |3 +- dir.c |3 + editor.c |1 + git-compat-util.h |7 + git.c |9 +- pager.c|1 + preload-index.c| 10 + read-cache.c |6 + repository.c |2 + run-command.c |8 +- run-command.h |5 + sub-process.c |1 + trace2.c | 1592 trace2.h | 214 ++ usage.c|5 + wt-status.c| 14 +- 19 files changed, 1882 insertions(+), 6 deletions(-) create mode 100644 trace2.c create mode 100644 trace2.h base-commit: 2f743933341f27603550fbf383a34dfcfd38 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-29%2Fjeffhostetler%2Fml-trace2-v0-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-29/jeffhostetler/ml-trace2-v0-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/29 -- gitgitgadget
[PATCH 6/8] trace2: demonstrate instrumenting threaded preload_index
From: Jeff Hostetler Add trace2_region_enter() and _leave() around the entire preload_index() call. Also add thread-specific events in the preload_thread() threadproc. Signed-off-by: Jeff Hostetler --- preload-index.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/preload-index.c b/preload-index.c index 71cd2437a3..2483d92c61 100644 --- a/preload-index.c +++ b/preload-index.c @@ -40,10 +40,14 @@ static void *preload_thread(void *_data) struct cache_entry **cep = index->cache + p->offset; struct cache_def cache = CACHE_DEF_INIT; + trace2_thread_start("preload_thread"); + nr = p->nr; if (nr + p->offset > index->cache_nr) nr = index->cache_nr - p->offset; + trace2_printf("preload {offset %7d}{count %7d}", p->offset, nr); + do { struct cache_entry *ce = *cep++; struct stat st; @@ -70,6 +74,9 @@ static void *preload_thread(void *_data) mark_fsmonitor_valid(ce); } while (--nr > 0); cache_def_clear(&cache); + + trace2_thread_exit(); + return NULL; } @@ -118,6 +125,9 @@ int read_index_preload(struct index_state *index, { int retval = read_index(index); + trace2_region_enter("preload_index"); preload_index(index, pathspec); + trace2_region_leave("preload_index"); + return retval; } -- gitgitgadget
[PATCH 3/8] trace2: demonstrate trace2 regions in wt-status
From: Jeff Hostetler Add trace2_region_enter() and trace2_region_leave() calls around the various phases of a status scan. This gives elapsed time for each phase in the GIT_TR2_PERFORMANCE trace target. Signed-off-by: Jeff Hostetler --- wt-status.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/wt-status.c b/wt-status.c index 5ffab61015..9e37a73c73 100644 --- a/wt-status.c +++ b/wt-status.c @@ -726,13 +726,23 @@ static void wt_status_collect_untracked(struct wt_status *s) void wt_status_collect(struct wt_status *s) { + trace2_region_enter("status_worktrees"); wt_status_collect_changes_worktree(s); + trace2_region_leave("status_worktrees"); - if (s->is_initial) + if (s->is_initial) { + trace2_region_enter("status_initial"); wt_status_collect_changes_initial(s); - else + trace2_region_leave("status_initial"); + } else { + trace2_region_enter("status_index"); wt_status_collect_changes_index(s); + trace2_region_leave("status_index"); + } + + trace2_region_enter("status_untracked"); wt_status_collect_untracked(s); + trace2_region_leave("status_untracked"); } static void wt_longstatus_print_unmerged(struct wt_status *s) -- gitgitgadget
[PATCH 7/8] trace2: demonstrate setting sub-command parameter in checkout
From: Jeff Hostetler Add trace2_param() events in checkout to report whether the command is switching branches or just checking out a file. Signed-off-by: Jeff Hostetler --- builtin/checkout.c | 5 + 1 file changed, 5 insertions(+) diff --git a/builtin/checkout.c b/builtin/checkout.c index 29ef50013d..0934587781 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -251,6 +251,8 @@ static int checkout_paths(const struct checkout_opts *opts, int errs = 0; struct lock_file lock_file = LOCK_INIT; + trace2_param("subcommand", (opts->patch_mode ? "patch" : "path")); + if (opts->track != BRANCH_TRACK_UNSPECIFIED) die(_("'%s' cannot be used with updating paths"), "--track"); @@ -828,6 +830,9 @@ static int switch_branches(const struct checkout_opts *opts, void *path_to_free; struct object_id rev; int flag, writeout_error = 0; + + trace2_param("subcommand", "switch"); + memset(&old_branch_info, 0, sizeof(old_branch_info)); old_branch_info.path = path_to_free = resolve_refdup("HEAD", 0, &rev, &flag); if (old_branch_info.path) -- gitgitgadget
[PATCH 8/8] trace2: demonstrate use of regions in read_directory_recursive
From: Jeff Hostetler Add trace2_region_enter() and _leave() calls inside read_directory_recursive() to show nesting and per-level timing. TODO Drop this commit or ifdef the calls. It generates too much data to be in the production version. It is included in this patch series for illustration purposes only. It is typical of what a developer might do during a perf investigation. Signed-off-by: Jeff Hostetler --- dir.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/dir.c b/dir.c index aceb0d4869..c7c0654da5 100644 --- a/dir.c +++ b/dir.c @@ -1951,6 +1951,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, strbuf_add(&path, base, baselen); + trace2_region_enter("read_directory_recursive:%.*s", baselen, base); + if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only)) goto out; @@ -2042,6 +2044,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, close_cached_dir(&cdir); out: strbuf_release(&path); + trace2_region_leave("read_directory_recursive:%.*s", baselen, base); return dir_state; } -- gitgitgadget
[PATCH 2/8] trace2: add trace2 to main
From: Jeff Hostetler Signed-off-by: Jeff Hostetler --- compat/mingw.h| 3 +-- git-compat-util.h | 7 +++ git.c | 9 - repository.c | 2 ++ run-command.c | 8 +++- run-command.h | 5 + usage.c | 5 + 7 files changed, 35 insertions(+), 4 deletions(-) diff --git a/compat/mingw.h b/compat/mingw.h index 571019d0bd..606402faeb 100644 --- a/compat/mingw.h +++ b/compat/mingw.h @@ -144,8 +144,7 @@ static inline int fcntl(int fd, int cmd, ...) errno = EINVAL; return -1; } -/* bash cannot reliably detect negative return codes as failure */ -#define exit(code) exit((code) & 0xff) + #define sigemptyset(x) (void)0 static inline int sigaddset(sigset_t *set, int signum) { return 0; } diff --git a/git-compat-util.h b/git-compat-util.h index 5f2e90932f..c0901d9ec6 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1219,6 +1219,13 @@ static inline int is_missing_file_error(int errno_) extern int cmd_main(int, const char **); +/* + * Intercept all calls to exit() and route them to trace2 to + * optionally emit a message before calling the real exit(). + */ +int trace2_exit_fl(const char *file, int line, int code); +#define exit(code) exit(trace2_exit_fl(__FILE__, __LINE__, (code))) + /* * You can mark a stack variable with UNLEAK(var) to avoid it being * reported as a leak by tools like LSAN or valgrind. 
The argument diff --git a/git.c b/git.c index c27c38738b..cc56279a8c 100644 --- a/git.c +++ b/git.c @@ -331,6 +331,8 @@ static int handle_alias(int *argcp, const char ***argv) argv_array_push(&child.args, alias_string + 1); argv_array_pushv(&child.args, (*argv) + 1); + trace2_alias(alias_command, child.args.argv); + ret = run_command(&child); if (ret >= 0) /* normal exit */ exit(ret); @@ -365,6 +367,8 @@ static int handle_alias(int *argcp, const char ***argv) /* insert after command name */ memcpy(new_argv + count, *argv + 1, sizeof(char *) * *argcp); + trace2_alias(alias_command, new_argv); + *argv = new_argv; *argcp += count - 1; @@ -413,6 +417,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) setup_work_tree(); trace_argv_printf(argv, "trace: built-in: git"); + trace2_command(p->cmd); validate_cache_entries(&the_index); status = p->fn(argc, argv, prefix); @@ -719,6 +724,8 @@ int cmd_main(int argc, const char **argv) cmd = slash + 1; } + trace2_start(argv); + trace_command_performance(argv); /* @@ -782,5 +789,5 @@ int cmd_main(int argc, const char **argv) fprintf(stderr, _("failed to run command '%s': %s\n"), cmd, strerror(errno)); - return 1; + return trace2_exit(1); } diff --git a/repository.c b/repository.c index 5dd1486718..c169f61ccd 100644 --- a/repository.c +++ b/repository.c @@ -113,6 +113,8 @@ out: void repo_set_worktree(struct repository *repo, const char *path) { repo->worktree = real_pathdup(path, 1); + + trace2_worktree(repo->worktree); } static int read_and_verify_repository_format(struct repository_format *format, diff --git a/run-command.c b/run-command.c index 84b883c213..e833d9a277 100644 --- a/run-command.c +++ b/run-command.c @@ -706,6 +706,7 @@ fail_pipe: cmd->err = fderr[0]; } + trace2_child_start(cmd); trace_run_command(cmd); fflush(NULL); @@ -911,6 +912,8 @@ fail_pipe: #endif if (cmd->pid < 0) { + trace2_child_exit(cmd, -1); + if (need_in) close_pair(fdin); else if (cmd->in) @@ -949,13 +952,16 @@ fail_pipe: int finish_command(struct 
child_process *cmd) { int ret = wait_or_whine(cmd->pid, cmd->argv[0], 0); + trace2_child_exit(cmd, ret); child_process_clear(cmd); return ret; } int finish_command_in_signal(struct child_process *cmd) { - return wait_or_whine(cmd->pid, cmd->argv[0], 1); + int ret = wait_or_whine(cmd->pid, cmd->argv[0], 1); + trace2_child_exit(cmd, ret); + return ret; } diff --git a/run-command.h b/run-command.h index 3932420ec8..a91206b08c 100644 --- a/run-command.h +++ b/run-command.h @@ -12,6 +12,11 @@ struct child_process { struct argv_array args; struct argv_array env_array; pid_t pid; + + int trace2_child_id; + uint64_t trace2_child_us_start; + const char *trace2_child_class; + /* * Using .in, .out, .err: * - Specify 0 for no redirections (child inherits stdin, stdout, diff --git a/usage.c b/usage.c index cc803336bd..1838c46d20 100644 --- a/usage.c +++ b
Re: [PATCH v1 00/25] RFC: structured logging
On 8/28/2018 1:38 PM, Junio C Hamano wrote: g...@jeffhostetler.com writes: From: Jeff Hostetler This RFC patch series adds structured logging to git. The motivation, ...

Jeff Hostetler (25):
  structured-logging: design document
  structured-logging: add STRUCTURED_LOGGING=1 to Makefile
  structured-logging: add structured logging framework
  structured-logging: add session-id to log events
  structured-logging: set sub_command field for branch command
  structured-logging: set sub_command field for checkout command
  structured-logging: t0420 basic tests
  structured-logging: add detail-event facility
  structured-logging: add detail-event for lazy_init_name_hash
  structured-logging: add timer facility
  structured-logging: add timer around do_read_index
  structured-logging: add timer around do_write_index
  structured-logging: add timer around wt-status functions
  structured-logging: add timer around preload_index
  structured-logging: t0420 tests for timers
  structured-logging: add aux-data facility
  structured-logging: add aux-data for index size
  structured-logging: add aux-data for size of sparse-checkout file
  structured-logging: t0420 tests for aux-data
  structured-logging: add structured logging to remote-curl
  structured-logging: add detail-events for child processes
  structured-logging: add child process classification
  structured-logging: t0420 tests for child process detail events
  structured-logging: t0420 tests for interactive child_summary
  structured-logging: add config data facility

I noticed that the Travis job has been failing with a trivially fixable failure, so I'll push out today's 'pu' with the attached applied on top. This may become inapplicable to the code when the issues raised in recent reviews are addressed, though.
structured-logging.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/structured-logging.c b/structured-logging.c index 0e3f79ee48..78abcd2e59 100644 --- a/structured-logging.c +++ b/structured-logging.c @@ -593,8 +593,7 @@ void slog_set_command_name(const char *command_name) * the cmd_() and/or it may be too early to force a * lazy load. */ - if (my__command_name) - free(my__command_name); + free(my__command_name); my__command_name = xstrdup(command_name); } @@ -606,8 +605,7 @@ void slog_set_sub_command_name(const char *sub_command_name) * the cmd_() and/or it may be too early to force a * lazy load. */ - if (my__sub_command_name) - free(my__sub_command_name); + free(my__sub_command_name); my__sub_command_name = xstrdup(sub_command_name); } Sorry about that. Let me withdraw the current series. I'm working on a new version that addresses the comments on the mailing list. It combines my logging with a variation on the nested perf logging that Duy suggested and the consolidation that you were talking about last week. Jeff
Re: [PATCH v7 09/16] fetch: support filters
On 8/19/2018 7:24 AM, Duy Nguyen wrote: On Fri, Dec 8, 2017 at 5:00 PM Jeff Hostetler wrote: From: Jeff Hostetler Teach fetch to support filters. This is only allowed for the remote configured in extensions.partialcloneremote. Signed-off-by: Jonathan Tan --- builtin/fetch.c | 23 +-- connected.c | 2 ++ remote-curl.c | 6 ++ t/t5500-fetch-pack.sh | 36 4 files changed, 65 insertions(+), 2 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 1b1f039..14aab71 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -18,6 +18,7 @@ #include "argv-array.h" #include "utf8.h" #include "packfile.h" +#include "list-objects-filter-options.h" static const char * const builtin_fetch_usage[] = { N_("git fetch [] [ [...]]"), @@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND; static int shown_url = 0; static int refmap_alloc, refmap_nr; static const char **refmap_array; +static struct list_objects_filter_options filter_options; static int git_fetch_config(const char *k, const char *v, void *cb) { @@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = { TRANSPORT_FAMILY_IPV4), OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"), TRANSPORT_FAMILY_IPV6), + OPT_PARSE_LIST_OBJECTS_FILTER(_options), Documentation is missing. Please add something to git-fetch.txt (or fetch-options.txt) about this option. I would make a patch but I don't know enough about this to write and I'm in the middle of something else. Documentation for --filter= (and --no-filter) were added to rev-list-options.txt. The opening paragraph talks about --objects which would need to be omitted for fetch and clone, but the rest of that text is relevant. I'll push up a patch shortly. Thanks Jeff
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On 8/14/2018 2:44 PM, Stefan Beller wrote: On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen wrote: On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler wrote: I'm looking at adding code to my SLOG (better name suggestions welcome) patch series to eventually replace the existing git_trace facility. Complement maybe. Replace, please no. I'd rather not stare at json messages. From the sidelines: We'd only need one logging infrastructure in place, as the formatting would be done as a later step? For local operations we'd certainly find better formatting than json, and we figured that we might end up desiring ProtocolBuffers[1] instead of JSon, so if it would be easy to change the output of the structured logging easily that would be great. But AFAICT these series are all about putting the sampling points into the code base, so formatting would be orthogonal to it? Stefan [1] https://developers.google.com/protocol-buffers/ Last time I checked, protocol-buffers has a C++ binding but not a C binding. I've not had a chance to use pbuffers, so I have to ask what advantages would they have over JSON or some other similar self-describing format? And/or would it be possible for you to tail the json log file and convert it to whatever format you preferred? It seems like the important thing is to capture structured data (whatever the format) to disk first. Jeff
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On 8/13/2018 6:41 PM, Junio C Hamano wrote: Jeff King writes: I can buy the argument that it's nice to have some form of profiling that works everywhere, even if it's lowest-common-denominator. I just wonder if we could be investing effort into tooling around existing solutions that will end up more powerful and flexible in the long run. Another thing I noticed is that the codepaths we would find interesting to annotate with trace_performance_* stuff often overlaps with the "slog" thing. If the latter aims to eventually replace GIT_TRACE (and if not, I suspect there is not much point adding it in the first place), perhaps we can extend it to also cover the need of these trace_performance_* calls, so that we do not have to carry three different tracing mechanisms. I'm looking at adding code to my SLOG (better name suggestions welcome) patch series to eventually replace the existing git_trace facility. And I would like to have a set of nested messages like Duy has proposed be a part of that. In an independent effort I've found the nested messages being very helpful in certain contexts. They are not a replacement for the various platform tools, like PerfView and friends as discussed earlier on this thread, but then again I can ask a customer to turn a knob and run it again and send me the output and hopefully get a rough idea of the problem -- without having them install a bunch of perf tools. Jeff
Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0
On 8/13/2018 2:14 PM, Matthew DeVore wrote: Teach list-objects the "tree:0" filter which allows for filtering out all tree and blob objects (unless other objects are explicitly specified by the user). The purpose of this patch is to allow smaller partial clones. The name of this filter - tree:0 - does not explicitly specify that it also filters out all blobs, but this should not cause much confusion because blobs are not at all useful without the trees that refer to them. I also considered only:commits as a name, but this is inaccurate because it suggests that annotated tags are omitted, but actually they are included. The name "tree:0" allows later filtering based on depth, i.e. "tree:1" would filter out all but the root tree and blobs. In order to avoid confusion between 0 and capital O, the documentation was worded in a somewhat round-about way that also hints at this future improvement to the feature. Signed-off-by: Matthew DeVore --- Documentation/rev-list-options.txt | 3 ++ list-objects-filter-options.c | 4 +++ list-objects-filter-options.h | 1 + list-objects-filter.c | 50 ++ t/t5317-pack-objects-filter-objects.sh | 27 ++ t/t5616-partial-clone.sh | 27 ++ t/t6112-rev-list-filters-objects.sh| 13 +++ 7 files changed, 125 insertions(+) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 7b273635d..9e351ec2a 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -743,6 +743,9 @@ specification contained in <path>. A debug option to help with future "partial clone" development. This option specifies how missing objects are handled. + +The form '--filter=tree:<depth>' omits all blobs and trees deeper than +<depth> from the root tree. Currently, only <depth>=0 is supported. ++ The form '--missing=error' requests that rev-list stop with an error if a missing object is encountered. This is the default action. 
+ diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index c0e2bd6a0..a28382940 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter( return 0; } + } else if (!strcmp(arg, "tree:0")) { + filter_options->choice = LOFC_TREE_NONE; + return 0; + } else if (skip_prefix(arg, "sparse:oid=", )) { struct object_context oc; struct object_id sparse_oid; diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index a61f8..af64e5c66 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -10,6 +10,7 @@ enum list_objects_filter_choice { LOFC_DISABLED = 0, LOFC_BLOB_NONE, LOFC_BLOB_LIMIT, + LOFC_TREE_NONE, LOFC_SPARSE_OID, LOFC_SPARSE_PATH, LOFC__COUNT /* must be last */ diff --git a/list-objects-filter.c b/list-objects-filter.c index a0ba78b20..8e3caf5bf 100644 --- a/list-objects-filter.c +++ b/list-objects-filter.c @@ -80,6 +80,55 @@ static void *filter_blobs_none__init( return d; } +/* + * A filter for list-objects to omit ALL trees and blobs from the traversal. + * Can OPTIONALLY collect a list of the omitted OIDs. 
+ */ +struct filter_trees_none_data { + struct oidset *omits; +}; + +static enum list_objects_filter_result filter_trees_none( + enum list_objects_filter_situation filter_situation, + struct object *obj, + const char *pathname, + const char *filename, + void *filter_data_) +{ + struct filter_trees_none_data *filter_data = filter_data_; + + switch (filter_situation) { + default: + die("unknown filter_situation"); + return LOFR_ZERO; + + case LOFS_BEGIN_TREE: + case LOFS_BLOB: + if (filter_data->omits) + oidset_insert(filter_data->omits, >oid); + return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */ + + case LOFS_END_TREE: + assert(obj->type == OBJ_TREE); + return LOFR_ZERO; + + } +} There are a couple of options here: [] If really want to omit all trees and blobs (and we DO NOT want the oidset of everything omitted), then we might be able to shortcut the traversal and speed things up. {} add a LOFR_SKIP_TREE bit to list_objects_filter_result {} test this bit process_tree() and avoid the init_tree_desc() and the while loop and some adjacent setup/tear-down code. {} make this filter something like: case LOFS_BEGIN_TREE: if (filter_data->omits) { oidset_insert(filter_data->omits, >oid); return LOFR_MARK_SEEN; /* ... (hard omit) */ } else
Re: [PATCH] mingw: enable atomic O_APPEND
On 8/13/2018 3:02 PM, Johannes Sixt wrote: The Windows CRT implements O_APPEND "manually": on write() calls, the file pointer is set to EOF before the data is written. Clearly, this is not atomic. And in fact, this is the root cause of failures observed in t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh, where different processes write to the same trace file simultaneously; it also occurred in t5400-send-pack.sh, but there it was worked around in 71406ed4d6 ("t5400: avoid concurrent writes into a trace file", 2017-05-18). Fortunately, Windows does support atomic O_APPEND semantics using the file access mode FILE_APPEND_DATA. Provide an implementation that does. This implementation is minimal in such a way that it only implements the open modes that are actually used in the Git code base. Emulation for other modes can be added as necessary later. To become aware of the necessity early, the unusual error ENOSYS is reported if an unsupported mode is encountered. Diagnosed-by: Johannes Schindelin Helped-by: Jeff Hostetler Signed-off-by: Johannes Sixt --- compat/mingw.c | 41 +++-- 1 file changed, 39 insertions(+), 2 deletions(-) [...] This looks good. Thanks for following up on this. Jeff
Re: [PATCH v2 5/5] list-objects-filter: implement filter tree:none
On 8/10/2018 7:06 PM, Matthew DeVore wrote: Teach list-objects the "tree:none" filter which allows for filtering out all tree and blob objects (unless other objects are explicitly specified by the user). The purpose of this patch is to allow smaller partial clones. The name of this filter - tree:none - does not explicitly specify that it also filters out all blobs, but this should not cause much confusion because blobs are not at all useful without the trees that refer to them. I also considered only:commits as a name, but this is inaccurate because it suggests that annotated tags are omitted, but actually they are included. Signed-off-by: Matthew DeVore --- Documentation/rev-list-options.txt | 2 ++ list-objects-filter-options.c | 4 +++ list-objects-filter-options.h | 1 + list-objects-filter.c | 49 +++--- t/t5317-pack-objects-filter-objects.sh | 27 ++ t/t5616-partial-clone.sh | 27 ++ t/t6112-rev-list-filters-objects.sh| 13 +++ 7 files changed, 110 insertions(+), 13 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 7b273635d..68b4b9552 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -743,6 +743,8 @@ specification contained in <path>. A debug option to help with future "partial clone" development. This option specifies how missing objects are handled. + +The form '--filter=tree:none' omits all blobs and trees. ++ The form '--missing=error' requests that rev-list stop with an error if a missing object is encountered. This is the default action. 
+ diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index c0e2bd6a0..523cb00a0 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter( return 0; } + } else if (!strcmp(arg, "tree:none")) { + filter_options->choice = LOFC_TREE_NONE; + return 0; + } else if (skip_prefix(arg, "sparse:oid=", )) { struct object_context oc; struct object_id sparse_oid; diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index a61f8..af64e5c66 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -10,6 +10,7 @@ enum list_objects_filter_choice { LOFC_DISABLED = 0, LOFC_BLOB_NONE, LOFC_BLOB_LIMIT, + LOFC_TREE_NONE, LOFC_SPARSE_OID, LOFC_SPARSE_PATH, LOFC__COUNT /* must be last */ diff --git a/list-objects-filter.c b/list-objects-filter.c index a0ba78b20..22c894093 100644 --- a/list-objects-filter.c +++ b/list-objects-filter.c @@ -26,38 +26,45 @@ #define FILTER_SHOWN_BUT_REVISIT (1<<21) /* - * A filter for list-objects to omit ALL blobs from the traversal. - * And to OPTIONALLY collect a list of the omitted OIDs. + * A filter for list-objects to omit ALL blobs from the traversal, and possibly + * trees as well. + * Can OPTIONALLY collect a list of the omitted OIDs. */ -struct filter_blobs_none_data { +struct filter_none_of_type_data { + /* blobs are always omitted */ + unsigned omit_trees : 1; struct oidset *omits; }; I'm not sure I'd convert the existing filter types. When I created this file, I created a set of function pairs for each filter type: filter_<name>() and filter_<name>__init(), with the latter being added to the s_filters[] array, and created a choice enum having corresponding values LOFC_<NAME>. Here you're adding a new filter_<name>__init() and LOFC_<name> key, but mapping both the original "blob:none" and the new "tree:none" to a combined filter function that blends these 2 modes. 
Style-wise, I'd keep the original filters as they were and add a new function pair for the new tree:none filter. Then you can simplify the logic inside your new filter. For example, in your filter "filter_data->omit_trees" will always be true, so you can just do the "if (filter_data->omits) oidset_insert(...); return LOFR_MARK_SEEN" and not have the fallthru stuff -- or get rid of the asserts() and put the case labels together. One of the things I wanted to do (when I found some free time) was to add a "tree:none" and maybe a "tree:root" filter. (The latter only including the root trees associated with the fetched commits, since there are/were some places where we implicitly also load the root tree when loading the commit object.) So in that vein, it might be that we would want a "tree:<depth>" filter instead, with 0 = none and 1 = root. I wasn't ready to propose that when I did the filtering, but I had that in mind. (And this is partially why I suggest keeping your new filter independent of the existing ones.) Jeff -static enum list_objects_filter_result filter_blobs_none( +static enum list_objects_filter_result filter_none_of_type(
Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems
On 8/12/2018 5:07 AM, Nguyễn Thái Ngọc Duy wrote: Paths that only differ in case work fine on case-sensitive filesystems, but if those repos are cloned on a case-insensitive one, you'll get problems. The first thing you notice is that "git status" will never be clean, with no indication of what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if, at clone time, an entry has already been checked out when we're about to write a new path, some folding rules must be involved. This patch is tested with the vim-colorschemes repository on a JFS partition with case insensitive support on Linux. This repository has two files darkBlue.vim and darkblue.vim. Signed-off-by: Nguyễn Thái Ngọc Duy --- v4 removes nr_duplicates (and fixes that false warning Szeder reported). It also hints about case insensitivity as the cause of the problem because it's most likely the case when this warning shows up. 
builtin/clone.c | 1 + cache.h | 1 + entry.c | 28 t/t5601-clone.sh | 8 +++- unpack-trees.c | 28 unpack-trees.h | 1 + 6 files changed, 66 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = _index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, +clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..c70340df8e 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,31 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i; + + ce->ce_flags |= CE_MATCHED; + +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if (dup->ce_stat_data.sd_ino == st->st_ino) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +#endif +} + /* * Write the contents from ce out to the working tree. 
* @@ -455,6 +480,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, ); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd0680f11e..443df048ef 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -359,6 +359,12 @@ static int check_updates(struct unpack_trees_options *o) state.refresh_cache = 1; state.istate = index; + if (o->clone) { + state.clone = 1; + for (i = 0; i < index->cache_nr; i++) + index->cache[i]->ce_flags &= ~CE_MATCHED; + } + progress = get_progress(o); if (o->update) @@ -423,6 +429,28 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(); if (o->update)
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On 8/10/2018 12:51 PM, Jeff Hostetler wrote: On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) d'oh s/GENERIC_READ/GENERIC_WRITE/ Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it. 
Jeff

--------------------------------------------------

#include <windows.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

	const char * buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it. 
Jeff

--------------------------------------------------

#include <windows.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

	const char * buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
On 8/9/2018 10:23 AM, Jeff King wrote: On Wed, Aug 08, 2018 at 05:41:10PM -0700, Junio C Hamano wrote: If we have an equivalence-class hashmap and feed it inodes (or again, some system equivalent) as the keys, we should get buckets of collisions. I guess one way to get "some system equivalent" that can be used as the last resort, when there absolutely is no inum equivalent, is to rehash the working tree file that shouldn't be there when we detect a collision. If we found that there is something when we tried to write out "Foo.txt", if we open "Foo.txt" on the working tree and hash-object it, we should find the matching blob somewhere in the index _before_ "Foo.txt". On a case-insensitive filesystem, it may well be "foo.txt", but we do not even have to know "foo.txt" and "Foo.txt" only differ in case. Clever. You might still run into false positives when there is duplicated content in the repository (especially, say, zero-length files). But the fact that you only do the hashing on known duplicates helps with that. I worry that the false positives make this a non-starter. I mean, if clone creates files 'A' and 'B' (both equal) and then tries to create 'b', would the collision code report that 'b' collided with 'A' because that was the first OID match? Ideally with this scheme we'd have to search the entire index prior to 'b' and then report that 'b' collided with either 'A' or 'B'. Neither message instills confidence. And there's no way to prefer answer 'B' over 'A' without using knowledge of the FS name mangling/aliasing rules -- unless we want to just assume ignore-case for this iteration. One of the things I did like about the equivalence-class approach is that it can be done in a single linear pass in the worst case. Whereas anything that searches when we see a collision is quite likely to be quadratic. But as I said before, it may not be worth worrying too much about that for an error code path where we expect the number of collisions to be small. 
-Peff Sorry to be paranoid, but I have an index with 3.5M entries, the word "quadratic" rings all kinds of alarm bells for me. :-) Granted, we expect the number of collisions to be small, but searching back for each collision over the already-populated portion of the index could be expensive. Jeff
Re: [PATCH v1 01/25] structured-logging: design document
On 8/3/2018 11:26 AM, Ben Peart wrote: On 7/13/2018 12:55 PM, g...@jeffhostetler.com wrote: From: Jeff Hostetler Signed-off-by: Jeff Hostetler --- Documentation/technical/structured-logging.txt | 816 + 1 file changed, 816 insertions(+) create mode 100644 Documentation/technical/structured-logging.txt diff --git a/Documentation/technical/structured-logging.txt b/Documentation/technical/structured-logging.txt new file mode 100644 index 000..794c614 --- /dev/null +++ b/Documentation/technical/structured-logging.txt @@ -0,0 +1,816 @@ +Structured Logging +== + +Structured Logging (SLOG) is an optional feature to allow Git to +generate structured log data for executed commands. This includes +command line arguments, command run times, error codes and messages, +child process information, time spent in various critical functions, +and repository data-shape information. Data is written to a target +log file in JSON[1,2,3] format. + +SLOG is disabled by default. Several steps are required to enable it: + +1. Add the compile-time flag "STRUCTURED_LOGGING=1" when building git + to include the SLOG routines in the git executable. + Is the intent to remove this compile-time flag before this is merged? With it off by default in builds, the audience for this is limited to those who build their own/custom versions of git. I can see other organizations wanting to use this that don't have a custom fork of git they build and install on their users machines. Like the GIT_TRACE mechanism today, I think this should be compiled in but turned off via the default settings by default. I would like to get rid of this compile-time flag and just have it be available to those who want to use it. And defaulted to off. But I wasn't sure what kind of reaction or level of interest this feature would receive from the mailing list. +2. Set "slog.*" config settings[5] to enable SLOG in your repo. 
+ + +Motivation +== + +Git users may be faced with scenarios that are surprisingly slow or +produce unexpected results. And Git developers may have difficulty +reproducing these experiences. Structured logging allows users to +provide developers with additional usage, performance and error data +that can help diagnose and debug issues. + +Many Git hosting providers and users with many developers have bespoke +efforts to help troubleshoot problems; for example, command wrappers, +custom pre- and post-command hooks, and custom instrumentation of Git +code. These are inefficient and/or difficult to maintain. The goal +of SLOG is to provide this data as efficiently as possible. + +And having structured rather than free format log data, will help +developers with their analysis. + + +Background (Git Merge 2018 Barcelona) += + +Performance and/or error logging was discussed during the contributor's +summit in Barcelona. Here are the relevant notes from the meeting +minutes[6]. + +> Performance misc (Ævar) +> --- +> [...] +> - central error reporting for git +> - `git status` logging +> - git config that collects data, pushes to known endpoint with `git push` +> - pre_command and post_command hooks, for logs +> - `gvfs diagnose` that looks at packfiles, etc +> - detect BSODs, etc +> - Dropbox writes out json with index properties and command-line +> information for status/fetch/push, fork/execs external tool to upload +> - windows trace facility; would be nice to have cross-platform +> - would hosting providers care? 
+> - zipfile of logs to give when debugging
+> - sanitizing data is harder
+> - more in a company setting
+> - fileshare to upload zipfile
+> - most of the errors are proxy when they shouldn't, wrong proxy, proxy
+>   specific to particular URL; so upload endpoint wouldn't work
+> - GIT_TRACE is supposed to be that (for proxy)
+> - but we need more trace variables
+> - series to make tracing cheaper
+> - except that curl selects the proxy
+> - trace should have an API, so it can call an executable
+> - dump to .git/traces/... and everything else happens externally
+> - tools like visual studio can't set GIT_TRACE, so
+> - sourcetree has seen user environments where commands just take forever
+> - third-party tools like perf/strace - could we be better leveraging those?
+> - distribute turn-key solution to handout to collect more data?

While it makes sense to have clear goals in the design document, the motivation and background sections feel somehow out of place. I'd recommend you clearly articulate the design goals and drop the background data that led you to the goals.

good point. thanks.

+
+A Quick Example
+===
+
+Note: JSON pretty-printing is enabled in all of the exa
Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems
On 8/7/2018 3:31 PM, Junio C Hamano wrote: Nguyễn Thái Ngọc Duy writes: One nice thing about this is we don't need platform specific code for detecting the duplicate entries. I think ce_match_stat() works even on Windows. And it's now equally expensive on all platforms :D

ce_match_stat() may not be a very good measure to see if two paths refer to the same file, though. After a fresh checkout, I would not be surprised if two completely unrelated paths have the same size and the same mtime/ctime. In its original use case, i.e. "I have one specific path in mind and took a snapshot of its metadata earlier. Is it still the same, or has it been touched?", that may be sufficient to detect that the path has been _modified_, but without a reliable inum, it may be a poor measure to say two paths refer to the same file.

I agree with Junio on this one. The mtime values are sloppy at best. On FAT file systems, they have 2-second resolution. Even NTFS IIRC has only 100ns resolution, so we might get a lot of false matches using this technique, right?

It might be better to build an equivalence-class hashmap for the colliding entries. Compute a "normalized" version of the pathname (such as: convert to lowercase, strip final dots/spaces, strip the digits following the tilde of a shortname, and etc. for the Mac's UTF-isms). Then when you rescan the index entries to find the matches, apply the equivalence operator on the pathname and do the hashmap lookup. When you find a match, you have a "potential" collider pair (I say potential only because of the ambiguity of shortnames). Then we can use inum/file-index/whatever to see if they actually collide.

Jeff
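The equivalence-class idea can be sketched outside of git. Below is an illustrative Python sketch (not git's C implementation), with a deliberately crude normalizer -- lowercase plus stripping trailing dots/spaces -- standing in for the filesystem's real folding rules; `normalize` and `potential_colliders` are invented names for illustration:

```python
from collections import defaultdict

def normalize(path):
    # Crude stand-in for filesystem folding rules: fold case and strip
    # trailing dots/spaces from each path component.  A real version
    # would also handle shortnames, UTF normalization, etc.
    return "/".join(comp.lower().rstrip(". ") for comp in path.split("/"))

def potential_colliders(paths):
    # Group paths into equivalence classes under the normalizer; any
    # class with more than one member holds *potential* colliders.  A
    # real check would then compare file identity (inum/file-index)
    # to confirm an actual collision.
    classes = defaultdict(list)
    for p in paths:
        classes[normalize(p)].append(p)
    return [members for members in classes.values() if len(members) > 1]

paths = ["a/b/FOO.txt", "a/b/foo.txt", "a/b/bar.txt", "A/b/foo.txt."]
print(potential_colliders(paths))
# -> [['a/b/FOO.txt', 'a/b/foo.txt', 'A/b/foo.txt.']]
```

Note the hashmap makes this a single pass over the index entries, rather than a quadratic all-pairs comparison.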
Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
On 8/3/2018 2:53 PM, Jeff King wrote: On Fri, Aug 03, 2018 at 02:23:17PM -0400, Jeff Hostetler wrote: Maybe. It might not work as ino_t. Or it might be expensive to get. Or maybe it's simply impossible. I don't know much about Windows. Some searching implies that NTFS does have a "file index" concept which is supposed to be unique. This is hard and/or expensive on Windows. Yes, you can get the "file index" values for an open file handle with a cost similar to an fstat(). Unfortunately, the FindFirst/FindNext routines (equivalent to the opendir/readdir routines), don't give you that data. So we'd have to scan the directory and then open and stat each file. This is terribly expensive on Windows -- and the reason we have the fscache layer (in the GfW version) to intercept the lstat() calls whenever possible. I think that high cost might be OK for our purposes here. This code would _only_ kick in during a clone, and then only on the error path once we knew we had a collision during the checkout step. Good point. I've confirmed that the "file index" values can be used to determine whether 2 path names are equivalent under NTFS for case variation, final-dot/space, and short-names vs long-names. I ran out of time this morning to search the directory for equivalent paths. I'll look at that shortly. Jeff
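As a sanity check of the "do these two names reach the same file" test, here is a hedged Python sketch: Python's os.stat() fills st_ino from the NTFS file index on Windows, so the same (st_dev, st_ino) comparison applies there as on POSIX. The demo uses a hard link to create two names for one file (same_file is an invented helper name):

```python
import os
import tempfile

def same_file(a, b):
    # Two paths name the same file iff they resolve to the same
    # (device, file-serial-number) pair; on Windows, st_ino is
    # populated from the NTFS "file index" discussed above.
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)

d = tempfile.mkdtemp()
p1 = os.path.join(d, "foo.txt")
p2 = os.path.join(d, "foo-alias.txt")
p3 = os.path.join(d, "bar.txt")
open(p1, "w").close()
os.link(p1, p2)              # second directory entry for the same file
open(p3, "w").close()        # unrelated file
print(same_file(p1, p2))     # True
print(same_file(p1, p3))     # False
```

The expensive part on Windows is not this comparison but producing the candidate paths, since FindFirst/FindNext don't hand back the file index.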
Re: [PATCH/RFC] clone: report duplicate entries on case-insensitive filesystems
On 8/2/2018 5:28 PM, Jeff King wrote: On Thu, Aug 02, 2018 at 02:14:30PM -0700, Junio C Hamano wrote: Jeff King writes: I also wonder if Windows could return some other file-unique identifier that would work in place of an inode here. That would be pretty easy to swap in via an #ifdef'd helper function. I'd be OK shipping without that and letting Windows folks fill it in later (as long as we do not do anything too stupid until then, like claim all of the inode==0 files are the same).

Yeah, but such a useful file-unique identifier would probably be used in place of inum in their (l)stat emulation already, if it exists, no?

Maybe. It might not work as ino_t. Or it might be expensive to get. Or maybe it's simply impossible. I don't know much about Windows. Some searching implies that NTFS does have a "file index" concept which is supposed to be unique.

This is hard and/or expensive on Windows. Yes, you can get the "file index" values for an open file handle with a cost similar to an fstat(). Unfortunately, the FindFirst/FindNext routines (equivalent to the opendir/readdir routines) don't give you that data. So we'd have to scan the directory and then open and stat each file. This is terribly expensive on Windows -- and the reason we have the fscache layer (in the GfW version) to intercept the lstat() calls whenever possible.

It might be possible to use the NTFS Master File Table to discover this (very big handwave), but I would need to do a little digging. This would all be NTFS specific. FAT and other volume types would not be covered.

Another thing to keep in mind is that the collision could be because of case folding (or other such nonsense) on a directory in the path. I mean, if someone on Linux builds a commit containing:

    a/b/c/D/e/foo.txt
    a/b/c/d/e/foo.txt

we'll get a similar collision as if one of them were spelled "FOO.txt". Also, do we need to worry about hard-links or symlinks here?
If checkout populates symlinks, then you might have another collision opportunity. For example:

    a/b/c/D/e/foo.txt
    a/link -> ./b/c/d
    a/link/e/foo.txt

Also, some platforms (like the Mac) allow directory hard-links. Granted, Git doesn't create hard-links during checkout, but the user might. I'm sure there are other edge cases here that make reporting difficult; these are just a few I thought of. I guess what I'm trying to say is that, as a first step, we should just report that we found a collision -- without trying to identify the set of existing objects that it collided with.

At any rate, until we have an actual plan for Windows, I think it would make sense only to split the cases into "has working inodes" and "other", and make sure "other" does something sensible in the meantime (like mention the conflict, but skip trying to list duplicates).

Yes, this should be split. Do the "easy" Linux version first. Keep in mind that there may also be a different solution for the Mac. When somebody wants to work on Windows support, then we can figure out if it just needs to wrap the "get unique identifier" operation, or if it would use a totally different algorithm.

-Peff

Jeff
Re: Git clone and case sensitivity
On 7/29/2018 5:28 AM, Jeff King wrote: On Sun, Jul 29, 2018 at 07:26:41AM +0200, Duy Nguyen wrote: strcasecmp() will only catch a subset of the cases. We really need to follow the same folding rules that the filesystem would. True. But that's how we handle case insensitivity internally. If a filesystem has more sophisticated folding rules then git will not work well on that one anyway.

Hrm. Yeah, I guess that's the best we can do for the actual in-memory checks. Everything else depends on doing an actual filesystem operation, and our icase stuff kicks in way before then. I was mostly thinking of HFS+ utf8 normalization weirdness, but I guess people are accustomed to that by now.

For the case of clone, I actually wonder if we could detect during the checkout step that a file already exists. Since we know that the directory we started with was empty, then if it does, either:

- there's some funny case-folding going on that means two paths in the repository map to the same name in the filesystem; or

- somebody else is writing to the directory at the same time as us

This is exactly what my first patch does (minus the sparse checkout part). Right, sorry, I should have read that one more carefully. But without knowing the exact folding rules, I don't think we can locate this "somebody else" who wrote the first path. So if N paths are treated the same by this filesystem, we could only report N-1 of them. If we want to report just one path when this happens, though, then this works quite well.

Hmm. Since most such systems are case-preserving, would it be possible to report the name of the existing file? Doing it via opendir/readdir is hacky, and anyway puts the burden on us to find the matching name. Doing it via fstat() on the opened file doesn't work because by that point the filesystem has already resolved the name to an inode. So yeah, perhaps strcasecmp() is the best we can do (I do agree that being able to mention all of the conflicting names is a benefit).
I guess we should be using fspathcmp(), though, in case it later learns to be smarter.

-Peff

As has already been mentioned, this gets into weird territory really fast, between case folding, final space/dot on Windows, utf8 NFC/NFD weirdness on the Mac, utf8 invisible chars on the Mac, long/short names on Windows, and so on. And that's just for filenames. Things really get weird if directory names have these ambiguities.

Perhaps just print the problematic paths (where the collision is detected) and let the user decide how to correct them. Perhaps we could have a separate tool that could scan the index or a commit for potential conflicts and warn the user in advance (granted, it might not be perfect and may report a few false positives). Forcing them into a sparse-checkout situation might be over their skill level.

Jeff
Re: [PATCH v1 03/25] structured-logging: add structured logging framework
On 7/26/2018 5:09 AM, SZEDER Gábor wrote:

+void slog_set_command_name(const char *command_name)
+{
+	/*
+	 * Capture the command name even if logging is not enabled
+	 * because we don't know if the config has been loaded yet by
+	 * the cmd_() and/or it may be too early to force a
+	 * lazy load.
+	 */
+	if (my__command_name)
+		free(my__command_name);
+	my__command_name = xstrdup(command_name);
+}
+
+void slog_set_sub_command_name(const char *sub_command_name)
+{
+	/*
+	 * Capture the sub-command name even if logging is not enabled
+	 * because we don't know if the config has been loaded yet by
+	 * the cmd_() and/or it may be too early to force a
+	 * lazy load.
+	 */
+	if (my__sub_command_name)
+		free(my__sub_command_name);

Please drop the condition in these two functions; free() handles NULL arguments just fine.

sure.

(Sidenote: what's the deal with these 'my__' prefixes anyway?)

simply a way to identify file-scope variables and distinguish them from local variables.

Jeff
Re: [PATCH v1] config.c: fix msvc compile error
On 7/24/2018 3:56 PM, Taylor Blau wrote: On Tue, Jul 24, 2018 at 03:30:10PM +, g...@jeffhostetler.com wrote: From: Jeff Hostetler In commit fb0dc3bac135e9f6243bd6d293e8c9293c73b9cd code was added to builtin/config.c to define a new function and a forward declaration for an array of unknown size. This causes a compile error under MSVC. Thanks for spending time fixing that. fb0dc3bac1 (builtin/config.c: support `--type=` as preferred alias for `--`, 2018-04-18) is from me, so I owe you an extra thanks for patching up mistakes :-). As others have noted in this thread, another patch was sent into similar effect, which has already been queued, and I agree that we should prefer that, since it's further along. Thanks, Taylor Yes, the other version is further along. Let's take it. Jeff
Re: [PATCH v1] msvc: fix non-standard escape sequence in source
On 7/24/2018 2:13 PM, Beat Bolli wrote: Hi Jeff On 24.07.18 16:42, g...@jeffhostetler.com wrote: From: Jeff Hostetler Replace non-standard "\e" escape sequence with "\x1B". This was already fixed in <20180708144342.11922-4-dev+...@drbeat.li>. Cheers, Beat Thanks for the pointer. I see that commit is in 'next'. I was only looking in 'master'. Thanks, Jeff
Re: Git 2.18: RUNTIME_PREFIX... is it working?
On 7/19/2018 6:26 PM, brian m. carlson wrote: On Wed, Jul 18, 2018 at 02:18:44PM +0200, Johannes Schindelin wrote: On Sat, 14 Jul 2018, brian m. carlson wrote: I will say that at cPanel, we have a configuration where end users can end up inside a mount namespace without /proc (depending on the preferences of the administrator). However, it's easy enough for us to simply build without RUNTIME_PREFIX if necessary. If we turn it on by default, it would be nice if we documented (maybe in the Makefile) that it requires /proc on Linux for the benefit of other people who might be in a similar situation. Is there *really* no other way on Linux to figure out the absolute path of the current executable than to open a pseudo file in the `/proc` file system? Nope, not that I'm aware of. You have to read the destination of the /proc/PID/exe symlink. Getting the full path of the current executable is a very Windows thing. On most Unix-based systems it just isn't possible (even if Linux does have the /proc thing). Think about hard-links, for example. There just isn't a single canonical pathname for an inode. Jeff
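For what it's worth, the /proc dance described above is a one-liner to demonstrate. A sketch (Linux-only, and it fails in exactly the way discussed when /proc isn't mounted in the namespace):

```python
import os

# Resolve the running executable's path by reading the /proc symlink.
# This raises FileNotFoundError inside a mount namespace without /proc,
# which is the cPanel scenario mentioned above.
exe_path = os.readlink("/proc/self/exe")
print(exe_path)
```

And as noted, even this only yields *a* name for the inode, not a canonical one, since hard-links mean several names can refer to the same executable.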
Re: [RFC] push: add documentation on push v2
On 7/18/2018 1:15 PM, Brandon Williams wrote: On 07/18, Stefan Beller wrote: On Wed, Jul 18, 2018 at 6:31 AM Derrick Stolee wrote: On 7/17/2018 7:25 PM, Stefan Beller wrote: On Tue, Jul 17, 2018 at 2:09 PM Brandon Williams wrote: Signed-off-by: Brandon Williams --- Since introducing protocol v2 and enabling fetch I've been thinking about what its inverse 'push' would look like. After talking with a number of people I have a longish list of things that could be done to improve push and I think I've been able to distill the core features we want in push v2. It would be nice to know which things you want to improve. Hopefully we can also get others to chime in with things they don't like about the existing protocol. What pain points exist, and what can we do to improve at the transport layer before considering new functionality? Another thing that I realized last night was the possibility to chunk requests. The web of today is driven by lots of small http(s) requests. I know our server team fights with the internal tools all the time because the communication involved in git-fetch is usually a large http request (large packfile). So it would be nice to have the possibility of chunking the request. But I think that can be added as a capability? (Not sure how) Fetch and push requests/responses are already "chunked" when using the http transport. So I'm not sure what you mean by adding a capability because the protocol doesn't care about which transport you're using. This is of course unless you're talking about a different "chunking" from what it means to chunk an http request/response. Internally, we've talked about wanting to have resumable pushes and fetches. I realize this is difficult to do when the server is replicated and the repeated request might be talking to a different server instance. And there's a problem with temp files littering the server as it waits for the repeated attempt. 
But still, the packfile sent/received can be large and connections do get dropped. That is, if we think about sending 1 large packfile and just using a byte-range-like approach to resuming the transfer.

Suppose instead we allowed the request to send a series of packfiles, with each "chunk" being self-contained and usable. If a push connection was dropped, the server could apply the successfully received packfile(s) (add the received objects and update the refs to the commits received so far), ignore the interrupted and unreceived packfile(s), and let the client retry later. When/if the client retried the push, it would renegotiate haves/wants and send a new series of packfile(s), with the assumption being that the server would have updated refs from the earlier aborted push, so the packfile(s) computed for the second attempt would not repeat the content successfully transmitted in the first attempt.

This would require that the client build an ordered set of packfiles from oldest to newest so that the server can apply them in-order and the graph remain connected. That may be outside your scope here.

Also, we might have to add a few messages to the protocol after the negotiation, for the client to say that it is going to send the push content in 'n' packfiles and send 'n' messages with the intermediate ref values being updated in each packfile.

Just thinking out loud here.

Jeff
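The ordering constraint above can be shown with a toy sketch, assuming a linear history (`chunk_commits` is an invented name for illustration): each chunk may only depend on commits the server already has or that arrived in an earlier chunk, so applying any prefix of the series leaves the server's graph connected.

```python
def chunk_commits(commits_oldest_first, chunk_size):
    # Split the commits to push into self-contained chunks.  For a
    # linear history, oldest-first slicing guarantees every parent of
    # a chunk's commits is either already on the server or was sent
    # in a prior chunk -- so any applied prefix is usable.
    return [commits_oldest_first[i:i + chunk_size]
            for i in range(0, len(commits_oldest_first), chunk_size)]

series = chunk_commits(["c1", "c2", "c3", "c4", "c5"], 2)
print(series)  # each sublist would become one self-contained packfile
# If the connection drops after two chunks, the server has applied
# c1..c4 and the retry only needs to renegotiate and resend c5.
```

For non-linear history the same idea holds, but the split would follow a topological order rather than simple slicing.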
Re: [PATCH v1 00/25] RFC: structured logging
On 7/13/2018 2:51 PM, David Lang wrote: Please make an option for git to write these logs to syslog, not just a local file. Every modern syslog daemon has lots of tools to be able to deal with json messages well. David Lang That is certainly possible and we can easily add it in a later draft, but for now I'd like to stay platform-neutral and just log events to a file. My main goal right now is to get consensus on the basic structured logging framework -- the shape of the SLOG API, the event message format, and etc. Thanks, Jeff
Re: [PATCH v8 2/2] json-writer: t0019: add perl unit test
On 6/7/2018 1:13 PM, Eric Sunshine wrote: On Thu, Jun 7, 2018 at 10:12 AM, wrote: Test json-writer output using perl script. Signed-off-by: Jeff Hostetler --- diff --git a/t/t0019/parse_json.perl b/t/t0019/parse_json.perl @@ -0,0 +1,52 @@

+#!/usr/bin/perl
+use strict;
+use warnings;
+use JSON;

This new script is going to have to be protected by a prerequisite since the JSON module is not part of the standard Perl installation, thus will not necessarily be installed everywhere (it isn't on any of my machines, for instance). Something like:

test_lazy_prereq PERLJSON '
	perl -MJSON -e "exit 0"
'

which would be used like this:

test_expect_success PERLJSON 'parse JSON using Perl' '
	...
'

will do. thanks!

Jeff
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
On 6/8/2018 4:07 PM, René Scharfe wrote: Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com: From: Jeff Hostetler [...]

+	if (jw->first_stack.buf[jw->first_stack.len - 1] == '1')
+		jw->first_stack.buf[jw->first_stack.len - 1] = '0';
+	else
+		strbuf_addch(&jw->json, ',');

You only need a binary flag to indicate if a comma is needed, not a stack. We need a comma at the current level if and only if we already wrote a child item. If we finish a level then we do need a comma at the previous level because we just wrote a sub-object or sub-array there. And that should cover all cases. Am I missing something? I get a sense of déjà vu, by the way. :) [...]

Yes, your way is simpler. I'll update and re-roll. (And yes, we did discuss this earlier. I found a problem with my first version where it wouldn't handle empty sub-items, but you found the missing piece.)

Thanks
Jeff
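The one-flag argument can be demonstrated with a toy writer in Python (illustrative only, not the C json-writer; all names are invented): a single boolean suffices, provided that closing a level marks the parent level as "already has a child".

```python
class JsonWriter:
    def __init__(self):
        self.buf = []
        self.need_comma = False   # one flag, no per-level stack

    def _maybe_comma(self):
        # A comma is needed iff a child was already written at this level.
        if self.need_comma:
            self.buf.append(",")
            self.need_comma = False

    def begin_object(self, key=None):
        self._maybe_comma()
        if key is not None:
            self.buf.append('"%s":' % key)
        self.buf.append("{")          # fresh level: no child yet

    def end_object(self):
        self.buf.append("}")
        self.need_comma = True        # the closed level is a child of its parent

    def pair(self, key, value):
        self._maybe_comma()
        self.buf.append('"%s":%s' % (key, value))
        self.need_comma = True

w = JsonWriter()
w.begin_object()
w.pair("a", 1)
w.begin_object("sub")   # empty sub-object -- the tricky case mentioned above
w.end_object()
w.pair("b", 2)
w.end_object()
print("".join(w.buf))   # -> {"a":1,"sub":{},"b":2}
```

Note the empty sub-object case works: begin_object clears the flag, so "{}" emits no stray comma, and end_object sets it so the following pair gets its comma.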
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
On 6/8/2018 2:05 AM, René Scharfe wrote: Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com: From: Jeff Hostetler Add a series of jw_ routines and "struct json_writer" structure to compose [...]

TEST_PROGRAMS_NEED_X += test-index-version
TEST_PROGRAMS_NEED_X += test-json-writer
TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
TEST_PROGRAMS_NEED_X += test-line-buffer
TEST_PROGRAMS_NEED_X += test-match-trees

This doesn't apply cleanly on master, because most test helpers have been combined into a single binary to reduce their disk footprint and link times. See efd71f8913 (t/helper: add an empty test-tool program) for the overall rationale. test-json-writer could become a built-in as well, I think. You can see e.g. in c932a5ff28 (t/helper: merge test-string-list into test-tool) what needs to be done to convert a helper.

René

You're right, the test helper framework changed since I started this patch series. I was trying to keep the same parent commit as my V1 to make it easier to compare, but that's not working out so well. I'll move it forward to the current master and fix it up.

Thanks,
Jeff
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
On 6/8/2018 4:32 PM, René Scharfe wrote: Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com:

 Makefile                    |   2 +
 json-writer.c               | 419
 json-writer.h               | 113 +
 t/helper/test-json-writer.c | 572
 t/t0019-json-writer.sh      | 236 ++
 5 files changed, 1342 insertions(+)
 create mode 100644 json-writer.c
 create mode 100644 json-writer.h
 create mode 100644 t/helper/test-json-writer.c
 create mode 100755 t/t0019-json-writer.sh

$ git grep 'static inline' '*json*'
json-writer.c:static inline void indent_pretty(struct json_writer *jw)
json-writer.c:static inline void begin(struct json_writer *jw, char ch_open, int pretty)
json-writer.c:static inline void assert_in_object(const struct json_writer *jw, const char *key)
json-writer.c:static inline void assert_in_array(const struct json_writer *jw)
json-writer.c:static inline void maybe_add_comma(struct json_writer *jw)
json-writer.c:static inline void fmt_double(struct json_writer *jw, int precision,
json-writer.c:static inline void object_common(struct json_writer *jw, const char *key)
json-writer.c:static inline void array_common(struct json_writer *jw)
json-writer.c:static inline void assert_is_terminated(const struct json_writer *jw)
t/helper/test-json-writer.c:static inline int scripted(int argc, const char **argv)
t/helper/test-json-writer.c:static inline int my_usage(void)

Do you need all those inline keywords? I'd rather leave the decision about inlining to the compiler and (via optimization flags) the user as much as possible. Not a biggie, but the high frequency of that word made me blink.

René

I was just trying to match the patterns I found elsewhere in the code.

$ grep "static inline" *.c builtin/*.c | wc -l
111

But yeah, it's no big deal. I can remove them.

Jeff
Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format
On 6/7/2018 1:24 PM, Eric Sunshine wrote: On Thu, Jun 7, 2018 at 10:12 AM, wrote: Add a series of jw_ routines and "struct json_writer" structure to compose JSON data. The resulting string data can then be output by commands wanting to support a JSON output format. [...] Signed-off-by: Jeff Hostetler --- diff --git a/t/t0019-json-writer.sh b/t/t0019-json-writer.sh @@ -0,0 +1,236 @@

+test_expect_success 'simple object' '
+	cat >expect <<-\EOF &&
+	{"a":"abc","b":42,"c":3.14,"d":true,"e":false,"f":null}
+	EOF
+	test-json-writer >actual \
+		@object \
+		@object-string a abc \
+		@object-int b 42 \
+		@object-double c 2 3.140 \
+		@object-true d \
+		@object-false e \
+		@object-null f \
+		@end &&
+	test_cmp expect actual
+'

To make it easier on people writing these tests, it might be nice for this to be less noisy by getting rid of "@" and "\". To get rid of "\", the test program could grab its script commands from stdin (one instruction per line) rather than from argv[]. For instance:

test-json-writer >actual <<-\EOF &&
object
object-string a abc
...
end
EOF

Not a big deal, and certainly not worth a re-roll.

I hadn't thought about doing it that way. Might be a little easier to use. Let me take a look and see if it would be much work to switch.

Thanks
Jeff
Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test
On 6/6/2018 5:03 PM, Jeff King wrote: On Wed, Jun 06, 2018 at 01:10:52PM -0400, Todd Zullinger wrote: g...@jeffhostetler.com wrote: +# As a sanity check, ask Python to parse our generated JSON. Let Python +# recursively dump the resulting dictionary in sorted order. Confirm that +# that matches our expectations. +test_expect_success PYTHON 'parse JSON using Python' ' [...] + python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual && Would this be better using $PYTHON_PATH rather than hard-coding python as the command? Probably. We may want to go the same route as we did for perl in a0e0ec9f7d (t: provide a perl() function which uses $PERL_PATH, 2013-10-28) so that test writers don't have to remember this. That said, I wonder if it would be hard to simply do the python bits here in perl. This is the first use of python in our test scripts (and really the only user in the whole code base outside of a few fringe commands). Leaving aside any perl vs python flame-war, I think there's value in keeping the number of languages limited when there's not a compelling reason to do otherwise. Perl works too. (I don't know perl, but I'm told it does work for stuff like this. :-) I'll take a stab at it. Jeff
Re: [PATCH v7 2/2] json-writer: t0019: add Python unit test
On 6/6/2018 1:10 PM, Todd Zullinger wrote: g...@jeffhostetler.com wrote: +# As a sanity check, ask Python to parse our generated JSON. Let Python +# recursively dump the resulting dictionary in sorted order. Confirm that +# that matches our expectations. +test_expect_success PYTHON 'parse JSON using Python' ' [...] + python "$TEST_DIRECTORY"/t0019/parse_json_1.py actual && Would this be better using $PYTHON_PATH rather than hard-coding python as the command? good catch! thanks Jeff
Re: [PATCH v4] json_writer: new routines to create data in JSON format
On 6/2/2018 1:13 AM, Jeff King wrote: On Sat, Jun 02, 2018 at 06:41:06AM +0200, Duy Nguyen wrote: On Mon, Mar 26, 2018 at 4:31 PM, wrote:

+static inline void assert_in_array(const struct json_writer *jw)
+{
+	if (!jw->open_stack.len)
+		die("json-writer: array: missing jw_array_begin()");

When you reroll, please consider marking all these strings for translation with _(), unless these are for machine consumption.

Actually, these are all basically asserts, I think. So ideally they'd be BUG()s and not translated.

Yes, these are basically asserts. I'll convert them to BUG()s and send up another version this week.

Thanks
Jeff
Re: [PATCH v2 1/5] core.aheadbehind: add new config setting
On 4/3/2018 7:39 AM, Derrick Stolee wrote: On 4/3/2018 6:18 AM, Ævar Arnfjörð Bjarmason wrote: On Tue, Apr 03 2018, Lars Schneider wrote: What is the state of this series? I can't find it in git/git nor in git-for-windows/git. I think Stolee mentioned the config in his Git Merge talk [1] and I was about to test it/roll it out :-) It's in the gvfs branch of g...@github.com:Microsoft/git.git, i.e. it's not in Git for Windows, but used in Microsoft's own in-house version used for Windows.git. Thanks for adding me to CC. I mentioned it in my talk because that was one thing we shipped internally as a "quick fix" until we could do the right thing. If I remember correctly, Jeff abandoned shipping this upstream because it did have the feel of a hack and we wanted to see if users used the config setting or really cared about the output values. We saw fast adoption of the feature and even turned the config setting on automatically in the following version of GVFS. I may be misunderstanding this feature, but my impression was that it was a kludge as a workaround until the commit graph code landed, because once we have that then surely we can just cheaply report the actual (or approximate?) number in the common case, but of course it may still be slow if your commit graph file is out of date. Right, the only thing in master are the changes to take the new command line option and to alter the output of status. We did not reach consensus on the need for the config setting and/or whether it should be in "core." or "status." or another namespace and/or how it should work. And yes, it was also seen as a hack (just turn it off) until the client-side commit graph was ready (at least for interactive use). Because there are callers that don't need the answer (regardless of whether it is cheap to compute) and so the explicit command line arg limitation is sufficient for them. 
This part is in upstream master:

commit 4094e47fd2c49fcdbd0152d20ed4d610d72680d7
Merge: c710d182ea f39a757dd9
Author: Junio C Hamano <gits...@pobox.com>
Date:   Thu Mar 8 12:36:24 2018 -0800

    Merge branch 'jh/status-no-ahead-behind'

These parts are in the 'gvfs' branch in the g...@github.com:Microsoft/git.git repo:

commit 039f65946968fa654a9c3bca27a4f4e93c1c9381
Author: Jeff Hostetler <jeffh...@microsoft.com>
Date:   Wed Jan 10 13:50:24 2018 -0500

    status: add warning when a/b calculation takes too long for long/normal format

commit 0d6756f06d0ad6f1fdc8dba0ead7911e411c9704
Author: Jeff Hostetler <jeffh...@microsoft.com>
Date:   Mon Feb 5 09:44:04 2018 -0500

    status: ignore status.aheadbehind in porcelain formats

    Teach porcelain V[12] formats to ignore the status.aheadbehind
    config setting. They only respect the --[no-]ahead-behind command
    line argument. This is for backwards compatibility with existing
    scripts.

commit 0dd122d6cd43106a5928587d768a7381cfe9e7a3
Author: Jeff Hostetler <jeffh...@microsoft.com>
Date:   Tue Jan 9 14:16:07 2018 -0500

    status: add status.aheadbehind setting

    Add "status.aheadbehind" config setting to change the default
    behavior of ALL git status formats.

Hope this helps,
Jeff
Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)
On 3/30/2018 4:38 PM, Junio C Hamano wrote: * jh/json-writer (2018-03-28) 1 commit - json_writer: new routines to create data in JSON format Is this ready for 'next'? Yes, I believe it is. This has the V6 fixup for the HEREDOC with leading whitespace, so I think we're good. Thanks Jeff
Re: [RFC/PATCH] upload-pack: disable object filtering when disabled by config
On 3/28/2018 6:12 PM, Junio C Hamano wrote: Jonathan Nieder writes: When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12, 2017-12-08), it was guarded by the uploadpack.allowFilter config item to allow server operators to control when they start supporting it. That config item didn't go far enough, though: it controls whether the 'filter' capability is advertised, but if a (custom) client ignores the capability advertisement and passes a filter specification anyway, the server would handle that despite allowFilter being false.

This is particularly significant if a security bug is discovered in this new experimental partial clone code. Installations without uploadpack.allowFilter ought not to be affected since they don't intend to support partial clone, but they would be swept up into being vulnerable. Simplify and limit the attack surface by making uploadpack.allowFilter disable the feature, not just the advertisement of it.

NEEDSWORK: tests

Signed-off-by: Jonathan Nieder
---
Noticed while reviewing the corresponding JGit code. If this change seems like a good idea, I can add tests and re-send it for real.

Yup. The names of the static variables tell almost the whole story to convince me why this is a good change ;-). It is not about advertising a feature alone, but whether the feature is actually enabled, so advertisement and handling of the feature should be either both enabled or disabled. Thanks.

I agree. Thanks.
Re: [PATCH v4] json_writer: new routines to create data in JSON format
On 3/27/2018 11:45 AM, Ramsay Jones wrote: On 27/03/18 04:18, Ramsay Jones wrote: On 26/03/18 15:31, g...@jeffhostetler.com wrote: From: Jeff Hostetler <jeffh...@microsoft.com> [snip] Thanks, this version fixes all issues I had (with the compilation and sparse warnings). [Was using UINT64_C(0x) a problem on windows?]

BTW, I forgot to mention that you had some whitespace problems with this patch, viz:

I ran "make sparse" on this and it did not complain about any of this. What command do you run to get these messages? I know they appear as red in diffs (and I usually double check that), but I had not seen a command to complain about them like this.

$ git log --oneline -1 --check master-json
ab643d838 (master-json) json_writer: new routines to create data in JSON format
t/helper/test-json-writer.c:280: trailing whitespace.
+ */
t/t0019-json-writer.sh:179: indent with spaces.
+        "g": 0,
t/t0019-json-writer.sh:180: indent with spaces.
+        "h": 1
$

the leading spaces are required in this case. the pretty json output contains 8 spaces for that sub-structure, not a tab. is there a preferred way to denote this in the test script?

Jeff
Re: Fwd: New Defects reported by Coverity Scan for git
On 3/26/2018 7:39 PM, Stefan Beller wrote: coverity scan failed for the last couple of months (since Nov 20th) without me noticing, I plan on running it again nightly for the Git project. Anyway, here are issues that piled up (in origin/pu) since then. Stefan -- Forwarded message -- [...] *** CID 1433539: Null pointer dereferences (FORWARD_NULL) /t/helper/test-json-writer.c: 278 in scripted() 272 struct json_writer jw = JSON_WRITER_INIT; 273 int k; 274 275 if (!strcmp(argv[0], "@object")) 276 jw_object_begin(&jw); 277 else if (!strcmp(argv[0], "@array")) CID 1433539: Null pointer dereferences (FORWARD_NULL) Passing "&jw" to "jw_array_begin", which dereferences null "jw.levels". 278 jw_array_begin(&jw); 279 else 280 die("first script term must be '@object' or '@array': '%s'", argv[0]); 281 282 for (k = 1; k < argc; k++) { 283 const char *a_k = argv[k]; ** CID 1433538: Null pointer dereferences (FORWARD_NULL) The "jw.levels" field has been removed in the json-writer V4 reroll, so this isn't an issue going forward. Thanks, Jeff
Re: [PATCH v4] json_writer: new routines to create data in JSON format
On 3/26/2018 11:18 PM, Ramsay Jones wrote: On 26/03/18 15:31, g...@jeffhostetler.com wrote: From: Jeff Hostetler <jeffh...@microsoft.com> [snip] Thanks, this version fixes all issues I had (with the compilation and sparse warnings). [Was using UINT64_C(0x) a problem on windows?] Thanks for the confirmation. I was building on Linux. I haven't tried using UINT64_C() for anything, but I'll keep that in mind next time. Thanks, Jeff
Re: [RFC PATCH 1/1] json-writer: incorrect format specifier
On 3/26/2018 11:26 PM, Ramsay Jones wrote: On 26/03/18 18:04, Junio C Hamano wrote: Ramsay Jones writes: [...] I must confess to not having given any thought to the wider implications of the code. I don't really know what this code is going to be used for. [Although I did shudder when I read some mention of a 'universal interchange format' - I still have nightmares about XML :-D ] [...] My current goals are to add telemetry in a friendly way and have events written in JSON to some audit destination. Something like: { "argv":["./git","status"], "pid":84941, "exit-code":0, "elapsed-time":0.011121, "version":"2.16.2.5.g71445db.dirty", ... } Later, we could add a JSON formatter to a command like "status" and then do things like: $ git status --json | python '... json.load ...' and eliminate the need to write custom parsers for normal or porcelain formats. There are other commands that could be similarly adapted and save callers a lot of screen-scraping code. But that is later. Thanks, Jeff
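For illustration, a hypothetical standalone sketch of formatting one such telemetry event. The field names mirror the example in the mail, but the `format_event` helper and its snprintf-based formatting are inventions for this sketch, not anything in the series; the real code would presumably build the object through the json-writer API (and would escape the command string properly).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a minimal telemetry event as a JSON
 * object.  Field names follow the example event from the mail.
 * Note: cmd is not JSON-escaped here; a real implementation would
 * route it through a proper quoting routine. */
static int format_event(char *buf, size_t len, const char *cmd,
			int pid, int exit_code, double elapsed)
{
	return snprintf(buf, len,
			"{\"argv\":[\"git\",\"%s\"],\"pid\":%d,"
			"\"exit-code\":%d,\"elapsed-time\":%f}",
			cmd, pid, exit_code, elapsed);
}
```

Calling `format_event(buf, sizeof buf, "status", 84941, 0, 0.011121)` produces a single-line object analogous to the (abbreviated) example above.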
Re: [RFC PATCH v5 0/8] rebase-interactive
On 3/27/2018 1:07 AM, Junio C Hamano wrote: Jeff Hostetler <g...@jeffhostetler.com> writes: [...] So I would think it is most sensible to have double, uintmax_t and intmax_t variants. If you do not care about the extra value range that unsigned integral types afford, a single intmax_t variant would also be fine. I'll reroll with just the double and intmax_t variants. Thanks for the feedback and sorry for all the noise. Jeff
Re: Including object type and size in object id (Re: Git Merge contributor summit notes)
On 3/26/2018 5:00 PM, Jonathan Nieder wrote: Jeff Hostetler wrote: [long quote snipped] While we are converting to a new hash function, it would be nice if we could add a couple of fields to the end of the OID: the object type and the raw uncompressed object size. It would be nice if we could extend the OID to include 6 bytes of data (4 or 8 bits for the type and the rest for the raw object size), and just say that an OID is a {hash,type,size} tuple. There are lots of places where we open an object to see what type it is or how big it is. This requires uncompressing/undeltafying the object (or at least decoding enough to get the header). In the case of missing objects (partial clone or a gvfs-like projection) it requires either dynamically fetching the object or asking an object-size-server for the data. All of these cases could be eliminated if the type/size were available in the OID. This implies a limit on the object size (e.g. 5 bytes in your example). What happens when someone wants to encode an object larger than that limit? I could say add a full uint64 to the tail end of the hash, but we currently don't handle blobs/objects larger than 4GB right now anyway, right? 5 bytes for the size is just a compromise -- 1TB blobs would be terrible to think about... This also decreases the number of bits available for the hash, but that shouldn't be a big issue. I was suggesting extending the OIDs by 6 bytes while we are changing the hash function. Aside from those two, I don't see any downsides. It would mean that tree objects contain information about the sizes of blobs contained there, which helps with virtual file systems. It's also possible to do that without putting the size in the object id, but maybe having it in the object id is simpler. Will think more about this. Thanks for the idea, Jonathan Thanks Jeff
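As a toy illustration of the proposed {hash,type,size} tuple, here is one possible encoding of the 6 trailing bytes: 4 bits of object type and 44 bits of raw size. This is just one reading of the "4 or 8 bits for the type and the rest for the raw object size" suggestion; nothing in git implements this, and the names are made up for the sketch.

```c
#include <stdint.h>

/* Hypothetical OID extension: 6 extra bytes after the hash, holding
 * a 4-bit object type and a 44-bit uncompressed object size, stored
 * big-endian.  Objects larger than OID_EXT_MAX_SIZE (~16TB) could
 * not be represented -- the size-limit tradeoff discussed above. */
#define OID_EXT_SIZE_BITS 44
#define OID_EXT_MAX_SIZE ((UINT64_C(1) << OID_EXT_SIZE_BITS) - 1)

static void oid_ext_encode(unsigned char ext[6], unsigned type, uint64_t size)
{
	uint64_t v = ((uint64_t)(type & 0xf) << OID_EXT_SIZE_BITS) |
		     (size & OID_EXT_MAX_SIZE);
	for (int i = 5; i >= 0; i--) {
		ext[i] = v & 0xff;	/* least significant byte last */
		v >>= 8;
	}
}

static unsigned oid_ext_type(const unsigned char ext[6])
{
	return ext[0] >> 4;	/* top 4 bits of the first byte */
}

static uint64_t oid_ext_size(const unsigned char ext[6])
{
	uint64_t v = 0;
	for (int i = 0; i < 6; i++)
		v = (v << 8) | ext[i];
	return v & OID_EXT_MAX_SIZE;
}
```

With this layout, answering "what type/size is this object?" is a couple of bit operations on the OID itself, instead of inflating the object header or asking a server.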
Re: Git Merge contributor summit notes
On 3/26/2018 1:56 PM, Stefan Beller wrote: On Mon, Mar 26, 2018 at 10:33 AM Jeff Hostetler <g...@jeffhostetler.com> wrote: On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote: On Sat, Mar 10 2018, Alex Vandiver wrote: New hash (Stefan, etc) -- - discussed on the mailing list - actual plan checked in to Documentation/technical/hash-function-transition.txt - lots of work renaming - any actual work with the transition plan? - local conversion first; fetch/push have translation table - like git-svn - also modified pack and index format to have lookup/translation efficiently - brian's series to eliminate SHA1 strings from the codebase - testsuite is not working well because hardcoded SHA1 values - flip a bit in the sha1 computation and see what breaks in the testsuite - will also need a way to do the conversion itself; traverse and write out new version - without that, can start new repos, but not work on old ones - on-disk formats will need to change -- something to keep in mind with new index work - documentation describes packfile and index formats - what time frame are we talking? - public perception question - signing commits doesn't help (just signs commit object) unless you "recursive sign" - switched to SHA1dc; we detect and reject known collision technique - do it now because it takes too long if we start when the collision drops - always call it "new hash" to reduce bikeshedding - is translation table a backdoor? has it been reviewed by crypto folks? - no, but everything gets translated - meant to avoid a flag day for entire repositories - linus can decide to upgrade to newhash; if pushes to server that is not newhash aware, that's fine - will need a wire protocol change - v2 might add a capability for newhash - "now that you mention md5, it's a good idea" - can use md5 to test the conversion - is there a technical reason for why not /n/ hashes? 
- the slow step goes away as people converge to the new hash - beneficial to make up some fake hash function for testing - is there a plan on how we decide which hash function? - trust junio to merge commits when appropriate - conservancy committee explicitly does not make code decisions - waiting will just give better data - some hash functions are in silicon (e.g. microsoft cares) - any movement in libgit2 / jgit? - basic stuff for libgit2; same testsuite problems - no work in jgit - most optimistic forecast? - could be done in 1-2y - submodules with one hash function? - unable to convert project unless all submodules are converted - OO-ing is not a prereq Late reply, but one thing I brought up at the time is that we'll want to keep this code around even after the NewHash migration at least for testing purposes, should we ever need to move to NewNewHash. It occurred to me recently that once we have such a layer it could be (ab)used with some relatively minor changes to do any arbitrary local-to-remote object content translation, unless I've missed something (but I just re-read hash-function-transition.txt now...). E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a remote server so that you upload a GPG encrypted version of all your blobs, and have your trees reference those blobs. Because we'd be doing arbitrary translations for all of commits/trees/blobs this could go further than other bolted-on encryption solutions for Git. E.g. paths in trees could be encrypted too, as well as all the content of the commit object that isn't parent info & the like (but that would have different hashes). Basically clean/smudge filters on steroids, but for every object in the repo. Anyone who got a hold of it would still see the shape of the repo & approximate content size, but other than that it wouldn't be more info than they'd get via `fast-export --anonymize` now. 
I mainly find it interesting because it presents an intersection between a feature we might want to offer anyway, and something that would stress the hash transition codepath going forward, to make sure it hasn't all bitrotted by the time we'll need NewHash->NewNewHash. Git hosting providers would hate it, but they should probably be charging users by how much Michael Haggerty's git-sizer tool hates their repo anyway :) While we are converting to a new hash function, it would be nice if we could add a couple of fields to the end of the OID: the object type and the raw uncompressed object size. This would allow one to craft invalid OIDs, i.e. the correct hash value with the wrong object type. (This is a different kind of "invalid" compared to today, where we either have or do not have the object named by the hash value. If we don't have it, it may be just unknown to us, but not "wrong".) An invalid OID (such as a wrong ob
Re: [RFC PATCH v5 0/8] rebase-interactive
On 3/26/2018 2:00 PM, Junio C Hamano wrote: Jeff Hostetler <g...@jeffhostetler.com> writes: I am concerned that the above compiler error message says that uintmax_t is defined as an "unsigned long" (which is defined as *at least* 32 bits, but not necessarily 64). But a uint64_t is defined as an "unsigned long long" and guaranteed as a 64 bit value. On a platform whose uintmax_t is u32, is it realistic to expect that we would be able to use u64, even if we explicitly ask for it, in the first place? In other words, on a platform that handles uint64_t, I would expect uintmax_t to be wide enough to hold a uint64_t value without truncation. I was just going by what the reported compiler error message was. It said that "unsigned long" didn't match the uint64_t variable. And that made me nervous. If all of the platforms we build on define uintmax_t >= 64 bits, then it doesn't matter. If we do have a platform where uintmax_t is u32, then we'll have a lot more breakage than in just the new function I added. Thanks, Jeff
Re: [RFC PATCH v5 0/8] rebase-interactive
On 3/26/2018 1:57 PM, Junio C Hamano wrote: Jeff Hostetler <g...@jeffhostetler.com> writes: I defined that routine to take a uint64_t because I wanted to pass a nanosecond value received from getnanotime() and that's what it returns. Hmph, but the target format does not have different representation of inttypes in different sizes, no? I personally doubt that we would benefit from having a group of functions (i.e. format_int{8,16,32,64}_to_json()) that callers have to choose from, depending on the exact size of the integer they want to serialize. The de-serializing side would be the same story. Even if the variable a potential caller of the formatter passes is of a sized type that is different from uintmax_t, the caller shouldn't have to add an extra cast. Am I missing some obvious merit for having these separate functions for explicit sizes? I did the uint64_t for the unsigned ns times. I did the other one for the usual signed ints. I could convert them both to a single signed 64 bit typed function if we only want to have one function. Jeff
Re: [RFC PATCH 1/1] json-writer: incorrect format specifier
On 3/26/2018 1:04 PM, Junio C Hamano wrote: Ramsay Jones writes: @@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char *key, uint64_t value) maybe_add_comma(jw); append_quoted_string(&jw->json, key); - strbuf_addf(&jw->json, ":%"PRIuMAX, value); + strbuf_addf(&jw->json, ":%"PRIu64, value); In this code-base, that would normally be written as: strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t) value); heh, I should learn not to reply in a hurry, just before going out ... I had not noticed that 'value' was declared with a 'sized type' of uint64_t, so using PRIu64 should be fine. But why is this codepath using a sized type in the first place? It is not like it wants to read/write a fixed binary file format---it just wants to use an integer type that is wide enough to handle any inttype the platform uses, for which uintmax_t would be a more appropriate type, no? [Somehow the conversation forked and this compiler warning appeared in both the json-writer and the rebase-interactive threads. I'm copying here the response that I already made on the latter.] I defined that routine to take a uint64_t because I wanted to pass a nanosecond value received from getnanotime() and that's what it returns. My preference would be to change the PRIuMAX to PRIu64, but there aren't any other references in the code to that symbol and I didn't want to start a new trend here. I am concerned that the above compiler error message says that uintmax_t is defined as an "unsigned long" (which is defined as *at least* 32 bits, but not necessarily 64). But a uint64_t is defined as an "unsigned long long" and guaranteed as a 64 bit value. So while I'm not really worried about 128 bit integers right now, I'm more concerned about 32 bit compilers truncating that value without any warnings. Jeff
Re: [RFC PATCH v2 1/1] json-writer: add cast to uintmax_t
On 3/24/2018 2:38 PM, Wink Saville wrote: Correct a compile error on Mac OSX by adding a cast to uintmax_t in calls to strbuf_addf. Helped-by: Ramsay Jones Tested-by: travis-ci Signed-off-by: Wink Saville --- json-writer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/json-writer.c b/json-writer.c index 89a6abb57..1f40482ff 100644 --- a/json-writer.c +++ b/json-writer.c @@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char *key, uint64_t value) maybe_add_comma(jw); append_quoted_string(&jw->json, key); - strbuf_addf(&jw->json, ":%"PRIuMAX, value); + strbuf_addf(&jw->json, ":%"PRIuMAX, (uintmax_t)value); } void jw_object_double(struct json_writer *jw, const char *fmt, @@ -225,7 +225,7 @@ void jw_array_uint64(struct json_writer *jw, uint64_t value) assert_in_array(jw); maybe_add_comma(jw); - strbuf_addf(&jw->json, "%"PRIuMAX, value); + strbuf_addf(&jw->json, "%"PRIuMAX, (uintmax_t)value); } void jw_array_double(struct json_writer *jw, const char *fmt, double value) FYI. I included and squashed this change into V4 of my json-writer series. Jeff
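The idiom in this patch can be shown standalone. C99 guarantees uintmax_t is at least as wide as every other unsigned integer type, so the cast is lossless, and it makes the argument's type match the `PRIuMAX` conversion on every platform (including ones where `uintmax_t` is `unsigned long` while `uint64_t` is `unsigned long long`, as in the Mac OS X warning quoted below). The `format_u64` wrapper here is just for demonstration.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Portable printing of a uint64_t via PRIuMAX: the explicit cast to
 * uintmax_t guarantees the vararg's type matches the format string
 * regardless of how the platform defines the two types. */
static int format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)value);
}
```

The alternative discussed in the thread is to use `PRIu64` directly with no cast; that is also correct, but the codebase's existing convention favors `PRIuMAX` plus a cast.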
Re: Git Merge contributor summit notes
On 3/25/2018 6:58 PM, Ævar Arnfjörð Bjarmason wrote: On Sat, Mar 10 2018, Alex Vandiver wrote: New hash (Stefan, etc) -- - discussed on the mailing list - actual plan checked in to Documentation/technical/hash-function-transition.txt - lots of work renaming - any actual work with the transition plan? - local conversion first; fetch/push have translation table - like git-svn - also modified pack and index format to have lookup/translation efficiently - brian's series to eliminate SHA1 strings from the codebase - testsuite is not working well because hardcoded SHA1 values - flip a bit in the sha1 computation and see what breaks in the testsuite - will also need a way to do the conversion itself; traverse and write out new version - without that, can start new repos, but not work on old ones - on-disk formats will need to change -- something to keep in mind with new index work - documentation describes packfile and index formats - what time frame are we talking? - public perception question - signing commits doesn't help (just signs commit object) unless you "recursive sign" - switched to SHA1dc; we detect and reject known collision technique - do it now because it takes too long if we start when the collision drops - always call it "new hash" to reduce bikeshedding - is translation table a backdoor? has it been reviewed by crypto folks? - no, but everything gets translated - meant to avoid a flag day for entire repositories - linus can decide to upgrade to newhash; if pushes to server that is not newhash aware, that's fine - will need a wire protocol change - v2 might add a capability for newhash - "now that you mention md5, it's a good idea" - can use md5 to test the conversion - is there a technical reason for why not /n/ hashes? - the slow step goes away as people converge to the new hash - beneficial to make up some fake hash function for testing - is there a plan on how we decide which hash function? 
- trust junio to merge commits when appropriate - conservancy committee explicitly does not make code decisions - waiting will just give better data - some hash functions are in silicon (e.g. microsoft cares) - any movement in libgit2 / jgit? - basic stuff for libgit2; same testsuite problems - no work in jgit - most optimistic forecast? - could be done in 1-2y - submodules with one hash function? - unable to convert project unless all submodules are converted - OO-ing is not a prereq Late reply, but one thing I brought up at the time is that we'll want to keep this code around even after the NewHash migration at least for testing purposes, should we ever need to move to NewNewHash. It occurred to me recently that once we have such a layer it could be (ab)used with some relatively minor changes to do any arbitrary local-to-remote object content translation, unless I've missed something (but I just re-read hash-function-transition.txt now...). E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a remote server so that you upload a GPG encrypted version of all your blobs, and have your trees reference those blobs. Because we'd be doing arbitrary translations for all of commits/trees/blobs this could go further than other bolted-on encryption solutions for Git. E.g. paths in trees could be encrypted too, as well as all the content of the commit object that isn't parent info & the like (but that would have different hashes). Basically clean/smudge filters on steroids, but for every object in the repo. Anyone who got a hold of it would still see the shape of the repo & approximate content size, but other than that it wouldn't be more info than they'd get via `fast-export --anonymize` now. I mainly find it interesting because it presents an intersection between a feature we might want to offer anyway, and something that would stress the hash transition codepath going forward, to make sure it hasn't all bitrotted by the time we'll need NewHash->NewNewHash. 
Git hosting providers would hate it, but they should probably be charging users by how much Michael Haggerty's git-sizer tool hates their repo anyway :) While we are converting to a new hash function, it would be nice if we could add a couple of fields to the end of the OID: the object type and the raw uncompressed object size. It would be nice if we could extend the OID to include 6 bytes of data (4 or 8 bits for the type and the rest for the raw object size), and just say that an OID is a {hash,type,size} tuple. There are lots of places where we open an object to see what type it is or how big it is. This requires uncompressing/undeltafying the object (or at least decoding enough to get the header). In the case of missing objects (partial clone or a gvfs-like projection) it requires either dynamically fetching the object or asking an object-size-server for the data. All of these cases could be
Re: [RFC PATCH v5 0/8] rebase-interactive
On 3/26/2018 11:56 AM, Junio C Hamano wrote: Wink Saville writes: json-writer.c:123:38: error: format specifies type 'uintmax_t' (aka 'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat] strbuf_addf(&jw->json, ":%"PRIuMAX, value); ~~ ^ json-writer.c:228:37: error: format specifies type 'uintmax_t' (aka 'unsigned long') but the argument has type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat] strbuf_addf(&jw->json, "%"PRIuMAX, value); ~~ ^ 2 errors generated. make: *** [json-writer.o] Error 1 make: *** Waiting for unfinished jobs For whatever reason, our codebase seems to shy away from PRIu64, even though there are liberal uses of PRIu32. Showing the value casted to uintmax_t with PRIuMAX seems to be our preferred way to say "We cannot say how wide this type is on different platforms, and are playing safe by using widest-possible int type" (e.g. showing a pid_t value from daemon.c). In this codepath, the actual values are specified to be uint64_t, so the use of PRIu64 may be OK, but I have to wonder why the codepath is not dealing with uintmax_t in the first place. When even larger than present archs are prevalent in N years and 64-bit starts to feel a tad small (like we feel for 16-bit ints these days), it will feel a bit silly to have a subsystem that is limited to such a "fixed and a tad small these days" type and pretend it to be a generic serializer, I suspect. I defined that routine to take a uint64_t because I wanted to pass a nanosecond value received from getnanotime() and that's what it returns. My preference would be to change the PRIuMAX to PRIu64, but there aren't any other references in the code to that symbol and I didn't want to start a new trend here. I am concerned that the above compiler error message says that uintmax_t is defined as an "unsigned long" (which is defined as *at least* 32 bits, but not necessarily 64). 
But a uint64_t is defined as an "unsigned long long" and guaranteed as a 64 bit value. So while I'm not really worried about 128 bit integers right now, I'm more concerned about 32 bit compilers truncating that value without any warnings. Jeff
Re: [RFC PATCH 1/1] json-writer: incorrect format specifier
On 3/24/2018 1:37 AM, Wink Saville wrote: In routines jw_object_uint64 and jw_object_double strbuf_addf is invoked with strbuf_addf(&jw->json, ":%"PRIuMAX, value) where value is a uint64_t. This causes a compile error on OSX. The correct format specifier is PRIu64 instead of PRIuMAX. Signed-off-by: Wink Saville That's odd. A grep on the Git source tree did not find a "PRIu64" symbol. Searching public-inbox only found one message [1] talking about it (other than the ones associated with your messages here). I have to wonder if that is defined in an OSX header file and you're getting it from there [2]. (I don't have a Mac in front of me, so I can't verify what's in that header.) But [2] defines PRIuMAX as PRIu64, so we shouldn't need to make that change in json-writer -- unless something is getting lost in the #ifdefs. Could you double check this in the header files on your system? Any chance you are doing a 32-bit build? Thanks Jeff [1] https://public-inbox.org/git/mwhpr21mb0478181ae0b64901da2c07cdf4...@mwhpr21mb0478.namprd21.prod.outlook.com/raw [2] https://opensource.apple.com/source/gcc/gcc-926/inttypes.h.auto.html --- json-writer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/json-writer.c b/json-writer.c index 89a6abb57..04045448a 100644 --- a/json-writer.c +++ b/json-writer.c @@ -120,7 +120,7 @@ void jw_object_uint64(struct json_writer *jw, const char *key, uint64_t value) maybe_add_comma(jw); append_quoted_string(&jw->json, key); - strbuf_addf(&jw->json, ":%"PRIuMAX, value); + strbuf_addf(&jw->json, ":%"PRIu64, value); } void jw_object_double(struct json_writer *jw, const char *fmt, @@ -225,7 +225,7 @@ void jw_array_uint64(struct json_writer *jw, uint64_t value) assert_in_array(jw); maybe_add_comma(jw); - strbuf_addf(&jw->json, "%"PRIuMAX, value); + strbuf_addf(&jw->json, "%"PRIu64, value); } void jw_array_double(struct json_writer *jw, const char *fmt, double value)
Re: [PATCH v2] json_writer: new routines to create data in JSON format
On 3/23/2018 4:11 PM, René Scharfe wrote: Am 23.03.2018 um 20:55 schrieb Jeff Hostetler: +struct json_writer_level +{ + unsigned level_is_array : 1; + unsigned level_is_empty : 1; +}; + +struct json_writer +{ + struct json_writer_level *levels; + int nr, alloc; + struct strbuf json; +}; A simpler and probably more compact representation of is_array would be a strbuf with one char per level, e.g. '[' for an array and '{' for an object (or ']' and '}'). I don't understand the need to track emptiness per level. Only the top level array/object can ever be empty, can it? My expectation was that any sub-object or sub-array could be empty. That is, this should be valid (and the JSON parser in Python allows): {"a":{}, "b":[], "c":[[]], "d":[{}]} Sure, but the emptiness of finished arrays and objects doesn't matter for the purposes of error checking, comma setting or closing. At most one of them is empty *and* unclosed while writing the overall JSON object -- the last one opened: { {"a":{ {"a":{}, "b":[ {"a":{}, "b":[], "c":[ {"a":{}, "b":[], "c":[[ {"a":{}, "b":[], "c":[[]], "d":[ {"a":{}, "b":[], "c":[[]], "d":[{ Any of the earlier written arrays/objects are either closed or contain at least a half-done sub-array/object, which makes them non-empty. René good point. i'll revisit. thanks. Jeff
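René's suggestion can be sketched as a standalone toy: one character per open container instead of a struct per level, plus a single flag for whether the innermost container (the only one that can still be empty) has seen a member. The names and the fixed-size array are made up for this sketch; the real code uses a strbuf and the json_writer struct.

```c
/* Toy container-level tracker, per the review suggestion above:
 * record '{' or '[' per open level, and track emptiness only for
 * the innermost unclosed container.  No overflow checks -- this is
 * an illustration, not the json-writer implementation. */
struct levels {
	char open[64];		/* '{' or '[' for each open container */
	int nr;
	int innermost_empty;	/* has the innermost level no members yet? */
};

static void lv_begin(struct levels *lv, char ch)
{
	lv->open[lv->nr++] = ch;
	lv->innermost_empty = 1;	/* a freshly opened container is empty */
}

static int lv_need_comma(struct levels *lv)
{
	int need = !lv->innermost_empty;
	lv->innermost_empty = 0;	/* the next member makes it non-empty */
	return need;
}

static char lv_end(struct levels *lv)
{
	char ch = lv->open[--lv->nr];
	lv->innermost_empty = 0;	/* the parent now holds this container */
	return ch == '{' ? '}' : ']';
}
```

This captures the observation in the mail: once a sub-container is closed (or contains anything), its parent is non-empty, so a per-level emptiness bit is redundant.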
Re: [PATCH v2] json_writer: new routines to create data in JSON format
On 3/23/2018 2:01 PM, René Scharfe wrote: Am 21.03.2018 um 20:28 schrieb g...@jeffhostetler.com: From: Jeff Hostetler <jeffh...@microsoft.com> Add basic routines to generate data in JSON format. Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com> --- Makefile| 2 + json-writer.c | 321 + json-writer.h | 86 + t/helper/test-json-writer.c | 420 t/t0019-json-writer.sh | 102 +++ 5 files changed, 931 insertions(+) create mode 100644 json-writer.c create mode 100644 json-writer.h create mode 100644 t/helper/test-json-writer.c create mode 100755 t/t0019-json-writer.sh diff --git a/Makefile b/Makefile index 1a9b23b..57f58e6 100644 --- a/Makefile +++ b/Makefile @@ -662,6 +662,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh TEST_PROGRAMS_NEED_X += test-genrandom TEST_PROGRAMS_NEED_X += test-hashmap TEST_PROGRAMS_NEED_X += test-index-version +TEST_PROGRAMS_NEED_X += test-json-writer TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash TEST_PROGRAMS_NEED_X += test-line-buffer TEST_PROGRAMS_NEED_X += test-match-trees @@ -815,6 +816,7 @@ LIB_OBJS += hashmap.o LIB_OBJS += help.o LIB_OBJS += hex.o LIB_OBJS += ident.o +LIB_OBJS += json-writer.o LIB_OBJS += kwset.o LIB_OBJS += levenshtein.o LIB_OBJS += line-log.o diff --git a/json-writer.c b/json-writer.c new file mode 100644 index 000..89a6abb --- /dev/null +++ b/json-writer.c @@ -0,0 +1,321 @@ +#include "cache.h" +#include "json-writer.h" + +static char g_ch_open[2] = { '{', '[' }; +static char g_ch_close[2] = { '}', ']' }; + +/* + * Append JSON-quoted version of the given string to 'out'. 
+ */ +static void append_quoted_string(struct strbuf *out, const char *in) +{ + strbuf_addch(out, '"'); + for (/**/; *in; in++) { + unsigned char c = (unsigned char)*in; + if (c == '"') + strbuf_add(out, "\\\"", 2); + else if (c == '\\') + strbuf_add(out, "\\\\", 2); + else if (c == '\n') + strbuf_add(out, "\\n", 2); + else if (c == '\r') + strbuf_add(out, "\\r", 2); + else if (c == '\t') + strbuf_add(out, "\\t", 2); + else if (c == '\f') + strbuf_add(out, "\\f", 2); + else if (c == '\b') + strbuf_add(out, "\\b", 2); Using strbuf_addstr() here would result in the same object code (its strlen() call is inlined for constants) and avoid having to specify the redundant length 2. sure. thanks. + else if (c < 0x20) + strbuf_addf(out, "\\u%04x", c); + else + strbuf_addch(out, c); + } + strbuf_addch(out, '"'); +} + [...] + +void jw_object_double(struct json_writer *jw, const char *fmt, + const char *key, double value) +{ + assert_in_object(jw, key); + maybe_add_comma(jw); + + if (!fmt || !*fmt) + fmt = "%f"; + + append_quoted_string(&jw->json, key); + strbuf_addch(&jw->json, ':'); + strbuf_addf(&jw->json, fmt, value); Hmm. Can compilers check such a caller-supplied format matches the value's type? (I don't know how to specify a format attribute for GCC and Clang for this function.) I'll look into this. thanks. +} [...] + +void jw_object_sub(struct json_writer *jw, const char *key, + const struct json_writer *value) +{ + assert_is_terminated(value); + + assert_in_object(jw, key); + maybe_add_comma(jw); + + append_quoted_string(&jw->json, key); + strbuf_addch(&jw->json, ':'); + strbuf_addstr(&jw->json, value->json.buf); strbuf_addbuf() would be a better fit here -- it avoids a strlen() call and NUL handling issues. sure. thanks. i always forget about _addbuf(). 
+} + +void jw_object_inline_begin_array(struct json_writer *jw, const char *key) +{ + assert_in_object(jw, key); + maybe_add_comma(jw); + + append_quoted_string(&jw->json, key); + strbuf_addch(&jw->json, ':'); + + jw_array_begin(jw); +} Those duplicate calls in the last ten functions feel mind-numbing. A helper function for adding comma, key and colon might be a good idea. I'll see if I can add another macro to do the dirty work here. [...] diff --git a/json-writer.h b/json-writer.h new file mode 100644 index 000..ad38c95 --- /dev/null +++ b/json-writer.h @@ -0,0 +1,86 @@ +#ifndef JSON_WRITER_H +#define JSON_WRITER_H + +/* + * JSON data structures are defined at: + * http://json.org/ + * http://www.ietf.org/rfc/rfc7159.txt + * + *
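The quoting logic in the patch above can be restated as a self-contained function writing into a plain char buffer instead of a strbuf, which makes it easy to exercise on its own. This is a sketch mirroring `append_quoted_string()`'s escape table; the buffer-based `json_quote` helper is an invention for the example and assumes the output buffer is large enough.

```c
#include <stdio.h>
#include <string.h>

/* Standalone sketch of the patch's JSON string quoting: the short
 * escapes \" \\ \n \r \t \f \b, \u00XX for other control characters
 * below 0x20, and everything else copied through verbatim. */
static void json_quote(char *out, size_t outlen, const char *in)
{
	size_t n = 0;
#define PUT(s) do { n += snprintf(out + n, outlen - n, "%s", (s)); } while (0)
	PUT("\"");
	for (; *in; in++) {
		unsigned char c = (unsigned char)*in;
		char tmp[8];
		switch (c) {
		case '"':  PUT("\\\""); break;
		case '\\': PUT("\\\\"); break;
		case '\n': PUT("\\n"); break;
		case '\r': PUT("\\r"); break;
		case '\t': PUT("\\t"); break;
		case '\f': PUT("\\f"); break;
		case '\b': PUT("\\b"); break;
		default:
			if (c < 0x20) {
				snprintf(tmp, sizeof tmp, "\\u%04x", c);
				PUT(tmp);
			} else {
				tmp[0] = c;
				tmp[1] = '\0';
				PUT(tmp);
			}
		}
	}
	PUT("\"");
#undef PUT
}
```

Note this matches the patch's behavior of passing non-ASCII bytes through unmodified, which is the Unicode limitation discussed elsewhere in the thread.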
Re: [PATCH v3] routines to generate JSON data
On 3/23/2018 2:14 PM, Ramsay Jones wrote: On 23/03/18 16:29, g...@jeffhostetler.com wrote: From: Jeff Hostetler <jeffh...@microsoft.com> This is version 3 of my JSON data format routines. I have not looked at v3 yet - the patch below is against the version in 'pu' @3284f940c (presumably that would be v2). I built my v3 on Linux and didn't see any warnings. I'll add your extra CFLAGS and fix anything that I find and send up a v4 shortly. Thanks. Jeff The version in 'pu' broke my build on Linux, but not on cygwin - which was a bit of a surprise. That is, until I saw the warnings and remembered that I have this in my config.mak on Linux, but not on cygwin: ifneq ($(CC),clang) CFLAGS += -Wold-style-declaration CFLAGS += -Wno-pointer-to-int-cast CFLAGS += -Wsystem-headers endif ... and the warnings were: $ diff nout pout 1c1 < GIT_VERSION = 2.17.0.rc1.317.g4a561d2cc --- > GIT_VERSION = 2.17.0.rc1.445.g3284f940c 29a30 > CC commit-graph.o 73a75,87 > CC json-writer.o > json-writer.c:53:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] > static void inline assert_in_object(const struct json_writer *jw, const char *key) > ^ > json-writer.c:64:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] > static void inline assert_in_array(const struct json_writer *jw) > ^ > json-writer.c:75:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] > static void inline maybe_add_comma(struct json_writer *jw) > ^ > json-writer.c:87:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] > static void inline assert_is_terminated(const struct json_writer *jw) > ^ 83a98 > CC ls-refs.o ... $ The '-Wold-style-declaration' gcc warning flag is not a standard project flag, and I can't quite remember why I have it set, so I guess you could just ignore it. However, all other 'static inline' functions in the project have the inline keyword before the return type, so ... 
;-) Also, sparse spewed 40 warnings for t/helper/test-json-writer.c, which were mainly about file-local symbols, but had a couple of 'constant is so large ...', like so: $ grep warning psp-out | head -8 t/helper/test-json-writer.c:45:46: warning: constant 0x is so big it is unsigned long t/helper/test-json-writer.c:108:40: warning: constant 0x is so big it is unsigned long t/helper/test-json-writer.c:4:12: warning: symbol 'expect_obj1' was not declared. Should it be static? t/helper/test-json-writer.c:5:12: warning: symbol 'expect_obj2' was not declared. Should it be static? t/helper/test-json-writer.c:6:12: warning: symbol 'expect_obj3' was not declared. Should it be static? t/helper/test-json-writer.c:7:12: warning: symbol 'expect_obj4' was not declared. Should it be static? t/helper/test-json-writer.c:8:12: warning: symbol 'expect_obj5' was not declared. Should it be static? t/helper/test-json-writer.c:10:20: warning: symbol 'obj1' was not declared. Should it be static? $ I decided to use the UINT64_C(v) macro from , which is a C99 feature, and will (hopefully) not be a problem. ATB, Ramsay Jones -- >8 -- Subject: [PATCH] json-writer: fix up gcc and sparse warnings Signed-off-by: Ramsay Jones <ram...@ramsayjones.plus.com> --- json-writer.c | 8 ++--- t/helper/test-json-writer.c | 80 ++--- 2 files changed, 44 insertions(+), 44 deletions(-) diff --git a/json-writer.c b/json-writer.c index 89a6abb57..ba0365d20 100644 --- a/json-writer.c +++ b/json-writer.c @@ -50,7 +50,7 @@ static inline void begin(struct json_writer *jw, int is_array) /* * Assert that we have an open object at this level. 
*/ -static void inline assert_in_object(const struct json_writer *jw, const char *key) +static inline void assert_in_object(const struct json_writer *jw, const char *key) { if (!jw->nr) BUG("object: missing jw_object_begin(): '%s'", key); @@ -61,7 +61,7 @@ static void inline assert_in_object(const struct json_writer *jw, const char *ke /* * Assert that we have an open array at this level. */ -static void inline assert_in_array(const struct json_writer *jw) +static inline void assert_in_array(const struct json_writer *jw) { if (!jw->nr) BUG("array: missing jw_begin()"); @@ -72,7 +72,7 @@ static void inline assert_in_array(const struct json_writer *jw) /* * Add comma if we have already seen a member at this level. */ -static void inline maybe_add_comma(struct json_writer *jw) +static inline void maybe_add_comma(struct json_writer *jw) { if (jw->levels[jw->nr - 1].level_is_empty) jw->levels[jw->
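The warning fixed by the patch can be reproduced in isolation. A minimal sketch (function names invented here): with gcc's -Wold-style-declaration, the first form warns because 'inline' follows the return type; both forms are valid C, and the patch simply normalizes to the second, which matches the rest of the project.

```c
#include <assert.h>

static void inline old_style(void)     { } /* warns with the flag */
static inline void project_style(void) { } /* clean */

static int call_both(void)
{
	/* both declarations behave identically at runtime */
	old_style();
	project_style();
	return 1;
}
```

Note that -Wold-style-declaration is not enabled by -Wall, which is why the build only broke for configurations that add it explicitly.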
Re: [PATCH v3] json_writer: new routines to create data in JSON format
On 3/23/2018 1:18 PM, Jonathan Nieder wrote: g...@jeffhostetler.com wrote: From: Jeff Hostetler <jeffh...@microsoft.com> Add basic routines to generate data in JSON format. Signed-off-by: Jeff Hostetler <jeffh...@microsoft.com> If I understand the cover letter correctly, this is a JSON-like format, not actual JSON. Am I understanding correctly? What are the requirements for consuming this output? Will a typical JSON library be able to handle it without trouble? If not, is there some subset of cases where a typical JSON library is able to handle it without trouble? I'll add text to the commit message and the .h file explaining the Unicode limitation in the next reroll. Can you say a little about the motivation here? (I know you already put some of that in the cover letter, but since that doesn't become part of permanent history, it doesn't help the audience that matters.) I'll add a note about that to the commit message. Thanks. This would also be easier to review if there were an application of it in the same series. It's fine to send an RFC like this without such an application, but I think we should hold off on merging it until we have one. Having an application makes review much easier since we can see how the API works in practice, how well the approach fits what users need, etc. That's fine. I'm planning to push up another patch series next week that builds upon this, so hopefully that will help. Thanks and hope that helps, Jonathan Thanks, Jeff
Re: [PATCH v2] json_writer: new routines to create data in JSON format
On 3/21/2018 5:25 PM, Junio C Hamano wrote: g...@jeffhostetler.com writes: From: Jeff Hostetler <jeffh...@microsoft.com> Add basic routines to generate data in JSON format. And the point of having capability to write JSON data in our codebase is...? diff --git a/json-writer.c b/json-writer.c new file mode 100644 index 000..89a6abb --- /dev/null +++ b/json-writer.c @@ -0,0 +1,321 @@ +#include "cache.h" +#include "json-writer.h" + +static char g_ch_open[2] = { '{', '[' }; +static char g_ch_close[2] = { '}', ']' }; What's "g_" prefix? Global. Sorry, very old habits. + +/* + * Append JSON-quoted version of the given string to 'out'. + */ +static void append_quoted_string(struct strbuf *out, const char *in) +{ + strbuf_addch(out, '"'); + for (/**/; *in; in++) { + unsigned char c = (unsigned char)*in; It is clear enough to lose /**/, i.e. for (; *in; in++) { but for this one. I wonder if unsigned char c; strbuf_addch(out, '"'); while ((c = *in++) != '\0') { ... is easier to follow, though. either way is fine. will fix. +static inline void begin(struct json_writer *jw, int is_array) +{ + ALLOC_GROW(jw->levels, jw->nr + 1, jw->alloc); + + jw->levels[jw->nr].level_is_array = !!is_array; + jw->levels[jw->nr].level_is_empty = 1; An element of this array is a struct that represents a level, and everybody who accesses an element of that type knows it is talking about a level by the field that has the array being named as .levels[] (also [*1*]). In such a context, it is a bit too loud to name the fields with level_$blah. IOW, struct json_writer_level { unsigned is_array : 1; unsigned is_empty : 1; }; make sense. will fix. +struct json_writer_level +{ + unsigned level_is_array : 1; + unsigned level_is_empty : 1; +}; + +struct json_writer +{ + struct json_writer_level *levels; + int nr, alloc; + struct strbuf json; +}; [Footnote] *1* I personally prefer to call an array of things "thing[]", not "things[]", because then you can refer to an individual element e.g. 
"thing[4]" and read it as "the fourth thing". Unless the code often treats an array as a whole, that is, in which case, things[] is OK as you'll be calling the whole thing with the plural name (e.g. call that function and give all the things by passing things[]). In this case, one level instance is an element of a stack, and the code would be accessing one level at a time most of the time, so "writer.level[4].is_empty" would read more naturally than "writer.levels[4].level_is_empty". yeah, that makes sense. Thanks Jeff
Re: [PATCH 0/2] routines to generate JSON data
On 3/20/2018 1:42 AM, Jeff King wrote: On Mon, Mar 19, 2018 at 06:19:26AM -0400, Jeff Hostetler wrote: To make the above work, I think you'd have to store a little more state. E.g., the "array_append" functions check "out->len" to see if they need to add a separating comma. That wouldn't work if we might be part of a nested array. So I think you'd need a context struct like: struct json_writer { int first_item; struct strbuf out; }; #define JSON_WRITER_INIT { 1, STRBUF_INIT } to store the state and the output. As a bonus, you could also use it to store some other sanity checks (e.g., keep a "depth" counter and BUG() when somebody tries to access the finished strbuf with a hanging-open object or array). Yeah, I thought about that, but I think it gets more complex than that. I'd need a stack of "first_item" values. Or maybe the _begin() needs to increment a depth and set first_item and the _end() needs to always unset first_item. I'll look at this again. I think you may be able to get by with just unsetting first_item for any "end". Because as you "pop" to whatever data structure is holding whatever has ended, you know it's no longer the first item (whatever just ended was there before it). I admit I haven't thought too hard on it, though, so maybe I'm missing something. I'll take a look. Thanks. The thing I liked about the bottom-up construction is that it is easier to collect multiple sets in parallel and combine them during the final roll-up. With the in-line nesting, you're tempted to try to construct the resulting JSON in a single series and that may not fit what the code is trying to do. For example, if I wanted to collect an array of error messages as they are generated and an array of argv arrays and any alias expansions, then put together a final JSON string containing them and the final exit code, I'd need to build it in parts.
I can build these parts in pieces of JSON and combine them at the end -- or build up other similar data structures (string arrays, lists, or whatever) and then have a JSON conversion step. But we can make it work both ways, I just wanted to keep it simpler. Yeah, I agree that kind of bottom-up construction would be nice for some cases. I'm mostly worried about inefficiency copying the strings over and over as we build up the final output. Maybe that's premature worrying, though. If the first_item thing isn't too painful, then it might be nice to have both approaches available. True. In general I'd really prefer to keep the shell script as the driver for the tests, and have t/helper programs just be conduits. E.g., something like: cat >expect <<-\EOF && {"key": "value", "foo": 42} EOF test-json-writer >actual \ object_begin \ object_append_string key value \ object_append_int foo 42 \ object_end && test_cmp expect actual It's a bit tedious (though fairly mechanical) to expose the API in this way, but it makes it much easier to debug, modify, or add tests later on (for example, I had to modify the C program to find out that my append example above wouldn't work). Yeah, I wasn't sure if such a simple api required exposing all that machinery to the shell or not. And the api is fairly self-contained and not depending on a lot of disk/repo setup or anything, so my tests would be essentially static WRT everything else. With my t0019 script you should have been able to use -x -v to see what was failing. I was able to run the test-helper directly. The tricky thing is that I had to write new C code to test my theory about how the API worked. Admittedly that's not something most people would do regularly, but I often seem to end up doing that kind of probing and debugging. Many times I've found the more generic t/helper programs useful.
I also wonder if various parts of the system embrace JSON, if we'd want to have a tool for generating it as part of other tests (e.g., to create "expect" files). Ok, let me see what I can come up with. Thanks Jeff
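The begin/append/end bookkeeping under discussion can be sketched in a few lines. This is illustrative only, not the final git API: a plain depth counter plus a single first_item flag, with a fixed-size buffer standing in for a strbuf, following Peff's observation that any "end" can simply unset first_item.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct json_writer {
	int depth;      /* current nesting level */
	int first_item; /* is the next item the first at this level? */
	char out[256];
};

static void jw_begin_array(struct json_writer *jw)
{
	if (!jw->first_item)
		strcat(jw->out, ",");
	strcat(jw->out, "[");
	jw->depth++;
	jw->first_item = 1;
}

static void jw_append_int(struct json_writer *jw, int v)
{
	char buf[32];
	if (!jw->first_item)
		strcat(jw->out, ",");
	snprintf(buf, sizeof(buf), "%d", v);
	strcat(jw->out, buf);
	jw->first_item = 0;
}

static void jw_end_array(struct json_writer *jw)
{
	strcat(jw->out, "]");
	jw->depth--;
	/* whatever just ended was there before us, so the next item
	   at the enclosing level is no longer first */
	jw->first_item = 0;
}
```

With this, Peff's nested example (array containing an array containing 42) produces "[[42]]" without a per-level stack of first_item flags, and depth reaching zero can be used for the "hanging-open" sanity check.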
Re: [PATCH 0/2] routines to generate JSON data
On 3/17/2018 3:38 AM, Jacob Keller wrote: On Fri, Mar 16, 2018 at 2:18 PM, Jeff King wrote: 3. Some other similar format. YAML comes to mind. Last time I looked (quite a while ago), it seemed insanely complex, but I think you could implement only a reasonable subset. OTOH, I think the tools ecosystem for parsing JSON (e.g., jq) is much better. I would personally avoid YAML. It's "easier" for humans to read/parse, but honestly JSON is already simple enough and anyone who writes C or javascript can likely parse and hand-write JSON anyways. YAML lacks built-in parsers for most languages, whereas many scripting languages already have JSON parsing built in, or have more easily attainable libraries available. In contrast, the YAML libraries are much more complex and less likely to be available. That's just my own experience at $dayjob though. Agreed. I just looked at the spec for it and I think it would be harder for us to be assured we are generating valid output with leading whitespace being significant (without a lot more inspection of the strings being passed down to us). Jeff
Re: [PATCH 0/2] routines to generate JSON data
On 3/16/2018 5:18 PM, Jeff King wrote: On Fri, Mar 16, 2018 at 07:40:55PM +, g...@jeffhostetler.com wrote: [...] I really like the idea of being able to send our machine-readable output in some "standard" syntax for which people may already have parsers. But one big hangup with JSON is that it assumes all strings are UTF-8. That may be OK for telemetry data, but it would probably lead to problems for something like status porcelain, since Git's view of paths is just a string of bytes (not to mention possible uses elsewhere like author names, subject lines, etc). [...] I'll come back to the UTF-8/YAML questions in a separate response. Documentation for the new API is given in json-writer.h at the bottom of the first patch. The API generally looks pleasant, but the nesting surprised me. E.g., I'd have expected: jw_array_begin(out); jw_array_begin(out); jw_array_append_int(out, 42); jw_array_end(out); jw_array_end(out); to result in an array containing an array containing an integer. But array_begin() actually resets the strbuf, so you can't build up nested items like this internally. Ditto for objects within objects. You have to use two separate strbufs and copy the data an extra time. To make the above work, I think you'd have to store a little more state. E.g., the "array_append" functions check "out->len" to see if they need to add a separating comma. That wouldn't work if we might be part of a nested array. So I think you'd need a context struct like: struct json_writer { int first_item; struct strbuf out; }; #define JSON_WRITER_INIT { 1, STRBUF_INIT } to store the state and the output. As a bonus, you could also use it to store some other sanity checks (e.g., keep a "depth" counter and BUG() when somebody tries to access the finished strbuf with a hanging-open object or array). Yeah, I thought about that, but I think it gets more complex than that. I'd need a stack of "first_item" values. 
Or maybe the _begin() needs to increment a depth and set first_item and the _end() needs to always unset first_item. I'll look at this again. The thing I liked about the bottom-up construction is that it is easier to collect multiple sets in parallel and combine them during the final roll-up. With the in-line nesting, you're tempted to try to construct the resulting JSON in a single series and that may not fit what the code is trying to do. For example, if I wanted to collect an array of error messages as they are generated and an array of argv arrays and any alias expansions, then put together a final JSON string containing them and the final exit code, I'd need to build it in parts. I can build these parts in pieces of JSON and combine them at the end -- or build up other similar data structures (string arrays, lists, or whatever) and then have a JSON conversion step. But we can make it work both ways, I just wanted to keep it simpler. I wasn't sure how to unit test the API from a shell script, so I added a helper command that does most of the work in the second patch. In general I'd really prefer to keep the shell script as the driver for the tests, and have t/helper programs just be conduits. E.g., something like: cat >expect <<-\EOF && {"key": "value", "foo": 42} EOF test-json-writer >actual \ object_begin \ object_append_string key value \ object_append_int foo 42 \ object_end && test_cmp expect actual It's a bit tedious (though fairly mechanical) to expose the API in this way, but it makes it much easier to debug, modify, or add tests later on (for example, I had to modify the C program to find out that my append example above wouldn't work). Yeah, I wasn't sure if such a simple api required exposing all that machinery to the shell or not. And the api is fairly self-contained and not depending on a lot of disk/repo setup or anything, so my tests would be essentially static WRT everything else.
With my t0019 script you should have been able to use -x -v to see what was failing. -Peff thanks for the quick review Jeff
Re: [PATCH 04/36] t/helper: merge test-lazy-init-name-hash into test-tool
On 3/18/2018 4:47 AM, Eric Sunshine wrote: On Sun, Mar 18, 2018 at 4:25 AM, Duy Nguyen wrote: On Sun, Mar 18, 2018 at 3:11 AM, Eric Sunshine wrote: On Sat, Mar 17, 2018 at 3:53 AM, Nguyễn Thái Ngọc Duy wrote: -extern int test_lazy_init_name_hash(struct index_state *istate, int try_threaded); +extern int lazy_init_name_hash_for_testing(struct index_state *istate, int try_threaded); I get why you renamed this since the "main" function in the test program wants to be called 'test_lazy_init_name_hash'... diff --git a/t/helper/test-lazy-init-name-hash.c b/t/helper/test-lazy-init-name-hash.c @@ -9,6 +10,9 @@ static int perf; +static int (*init_name_hash)(struct index_state *istate, int try_threaded) = + lazy_init_name_hash_for_testing; + @@ -33,9 +37,9 @@ static void dump_run(void) if (single) { - test_lazy_init_name_hash(_index, 0); + init_name_hash(_index, 0); ... but I'm having trouble understanding why this indirection through 'init_name_hash' is used rather than just calling lazy_init_name_hash_for_testing() directly. Am I missing something obvious or is 'init_name_hash' just an unneeded artifact of an earlier iteration before the rename in cache.{c,h}? Nope. It just feels too long to me and because we're already in the test I don't feel the need to repeat _for_testing everywhere. If it's confusing, I'll remove init_name_hash. Without an explanatory in-code comment, I'd guess that someone coming across this in the future will also stumble over it just like I did in the review. I agree with Eric, this indirection seems unnecessary and confusing. Generally, when I see function pointers initialized like that, I think that there are plans for additional providers/drivers for that functionality, but I don't see that here. And it seems odd. What if, instead, you rename test_lazy_init_name_hash() to lazy_init_name_hash_test(), shifting 'test' from the front to the back of the name?
That way, the name remains the same length at the call sites in the test helper, and you don't have to introduce the confusing, otherwise unneeded 'init_name_hash'. I see 2 problems. 1. The test function in name-hash.c that needs access to private data. I'm not a fan of the "_for_testing" suffix, but I'm open. I might either leave it as is, or make it a "TEST_" or "test__" prefix (as a guide for people used to languages with namespaces). 2. The new name for cmd_main(). There's no real need why it needs to be "test_*", right? Your cmds[] maps the command line string to a function, but it could be anything. That is, "cmd_main()" could be renamed "cmd__lazy_init_name_hash()". Then you can conceptually reserve the "cmd__" prefix throughout t/helper for each test handler. Just a thought, Thanks, Jeff
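The suggested "cmd__" reservation can be sketched as a dispatch table; a hypothetical stub (names invented here, not the actual t/helper code):

```c
#include <string.h>

/* One handler per test helper, each using the reserved cmd__ prefix. */
static int cmd__lazy_init_name_hash(int argc, const char **argv)
{
	(void)argc; (void)argv;
	return 0; /* real helper logic would go here */
}

struct test_cmd {
	const char *name;
	int (*fn)(int argc, const char **argv);
};

static struct test_cmd cmds[] = {
	{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
};

/* Dispatch by command-line string, as a single test-tool main would. */
static int run_command(const char *name, int argc, const char **argv)
{
	size_t i;
	for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
		if (!strcmp(cmds[i].name, name))
			return cmds[i].fn(argc, argv);
	return 127; /* unknown command */
}
```

The point of the convention is that the table maps strings to functions, so the handler names are free to follow any scheme that keeps them out of the way of the real API names.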
Re: What's cooking in git.git (Feb 2018, #03; Wed, 21)
On 2/21/2018 7:31 PM, Junio C Hamano wrote: * jh/status-no-ahead-behind (2018-01-24) 4 commits - status: support --no-ahead-behind in long format - status: update short status to respect --no-ahead-behind - status: add --[no-]ahead-behind to status and commit for V2 format. - stat_tracking_info: return +1 when branches not equal "git status" can spend a lot of cycles to compute the relation between the current branch and its upstream, which can now be disabled with "--no-ahead-behind" option. At v5; is this ready for 'next'? yes, i believe so. thanks jeff
Re: What's cooking in git.git (Feb 2018, #01; Wed, 7)
* jh/status-no-ahead-behind (2018-01-24) 4 commits - status: support --no-ahead-behind in long format - status: update short status to respect --no-ahead-behind - status: add --[no-]ahead-behind to status and commit for V2 format. - stat_tracking_info: return +1 when branches not equal "git status" can spend a lot of cycles to compute the relation between the current branch and its upstream, which can now be disabled with "--no-ahead-behind" option. At v5; is this ready for 'next'? I believe so. I don't recall any further discussions on it. The only open question was the idea of trying to walk 100 or so commits and see if one was "close" to being an ancestor of the other, but we thought that Stolee's graph cache would be a better solution for that. So, yes, I think it is ready for 'next'. Thanks, Jeff
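For reference, the option the series adds is used like this (a minimal local demo; flag names as they landed in git, and the empty commit plus inline identity are just to keep the sketch self-contained):

```shell
# Set up a throwaway repo with one commit.
git init -q demo
cd demo
git -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m one

# Skip the ahead/behind computation; works in long format,
# short format, and porcelain v2 with --branch.
git status --no-ahead-behind
git status --no-ahead-behind --porcelain=v2 --branch
```

With no upstream configured this exercises the normal code path; the savings show up on branches whose upstream has diverged by many commits.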
Re: partial fetch
On 2/12/2018 11:24 AM, Basin Ilya wrote: Hi. In 2017 a set of patches titled "add object filtering for partial fetch" was accepted. Is it what I think it is? Will we be able to download only a subdirectory from a large project? yes, that is the goal. there are several caveats, but yes, that is the goal. Jeff
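Concretely, the filtering that landed is driven from the clone/fetch side; a local sketch (assumes a git recent enough to support partial clone, and the server-side config knob as it exists in modern git):

```shell
# Toy "server" repo that permits object filtering.
git init -q server
git -C server -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m seed
git -C server config uploadpack.allowFilter true

# Partial clone: omit all blobs up front; they are fetched on demand.
# A path-limited variant would use a sparse filter spec instead of
# blob:none.
git clone -q --no-local --filter=blob:none "file://$PWD/server" client
```

Note that getting "only a subdirectory" also needs a sparse checkout on the client, so the working tree is limited along with the object transfer.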
Re: [RFC PATCH 000/194] Moving global state into the repository object
On 2/5/2018 6:51 PM, Stefan Beller wrote: This series moves a lot of global state into the repository struct. It applies on top of 2512f15446149235156528dafbe75930c712b29e (2.16.0) It can be found at https://github.com/stefanbeller/git/tree/object-store Motivation for this series: * Easier to reason about code when all state is stored in one place, for example for multithreading * Easier to move to processing submodules in-process instead of calling a processes for the submodule handling. The last patch demonstrates this. [...] Very nice. My eyes glazed over a few times, but I like the direction you're heading here. Thanks for tackling this! Jeff
Re: [PATCH v2 00/27] protocol version 2
On 1/25/2018 6:58 PM, Brandon Williams wrote: Changes in v2: * Added documentation for fetch * changes #defines for state variables to be enums * couple code changes to pkt-line functions and documentation * Added unit tests for the git-serve binary as well as for ls-refs [...] This looks really nice. I'm eager to get this in so we can do some additional commands to help make partial clone more efficient. Thanks, Jeff
Re: [PATCH 12/26] ls-refs: introduce ls-refs server command
On 1/2/2018 7:18 PM, Brandon Williams wrote: Introduce the ls-refs server command. In protocol v2, the ls-refs command is used to request the ref advertisement from the server. Since it is a command which can be requested (as opposed to mandatory in v1), a client can send a number of parameters in its request to limit the ref advertisement based on provided ref-patterns. Signed-off-by: Brandon Williams--- Documentation/technical/protocol-v2.txt | 26 + Makefile| 1 + ls-refs.c | 97 + ls-refs.h | 9 +++ serve.c | 2 + 5 files changed, 135 insertions(+) create mode 100644 ls-refs.c create mode 100644 ls-refs.h diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index b87ba3816..5f4d0e719 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -89,3 +89,29 @@ terminate the connection. Commands are the core actions that a client wants to perform (fetch, push, etc). Each command will be provided with a list of capabilities and arguments as requested by a client. + + Ls-refs +- + +Ls-refs is the command used to request a reference advertisement in v2. +Unlike the current reference advertisement, ls-refs takes in parameters +which can be used to limit the refs sent from the server. + +Ls-refs takes in the following parameters wrapped in packet-lines: + + symrefs: In addition to the object pointed by it, show the underlying + ref pointed by it when showing a symbolic ref. + peel: Show peeled tags. + ref-pattern : When specified, only references matching the +given patterns are displayed. + +The output of ls-refs is as follows: + +output = *ref +flush-pkt +ref = PKT-LINE((tip | peeled) LF) +tip = obj-id SP refname (SP symref-target) +peeled = obj-id SP refname "^{}" + +symref = PKT-LINE("symref" SP symbolic-ref SP resolved-ref LF) +shallow = PKT-LINE("shallow" SP obj-id LF) Do you want to talk about ordering requirements on this?
I think packed-refs has one, but I'm not sure it matters here where the client or server sorts it. Are there any provisions for compressing the refnames, like in the reftable spec or in index-v4 ? It doesn't need to be in the initial version. Just asking. We could always add a "ls-refs-2" command that builds upon this. Thanks, Jeff
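For flavor, the pattern-limited advertisement can be exercised end to end with today's porcelain (hedged: in modern git, ls-remote is the client command that drives ls-refs once protocol v2 is in use):

```shell
# Throwaway repo with one commit on the default branch.
git init -q repo
git -C repo -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m one

# Ask only for branch refs; under protocol v2 the pattern is sent to
# the server, so non-matching refs are never advertised at all.
git -c protocol.version=2 ls-remote repo "refs/heads/*"
```

The output is one "<oid> TAB refname" line per matching ref, which corresponds to the "tip" production in the grammar above.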
Re: [PATCH 11/26] serve: introduce git-serve
On 2/1/2018 1:57 PM, Stefan Beller wrote: On Thu, Feb 1, 2018 at 10:48 AM, Jeff Hostetler <g...@jeffhostetler.com> wrote: On 1/2/2018 7:18 PM, Brandon Williams wrote: Introduce git-serve, the base server for protocol version 2. [...] + Special Packets +- + +In protocol v2 these special packets will have the following semantics: + + * '0000' Flush Packet (flush-pkt) - indicates the end of a message + * '0001' Delimiter Packet (delim-pkt) - separates sections of a message Previously, a 0001 pkt-line meant that there was 1 byte of data following, right? No, the length was including the length field, so 0005 would indicate that there is one byte following, (+4 bytes of "0005" included) d'oh. right. thanks! Should we also consider increasing the pkt-line limit to 5 hex-digits while we're at it ? That would let us have 1MB buffers if that would help with large packfiles. AFAICT there is a static allocation of one pkt-line (of maximum size), such that the code can read in a full packet and then process it. If we'd increase the packet size we'd need the static buffer to be 1MB, which sounds good for my developer machine. But I suspect it may be too much for people using git on embedded devices? I got burned by that static buffer once upon a time when I wanted to have 2 streams going at the same time. Hopefully, we can move that into the new reader structure at some point (if it isn't already). pack files larger than 64k are put into multiple pkt-lines, which is not a big deal, as the overhead of 4bytes per 64k is negligible. (also there is progress information in the side channel, which would come in as a special packet in between real packets, such that every 64k transmitted you can update your progress meter; Not sure I feel strongly on fewer progress updates) Granted, we're throttled by the network, so it might not matter. Would it be interesting to have a 5 digit prefix with parts of the high bits of first digit being flags ? Or is this too radical of a change?
What would the flags be for? As an alternative we could put the channel number in one byte, such that we can have a side channel not just while streaming the pack but all the time. (Again, not sure if that buys a lot for us) Delimiters like the 0001 and the side channel are a couple of ideas, but I was just thinking out loud. And right, I'm not sure it gets us much right now. Jeff
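The framing detail cleared up above is easy to pin down in code. A minimal sketch of pkt-line encoding (not git's actual pkt-line.c): the 4 hex digits give the total length including the 4-byte header itself, so "0005" carries exactly one payload byte, and "0000"/"0001" can never collide with real data, which is what frees them up as flush-pkt and delim-pkt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Frame a NUL-terminated payload as a pkt-line into 'out'. */
static void pkt_line(char *out, size_t outsz, const char *payload)
{
	/* length field counts the payload PLUS its own 4 bytes */
	snprintf(out, outsz, "%04x%s",
		 (unsigned)(strlen(payload) + 4), payload);
}
```

The 4-hex-digit header is also what caps a single packet at 65520 bytes total, hence the discussion of splitting large packfiles across many pkt-lines.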
Re: [PATCH 11/26] serve: introduce git-serve
On 1/2/2018 7:18 PM, Brandon Williams wrote: Introduce git-serve, the base server for protocol version 2. Protocol version 2 is intended to be a replacement for Git's current wire protocol. The intention is that it will be a simpler, less wasteful protocol which can evolve over time. Protocol version 2 improves upon version 1 by eliminating the initial ref advertisement. In its place a server will export a list of capabilities and commands which it supports in a capability advertisement. A client can then request that a particular command be executed by providing a number of capabilities and command specific parameters. At the completion of a command, a client can request that another command be executed or can terminate the connection by sending a flush packet. Signed-off-by: Brandon Williams--- .gitignore | 1 + Documentation/technical/protocol-v2.txt | 91 Makefile| 2 + builtin.h | 1 + builtin/serve.c | 30 git.c | 1 + serve.c | 239 serve.h | 15 ++ 8 files changed, 380 insertions(+) create mode 100644 Documentation/technical/protocol-v2.txt create mode 100644 builtin/serve.c create mode 100644 serve.c create mode 100644 serve.h diff --git a/.gitignore b/.gitignore index 833ef3b0b..2d0450c26 100644 --- a/.gitignore +++ b/.gitignore @@ -140,6 +140,7 @@ /git-rm /git-send-email /git-send-pack +/git-serve /git-sh-i18n /git-sh-i18n--envsubst /git-sh-setup diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt new file mode 100644 index 0..b87ba3816 --- /dev/null +++ b/Documentation/technical/protocol-v2.txt @@ -0,0 +1,91 @@ + Git Wire Protocol, Version 2 +== + +This document presents a specification for a version 2 of Git's wire +protocol. Protocol v2 will improve upon v1 in the following ways: + + * Instead of multiple service names, multiple commands will be +supported by a single service. 
+ * Easily extendable as capabilities are moved into their own section +of the protocol, no longer being hidden behind a NUL byte and +limited by the size of a pkt-line (as there will be a single +capability per pkt-line). + * Separate out other information hidden behind NUL bytes (e.g. agent +string as a capability and symrefs can be requested using 'ls-refs') + * Reference advertisement will be omitted unless explicitly requested + * ls-refs command to explicitly request some refs + + Detailed Design += + +A client can request to speak protocol v2 by sending `version=2` in the +side-channel `GIT_PROTOCOL` in the initial request to the server. + +In protocol v2 communication is command oriented. When first contacting a +server a list of capabilities will be advertised. Some of these capabilities +will be commands which a client can request be executed. Once a command +has completed, a client can reuse the connection and request that other +commands be executed. + + Special Packets +- + +In protocol v2 these special packets will have the following semantics: + + * '0000' Flush Packet (flush-pkt) - indicates the end of a message + * '0001' Delimiter Packet (delim-pkt) - separates sections of a message Previously, a 0001 pkt-line meant that there was 1 byte of data following, right? Does this change that and/or prevent 1 byte packets? (Not sure if it is likely, but the odd-tail of a packfile might get sent in a 0001 line, right?) Or is it that 0001 is only special during the V2 negotiation stuff, but not during the packfile transmission? (I'm not against having this delimiter -- I think it is useful, but just curious if it will cause problems elsewhere.) Should we also consider increasing the pkt-line limit to 5 hex-digits while we're at it ? That would let us have 1MB buffers if that would help with large packfiles. Granted, we're throttled by the network, so it might not matter.
Would it be interesting to have a 5 digit prefix with parts of the high bits of first digit being flags ? Or is this too radical of a change? + + Capability Advertisement +-- + +A server which decides to communicate (based on a request from a client) +using protocol version 2, notifies the client by sending a version string +in its initial response followed by an advertisement of its capabilities. +Each capability is a key with an optional value. Clients must ignore all +unknown keys. Semantics of unknown values are left to the definition of +each key. Some capabilities will describe commands which can be requested +to be executed by the client. + +capability-advertisement = protocol-version +
Re: [RFC PATCH 0/1] Implement CMake build
On 1/25/2018 7:21 PM, Isaac Hier wrote: Hi Jeff, I have been looking at the build generator, which looks promising, but I have one concern. Assuming I can generate a CMakeLists.txt that appropriately updates the library sources, etc., how do you suggest I handle new portability macros? For example, assume someone adds a macro HAVE_X to indicate the availability of some platform-specific function x. In the current Makefile, a comment would be added to the top indicating when HAVE_X or NO_X should be set, and that option would toggle the HAVE_X C macro. But CMake can test for the availability of x, which is one of the main motives for adding a CMake build. The current build generator uses the output of make, so all it would know is whether or not HAVE_X is defined on the platform that ran the Makefile, but not the entire list of platforms that git supports. Bottom line: should I add the portability tests as they are now, without accounting for future portability macros? One good alternative might be to suggest the authors of new portability macros include a small sample C program to test it. That would allow me to easily patch the CMake tests whenever that came up. In a best case scenario, a practice could be established to write the test in a specific directory with a certain name so that I could automatically update the CMake tests from the build generator. Thanks for the help, Isaac It's been years since I've used cmake as anything other than a casual (downstream) consumer, so I'm not sure I can answer your questions. The vcxproj target we have is a bit of a hack to automatically capture the set of source files and target libraries and executables. We don't try to capture the spirit of all of the HAVE_ and NO_ flags when we build the *.vcxproj files. And we make some assumptions in the generation template for the usual VC/VS settings. But then Windows is a single target and we don't have to worry about some things (like whether or not qsort is present).
I don't want to discourage you from attempting this. (And I realize that my initial response might have given that impression -- I mainly wanted to say that we don't currently have a problem on Windows with the current Makefile situation.) A full cmake system would let us simplify some things, but it also complicates some things. So we might be trading one set of problems for another. For example, the libgit2 project uses cmake. Not to pick on them, but when I look at it, I see a lot of the same issues (but perhaps with better syntax than the makefile). https://github.com/libgit2/libgit2/blob/master/CMakeLists.txt As to your portability test questions, I'm afraid I don't know. Sorry, Jeff
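For the feature-probe half of the question, CMake's stock modules can synthesize the equivalent of a HAVE_X define at configure time. A hedged sketch (the probed function is just an example; git's Makefile happens to have a NO_STRCASESTR knob, but none of this is taken from an actual git CMake build):

```cmake
# Probe for a platform function instead of asking the user to set
# HAVE_X / NO_X by hand in config.mak.
include(CheckFunctionExists)
check_function_exists(strcasestr HAVE_STRCASESTR)
if(HAVE_STRCASESTR)
  add_definitions(-DHAVE_STRCASESTR)
endif()
```

This is exactly the part a Makefile-driven build generator cannot recover, since the Makefile only records the answer for the one platform that ran it.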
Re: [RFC PATCH 0/1] Implement CMake build
On 1/24/2018 2:59 PM, Isaac Hier wrote: Jeff, no worries, fair enough. I know https://github.com/grpc/grpc uses a shared file to generate code for several build systems instead of maintaining them individually. I plan on doing the work anyway just because I have my own reasons to use CMake in Git (for packaging in https://github.com/ruslo/hunter is my main motive here). Whether or not it is maintained upstream is not a real concern for me at the moment. [...] I'll see how the Windows build currently works and if that makes sense, maybe I'll try using that build generator here too. Thanks for the feedback, Isaac Look at the "vcxproj:" target in config.mak.uname (in the GfW repo). Jeff
Re: [RFC PATCH 0/1] Implement CMake build
On 1/22/2018 7:16 PM, Isaac Hier wrote: This patch adds a mostly complete (aside from building tests, documentation, installation, etc.) CMake build to the git project. I am not sure how much interest there is in a CMake build, so please send me feedback one way or another. Personally, I believe CMake will help with Windows builds and is somewhat easier to read than a Makefile. I considered adding this to the contrib directory, but CMakeLists.txt files almost always reside in the original directories, and I'm not sure how wise it would be to do otherwise. If you are interested in a CMake build, I would be more than happy to finish up the work here. I decided to wait until I discussed the issue here to finish the final parts of the build.

On Windows, we use "bash" and "make" from the Git-for-Windows SDK installation (which gives us a bash shell and most of the usual Unix command line tools) and the main "Makefile". We do need a special section in the "config.mak.uname" file to set some platform-specific compiler options, etc., but that is small enough. Johannes and I recently added a few new options to let Windows users build Git from the command line with either GCC or MSVC, and to synthesize MSVS solution (.sln) and project (.vcxproj) files to allow you to work with the full MSVS IDE and full intellisense. And, if necessary, download and build third-party libraries not normally present on a Windows machine. Most of this work is Windows specific and may not yet be upstream. See GfW [1] and VCPKG [2]. The synthesized solution and project files are automatically generated, so we do not have to separately track changes to the various file lists in the Makefile. These should be treated as read-only and re-generated in response to changes in the Makefile. Using the solution/project files, we can completely build Git in the IDE or a command prompt, without the SDK. This further simplifies things for Windows developers.
So given that, I don't see a need to replace the main Makefile on Windows. Sorry, Jeff [1] https://github.com/git-for-windows/git [2] https://github.com/Microsoft/vcpkg
Re: What's cooking in git.git (Jan 2018, #03; Tue, 23)
On 1/23/2018 5:39 PM, Junio C Hamano wrote: [Stalled] * jh/status-no-ahead-behind (2018-01-04) 4 commits - status: support --no-ahead-behind in long format - status: update short status to respect --no-ahead-behind - status: add --[no-]ahead-behind to status and commit for V2 format. - stat_tracking_info: return +1 when branches not equal Expecting a reroll to finalize the topic.

I sent a V5 (1/9/2018) of this series that removed the config setting. It looks like your branch on GH only has the V4 version. WRT the reroll: There was a long discussion on a limited calculation (option (2)). While I appreciate how that would be nice to have (for some simple cases -- like my current feature branch relative to my upstream), I think that is something we could add later if we want. (*) This patch gives a simple on/off. This has value by itself for certain situations. (*) Implementing a limit would involve hacking the merge-base calculation, which gets into graph-shape questions that I don't think we want to get into. Several messages in the thread talk about some example shapes, but I don't think we have consensus on what we want here. (*) Later this week Stolee will be submitting a patch series that adds a client-side commit graph cache. This allows fast calculations and includes generation numbers. I'd rather wait until we have this in and then see if we want/need the limited option. Thanks, Jeff
Re: [PATCH v2 2/2] sha1_file: improve sha1_file_name() perfs
On 1/17/2018 12:54 PM, Christian Couder wrote: As sha1_file_name() could be performance sensitive, let's try to make it faster by seeding the initial buffer size to avoid any need to realloc and by using strbuf_addstr() and strbuf_addch() instead of strbuf_addf(). Helped-by: Derrick Stolee <sto...@gmail.com> Helped-by: Jeff Hostetler <g...@jeffhostetler.com> Signed-off-by: Christian Couder <chrisc...@tuxfamily.org> --- sha1_file.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/sha1_file.c b/sha1_file.c index f66c21b2da..1a94716962 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -323,8 +323,14 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1) void sha1_file_name(struct strbuf *buf, const unsigned char *sha1) { - strbuf_addf(buf, "%s/", get_object_directory()); + const char *obj_dir = get_object_directory(); + size_t extra = strlen(obj_dir) + 1 + GIT_MAX_HEXSZ; Very minor nit. Should this be "+3" rather than "+1"? One for the slash after obj_dir, one for the slash between "xx/y[38]", and one for the trailing NUL. + if (extra > strbuf_avail(buf)) + strbuf_grow(buf, extra); + + strbuf_addstr(buf, obj_dir); + strbuf_addch(buf, '/'); fill_sha1_path(buf, sha1);
}
Re: [PATCH] sha1_file: remove static strbuf from sha1_file_name()
On 1/16/2018 9:01 AM, Derrick Stolee wrote: On 1/16/2018 2:18 AM, Christian Couder wrote: Using a static buffer in sha1_file_name() is error prone and the performance improvements it gives are not needed in most of the callers. So let's get rid of this static buffer and, if necessary or helpful, let's use one in the caller. First: this is a good change for preventing bugs in the future. Do not let my next thought deter you from making this change. Second: I wonder if there is any perf hit now that we are allocating buffers much more often. Also, how often does get_object_directory() change? In some cases we could cache the buffer and only append the parts for the loose object (and not reallocate, because the filenames will have equal length). I'm concerned about the perf implications when inspecting many loose objects (100k+), but these code paths seem to be involved with more substantial work, such as opening and parsing the objects, so keeping a buffer in-memory is probably unnecessary. --- cache.h | 8 +++- http-walker.c | 6 -- http.c | 16 ++-- sha1_file.c | 38 +- 4 files changed, 42 insertions(+), 26 deletions(-) diff --git a/cache.h b/cache.h index d8b975a571..6db565408e 100644 --- a/cache.h +++ b/cache.h @@ -957,12 +957,10 @@ extern void check_repository_format(void); #define TYPE_CHANGED 0x0040 /* - * Return the name of the file in the local object database that would - * be used to store a loose object with the specified sha1. The - * return value is a pointer to a statically allocated buffer that is - * overwritten each time the function is called. + * Put in `buf` the name of the file in the local object database that + * would be used to store a loose object with the specified sha1. */ -extern const char *sha1_file_name(const unsigned char *sha1); +extern void sha1_file_name(struct strbuf *buf, const unsigned char *sha1); /* * Return an abbreviated sha1 unique within this repository's object database.
diff --git a/http-walker.c b/http-walker.c index 1ae8363de2..07c2b1af82 100644 --- a/http-walker.c +++ b/http-walker.c @@ -544,8 +544,10 @@ static int fetch_object(struct walker *walker, unsigned char *sha1) } else if (hashcmp(obj_req->sha1, req->real_sha1)) { ret = error("File %s has bad hash", hex); } else if (req->rename < 0) { - ret = error("unable to write sha1 filename %s", - sha1_file_name(req->sha1)); + struct strbuf buf = STRBUF_INIT; + sha1_file_name(&buf, req->sha1); + ret = error("unable to write sha1 filename %s", buf.buf); + strbuf_release(&buf); } release_http_object_request(req); diff --git a/http.c b/http.c index 5977712712..5979305bc9 100644 --- a/http.c +++ b/http.c @@ -2168,7 +2168,7 @@ struct http_object_request *new_http_object_request(const char *base_url, unsigned char *sha1) { char *hex = sha1_to_hex(sha1); - const char *filename; + struct strbuf filename = STRBUF_INIT; char prevfile[PATH_MAX]; int prevlocal; char prev_buf[PREV_BUF_SIZE]; @@ -2180,14 +2180,15 @@ struct http_object_request *new_http_object_request(const char *base_url, hashcpy(freq->sha1, sha1); freq->localfile = -1; - filename = sha1_file_name(sha1); + sha1_file_name(&filename, sha1); snprintf(freq->tmpfile, sizeof(freq->tmpfile), - "%s.temp", filename); + "%s.temp", filename.buf); - snprintf(prevfile, sizeof(prevfile), "%s.prev", filename); + snprintf(prevfile, sizeof(prevfile), "%s.prev", filename.buf); unlink_or_warn(prevfile); rename(freq->tmpfile, prevfile); unlink_or_warn(freq->tmpfile); + strbuf_release(&filename); if (freq->localfile != -1) error("fd leakage in start: %d", freq->localfile); @@ -2302,6 +2303,7 @@ void process_http_object_request(struct http_object_request *freq) int finish_http_object_request(struct http_object_request *freq) { struct stat st; + struct strbuf filename = STRBUF_INIT; close(freq->localfile); freq->localfile = -1; @@ -2327,8 +2329,10 @@ int finish_http_object_request(struct http_object_request *freq) unlink_or_warn(freq->tmpfile); return -1; } - freq->rename = -
finalize_object_file(freq->tmpfile, sha1_file_name(freq->sha1)); + + sha1_file_name(&filename, freq->sha1); + freq->rename = finalize_object_file(freq->tmpfile, filename.buf); + strbuf_release(&filename); return freq->rename; } diff --git a/sha1_file.c b/sha1_file.c index 3da70ac650..f66c21b2da 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -321,15 +321,11 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1) } } -const char *sha1_file_name(const unsigned char *sha1) +void sha1_file_name(struct strbuf *buf, const unsigned char *sha1) { - static struct strbuf buf = STRBUF_INIT; - - strbuf_reset(&buf); - strbuf_addf(&buf,
Re: What's cooking in git.git (Jan 2018, #02; Tue, 9)
On 1/10/2018 2:57 PM, Junio C Hamano wrote: Jeff Hostetler <g...@jeffhostetler.com> writes: On 1/9/2018 6:33 PM, Junio C Hamano wrote: -- [Cooking] * jh/fsck-promisors (2017-12-08) 10 commits [...] * jh/partial-clone (2017-12-08) 13 commits [...] Parts 2 and 3 of partial clone have been simmering for a while now. I was wondering if there were any more comments or questions on them. I don't recall any existing issues. Me neither. I do not mind merging them to 'next' during the feature freeze, but we won't be merging them to 'master' soon anyway, and I'd like to see us concentrate more on finding and fixing regressions on the 'master' front for now.

I didn't want to rush this -- just check to see if there was anything that I should focus on while waiting for 2.16 to ship and settle down. Leave this where it is now if you want and we can merge it in later. Thanks Jeff
Re: What's cooking in git.git (Jan 2018, #02; Tue, 9)
On 1/9/2018 6:33 PM, Junio C Hamano wrote: -- [Cooking] * jh/fsck-promisors (2017-12-08) 10 commits [...] * jh/partial-clone (2017-12-08) 13 commits [...] Parts 2 and 3 of partial clone have been simmering for a while now. I was wondering if there were any more comments or questions on them. I don't recall any existing issues. Thanks Jeff