Re: [PATCH] sub-process: refactor handshake to common function

2017-07-26 Thread Junio C Hamano
Lars Schneider  writes:

> Please note that I've recently refactored the capabilities negotiation a bit:
> https://github.com/git/git/commit/1514c8edd62d96006cd1de31e906ed5798dd4681
>
> This change is still cooking in `next`. I am not sure how this should/could
> be handled but maybe you can use my refactoring as your base?

I think they can play well together as independent topics.  The
known_caps[] thing you introduced essentially is the same as the
struct subprocess_capability capablities[] Jonathan has.

Please check

$ git show "pu^{/jt/subprocess-handshake' into pu}" convert.c

to see if it makes sense.

Thanks.






Re: [PATCH] sub-process: refactor handshake to common function

2017-07-26 Thread Lars Schneider

> On 24 Jul 2017, at 23:38, Jonathan Tan  wrote:
> 
> Refactor, into a common function, the version and capability negotiation
> done when invoking a long-running process as a clean or smudge filter.
> This will be useful for other Git code that needs to interact similarly
> with a long-running process.
> 
> Signed-off-by: Jonathan Tan 
> ---
> This will be useful for anyone implementing functionality like that in
> [1].
> 
> It is unfortunate that the code grew larger because it had to be more
> generic, but I think this is better than having (in the future) 2 or
> more separate handshake functions.
> 
> I also don't think that the protocol should be permissive - I think
> things would be simpler if all protocol errors were fatal errors. As it
> is, most parts are permissive, but packet_read_line() already dies
> anyway upon a malformed packet, so it may not be too drastic a change to
> change this. For reference, the original protocol comes from [2].
> 
> [1] 
> https://public-inbox.org/git/20170714132651.170708-2-benpe...@microsoft.com/
> [2] edcc858 ("convert: add filter..process option", 2016-10-17)

Thanks for this refactoring! That's great!

Please note that I've recently refactored the capabilities negotiation a bit:
https://github.com/git/git/commit/1514c8edd62d96006cd1de31e906ed5798dd4681

This change is still cooking in `next`. I am not sure how this should/could
be handled but maybe you can use my refactoring as your base?

Thanks,
Lars


> ---
> convert.c | 61 -
> pkt-line.c| 19 ---
> pkt-line.h|  2 --
> sub-process.c | 94 +++
> sub-process.h | 18 ++
> t/t0021-conversion.sh |  2 +-
> 6 files changed, 120 insertions(+), 76 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index deaf0ba7b..ec8ecc2ea 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -512,62 +512,15 @@ static struct hashmap subprocess_map;
> 
> static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
> {
> - int err;
> + static int versions[] = {2, 0};
> + static struct subprocess_capability capabilities[] = {
> + {"clean", CAP_CLEAN}, {"smudge", CAP_SMUDGE}, {NULL, 0}
> + };
>   struct cmd2process *entry = (struct cmd2process *)subprocess;
> - struct string_list cap_list = STRING_LIST_INIT_NODUP;
> - char *cap_buf;
> - const char *cap_name;
> - struct child_process *process = >process;
> - const char *cmd = subprocess->cmd;
> -
> - sigchain_push(SIGPIPE, SIG_IGN);
> -
> - err = packet_writel(process->in, "git-filter-client", "version=2", 
> NULL);
> - if (err)
> - goto done;
> -
> - err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
> - if (err) {
> - error("external filter '%s' does not support filter protocol 
> version 2", cmd);
> - goto done;
> - }
> - err = strcmp(packet_read_line(process->out, NULL), "version=2");
> - if (err)
> - goto done;
> - err = packet_read_line(process->out, NULL) != NULL;
> - if (err)
> - goto done;
> -
> - err = packet_writel(process->in, "capability=clean", 
> "capability=smudge", NULL);
> -
> - for (;;) {
> - cap_buf = packet_read_line(process->out, NULL);
> - if (!cap_buf)
> - break;
> - string_list_split_in_place(_list, cap_buf, '=', 1);
> -
> - if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, 
> "capability"))
> - continue;
> -
> - cap_name = cap_list.items[1].string;
> - if (!strcmp(cap_name, "clean")) {
> - entry->supported_capabilities |= CAP_CLEAN;
> - } else if (!strcmp(cap_name, "smudge")) {
> - entry->supported_capabilities |= CAP_SMUDGE;
> - } else {
> - warning(
> - "external filter '%s' requested unsupported 
> filter capability '%s'",
> - cmd, cap_name
> - );
> - }
> -
> - string_list_clear(_list, 0);
> - }
> -
> -done:
> - sigchain_pop(SIGPIPE);
> 
> - return err;
> + return subprocess_handshake(subprocess, "git-filter-", versions, NULL,
> + capabilities,
> + >supported_capabilities);
> }
> 
> static int apply_multi_file_filter(const char *path, const char *src, size_t 
> len,
> diff --git a/pkt-line.c b/pkt-line.c
> index 9d845ecc3..7db911957 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -171,25 +171,6 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
>   return status;
> }
> 
> -int packet_writel(int fd, const char *line, ...)
> -{
> - va_list args;
> - int err;
> - 

Re: [PATCH] sub-process: refactor handshake to common function

2017-07-25 Thread Jonathan Tan
On Tue, 25 Jul 2017 10:38:51 -0400
Ben Peart  wrote:

> Christian Couder has been working on a similar refactoring for the perl 
> versions of very similar helper functions.  They can be found in the 
> following patch series:
> 
> https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/
> 
> In particular:
> 
>  - Patches 02/49 to 08/49 create a Git/Packet.pm module by
>refactoring "t0021/rot13-filter.pl". Functions from this new
>module will be used later in test scripts.
> 
> Some differences I noticed: this patch does both the version and 
> capability negotiation and it lives in "sub-process.h/c" files.  In the 
> perl script patch series, the version negotiation is in the new 
> "packet.pm" module but it does not include the capability negotiation.
> 
> It would make sense to me for these to have the same API and usage 
> behaviors.  Perhaps pull them together into a single patch series so 
> that we can review and keep them in sync?

Thanks for the pointer. These are different languages and different use
cases (mine for production, his for tests), though, so I don't think
consistency in API and behavior is practical.

> > It is unfortunate that the code grew larger because it had to be more
> > generic, but I think this is better than having (in the future) 2 or
> > more separate handshake functions.
> 
> I'm always happy to see an effort to refactor common code into reusable 
> APIs.  Its a good engineering practice that makes it easier to 
> stabilize, extend and maintain the code. The challenge is making sure 
> the common function supports all the requirements of the various callers 
> and that the increase in complexity doesn't outweigh the benefit of 
> centralizing the logic.

Agreed - thanks.

> > +int subprocess_handshake(struct subprocess_entry *entry,
> > +const char *welcome_prefix,
> > +int *versions,
> > +int *chosen_version,
> > +struct subprocess_capability *capabilities,
> > +unsigned int *supported_capabilities) {
> > +   int version_scratch;
> > +   unsigned int capabilities_scratch;
> > +   struct child_process *process = >process;
> > +   int i;
> > +   char *line;
> > +   const char *p;
> > +
> > +   if (!chosen_version)
> > +   chosen_version = _scratch;
> > +   if (!supported_capabilities)
> > +   supported_capabilities = _scratch;
> > +
> > +   sigchain_push(SIGPIPE, SIG_IGN);
> > +
> > +   if (packet_write_fmt_gently(process->in, "%sclient\n",
> > +   welcome_prefix)) {
> > +   error("Could not write client identification");
> > +   goto error;
> > +   }
> > +   for (i = 0; versions[i]; i++) {
> > +   if (packet_write_fmt_gently(process->in, "version=%d\n",
> > +   versions[i])) {
> > +   error("Could not write requested version");
> > +   goto error;
> > +   }
> > +   }
> > +   if (packet_flush_gently(process->in))
> > +   goto error;
> > +
> > +   if (!(line = packet_read_line(process->out, NULL)) ||
> > +   !skip_prefix(line, welcome_prefix, ) ||
> > +   strcmp(p, "server")) {
> > +   error("Unexpected line '%s', expected %sserver",
> > + line ? line : "", welcome_prefix);
> > +   goto error;
> > +   }
> > +   if (!(line = packet_read_line(process->out, NULL)) ||
> > +   !skip_prefix(line, "version=", ) ||
> > +   strtol_i(p, 10, chosen_version)) {
> > +   error("Unexpected line '%s', expected version",
> > + line ? line : "");
> > +   goto error;
> > +   }
> > +   for (i = 0; versions[i]; i++) {
> > +   if (versions[i] == *chosen_version)
> > +   goto version_found;
> > +   }
> 
> This doesn't look like it supports negotiating a common version between 
> the client and server.  If a client supports version 1, 2, and 3 and the 
> server only supports version 1 and 2, which version and capabilities 
> will be used?

In the protocol, the client sends a list of versions it supports (see
the "for" loop above), and expects the server to write the single
version that the server chooses (see the part where we read one single
version line). In your example, the client would write "version=1\n",
"version=2\n", and "version=3\n", and the server would write
"version=2\n" (well, it could write "version=1\n" too). So version 2 (or
1) will be used.


Re: [PATCH] sub-process: refactor handshake to common function

2017-07-25 Thread Ben Peart



On 7/24/2017 5:38 PM, Jonathan Tan wrote:

Refactor, into a common function, the version and capability negotiation
done when invoking a long-running process as a clean or smudge filter.
This will be useful for other Git code that needs to interact similarly
with a long-running process.



Christian Couder has been working on a similar refactoring for the perl 
versions of very similar helper functions.  They can be found in the 
following patch series:


https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/

In particular:

- Patches 02/49 to 08/49 create a Git/Packet.pm module by
  refactoring "t0021/rot13-filter.pl". Functions from this new
  module will be used later in test scripts.

Some differences I noticed: this patch does both the version and 
capability negotiation and it lives in "sub-process.h/c" files.  In the 
perl script patch series, the version negotiation is in the new 
"packet.pm" module but it does not include the capability negotiation.


It would make sense to me for these to have the same API and usage 
behaviors.  Perhaps pull them together into a single patch series so 
that we can review and keep them in sync?



Signed-off-by: Jonathan Tan 
---
This will be useful for anyone implementing functionality like that in
[1].

It is unfortunate that the code grew larger because it had to be more
generic, but I think this is better than having (in the future) 2 or
more separate handshake functions.



I'm always happy to see an effort to refactor common code into reusable 
APIs.  Its a good engineering practice that makes it easier to 
stabilize, extend and maintain the code. The challenge is making sure 
the common function supports all the requirements of the various callers 
and that the increase in complexity doesn't outweigh the benefit of 
centralizing the logic.



I also don't think that the protocol should be permissive - I think
things would be simpler if all protocol errors were fatal errors. As it
is, most parts are permissive, but packet_read_line() already dies
anyway upon a malformed packet, so it may not be too drastic a change to
change this. For reference, the original protocol comes from [2].

[1] https://public-inbox.org/git/20170714132651.170708-2-benpe...@microsoft.com/
[2] edcc858 ("convert: add filter..process option", 2016-10-17)
---
  convert.c | 61 -
  pkt-line.c| 19 ---
  pkt-line.h|  2 --
  sub-process.c | 94 +++
  sub-process.h | 18 ++
  t/t0021-conversion.sh |  2 +-
  6 files changed, 120 insertions(+), 76 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b..ec8ecc2ea 100644
--- a/convert.c
+++ b/convert.c
@@ -512,62 +512,15 @@ static struct hashmap subprocess_map;
  
  static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)

  {
-   int err;
+   static int versions[] = {2, 0};
+   static struct subprocess_capability capabilities[] = {
+   {"clean", CAP_CLEAN}, {"smudge", CAP_SMUDGE}, {NULL, 0}
+   };
struct cmd2process *entry = (struct cmd2process *)subprocess;
-   struct string_list cap_list = STRING_LIST_INIT_NODUP;
-   char *cap_buf;
-   const char *cap_name;
-   struct child_process *process = >process;
-   const char *cmd = subprocess->cmd;
-
-   sigchain_push(SIGPIPE, SIG_IGN);
-
-   err = packet_writel(process->in, "git-filter-client", "version=2", 
NULL);
-   if (err)
-   goto done;
-
-   err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
-   if (err) {
-   error("external filter '%s' does not support filter protocol version 
2", cmd);
-   goto done;
-   }
-   err = strcmp(packet_read_line(process->out, NULL), "version=2");
-   if (err)
-   goto done;
-   err = packet_read_line(process->out, NULL) != NULL;
-   if (err)
-   goto done;
-
-   err = packet_writel(process->in, "capability=clean", 
"capability=smudge", NULL);
-
-   for (;;) {
-   cap_buf = packet_read_line(process->out, NULL);
-   if (!cap_buf)
-   break;
-   string_list_split_in_place(_list, cap_buf, '=', 1);
-
-   if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, 
"capability"))
-   continue;
-
-   cap_name = cap_list.items[1].string;
-   if (!strcmp(cap_name, "clean")) {
-   entry->supported_capabilities |= CAP_CLEAN;
-   } else if (!strcmp(cap_name, "smudge")) {
-   entry->supported_capabilities |= CAP_SMUDGE;
-   } else {
-   warning(
-   "external filter '%s' requested unsupported filter 
capability '%s'",
- 

Re: [PATCH] sub-process: refactor handshake to common function

2017-07-24 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> Refactor, into a common function, the version and capability negotiation
> done when invoking a long-running process as a clean or smudge filter.
> This will be useful for other Git code that needs to interact similarly
> with a long-running process.
>
> Signed-off-by: Jonathan Tan 
> ---

Sounds like a sensible thing to do.

[...]
> --- a/sub-process.h
> +++ b/sub-process.h
> @@ -18,6 +18,11 @@ struct subprocess_entry {
>   struct child_process process;
>  };
>  
> +struct subprocess_capability {
> + const char *name;
> + unsigned int flag;

What does this flag represent?  What values can it have?  A comment
might help.

[...]
> @@ -41,6 +46,19 @@ static inline struct child_process 
> *subprocess_get_child_process(
>   return >process;
>  }
>  
> +/*
> + * Perform the handshake to a long-running process as described in the
> + * gitattributes documentation using the given requested versions and
> + * capabilities. The "versions" and "capabilities" parameters are arrays
> + * terminated by a 0 or blank struct.
> + */
> +int subprocess_handshake(struct subprocess_entry *entry,
> +  const char *welcome_prefix,
> +  int *versions,
> +  int *chosen_version,
> +  struct subprocess_capability *capabilities,
> +  unsigned int *supported_capabilities);
> +

Is there a more precise pointer within the gitattributes documentation
that describes what this does?  I searched for "handshake" and found
nothing.  Is the "Long Running Filter Process" section where I should
have started?

The API docs for this header are currently in
Documentation/technical/api-sub-process.txt.  Perhaps these docs
should also go there, or, even better, the docs in
Documentation/technical/ could move to this header in a preparatory
patch.

Aside (not about this patch): why is the subprocess object called
struct subprocess_entry?  Would it make sense to rename it to struct
subprocess?

Back to this function.  Is this a function I should only call once,
when first launching a subprocess, or can I call it again later?  A
note about suggested usage might help.

[...]
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -697,7 +697,7 @@ test_expect_success PERL 'invalid process filter must 
> fail (and not hang!)' '
>  
>   cp "$TEST_ROOT/test.o" test.r &&
>   test_must_fail git add . 2>git-stderr.log &&
> - grep "does not support filter protocol version" git-stderr.log
> + grep "expected git-filter-server" git-stderr.log

This error message looks a little less friendly than the old one.
Is that okay because it is not supposed to happen in practice?

[...]
> --- a/convert.c
> +++ b/convert.c
> @@ -512,62 +512,15 @@ static struct hashmap subprocess_map;
>  
>  static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>  {
> - int err;
> - struct cmd2process *entry = (struct cmd2process *)subprocess;
[... many lines snipped ...]
> - return err;
> + static int versions[] = {2, 0};
> + static struct subprocess_capability capabilities[] = {
> + {"clean", CAP_CLEAN}, {"smudge", CAP_SMUDGE}, {NULL, 0}
> + };
> + struct cmd2process *entry = (struct cmd2process *)subprocess;
> +
> + return subprocess_handshake(subprocess, "git-filter-", versions, NULL,
> + capabilities,
> + >supported_capabilities);
>  }

API looks nice.

[...]
> --- a/sub-process.c
> +++ b/sub-process.c
> @@ -105,3 +105,97 @@ int subprocess_start(struct hashmap *hashmap, struct 
> subprocess_entry *entry, co

Implementation looks sane from a quick glance.

Thanks,
Jonathan