Hi Junio,

On Sun, 14 Aug 2016, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schinde...@gmx.de> writes:
> > -   test=$(echo "$cmd" | sed -e 's|[/ ][/ ]*|_|g')
> > +   test=$(echo "$cmd" | sed -e 's|[/ ][/ ]*|_|g' -e 'y/>/_/')
> The existing sed scriptlet says "we cannot have slash and do not
> want to have space in filename, so we squash runs of them to a
> single underscore".  If you have more characters that you do not
> want, you should add that to the existing set instead.

No, we cannot do that. I even mentioned it in my commit message why:

        We have to take particular care not to confound the existing
        conversion of unwanted characters to underscores with the new
        substitution of '>': the existing conversion chose to collapse
        runs of multiple unwanted characters into a single underscore. If
        we allowed '>' to be collapsed, too, the file name generated from
        the command "diff [...]=-- [...]" would be identical to the one
        generated from "diff [...]=--> [...]".

And as there is exactly this case (two command-lines, differing only in
the '>' character), doing what you suggested would *break* the test since
it would now look at the wrong file.

I know that this is so because my first iteration of the patch did exactly
what you suggested.

> While you are at it, it may be sensible to add a colon to that set, too,
> no?
> Something like this, perhaps?
> > -   test=$(echo "$cmd" | sed -e 's|[/ ][/ ]*|_|g')
> > +   test=$(echo "$cmd" | sed -e 's|[/ <:>][/ <:>]*|_|g')

Maybe. But what other characters are missing? Those are not the only ones
illegal on Windows: [/ \0-\x1f"*:<>?\\|] would be a more complete version
of what you suggested. Except that it is not a valid basic regex.

But is that really necessary?

I think not, for the following reasons:

- my patch was specifically designed to stop my CI from pestering me about
  a totally broken revision that cannot even be checked out (and causes
  subsequent problems because of it),

- as such, my patch was meant not to be an all-encompassing solution to
  the problem of filenames that are illegal on Windows, but really a tiny
  patch that you could apply as quickly as possible so that my CI jobs
  would stop pestering me (which they did not, because `pu` is still

- I even meant this little patch to be so small that it can be easily
  squashed into Jacob's patch,

- I do not want to complicate regular expressions unless *really* needed
  (and you have to admit that we do not need to address any more
  characters than the '>' *right now*), as

        - regular expressions are not exactly an easy meal, so it makes
          sense to keep them as simple as possible both for contributors'
          as well as for maintainers' sake, and

        - when trying to come up with a super-complete solution, it is
          really easy not only to spend waaaaay too much time on it, but
          also to introduce bugs that are not spotted for a loooong time
          because nothing actually exercises the newly introduced code, and

        - If It Ain't Broke, Don't Fix It.

- the broader solution must come separately, and not as a mere add-on to
  one test case: we really need to ensure that such file names do not
  enter Git's source code.

I will send my proposal to address the larger issue in a moment, in the
meantime I *beg* you to add this here patch as a quick fix to my CI woes,
either by squashing it into the indicated commit, or by appending it to
the topic branch. I do not care which one, as long as `pu` gets fixed.


To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to