On Tue, Jul 31, 2018 at 8:50 AM Jeff King <p...@peff.net> wrote:
> On Mon, Jul 30, 2018 at 05:38:06PM -0400, Eric Sunshine wrote:
> > I considered that, but it doesn't handle nested here-docs, which we
> > actually have in the test suite. For instance, from t9300-fast-import:
> > [...]
> > Nesting could be handled easily enough either by stashing away the
> > opening tag and matching against it later _or_ by doing recursive
> > here-doc folding; however, 'sed' isn't a proper programming language
> > and can't be coerced into doing either of those. (And, it was tricky
> > enough just getting it to handle the nested case with a limited set of
> > recognized tag names, without having to explicitly handle every
> > combination of those names nested inside one another.)
>
> I hesitate to make any suggestion here, as I think we may have passed
> a point of useful cost/benefit in sinking more time into this script.
> But...is switching to awk or perl an option? Our test suite already
> depends on having a vanilla perl, so I don't think it would be a new
> dependency. And it would give you actual data structures.

It would, and I did consider it; however, I was very concerned about
startup cost (launch time) with heavyweight perl, considering that it
would have to be run for _every_ test. With 13000+ tests, that cost
was a very real concern, especially for Windows users, but even for
macOS users (such as myself, for whom the full test suite already
probably takes close to 30 minutes to run, even on a RAM drive). So, I
wanted something very lightweight (and deliberately used that word in
the commit message), and 'sed' seemed the lightest-weight of the
bunch.

'awk' might be about as lightweight as 'sed', and it may even be
possible to coerce it into handling the task (since the linter's job
is primarily just a bunch of regex matching with very little
"manipulating"). v1 of the linter was somewhat simpler and didn't deal
with these more complex cases, such as nested here-docs. v1 also did
rather more "manipulating" of the script since the result was meant to
be run by the shell. When it came time to implement v2, which detects
broken &&-chains itself by textual inspection, most of the
functionality (coming from v1) was already implemented in 'sed', so
'awk' never really came up as a candidate; rewriting the script
from scratch in 'awk' didn't seem like a good idea. (And, at the time
v2 was started, I didn't know that these more complex cases would
arise.) So, 'awk' might be a viable alternative, and perhaps I'll take
a stab at it for fun at some point (or not), but I don't think there's
a pressing need right now.
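For illustration only, here is a rough sketch of the "stash away the opening tag and match against it later" approach mentioned above. It is written in Python rather than awk or perl. It is not how the linter actually works, and the function name `fold_heredocs` is hypothetical. The tag recognizer is deliberately simplified. It assumes tags are plain identifiers, optionally quoted or backslash-escaped, and it handles only the first here-doc opener on a line. Because the exact opening tag is remembered, everything in the body is skipped until that tag reappears, including inner lines that merely look like openers (as in t9300). No fixed list of tag names is needed.

```python
import re

# Simplified here-doc opener: <<EOF, <<-EOF, <<'EOF', <<"EOF", or <<\EOF.
# Group 1 is the optional "-" (terminator may be tab-indented); group 2 is the tag.
HEREDOC_OPEN = re.compile(r"<<(-?)\s*\\?['\"]?([A-Za-z_][A-Za-z0-9_]*)['\"]?")


def fold_heredocs(lines):
    """Drop here-doc bodies and terminators, keeping the opener line.

    The opening tag is stashed and compared against later lines, so any
    line inside the body (even one that looks like another here-doc
    opener) is ignored until the exact closing tag appears.
    Assumption: at most one here-doc opener per line is recognized.
    """
    out = []
    tag = None          # closing tag we are waiting for, or None
    strip_tabs = False  # True for <<- here-docs
    for line in lines:
        if tag is not None:
            candidate = line.lstrip("\t") if strip_tabs else line
            if candidate == tag:
                tag = None  # body ends; the terminator line is dropped too
            continue        # swallow body lines
        m = HEREDOC_OPEN.search(line)
        if m:
            strip_tabs = m.group(1) == "-"
            tag = m.group(2)
        out.append(line)
    return out


script = [
    "cat >input <<-INPUT_END &&",
    "\tcommit refs/heads/main",
    "\tdata <<COMMIT",
    "\tmessage",
    "\tCOMMIT",
    "\tINPUT_END",
    "git fast-import <input",
]
print(fold_heredocs(script))
# ['cat >input <<-INPUT_END &&', 'git fast-import <input']
```

In this example, the inner `data <<COMMIT` and `COMMIT` lines are never mistaken for the end of the outer here-doc, because only `INPUT_END` can close it.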
