On Mon, 22 Nov 2021 23:23:35 GMT, Jonathan Gibbons <j...@openjdk.org> wrote:

>> I'm going to answer both of your comments, this and that one above:
>>
>>> I think you can simplify the code by reducing `CoarseParser` down to a
>>> language-specific regular expression with "standard" named groups for the
>>> payload and markup.
>>
>> Regular expressions with named groups were the initial design. However, I
>> changed it to `CoarseParser` halfway through the implementation, when saw
>> how bulky the named regex were becoming. As you also note, "end-of-line
>> comments" and "comment lines" differ. I couldn't quickly come up with a
>> regex that accounts for both of them.
>
> You may be somewhat missing the point I was trying to make.
>
> You have two impls of `CoarseParser`, both of which contain a regular
> expression for the parsing, hidden inside their private matcher field.
>
> The only other functionality of `CoarseParser` is `payloadEnd` and
> `markupStart`.
>
> My suggestion is to start by updating each of the regex with named groups for
> the payload and markup parts of the line, such that you can derive
> `payloadEnd` and `markupStart` from the appropriate named groups.
>
> At that point, the only thing unique about the impls of `CoarseParser` is
> their regex, and that regex could become a property of the `Language` object.
>
>> As you also note, "end-of-line comments" and "comment lines" differ. I
>> couldn't quickly come up with a regex that accounts for both of them.
>
> To be clear, I am _not_ suggesting a single regex. I am suggesting a regex
> per supported language.

I abstracted out the mechanics behind the `CoarseParser` precisely because I
couldn't come up with a simple way to derive `payloadEnd` and `markupStart`
using only groups, be they named or otherwise. My regex fu is not strong
enough. If you could find a way that does not look too ugly and passes the
added `TestLangProperties` test, be my guest.

-------------

PR: https://git.openjdk.java.net/jdk/pull/6397