(It'd have to be a subclass for `RegExp.prototype.whatever.call` to do the right thing for your alternative regex)
On Mon, Jul 30, 2018 at 12:36 PM, Isiah Meadows <[email protected]> wrote: > Not yet, but if I were, it wouldn't be a subclass (it's not > necessary). But the key trouble implementing this is that I'd have to > reimplement the regexp matching logic entirely from scratch if I want > remotely reasonable runtime complexity. As for why: > > - If you ignore backreferences (which IIUC makes JS regexps go from > regular to context-sensitive recognizers), it's *possible* to > implement partial matching using regexp rewriting, but only because JS > doesn't have any other arcane enough features to make it infeasible. > - There is no possible way to extract duplicate matches without > rewriting the regexp entirely and using `regexp.exec`, and the > complexity of the logic to do this generally I suspect is NP-complete, > maybe PSPACE-complete, and in either case, certainly infeasible when > backreferences enter the picture. > - It's technically possible to refactor a string for streaming, but I > then lose the ability to discern a match from a non-match. I also have > the same issue as per above WRT duplicate matches and partial matching > with complexity issues. > - If you were to combine the three, rewriting the regexp generally to > support streaming matches, including duplicates, I suspect would > likely be PSPACE-complete because it seems *very* similar to the first > problem listed here [1]. > > \* Backreferences bring the grammatical complexity of JS regexps from > I'm pretty sure regular to context-sensitive. > > Or in other words, either you control the raw matching logic yourself, > or the polyfill runtime complexity could get absurd for this proposal. > And implementing a pushdown state machine-based regexp engine + parser > in JS isn't exactly something I'm willing to prototype for a strawman. > > [1]: https://en.wikipedia.org/wiki/PSPACE-complete#Regular_ > expressions_and_automata > > ----- > > Isiah Meadows > [email protected] > www.isiahmeadows.com > > > On Mon, Jul 30, 2018 at 2:44 PM, Jordan Harband <[email protected]> wrote: > > Have you tried to implement this as a RegExp subclass, overriding all the > > necessary extension points? > > > > On Mon, Jul 30, 2018 at 11:39 AM, Isiah Meadows <[email protected]> > > wrote: > >> > >> There's two things I've found that need suspendable matching: > >> > >> 1. Partially matching against a string, which is useful with > >> interactive form validation and such. > >> 2. Pattern matching and replacement over a character stream, which is > >> useful for things like matching against files without loading the > >> entire thing into memory or easier filtering of requests. > >> > >> Also, it'd be nice if there was a facility to get *all* matches, > >> including duplicate group matches. This is often useful for simple > >> parsing, where if such support existed, you could just use a Kleene > >> star instead of the standard `exec` loops (which admittedly get old). > >> > >> And finally, we could avoid setting regexp globals here. That would > >> speed up the matcher quite a bit. > >> > >> So, here's my proposal: > >> > >> - `regexp.matcher() -> matcher` - Create a streaming regexp matcher. > >> - `matcher.consume(codePoint, charSize?) -> result | undefined` - > >> Consume a Unicode code point or `-1` if no more characters exist, and > >> return a match result, `undefined` if no match occurred. `charSize` is > >> the number of bytes represented by `codePoint` (default: 1-2 if `/u` > >> is set, 1 otherwise), so it can work with other encodings flexibly. > >> - `matcher.nextPossibleStart -> number` - The next possible start the > >> matcher could have, for more effective buffering and stream > >> management. This is implementation-defined, but it *must* be be `-1` > >> after the matcher completes, and it *must* be within [0, N) otherwise, > >> where N is the next returned match. > >> - `result.group -> string | number | undefined` - Return the group > >> index/name of the current match, or `undefined` if it's just issuing a > >> match of the global regexp. > >> - `result.start -> number` - Return the matched value's start index. > >> - `result.end -> number` - Return the matched value's end index. > >> - This does *not* modify any globals or regexp instance members. It > >> only reads `regexp.lastIndex` on creation. (It doesn't operate on > >> strings, so it shouldn't return any it doesn't already have.) > >> > >> Most RegExp methods could similarly be built using this as a base: if > >> they work on strings, they can iterate their code points. > >> > >> As for the various concerns: > >> > >> - Partial matching is just iterating a string's character codes and > >> seeing if the matcher ever returned non-`undefined`. > >> - Streaming pattern matching is pretty obvious from just reading the > API. > >> - Getting all matches is just iterating the string and returning an > >> object with all the groups + strings it matched. > >> > >> So WDYT? > >> > >> /cc Mathias Bynens, since I know you're involved in this kind of > >> text-heavy stuff. > >> > >> ----- > >> > >> Isiah Meadows > >> [email protected] > >> www.isiahmeadows.com > >> _______________________________________________ > >> es-discuss mailing list > >> [email protected] > >> https://mail.mozilla.org/listinfo/es-discuss > > > > >
_______________________________________________ es-discuss mailing list [email protected] https://mail.mozilla.org/listinfo/es-discuss

