Not yet, but if I were, it wouldn't be a subclass (it's not necessary). But the key trouble implementing this is that I'd have to reimplement the regexp matching logic entirely from scratch if I want remotely reasonable runtime complexity. As for why:
- If you ignore backreferences (which IIUC makes JS regexps go from regular to context-sensitive recognizers), it's *possible* to implement partial matching using regexp rewriting, but only because JS doesn't have any other arcane enough features to make it infeasible. - There is no possible way to extract duplicate matches without rewriting the regexp entirely and using `regexp.exec`, and the complexity of the logic to do this generally I suspect is NP-complete, maybe PSPACE-complete, and in either case, certainly infeasible when backreferences enter the picture. - It's technically possible to refactor a string for streaming, but I then lose the ability to discern a match from a non-match. I also have the same issue as per above WRT duplicate matches and partial matching with complexity issues. - If you were to combine the three, rewriting the regexp generally to support streaming matches, including duplicates, I suspect would likely be PSPACE-complete because it seems *very* similar to the first problem listed here [1]. \* Backreferences bring the grammatical complexity of JS regexps from I'm pretty sure regular to context-sensitive. Or in other words, either you control the raw matching logic yourself, or the polyfill runtime complexity could get absurd for this proposal. And implementing a pushdown state machine-based regexp engine + parser in JS isn't exactly something I'm willing to prototype for a strawman. [1]: https://en.wikipedia.org/wiki/PSPACE-complete#Regular_expressions_and_automata ----- Isiah Meadows [email protected] www.isiahmeadows.com On Mon, Jul 30, 2018 at 2:44 PM, Jordan Harband <[email protected]> wrote: > Have you tried to implement this as a RegExp subclass, overriding all the > necessary extension points? > > On Mon, Jul 30, 2018 at 11:39 AM, Isiah Meadows <[email protected]> > wrote: >> >> There's two things I've found that need suspendable matching: >> >> 1. Partially matching against a string, which is useful with >> interactive form validation and such. >> 2. Pattern matching and replacement over a character stream, which is >> useful for things like matching against files without loading the >> entire thing into memory or easier filtering of requests. >> >> Also, it'd be nice if there was a facility to get *all* matches, >> including duplicate group matches. This is often useful for simple >> parsing, where if such support existed, you could just use a Kleene >> star instead of the standard `exec` loops (which admittedly get old). >> >> And finally, we could avoid setting regexp globals here. That would >> speed up the matcher quite a bit. >> >> So, here's my proposal: >> >> - `regexp.matcher() -> matcher` - Create a streaming regexp matcher. >> - `matcher.consume(codePoint, charSize?) -> result | undefined` - >> Consume a Unicode code point or `-1` if no more characters exist, and >> return a match result, `undefined` if no match occurred. `charSize` is >> the number of bytes represented by `codePoint` (default: 1-2 if `/u` >> is set, 1 otherwise), so it can work with other encodings flexibly. >> - `matcher.nextPossibleStart -> number` - The next possible start the >> matcher could have, for more effective buffering and stream >> management. This is implementation-defined, but it *must* be be `-1` >> after the matcher completes, and it *must* be within [0, N) otherwise, >> where N is the next returned match. >> - `result.group -> string | number | undefined` - Return the group >> index/name of the current match, or `undefined` if it's just issuing a >> match of the global regexp. >> - `result.start -> number` - Return the matched value's start index. >> - `result.end -> number` - Return the matched value's end index. >> - This does *not* modify any globals or regexp instance members. It >> only reads `regexp.lastIndex` on creation. (It doesn't operate on >> strings, so it shouldn't return any it doesn't already have.) >> >> Most RegExp methods could similarly be built using this as a base: if >> they work on strings, they can iterate their code points. >> >> As for the various concerns: >> >> - Partial matching is just iterating a string's character codes and >> seeing if the matcher ever returned non-`undefined`. >> - Streaming pattern matching is pretty obvious from just reading the API. >> - Getting all matches is just iterating the string and returning an >> object with all the groups + strings it matched. >> >> So WDYT? >> >> /cc Mathias Bynens, since I know you're involved in this kind of >> text-heavy stuff. >> >> ----- >> >> Isiah Meadows >> [email protected] >> www.isiahmeadows.com >> _______________________________________________ >> es-discuss mailing list >> [email protected] >> https://mail.mozilla.org/listinfo/es-discuss > > _______________________________________________ es-discuss mailing list [email protected] https://mail.mozilla.org/listinfo/es-discuss

