On Sat, 2021-06-12 at 15:38 -0400, Graydon wrote:
> This test is meant to test only that no words have been lost or
> re-ordered; that the transformation is semantically correct is out of
> scope for it.

Some random witterings...

So, i'd probably consider

(1) make a sequence of words from document A.

Now, if you really hate your CPU :) you could transform A.seq into a
regular expression, w0.*w1.*w2... and match it against the extracted
string value of A.

Starting with ^.*?w0 might reduce the run-time in practice, but the
others all need arbitrary backtracking in case the transformation
introduced one or more words that occur at that point in the document,
so you have

    w0 w1 w2 w3 w1 w2 w3 w4

to match against

    w0 w1 w2 w3 w3

This could also be written with a recursive function and a helper; the
helper would find the longest match at the current position, and if
that's empty the function returns "nope" and you have to back-track.

Doug Lenat i think has written a book around parsing algorithms, as has
Anne Brüggemann-Klein; Michael Sperberg-McQueen gave a paper at Balisage
about applications to Schema Validation (or at Extreme Markup).

Anne's abstraction, whose name i can't remember (sorry), is most
promising, since your problem can be recast as equivalent to matching
XML Schema grammars to input documents, with the unique particle
attribution restriction lifted; RelaxNG does this with a hedge
automaton, and that's another approach.

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
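
For illustration only, here is a minimal Python sketch of the regular-expression idea in the message above. It is not from the original post. The function names (`words`, `order_regex`, `words_kept_in_order`), the `\w+` tokenizer, the word-boundary anchors and the lazy `.*?` between words are all assumptions made for the example. It also assumes the string being matched is the text value of the transformed output.

```python
import re


def words(text):
    """Split text into word tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"\w+", text)


def order_regex(seq):
    """Build ^.*?\\bw0\\b.*?\\bw1\\b... from a sequence of words."""
    # Word boundaries stop "cat" from matching inside "concatenate".
    parts = [r"\b" + re.escape(w) + r"\b" for w in seq]
    # DOTALL lets the gaps between words span line breaks.
    return re.compile(r"^.*?" + r".*?".join(parts), re.DOTALL)


def words_kept_in_order(source_text, result_text):
    """True if every word of source_text occurs in result_text,
    in the same order, possibly with other words in between."""
    return order_regex(words(source_text)).search(result_text) is not None


if __name__ == "__main__":
    before = "w0 w1 w2 w3 w4"
    after = "w0 w1 w2 w3 w1 w2 w3 w4"  # transformation repeated some words
    print(words_kept_in_order(before, after))
    print(words_kept_in_order(before, "w0 w2 w1 w3 w4"))
```

As the message warns, a backtracking regex engine can take a very long time on large documents when the match fails, so this is probably only practical for small inputs.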