On Nov 10, 5:56 pm, "Edward K. Ream" <[email protected]> wrote:
> > Here is the test case, boiled down to its essence from data.html::
> >
> >     <td><a href="1">Standards</a> <a href="2">Fees</a></td>
>
> It should be possible to extend the first <a> element so that it
> contains the troublesome space.

This worked. The new code is yet another scanner, this time at the end
of startsHelper. All these scanners are similar, but at present there
seems to be no way to use common code. It's not a big deal, imo.

As of rev 4772 all unit tests pass, and data.html imports "correctly"
if not "well": tags are placed in odd locations.

The reason for this unsightly "perfect" import is that
skipToMatchingTag thinks there is a tag mismatch. Supposedly, this is a
"user error", which, however, does not actually spoil the "perfect"
import. I have my doubts that there really is a user error, and even if
there were unmatched tags, skipToMatchingTag should do a better job of
error recovery. That's tomorrow's project. That might be the end of
this saga.

Edward

P.S. The html importer now uses a more rigorous version of
filterTokens. It uses a two-pass algorithm.

The first pass inserts newlines before </ and after >, which is not
*quite* right because the '>' need not terminate an open tag. But
that's likely to be a nit that will never cause a problem.

The second pass collapses adjacent ws tokens into a single blank, and
all runs of newlines into a single newline. These operations are
separate, so inserting a newline in the first pass can *not* affect the
final ws tokens, and the presence or absence of ws tokens will have no
effect on the final newline tokens.

Because filterTokens is *almost* perfect, it will be very rare for the
perfect import checks to give false positives (falsely claiming to have
imported the file perfectly) and it should be impossible for the
perfect import checks to give false negatives (falsely reporting import
errors).

If the (rare!) false positives ever become a problem I can "perfect"
the filterTokens code, but that would be a bit expensive, so let's see
how things work for now.

EKR
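
For readers who want to see the two-pass idea concretely, here is a
rough sketch. It is not Leo's filterTokens: the token kinds, the regex
tokenizer and every function name below are made up for illustration,
and the second pass is only one possible reading of "separate", namely
that each gap between non-whitespace tokens yields at most one blank
(if it held any ws) and at most one newline (if it held any newline),
decided independently and emitted in that fixed order.

```python
import re

# Hypothetical token kinds for this sketch only; Leo's importer has its own.
TOKEN_RE = re.compile(
    r'(?P<nl>\n)'              # a single newline
    r'|(?P<ws>[ \t\r\f\v]+)'   # a run of non-newline whitespace
    r'|(?P<close></)'          # the start of a closing tag
    r'|(?P<gt>>)'              # the end of a tag (or a stray '>')
    r'|(?P<other>[^\s<>]+|<)'  # anything else
)

def tokenize(s):
    """Split s into (kind, value) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(s)]

def insert_newlines(tokens):
    """Pass 1: add a newline token before every '</' and after every '>'."""
    result = []
    for kind, val in tokens:
        if kind == 'close':
            result.append(('nl', '\n'))
        result.append((kind, val))
        if kind == 'gt':
            result.append(('nl', '\n'))
    return result

def collapse_whitespace(tokens):
    """Pass 2 (assumed reading): per gap, at most one blank and one newline."""
    result = []
    saw_ws = saw_nl = False

    def flush():
        # The blank and the newline are decided independently of each other.
        if saw_ws:
            result.append(('ws', ' '))
        if saw_nl:
            result.append(('nl', '\n'))

    for kind, val in tokens:
        if kind == 'ws':
            saw_ws = True
        elif kind == 'nl':
            saw_nl = True
        else:
            flush()
            saw_ws = saw_nl = False
            result.append((kind, val))
    flush()  # whitespace trailing the last significant token
    return result

def filter_tokens(s):
    """Run both passes over the text s."""
    return collapse_whitespace(insert_newlines(tokenize(s)))
```

With this sketch, two versions of the test case that differ only in
line breaks and in the amount of blank space filter to the same token
list, while dropping the blank between the two <a> elements entirely
still shows up as a difference.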
