On Tue, 09 Jan 2007 02:15:38 +0100, Sam Ruby <[EMAIL PROTECTED]> wrote: >>>> Currently html5lib maps rather well to the specificaction which >>>> improves the readability of the code a lot (imho). I'd like to know >>>> at how many changes we're looking and how that impacts the code. >>> >>> That's why I provided a comprehensive patch: >>> >>> http://intertwingly.net/stories/2007/01/08/xhtml5.diff >> Instead of using string.ascii_uppercase you should use our internal >> asciiUppercase. Also, instead of using a dict for translating can't you >> just provide two strings? I'd think that would be faster. > > I don't understand the suggestion to use the internal asciiUppercase - > with my patch, this constant is no longer used. And my constant was > defined in the src/constants.py file...
But you haven't removed the constant either. But as you later note it's still used somewhere else... > I also don't understand the suggestion to "just provide two strings". > That's not how Python's unicode.translate() method works. Oh, sorry about that. I vaguely recalled translate() from http://diveintopython.org/performance_tuning/dictionary_lookups.html and apparently it works slightly different from what I remembered. Should we use string.maketrans? >> The normalizeToken method should be inlined as you only want to do that >> from a single place anyway. And EndTag should use the translate method >> and not .lower(). > > While it is true that normalizeToken is only called from one place, this > method can't be inlined as the liberal XML parser subclass needs to > override this behavior. Hmm, not so nice. For a large page that's a lot of additional method calls. Can you redo it a bit making sure we don't make that call for all non tag tokens at least, such as characters. >> I suppose these changes also remove the need for asciiLowercase (not >> asciiLower that you introduce) as defined in constants.py. > > asciiLowercase is still used in the portion of the logic dealing with > DocTypes. But having two similarly named constants with quite different > purposes is confusing, and clearly *that* should be changed. Yeah. >> Anyway, with these nits (open for debate) I think I'm ok with doing >> this assuming you will update the tests as well (or someone else will). >> I'd like to have a liberal XML parser too one day and working on an >> experimental implementation of one can't hurt I suppose :-) > > In case you didn't notice it, here are the tests: > > http://intertwingly.net/stories/2007/01/08/tests/test_xhtml.py I noticed those, but you also had some comments on updating tests. >> If xhtml5parser.py is the only other file I would be fine with adding >> that to src/ as liberalxmlparser.py. Bit of a lengthty name, but it >> more accurately reflects what it is. > > I'm not worried about the the name. That name is fine. > > I'll look into committing this tomorrow, with your proposed module name, > with the unit tests, and with some subset of these nits addressed. I'll > add comments at the top of the module indicating that this support is > experimental and subject to change and even removal at any time. Ok, cool. -- Anne van Kesteren <http://annevankesteren.nl/> <http://www.opera.com/> _______________________________________________ implementors mailing list [email protected] http://lists.whatwg.org/listinfo.cgi/implementors-whatwg.org
