Something else I just noticed, while working on TIKA-478.
Currently in XHTMLContentHandler.startElement and endElement, the "local" parameter is used to check for tag names in the INDENT and ENDLINE sets.
But these sets have lower-case tag names (e.g. "p", not "P") and the "local" parameter is always the upper-case version, from the XHTMLDowngradeHandler.
I switched it to use the "name" parameter, since that's coming in as lower-case, but I'm not sure if that's a real fix; it seems like the tag names in the sets should be upper-cased instead.
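
To illustrate the mismatch, here is a minimal standalone sketch. It is not the actual Tika code: the class name, the set contents, and the isEndline helper are all made up for the example. It only shows that a set of lower-case names never matches an upper-case "local" value unless the case is normalized first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TagCaseSketch {
    // Hypothetical stand-in for the ENDLINE set; the real contents may differ.
    static final Set<String> ENDLINE = new HashSet<>(Arrays.asList("p", "div", "li"));

    // Hypothetical helper: normalize case before the lookup so that
    // either "P" or "p" matches the lower-case entries.
    static boolean isEndline(String tagName) {
        return ENDLINE.contains(tagName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(ENDLINE.contains("P")); // false: upper-case "local" value misses
        System.out.println(ENDLINE.contains("p")); // true: lower-case "name" value matches
        System.out.println(isEndline("P"));        // true: normalized lookup matches
    }
}
```

Normalizing at lookup time is just one option; upper-casing the entries in the sets, or checking "name" instead of "local", would have the same effect for this case.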
As an aside, I'm wondering about upper-casing all tag names versus lower-casing them, as I thought that lower case was the XHTML 1.0 standard.
Thanks,

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
