Something else I just noticed while working on TIKA-478.

Currently, XHTMLContentHandler.startElement() and endElement() use the "local" parameter to check for tag names in the INDENT and ENDLINE sets.

But these sets contain lower-case tag names (e.g. "p", not "P"), whereas the "local" parameter always arrives in upper case from XHTMLDowngradeHandler, so the lookups never match.

I switched it to use the "name" parameter, since that arrives in lower case, but I'm not sure that's a real fix; it seems like the tag names in these sets should be upper-cased.
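A minimal, hypothetical sketch of the mismatch described above, in Java since Tika is written in Java. The class name `TagCaseSketch`, its methods, and the set contents are assumptions for illustration; this is not Tika's actual XHTMLContentHandler code. It shows that an exact lookup of an upper-case local name in a lower-case set fails, while normalizing the name with `toLowerCase(Locale.ROOT)` before the lookup matches regardless of which case the upstream handler uses.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch (not Tika's code): reproduces the case mismatch
 * between a lower-case tag-name set and an upper-case local name.
 */
public class TagCaseSketch {

    // Lower-case tag names, as described for the ENDLINE set.
    // The specific entries here are illustrative assumptions.
    static final Set<String> ENDLINE = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("p", "div", "ul", "ol", "table")));

    // Exact-match lookup: what happens when the raw local name is used.
    static boolean isEndlineExact(String localName) {
        return ENDLINE.contains(localName);
    }

    // Case-insensitive lookup: normalize with a fixed locale so the result
    // does not depend on the JVM's default locale.
    static boolean isEndlineNormalized(String localName) {
        return ENDLINE.contains(localName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // An upper-case local name, as reportedly passed in from upstream.
        String local = "P";
        System.out.println(isEndlineExact(local));      // false: "P" is not in the set
        System.out.println(isEndlineNormalized(local)); // true: "P" becomes "p"
    }
}
```

Normalizing at lookup time sidesteps the question of which case the sets should use, though it does not settle whether the downgrade handler should emit upper- or lower-case names in the first place.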

As an aside, I'm wondering about upper-casing all tag names versus lower-casing them, as I thought lower case was the XHTML 1.0 standard.

Thanks,

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



