Hi Jukka,

On Aug 12, 2010, at 12:43am, Jukka Zitting wrote:

Hi,

On Wed, Aug 11, 2010 at 4:53 AM, Ken Krugler
<[email protected]> wrote:
But before I dive in here and start filing issues/hacking on the code, I'm
wondering if somebody (OK, Jukka) can provide some color commentary.

The rationale behind the lazy startup in XHTMLContentHandler is that
many parsers don't yet have the document title metadata available when
startDocument() is called. Instead of outputting an empty <title/>
element, it's better to delay the startup to as late as possible.
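A minimal, JDK-only sketch of the lazy-startup idea. It is not Tika's XHTMLContentHandler: the class LazyXhtmlSketch, its element() method, and the Map standing in for Tika's Metadata are all hypothetical. The point is that <html><head><title> is written only when the first body content arrives, so a title that becomes known after startDocument() still ends up in <title>.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch of lazy startup; not the real Tika class.
public class LazyXhtmlSketch {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";
    private static final AttributesImpl NO_ATTS = new AttributesImpl();

    private final ContentHandler out;
    private final Map<String, String> metadata; // assumed stand-in for Tika's Metadata
    private boolean started = false;

    public LazyXhtmlSketch(ContentHandler out, Map<String, String> metadata) {
        this.out = out;
        this.metadata = metadata;
    }

    public void startDocument() {
        // Deliberately emits nothing: the title may not be known yet.
    }

    // Writes the document start and <head> the first time body content shows up.
    private void lazyStart() throws SAXException {
        if (started) return;
        started = true;
        out.startDocument();
        out.startPrefixMapping("", XHTML);
        out.startElement(XHTML, "html", "html", NO_ATTS);
        out.startElement(XHTML, "head", "head", NO_ATTS);
        out.startElement(XHTML, "title", "title", NO_ATTS);
        String title = metadata.getOrDefault("title", "");
        out.characters(title.toCharArray(), 0, title.length());
        out.endElement(XHTML, "title", "title");
        out.endElement(XHTML, "head", "head");
        out.startElement(XHTML, "body", "body", NO_ATTS);
    }

    // Body-level content only, mirroring the start/endDocument contract.
    public void element(String name, String text) throws SAXException {
        lazyStart();
        out.startElement(XHTML, name, name, NO_ATTS);
        out.characters(text.toCharArray(), 0, text.length());
        out.endElement(XHTML, name, name);
    }

    public void endDocument() throws SAXException {
        lazyStart(); // still produce a well-formed document if there was no body content
        out.endElement(XHTML, "body", "body");
        out.endElement(XHTML, "html", "html");
        out.endPrefixMapping("");
        out.endDocument();
    }

    // Serializes through the JDK identity TransformerHandler and returns the XML text.
    public static String demo() {
        try {
            SAXTransformerFactory f = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler th = f.newTransformerHandler();
            th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            th.setResult(new StreamResult(sw));

            Map<String, String> md = new HashMap<>();
            LazyXhtmlSketch x = new LazyXhtmlSketch(th, md);
            x.startDocument();
            md.put("title", "Late Title"); // title discovered only after startDocument()
            x.element("p", "Hello");
            x.endDocument();
            return sw.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because nothing is written until element() or endDocument() is called, the late title still lands in <head/> instead of an empty <title/>.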

Now, more generally the contract of XHTMLContentHandler (see
start/endDocument javadocs) is that the parser that feeds it should
only output content that goes *inside* the <body/> element. Feeding a
full <html/> tree to an XHTMLContentHandler will cause trouble.

If you have a parser that wants to output a full <html/> tree along
with extra <meta/> entries inside the <head/> element, you can always
directly use the ContentHandler instance given as an argument to the
parse() method.
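A hypothetical sketch of that alternative: a parser writing the whole <html/> tree, including <meta/> entries in <head/>, straight to the SAX ContentHandler it was handed, with no XHTMLContentHandler wrapper. writeFullDocument() and its parameters are illustrative names, not Tika API; the demo uses the JDK's identity TransformerHandler as the target handler.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class FullTreeSketch {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: what a parser might do with the raw handler it was given.
    static void writeFullDocument(ContentHandler handler, String title,
                                  Map<String, String> metas, String bodyText) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", XHTML);
        handler.startElement(XHTML, "html", "html", none);
        handler.startElement(XHTML, "head", "head", none);
        // One <meta name=".." content=".."/> per entry, all inside <head/>.
        for (Map.Entry<String, String> e : metas.entrySet()) {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "name", "name", "CDATA", e.getKey());
            atts.addAttribute("", "content", "content", "CDATA", e.getValue());
            handler.startElement(XHTML, "meta", "meta", atts);
            handler.endElement(XHTML, "meta", "meta");
        }
        handler.startElement(XHTML, "title", "title", none);
        handler.characters(title.toCharArray(), 0, title.length());
        handler.endElement(XHTML, "title", "title");
        handler.endElement(XHTML, "head", "head");
        handler.startElement(XHTML, "body", "body", none);
        handler.startElement(XHTML, "p", "p", none);
        handler.characters(bodyText.toCharArray(), 0, bodyText.length());
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();
    }

    public static String demo() {
        try {
            SAXTransformerFactory f = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler th = f.newTransformerHandler();
            th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            th.setResult(new StreamResult(sw));
            Map<String, String> metas = new LinkedHashMap<>();
            metas.put("robots", "noindex");
            writeFullDocument(th, "Example", metas, "Body text");
            return sw.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off is that the parser now owns well-formedness of the whole tree, which is exactly what XHTMLContentHandler would otherwise handle for it.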

Thanks for the input on this. I'll take a look at filing an issue & generating a patch today.

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
