On Aug 13, 2010, at 2:06am, Andrzej Bialecki wrote:
On 2010-08-13 10:34, Jukka Zitting wrote:
Hi,
On Thu, Aug 12, 2010 at 8:27 PM, Ken Krugler
<[email protected]> wrote:
I think I'm missing something - which javadocs are you referring
to here?
What I see for startDocument() is:
/**
* Starts an XHTML document by setting up the namespace mappings.
* The standard XHTML prefix is generated lazily when the first
* element is started.
*/
I guess the "standard XHTML prefix" is a bit vague here... Mea culpa.
The intention was that XHTMLContentHandler would provide everything up
to the opening <body> tag when startDocument() is called.
I saw your note on the issue in Jira:
[...]
This would work for <meta>, but not <link> or <base>.
I'd argue that we shouldn't output the <base> element. Instead we
should normalize all URLs before giving them out to the client.
Normalization rules may depend on situation... we could provide a
sensible default, but I think it's safer to delegate this decision to
a component that you can override, because in the general case
normalization rules may be quite complex.
Example 1: you access a page from www.ibm.com/index.html, which
redirects to www-8.ibm.com/index.html for load-balancing. The
retrieved page may contain <base> that points back to www.ibm.com -
again, to ensure proper load-balancing. In this case, base href !=
page URL. Now, how do you normalize the links from the retrieved
page? (at some point in time this was a real case with this real
site ;) ).
Example 2: <base> is http://a.com/index.html/index.html/index.html
(which is related to a known bug in some HTTP servers), and the
outlink is ../services.html. How do you normalize this?
Of course, you can come up with some sensible defaults in each case,
but my point is that this issue is complicated, and there should be
a way to redefine this behavior.
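To make the "overridable component" idea concrete, here is a minimal sketch in Java. The LinkNormalizer interface and both implementations are hypothetical names invented for illustration, not existing Tika classes, and the relative link "products/x.html" is made up. The default resolves links against <base> when present, using java.net.URI; the second class shows one possible override for Example 1 that ignores <base> and stays on the host the page was fetched from.

```java
import java.net.URI;

/** Hypothetical extension point for illustration; not part of Tika's API. */
interface LinkNormalizer {
    /**
     * @param pageUrl  URL the page was actually fetched from (after redirects)
     * @param baseHref value of the base element's href, or null if absent
     * @param link     raw link value found in the page
     */
    String normalize(String pageUrl, String baseHref, String link);
}

/** One possible default: resolve against the base href if present, else the page URL. */
class DefaultLinkNormalizer implements LinkNormalizer {
    @Override
    public String normalize(String pageUrl, String baseHref, String link) {
        String base = (baseHref != null) ? baseHref : pageUrl;
        // URI.resolve() merges the paths and removes "." and ".." segments.
        return URI.create(base).resolve(link).toString();
    }
}

/** One possible override for Example 1: ignore the base href entirely. */
class FetchedHostLinkNormalizer implements LinkNormalizer {
    @Override
    public String normalize(String pageUrl, String baseHref, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }
}

class LinkNormalizerDemo {
    public static void main(String[] args) {
        LinkNormalizer byDefault = new DefaultLinkNormalizer();
        LinkNormalizer fetchedHost = new FetchedHostLinkNormalizer();

        // Example 1: the base href points back at the load-balancer front end.
        String page = "http://www-8.ibm.com/index.html";
        String base = "http://www.ibm.com/";
        System.out.println(byDefault.normalize(page, base, "products/x.html"));
        // prints http://www.ibm.com/products/x.html
        System.out.println(fetchedHost.normalize(page, base, "products/x.html"));
        // prints http://www-8.ibm.com/products/x.html

        // Example 2: the buggy base with repeated path segments.
        String buggyBase = "http://a.com/index.html/index.html/index.html";
        System.out.println(byDefault.normalize(buggyBase, buggyBase, "../services.html"));
        // prints http://a.com/index.html/services.html
    }
}
```

Plain URI resolution gives an answer for Example 2, but whether http://a.com/index.html/services.html is the link the site intended is exactly the kind of judgment a subclass would need to be able to change.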
I think Julien's idea about pushing more/most of this down into the
HtmlMapper makes sense, as that feels like the only way to really give
appropriate control over this behavior in a way that can be easily
subclassed.
It's a bigger architectural change than what I have time for right
now, so currently I'm extending the existing architecture to work
around specific issues I'm hitting.
I did take Jukka's advice and emit all metadata elements in the
resulting XHTML's <head> section. This provides better support for
other parsers besides HTML, though it means that the resulting HTML
can look a bit funky right now - for example, you will often get two
<meta> tags, one for "Content-Type" and the other for "content-type",
because HtmlHandler is remapping a <meta http-equiv> element. I've got
that on my list to resolve.
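One way the duplicate <meta> names could be collapsed is sketched below. This is only an illustration under assumptions, not the fix I've committed to: the MetaDedupe class is a made-up name, the sample values are invented, and "first spelling seen wins" is an arbitrary policy. It relies on a TreeMap ordered by String.CASE_INSENSITIVE_ORDER, so "Content-Type" and "content-type" are treated as the same key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not an existing Tika class. */
class MetaDedupe {

    /**
     * Collapses metadata names that differ only by case.
     * Assumed policy: the first spelling and value encountered are kept.
     */
    static Map<String, String> dedupe(Map<String, String> meta) {
        Map<String, String> out = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, String> e : meta.entrySet()) {
            // putIfAbsent leaves an existing case-insensitive match untouched.
            out.putIfAbsent(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so "first seen" is well defined.
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("Content-Type", "text/html; charset=UTF-8"); // made-up sample value
        meta.put("content-type", "text/html");                // made-up sample value
        System.out.println(dedupe(meta));
        // prints {Content-Type=text/html; charset=UTF-8}
    }
}
```

Note that the resulting map iterates in case-insensitive alphabetical order rather than insertion order, which may or may not matter for the emitted <head>.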
-- Ken
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g