On 2010-08-13 10:34, Jukka Zitting wrote:
Hi,
On Thu, Aug 12, 2010 at 8:27 PM, Ken Krugler
<[email protected]> wrote:
I think I'm missing something - which javadocs are you referring to here?
What I see for startDocument() is:
/**
* Starts an XHTML document by setting up the namespace mappings.
* The standard XHTML prefix is generated lazily when the first
* element is started.
*/
I guess the "standard XHTML prefix" is a bit vague here... Mea culpa.
The intention was that XHTMLContentHandler would provide everything up
to the opening <body> tag when startDocument() is called.
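To make the "everything up to <body>" behaviour concrete, here is a minimal sketch in plain SAX. The class names EagerXhtmlFilter and TagRecorder are hypothetical; this is not Tika's actual XHTMLContentHandler. It only illustrates how eager emission closes <head> before the parser has seen any <link> or <base> elements in the source document.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical sketch (not Tika's XHTMLContentHandler): a SAX filter that
 * emits everything up to the opening <body> as soon as startDocument()
 * is called, and closes <body> and <html> in endDocument().
 */
class EagerXhtmlFilter extends XMLFilterImpl {
    static final String XHTML = "http://www.w3.org/1999/xhtml";
    private static final Attributes NONE = new AttributesImpl();

    private void start(String name) throws SAXException {
        super.startElement(XHTML, name, name, NONE);
    }

    private void end(String name) throws SAXException {
        super.endElement(XHTML, name, name);
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        super.startPrefixMapping("", XHTML); // default namespace = XHTML
        start("html");
        start("head");
        start("title");
        end("title");
        // <head> is closed here, so anything emitted later
        // (e.g. <link> or <base>) would end up inside <body>.
        end("head");
        start("body");
    }

    @Override
    public void endDocument() throws SAXException {
        end("body");
        end("html");
        super.endPrefixMapping("");
        super.endDocument();
    }
}

/** Records element events as strings such as "<head>" and "</head>". */
class TagRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        events.add("<" + local + ">");
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add("</" + local + ">");
    }
}
```

Wiring a TagRecorder behind the filter with setContentHandler() and calling startDocument() shows the whole head already opened and closed before any body content arrives.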
I saw your note on the issue in Jira:
[...]
This would work for <meta>, but not <link> or <base>.
I'd argue that we shouldn't output the <base> element. Instead we
should normalize all URLs before giving them out to the client.
Normalization rules may depend on the situation. We could provide a
sensible default, but I think it's safer to delegate this decision to a
component that you can override, because in the general case
normalization rules may be quite complex.
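A minimal sketch of such an overridable component, assuming a hypothetical LinkNormalizer interface (not an existing Tika API) that the parser would call for every outlink instead of passing <base> through to the client. The default shown is just one sensible choice.

```java
import java.net.URI;

/**
 * Hypothetical pluggable hook for link normalization; not an existing Tika
 * interface. A parser would pass every raw href through it instead of
 * emitting the document's <base> element.
 */
interface LinkNormalizer {
    /**
     * @param pageUrl  URL the document was actually fetched from (after redirects)
     * @param baseHref value of the document's <base href>, or null if absent
     * @param href     raw link target as it appears in the document
     * @return the absolute URL handed to the client
     */
    String normalize(String pageUrl, String baseHref, String href);
}

/** One possible default: resolve against <base href> if present, else the page URL. */
class DefaultLinkNormalizer implements LinkNormalizer {
    @Override
    public String normalize(String pageUrl, String baseHref, String href) {
        String base = (baseHref != null) ? baseHref : pageUrl;
        // java.net.URI.resolve applies RFC 2396 relative-reference resolution,
        // including removal of "." and "dir/.." segments where possible.
        return URI.create(base).resolve(href).toString();
    }
}
```

Because the interface has a single method, a deployment that wants to ignore <base> entirely could plug in a lambda such as `(page, base, href) -> URI.create(page).resolve(href).toString()` without changing the parser.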
Example 1: you access a page from www.ibm.com/index.html, which
redirects to www-8.ibm.com/index.html for load-balancing. The retrieved
page may contain a <base> element that points back to www.ibm.com, again
to ensure proper load-balancing. In this case, base href != page URL. Now,
how do you normalize the links from the retrieved page? (at some point
in time this was a real case with this real site ;) ).
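The divergence in Example 1 can be shown with plain java.net.URI resolution. The relative link products.html is a hypothetical outlink; the two hosts come from the example above.

```java
import java.net.URI;

class BaseVsPageUrl {
    // Hosts taken from Example 1; the link itself is hypothetical.
    static final String PAGE_URL = "http://www-8.ibm.com/index.html"; // where the redirect landed
    static final String BASE_HREF = "http://www.ibm.com/index.html";  // what <base href> says
    static final String LINK = "products.html";                       // relative outlink

    static String resolve(String base, String link) {
        return URI.create(base).resolve(link).toString();
    }

    public static void main(String[] args) {
        // Honoring <base>: links point back to the load-balanced front end.
        System.out.println(resolve(BASE_HREF, LINK)); // http://www.ibm.com/products.html
        // Ignoring <base>: links stay pinned to one back-end host.
        System.out.println(resolve(PAGE_URL, LINK));  // http://www-8.ibm.com/products.html
    }
}
```

Neither result is wrong in itself; which one a crawler should keep is exactly the policy decision that benefits from being overridable.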
Example 2: <base> is http://a.com/index.html/index.html/index.html
(which is related to a known bug in some HTTP servers), and the outlink
is ../services.html. How do you normalize this?
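For Example 2, strict resolution with java.net.URI gives one answer, while a lenient, site-specific policy could give another. The lenient rule below (collapse consecutive duplicate path segments in the base, then drop ".." segments that would climb above the root) is a hypothetical illustration, not a standard normalization step.

```java
import java.net.URI;

class RepeatedSegmentBase {
    /** Strict RFC 2396 resolution, as implemented by java.net.URI. */
    static String strict(String base, String link) {
        return URI.create(base).resolve(link).toString();
    }

    /**
     * Hypothetical lenient policy (one choice among many). Note that
     * java.net.URI may keep leading ".." segments above the root (RFC 2396
     * leaves that case to the implementation), whereas RFC 3986 removes them,
     * so this method strips them explicitly.
     */
    static String lenient(String base, String link) {
        URI b = URI.create(base);
        StringBuilder path = new StringBuilder();
        String prev = null;
        for (String seg : b.getRawPath().split("/")) {
            if (seg.isEmpty() || seg.equals(prev)) continue; // skip root "" and consecutive repeats
            path.append('/').append(seg);
            prev = seg;
        }
        if (path.length() == 0) path.append('/');
        // Re-anchor the base on the collapsed path (an absolute-path reference).
        URI collapsed = b.resolve(path.toString());
        URI resolved = collapsed.resolve(link);
        // Drop ".." segments that would climb above the root.
        String fixed = resolved.getRawPath().replaceFirst("^(/\\.\\.)+(?=/)", "");
        return resolved.resolve(fixed).toString();
    }
}
```

With base http://a.com/index.html/index.html/index.html and link ../services.html, strict() yields http://a.com/index.html/services.html and lenient() yields http://a.com/services.html; which one is "right" depends on the site.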
Of course, you can come up with some sensible defaults in each case, but
my point is that this issue is complicated, and there should be a way to
redefine this behavior.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com