Re: PaceXhtmlNamespaceDiv

Julian Reschke Thu, 10 Feb 2005 13:25:38 -0800


Sam Ruby wrote:

Julian Reschke wrote:

Sam, thanks for the long reply. I'll try my best to dig it and to offer constructive remarks...

To summarize my p.o.v.:
- the spec shouldn't require any specific container element for XHTML content,
We continue to talk past one another.  The above line is key.
Some examples might help. Perhaps once we are actually understanding each other's points, then we can work backward from there to spec text.
So, suppose my XHTML content is:
  <p>What a nice day!</p>
My XHTML container element is <p>. That is completely my choice. It is not required by the spec.


Yep.

Now if I place that inside an atom feed, I'm going to get something like this (heavily elided, all namespace details omitted):
  <feed>
    <entry>
      <summary>
         <p>What a nice day!</p>
      </summary>
    </entry>
  </feed>


Yep.

Depending on how the question is phrased, one could take the position that <feed>, <entry>, and <summary> are container elements. Or not. Again, depending on how the question is phrased.


Fine with me.

I don't believe that these elements are the ones that you have an issue with. Correct?


Yes.

Now, consider a different document, again heavily elided, etc:
  <feed>
    <entry>
      <summary>
         <div>
           <p>What a nice day!</p>
         </div>
      </summary>
    </entry>
  </feed>
The key difference between these two documents is that instead of three elements around which there should be no issue, there now are four. But for some reason, this causes a big controversy.

My theory is that the controversy is that people initially assumed that this div element was to be considered part of the content and not part of the format. And thereby was mandating that all content have a given container element. An entirely unreasonable mandate.

Well, the current spec says it's part of the content. I personally feel it really doesn't matter. Adding DIVs around XHTML content doesn't change the semantics of the content, in particular if it doesn't carry any additional attributes.

So, I wouldn't have any problems with recipients that collapse multiple nested xhtml:div elements into one or none (in absence of other attributes on it).
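
A minimal sketch of what such a collapsing recipient might do, assuming Python's standard xml.etree.ElementTree. The helper name collapse_divs is hypothetical and this is not behavior defined by any Atom draft. It only unwraps an xhtml:div that has no attributes and whose sole child is another xhtml:div:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DIV = f"{{{XHTML_NS}}}div"

def collapse_divs(elem):
    """Unwrap attribute-less xhtml:div wrappers whose only child is another xhtml:div.

    Hypothetical helper for illustration only.
    """
    while (
        elem.tag == DIV
        and not elem.attrib                      # attributes may carry meaning: keep
        and len(elem) == 1
        and elem[0].tag == DIV
        and not (elem.text or "").strip()        # no text before the inner div
        and not (elem[0].tail or "").strip()     # no text after the inner div
    ):
        elem = elem[0]
    return elem

nested = ET.fromstring(
    f'<div xmlns="{XHTML_NS}"><div><p>What a nice day!</p></div></div>'
)
collapsed = collapse_divs(nested)

# A div that carries an attribute is left alone.
styled = ET.fromstring(f'<div xmlns="{XHTML_NS}" class="x"><div/></div>')
kept = collapse_divs(styled)
```

Here collapsed is the inner div that directly holds the paragraph, while kept is still the outer div because of its class attribute.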

I agree that this would be an unreasonable mandate. But I don't want to force a top level container element for the xhtml, I want to define a bottom level container element in the format for the xhtml. There is a big difference.

It's still hard to see the difference. It's certainly not obvious on the syntactical level, and at the end of the day, that's what we are discussing here, right?

The difference between four feed container elements and mandating that all xhtml content have a uniform top level container element. Which again, I will agree is an entirely unreasonable assumption.
 - - -
On the optimistic presumption that you are with me so far, I'll press on.

Not entirely, but trying :-)

What desirable characteristics are there for feed container elements in this circumstance?
To answer that question, it is important to understand how CMS software tends to be implemented. In particular, how they are layered. This is difficult as there isn't any one reference implementation that we can consult. We also need to consider software which isn't written yet. As I said, this is difficult.

But we can observe common problems that people have had, and try to engineer a solution that avoids them. I hold the belief that if somebody writes a simple and clear spec that a significant number of people get wrong, we are looking at a spec bug.

Sure. But, are we looking at the whole set of implementors, or only those who actually read the spec? We all know that those sets aren't identical...

Enough hand waving, onto the problem at hand. What we are looking at here is an xhtml fragment. Not a complete xhtml document, but some fragment of a web page.


Yes.

Now, fragments tend not to exist independent of a context. And in virtually all xhtml documents I have seen (including the ones I produce), any fragment presumes that the xhtml namespace was defined as the default namespace earlier in the document (in particular, on the document element).

Well, that depends on how you define "fragment". For instance, I can use XSLT to produce that fragment and I certainly don't have to make any assumptions about default namespaces. The XSLT processor takes care of that for me. The same thing applies when serializing a node set from a namespace-aware DOM (at least that's what I'd expect, and MSXML has been doing that for years now).
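
To illustrate the point about namespace-aware serializers, here is a small sketch using Python's xml.etree.ElementTree rather than XSLT or MSXML (a substitution made only for the sake of a runnable example). The node is built with a namespace-qualified name and the serializer decides which declaration to emit:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Build the fragment as a namespace-qualified node; no prefix or default
# namespace is written by hand.
p = ET.Element(f"{{{XHTML_NS}}}p")
p.text = "What a nice day!"

# The serializer adds whatever namespace declaration the output needs.
with_prefix = ET.tostring(p, encoding="unicode")

# Optionally ask for the XHTML namespace as the default (Python 3.8+).
as_default = ET.tostring(p, encoding="unicode", default_namespace=XHTML_NS)

# Both serializations parse back to the same qualified element name.
tag_from_prefix = ET.fromstring(with_prefix).tag
tag_from_default = ET.fromstring(as_default).tag
```

Either way the producer never assumes that some enclosing element has already set a default namespace.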

So, a desirable characteristic for a container element would be one in which the default namespace can be set.

I disagree that this is important, but the atom text constructs do have that characteristic already.

At this point, the discussion can fragment into any number of different directions.
  - - -
One is for those who view XML as merely one potential serialization format, and something that their tool takes care of for them. For them, double escaping the content is the right answer, the simplest thing that can possibly work, end of discussion. While neither you nor I are in that camp (nor is Norm, and others), I am quite willing to leave that as a valid option, as long as it is explicitly declared.

Yes (although I'd like the spec to at least state that using the HTML type and double-escaping may reduce the number of recipients that actually will see the intended markup; separate discussion).

Another is to declare the use of default namespaces as evil, and rewrite both the document and the content to use explicit namespaces on every element. This may very well be where you and I part ways. If so, peace. Just please give the people who want to use default namespaces the same consideration that I am willing to give those who wish to double escape.

Ah! Yes, there's a camp of people who dislike default namespaces. I'm not part of that group. Default namespaces are a useful method to keep the XML readable, but as you said, they must be used carefully. I've explained XPath1-vs-default-namespaces too many times in my life :-)
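
The XPath-versus-default-namespace trap can be sketched as follows. This uses the path subset in Python's xml.etree.ElementTree, not a full XPath 1.0 engine, but the behavior shown is the same in spirit: an unprefixed name in the path means "no namespace", not "the document's default namespace".

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
doc = ET.fromstring(f'<div xmlns="{XHTML_NS}"><p>What a nice day!</p></div>')

# An unprefixed step looks for <p> in no namespace and finds nothing,
# even though the document shows a bare <p>.
unqualified = doc.findall("p")

# Binding a prefix (the name "x" is arbitrary) to the XHTML namespace
# in the query makes the step match.
qualified = doc.findall("x:p", {"x": XHTML_NS})
```

After running this, unqualified is empty and qualified holds the single paragraph element.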

And finally, there is a desire to create a format that can be done entirely with default namespaces, and without the need to rewrite or modify the content.


That sounds like an entirely new requirement that I wouldn't support.

The simple fact is that well formed xhtml does not always exist in the form of DOM nodes. Sometimes it is serialized as a string and stored in a file or a MySQL database. That does not make it any less well formed. It doesn't mean that it wasn't produced by a proper tool.

Correct. I'm still with you. Actually, the code I'm maintaining (SAP's WebDAV connector for Netweaver) uses this approach to persist WebDAV properties (that may contain arbitrary XML).

Not having seen Tim's implementation, I'm just speculating at this point, but it probably falls into this category. Based on the tools he is using, he is confident that his content is well formed, even if it is stored as a string. As such, he can confidently use simple string concatenation as long as he can be assured that the default namespace is correct.

Hm. Even when you think you know your content is well-formed, string concatenation may still be dangerous (and that's why I'm not doing it):

- sometimes, you don't have full control over who has write access to your store, and they may mess up the otherwise well-formed XML; if there's any remote chance that this happens, I'd rather re-parse the string than ever emit non-wf XML,

- XML-wellformedness extends to encoding considerations; you may have an XML fragment in a string which looks perfectly ok; but emitting it inside a different context may still be broken (consider XML that uses non-ASCII characters in element names which is serialized in US-ASCII encoding)
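
The "re-parse instead of trusting the store" approach from the first point could look roughly like this. It is a sketch under assumptions: the function name checked_fragment and the throwaway wrapper element are hypothetical, and the fragment is assumed not to rely on namespace prefixes declared elsewhere.

```python
import xml.etree.ElementTree as ET

def checked_fragment(stored: str) -> str:
    """Return the stored fragment unchanged, but only if it is well-formed.

    The fragment is wrapped in a throwaway element so that fragments with
    several top-level nodes can be parsed. Raises ET.ParseError otherwise.
    """
    ET.fromstring(f"<wrapper>{stored}</wrapper>")
    return stored

good = checked_fragment("<p>What a nice day!</p><p>Indeed.</p>")

try:
    checked_fragment("<p>What a nice day!")   # someone truncated the store
    rejected = False
except ET.ParseError:
    rejected = True
```

The damaged fragment is rejected before it can be concatenated into a feed, at the cost of one extra parse per fragment.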

Whether Tim's implementation meets this description or not, mine certainly does. And by looking at the common errors I have seen in feeds, I'm pretty sure that many others do too.

So what does your code do when the XML fragment it's pulling from a string-based store contains something like:

<ß xmlns="http://julian-reschke.de/julians-funny-xthml-extension">xyz</ß>

Doing this for "arbitrary" content will only fly when the content encoding that you may have selected earlier (streaming?) indeed allows serializing all Unicode characters without escaping.
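
A short sketch of why this matters, using only Python's standard library. Character data can fall back to numeric character references when the output encoding is US-ASCII, but an element name cannot, so the same "escape whatever does not fit" strategy yields output that no longer parses:

```python
import xml.etree.ElementTree as ET

fragment = (
    '<ß xmlns="http://julian-reschke.de/julians-funny-xthml-extension">'
    "xyz</ß>"
)

# As a Unicode string the fragment is well-formed.
name = ET.fromstring(fragment).tag

# It cannot be written in US-ASCII as it stands.
try:
    fragment.encode("us-ascii")
    fits_ascii = True
except UnicodeEncodeError:
    fits_ascii = False

# Falling back to character references also rewrites the element name,
# and a character reference is not allowed inside a name.
escaped = fragment.encode("us-ascii", "xmlcharrefreplace").decode("ascii")
try:
    ET.fromstring(escaped)
    still_well_formed = True
except ET.ParseError:
    still_well_formed = False
```

So a string that was checked once can still produce a broken feed if the surrounding document is emitted in a narrower encoding.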

 - - -
So, what would a desirable feed container element be for this scenario? I would suggest that it would be something in the xhtml namespace. If it were in the atom namespace, you would have to do something along the lines of:

<atom:summary xmlns:atom="..." xmlns="...">

Yes (unless xmlns:atom was already declared earlier). That seems like a good approach for those who do want the default namespace here.
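
A sketch of that approach with plain string concatenation, parsed back to confirm where the elements end up. The Atom namespace URI below is a hypothetical placeholder, since the thread elides the real one; the XHTML namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ATOM_NS = "http://example.org/atom-placeholder"  # placeholder, not the real URI

# Stored as a string; relies on XHTML being the default namespace in context.
fragment = "<p>What a nice day!</p>"

summary = (
    f'<atom:summary xmlns:atom="{ATOM_NS}" xmlns="{XHTML_NS}">'
    f"{fragment}"
    "</atom:summary>"
)

parsed = ET.fromstring(summary)
summary_tag = parsed.tag
content_tag = parsed[0].tag
```

The summary element lands in the placeholder Atom namespace and the untouched fragment lands in the XHTML namespace, with no extra wrapper element.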

One could, of course, hoist the declaration of the atom namespace to the top of the document, at which point you get two declarations of the atom namespace. You can get to exactly one declaration, *if* you explicitly specify the namespace prefix on every element, and as I said above, you are welcome to do this, I just don't want to mandate it.


Yes (nor do I).

An alternative would be to put summary in the xhtml namespace. That doesn't feel quite right to me.


Yes.

A final alternative would be to adopt an element from the xhtml vocabulary as a feed level container. One that connotes that the children are expected to be valid children of the <div> element would be nice.
 - - -
If you are still with me, what I am proposing is that the simplest and cleanest solution for people who like default namespaces would be to define the format so that there is an <xhtml:div> element between the <atom:summary> and the xhtml fragment that is being syndicated.
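
For comparison, the proposed layout can be sketched the same way. Again the Atom namespace URI is a hypothetical placeholder because the thread elides it. The feed uses only default namespaces, and the div is where the default switches to XHTML:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ATOM_NS = "http://example.org/atom-placeholder"  # placeholder, not the real URI

# Stored as a string; relies on XHTML being the default namespace in context.
fragment = "<p>What a nice day!</p>"

feed = (
    f'<feed xmlns="{ATOM_NS}"><entry><summary>'
    f'<div xmlns="{XHTML_NS}">{fragment}</div>'
    "</summary></entry></feed>"
)

root = ET.fromstring(feed)
summary = root[0][0]   # feed -> entry -> summary
div = summary[0]
```

No prefix appears anywhere in the document, at the price of the fourth enclosing element discussed above.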


So what was wrong with

        <atom:summary xmlns:atom="..." xmlns="...">

? The issue here is that while your proposal may be the optimal one for those who need the default namespace, it's sub-optimal for everybody else.

If you believe in double escaping, this does not affect you.
If you don't believe in default namespaces, then the difference amounts to whether there are three or four enclosing feed elements for you to deal with.


Yes. I don't like four :-)

 - - -
So, if we can't work together to find appropriate spec wording to make this happen, the following predictions can be safely made:

1) Graham (who uses proper XML tools) will have to do more work.


Why?

2) Tim (who uses string concatenation) will have to do more work.

He can still do what he does. I'll assume that Tim indeed read the draft carefully and does what he does entirely on purpose. Why do you think he would need to change his code if the spec doesn't change?

  3) More feeds will be harder to read (that's why I asked for people to
     experiment with alternate serializations).

I think that's a matter of taste. Some prefer fewer NS declarations, others prefer fewer container elements.

3) More feeds will be invalid (content in atom namespace)

On the other hand, more feeds may be invalid because of missing DIV elements. It's hard to design a spec for people who don't read it. What makes you think that putting a REQUIRED div container element will actually force everybody to use it?

  4) More feeds will be incorrect (in the sense that Tim's feed does
     accurately reflect the content of his entries).

Now you lost me. Are you saying that additional container <div>s affect the semantics of XHTML content?

  5) For some combinations of clients and servers, entries produced
     via an HTTP POST will end up with multiple <div>s.

If this is the case, the spec possibly should state that container <div>s without attributes indeed do not carry any additional semantics and can safely be collapsed.

 - - -
All that being said, I am OK with any spec wording that enables one to create a document using only default namespaces that:
  1) does not require well formed, serialized XHTML fragments to be
     modified.


Such as...: <atom:summary xmlns:atom="..." xmlns="..."> ?

  2) is unambiguous as to which elements in the document are part of the
     feed "structure" and which are to be considered the "content" being
     syndicated.

Fair enough?

Yes. So how about instead adding an explanation plus an example of how atom text constructs can be serialized in a way such that the XHTML namespace is indeed the default?


Best regards, Julian

--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760
