At 9:33 AM -0400 5/9/05, Sam Hartman wrote:
My personal opinion as someone who is very shortly going to have to evaluate the atom specification is that you've identified an issue (space and line breaking) for some languages that should be considered. Your proposed solution seems highly undesirable in that it requires us to understand the language of the text being displayed. In the past we've had all sorts of problems doing that. Your proposed solution also seems quite complicated.
Fully agree. Please note the text in the spec we are working from:
If the value is "text", the content of the Text construct MUST NOT contain child elements. Such text is intended to be presented to humans in a readable fashion. Thus, Atom Processors MAY collapse white-space (including line-breaks), and display the text using typographic techniques such as justification and proportional fonts.
FWIW, this appears twice, identically, in the spec.
Martin Dürst brought up CJK (well, actually CJT), saying that they don't use inter-word spacing. That's fine, but it is irrelevant to the text in the draft. If some text comes through with no spaces, there is no white space to collapse. His argument that some XML editors make long lines of text difficult to edit is clearly *way* out of scope for Atom, or any other XML-using protocol for that matter.
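To make the point about spaceless text concrete, here is a minimal sketch in Python. It assumes that "collapse white-space" means replacing each run of spaces, tabs, and line breaks with a single space and trimming the ends; the draft does not define the operation, so that reading, the name collapse_whitespace, and the sample strings are my own assumptions for illustration.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Collapse each run of white-space into one space.

    Assumption: "white-space" is whatever Python's \\s matches (spaces,
    tabs, line breaks, and other Unicode white-space), and leading and
    trailing white-space is trimmed. The Atom draft does not spell this out.
    """
    return re.sub(r"\s+", " ", text).strip()


# Text with inter-word spacing and stray line breaks gets normalized.
english = "Atom  processors\n   may collapse\twhite-space."
print(collapse_whitespace(english))

# Text that contains no white-space at all comes back unchanged.
japanese = "空白を含まない文章です。"
print(collapse_whitespace(japanese) == japanese)
```

Under those assumptions, a string with no white-space in it is returned as-is, because the pattern never matches.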
It may well be that the solutions to this problem are worse than the problem itself. However, I think it is important to specifically understand that this is the case rather than failing to solve the problem because we failed to understand it.
The "case" is that text that is supposed to be read by humans comes in many forms, with different line lengths, and so on. The paragraph from the spec says that Atom processors may alter these so that they can be presented better for the reader. Of course, they may also alter it to make it less readable, as many mail user agents do (<sigh>). Regardless, this says that the Atom processor is free to present things in text constructs in any fashion it deems suitable. This is particularly important for making Atom content accessible; for example, the Atom processor can use this rule to present text content by reading it aloud, by putting it on a screen greatly magnified one character at a time, and so on.
At least based on the discussion the IESG has been copied on, it doesn't sound like the working group has fully considered this issue. The responses have more of the character of those found from people trying to brush aside an issue than of people who have carefully considered something and concluded there is nothing to be done.
Sorry, but that's unfair. Alexy asked, "Ok, maybe it is just me, but what does it mean to 'collapse white-space'? Does this mean to replace FWS (in RFC 2822 sense) with a single space?" Martin's response was orthogonal: "Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words." The rest of the thread wandered into the weeds because it was hard to figure out what was being discussed.
Moreover, this issue cannot be unique to Atom: it must affect many XML-based protocols both within the IETF and within other standards organizations.
Any protocol that has XML that includes human-readable text has this issue. Well, the processors of that XML do; the protocols themselves do not.
Anyway, as someone evaluating atompub's output, it would be very useful if the working group responded to this last call comment. In my mind, a response would start with a researched description of the issue: either confirm that Chinese and Japanese and Thai tools work as described or explain how they actually work. Then describe what other standards have done about this problem. Finally, describe what atompub has done about the problem and why.
I'm not asking for a lot of text; probably something about as long as this message.
I believe that it can be a lot shorter: given the rationale above, it's not a problem for Atompub or any other XML-using protocol. For that matter, it's not really an XML problem at all: it affects text formats like HTML and RFC 2822 as well.
--Paul Hoffman, Director
--Internet Mail Consortium