Re: BIDI (was Proposal: Atomext WG)

James M Snell Mon, 07 Jan 2008 00:00:18 -0800



Brian Smith wrote:

[snip]

Atom documents are almost never hand-entered, and there is already a
specification in place for marking up BIDI and even
ruby text in general XML. The odds that clients and servers
are going to correctly implement this extension--except
those targeted directly towards BIDI users--seem pretty
low to me. Personally, it seems much easier to implement an existing BIDI
markup mechanism (Unicode, XML, and/or XHTML) than a new standard.
What are you basing that on?


The Unicode/W3C guidelines (http://www.w3.org/TR/unicode-xml/#Bidi and 
http://www.w3.org/International/questions/qa-bidi-controls) say this:

* Use *XHTML* BIDI markup whenever possible.
* Otherwise *CSS* whenever possible.
* Otherwise, consider building BIDI markup into your markup schema.
* We have to support BIDI formatting codes anyway, since the above mechanisms 
don't solve all BIDI problems.

Sorry if I wasn't clear, I was referring specifically to your assertion that, "it seems much easier to implement an existing BIDI markup mechanism"... Having already implemented it, I can see no additional complexity or difficulty.

There's really not much guesswork involved in the implementation, and apps that choose not to implement support will be no worse or better off than they are currently.
That is not true. Consider a feed aggregator. If it doesn't support Atom BIDI, then it will not correctly rewrite entries to handle an inherited "dir" attribute from the atom:feed element. Since the directionality is also inherited by text constructs, whenever the implementation passes the text construct to a rendering engine, it needs to rewrite that content to handle the inherited directionality.
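
The inheritance described above can be sketched roughly as follows. This is only an illustration, not code from the draft or from Abdera: it assumes the "dir" attribute is unnamespaced and is inherited from the nearest ancestor that carries it, and the names effective_dir, resolve_title_dirs, and SAMPLE_FEED are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample: the feed is RTL, one entry overrides it with LTR.
SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom" dir="rtl">'
    "<title>A</title>"
    "<entry><title>B</title></entry>"
    '<entry dir="ltr"><title>C</title></entry>'
    "</feed>"
)


def effective_dir(element, parents):
    """Return the nearest "dir" value, walking from element up to the root."""
    node = element
    while node is not None:
        value = node.get("dir")
        if value is not None:
            return value
        node = parents.get(node)
    return None  # no dir attribute anywhere on the ancestor chain


def resolve_title_dirs(feed_xml):
    """List (title text, inherited direction) for every atom:title."""
    root = ET.fromstring(feed_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    return [
        (title.text, effective_dir(title, parents))
        for title in root.iter(ATOM + "title")
    ]


if __name__ == "__main__":
    for text, direction in resolve_title_dirs(SAMPLE_FEED):
        print(text, direction)
```

An aggregator that copies an atom:entry out of its feed would need to carry the resolved value along with the entry, since the ancestor that declared it is no longer there.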

If an aggregator doesn't support the dir attribute, it will ignore it or drop it as if it wasn't there in the first place and will continue to operate as it always had, which is exactly what we want. So again, those apps will be no worse or better off than they are currently.

Don't forget atom:link/@title and atom:category/@label. atom:category/@term can also cause problems when implementations use that value for display purposes.
I know that. But, the Atom BIDI draft does not eliminate all uses of BIDI formatting characters in these attributes, either. And, it doesn't specify BIDI support for language-sensitive content in Atom service documents, category documents, RSS feeds, or RSD documents.


I have no interest in eliminating all uses of bidi formatting characters.

And yes, the spec does address bidi support for language-sensitive content in Atompub service and category documents. The spec language can be improved in this regard, but the spec alters the definition of the atomCommonAttributes production to add the dir attribute. In both the Atompub service document and the Atom categories document, the language-sensitive text is provided by elements from the Atom namespace (e.g. atom:category and atom:title). Also, the bidi spec predates the publication of rfc5023; I intend to have the next rev of the spec specifically discuss atompub service docs.

As for RSS and RSD docs, I couldn't care less about solving the i18n issues of either.

Arbitrary extension elements can also have problems.


The BIDI draft says it only applies to constructs that RFC4287 labeled "language 
sensitive." Accordingly, the BIDI draft does not apply to extension elements.


Section 6.4.2: "Structured Extension elements are Language-Sensitive."

The bidi draft doesn't attempt to solve all the i18n issues with Atom. Ruby text is a problem for pretty much everything, especially given the fact that most browsers don't have a clue how to properly render ruby text yet. The bidi draft rightfully focuses on one small part of the problem.
I agree that a narrow scope is good. But, a solution for Ruby text will also be applicable to BIDI, especially if that solution involves the reuse of XHTML markup and/or CSS.

If/when the need emerges to improve ruby text support in Atom, I will gladly help work out a solution.

It also doesn't solve the problem with atom:link/@title or
other attributes that
are language-sensitive.
Yes, it does.
The Atom BIDI draft does not provide a way of specifying different base directionalities for attributes on the same element, it doesn't eliminate all need for BIDI formatting characters in language-sensitive attribute values, and it doesn't provide a mechanism for discovering which (nested) extension elements and attributes are affected by the proposed Atom BIDI markup.

atom:category and atom:link each have exactly one language-sensitive attribute.

There's no reason to try eliminating all need for bidi formatting characters in language-sensitive attribute values.

The bidi attribute applies to language-sensitive elements and attributes. Defining which extension elements are language sensitive is up to the extension definition. There's no reason to provide a mechanism for discovering which extensions are affected.

If I implement RFC4287, the Unicode BIDI algorithm, XHTML BIDI, HTML BIDI, and the "Unicode in XML" guidelines, I will have pretty good BIDI support. It will require me to adhere to four different BIDI standards in addition to RFC4287. That is a lot of work already. Now, your Atom BIDI and URI template BIDI proposals add two more specifications that I would have to support--for a total of SIX standards to adhere to and resolve conflicts between, JUST to support BIDI text. And, even if I create well-formed documents adhering to all six standards, whenever I open them up in any of my text editors, or any feed reader, they will look wrong since nobody else is implementing all of those standards. I think that is totally unreasonable. Abdera might have amazing support for BIDI, for which you should be commended, but unless all Atom software is going to be implemented on top of Abdera, Abdera will not be able to reliably interoperate with anything. If we want to provide interoperable support for BIDI, we need to make it as simple to implement as possible.

My counter-proposal is simple:


None of this is simple. Please do not claim that it is.

* Use XHTML/HTML BIDI/Ruby markup whenever possible.

Keep in mind the fact that the atom bidi spec very clearly indicates that the (x)html bidi mechanisms should be used in addition to the atom dir attribute.

* Otherwise, use Unicode BIDI/Ruby formatting codes, such that matching pairs 
of formatting codes are fully contained within a single text or attribute node.

Whose responsibility is it to apply the formatting codes? The person typing the text or the software? How does the software know when to apply the codes? Also, what about when an Atompub client edits an entry? Is the Atompub client responsible for preserving the Unicode formatting characters? What if they don't? There are existing Atompub clients out there that, more than likely, will not, and since the formatting codes are non-visual, it's not likely the user will notice them either, causing unexpected rendering issues later on. How are non-bidi enabled clients supposed to know what to do? With the bidi attribute approach, per rfc5023, non-supporting clients are expected to at least preserve the bidi attribute but will otherwise continue working as they currently do, without risk of corrupting the text by inadvertently dropping or improperly nesting the bidi controls.
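
To make the "matching pairs within a single text or attribute node" requirement concrete, here is a minimal sketch of wrapping a value in Unicode embedding codes and checking that the codes in a node are balanced. It only looks at the embedding and override characters (LRE, RLE, LRO, RLO) and their terminator PDF; the function names wrap_rtl and is_balanced are hypothetical, and nothing here comes from either proposal.

```python
# Unicode bidi formatting characters (embeddings, overrides, terminator).
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE

OPENERS = {LRE, RLE, LRO, RLO}


def wrap_rtl(text):
    """Wrap a value in an RTL embedding: one opener and one PDF."""
    return RLE + text + PDF


def is_balanced(text):
    """True if every opener in this node is closed by a PDF in the same node."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # PDF with nothing open in this node
            depth -= 1
    return depth == 0  # anything left open spills into following text


if __name__ == "__main__":
    label = wrap_rtl("\u05e9\u05dc\u05d5\u05dd")
    print(is_balanced(label))       # the wrapped value
    print(is_balanced(label[:-1]))  # the same value after a client drops the PDF
```

A client that truncates or re-serializes the value and loses the trailing PDF produces exactly the kind of silent damage described above, because neither character is visible to the user.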

Also, imagine a case where we have a feed with 100 entries, each with about 5 atom:category elements. Let's say that the feed is generally all RTL. Using your approach, that's at least 1000 extra characters in the feed, and 500 opportunities for the embedding to be screwed up.
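
The arithmetic behind those figures, as a small sketch. It assumes each affected attribute value needs one opening code and one closing code (two characters), which is the minimum for an embedding; the function name formatting_code_overhead is hypothetical.

```python
def formatting_code_overhead(entries, categories_per_entry, codes_per_value=2):
    """Return (values that need wrapping, extra characters added to the feed)."""
    values = entries * categories_per_entry
    return values, values * codes_per_value


if __name__ == "__main__":
    values, extra_chars = formatting_code_overhead(100, 5)
    print(values, extra_chars)  # 500 1000
```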

Re: The ruby formatting codes: Even the Unicode spec warns against using the ruby formatting codes for anything other than internal storage. We gain absolutely nothing by bringing ruby into this discussion.

* Editors of new documents must be meticulous about inserting the proper markup 
and formatting codes.
* Processors of existing documents must be meticulous about preserving 
BIDI/Ruby markup and/or formatting codes whenever any part of the contained 
text is preserved.

Again, what about older editors that know nothing about the proper markup or formatting codes?

I recognize that this goes against the Unicode in XML guidelines. However, Atom already goes against the guidelines by having language-sensitive text in attribute values and other contexts where XHTML markup cannot be used.


What is the benefit of going against the Unicode in XML guidelines?

- James

- Brian
