Re: BIDI (was Proposal: Atomext WG)

James M Snell Mon, 07 Jan 2008 09:58:02 -0800


Brian Smith wrote:

[snip]
The BIDI draft says it only applies to constructs that RFC 4287 labeled "language sensitive." Accordingly, the BIDI draft does not apply to extension elements.
Section 6.4.2: "Structured Extension elements are Language-Sensitive."
atom:category and atom:link each have exactly one language-sensitive attribute.
Which attributes on elements and subelements of structured extension elements 
are language-sensitive? It seems it must be either all or none.

It depends on the definition of the extension. If you're not familiar with the extension, you have no way of knowing.

there's no reason to try eliminating all need for bidi formatting characters in language sensitive attribute values.
I agree. If there are going to be formatting codes in the document anyway, then 
why do we need a mechanism that duplicates their functionality?

Because their use in markup is problematic and is actively discouraged for a number of very good reasons.

* Otherwise, use Unicode BIDI/Ruby formatting codes, such
that matching pairs of formatting codes are fully contained within a single text or attribute node.
Whose responsibility is it to apply the formatting codes? The person typing the text or the software? How does the software know when to apply the codes?
Also, what about when an Atompub client edits an entry?
Is the Atompub client responsible for preserving the Unicode formatting characters? What if it doesn't?
My hypothesis is that an implementation ignorant of BIDI issues is more likely to preserve the formatting characters than Atom/XHTML/HTML BIDI markup, especially when the effects of those formatting characters never span multiple nodes in the document.
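
The "matching pairs fully contained within a single node" rule can be checked mechanically. The following is only a minimal sketch: the function name `controls_balanced` is hypothetical, and it assumes that only the embedding and override controls (LRE, RLE, LRO, RLO) and their terminator PDF matter.

```python
# Unicode bidi embedding/override controls and their terminator.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def controls_balanced(value):
    """Return True if every opener in `value` is closed by a PDF
    inside the same string, and no PDF appears without an opener."""
    depth = 0
    for ch in value:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            depth -= 1
            if depth < 0:
                # A PDF with no matching opener in this node.
                return False
    return depth == 0
```

A processor could run such a check on each text or attribute node on its own; a value that fails it has controls whose effect would leak past the node boundary.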

Have you tested this hypothesis using real editors? For example, in our internal blogging environment, tags are entered in a single text box, each tag separated by a comma. The system splits the tags into an array and saves each tag separately. Each tag becomes a separate atom:category element. Is the user responsible for adding the appropriate formatting codes around each individual tag? When the user wishes to edit the entry later, perhaps to add a new entry, are they supposed to just know that there are non-visual bidi formatting codes interspersed into the comma separated list of tags?
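
To make the tag scenario concrete, here is a rough sketch of that workflow under the formatting-code approach. The helper names (`split_tags`, `categories_with_controls`, `join_for_editing`) are hypothetical, as is the assumption that each label is wrapped in RLE ... PDF and that the term is stored without controls.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def split_tags(text):
    """Split the comma-separated text box value into individual tags."""
    return [t.strip() for t in text.split(",") if t.strip()]


def categories_with_controls(tags):
    """Build one atom:category per tag, wrapping each label in RLE...PDF
    (assumption: the software, not the user, inserts the controls)."""
    return [
        ET.Element(
            "{%s}category" % ATOM_NS,
            {"term": tag, "label": RLE + tag + PDF},
        )
        for tag in tags
    ]


def join_for_editing(categories):
    """Rebuild the text box value from the stored labels."""
    return ", ".join(c.get("label") for c in categories)


tags = split_tags("\u05e9\u05dc\u05d5\u05dd, news")
edit_box = join_for_editing(categories_with_controls(tags))
```

Here `edit_box` looks the same as the original input when rendered, but it is four characters longer: each tag now carries an invisible RLE and PDF that the user has to leave intact when editing the list.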

There are existing Atompub clients out there that, more than likely,
will not, and since the formatting codes are non-visual, it's not
likely the user will notice them either, causing unexpected
rendering issues later on.
I don't understand why a user that needs BIDI functionality would use a client that doesn't have BIDI support. Further, I don't understand how clients that are incapable of generating BIDI formatting codes can generate markup compliant with your proposal. Are you expecting the users to edit the markup directly?

You're assuming that all users have the same requirements. In our environment, a single feed may include entries from many different users. We have group blogs where users from many different locales have edit rights on any entry in the blog. Further, our users use many different editors to write and manage their blog entries. Asking those users to be mindful of how they're using bidi formatting characters is a lot more difficult than what we currently do, which is provide a simple check box to indicate whether or not the entry is "right-to-left", which in turn, is translated into the appropriate dir="rtl" in the markup. Clients that do not understand the dir attribute simply ignore it, and since our software is written so that only explicit changes in value are recognized (e.g. a missing dir attribute does not mean the dir attribute value has changed) we're able to work seamlessly with editors that do not support the attribute.
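
The "only explicit changes are recognized" behavior might look roughly like the sketch below. The function names `apply_checkbox` and `merge_dir` are hypothetical, and removing the attribute when the box is unchecked is an assumption rather than something the description above specifies.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def apply_checkbox(entry, right_to_left):
    """Translate the "right-to-left" check box into a dir attribute.
    Assumption: an unchecked box removes the attribute."""
    if right_to_left:
        entry.set("dir", "rtl")
    else:
        entry.attrib.pop("dir", None)


def merge_dir(stored, submitted):
    """Keep the stored dir value unless the submitted entry carries an
    explicit dir attribute; a missing attribute is not a change."""
    if "dir" in submitted.attrib:
        stored.set("dir", submitted.get("dir"))
    return stored


stored = ET.fromstring('<entry xmlns="%s" dir="rtl"/>' % ATOM)
from_old_client = ET.fromstring('<entry xmlns="%s"/>' % ATOM)
merged = merge_dir(stored, from_old_client)
```

With this rule, an entry that comes back from an editor that dropped the attribute keeps its stored `dir="rtl"`, while a client that sends `dir="ltr"` explicitly still overrides it.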


Markup is just as non-visual as formatting codes, except for people using "view 
source" and the like.

Yes, but with the markup we don't have to rely on users getting it right when they type in the values.

With the bidi attribute approach, per rfc5023, non-supporting
clients are expected to at least preserve the bidi attribute
but will otherwise continue working as they currently do,
without risk of corrupting the text by inadvertently dropping
or improperly nesting the bidi controls.

RFC 4287 and RFC 5023 are pretty unclear about what is required to be preserved.

RFC 5023, Section 9.3: To avoid unintentional loss of data when editing Member Entries or Media Link Entries, an Atom Protocol client SHOULD preserve all metadata that has not been intentionally modified, including unknown foreign markup as defined in Section 6 of [RFC4287].


Seems pretty darn clear to me.

Firstly, RFC 5023 says that implementations can do whatever they want as long as the results are well-formed. Otherwise, AtomPub implementations that use (X)HTML whitelists would be non-compliant. Also, the requirement to preserve unknown foreign markup seems to apply more to unknown extension elements than to unknown attributes on known elements. In particular, if I replace the atom:author element with a new one, then I am not going to preserve the old, unknown attributes on the previous atom:author element.

9.3 uses the term "unknown foreign markup", which RFC 4287 defines as unknown elements AND attributes.

Also, imagine a case where we have a feed with 100 entries, each with about 5 atom:category elements. Let's say that the feed is generally all RTL. Using your approach, that's at least 1000 extra characters in the feed, and 500 opportunities for the embedding to
be screwed up.
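
The figures in that example follow from simple arithmetic, sketched here on the assumption that each category label carries exactly two control characters (one opener such as RLE plus one PDF):

```python
ENTRIES = 100
CATEGORIES_PER_ENTRY = 5
CONTROLS_PER_LABEL = 2  # assumption: one opener (e.g. RLE) plus one PDF

# Each label is one place where the embedding can go wrong.
labels = ENTRIES * CATEGORIES_PER_ENTRY
# Each label adds its control characters to the feed.
extra_chars = labels * CONTROLS_PER_LABEL
```

That gives 500 wrapped labels and 1000 extra characters across the feed.
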
The category labels are almost always going to be composed of strong RTL and strong LTR characters (only), so the BIDI algorithm will work correctly. And, when it doesn't work, it is unlikely that it will be due to the wrong base directionality--in these cases, formatting characters are going to be needed no matter what. Right?

"almost always" is not "always". The Atom bidi draft covers the cases where "always" is more desirable than "almost always".

And no, the formatting characters are not always going to be needed. If I am rendering the text in (x)html, I don't want the formatting characters in there at all; rather, I want to follow best practices and use the (x)html provided bidi markup mechanisms.


> [snip]

* Editors of new documents must be meticulous about inserting the proper markup and formatting codes.
* Processors of existing documents must be meticulous about preserving BIDI/Ruby markup and/or formatting codes whenever any part of the contained text is preserved.
Again, what about older editors that know nothing about the proper markup or formatting codes?
All of these old editors will also fail to implement the Atom BIDI spec. There are currently more editors that can insert formatting characters than can handle Atom BIDI markup, aren't there? I think that will always be the case.

Failing to implement the atom bidi spec has significantly fewer consequences than improperly implementing the unicode formatting characters. That is, existing applications will be no worse off than they currently are if the dir attribute is ignored or dropped; however, existing applications can be severely impacted by the improper use of the unicode bidi characters. Again, there are very good reasons behind the recommendation against using the formatting codes in markup.

[snip]
atom:name: When consuming atom:name, treat it as though it was defined as a text construct, and allow for type="xhtml" and type="html". When producing atom:name elements, BIDI markup SHOULD be replaced with Unicode formatting characters when needed. Ruby markup may be replaced with Unicode formatting characters or stripped entirely. All markup should be stripped entirely, the type attribute should be removed.

As I pointed out in my previous note, this is not an interoperable suggestion and is not supported in any way by the specification.


- James

- Brian
