Brian Smith wrote:
[snip]
The BIDI draft says it only applies to constructs that RFC4287 labeled "language sensitive." Accordingly, the BIDI draft does not apply to extension elements.
Section 6.4.2: "Structured Extension elements are Language-Sensitive."

atom:category and atom:link each have exactly one language-sensitive attribute.

Which attributes on elements and subelements of structured extension elements 
are language-sensitive? It seems it must be either all or none.


It depends on the definition of the extension. If you're not familiar with the extension, you have no way of knowing.

There's no reason to try eliminating all need for bidi formatting characters in language sensitive attribute values.

I agree. If there are going to be formatting codes in the document anyway, then 
why do we need a mechanism that duplicates their functionality?


Because their use in markup is problematic and is actively discouraged for a number of very good reasons.

* Otherwise, use Unicode BIDI/Ruby formatting codes, such
that matching pairs of formatting codes are fully contained within a single text or attribute node.
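The containment rule above can be checked mechanically. Below is a minimal Python sketch. It assumes that only the UAX #9 embedding and override controls (LRE, RLE, LRO, RLO, each closed by PDF) are in play, and it ignores the newer isolate controls and the algorithm's nesting-depth limit. `pairs_contained` is a hypothetical helper name.

```python
# Explicit embedding/override controls from the Unicode Bidirectional Algorithm.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def pairs_contained(node_text: str) -> bool:
    """True if every embedding/override opened in this text or attribute
    value is closed by a PDF in the same value, and no PDF closes
    something opened outside it."""
    depth = 0
    for ch in node_text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # PDF with no opener inside this node
            depth -= 1
    return depth == 0  # an opener left unclosed would leak past the node


print(pairs_contained(RLE + "abc" + PDF))  # True
print(pairs_contained(RLE + "abc"))        # False
```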

Whose responsibility is it to apply the formatting codes? The person typing the text or the software? How does the software know when to apply the codes?
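The following is one possible answer for the software side. It is only an illustration; neither draft specifies it. The idea is to guess the direction from the first strong character, similar in spirit to rules P2/P3 of the Unicode Bidirectional Algorithm, and to wrap the value only when that direction differs from the surrounding context. The function names are hypothetical. The heuristic guesses wrong for, say, a right-to-left label that happens to begin with a Latin word.

```python
import unicodedata
from typing import Optional

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strongly typed character, or None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
    return None  # only digits, punctuation, spaces, etc.


def wrap_if_needed(text: str, context_direction: str) -> str:
    """Wrap text in an embedding only when its guessed direction
    differs from the direction of the surrounding context."""
    direction = first_strong_direction(text)
    if direction is None or direction == context_direction:
        return text
    opener = RLE if direction == "rtl" else LRE
    return opener + text + PDF
```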

Also, what about when an Atompub client edits an entry?
Is the Atompub client responsible for preserving the Unicode formatting characters? What if they don't?

My hypothesis is that an implementation ignorant of BIDI issues is more likely to preserve the formatting characters than Atom/XHTML/HTML BIDI markup, especially when the effects of those formatting characters never span multiple nodes in the document.

Have you tested this hypothesis using real editors? For example, in our internal blogging environment, tags are entered in a single text box, each tag separated by a comma. The system splits the tags into an array and saves each tag separately. Each tag becomes a separate atom:category element. Is the user responsible for adding the appropriate formatting codes around each individual tag? When the user wishes to edit the entry later, perhaps to add a new tag, are they supposed to just know that there are non-visual bidi formatting codes interspersed into the comma-separated list of tags?
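Here is a small Python sketch of that scenario, with hypothetical tag values. The hypothetical `split_tags` stands in for the comma-splitting step. If the user wraps the whole text box in a single RLE...PDF pair, splitting leaves the first and last tags with half a pair each. Per-tag wrapping survives the split.

```python
RLE, PDF = "\u202b", "\u202c"
OPENERS = {"\u202a", RLE, "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO


def balanced(value: str) -> bool:
    """True if embedding/override openers and PDFs pair up within value."""
    depth = 0
    for ch in value:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False
            depth -= 1
    return depth == 0


def split_tags(box: str) -> list:
    """Split a comma-separated tag box into individual tags, one per
    atom:category. str.strip() removes spaces but not bidi controls."""
    return [t.strip() for t in box.split(",") if t.strip()]


# One embedding around the whole box: the split breaks the pair.
whole_box = RLE + "alpha, beta, gamma" + PDF
print([balanced(t) for t in split_tags(whole_box)])  # [False, True, False]

# One embedding per tag: every resulting atom:category value is balanced.
per_tag_box = ", ".join(RLE + t + PDF for t in ["alpha", "beta", "gamma"])
print([balanced(t) for t in split_tags(per_tag_box)])  # [True, True, True]
```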


There are existing Atompub clients out there that, more than likely,
will not, and since the formatting codes are non-visual, it's not
likely the user will notice them either, causing unexpected
rendering issues later on.

I don't understand why a user that needs BIDI functionality would use a client that doesn't have BIDI support. Further, I don't understand how clients that are incapable of generating BIDI formatting codes can generate markup compliant with your proposal. Are you expecting the users to edit the markup directly?

You're assuming that all users have the same requirements. In our environment, a single feed may include entries from many different users. We have group blogs where users from many different locales have edit rights on any entry in the blog. Further, our users use many different editors to write and manage their blog entries. Asking those users to be mindful of how they're using bidi formatting characters is a lot more difficult than what we currently do, which is provide a simple check box to indicate whether or not the entry is "right-to-left", which, in turn, is translated into the appropriate dir="rtl" in the markup. Clients that do not understand the dir attribute simply ignore it, and since our software is written so that only explicit changes in value are recognized (e.g. a missing dir attribute does not mean the dir attribute value has changed), we're able to work seamlessly with editors that do not support the attribute.
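The "explicit changes only" rule described above could look roughly like the following Python sketch. It assumes the unprefixed `dir` attribute used in this thread. `merge_dir` is a hypothetical name, not the actual code of the system being described.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def merge_dir(stored_dir: Optional[str], incoming: ET.Element) -> Optional[str]:
    """Return the dir value to keep after a client submits an edit.

    A missing dir attribute means the client did not touch it (or does not
    understand it), so the stored value is kept; only an explicit value
    counts as a change.
    """
    incoming_dir = incoming.get("dir")
    if incoming_dir is None:
        return stored_dir
    return incoming_dir


# A dir-unaware editor sends the title back without the attribute.
plain = ET.fromstring('<title xmlns="http://www.w3.org/2005/Atom">Hello</title>')
print(merge_dir("rtl", plain))  # rtl

# A dir-aware editor explicitly switches the entry to left-to-right.
explicit = ET.fromstring('<title xmlns="http://www.w3.org/2005/Atom" dir="ltr">Hello</title>')
print(merge_dir("rtl", explicit))  # ltr
```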


Markup is just as non-visual as formatting codes, except for people using "view 
source" and the like.


Yes, but with the markup we don't have to rely on users getting it right when they type in the values.

With the bidi attribute approach, per rfc5023, non-supporting
clients are expected to at least preserve the bidi attribute
but will otherwise continue working as they currently do,
without risk of corrupting the text by inadvertently dropping
or improperly nesting the bidi controls.

RFC 4287 and RFC 5023 are pretty unclear about what is required to be preserved.

RFC 5023, Section 9.3: To avoid unintentional loss of data when editing Member Entries or Media Link Entries, an Atom Protocol client SHOULD preserve all metadata that has not been intentionally modified, including unknown foreign markup as defined in Section 6 of [RFC4287].

Seems pretty darn clear to me.

Firstly, RFC 5023 says that implementations can do whatever they want as long as the results are well-formed. Otherwise, AtomPub implementations that use (X)HTML whitelists would be non-compliant. Also, the requirement to preserve unknown foreign markup seems to apply more to unknown extension elements than to unknown attributes on known elements. In particular, if I replace the atom:author element with a new one, then I am not going to preserve the old, unknown attributes on the previous atom:author element.
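The atom:author case is easy to reproduce. This Python sketch uses a hypothetical client that rebuilds atom:author from the fields it understands. The unknown `dir` attribute on the old element is lost unless the client copies attributes over explicitly, and that is the only difference between the two hypothetical functions below. Neither version keeps unknown child elements.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def rebuild_author(old_author: ET.Element) -> ET.Element:
    """Build a fresh atom:author containing only atom:name."""
    new_author = ET.Element(ATOM + "author")
    name = ET.SubElement(new_author, ATOM + "name")
    name.text = old_author.findtext(ATOM + "name")
    return new_author


def rebuild_author_preserving(old_author: ET.Element) -> ET.Element:
    """Same as rebuild_author, but carry over the old element's attributes."""
    new_author = rebuild_author(old_author)
    for key, value in old_author.attrib.items():
        new_author.set(key, value)
    return new_author


old = ET.fromstring(
    '<author xmlns="http://www.w3.org/2005/Atom" dir="rtl"><name>Example</name></author>'
)
print(rebuild_author(old).get("dir"))             # None
print(rebuild_author_preserving(old).get("dir"))  # rtl
```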


9.3 uses the term "unknown foreign markup", which RFC 4287 defines as unknown elements AND attributes.

Also, imagine a case where we have a feed with 100 entries, each with about 5 atom:category elements. Let's say that the feed is generally all RTL. Using your approach, that's at least 1000 extra characters in the feed, and 500 opportunities for the embedding to be screwed up.
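The arithmetic behind those figures can be shown in a Python sketch with placeholder label text. 100 entries × 5 categories gives 500 labels, and one RLE...PDF pair per label adds two characters each. Each of these controls takes three bytes in UTF-8, so the byte overhead is three times the character count.

```python
RLE, PDF = "\u202b", "\u202c"

ENTRIES = 100
CATEGORIES_PER_ENTRY = 5

# Placeholder labels; only the count matters here.
labels = [f"label-{e}-{c}" for e in range(ENTRIES) for c in range(CATEGORIES_PER_ENTRY)]
wrapped = [RLE + label + PDF for label in labels]

embeddings = len(wrapped)
extra_chars = sum(len(w) - len(lab) for w, lab in zip(wrapped, labels))
extra_utf8_bytes = sum(
    len(w.encode("utf-8")) - len(lab.encode("utf-8")) for w, lab in zip(wrapped, labels)
)

print(embeddings, extra_chars, extra_utf8_bytes)  # 500 1000 3000
```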

The category labels are almost always going to be composed of strong RTL and strong LTR characters (only), so the BIDI algorithm will work correctly. And when it doesn't work, it is unlikely to be due to the wrong base directionality; in these cases, formatting characters are going to be needed no matter what. Right?
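Python's `unicodedata.bidirectional` returns the UAX #9 bidi class of a character, so it can check whether a given set of labels really consists only of strong characters. The sketch below uses `non_strong_chars` as a hypothetical helper. Labels with digits, punctuation or spaces (for example "web 2.0") contain non-strong characters.

```python
import unicodedata

STRONG = {"L", "R", "AL"}  # left-to-right, right-to-left, Arabic letter


def non_strong_chars(label: str) -> list:
    """Return (character, bidi class) for every character in label
    that is not strongly typed."""
    return [
        (ch, unicodedata.bidirectional(ch))
        for ch in label
        if unicodedata.bidirectional(ch) not in STRONG
    ]


print(non_strong_chars("\u05e9\u05dc\u05d5\u05dd"))  # [] (Hebrew letters are class R)
print(non_strong_chars("web 2.0"))  # [(' ', 'WS'), ('2', 'EN'), ('.', 'CS'), ('0', 'EN')]
```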

"almost always" is not "always". The Atom bidi draft covers the cases where "always" is more desirable than "almost always".

And no, the formatting characters are not always going to be needed. If I am rendering the text in (x)html, I don't want the formatting characters in there at all; rather, I want to follow best practices and use the (x)html provided bidi markup mechanisms.
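For (X)HTML output, a renderer could translate a label wrapped in a single embedding into `dir` markup instead of emitting the controls. The Python sketch below assumes one outer RLE...PDF or LRE...PDF pair with no other controls inside. `label_to_xhtml` is a hypothetical name, and anything more complex passes through unchanged.

```python
from html import escape

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
CONTROLS = {LRE, RLE, PDF, "\u202d", "\u202e"}  # also LRO and RLO


def label_to_xhtml(label: str) -> str:
    """Replace one outer LRE/RLE ... PDF pair with a dir attribute."""
    inner = label[1:-1]
    if (
        len(label) >= 2
        and label[0] in (LRE, RLE)
        and label[-1] == PDF
        and not any(ch in CONTROLS for ch in inner)
    ):
        direction = "rtl" if label[0] == RLE else "ltr"
        return f'<span dir="{direction}">{escape(inner)}</span>'
    # No simple outer pair: leave the text (and any controls) as is.
    return f"<span>{escape(label)}</span>"


print(label_to_xhtml(RLE + "a<b" + PDF))  # <span dir="rtl">a&lt;b</span>
print(label_to_xhtml("plain"))            # <span>plain</span>
```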

[snip]
* Editors of new documents must be meticulous about inserting the proper markup and formatting codes.
* Processors of existing documents must be meticulous about preserving BIDI/Ruby markup and/or formatting codes whenever any part of the contained text is preserved.
Again, what about older editors that know nothing about the proper markup or formatting codes?

All of these old editors will also fail to implement the Atom BIDI spec too. There are currently more editors that can insert formatting characters than can handle Atom BIDI markup, aren't there? I think that will always be the case.


Failing to implement the Atom bidi spec has significantly fewer consequences than improperly implementing the Unicode formatting characters. That is, existing applications will be no worse off than they currently are if the dir attribute is ignored or dropped; however, existing applications can be severely impacted by the improper use of the Unicode bidi characters. Again, there are very good reasons behind the recommendation against using the formatting codes in markup.

[snip]
atom:name: When consuming atom:name, treat it as though it were defined as a text construct, and allow for type="xhtml" and type="html". When producing atom:name elements, BIDI markup SHOULD be replaced with Unicode formatting characters when needed. Ruby markup may be replaced with Unicode formatting characters or stripped entirely. All remaining markup should be stripped entirely, and the type attribute should be removed.
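A rough Python sketch of the producing side of this proposal follows, for illustration only. It walks an XHTML `div` (as it would appear in a type="xhtml" value) and turns `dir="rtl"` and `dir="ltr"` into RLE/LRE ... PDF. Every other element is dropped, but its text is kept. Ruby is not handled specially, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def flatten(elem: ET.Element) -> str:
    """Plain text for elem: dir markup becomes embedding controls,
    all other markup is stripped. elem's own tail is not included."""
    inner = elem.text or ""
    for child in elem:
        inner += flatten(child)
        inner += child.tail or ""  # text following the child element
    direction = elem.get("dir")
    if direction == "rtl":
        return RLE + inner + PDF
    if direction == "ltr":
        return LRE + inner + PDF
    return inner


div = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    'Hi <span dir="rtl">\u05e9\u05dc\u05d5\u05dd</span> <b>there</b></div>'
)
name_text = flatten(div)  # "Hi " + RLE + Hebrew word + PDF + " there"
```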


As I pointed out in my previous note, this is not an interoperable suggestion and is not supported in any way by the specification.

- James

- Brian