Brian Smith wrote:
[snip]
Atom documents are almost never hand-entered, and there is
already a specification in place for marking up BIDI and even
ruby text in general XML. The odds that clients and servers
are going to correctly implement this extension--except
those targeted directly towards BIDI users--seem pretty
low to me. Personally, it seems much easier to
implement an existing BIDI markup mechanism (Unicode,
XML, and/or XHTML) than a new standard.
What are you basing that on?
The Unicode/W3C guidelines (http://www.w3.org/TR/unicode-xml/#Bidi and
http://www.w3.org/International/questions/qa-bidi-controls) say this:
* Use *XHTML* BIDI markup whenever possible.
* Otherwise *CSS* whenever possible.
* Otherwise, consider building BIDI markup into your markup schema.
* We have to support BIDI formatting codes anyway, since the above mechanisms
don't solve all BIDI problems.
Sorry if I wasn't clear; I was referring specifically to your assertion
that "it seems much easier to implement an existing BIDI markup
mechanism"... Having already implemented it, I can see no additional
complexity or difficulty.
There's really not much guesswork
involved in the implementation and apps that choose not to implement
support will be no worse or better off than they are currently.
That is not true. Consider a feed aggregator. If it doesn't support
Atom BIDI, then it will not correctly rewrite entries to handle an
inherited "dir" attribute from the atom:feed element. Since the
directionality is also inherited by text constructs, whenever the
implementation passes the text construct to a rendering engine, it
needs to rewrite that content to handle the inherited directionality.
If an aggregator doesn't support the dir attribute, it will ignore it or
drop it as if it weren't there in the first place and will continue to
operate as it always has, which is exactly what we want. So again,
those apps will be no worse or better off than they are currently.
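To make the aggregator case concrete, here is a minimal Python sketch of what a dir-aware aggregator might do when it lifts entries out of a feed: it makes a dir value inherited from atom:feed explicit on each copied atom:entry. It assumes the draft's unprefixed dir attribute with values "ltr"/"rtl" and handles only feed-to-entry inheritance; extract_entries is a hypothetical helper, not part of Abdera or any other library.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def extract_entries(feed_xml: str) -> list[ET.Element]:
    """Copy each atom:entry out of a feed, making an inherited dir explicit.

    Assumption: the draft's unprefixed 'dir' attribute on Atom elements;
    an entry's own dir value wins over the one inherited from atom:feed.
    """
    feed = ET.fromstring(feed_xml)
    feed_dir = feed.get("dir")
    entries = []
    for entry in feed.findall(ATOM + "entry"):
        clone = copy.deepcopy(entry)  # leave the source tree untouched
        if feed_dir is not None and clone.get("dir") is None:
            # Without this step the entry loses its directionality
            # once it is separated from its parent feed.
            clone.set("dir", feed_dir)
        entries.append(clone)
    return entries
```

For a feed with dir="rtl" containing one entry with no dir and one with dir="ltr", the copied entries carry "rtl" and "ltr" respectively. An aggregator that ignores dir entirely simply never runs this step.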
Don't forget atom:link/@title and atom:category/@label.
atom:category/@term can also cause problems when implementations use that
value for display purposes.
I know that. But, the Atom BIDI draft does not eliminate all uses of BIDI
formatting characters in these attributes, either. And, it doesn't specify
BIDI support for language-sensitive content in Atom service documents,
category documents, RSS feeds, or RSD documents.
I have no interest in eliminating all uses of bidi formatting characters.
And yes, the spec does address bidi support for language-sensitive
content in Atompub service and category documents. The spec language
can be improved in this regard, but the spec alters the definition of
the atomCommonAttributes production to add the dir attribute. In both
the Atompub service document and the Atom categories document, the
language-sensitive text is provided by elements from the Atom namespace
(e.g. atom:category and atom:title). Also, the bidi spec predates the
publication of RFC5023; I intend to have the next rev of the spec
specifically discuss atompub service docs.
As for RSS and RSD docs, I couldn't care less about solving the i18n
issues of either.
Arbitrary extension elements can also have problems.
The BIDI draft says it only applies to constructs that RFC4287 labeled "language
sensitive." Accordingly, the BIDI draft does not apply to extension elements.
Section 6.4.2: "Structured Extension elements are Language-Sensitive."
The bidi draft doesn't attempt to solve all the i18n issues
with Atom. Ruby text is a problem for pretty much everything,
especially given the fact that most browsers don't have a
clue how to properly render ruby text yet. The bidi draft
rightfully focuses on one small part of the problem.
I agree that a narrow scope is good. But, a solution for Ruby text will also be applicable to BIDI, especially if that solution involves the reuse of XHTML markup and/or CSS.
If/when the need emerges to improve ruby text support in Atom, I will
gladly help work out a solution.
It also doesn't solve the problem with atom:link/@title or
other attributes that are language-sensitive.
Yes, it does.
The Atom BIDI draft does not provide a way of specifying different
base directionalities for attributes on the same element, it doesn't eliminate
all need for BIDI formatting characters in language-sensitive attribute values,
and it doesn't provide a mechanism for discovering which (nested) extension elements
and attributes are affected by the proposed Atom BIDI markup.
atom:category and atom:link each have exactly one language-sensitive
attribute.
There's no reason to try to eliminate all need for bidi formatting
characters in language sensitive attribute values.
The bidi attribute applies to language-sensitive elements and
attributes. Defining which extension elements are language sensitive is
up to the extension definition. There's no reason to provide a
mechanism for discovering which extensions are affected.
If I implement RFC4287, the Unicode BIDI algorithm, XHTML BIDI, HTML BIDI, and
the "Unicode in XML" guidelines, I will have pretty good BIDI support. It will
require me to adhere to four different BIDI standards in addition to RFC4287.
That is a lot of work already. Now, your Atom BIDI and URI template BIDI proposals
add two more specifications that I would have to support--for a total of SIX standards
to adhere to and resolve conflicts between, JUST to support BIDI text. And, even if
I create well-formed documents adhering to all six standards, whenever I open them up
in any of my text editors, or any feed reader, they will look wrong since nobody
else is implementing all of those standards. I think that is totally unreasonable.
Abdera might have amazing support for BIDI, for which you should be commended, but
unless all Atom software is going to be implemented on top of Abdera, Abdera will
not be able to reliably interoperate with anything. If we want to provide interoperable
support for BIDI, we need to make it as simple to implement as possible.
My counter-proposal is simple:
None of this is simple. Please do not claim that it is.
* Use XHTML/HTML BIDI/Ruby markup whenever possible.
Keep in mind the fact that the atom bidi spec very clearly indicates
that the (x)html bidi mechanisms should be used in addition to the atom
dir attribute.
* Otherwise, use Unicode BIDI/Ruby formatting codes, such that matching pairs
of formatting codes are fully contained within a single text or attribute node.
Whose responsibility is it to apply the formatting codes? The person
typing the text or the software? How does the software know when to
apply the codes? Also, what about when an Atompub client edits an
entry? Is the Atompub client responsible for preserving the unicode
formatting characters? What if they don't? There are existing Atompub
clients out there that, more than likely, will not, and since the
formatting codes are non-visual, it's not likely the user will notice
them either, causing unexpected rendering issues later on. How are
non-bidi-enabled clients supposed to know what to do? With the bidi
attribute approach, per RFC5023, non-supporting clients are expected to
at least preserve the bidi attribute but will otherwise continue working
as they currently do, without risk of corrupting the text by
inadvertently dropping or improperly nesting the bidi controls.
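Whether a client has "improperly nested" the controls can at least be checked mechanically. The Python sketch below tests the rule from the counter-proposal (every LRE, RLE, LRO, or RLO closed by a PDF within the same string) and is meant to be run on each text node and attribute value separately. It ignores the isolate controls (LRI, RLI, FSI, PDI) added in Unicode 6.3, and embeddings_balanced is a hypothetical name, not a standard API. Note that the Unicode Bidirectional Algorithm itself just ignores an unmatched PDF, so a failure here is a violation of the proposed containment rule, not of Unicode.

```python
# Explicit embedding/override controls (openers) and their closer, PDF.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def embeddings_balanced(text: str) -> bool:
    """Return True if every opener in `text` is closed by a PDF within the
    same string and no PDF appears without a matching opener."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # stray PDF: its opener lives elsewhere
            depth -= 1
    return depth == 0  # nonzero: an embedding leaks out of this node
```

A client that keeps the RLE but drops the trailing PDF (RLE + "abc") fails the check, while RLE + "abc" + PDF passes.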
Also, imagine a case where we have a feed with 100 entries, each with
about 5 atom:category elements. Let's say that the feed is generally
all RTL. Using your approach, that's at least 1000 extra characters in
the feed, and 500 opportunities for the embedding to be screwed up.
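The arithmetic in that example, as a small Python sketch. It assumes each atom:category label is wrapped individually in RLE ... PDF (two extra characters per label); embed_rtl and formatting_overhead are hypothetical helpers for illustration only.

```python
# RIGHT-TO-LEFT EMBEDDING and POP DIRECTIONAL FORMATTING controls.
RLE, PDF = "\u202b", "\u202c"


def embed_rtl(label: str) -> str:
    """Wrap one label in RLE ... PDF, as the formatting-code approach would."""
    return RLE + label + PDF


def formatting_overhead(entries: int, categories_per_entry: int) -> tuple[int, int]:
    """Return (extra_characters, wrapped_labels) for a feed in which every
    atom:category label is wrapped individually."""
    wrapped_labels = entries * categories_per_entry
    per_label = len(embed_rtl(""))  # two control characters per label
    return wrapped_labels * per_label, wrapped_labels
```

Under these assumptions, formatting_overhead(100, 5) returns (1000, 500): 1000 extra characters and 500 separate embeddings that each have to survive editing intact.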
Re: The ruby formatting codes: Even the Unicode spec warns against using
the ruby formatting codes for anything other than internal storage. We
gain absolutely nothing by bringing ruby into this discussion.
* Editors of new documents must be meticulous about inserting the proper markup
and formatting codes.
* Processors of existing documents must be meticulous about preserving
BIDI/Ruby markup and/or formatting codes whenever any part of the contained
text is preserved.
Again, what about older editors that know nothing about the proper
markup or formatting codes?
I recognize that this goes against the Unicode in XML guidelines. However, Atom
already goes against the guidelines by having language-sensitive text in attribute
values and other contexts where XHTML markup cannot be used.
What is the benefit of going against the Unicode in XML guidelines?
- James
- Brian