RE: BIDI (was Proposal: Atomext WG)

Brian Smith Mon, 07 Jan 2008 12:46:48 -0800

James M Snell wrote:
> > So, Atom implementations have to be prepared to accept at least
> > LRM and RLM in documents, anyway.
> > 
> 
> Never said they didn't.  All the atom bidi draft does is 
> provide a markup alternative in order to make life easier... 
> which, reminds me, you still haven't explained why you think 
> the dir attribute is more complicated to implement than the 
> bidi characters.


* A processor still has to handle the formatting characters regardless of 
whether the document uses the Atom BIDI markup.

* The inheritance of directionality makes it more difficult to compose, 
decompose, and remix Atom documents than if the directionality markup were 
localized near the text that needs it.

* In my testing, current Atom(Pub) implementations were more likely to strip 
the "dir" attribute than they were to strip Unicode formatting characters.

* My code already preserves BIDI directionality information encoded using the 
mechanism I suggested in this thread, with no extra work on my part. Since the 
preservation of the formatting characters and (X)HTML BIDI markup required 
literally ZERO lines of code on my part, and support for the Atom BIDI draft 
requires more than zero lines of code, I find the Atom BIDI draft to be more 
difficult to implement. 

* The suggested mechanism doesn't all

> > When software breaks apart BIDI text and recombines it, it has to 
> > preserve the BIDI formatting. In this case, the system that splits 
> > apart the tags into an array and/or the system that recombines the 
> > tags into a comma-separated list should transparently 
> > handle the formatting codes.
> > 
> 
> Let me ask again: is the user responsible for adding the appropriate 
> formatting codes around each individual tag in the list?  

Not if you don't already need to with your current implementation. Let's say 
the user has a list "abc, FED, ghi" with no control characters and checks the 
RTL checkbox. You break it into the three parts and store it in an array or 
database or wherever, along with the checkbox value. When you want to 
regenerate the list, you take all the entries and join them with ", ", you set 
the text box's directionality using the checkbox value, and then you put the 
regenerated list back in the text box--again, no formatting codes. When you 
generate your Atom document, you don't need a "dir" attribute or Unicode 
control characters for the @label value, because the BIDI algorithm will handle 
everything correctly.
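
The round trip described above can be sketched roughly as follows, in Python. 
This is only an illustration: the function names and the dictionary layout are 
made up for the example and are not taken from any real implementation.

```python
# Hypothetical helpers illustrating the split/store/rejoin round trip.

def split_tags(text):
    """Break a comma-separated tag list into individual, trimmed tags."""
    return [part.strip() for part in text.split(",") if part.strip()]

def join_tags(tags):
    """Regenerate the comma-separated list from the stored tags."""
    return ", ".join(tags)

# What the user typed, plus the state of the RTL checkbox.
user_input = "abc, FED, ghi"
rtl_checked = True

# Store the parts together with the checkbox value (an array, a database row, ...).
stored = {"tags": split_tags(user_input), "rtl": rtl_checked}

# Later: rebuild the text box contents and restore its directionality.
regenerated = join_tags(stored["tags"])
text_box_dir = "rtl" if stored["rtl"] else "ltr"
```

Nothing in this sketch adds or removes formatting codes; the only directionality 
state is the stored checkbox value.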

That is assuming that the BIDI algorithm can already handle the list of labels 
without hints. If you send me some examples of labels that are not 
BIDI-algorithm-friendly (preferably ones that your implementation 
supports), I will explain when the control characters are necessary. 

> When the user wishes to edit the entry later, are they supposed
> to just know that there are non-visual bidi formatting codes
> interspersed into the comma separated list of tags?

No, I am not recommending that. The only case where that would be necessary is 
when you need to have nested embedding levels in your list of tags. That seems 
to be something that your implementation doesn't support anyway, based on your 
description.
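
For the nested-embedding case, a minimal sketch in Python of what carrying 
the formatting codes inside a single tag might look like. The helper names 
are hypothetical; the code points are the standard Unicode ones (LRM U+200E, 
RLM U+200F, LRE U+202A, RLE U+202B, PDF U+202C, LRO U+202D, RLO U+202E).

```python
# Unicode BIDI formatting characters.
LRM, RLM = "\u200e", "\u200f"
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"
BIDI_CONTROLS = {LRM, RLM, LRE, RLE, PDF, LRO, RLO}

def has_bidi_controls(text):
    """Return True if the text contains any BIDI formatting character."""
    return any(ch in BIDI_CONTROLS for ch in text)

def embed_rtl(text):
    """Wrap text in an explicit right-to-left embedding (RLE ... PDF)."""
    return RLE + text + PDF

# Only the tag that needs its own embedding level carries the codes.
tags = ["abc", embed_rtl("FED ghi JKL"), "mno"]
label = ", ".join(tags)

# Splitting on the separator hands back the wrapped tag unchanged,
# so the codes survive a split/join round trip without special handling.
round_tripped = ", ".join(label.split(", "))
```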

> Your solution is based on a lot of assumptions and possibilities.  It 
> *may* work in *many* cases.  It *likely* won't be a problem.  It's 
> *possible* that clients will get it right.  It also goes against 
> documented best practices and recommendations, defended only by an 
> untested hypothesis that it's "simple" and "easy". That's not 
> acceptable when we can do better.

My main assumption is that very few implementations will implement the Atom 
BIDI mechanism properly or consistently. Consequently, I want to focus on a 
solution that can be "accidentally" implemented correctly, as much as possible.

I guess I am just playing the "worse is better" card. I've asked you multiple 
times to supply some concrete examples where my idea breaks down (for BIDI IRI 
templates and for BIDI in Atom). Those requests were not rhetorical--having 
those examples really makes everything easier to understand for people like me, 
who aren't experts in BIDI text handling. Do you have links to real feeds that 
make good use of the Atom BIDI draft features, or some other real-world 
examples, so we can study them?

> The Atom bidi attribute is not the whole solution; it's just 
> one part of the larger picture, intended to fill in the gaps
> and make certain things simpler. Further, it is consistent
> with the approach taken by XHTML and the recommendations of
> both the W3C and the Unicode organization.

I never doubted that. The only thing I doubt is that it is better or simpler to 
have a mechanism for BIDI that is separate from the mechanism for Ruby text or 
other i18n problems with RFC 4287. 

I really liked your idea of a <x:display-name> element to solve the problem 
with Ruby text in <atom:name>. That same solution could also be applied to 
atom:link/@title and atom:category/@label to allow all language-sensitive 
constructs in RFC 4287 to be compliant with the "Unicode in XML" guidelines. 
And, coincidentally, it would also work for BIDI text and probably even Gaiji 
(which I didn't even know about until Murata-san mentioned it). I really think 
that the "use XHTML text constructs for i18n" idea is easy to understand and 
relatively simple to implement, particularly since the cascading/inheriting of 
directionality overrides becomes very localized.

> Ok... and?  The meaning of the text in the spec is clear: 
> clients should only change things they intentionally want
> changed. If the atom:link element is replaced without the
> dir attribute, then obviously the only reasonable
> interpretation the server can make is that the client
> wanted that dir attribute to be changed.

I am not talking about clients stripping the directionality information as much 
as servers stripping it. For example, in my AtomPub implementation, the server 
ignores any changes to "edit" and "edit-media" links, and any changes to 
atom:content for media link entries, including all attributes. I had difficulty 
with getting other AtomPub servers to preserve the "dir" attribute except on 
XHTML elements. It is my understanding that every server can strip off the 
"dir" attribute if it wants to. Do you know of any servers that preserve it?
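
One way to check this kind of stripping is to compare the entry that was sent 
with the entry the server returns. The following is only a sketch in Python: 
the sample documents are invented, the function name is hypothetical, and it 
assumes the "dir" attribute is an unqualified attribute on the Atom elements.

```python
import xml.etree.ElementTree as ET

def dir_attributes(xml_text):
    """List (element tag, dir value) pairs for every element carrying "dir"."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get("dir")) for el in root.iter()
            if el.get("dir") is not None]

# Invented example: what the client sent...
sent = ('<entry xmlns="http://www.w3.org/2005/Atom" dir="rtl">'
        '<title dir="ltr">abc</title></entry>')
# ...and what a server that strips the attribute might return.
returned = ('<entry xmlns="http://www.w3.org/2005/Atom">'
            '<title>abc</title></entry>')

kept = dir_attributes(returned)
lost = [item for item in dir_attributes(sent) if item not in kept]
```

An empty `lost` list would mean the server preserved every "dir" attribute 
in this particular entry.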

Regards,
Brian
