Brian Smith wrote:
[snip]
The BIDI draft says it only applies to constructs that
RFC4287 labeled "language sensitive." Accordingly, the BIDI
draft does not apply to extension elements.
Section 6.4.2: "Structured Extension elements are Language-Sensitive."
atom:category and atom:link each have exactly one language-sensitive
attribute.
Which attributes on elements and subelements of structured extension elements
are language-sensitive? It seems it must be either all or none.
It depends on the definition of the extension. If you're not familiar
with the extension, you have no way of knowing.
there's no reason to try eliminating all need for bidi formatting
characters in language sensitive attribute values.
I agree. If there are going to be formatting codes in the document anyway, then
why do we need a mechanism that duplicates their functionality?
Because their use in markup is problematic and is actively discouraged
for a number of very good reasons.
* Otherwise, use Unicode BIDI/Ruby formatting codes, such
that matching pairs of formatting codes are fully contained
within a single text or attribute node.
Whose responsibility is it to apply the formatting codes? The person
typing the text or the software? How does the software know when to
apply the codes?
Also, what about when an Atompub client edits an entry?
Is the Atompub client responsible for preserving the unicode
formatting characters? What if they don't?
My hypothesis is that an implementation ignorant of BIDI issues is more
likely to preserve the formatting characters than Atom/XHTML/HTML BIDI
markup, especially when the effects of those formatting characters never
span multiple nodes in the document.
Have you tested this hypothesis using real editors? For example, in our
internal blogging environment, tags are entered in a single text box,
each tag separated by a comma. The system splits the tags into an array
and saves each tag separately. Each tag becomes a separate
atom:category element. Is the user responsible for adding the
appropriate formatting codes around each individual tag? When the user
wishes to edit the entry later, perhaps to add a new tag, are they
supposed to just know that there are non-visual bidi formatting codes
interspersed into the comma separated list of tags?
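
To make the tag-splitting concern concrete, here is a minimal sketch. It assumes the user (or their editor) wrapped the whole comma-separated list in a single RLE/PDF pair; the helper names split_tags and is_balanced are made up for illustration and are not taken from any real system.

```python
# Unicode bidi embedding/override controls.
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO


def split_tags(raw):
    """Split a comma-separated tag box into individual tags (hypothetical helper)."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def is_balanced(text):
    """True if every embedding/override opener has a matching PDF within text."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Assumed input: the whole list wrapped in one RLE ... PDF pair.
raw = RLE + "\u05d0\u05d1, news" + PDF
tags = split_tags(raw)

print(is_balanced(raw))                 # the list as typed
print([is_balanced(t) for t in tags])   # each would-be atom:category value
```

The list is balanced as typed, but after the split the opener lands in the first tag and the PDF in the last, so neither tag contains a matching pair. Note that str.strip() does not remove these controls, since they are format characters rather than whitespace.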
There are existing Atompub clients out there that, more than likely,
will not, and since the formatting codes are non-visual, it's not
likely the user will notice them either, causing unexpected
rendering issues later on.
I don't understand why a user that needs BIDI functionality would use a
client that doesn't have BIDI support. Further, I don't understand how
clients that are incapable of generating BIDI formatting codes can generate
markup compliant with your proposal. Are you expecting the users to edit
the markup directly?
You're assuming that all users have the same requirements. In our
environment, a single feed may include entries from many different
users. We have group blogs where users from many different locales have
edit rights on any entry in the blog. Further, our users use many
different editors to write and manage their blog entries. Asking those
users to be mindful of how they're using bidi formatting characters is a
lot more difficult than what we currently do, which is provide a simple
check box to indicate whether or not the entry is "right-to-left", which
in turn, is translated into the appropriate dir="rtl" in the markup.
Clients that do not understand the dir attribute simply ignore it, and
since our software is written so that only explicit changes in value are
recognized (e.g. a missing dir attribute does not mean the dir attribute
value has changed) we're able to work seamlessly with editors that do
not support the attribute.
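
As a rough sketch of that behavior (not our actual code; the function names apply_checkbox and merge_dir are hypothetical), the check box and the "only explicit changes count" rule could look something like this:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def apply_checkbox(entry, right_to_left):
    """Translate the "right-to-left" check box into dir="rtl" on the entry."""
    if right_to_left:
        entry.set("dir", "rtl")
    return entry


def merge_dir(stored, submitted):
    """Keep the stored dir unless the client sent an explicit dir value.

    A missing dir attribute in the submitted entry is treated as
    "no change", not as a request to remove the stored value.
    """
    new_dir = submitted.get("dir")
    if new_dir is not None:
        stored.set("dir", new_dir)
    return stored


stored = apply_checkbox(ET.Element("{%s}entry" % ATOM), right_to_left=True)

# An editor that knows nothing about dir sends the entry back without it.
from_old_client = ET.Element("{%s}entry" % ATOM)
merge_dir(stored, from_old_client)
print(stored.get("dir"))
```

With this rule the stored direction survives a round trip through a client that drops the attribute; only a client that sends an explicit dir value can change it.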
Markup is just as non-visual as formatting codes, except for people using "view
source" and the like.
Yes, but with the markup we don't have to rely on users getting it right
when they type in the values.
With the bidi attribute approach, per rfc5023, non-supporting
clients are expected to at least preserve the bidi attribute
but will otherwise continue working as they currently do,
without risk of corrupting the text by inadvertently dropping
or improperly nesting the bidi controls.
RFC 4287 and RFC 5023 are pretty unclear about what is required to be preserved.
RFC 5023, Section 9.3: To avoid unintentional loss of data when editing
Member Entries or Media Link Entries, an Atom Protocol client SHOULD
preserve all metadata that has not been intentionally modified,
including unknown foreign markup as defined in Section 6 of [RFC4287].
Seems pretty darn clear to me.
Firstly, RFC 5023 says that implementations can do whatever they want as long as
the results are well-formed. Otherwise, AtomPub implementations that use (X)HTML
whitelists would be non-compliant. Also, the requirement to preserve unknown foreign
markup seems to apply more to unknown extension elements than to unknown attributes
on known elements. In particular, if I replace the atom:author element with a new
one, then I am not going to preserve the old, unknown attributes on the previous
atom:author element.
9.3 uses the term "unknown foreign markup", which RFC 4287 defines
as unknown elements AND attributes.
Also, imagine a case where we have a feed with 100 entries, each with
about 5 atom:category elements. Let's say that the feed is
generally all RTL. Using your approach, that's at least 1000 extra
characters in the feed, and 500 opportunities for the embedding to
be screwed up.
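
The arithmetic behind those figures, as a small sketch. It assumes each category label is wrapped in exactly one opening control (for example RLE) and one closing PDF; that per-label count is an assumption for illustration.

```python
ENTRIES = 100
CATEGORIES_PER_ENTRY = 5
CONTROLS_PER_LABEL = 2  # assumed: one opener (e.g. RLE) plus one PDF

# Every label is one place where the embedding can go wrong.
labels = ENTRIES * CATEGORIES_PER_ENTRY

# Two invisible characters per label.
extra_chars = labels * CONTROLS_PER_LABEL

print(labels, extra_chars)
```

That gives 500 labels and 1000 extra characters, matching the numbers above.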
The category labels are almost always going to be composed of strong RTL and
strong LTR characters (only), so the BIDI algorithm will work correctly. And,
when it doesn't work, it is unlikely that it will be due to the wrong base
directionality--in these cases, formatting characters are going to be needed
no matter what. Right?
"almost always" is not "always". The Atom bidi draft covers the cases
where "always" is more desirable than "almost always".
And no, the formatting characters are not always going to be needed. If
I am rendering the text in (x)html, I don't want the formatting
characters in there at all; rather, I want to follow best practices and
use the (x)html provided bidi markup mechanisms.
[snip]
* Editors of new documents must be meticulous about
inserting the proper markup and formatting codes.
* Processors of existing documents must be meticulous about
preserving BIDI/Ruby markup and/or formatting codes whenever
any part of the contained text is preserved.
Again, what about older editors that know nothing about the proper
markup or formatting codes?
All of these old editors will fail to implement the Atom BIDI spec too.
There are currently more editors that can insert formatting characters than
can handle Atom BIDI markup, aren't there? I think that will always be the case.
Failing to implement the atom bidi spec has significantly fewer
consequences than improperly implementing the unicode formatting
characters. That is, existing applications will be no worse off than
they currently are if the dir attribute is ignored or dropped; however,
existing applications can be severely impacted by the improper use of
the unicode bidi characters. Again, there are very good reasons behind
the recommendation against using the formatting codes in markup.
[snip]
atom:name: When consuming atom:name, treat it as though it
was defined as a text construct, and allow for type="xhtml"
and type="html". When producing atom:name elements, BIDI markup
SHOULD be replaced with Unicode formatting characters when needed.
Ruby markup may be replaced with Unicode formatting characters or
stripped entirely. All markup should be stripped entirely, and the
type attribute should be removed.
As I pointed out in my previous note, this is not an interoperable
suggestion and is not supported in any way by the specification.
- James
- Brian