Re: [bidi] Re: PRI 185 Revision of UBA for improved display of URL/IRIs

2011-07-29 Thread Martin J. Dürst

Hello Mark, others,

On 2011/07/28 5:01, Mark Davis ☕ wrote:

Just to remind people: posting to this list does *not* mean submitting to
the UTC. If you want to discuss a proposal here, not a problem, but just
remember that if you want any action you have to submit to the UTC.

Unicode members via: http://www.unicode.org/members/docsubmit.html
Others via: http://www.unicode.org/reporting.html


[I'll copy this text to the i...@ietf.org mailing list (the mailing list of 
the EAI (Email Address Internationalization) WG) to have a public 
record, because that's the mailing list where most of the discussion 
about this draft in the IETF happened, as far as I'm aware.]



Context
===

I'm an individual Unicode member, but I'll paste this into the 
reporting form because that's easier. Please make a 'document' out of it 
(or more than one, if that helps to better address the issues raised 
here). I apologize for being late with my comments.



Substantive Comments
===


On substance, although I don't agree with every detail of what Jonathan 
Rosenne, Behdad Esfahbod, Aharon Lanin and others have said, I agree with them in 
general. If their documents/messages are not properly submitted, I 
include them herewith by reference.


The proposal is an enormous change in the Bidi algorithm, changing its 
nature in huge ways. Whatever the details eventually may look like, it 
won't be possible to get everything right in one step, and probably 
countless tweaks will follow (not that they necessarily will make things 
better, though). Also, dealing with IRIs will increase the 
appetite/pressure for dealing with various other syntactical constructs 
in texts.


The introduction of the new algorithm will create numerous compatibility 
issues (and attack surfaces for phishing, the main thing the proposal 
tries to address) for a long period of time. Given that the Unicode 
Consortium has been working hard to address (compared to this issue) 
even extremely minor compatibility issues re. IDNs in TR46, it's 
difficult for me to see how this fits together.



Taking One Step Back
===


As one of the first people involved with what's now called IDNs and 
IRIs, I know that the problem of such Bidi identifiers is extremely 
hard. The IETF, as the standards organization responsible for 
(Internationalized) Domain Names and for URIs/IRIs, has taken some steps 
to address it (there's a Bidi section in RFC 3987 
(http://tools.ietf.org/html/rfc3987#section-4), and for IDNs, there is 
http://tools.ietf.org/html/rfc5893).


I don't think these are necessarily sufficient or anything. And I don't 
think that the proposal at hand is completely useless. However, the 
proposal touches many aspects (e.g. recognizing IRIs in plain text,...) 
that are far better suited to definition in another standards 
organization or where a high-bandwidth coordination with such an 
organization is crucial (roughly speaking, first on feasibility of 
various approaches, then on how to split up the work between the 
relevant organizations, then on coordination in details.) Without such a 
step back and high-bandwidth coordination, there is a strong chance of 
producing something highly suboptimal.


(Side comment on a detail: It would be better for the document to use 
something like
http://tools.ietf.org/html/rfc3987#section-2.2 rather than the totally 
obscure and no longer maintained 
http://rfc-ref.org/RFC-TEXTS/3987/chapter2.html, in the same way the 
Unicode Consortium would probably prefer to have its own Web site 
referenced for its work rather than some third-party Web site.)



Taking Another Step Back
===


I mentioned 'high-bandwidth' above. The Unicode Public Review process is 
definitely not suited for this. It has various problems:

- The announcements are often very short, formalistic, and cryptic
  (I can dig up examples if needed.)
- The announcements go to a single list; forwarding them to other
  relevant places is mostly a matter of chance. This should be improved
  by identifying the relevant parties and contacting them directly.
- To find the Web form, one has to traverse several links.
- The submission is via a Web form, without any confirmation that the
  comment has been received.
- The space for comments on the form is very small.
- There is no way to make a comment public (except for publishing it
  separately).
- There is no official response to a comment submitted to the Web form.
  One finds out about what happened by chance or not at all.
  (compare to W3C process, where WGs are required to address each
   comment formally, and most of them including the responses are
   public)
- The turnaround is slow. Decisions get made (or postponed) at UTC
  meetings only.
Overall, from an outsider's point of view, the review process and the 
review form feel like the eye of a needle connected to a black hole.


[I very much understand that part of the reason the UTC works the way it 
works is because of 

Record-A-Thon is tomorrow – help record 50 languages in a single day

2011-07-29 Thread Eric Muller



From http://blog.mightyverse.com/2011/06/300-languages-record-a-thon/


   On July 30th, 2011 we will meet at the Internet Archive in San
   Francisco, where volunteers will record the Universal Declaration of
   Human Rights http://www.un.org/en/documents/udhr/index.shtml
   (UDHR) in their native language(s). Mightyverse volunteers will
   assist recording at several recording stations. Each station will be
   equipped with a video camera, monitor, lighting, microphone and
   Mightyverse PhraseFarm teleprompter system to enable the capture of
   spoken language. These high quality recordings of native speakers
   will be made available at archive.org http://archive.org/ under a
   Creative Commons license.
   Mightyverse is excited to support the Long Now Foundation's
   (http://longnow.org/) 300 languages project in its July 30th 2011
   record-a-thon (http://rosettaproject.org/record-a-thon/). The goal
   of the 300 languages project is to record spoken language that has
   parallel translations in at least 300 languages. Towards that
   effort, Laura Welcher and her team at The Rosetta Project
   http://rosettaproject.org/ (an ongoing effort by The Long Now
   Foundation) have identified texts that already exist in parallel
   translations. Of those texts, we at Mightyverse were especially
   excited by the UDHR.



Signup for UDHR Recording: 
https://spreadsheets.google.com/spreadsheet/viewform?formkey=dEM2cW9wSm4za0VmSHZwTEI2amxhNUE6MQ 



Also, it will be a fun day with free-form language recording and some 
speakers at the beginning of the day and at lunch, and there'll be food 
and prizes for people who record.


Eric.