Monday, January 17, 2005, 6:11:22 PM, you wrote:

>> What do you think about only allowing it on atom:content and on Text
>> constructs?

> I'm very concerned.  Any time you're processing text and you want to
> display it or index it for search, or do pretty well anything, if you
> don't know what language it's in you're probably going to make damaging
> mistakes.  Consider a Turkish author whose name begins with 'I'... feed
> that to a search engine that doesn't know it's in Turkish and the 
> results will be wrong.

Yes, this does sound like a problem.
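
To make the Turkish case concrete, here is a rough Python sketch. It is
my own illustration: the name lower_for_lang is hypothetical, and it
only special-cases the dotted/dotless I used in Turkish and Azerbaijani.

```python
def lower_for_lang(text, lang=None):
    """Lowercase text, honouring the Turkish/Azerbaijani dotted and dotless I.

    Hypothetical helper for illustration only; a real system would use a
    full locale-aware library rather than this two-character special case.
    """
    primary = (lang or "").split("-")[0].lower()
    if primary in ("tr", "az"):
        # In these languages 'I' lowercases to dotless U+0131,
        # and dotted capital U+0130 lowercases to plain 'i'.
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()


# Same input, different index keys depending on the declared language.
key_unknown = lower_for_lang("ISIK")
key_turkish = lower_for_lang("ISIK", "tr")
```

Without the language tag, the two keys differ, so a search for the
Turkish spelling would miss the entry.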

> Suppose Joi Ito wants to list his name in
> Japanese but still write in English; or the reverse.

Let's hope he doesn't want to provide a name in more than one language.

> Suppose I'm
> publishing in Moscow and I want my copyright statement to be in Russian
> irrespective of what language I'm writing in?

Copyright is a Text Construct, so at least this one would be ok.

>> I think that if we allow xml:lang then it should definitely be
>> restricted. The current "xml:lang everywhere" situation is only simple
>> to implement if you assume that your implementation stores all of its
>> data in an XML DOM.

> This statement is not correct; it's simple to implement correctly if
> you're doing stream processing.

I agree, parsing is simple whether it is stream or tree parsing.  I'm
worried about what we do with this data afterwards.
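
For reference, the stream-side inheritance really is only a few lines.
A minimal sketch using Python's xml.etree.ElementTree.iterparse; the
sample document is made up (it is not a valid Atom feed) and the helper
name text_with_lang is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample, not a valid Atom document.
SAMPLE = (
    b'<feed xml:lang="en">'
    b"<title>Hello</title>"
    b'<author><name xml:lang="ja">Ito</name></author>'
    b'<rights xml:lang="ru">Copyright</rights>'
    b"</feed>"
)


def text_with_lang(xml_bytes):
    """Return (tag, language, text) for every element that has text."""
    stack = [None]  # language in scope; None means "not declared"
    found = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            # Inherit the parent's language unless this element overrides it.
            # An empty xml:lang="" is kept as "" (explicitly no language).
            stack.append(elem.get(XML_LANG, stack[-1]))
        else:
            lang = stack.pop()
            if elem.text and elem.text.strip():
                found.append((elem.tag, lang, elem.text.strip()))
    return found


chunks = text_with_lang(SAMPLE)
```

The parser side ends there. What the application then does with each
(tag, language, text) triple is the part I'm asking about.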

It isn't clear what xml:lang should apply to. Does it apply to email
addresses? Aural browsers might use it as a pronunciation hint. So
does a database implementation need an "email_lang" column in the
table or not?  I think we need to decide on these issues rather than
leave it to implementors to guess.
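
To show what the storage question looks like in practice, here is a
hypothetical Python sketch. The class names are mine, and treating the
email address as language-neutral is purely an assumption made for the
example, not something the draft says.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Optional


@dataclass
class LangText:
    """A chunk of text plus the xml:lang that was in scope for it."""
    value: str
    lang: Optional[str] = None


@dataclass
class Person:
    name: LangText
    # Assumption for this sketch only: no language stored for email.
    email: str


@dataclass
class Entry:
    title: LangText
    rights: LangText
    author: Person


def column_names(cls, prefix=""):
    """Flatten nested dataclasses into the columns a table would need."""
    names = []
    for f in fields(cls):
        if is_dataclass(f.type):
            names.extend(column_names(f.type, prefix + f.name + "_"))
        else:
            names.append(prefix + f.name)
    return names


entry_columns = column_names(Entry)
```

Every field modelled as LangText doubles its column count, which is
the cost I mean. If email were a LangText too, that would add the
"email_lang" column mentioned above.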

> And I repeat, anything that is remotely general-purpose needs to 
> associate each little chunk of text with its language, if it wants to
> operate correctly.  Sometimes being correct has convenience costs.

I'm not suggesting that "xml:lang everywhere" is useless. I'm pointing
out that the costs are deceptively high. If the consensus is that the
current situation is acceptable then I'll be ok with that.

My Pace might not be an acceptable solution, but I think we perhaps
ought to be clear about what xml:lang applies to, including a strategy
for extensions.

-- 
Dave
