Ciaran Gultnieks wrote:
Craig Andrews wrote:
I would like to see a notice have a new field saved in the database:
its language. A notice's language is set by (first match wins):
1. Language hash tag (#.en, #.fr, etc.)
2. API-provided metadata (or web interface selection)
3. "Language" selection from the user's profile
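
As a rough illustration of the "first match wins" order above, here is a minimal Python sketch. The function name, its parameters and the exact tag pattern are assumptions made for this example; they are not taken from the Laconica code base.

```python
import re
from typing import Optional

# Hypothetical tag syntax, modelled on the "#.en" / "#.fr" examples above.
LANG_TAG = re.compile(r"(?<!\S)#\.([a-z]{2,3})\b")


def resolve_language(text: str,
                     api_language: Optional[str] = None,
                     profile_language: Optional[str] = None) -> Optional[str]:
    """Return the language of a notice; the first source that matches wins."""
    # 1. Language hash tag inside the notice text.
    match = LANG_TAG.search(text)
    if match:
        return match.group(1)
    # 2. Metadata supplied by the API client or the web interface.
    if api_language:
        return api_language
    # 3. Language from the user's profile (may be None if unset).
    return profile_language
```

For example, `resolve_language("bonjour #.fr", "en", "de")` picks the hash tag, while the same call without the tag falls back to the API value and then to the profile value.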

This makes perfect sense to me. I would add that the language metadata
needs to be incorporated into the OMB protocol so this information can
be passed from server to server.

Regarding the hash tag, there is probably room for debate as to whether
that is the right way to deal with what is probably an edge case: users
who switch between languages frequently. Interface-specific solutions
might be friendlier, e.g. via the web, a dropdown to override the
language; via XMPP, a 'change language' command.
Agreed.

I wanted to add a last check, for the browser language, but that's a multi-valued variable that is also negotiated based on server-side support of each language, so it's not a good way to determine the language of submitted text.
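
To show what "multi-valued" means here, this is a small Python sketch that parses an HTTP Accept-Language header into weighted preferences. The function name is an assumption for this example, and the parsing is deliberately simplified (it only understands a single `q=` parameter per entry).

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, weight) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        weight = 1.0  # entries without a q-value default to 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # treat a malformed weight as "not wanted"
        prefs.append((tag.strip().lower(), weight))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)
```

A header such as `fr-CA, fr;q=0.8, en;q=0.5` yields three candidates, and none of them says which language a particular notice was actually written in.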

Another option I thought of and rejected: I think it's possible but probably inefficient to make a good guess at what language is used based on the characters in the notice. For example, only Arabic, Farsi and Urdu use the Arabic Unicode range. (I think... for the sake of argument, let's accept that as true.) So we could check which code ranges are used in the notice, maybe with percentages (98% Arabic, 2% Roman). We could then make a guess based on vocabulary -- if the notice contains 12 words that are common in Urdu, 1 that's common in Farsi, and 0 in Arabic, then it's probably Urdu (and the Farsi word is probably a false hit due to a homonym).

A lot of times we wouldn't have to go that far. Something that's 18% kanji, 70% hiragana and 12% katakana is almost definitely Japanese, for example.
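
The code-range percentages described above could be sketched roughly as follows in Python. The block table is a small, incomplete selection of Unicode blocks, and the function names and the 20% kana threshold are assumptions picked for illustration, not tested values.

```python
from collections import Counter
from typing import Dict

# A few Unicode blocks as (first, last) code points, inclusive.
# Deliberately incomplete: enough to illustrate the idea, no more.
BLOCKS = {
    "latin": (0x0000, 0x024F),
    "arabic": (0x0600, 0x06FF),
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),
}


def script_shares(text: str) -> Dict[str, float]:
    """Return the fraction of letters in text that falls in each block."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces and punctuation
        code_point = ord(ch)
        for name, (first, last) in BLOCKS.items():
            if first <= code_point <= last:
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {name: n / total for name, n in counts.items()}


def looks_japanese(text: str, threshold: float = 0.2) -> bool:
    """Guess Japanese when enough of the letters are kana (assumed cutoff)."""
    shares = script_shares(text)
    kana = shares.get("hiragana", 0.0) + shares.get("katakana", 0.0)
    return kana >= threshold
```

This only covers the cheap first step. Telling Urdu from Farsi, or any two Latin-script languages apart, would still need the vocabulary check, which is where the real cost lies.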

This sounds really hard to me: it would take a lot of time at notice submit time, and would be almost intractable for the Latin code range. I think someone should probably do this work, somewhere... but it's probably not up to us to do it.

-Evan

--
Evan Prodromou
CEO, Control Yourself, Inc.
e...@controlyourself.ca - http://identi.ca/evan - +1-514-554-3826

_______________________________________________
Laconica-dev mailing list
Laconica-dev@laconi.ca
http://mail.laconi.ca/mailman/listinfo/laconica-dev
