Hi Andreas,

> It says, that languages are defined by three letter tags; so for german that 
> would be something like "ger" or "deu".
> 
> If instantiated like this in the dublincore metadata files, one should think 
> there a need to use the same 3 letters in the mysql
> dictionary tables as language identifiers, right?

That's a valid assumption. However, in the current implementation, the 
dictionary service passes the text that needs to be verified through the 
dictionary twice: the first pass determines the language, and only the 
second pass does the actual cleanup. In effect, the dictionary service 
completely ignores the language field. The idea behind this implementation 
is that instructors are likely to mix different languages within one 
presentation, so assuming just one language would not produce good output.
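
To illustrate the two-pass idea, here is a minimal sketch. It is not the 
actual Matterhorn code: the class and method names (TwoPassDictionary, 
addLanguage, detectLanguage, cleanUp) are made up for this example, and the 
in-memory map stands in for the MySQL dictionary tables. Note that cleanUp 
never receives a language argument.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Matterhorn dictionary service API.
class TwoPassDictionary {

    // Language identifier -> known (lower-case) words for that language.
    private final Map<String, Set<String>> wordsByLanguage = new LinkedHashMap<>();

    void addLanguage(String language, Set<String> words) {
        wordsByLanguage.put(language, words);
    }

    // Returns the first language whose word list contains the word, or null.
    String detectLanguage(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Set<String>> entry : wordsByLanguage.entrySet()) {
            if (entry.getValue().contains(key)) {
                return entry.getKey();
            }
        }
        return null;
    }

    // Cleans a word list without being told which language it is in.
    List<String> cleanUp(List<String> words) {
        // Pass 1: determine the language of every word.
        Map<String, String> languageOf = new HashMap<>();
        for (String word : words) {
            String language = detectLanguage(word);
            if (language != null) {
                languageOf.put(word, language);
            }
        }
        // Pass 2: keep only the words that were recognised in pass 1.
        List<String> cleaned = new ArrayList<>();
        for (String word : words) {
            if (languageOf.containsKey(word)) {
                cleaned.add(word);
            }
        }
        return cleaned;
    }
}
```

With a German and an English word list registered, a mixed input such as 
"Vorlesung", "slides", "xqzt" would keep the first two words and drop the 
third, regardless of the language declared in the metadata.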

> The question arises, because on
> http://opencast.jira.com/wiki/display/MH/Configure+Text+Analysis+%28Trunk%29
> there seem to be used 2-letter identifiers for the languages.

This is correct. It would, however, probably make sense to switch the API 
from a simple string to a Locale or some other kind of object.
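
As a rough sketch of why a Locale could help: java.util.Locale can map a 
two-letter code to its three-letter counterpart, so "de" and "deu" can be 
compared as the same language. The LanguageCodes class and its methods are 
hypothetical and not part of Matterhorn. One caveat: Java uses the 
terminological ISO 639-2 codes, so "deu" is recognised but the 
bibliographic variant "ger" is not mapped.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Matterhorn API.
class LanguageCodes {

    // Accepts a two- or three-letter language code and returns the
    // three-letter form. Two-letter ISO 639-1 codes are mapped to ISO 639-2/T;
    // three-letter codes are returned unchanged.
    static String toIso3(String code) {
        return Locale.forLanguageTag(code).getISO3Language();
    }

    // True if both codes denote the same language after normalisation.
    static boolean sameLanguage(String a, String b) {
        return toIso3(a).equals(toIso3(b));
    }
}
```

With this, LanguageCodes.sameLanguage("de", "deu") holds, so the metadata 
files and the dictionary tables would not have to agree on the code length.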

> (I know I should go check the code for an answer on this, but maybe the 
> answer to this is in one of your minds already :)

I had to check the code, too :-)

Tobias

_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
