The language is determined by scoring the text against the dictionary of each 
installed language.  Take the following example:

"Ich have a brown dog."

Most of these words exist in the English dictionary, but only one exists in 
the German dictionary.  English therefore scores highest and is used to 
determine which of the text tokens are valid words.  The one German word 
("Ich") is most likely due to a quality problem with the OCR.  The text is not 
actually German, so "Ich" should not be treated as a valid word in this case.  
Had the rest of the words been German, "Ich" would have been treated as valid.
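To make the scoring idea concrete, here is a minimal sketch in Python.  It is 
not the actual Matterhorn code: the tiny word lists, the function names and 
the "count the matching tokens" score are all assumptions made for 
illustration only.

```python
import re

# Toy word lists standing in for the installed language packs.
# (Hypothetical data; the real packs are the de.csv / en.csv / es.csv files.)
DICTIONARIES = {
    "en": {"i", "have", "a", "brown", "dog"},
    "de": {"ich", "habe", "einen", "braunen", "hund"},
}


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation and digits."""
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(text, dictionaries=DICTIONARIES):
    """Return (best_language, scores); a score is the number of known tokens."""
    tokens = tokenize(text)
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in dictionaries.items()
    }
    # On a tie, max() keeps the first language in the dictionary order.
    best = max(scores, key=scores.get)
    return best, scores


def valid_words(text, dictionaries=DICTIONARIES):
    """Keep only the tokens found in the best-scoring language's dictionary."""
    best, _ = detect_language(text, dictionaries)
    return [token for token in tokenize(text) if token in dictionaries[best]]


language, scores = detect_language("Ich have a brown dog.")
print(language, scores)
print(valid_words("Ich have a brown dog."))
```

With these toy dictionaries, English wins for the example sentence and "ich" 
is dropped, while in an all-German sentence "ich" would be kept.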

Keep in mind that the language packs must be copied to 
$FELIX/conf/dictionaries/.  Once the contents of those CSV files have been 
loaded into the database, the files are deleted.  This loading process needs 
to be done only once, on a single worker node.
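The copy step could be scripted roughly as follows.  This is only a sketch: 
the helper name is made up, and the two paths in the usage comment are taken 
from the listing in the quoted message below, so adjust them to wherever 
$FELIX points on your system.

```python
import shutil
from pathlib import Path


def copy_language_packs(source_dir, target_dir):
    """Copy every *.csv file from source_dir into target_dir.

    Returns the sorted list of copied file names. The target directory is
    created if it does not exist yet.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for csv_file in sorted(source.glob("*.csv")):
        shutil.copy2(csv_file, target / csv_file.name)
        copied.append(csv_file.name)
    return copied


# Example call (assumed paths, taken from the quoted listing):
# copy_language_packs(
#     "/opt/matterhorn/1.1.0/docs/felix/conf/dictionaries",
#     "/opt/matterhorn/felix/conf/dictionaries",
# )
```

Since the files are deleted after they have been loaded, an empty 
dictionaries directory afterwards is expected.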

Josh

On Jun 29, 2011, at 8:58 AM, matpro_fhkoeln wrote:

> Hello Ladies and Gentlemen,
> 
> According to
> http://opencast.jira.com/wiki/display/MHDOC/Configure+Text+Analysis+v1.1
> "Matterhorn can support any number of language packs concurrently,
> and will attempt to determine the most appropriate language for each
> video segment it analyzes."
> 
> Besides, these three language packs
> http://downloads.opencastproject.org/artifacts/
> are enclosed in a fresh matterhorn installation.
> 
> Path to csv-files seems slightly different:
> 
> matpro@pips03:~$ ls -l /opt/matterhorn/felix/conf/dictionaries/
> insgesamt 0
> 
> root@pips03:/home/matpro# find / -name de.csv
> /opt/matterhorn/1.1.0/docs/felix/conf/dictionaries/de.csv
> 
> matpro@pips03:~$ ls -l /opt/matterhorn/1.1.0/docs/felix/conf/dictionaries/
> insgesamt 324
> -rw-r--r-- 1 matpro matpro 107141 17. Jun 15:05 de.csv
> -rw-r--r-- 1 matpro matpro  99998 17. Jun 15:05 en.csv
> -rw-r--r-- 1 matpro matpro 104690 17. Jun 15:05 es.csv
> 
> So which criterion is used to determine the exact language pack?
> Is it detected from the media title?
> 
> Thank you in advance,
> regards,
> 
> [email protected]
> 

_______________________________________________
Matterhorn-users mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
