I have created a fresh dictionary from the latest Wikipedia articles. This new 
en.csv file is smaller (131.1 MB) than the one at 
http://downloads.opencastproject.org/artifacts/ 

It took 4h 31min of parsing time (on a Mac Pro) to create this new dictionary: 
10,545,272 unique words from 2,041,513,935 total words.

This new dictionary file was then added to the core (which took 2h 7min), 
ending with the final message:

  INFO(DictionaryScanner:90) Finished loading pack from ...../en.csv

despite many messages like "Unable add words:" for Chinese or similar 
characters, or for long strings of characters.
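
One way to see how many entries fall into that rejected category is to pre-filter the CSV before importing it. The following is only a rough sketch, not part of Matterhorn: it assumes the word sits in the first CSV column, and the 40-character cut-off and the function names are hypothetical choices made for illustration. It also drops entries containing digits or punctuation, which may or may not be wanted.

```python
import csv
import io
import unicodedata

MAX_WORD_LENGTH = 40  # assumed cut-off for illustration, not a Matterhorn setting


def is_latin_word(word, max_length=MAX_WORD_LENGTH):
    """Return True if word is non-empty, short enough, and only Latin letters."""
    if not word or len(word) > max_length:
        return False
    return all(
        ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")
        for ch in word
    )


def filter_rows(rows):
    """Yield only rows whose first column (assumed to be the word) passes."""
    for row in rows:
        if row and is_latin_word(row[0]):
            yield row


def filter_csv_text(text):
    """Filter CSV content given as a string and return the kept rows as CSV."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(filter_rows(reader))
    return out.getvalue()
```

Comparing the row count before and after filtering would show how much of the dictionary consists of entries the importer is likely to refuse.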

So I think the new dictionary was imported correctly; the folder 
../conf/dictionary is also empty after the import.

BUT the Segment Text is still displaying total garbage.

Any idea what else could be wrong?

Thanks,
Leslaw


Begin forwarded message:

> From: Adam Hochman <[email protected]>
> Date: October 26, 2011 6:08:48 PM GMT+01:00
> To: Matterhorn Users <[email protected]>
> Subject: Re: [Matterhorn-users] the text analysis: does it run successfully?
> Reply-To: [email protected], Matterhorn Users 
> <[email protected]>
> 
> A very limited dictionary is included ootb.  Here are instructions on how to 
> install a more expansive dictionary that includes all of the words found in 
> Wikipedia.  Even this solution isn't foolproof because some nonsensical 
> words exist in Wikipedia, but the results are significantly better.
> http://opencast.jira.com/wiki/display/MH/Configure+Text+Analysis+%28Trunk%29
> http://downloads.opencastproject.org/artifacts/
> 
> On 10/26/11 9:47 AM, Dr Leslaw Zieleznik wrote:
>> I have a question about the text analysis:   does it run successfully?
>> 
>> I have uploaded very good quality videos with very good audio too, and 
>> with discrete scenes/images selected, one video with slides included.
>> But in both cases the Segment Text, when playing the videos, is about 95% 
>> garbage.
>> According to the documentation, the language pack is included with the 
>> installation.
>> Is there anything I am missing in the setup/installation?
>> 
>> 
>> Many thanks,
>> Leslaw
>> 
>> _______________________________________________
>> Matterhorn-users mailing list
>> [email protected]
>> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
>> 


