I have created a fresh dictionary from the latest Wikipedia articles. The new en.csv file is smaller (131.1 MB) than the one on http://downloads.opencastproject.org/artifacts/
It took 4h 31min of parsing time (on a Mac Pro) to create this new dictionary: 10,545,272 unique words from 2,041,513,935 total words. The new dictionary file was then added to the core (after 2h 7min), ending with the final message:

INFO(DictionaryScanner:90) Finished loading pack from ...../en.csv

despite many messages like "Unable add words:" for Chinese or similar characters, or for long strings of characters.

So I think the new dictionary was imported correctly, and the folder ../conf/dictionary is also empty after the import. BUT the Segment Text is still displaying total garbage.

Any idea what else could be wrong?

Thanks,
Leslaw

Begin forwarded message:

> From: Adam Hochman <[email protected]>
> Date: October 26, 2011 6:08:48 PM GMT+01:00
> To: Matterhorn Users <[email protected]>
> Subject: Re: [Matterhorn-users] the text analysis: does it run successfully?
> Reply-To: [email protected], Matterhorn Users
> <[email protected]>
>
> A very limited dictionary is included ootb. Here are instructions on how to
> install a more expansive dictionary that includes all of the words found in
> Wikipedia. Even this solution isn't foolproof because some nonsensical
> words exist in Wikipedia, but the results are significantly better.
> http://opencast.jira.com/wiki/display/MH/Configure+Text+Analysis+%28Trunk%29
> http://downloads.opencastproject.org/artifacts/
>
> On 10/26/11 9:47 AM, Dr Leslaw Zieleznik wrote:
>> I have a question about the text analysis: does it run successfully?
>>
>> I have uploaded very good quality videos with very good audio too, and
>> with the discrete scenes/images selected, one video with included slides.
>> But in both cases the Segment Text, when playing the videos, is displayed
>> as 95% garbage.
>> According to the documentation, the language pack is included with the
>> installation.
>> Is there anything I am missing in the setup/installation?
>>
>>
>> Many thanks,
>> Leslaw
>>
>> _______________________________________________
>> Matterhorn-users mailing list
>> [email protected]
>> http://lists.opencastproject.org/mailman/listinfo/matterhorn-users
