On Mon, 26 Mar 2012 at 15:14 +0400, Rime Rime wrote:
> Dear friends!
> 
> 
> For a few years our group has been developing an open-source OCR
> (optical character recognition) and translation system for Asian
> languages. The key features of the OCR system include:
> 1. Stream OCR processing. During the first stage of the project, we
> recognized 300 000 pages of the Tibetan Canon in Tibetan for the TBRC
> Digital Library (www.tbrc.org). We used a MacPro server that processed
> all 280 volumes with one OCR set.
> 
> 2. Tibetan spell checker and online dictionary of 250000 words and a
> 6.5 mln wordlist.
> 
> 3. Multilingual support. At present, the key direction of the project
> is Tibetan, Sinhala, Sanskrit, and Kannada OCR.
> 
> 4. High accuracy. The system uses dictionary control at all stages of
> OCR processing. Its Grammar Corrector can use a statistical dictionary
> containing 20-30 mln phrases (the Tibetan dictionary now includes 8.5
> mln). For Tibetan books, the current recognition error rate is 1 error
> per 1000 characters.
> 
> At the current stage of the project:
> We have a grammar analysis module for Tibetan and Sanskrit. It includes
> a corpus and full-text fuzzy search (1 sec for a 1Gb corpus).
> It needs to be integrated with HFST and Apertium.

It isn't really clear how this will help us make better machine
translation systems, and it doesn't relate to any of the ideas on our
ideas page!

Can you think of how the project will help us build MT systems for
Apertium?

Fran



_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
