[Apertium-stuff] Tibetan, Sanskrit, Apertium

Rime Rime Mon, 26 Mar 2012 04:14:29 -0700

Dear friends!

For a few years our group has been developing an open-source OCR (optical 
character recognition) and translation system for Asian languages. 
The key features of the OCR system include:

1. Stream OCR processing. During the first stage of the project, we recognized 
300,000 pages of the Tibetan Canon in Tibetan for the TBRC Digital Library 
(www.tbrc.org). We used a Mac Pro server, which processed all 280 volumes with 
one OCR set.


2. A Tibetan spell checker and an online dictionary of 250,000 words, with a 
wordlist of 6.5 million entries.

3. Multilingual support. At present, the main focus of the project is OCR for 
Tibetan, Sinhala, Sanskrit, and Kannada.

4. High accuracy. The system uses dictionary control at all stages of OCR 
processing. Its Grammar Corrector can use a statistical dictionary containing 
20-30 million phrases (the Tibetan dictionary now includes 8.5 million). For 
Tibetan books, the current recognition result is 1 error per 1000 characters.

At the current stage of the project: 
We have a grammar analysis module for Tibetan and Sanskrit. It includes a 
corpus and full-text fuzzy search (about 1 second for a 1 GB corpus). 
It needs to be integrated with HFST and Apertium. 
At St. Petersburg State University we received a letter about the Apertium 
project and Google Summer of Code, with a recommendation to join your project.

How can we combine our efforts?

Sincerely yours, Alex
http://code.google.com/p/ocrlib/

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
