Hi Imię Nazwisko,

You can force the OCR engine, Tesseract, to use a specific language by setting the OCR_TESSERACT_LANGUAGE configuration option. I thought about setting the language on a per-document basis, but sometimes a document may contain more than one language, so I didn't implement it until someone needed it.
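As a rough sketch of what that looks like for Polish: Tesseract identifies languages by three-letter codes, and Polish is `pol`. The snippet assumes the option goes in a Django-style local settings file (the exact file name depends on your installation) and that the Polish language data for Tesseract is installed. The helper `build_tesseract_command` is a hypothetical name used only for illustration. It is not how Mayan calls Tesseract internally, just a way to try the same language on the command line and compare results.

```python
# Assumption: this line lives in your local Mayan EDMS settings file.
# 'pol' is Tesseract's language code for Polish.
OCR_TESSERACT_LANGUAGE = 'pol'


def build_tesseract_command(image_path, output_base, language):
    """Build a tesseract command line for a manual check (hypothetical helper).

    Tesseract writes the recognized text to '<output_base>.txt'; the
    '-l' option selects the language data to use.
    """
    return ['tesseract', image_path, output_base, '-l', language]


# Example: the arguments you would run by hand against the PNG test page.
command = build_tesseract_command('page.png', 'page', OCR_TESSERACT_LANGUAGE)
print(' '.join(command))
```

If the manual run with `-l pol` gives good text but Mayan does not, the setting is probably not being picked up; if both are weak, the image quality or resolution is the more likely cause.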
Right now the handling of Tesseract is hard-coded. I was planning to abstract the OCR engine so that other software can be used for OCR, and that would be a good chance to add a per-document ISO language code through a language property. Do you think a metadata field for the OCR language would offer some benefit, or would a language property per document be fine for your use case?

--Roberto

On Thursday, February 21, 2013 4:36:35 PM UTC, Imię Nazwisko wrote:
>
> I sent a test piece of a PDF file, but in PNG format, and the OCR results I
> received were very weak. The file was in Polish.
>
