[Mayan EDMS: 533] Re: How do I force a language for the OCR to a specific document with metadata?

Roberto Rosario Mon, 04 Mar 2013 14:04:02 -0800

Hi Imię Nazwisko,

You can force the OCR engine, Tesseract, to use a specific language by 
setting the OCR_TESSERACT_LANGUAGE configuration option.  I thought about 
setting the language on a per-document basis, but a document may sometimes 
contain more than one language, so I didn't implement it until someone needed it.
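For the Polish document in question, the global setting could look roughly like the sketch below. It assumes the override lives in a local settings module such as `settings_local.py` (that file name is an assumption; use wherever your installation keeps its settings overrides) and that Tesseract's Polish trained data (`pol.traineddata`) is installed. Tesseract identifies languages by three-letter codes, so Polish is `'pol'`.

```python
# settings_local.py (assumed location for local Mayan EDMS overrides).
# Tesseract uses three-letter language codes; 'pol' selects Polish and
# requires pol.traineddata to be installed alongside Tesseract.
OCR_TESSERACT_LANGUAGE = 'pol'
```

Because this is a single global value, every document is then OCRed as Polish, which is the limitation a per-document setting would remove.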

Right now the handling of Tesseract is hard coded. I was planning to 
abstract the OCR engine to allow other software to be used for OCR, and that 
would be a good chance to add an ISO language code per document using a 
language code property.  Do you think a metadata field for the OCR language 
would offer some benefit, or would a language property per document be fine 
for your use case?
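One possible shape for that abstraction is sketched below. It is not Mayan EDMS code: `Document`, `OCRBackend`, `TesseractBackend`, `DEFAULT_OCR_LANGUAGE` and `ocr_document` are all hypothetical names. The idea is that each document carries an optional list of language codes, falling back to a global default, and the engine is hidden behind an interface. For Tesseract, several languages are passed joined with `+` (e.g. `-l pol+eng`, supported from Tesseract 3.02 on), which covers documents containing more than one language.

```python
import os
import subprocess
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List

# Hypothetical global default, playing the role of OCR_TESSERACT_LANGUAGE.
DEFAULT_OCR_LANGUAGE = "eng"


@dataclass
class Document:
    """Hypothetical document model with an optional per-document language list."""
    name: str
    languages: List[str] = field(default_factory=list)

    def ocr_languages(self) -> List[str]:
        # Use the document's own languages if any are set, else the global default.
        return self.languages or [DEFAULT_OCR_LANGUAGE]


class OCRBackend(ABC):
    """Hypothetical engine interface so Tesseract is no longer hard coded."""

    @abstractmethod
    def recognize(self, image_path: str, languages: List[str]) -> str:
        """Return the text found in image_path."""


class TesseractBackend(OCRBackend):
    def build_command(self, image_path: str, output_base: str,
                      languages: List[str]) -> List[str]:
        # Tesseract writes its result to <output_base>.txt; multiple
        # languages are joined with '+'.
        return ["tesseract", image_path, output_base, "-l", "+".join(languages)]

    def recognize(self, image_path: str, languages: List[str]) -> str:
        # Run Tesseract in a scratch directory and read back the text file.
        with tempfile.TemporaryDirectory() as tmp:
            output_base = os.path.join(tmp, "out")
            cmd = self.build_command(image_path, output_base, languages)
            subprocess.run(cmd, check=True)
            with open(output_base + ".txt", encoding="utf-8") as f:
                return f.read()


def ocr_document(document: Document, image_path: str, backend: OCRBackend) -> str:
    # The caller never needs to know which engine is behind `backend`.
    return backend.recognize(image_path, document.ocr_languages())
```

With this split, a Polish scan would be processed with `Document("scan.png", ["pol"])`, a mixed Polish/English one with `["pol", "eng"]`, and an untagged document would keep using the global default.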

--Roberto

On Thursday, February 21, 2013 4:36:35 PM UTC, Imię Nazwisko wrote:
>
> I sent a test excerpt of a PDF file, converted to PNG format, and the OCR 
> results I received were very poor. The file was in Polish.
>

