So, I tried to get to the bottom of this. @RobertoRosario, would you 
please clarify at least my first question: 

Do you also call the extraction of existing text "OCR processing"? *Yes / No?*

This is what I suspect after spending some time trying to understand what 
happens in the code (I am not a dev), making sure python-pdfminer is 
installed, and watching the logs for NoMIMETypeMatch and ParserError. I then 
kept "OCR processing" enabled, watched the task manager, and threw a dozen 
or so files into the upload queue -> no visible tesseract process, and 
everything finished much faster than real OCR processing would have. 
If my conclusions are right, everything works as it should and has done so 
all along. But if you, dear reader, understand OCR as "Optical Character 
Recognition" (as I do) and not as "parse existing text from documents and, 
if that fails, do real Optical Character Recognition", which is what I 
*believe* happens here, you are very likely to waste the same amount of 
time when you try to plan things from the beginning.
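
To make the suspected behaviour concrete, here is a rough Python sketch of 
"parse first, OCR only as a fallback". It is not taken from the Mayan EDMS 
source: the function extract_text, its two callable parameters, and the 
ParserError class below are hypothetical names chosen for illustration, and 
the real code may be organised quite differently.

```python
from typing import Callable, Tuple


class ParserError(Exception):
    """Hypothetical stand-in for 'no embedded text could be parsed'."""


def extract_text(
    path: str,
    parse_embedded_text: Callable[[str], str],
    run_ocr: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (text, method), trying the cheap parser before real OCR.

    Both callables are assumptions: the first could wrap a PDF text
    parser, the second an OCR engine. Neither is a Mayan EDMS API.
    """
    try:
        text = parse_embedded_text(path)
        if text.strip():
            # The document already contains text, so no OCR is needed.
            return text, "parser"
    except ParserError:
        pass  # No usable embedded text; fall through to OCR.
    # Only reached when parsing failed or produced nothing but whitespace.
    return run_ocr(path), "ocr"
```

If this is roughly what happens, a PDF that already contains text never 
reaches the OCR step, which would explain why no tesseract process shows up 
and why the queue finishes so quickly.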

Yes, that was also a little rant, but hopefully this clarification (if it 
is correct) can be seen as a contribution, too.

Now let's find out how to update to 2.2 when it's available, and if I 
find it documented somewhere I might have some time left afterwards to 
translate some phrases into German. Motivation is there ;)
