Bug#886113: ocrodjvu does not find any languages with tesseract 4.x

Helmut Grohne Tue, 02 Jan 2018 04:57:57 -0800

Package: ocrodjvu
Version: 0.10.2-1
Severity: important

After upgrading tesseract to 4.x, ocrodjvu does not work for me at all.
The immediate error message is:


    usage: ocrodjvu [options] FILE
    ocrodjvu: error: language pack for the selected language (eng) is not available

That's when I run ocrodjvu -e tesseract -l eng on some file. Looking
closer, it seems that ocrodjvu no longer finds any language files. This
appears to be due to tesseract having changed the layout of its files (or
the error message). Formerly the files were located in
/usr/share/tesseract-ocr/tessdata/, but now they have moved to
/usr/share/tesseract-ocr/4.00/tessdata/. ocrodjvu uses the command
"tesseract '' '' -l nonexistent" to find these paths, and for an old
tesseract the output was:

    Tesseract Open Source OCR Engine v3.03 with Leptonica
    Error opening data file /usr/share/tesseract-ocr/tessdata/nonexistent.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
    Failed loading language 'nonexistent'
    Tesseract couldn't load any languages!
    Could not initialize tesseract.

But for the new tesseract the output is:

    Error opening data file /usr/share/tesseract-ocr/4.00/nonexistent.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
    Failed loading language 'nonexistent'
    Tesseract couldn't load any languages!
    Could not initialize tesseract.

Note in particular that the path in the new error message lacks the
tessdata subdirectory. This confuses ocrodjvu so that it doesn't find any
languages. As a workaround, one can create a symlink foo -> tessdata/foo
for each language file foo in /usr/share/tesseract-ocr/4.00/.
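
To illustrate the mismatch, here is a minimal sketch of how a caller might
derive candidate data directories from the error output quoted above. It is
not ocrodjvu's actual code: the function name guess_tessdata_dirs and the
regular expression are hypothetical, and the fallback of appending
"tessdata" is only an assumption based on the two outputs shown in this
report.

```python
import posixpath
import re

# Hypothetical pattern: matches the "Error opening data file <path>" line
# as it appears in the two outputs quoted in this report.
_ERROR_RE = re.compile(r'^\s*Error opening data file\s+(?P<path>\S+)', re.MULTILINE)

def guess_tessdata_dirs(stderr_text):
    """Return candidate directories that may contain *.traineddata files.

    Illustrative helper only; not taken from ocrodjvu.
    """
    match = _ERROR_RE.search(stderr_text)
    if match is None:
        return []
    # Directory part of the path that tesseract tried to open.
    directory = posixpath.dirname(match.group('path'))
    candidates = [directory]
    # Assumption: if the reported directory is not itself named "tessdata"
    # (as in the 4.x output above), also try a "tessdata" subdirectory.
    if posixpath.basename(directory) != 'tessdata':
        candidates.append(posixpath.join(directory, 'tessdata'))
    return candidates
```

A caller would pass the stderr of "tesseract '' '' -l nonexistent" to this
function and then check which of the returned directories actually exist.
With the old output only the tessdata directory is returned; with the new
output the reported directory and its tessdata subdirectory are both
returned.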

This could be a tesseract bug, so maybe it needs to be reassigned.

Helmut
