Bug#886113: marked as done (ocrodjvu does not find any languages with tesseract 4.x)

Debian Bug Tracking System Mon, 07 May 2018 06:57:20 -0700

Your message dated Mon, 7 May 2018 15:52:29 +0200
with message-id <[email protected]>
and subject line Re: Bug#886113: ocrodjvu does not find any languages with tesseract 4.x
has caused the Debian Bug report #886113,
regarding ocrodjvu does not find any languages with tesseract 4.x
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
886113: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=886113
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---

Package: ocrodjvu
Version: 0.10.2-1
Severity: important

After upgrading tesseract to 4.x, ocrodjvu does not work for me at all.
The immediate error message is:

    usage: ocrodjvu [options] FILE
    ocrodjvu: error: language pack for the selected language (eng) is not available

That's when I run "ocrodjvu -e tesseract -l eng" on some file. Looking
closer, it seems that ocrodjvu no longer finds any language files. This
appears to be due to tesseract having changed the layout of its files (or
the error message). Formerly the files were located in
/usr/share/tesseract-ocr/tessdata/, but now they have moved to
/usr/share/tesseract-ocr/4.00/tessdata/. ocrodjvu uses the command
"tesseract '' '' -l nonexistent" to find these paths, and for an old
tesseract the output was:

    Tesseract Open Source OCR Engine v3.03 with Leptonica
    Error opening data file /usr/share/tesseract-ocr/tessdata/nonexistent.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
    Failed loading language 'nonexistent'
    Tesseract couldn't load any languages!
    Could not initialize tesseract.

But for the new tesseract the output is:

    Error opening data file /usr/share/tesseract-ocr/4.00/nonexistent.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
    Failed loading language 'nonexistent'
    Tesseract couldn't load any languages!
    Could not initialize tesseract.
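
To illustrate why the difference between the two outputs matters, here is a minimal Python sketch of recovering the data directory from an error message of this shape. It only shows the general idea and is not ocrodjvu's actual code; the name guess_tessdata_dir and the regular expression are hypothetical.

```python
import re

# Hypothetical pattern: capture the directory part of the path that
# tesseract reports when asked for the language "nonexistent".
_ERROR_RE = re.compile(
    r'^\s*Error opening data file\s+(\S+)/nonexistent\.traineddata\s*$',
    re.MULTILINE,
)

def guess_tessdata_dir(stderr_text):
    """Return the directory named in the error message, or None if absent."""
    match = _ERROR_RE.search(stderr_text)
    if match is None:
        return None
    return match.group(1)

OLD_OUTPUT = """Tesseract Open Source OCR Engine v3.03 with Leptonica
Error opening data file /usr/share/tesseract-ocr/tessdata/nonexistent.traineddata
Failed loading language 'nonexistent'
"""

NEW_OUTPUT = """Error opening data file /usr/share/tesseract-ocr/4.00/nonexistent.traineddata
Failed loading language 'nonexistent'
"""

if __name__ == '__main__':
    print(guess_tessdata_dir(OLD_OUTPUT))
    print(guess_tessdata_dir(NEW_OUTPUT))
```

With the 3.03 output this yields /usr/share/tesseract-ocr/tessdata, whereas with the 4.x output it yields /usr/share/tesseract-ocr/4.00, a directory that does not contain the .traineddata files directly.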

Note in particular that the path in the error message lacks the tessdata
subdirectory. This confuses ocrodjvu such that it doesn't find any
languages. One can make symlinks from foo -> tessdata/foo to work around
the issue.
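
The symlink workaround could be scripted roughly as follows. This is only a sketch: the function name link_traineddata is hypothetical, and it assumes the directory passed in (for example /usr/share/tesseract-ocr/4.00) contains a tessdata subdirectory and is writable, which normally requires root.

```python
import os

def link_traineddata(prefix_dir):
    """Create prefix_dir/foo -> tessdata/foo for every *.traineddata file.

    Returns the names of the links that were created. Existing entries
    in prefix_dir are left untouched.
    """
    tessdata_dir = os.path.join(prefix_dir, 'tessdata')
    created = []
    for name in sorted(os.listdir(tessdata_dir)):
        if not name.endswith('.traineddata'):
            continue
        link_path = os.path.join(prefix_dir, name)
        if os.path.lexists(link_path):
            # Do not overwrite a file or link that is already there.
            continue
        # Relative target, so the link stays valid if the tree is moved.
        os.symlink(os.path.join('tessdata', name), link_path)
        created.append(name)
    return created
```

The links are relative on purpose, mirroring the foo -> tessdata/foo form given above.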

This could be a tesseract bug, so maybe it needs to be reassigned.

Helmut

--- End Message ---

--- Begin Message ---

This was fixed in Tesseract upstream:
https://github.com/tesseract-ocr/tesseract/commit/af6994efd945

For Debian:

  tesseract (4.00~git2207-766b7bd6-1) unstable; urgency=medium
    ...
    *  af6994ef - Don't try alternate path for tessdata (#1328)
    ...
   -- Alexander Pozdnyakov <[email protected]>  Tue, 20 Feb 2018 20:06:42 +0300

--
Jakub Wilk

--- End Message ---
