Package: tesseract-ocr
Version: 2.04-2+b1
Severity: normal
On my machine[0], building a DAWG from /usr/share/dict/words takes almost
4 minutes. I also tried to build a DAWG for a Polish dictionary with more
than 3 million words, but gave up after 2 hours of waiting.
Unless I'm missing something, building DAWGs shouldn't be *that* slow.
For example, dawgdic[1] builds a DAWG (in a different format, but
still...) for the Polish dictionary in a few seconds.
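For comparison, below is a minimal sketch of the incremental construction algorithm of Daciuk et al. (2000). It builds a minimal DAWG from a sorted word list in a single pass. Each word costs time roughly proportional to its length plus a few hash lookups, so the total is close to linear in the input size. This is an illustration of the general technique only. It is not tesseract's builder and not necessarily how dawgdic works internally. The names `State`, `build_dawg` and `contains` are hypothetical. The sketch assumes the input is sorted by code point and free of duplicates, e.g. after `LC_ALL=C sort -u`.

```python
from typing import Dict, List, Optional, Tuple


class State:
    """DAWG node: outgoing edges keyed by character plus a 'final' flag."""

    __slots__ = ("edges", "final")

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.final = False

    def signature(self) -> Tuple:
        # Equivalent states have the same finality and edges to the very
        # same (already canonical) child objects, so child identity suffices.
        return (self.final, tuple(sorted((c, id(s)) for c, s in self.edges.items())))


def build_dawg(words: List[str]) -> Tuple[State, int]:
    """Build a minimal DAWG from code-point-sorted, duplicate-free words.

    Returns the root state and the number of distinct reachable states.
    """
    root = State()
    register: Dict[Tuple, State] = {}  # signature -> canonical state
    # Edges (parent, char, child) along the previous word's path that
    # have not been checked against the register yet.
    unchecked: List[Tuple[State, str, State]] = []

    def minimize(depth: int) -> None:
        # Bottom-up: each child is made canonical before its parent is hashed.
        while len(unchecked) > depth:
            parent, char, child = unchecked.pop()
            key = child.signature()
            if key in register:
                parent.edges[char] = register[key]  # reuse equivalent state
            else:
                register[key] = child

    previous: Optional[str] = None
    for word in words:
        if previous is not None and word <= previous:
            raise ValueError("words must be sorted and unique")
        prev = previous or ""
        common = 0
        while common < min(len(word), len(prev)) and word[common] == prev[common]:
            common += 1
        # States of the previous word below the shared prefix can never
        # gain new edges again, so they can be merged now.
        minimize(common)
        node = unchecked[-1][2] if unchecked else root
        for char in word[common:]:
            child = State()
            node.edges[char] = child
            unchecked.append((node, char, child))
            node = child
        node.final = True
        previous = word
    minimize(0)

    # Count distinct states reachable from the root.
    seen = set()
    stack = [root]
    while stack:
        state = stack.pop()
        if id(state) not in seen:
            seen.add(id(state))
            stack.extend(state.edges.values())
    return root, len(seen)


def contains(root: State, word: str) -> bool:
    """Return True if word was in the input list."""
    node: Optional[State] = root
    for char in word:
        node = node.edges.get(char)
        if node is None:
            return False
    return node.final
```

Feeding it `["tap", "taps", "top", "tops"]` yields 5 states instead of the 8 a plain trie would need, because the `a` and `o` branches end up sharing one suffix sub-graph.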
[0] $ cat /proc/cpuinfo | grep bogo
bogomips : 4620.39
bogomips : 4620.39
[1] http://code.google.com/p/dawgdic/
-- System Information:
Debian Release: squeeze/sid
APT prefers unstable
APT policy: (990, 'unstable'), (500, 'experimental'), (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.32-5-686 (SMP w/2 CPU cores)
Locale: LANG=C, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages tesseract-ocr depends on:
ii libc6 2.11.2-7 Embedded GNU C Library: Shared lib
ii libgcc1 1:4.5.1-11 GCC support library
ii libjpeg62 6b1-1 The Independent JPEG Group's JPEG
ii libstdc++6 4.4.5-8 The GNU Standard C++ Library v3
ii libtiff4 3.9.4-5 Tag Image File Format (TIFF) libra
ii tesseract-ocr-eng [tesser 2.00-2 tesseract-ocr language files for E
ii tesseract-ocr-spa [tesser 2.00-2 tesseract-ocr language files for S
ii zlib1g 1:1.2.5.dfsg-1 compression library - runtime
--
Jakub Wilk