https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7727
--- Comment #15 from John Mertz <john.me...@mailcleaner.net> ---

Hello all,

As mentioned in the previous comment, I have made a fairly major revision to the plugin that adds image preprocessing with OpenCV. I've been testing it successfully for more than a week on a handful of machines and am now looking at deploying it to some larger installations. The code is now on our GitHub: https://github.com/MailCleaner/TesseractOcr

Note that there are two significant branches. master is configured to work with Tesseract version 4 and above, whose training data gives significantly better results than earlier versions. Using this version is encouraged unless your system does not support it. It also provides additional configuration variables to define the training data location and the languages to be passed when executing Tesseract. If your system does not provide Tesseract version 4 or above, a branch called 3.00 will continue to support it.

On the question of adopting the plugin into the core distribution or advertising it as a third-party plugin: I don't mind either way now that we've moved distribution of the plugin to GitHub. I would note that the dependencies on tesseract-ocr and libopencv-dev add over 100MB to the plugin's otherwise modest size, and including that bloat for a plugin that is disabled by default is probably not ideal.

For emails with many or large images, the impact on scan times can be significant. The plugin already has configuration variables for various time, size and dimension limits, so outlier messages cannot hurt overall performance too badly, but the next feature is likely to be caching to help reduce some of the additional load.

We can close this thread if no one has any additional input, and I will announce further updates on the relevant mailing list.

-- 
You are receiving this mail because:
You are the assignee for the bug.
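To illustrate how per-message time, size and dimension limits might combine with the caching mentioned as a likely next feature, here is a minimal Python sketch. It is not the plugin's code (the plugin is a SpamAssassin Perl module). The OcrLimits field names, their default values, and the SHA-256-keyed cache are hypothetical assumptions, and the OCR step is passed in as a function so the sketch runs without Tesseract or OpenCV installed.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class OcrLimits:
    # Hypothetical limit names and defaults; the real plugin's settings may differ.
    max_bytes: int = 2_000_000        # skip images larger than this many bytes
    min_width: int = 100              # skip tiny images such as icons or spacers
    min_height: int = 20
    max_width: int = 4000             # skip very large images
    max_height: int = 4000
    max_total_seconds: float = 10.0   # OCR time budget for one message


class CachedOcr:
    """Runs an OCR function on images within limits, caching results by content hash."""

    def __init__(self, ocr: Callable[[bytes], str], limits: OcrLimits,
                 clock: Callable[[], float] = time.monotonic):
        self.ocr = ocr
        self.limits = limits
        self.clock = clock
        self.cache: Dict[str, str] = {}  # SHA-256 hex digest -> extracted text

    def within_limits(self, data: bytes, width: int, height: int) -> bool:
        lim = self.limits
        return (len(data) <= lim.max_bytes
                and lim.min_width <= width <= lim.max_width
                and lim.min_height <= height <= lim.max_height)

    def scan_message(self, images: List[Tuple[bytes, int, int]]) -> List[str]:
        """Return OCR text for each (data, width, height) image that passes the limits."""
        start = self.clock()
        texts: List[str] = []
        for data, width, height in images:
            if self.clock() - start > self.limits.max_total_seconds:
                break  # time budget exhausted: remaining images are skipped
            if not self.within_limits(data, width, height):
                continue
            key = hashlib.sha256(data).hexdigest()
            if key not in self.cache:
                # Only images not seen before (in this or earlier messages) are OCR'd.
                self.cache[key] = self.ocr(data)
            texts.append(self.cache[key])
        return texts
```

Keying the cache on a hash of the image bytes means the same image attached to many messages in a spam run is OCR'd only once; a real deployment would also need to bound the cache size, which this sketch does not do.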