Hi John,

Here is the progress of the OCR Plugin for PDFBox. The project consists of two subprojects:

1. Tesseract API for Java
2. OCR Plugin for PDFBox using the Tesseract API

*Tesseract API [1]*

1. All necessary functions are currently implemented, and test cases have been written to check that they work properly.
2. Mac and Linux operating systems are supported. In the future I'll try to add support for Windows as well.
3. All static libs for Tesseract and Leptonica are prebuilt and added to the resources folder.
4. At build time, the project automatically identifies the correct libs for the particular operating system.
5. If someone needs to build the above static libs manually, instructions are given in the README.
6. In the future, I'll work on creating those static libs as part of the project build. Currently they must be built manually.

*OCR plugin [2]*

1. Implementation is almost finished.
2. It works fine with the sample PDF files I have given it.

Is there any set of PDF files that can be used to test accuracy and performance?

In addition, some code formatting and commenting work remains to be done.

[1] https://github.com/DImuthuUpe/Tesseract-API
[2] https://github.com/DImuthuUpe/OCR-Plugin

--
Regards
W.Dimuthu Upeksha
Undergraduate
Department of Computer Science And Engineering
University of Moratuwa, Sri Lanka
