Hi Kranthi,

That is an interesting comparison! But I think Tesseract 4.0 is still in alpha? And do you know what license the VGG software is under?
Best,
Luis

On Apr 17, 2017 8:46 AM, "Kranthi Kiran G V" <kkran...@student.nitw.ac.in> wrote:

Hello Tim Allison,

I am currently working on improving Tika's OCR capabilities. Following a suggestion from Thamme Gowda (@thammegowda <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=thammegowda>), I have started comparing Tesseract 4.0's neural network <https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00> subsystem with the Visual Geometry Group's (VGG) models <http://www.robots.ox.ac.uk/~vgg/research/text/>.

It would be great if you could provide the dataset for testing OCR that you mentioned in one of the issues.

I will compare their evaluation running time, accuracy, memory consumption, and invariance to lighting, orientation, etc., and then integrate the appropriate models into Tika's OCR.

Thank you,
Kranthi Kiran GV,
CS 3/4 Undergrad,
NIT Warangal