Hello Thamme,

Agreed. Looking at the paper [1], it seems to me that Tesseract and the VGG
models can co-exist in Tika to serve all kinds of input images.

I am able to run one of the models, Deep Features for Text Spotting [2], by
disabling the GPU. However, it generates only features, not text. This
disproves my initial assumption that the MATLAB version was causing the issue.
The problem lies with the MatConvNet that is bundled with the models. It is a
very old version whose structure doesn't even resemble the current one. I'm
having trouble building it on my system for the other model, Synthetic Data
and Artificial Neural Networks for Natural Scene Text Recognition [1]. Note
that both models are supplied with custom versions of MatConvNet.

Nevertheless, we can make the system use the latest version of MatConvNet by
rebuilding the network layer by layer, using the MAT file as a guide [3].

I want to hear your views on whether or not I should attempt it.

Thank you,
Kranthi Kiran GV,
CS 3/4 Undergrad,
NIT Warangal

[1] http://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14c/jaderberg14c.pdf
[2] http://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14/jaderberg14.pdf.pdf
[3] https://github.com/vlfeat/matconvnet/issues/239

On Wed, Apr 19, 2017 at 10:42 PM, Thamme Gowda <[email protected]> wrote:

> Hi Kranthi,
>
> Thanks for updating us.
> I believe in the long run both of these models may co-exist (Tesseract
> for flatbed scanner images with perfect lighting conditions, VGG models
> for natural images taken by cellphone/digital cameras with weird
> orientations and lighting conditions).
>
> I agree with you, we can make VGG OCR an optional REST API and allow
> users to agree to their license if they want to use it. Thanks Luis for the
> feedback :-)
>
> Keep up the good work and keep this email thread updated with your
> findings.
>
> Thanks,
> TG
>
> *--*
> *Thamme Gowda*
> TG | @thammegowda <https://twitter.com/thammegowda>
> ~Sent via somebody's Webmail server!
>
> On Wed, Apr 19, 2017 at 6:12 AM, Kranthi Kiran G V <
> [email protected]> wrote:
>
>> Hello community,
>> I have successfully tested Tesseract 4.0 on various images of different
>> sizes, orientations and lighting conditions. I will publish the results
>> on a blog in the next few days for you to have a look at.
>>
>> Although I'm able to reliably measure the clock time, accuracy, etc., I am
>> not able to come up with a method to reliably measure the memory consumed.
>> Any pointers on this from the developer community would be appreciated.
>>
>> The VGG group has released two models
>> <http://www.robots.ox.ac.uk/~vgg/research/text/#sec-models>. I'm not able
>> to test either as of now due to a lack of backward compatibility with
>> the MatConvNet used. I use a recent version of MATLAB. As of now, I am
>> trying to get around it by updating parts of the code. I'm also
>> contacting the maintainers of the repository to help me address the
>> issues. I'm hopeful I can get them running.
>>
>> Addressing Luis' concern, we won't be building VGG's models into Tika's
>> source. We would only be helping the user deploy a REST API to which
>> Tika's OCR subsystem passes the images and from which it retrieves the
>> information in the form of a string.
>>
>> Thank you,
>> Kranthi Kiran GV,
>> CS 3/4 Undergrad,
>> NIT Warangal
>>
>> On Tue, Apr 18, 2017 at 8:43 AM, Kranthi Kiran G V <
>> [email protected]> wrote:
>>
>> > Hello Luis,
>> > Yes, Tesseract 4.0 is not yet a stable release. The VGG group's model
>> > has a 3-clause BSD license.
>> >
>> > I see it as a long-term effort which would help Tika's community
>> > experience near state-of-the-art OCR.
>> >
>> > This is an investigation to see if we can try out this direction.
>> > Thanks for expressing your views.
>> >
>> > Thank you,
>> > Kranthi Kiran GV
>> >
>> > On Apr 18, 2017 2:44 AM, "Luís Filipe Nassif" <[email protected]>
>> > wrote:
>> >
>> > Hi Kranthi,
>> >
>> > That is an interesting comparison! But I think Tesseract 4.0 is still
>> > alpha? And do you know the VGG software license?
>> >
>> > Best,
>> > Luis
>> >
>> > On Apr 17, 2017 8:46 AM, "Kranthi Kiran G V" <
>> > [email protected]> wrote:
>> >
>> > Hello Tim Allison,
>> >
>> > I am currently working on improving Tika's OCR capabilities.
>> > Following a suggestion from Thamme Gowda (@thammegowda
>> > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=thammegowda>),
>> > I started to work on a comparison of Tesseract 4.0's neural network
>> > <https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00>
>> > subsystem and the Visual Geometry Group's (VGG) models
>> > <http://www.robots.ox.ac.uk/~vgg/research/text/>.
>> >
>> > It would be great if you could provide the dataset to test the OCR, as
>> > you mentioned in one of the issues.
>> >
>> > I will be comparing their running time for evaluation, accuracy, memory
>> > consumed and invariance to lighting, orientation, etc. I will then
>> > integrate the appropriate models into Tika's OCR.
>> >
>> > Thank you,
>> > Kranthi Kiran GV,
>> > CS 3/4 Undergrad,
>> > NIT Warangal
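
On the memory question raised in the Apr 19 message quoted above, one possible approach is to run the OCR engine as a child process and read the peak resident set size the OS reports when the child exits. The following is only a minimal sketch under assumptions: it is Unix-only (it relies on `os.wait4`), it needs Python 3.9 or later, and the helper name `peak_rss_of_command` is hypothetical rather than part of Tika or Tesseract. It is not the method used for the measurements discussed in this thread.

```python
import os
import subprocess
import sys


def peak_rss_of_command(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is the ru_maxrss value the OS reports for this child:
    kilobytes on Linux, bytes on macOS. Unix only.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # wait4 reaps the child and returns its resource usage record.
    _, status, usage = os.wait4(proc.pid, 0)
    # Record the exit code on the Popen object, since the child
    # has already been reaped and Popen cannot wait for it again.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # With no arguments, measure a trivial Python child as a smoke test.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    code, peak = peak_rss_of_command(cmd)
    print(f"exit={code} peak_rss={peak}")
```

Saved under a hypothetical name such as `peak_rss.py`, it could be invoked as `python peak_rss.py tesseract sample.png stdout`, where `sample.png` stands in for a test image. The figure is a peak value rather than an average, and its unit differs between Linux and macOS, so results should only be compared across runs on the same platform.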
