Hi,

Yes, Lucene is not for OCR; we are using another library for that. But we still need some source data to test Lucene with. Thanks for the links, I'll take a look at them.
Bye,
Deniz

On Thu, Jan 16, 2014 at 10:05 PM, Allison, Timothy B. <talli...@mitre.org> wrote:

> To confirm, Lucene does not perform OCR. (If you are looking for open
> source Java OCR packages, you might take a look here for some ideas:
> https://issues.apache.org/jira/i#browse/TIKA-93). Are you trying to find
> a corpus of noisy OCR'd text to use as input into Lucene? If so, this
> looks potentially useful: http://chroniclingamerica.loc.gov/ocr/. Don't
> know how well its error rates match yours...
>
> -----Original Message-----
> From: Deniz Atak [mailto:deniza...@gmail.com]
> Sent: Thursday, January 16, 2014 2:43 PM
> To: java-user@lucene.apache.org
> Subject: Sample Data to Test Lucene
>
> Hi,
>
> We are new to Lucene. We would like to use Lucene for our archive project.
> In this project we have to take images of documents, extract their text
> via OCR and index it using Lucene. In order to see if Lucene is suitable
> for our project, we need to test Lucene with sample data. But we need a
> huge data set composed of images of documents. I searched the net but
> couldn't find anything. Could anyone suggest something about this issue?
>
> Thanks in advance,
>
> --
> Deniz