RE: Sample Data to Test Lucene

Allison, Timothy B. Thu, 16 Jan 2014 12:06:39 -0800

To confirm, Lucene does not perform OCR.  (If you are looking for open source 
Java OCR packages, you might take a look here for some ideas: 
https://issues.apache.org/jira/i#browse/TIKA-93).  Are you trying to find a 
corpus of noisy OCR'd text to use as input into Lucene?  If so, this looks 
potentially useful: http://chroniclingamerica.loc.gov/ocr/. I don't know how well 
its error rates match yours, though.
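
If you want a rough number to compare error rates, one common measure is the
character error rate: the edit distance between the OCR output and a
hand-corrected transcription, divided by the length of the transcription. The
sketch below is only an illustration and assumes you have ground-truth text
for a small sample of your own pages. The class name, method names, and sample
strings are made up for this example; nothing here is part of Lucene.

```java
public class OcrErrorRate {

    // Levenshtein edit distance (insertions, deletions, substitutions),
    // computed with two rolling rows to keep memory proportional to b.length().
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Character error rate: edit distance divided by the ground-truth length.
    // For an empty ground truth this sketch returns 0.0 if the OCR output is
    // also empty and 1.0 otherwise (an arbitrary convention chosen here).
    static double charErrorRate(String truth, String ocr) {
        if (truth.isEmpty()) {
            return ocr.isEmpty() ? 0.0 : 1.0;
        }
        return (double) editDistance(truth, ocr) / truth.length();
    }

    public static void main(String[] args) {
        // Hypothetical sample pair; replace with your own transcription and OCR output.
        String truth = "The quick brown fox";
        String ocr = "Tbe qu1ck brown f0x";
        System.out.printf("character error rate: %.3f%n", charErrorRate(truth, ocr));
    }
}
```

Running this over a sample of your pages and over a sample from a public
corpus would give two figures you can compare directly, provided both samples
come with reliable ground truth.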
 
-----Original Message-----
From: Deniz Atak [mailto:[email protected]] 
Sent: Thursday, January 16, 2014 2:43 PM
To: [email protected]
Subject: Sample Data to Test Lucene


Hi,

We are new to Lucene and would like to use it for our archive project.
In this project we have to take images of documents, extract the text from
them via OCR, and index that text using Lucene. To see whether Lucene is
suitable for our project, we need to test it with sample data, so we
need a huge data set composed of images of documents. I searched the
net but couldn't find anything. Could anyone suggest a source for such
data?

Thanks in advance,

-- 
Deniz

