Hi,

yes, Lucene does not do OCR; we are using another library for that. But we
still need some sample data to test Lucene with. Thanks for the links, I'll
take a look at them.

Bye,
Deniz


On Thu, Jan 16, 2014 at 10:05 PM, Allison, Timothy B. <talli...@mitre.org> wrote:

> To confirm, Lucene does not perform OCR.  (If you are looking for open
> source Java OCR packages, you might take a look here for some ideas:
> https://issues.apache.org/jira/i#browse/TIKA-93).  Are you trying to find
> a corpus of noisy OCR'd text to use as input into Lucene?  If so, this
> looks potentially useful: http://chroniclingamerica.loc.gov/ocr/. Don't
> know how well its error rates match yours...
>
> -----Original Message-----
> From: Deniz Atak [mailto:deniza...@gmail.com]
> Sent: Thursday, January 16, 2014 2:43 PM
> To: java-user@lucene.apache.org
> Subject: Sample Data to Test Lucene
>
> Hi,
>
> We are new to Lucene. We would like to use Lucene for our archive project.
> In this project we have to take images of documents, extract their text
> via OCR and index that text using Lucene. In order to see whether Lucene is
> suitable for our project, we need to test it with sample data. But we
> need a huge data set composed of images of documents. I searched the
> net but couldn't find anything. Could anyone suggest something about this
> issue?
>
> Thanks in advance,
>
> --
> Deniz
>


-- 
Deniz
