Re: OCR with tika-server

kevin slote Tue, 30 Sep 2014 13:01:00 -0700

I am working on Ubuntu 10.04 and I am having some trouble.
Tesseract is installed correctly, but after just cloning the repo and
installing with Maven, I am getting some errors.

This is what I get before changing anything, with Tesseract installed:

Failed tests:   testPPTXOCR(org.apache.tika.parser.ocr.TesseractOCRTest):
Check for the image's text.
  testDOCXOCR(org.apache.tika.parser.ocr.TesseractOCRTest)
  testPDFOCR(org.apache.tika.parser.ocr.TesseractOCRTest)

Next I hard-coded the tesseractPath:

I went into TesseractOCRConfig.java and hard-coded 'tesseractPath'.
Then all tests passed and it built successfully, but then I went to post
some TIFFs to the server.
That didn't work. So I tried adding some System.out.println("hello world")
calls (a little crude, I know) inside the unit tests to confirm that
Tesseract was working correctly. It looks like something happens in the
unit test in TesseractOCRTest.java on the line that says
TesseractOCRConfig config = new TesseractOCRConfig();. Printing to stdout
before that line works, but I get nothing after it. That happens before the
assumeTrue(canRun(config)); call, so an exception is not getting raised.
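
A quick way to check, outside of Tika and its unit tests, whether the
tesseract binary can be launched from a given directory is a small
stand-alone program like the sketch below. It is not Tika's canRun
implementation: the class name TesseractCheck, the method canLaunch, and
the use of the -v (version) flag are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

public class TesseractCheck {

    // Hypothetical helper: returns true if "<dir>tesseract" can be started
    // and exits within five seconds. Pass "" to rely on the PATH instead
    // of a hard-coded directory.
    static boolean canLaunch(String dir) {
        ProcessBuilder pb = new ProcessBuilder(dir + "tesseract", "-v");
        pb.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process p = pb.start();
            // Drain the output so the child cannot block on a full pipe.
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // output is discarded; only launchability matters here
                }
            }
            boolean finished = p.waitFor(5, TimeUnit.SECONDS);
            if (!finished) {
                p.destroyForcibly();
            }
            return finished;
        } catch (IOException e) {
            // Typically: the binary does not exist or is not executable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : "";
        System.out.println("tesseract launchable: " + canLaunch(dir));
    }
}
```

Running it with a directory argument that ends in a slash (for example
/usr/bin/, purely as an example location) mimics a hard-coded path, while
running it with no argument relies on the PATH of the current process.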

Then, once everything is built, OCR does not work. That is why I figured I
would ask whether I missed some sort of configuration step in building it.

Thanks a ton.

On Tue, Sep 30, 2014 at 2:57 PM, Mattmann, Chris A (3980) <
[email protected]> wrote:

> Dear Kevin,
>
> Sure, it already works :) 1.7-SNAPSHOT.
>
> See this wiki page:
>
> https://wiki.apache.org/tika/TikaOCR
>
> I'd be happy to discuss more.
>
> Thanks!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: kevin slote <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, September 30, 2014 at 8:52 AM
> To: "[email protected]" <[email protected]>
> Subject: OCR with tika-server
>
> >Hello all,
> >
> >I have been testing out the integration of tika with tesseract.
> >I was wondering if there is  a way to get tika-server to run with
> >tesseract's OCR capabilities?
> >
> >Best
> >
> >Kevin Slote
>
>
