FWD’ing to the Tika list (note TO: address change)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From: Ravi Gadapa <[email protected]>
Date: Monday, June 19, 2017 at 8:56 PM
To: "[email protected]" <[email protected]>
Subject: Tesseract - OCR and Tika

I have been using Tika with Tesseract OCR for our project, and I seem to have a
problem extracting the data from PDF documents. Below is a sample of what it extracts.

'EldAJ. iNEIWEI‘IEI ‘IVHG El‘c'l TIVHS SEIHOJJMS TIV "8
'NOILVGNEIWINOOEIEI ElElElfliOVdflNVW iNEIWdIflOEI ElElcl SV 3|in EIWVN 
S.J_NE|V\ld|flOE| NO GEISVEI EIEI TIVHS HOJJMS iOEINNOOSIG iNEIWdIflOEI HO:| 
EIZIS ElSflzl TIV 'Z
'GEliON EISIMEIEIHLO SSEI‘INH ‘EldAJ. EltlflSO‘IONEI HS VINEIN NI EIEI TIVHS 
SEIHOJJMS iOEINNOOSIG HOOGiflO TIV 'L


Any suggestions?

Thanks
