extracted text with

PDDocument doc = PDDocument.load(new URL(
        "http://people.ischool.berkeley.edu/~hearst/irbook/print/chap10.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
stripper.writeText(doc, new OutputStreamWriter(System.out));

looks like this:

¡ ¢¤£¦¥¨§ª© ­®©°¯±¢²§ª³ ´¶µ¸·¹¢º© » ¥¼µ½§ff·fi¥ffifl¼´²Â
 "!$#&%ª')(+* ,-%ª.ff/0%ff132"%ff45.ff6
,-.7'84:97!;.7'< "!>=?.ª!>'fi*�...@b.c4®*
ACM Press
New York
Addison-Wesley
D)EGFIH J>KMLON8P$QRH ESPUTffVffWYXZE>TR[\PUQ]L_^`E>ababE>cedgfUahX;ijija
...

best regards
reinhard
