The text extracted with

    PDDocument doc = PDDocument.load(new URL("http://people.ischool.berkeley.edu/~hearst/irbook/print/chap10.pdf"));
    PDFTextStripper stripper = new PDFTextStripper();
    stripper.writeText(doc, new OutputStreamWriter(System.out));

looks like this:

    ¡ ¢¤£¦¥¨§ª© ®©°¯±¢²§ª³ ´¶µ¸·¹¢º© » ¥¼µ½§ff·fi¥ffifl¼´²Â "!$#&%ª')(+* ,-%ª.ff/0%ff132"%ff45.ff6 ,-.7'84:97!;.7'< "!>=?.ª!>'fi*�...@b.c4®* ACM Press New York Addison-Wesley D)EGFIH J>KMLON8P$QRH ESPUTffVffWYXZE>TR[\PUQ]L_^`E>ababE>cedgfUahX;ijija ...

Best regards,
Reinhard