Hi,
I wrote this code a few months ago. It works well with Latin text, but it does not
extract Russian text or, it seems, any other non-Latin characters. Here is the code:
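// "is" is the InputStream of the PDF; the owner password is empty
PdfReader reader = new PdfReader(is, "".getBytes());
PRTokeniser token;

// XMP metadata and the document info dictionary (read, but not used below)
byte[] m = reader.getMetadata();
// guard against a null result (no XMP stream); decodes with the platform default charset
String sm = (m != null) ? new String(m) : null;
HashMap map = reader.getInfo();

StringBuilder builder = new StringBuilder();
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    byte[] pageBytes = reader.getPageContent(i);
    if (pageBytes != null) {
        token = new PRTokeniser(pageBytes);
        // collect every string operand found in the page's content stream
        while (token.nextToken()) {
            if (token.getTokenType() == PRTokeniser.TK_STRING) {
                builder.append(token.getStringValue()).append(' ');
            }
        }
    }
}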
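A side note on the new String(m) call above: it decodes the metadata bytes with
the platform default charset, so non-Latin metadata can come out mangled whenever
that default does not match the bytes. Below is a minimal sketch using only the
JDK, assuming the bytes are UTF-8 (common for XMP, but an assumption here). The
class name CharsetDemo and its helper methods are hypothetical and not part of
iText. This covers only the metadata string; it is not offered as an explanation
for the strings returned by PRTokeniser.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode with an explicit charset instead of the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Stand-in for a platform whose default charset is single-byte Latin-1.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Privet" in Cyrillic, written with escapes so the source stays ASCII
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] latinBytes = "Hello".getBytes(StandardCharsets.UTF_8);
        byte[] russianBytes = russian.getBytes(StandardCharsets.UTF_8);

        // ASCII text survives either decoding, which can hide the problem
        System.out.println(decodeLatin1(latinBytes).equals("Hello"));
        // Cyrillic text survives only when the charset matches the bytes
        System.out.println(decodeUtf8(russianBytes).equals(russian));
        System.out.println(decodeLatin1(russianBytes).equals(russian));
    }
}
```

With the code from this post, the equivalent change would be to pass an explicit
charset when building sm rather than relying on the default.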
That's all.
Please give me a hint.
Thanks
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions