[iText-questions] Can't extract Russian text from PDF

Сергій Карпенко Thu, 11 Sep 2008 00:45:45 -0700

Hi,

A few months ago I wrote some code to extract text from PDF files. It works
well with Latin fonts, but it does not extract Russian text, and it seems to
fail on any non-Latin characters. Here is the code:

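  import java.io.IOException;
  import java.io.InputStream;
  import java.util.HashMap;
  import com.lowagie.text.pdf.PRTokeniser;
  import com.lowagie.text.pdf.PdfReader;

  public class PdfTextDump {
    static String extract(InputStream is) throws IOException {
      // open the document with an empty owner password
      PdfReader reader = new PdfReader(is, "".getBytes());
      // XMP metadata may be absent; when present it is normally UTF-8
      byte[] m = reader.getMetadata();
      String sm = (m == null) ? "" : new String(m, "UTF-8");
      HashMap map = reader.getInfo();   // document info dictionary
      StringBuilder builder = new StringBuilder();

      for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        byte[] pageBytes = reader.getPageContent(i);
        if (pageBytes != null) {
          PRTokeniser token = new PRTokeniser(pageBytes);
          while (token.nextToken()) {
            // string operands of text operators such as Tj/TJ; these are
            // raw character codes in the font's encoding, not Unicode
            if (token.getTokenType() == PRTokeniser.TK_STRING) {
              builder.append(token.getStringValue()).append(' ');
            }
          }
        }
      }
      reader.close();
      return builder.toString();
    }
  }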

That's all.
Any hint would be appreciated.

Thanks
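A possible explanation, offered as a hedged editorial note rather than a confirmed diagnosis: the loop above collects the raw bytes of each string operand. With a simple single-byte font encoding such as WinAnsiEncoding, those bytes coincide with Latin characters, which would explain why Latin text comes out readable. Cyrillic text is often set in a Type0 font with Identity-H encoding, where each string holds 2-byte glyph codes that only become Unicode after a lookup in the font's /ToUnicode CMap. The sketch is plain Java with no iText dependency; the class name, the glyph codes and the mapping table are all hypothetical, chosen only to illustrate the decoding step.

```java
import java.util.HashMap;
import java.util.Map;

public class ToUnicodeSketch {

    // Decode a string operand whose bytes are 2-byte character codes
    // (as with an Identity-H encoded Type0 font), using a code -> Unicode
    // table of the kind a font's /ToUnicode CMap provides.
    // A trailing odd byte, if any, is ignored.
    static String decodeTwoByte(byte[] raw, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < raw.length; i += 2) {
            int code = ((raw[i] & 0xFF) << 8) | (raw[i + 1] & 0xFF);
            // unmapped codes become U+FFFD, the Unicode replacement character
            out.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical glyph codes and mapping, for illustration only.
        byte[] raw = {0x00, 0x2F, 0x00, 0x51, 0x00, 0x4A};
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(0x002F, "\u041C"); // Cyrillic capital EM
        toUnicode.put(0x0051, "\u0438"); // Cyrillic small I
        toUnicode.put(0x004A, "\u0440"); // Cyrillic small ER

        // One char per byte, roughly what reading the raw operand yields
        // (NUL bytes shown as '.'): "./.Q.J"
        StringBuilder naive = new StringBuilder();
        for (byte b : raw) {
            naive.append((char) (b & 0xFF));
        }
        System.out.println(naive.toString().replace('\u0000', '.'));

        // Decoded through the mapping: the Russian word "Мир"
        System.out.println(decodeTwoByte(raw, toUnicode));
    }
}
```

In a real document the mapping has to be read from the font dictionary used by each text operator, so hand-rolling it is error-prone; later iText versions include a `PdfTextExtractor` class in their parser package that performs this lookup.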

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

