The PDF reference doesn't define any encoding, just bytes. In a German locale Acrobat will encode as cp1252, and in a Japanese one maybe SJIS. iText uses ISO-8859-1, but it could have used anything. getISOBytes() is used to quickly convert chars that are already known to be in range; it's not a generic converter. For that there are other methods in Java and in PdfEncodings.
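As a rough illustration of the "known to be in range" point, the sketch below compares the narrowing cast with Java's generic converter for a string whose chars are all at or below U+00FF. The class name and the inIsoRange() guard are hypothetical and not part of iText; quickIsoBytes() only mirrors the loop in the getISOBytes() source quoted below.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class IsoBytesSketch {
    // Same narrowing cast as the quoted getISOBytes(): keeps the low 8 bits of each char.
    static byte[] quickIsoBytes(String text) {
        if (text == null) return null;
        byte[] b = new byte[text.length()];
        for (int k = 0; k < b.length; ++k) {
            b[k] = (byte) text.charAt(k);
        }
        return b;
    }

    // Hypothetical guard (not in iText): true only if every char fits in one ISO-8859-1 byte.
    static boolean inIsoRange(String text) {
        for (int k = 0; k < text.length(); ++k) {
            if (text.charAt(k) > 0xFF) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String ok = "p\u00e4ss"; // every char is <= U+00FF
        // For in-range input the quick cast and the generic converter agree.
        System.out.println(Arrays.equals(
                quickIsoBytes(ok), ok.getBytes(StandardCharsets.ISO_8859_1))); // true

        String bad = "\u2345"; // outside the single-byte range
        System.out.println(inIsoRange(bad)); // false
    }
}
```

A caller that cannot guarantee the range could check with something like inIsoRange() first, or use a generic converter instead.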
Paulo

----- Original Message -----
From: "Michael Bell" <[email protected]>
To: <[email protected]>
Sent: Saturday, August 29, 2009 7:46 AM
Subject: [iText-questions] PDF Encryption

1. The javadocs don't define the character set of the encoding for the byte[] signature of PdfEncryptor.encrypt. Of course a byte array without a character encoding makes no sense. What are the acceptable character encodings (see #2)? This should be corrected.

2. Cracking open the code, I see the String method calls getISOBytes():

    public static void encrypt(PdfReader reader, OutputStream os, int type,
            String userPassword, String ownerPassword, int permissions)
            throws DocumentException, IOException {
        PdfStamper stamper = new PdfStamper(reader, os);
        stamper.setEncryption(type, userPassword, ownerPassword, permissions);
        stamper.close();
    }

which leads to (PdfStamper class):

    public void setEncryption(boolean strength, String userPassword,
            String ownerPassword, int permissions) throws DocumentException {
        setEncryption(DocWriter.getISOBytes(userPassword),
                DocWriter.getISOBytes(ownerPassword), permissions, strength);
    }

Here is the source of getISOBytes():

    /** Converts a <CODE>String</CODE> into a <CODE>Byte</CODE> array
     * according to the ISO-8859-1 codepage.
     * @param text the text to be converted
     * @return the conversion result
     */
    public static final byte[] getISOBytes(String text) {
        if (text == null)
            return null;
        int len = text.length();
        byte b[] = new byte[len];
        for (int k = 0; k < len; ++k)
            b[k] = (byte)text.charAt(k);
        return b;
    }

a. The javadoc comment is wrong. This does not guarantee an ISO-8859-1 conversion; it merely tries to convert to a SINGLE byte representation of the characters. That MIGHT be ISO-8859-1. It might not be: there are tons of single-byte character sets.

b. If you feed it Unicode, it will drop every other byte, that is, the high byte of each char (as a side effect it will of course corrupt surrogate pairs horribly):

    public static void main(String[] args) {
        String a = "\u2345";
        System.out.println(getISOBytes(a).length);        // returns 1
        System.out.println(convertToHex(getISOBytes(a))); // returns x45
    }

Am I missing something? I'm concerned not just in the context of a password, but because the iText source code has getISOBytes all over the place (perhaps correctly so in some or most cases; I've not checked, I just did a quick search).
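To make the truncation in (b) concrete, here is a minimal sketch that puts the narrowing cast next to what the standard Java converters do with the same out-of-range input. The class name and the toHex() helper are hypothetical (toHex() stands in for convertToHex(), whose source is not shown in this thread); getISOBytes() is copied from the listing above.

```java
import java.nio.charset.StandardCharsets;

class TruncationSketch {
    // Copy of the narrowing loop from the getISOBytes() listing above.
    static byte[] getISOBytes(String text) {
        if (text == null) return null;
        int len = text.length();
        byte[] b = new byte[len];
        for (int k = 0; k < len; ++k) {
            b[k] = (byte) text.charAt(k);
        }
        return b;
    }

    // Hypothetical hex helper for printing; not part of iText.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u2345";

        // Narrowing cast: the high byte 0x23 is silently dropped.
        System.out.println(toHex(getISOBytes(a))); // 45

        // Generic converter: the unmappable char becomes the replacement byte '?'.
        System.out.println(toHex(a.getBytes(StandardCharsets.ISO_8859_1))); // 3f

        // An encoder can report up front whether the string is representable.
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode(a)); // false
    }
}
```

Neither path preserves the original character; the difference is that the generic converter substitutes a visible placeholder, while the cast yields an unrelated in-range byte.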
