[ https://issues.apache.org/jira/browse/PDFBOX-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guillaume Bailleul updated PDFBOX-1374:
---------------------------------------
    Attachment: PDFBoxLoader.java

It seems that the problem is in PDFBox. Run the attached example (PDFBoxLoader).

The PDF document information is encoded, while the XMP part is stored in clear text. This command prints the byte values at the end of dc:title (Rogator…<):

$ od -t x1 -j 4191122 -N 11 AA.pdf
17771622 52 6f 67 61 74 6f 72 e2 80 a6 3c

e2 80 a6 is the UTF-8 representation of HORIZONTAL ELLIPSIS (Unicode U+2026).

With a UTF-8 console, the value of dc.getTitle is correct:
Microsoft Word - LA_LAN01-#230492-v1-j2-Zilker_-_Motion_for_Letters_Rogator…

The problem comes from the retrieval of the title in PDDocumentInformation.

Any link with other encoding problems?

> Error On MetaData: Title
> ------------------------
>
>                 Key: PDFBOX-1374
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1374
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Preflight, XmpBox
>    Affects Versions: 1.7.1, 1.8.0
>         Environment: Linux
>            Reporter: William Fausser
>             Fix For: 1.8.0
>
>         Attachments: AA.pdf, PDFBoxLoader.java
>
>
> The file /home/fausser/AA.pdf is not valid, error(s) :
> 7.2 : Error on MetaData, Title present in the document catalog dictionary doesn't match with XMP information
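
The byte check in the comment can be reproduced without the PDF at hand. The following is a minimal sketch using only the Java standard library, not the attached PDFBoxLoader.java: the class name EllipsisBytes and its helper methods are hypothetical, and the byte array is copied from the od output above. It decodes the 11 bytes as UTF-8 and, purely for contrast, as ISO-8859-1. The single-byte decoding is an assumption chosen to illustrate how a charset mismatch changes the string; it is not a diagnosis of what PDDocumentInformation does. The sketch also confirms that 17771622 is simply the offset 4191122 written in octal, which is od's default address radix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of PDFBox or of the attachment.
final class EllipsisBytes {

    // The 11 bytes printed by od: "Rogator", then e2 80 a6, then "<".
    static final byte[] TAIL = {
        0x52, 0x6f, 0x67, 0x61, 0x74, 0x6f, 0x72,
        (byte) 0xe2, (byte) 0x80, (byte) 0xa6, 0x3c
    };

    // Decode raw bytes with an explicit charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Render each code point as U+XXXX so the result is readable on any console.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String utf8 = decode(TAIL, StandardCharsets.UTF_8);
        String latin1 = decode(TAIL, StandardCharsets.ISO_8859_1);

        // UTF-8 folds e2 80 a6 into the single code point U+2026.
        System.out.println("UTF-8      (" + utf8.length() + " chars): " + codePoints(utf8));
        // A single-byte charset keeps the three bytes as three separate characters.
        System.out.println("ISO-8859-1 (" + latin1.length() + " chars): " + codePoints(latin1));
        // od prints addresses in octal by default.
        System.out.println("offset 4191122 in octal: " + Integer.toOctalString(4191122));
    }
}
```

If the title read from the document information dictionary shows three characters where the XMP dc:title has one ellipsis, a decoding mismatch of this kind would be one plausible explanation for the Preflight 7.2 error, although the comment above does not confirm it.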