Package: pdftk-java
Version: 3.3.3-2
Control: affects -1 exiftool

When updating the date fields of the Info dictionary, pdftk-java encodes the date string in UTF-16BE with a BOM instead of ASCII (or PDFDocEncoding). This causes an interoperability issue with exiftool, which does not normalize such dates into a human-readable form.

Grab a sample PDF file, say https://pdfobject.com/pdf/sample.pdf, and update its creation date with either update_info or update_info_utf8 (either one will do for this purpose):

  $ pdftk sample.pdf update_info <(echo -e "InfoBegin\nInfoKey: CreationDate\nInfoValue: D:199812231952-08'00'") output sample_with_date.pdf

Exiftool shows the value, but the creation date is left in its raw form, whereas the original modification date is normalized and far more readable:

  $ exiftool -a -G sample_with_date.pdf | grep "PDF.*Date"
  [PDF] Modify Date : 2008:07:01 05:24:47Z
  [PDF] Create Date : D:199812231952-08'00'

The culprit is the encoding of the date in UTF-16BE, starting with the byte-order mark FE FF:

  $ mutool show sample_with_date.pdf trailer/Info | grep Date
  /ModDate (D:20080701052447Z00'00')
  /CreationDate <FEFF0044003A003100390039003800310032003200330031003900350032002D003000380027003000300027>

  $ xxd sample_with_date.pdf | grep -A3 "Dat"
  000045c0: 2028 5061 6765 7329 0a2f 4d6f 6444 6174   (Pages)./ModDat
  000045d0: 6520 2844 3a32 3030 3830 3730 3130 3532  e (D:20080701052
  000045e0: 3434 375a 3030 2730 3027 290a 2f43 7265  447Z00'00')./Cre
  000045f0: 6174 696f 6e44 6174 6520 28fe ff00 4400  ationDate (...D.
  00004600: 3a00 3100 3900 3900 3800 3100 3200 3200  :.1.9.9.8.1.2.2.
  00004610: 3300 3100 3900 3500 3200 2d00 3000 3800  3.1.9.5.2.-.0.8.
  00004620: 2700 3000 3000 2729 0a2f 5072 6f64 7563  '.0.0.')./Produc

Instead of a UTF-16BE string beginning with FE FF, CreationDate should be written as an ASCII string:

  /CreationDate (D:199812231952-08'00')

In the PDF spec 1.3, https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.3.pdf, Table 3.21, a date is a string, and a string is specified at the beginning of § 3.2.3 as a series of bytes (unsigned integer values in the range 0 to 255).

In the PDF spec 1.7, https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf, Table 34, a date is an ASCII string.

In the PDF spec 2.0, https://developer.adobe.com/document-services/docs/assets/5b15559b96303194340b99820d3a70fa/PDF_ISO_32000-2.pdf, Table 35, a date is also an ASCII string.

Though § 7.9.4 in specs 1.7 and 2.0 says that a date is a text string, text strings can be PDFDocEncoded, and PDFDocEncoding contains printable ASCII, so an ASCII date is also a valid text string.

The exiftool output above demonstrates inconsistent encoding of date fields within the same Info dictionary and reduced interoperability when UTF-16BE is used.

Requested change: pdftk-java should write Info date fields using printable ASCII (a subset of PDFDocEncoding) instead of UTF-16BE. Since the PDF date format uses only printable ASCII characters, UTF-16BE encoding is unnecessary here, and it reduces interoperability with some PDF tools.

This appears to be an upstream pdftk-java behavior rather than a Debian packaging issue.
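
For illustration only, here is a minimal Python sketch of the point made in this report. It is not pdftk-java code, and the function names decode_pdf_text_string and encode_pdf_date are hypothetical. It decodes the hex string from the mutool output to confirm that it holds the same date, and shows one possible writer-side rule: emit plain ASCII when the value allows it, and fall back to UTF-16BE with a BOM only otherwise. As a simplification, it treats strings without a BOM as ASCII rather than implementing full PDFDocEncoding.

```python
# Hex string copied from the mutool output in this report.
HEX = "FEFF0044003A003100390039003800310032003200330031003900350032002D003000380027003000300027"

BOM_UTF16BE = b"\xfe\xff"


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string: UTF-16BE if it starts with FE FF, else ASCII.

    Simplification: full PDFDocEncoding is not handled, only its ASCII subset.
    """
    if raw.startswith(BOM_UTF16BE):
        return raw[len(BOM_UTF16BE):].decode("utf-16-be")
    return raw.decode("ascii")


def encode_pdf_date(value: str) -> bytes:
    """Hypothetical writer rule: ASCII when possible, else UTF-16BE with BOM."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        return BOM_UTF16BE + value.encode("utf-16-be")


decoded = decode_pdf_text_string(bytes.fromhex(HEX))
print(decoded)                   # D:199812231952-08'00'
print(encode_pdf_date(decoded))  # the same date as 21 plain ASCII bytes
```

With such a rule, a date would end up in the file as (D:199812231952-08'00'), matching the encoding of the untouched ModDate entry.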

