[
https://issues.apache.org/jira/browse/PDFBOX-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105097#comment-16105097
]
Tilman Hausherr commented on PDFBOX-3881:
-----------------------------------------
We're conforming to the PDF specification:
{quote}
Conforming readers that process PDF files containing Unicode text strings shall
be prepared to handle supplementary characters; that is, characters requiring
more than two bytes to represent.
{quote}
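To show what "characters requiring more than two bytes" means for a UTF-16 text string, here is a small standalone Java sketch. It is not PDFBox code, and the class and method names are hypothetical. A character outside the Basic Multilingual Plane, such as U+1F600, takes four bytes (a surrogate pair) in UTF-16BE, and Java decodes those bytes back into one code point.

```java
import java.nio.charset.StandardCharsets;

final class SupplementaryDemo {
    // Encode one Unicode code point as UTF-16BE (this charset writes no byte order mark).
    static byte[] encodeUtf16be(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] bmp = encodeUtf16be(0x00E9);   // U+00E9 (e acute), inside the BMP
        byte[] supp = encodeUtf16be(0x1F600); // U+1F600, a supplementary character
        System.out.println(bmp.length);  // 2
        System.out.println(supp.length); // 4: surrogate pair D83D DE00

        // Decoding yields one code point stored as two UTF-16 code units.
        String decoded = new String(supp, StandardCharsets.UTF_16BE);
        System.out.println(decoded.length());                            // 2
        System.out.println(decoded.codePointCount(0, decoded.length())); // 1
    }
}
```

In a PDF text string, the same character would be preceded by the FE FF byte order mark, giving the six bytes FE FF D8 3D DE 00.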
However, since Adobe Reader does not display anything for this field either (even
when I delete /Metadata and keep only /Info), I'll just change the ">" in the
length check ("this.bytes.length > 2") to ">=".
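The change can be illustrated with a minimal standalone sketch of the getString() logic quoted in the issue description below, with ">=" in place of ">". It is not the committed PDFBox patch. The class name BomAwareString is hypothetical, and ISO-8859-1 stands in for PDFBox's PDFDocEncoding. This substitution is an assumption: the two encodings agree on 0xFE and 0xFF but not on every byte.

```java
import java.nio.charset.StandardCharsets;

final class BomAwareString {
    // Sketch of the patched check: ">=" lets a two-byte, BOM-only value
    // reach the UTF-16 branches, where it decodes to the empty string.
    static String getString(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                // UTF-16BE byte order mark: skip it and decode the rest
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                // UTF-16LE byte order mark: skip it and decode the rest
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16LE);
            }
        }
        // No BOM: single-byte decoding (ISO-8859-1 as a stand-in for PDFDocEncoding)
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bomOnly = {(byte) 0xFE, (byte) 0xFF};
        byte[] bomAndA = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
        System.out.println("[" + getString(bomOnly) + "]"); // prints "[]"
        System.out.println(getString(bomAndA));            // prints "A"
    }
}
```

With the strict ">" check, a two-byte BOM-only value never reaches the UTF-16 branches. With ">=", it takes the UTF-16 branch with a zero-length range and decodes to the empty string.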
> Handling of Byte Order Mark with Metadata-Fields
> ------------------------------------------------
>
> Key: PDFBOX-3881
> URL: https://issues.apache.org/jira/browse/PDFBOX-3881
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.7
> Environment: Windows
> Reporter: Nico Prenzel
> Assignee: Tilman Hausherr
> Priority: Minor
> Attachments: ERiCDruck_23776162_ESt_0_20170727_121644-pdfcreator.pdf
>
>
> PDDocumentInformation methods such as getAuthor() honor the byte order of the
> extracted string and remove the byte order mark.
> But if the extracted string contains only the byte order mark, the
> corresponding string "þÿ" is returned.
> Is this the intended behavior?
> I'd like the byte order mark to be removed in this case as well, i.e. when the
> extracted string contains nothing but the byte order mark.
> Problematic code:
> {code:java}
> public String getString()
> {
>     if (this.bytes.length > 2)
>     {
>         if (((this.bytes[0] & 0xFF) == 254) && ((this.bytes[1] & 0xFF) == 255))
>         {
>             return new String(this.bytes, 2, this.bytes.length - 2,
>                     Charsets.UTF_16BE);
>         }
>         if (((this.bytes[0] & 0xFF) == 255) && ((this.bytes[1] & 0xFF) == 254))
>         {
>             return new String(this.bytes, 2, this.bytes.length - 2,
>                     Charsets.UTF_16LE);
>         }
>     }
>
>     return PDFDocEncoding.toString(this.bytes);
> }
> {code}
> The attachment contains an example PDF.
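The reported "þÿ" can be reproduced with a standalone sketch that mirrors the quoted method (apparently PDFBox's COSString.getString()), including its strict "> 2" check. The class name OriginalGetString is hypothetical, and ISO-8859-1 again stands in for PDFDocEncoding. This is an assumption that holds for bytes 0xFE and 0xFF, which both encodings map to U+00FE and U+00FF, but not for every byte.

```java
import java.nio.charset.StandardCharsets;

final class OriginalGetString {
    // Same branching as the quoted code, including the strict "> 2" check.
    static String getString(byte[] bytes) {
        if (bytes.length > 2) {
            if ((bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF) {
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
            }
            if ((bytes[0] & 0xFF) == 0xFF && (bytes[1] & 0xFF) == 0xFE) {
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16LE);
            }
        }
        // A two-byte, BOM-only array ends up here and is decoded byte by byte,
        // so 0xFE and 0xFF become the characters U+00FE and U+00FF.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = getString(new byte[] {(byte) 0xFE, (byte) 0xFF});
        System.out.println(s.equals("\u00FE\u00FF")); // true: the BOM leaks through
    }
}
```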
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)