[ https://issues.apache.org/jira/browse/PDFBOX-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105104#comment-16105104 ]

ASF subversion and git services commented on PDFBOX-3881:
---------------------------------------------------------

Commit 1803283 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1803283 ]

PDFBOX-3881: don't keep BOM for empty strings, as suggested by Nico Prenzel

> Handling of Byte Order Mark with Metadata-Fields
> ------------------------------------------------
>
>                 Key: PDFBOX-3881
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3881
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.7
>         Environment: Windows
>            Reporter: Nico Prenzel
>            Assignee: Tilman Hausherr
>            Priority: Minor
>         Attachments: ERiCDruck_23776162_ESt_0_20170727_121644-pdfcreator.pdf
>
>
> PDDocumentInformation getters such as getAuthor() honor the byte order of the
> extracted string and strip the byte order mark (BOM).
> But if the extracted string contains only the BOM, the corresponding string
> "þÿ" is returned.
> Is this the intended behavior?
> I would appreciate it if the BOM were also removed when the extracted string
> contains nothing else.
> Problematic code:
> {code:java}
> public String getString()
> {
>     if (this.bytes.length > 2)
>     {
>         if (((this.bytes[0] & 0xFF) == 254) && ((this.bytes[1] & 0xFF) == 255))
>         {
>             return new String(this.bytes, 2, this.bytes.length - 2, Charsets.UTF_16BE);
>         }
>         if (((this.bytes[0] & 0xFF) == 255) && ((this.bytes[1] & 0xFF) == 254))
>         {
>             return new String(this.bytes, 2, this.bytes.length - 2, Charsets.UTF_16LE);
>         }
>     }
>
>     return PDFDocEncoding.toString(this.bytes);
> }
> {code}
> The attachment contains an example PDF.
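
The behavior requested in the report can be sketched as follows. This is a minimal, standalone illustration and not necessarily the code committed in r1803283: the class name BomAwareDecoder and the method decode are hypothetical, and ISO-8859-1 stands in for PDFDocEncoding.toString() only so that the example runs without PDFBox on the classpath. The idea is to test for "at least two bytes" rather than "more than two bytes", so that a string consisting of nothing but a BOM decodes to the empty string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
public class BomAwareDecoder
{
    public static String decode(byte[] bytes)
    {
        // ">= 2" instead of "> 2": a BOM with nothing after it is still a BOM.
        if (bytes.length >= 2)
        {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF)
            {
                // UTF-16 big-endian BOM: skip it and decode the remainder (possibly empty).
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
            }
            if (b0 == 0xFF && b1 == 0xFE)
            {
                // UTF-16 little-endian BOM: skip it and decode the remainder (possibly empty).
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16LE);
            }
        }
        // Assumption: ISO-8859-1 replaces PDFDocEncoding.toString(bytes) in this sketch.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args)
    {
        byte[] bomOnly = { (byte) 0xFE, (byte) 0xFF };
        byte[] bomPlusA = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x41 };
        System.out.println(decode(bomOnly).isEmpty());
        System.out.println(decode(bomPlusA));
    }
}
```

With the original "> 2" check, the two-byte input falls through to the fallback decoder, which is where the "þÿ" in the report comes from.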



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
