[ 
https://issues.apache.org/jira/browse/PDFBOX-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121207#comment-16121207
 ] 

Nico Prenzel commented on PDFBOX-3881:
--------------------------------------

Thanks for solving this weak spot. I've tested the fix with the latest 
snapshot. Thanks.

> Handling of Byte Order Mark with Metadata-Fields
> ------------------------------------------------
>
>                 Key: PDFBOX-3881
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3881
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.7
>         Environment: Windows
>            Reporter: Nico Prenzel
>            Assignee: Tilman Hausherr
>            Priority: Minor
>              Labels: BOM
>             Fix For: 2.0.8, 3.0.0
>
>         Attachments: ERiCDruck_23776162_ESt_0_20170727_121644-pdfcreator.pdf
>
>
> PDDocumentInformation accessors such as getAuthor() honor the byte order of 
> the extracted string and strip the byte order mark (BOM).
> However, if the extracted string consists only of the BOM, the 
> corresponding string "þÿ" is returned.
> Is this the intended behavior?
> I'd appreciate it if the BOM were also removed when the extracted 
> string contains nothing else.
> Problematic code:
> {code:java}
> public String getString()
> {
>     // Note: a string holding only the two BOM bytes has length 2, so the
>     // "> 2" check below skips BOM handling and the BOM is decoded as text.
>     if (this.bytes.length > 2)
>     {
>         if (((this.bytes[0] & 0xFF) == 254) && ((this.bytes[1] & 0xFF) == 255))
>         {
>             return new String(this.bytes, 2, this.bytes.length - 2, Charsets.UTF_16BE);
>         }
>         if (((this.bytes[0] & 0xFF) == 255) && ((this.bytes[1] & 0xFF) == 254))
>         {
>             return new String(this.bytes, 2, this.bytes.length - 2, Charsets.UTF_16LE);
>         }
>     }
>
>     return PDFDocEncoding.toString(this.bytes);
> }
> {code}
> The attachment contains an example PDF.
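
For illustration, the behavior requested in the report can be sketched as a standalone helper. This is only a minimal sketch under assumptions, not the change committed to PDFBox: the class name BomAwareDecoder and the method decode are hypothetical, and ISO-8859-1 stands in for PDFDocEncoding so that the example runs without PDFBox on the classpath. The one relevant difference from the quoted code is the length check, which accepts exactly two bytes so that a BOM-only string decodes to the empty string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the PDFBox API.
final class BomAwareDecoder {

    static String decode(byte[] bytes) {
        // ">= 2" instead of "> 2": a value made up of only the BOM is handled too.
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF;
            int b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                // Big-endian BOM: skip it and decode the rest (possibly zero bytes).
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                // Little-endian BOM: skip it and decode the rest.
                return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16LE);
            }
        }
        // Assumption: ISO-8859-1 is a stand-in for PDFDocEncoding in this sketch.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bomOnly = {(byte) 0xFE, (byte) 0xFF};
        byte[] bomPlusA = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
        System.out.println(decode(bomOnly).isEmpty()); // prints "true"
        System.out.println(decode(bomPlusA));          // prints "A"
    }
}
```

With the original "> 2" check, the BOM-only input would fall through to the fallback decoder and come back as "þÿ", which is the symptom described above.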



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
