[ 
https://issues.apache.org/jira/browse/PDFBOX-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Koegler updated PDFBOX-945:
----------------------------------

    Attachment: 05-change-encoding.patch

Patch 05:

PDStream.getInputStreamAsString() converts a byte array to a string, assuming 
that the byte data is in the platform encoding.

The content of the stream is (part of) a PDF, so it is most likely ISO-8859-1.

For the caller, the choice of encoding is irrelevant, as the method returns a String.
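
To illustrate the idea behind the patch, here is a minimal sketch of decoding 
stream bytes with a fixed charset instead of the platform default. It is not 
the actual patch and not the PDFBox API: the class StreamDecoding and the 
helper readAsLatin1() are hypothetical names, and StandardCharsets assumes 
Java 7 or later (on older JREs the charset name "ISO-8859-1" can be passed to 
the String constructor instead).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamDecoding {

    // Hypothetical helper (not part of PDFBox): reads the stream to the end
    // and decodes the bytes as ISO-8859-1, independent of the platform
    // default encoding.
    static String readAsLatin1(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // ISO-8859-1 maps every byte value 0x00..0xFF to exactly one char,
        // so no byte is lost or replaced during decoding.
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Sample content-stream fragment, assumed here for illustration only.
        byte[] raw = "BT /F1 12 Tf ET".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readAsLatin1(new ByteArrayInputStream(raw)));
    }
}
```

By contrast, new String(bytes) without a charset argument uses the JRE's 
default encoding, so the result can differ from one system to another.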


> PDFBOX may not depend on plattform encoding
> -------------------------------------------
>
>                 Key: PDFBOX-945
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-945
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: Martin Koegler
>         Attachments: 01-static-init-encoding.patch, 02-encoding.patch, 
> 03-standard-lf.patch, 04-xml-encoding.patch, 05-change-encoding.patch
>
>
> The PDF specification states that PDFs use an ASCII-compatible, 8-bit 
> character set.
> PDFBox uses the platform encoding in various places to convert bytes from/to 
> strings.
> On JREs with a non-ASCII-compatible platform encoding (there are such 
> systems out there), this will break PDFBox.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
