[ 
https://issues.apache.org/jira/browse/PDFBOX-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969749#comment-14969749
 ] 

Tilman Hausherr commented on PDFBOX-3044:
-----------------------------------------

{quote}
OK, so you'd prefer to wait until after 2.0 is released before we change any 
text extraction logic?
{quote}
Yes.

{quote}
Hopefully we can make the files UTF-8 independently of that, since it 
shouldn't change the results of the tests at all and will still be helpful for 
debugging. We could make that change on both the 1.8 and 2.0 branches if it's 
helpful for comparison purposes.
{quote}
Yes, that can be done before the 2.0 release.

> Test files character encoding
> -----------------------------
>
>                 Key: PDFBOX-3044
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3044
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Ben McCann
>
> The files in pdfbox/src/test/resources/input all seem to be UTF-16 encoded. 
> I'm having a really difficult time using these files with the tools that I 
> typically use (git, meld, etc.). Would it be possible to change the encoding 
> to UTF-8?
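
As an illustration of the conversion requested above, a minimal sketch in Java
could look like the following. It is not taken from the PDFBox code base: the
class name Utf16ToUtf8 and the convert helper are hypothetical, and it assumes
that every file passed in really is UTF-16 (with a byte-order mark, or
big-endian without one).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class Utf16ToUtf8 {

    // Re-encodes a single text file from UTF-16 to UTF-8, overwriting it in place.
    static void convert(Path file) throws IOException {
        // The UTF-16 decoder consumes a leading byte-order mark if present
        // and falls back to big-endian when there is none.
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_16);
        // Write the same characters back as UTF-8 (no byte-order mark is added).
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as the path of a file to convert.
        for (String arg : args) {
            convert(Paths.get(arg));
        }
    }
}
```

Because only the byte encoding changes and not the characters, a test that
reads the files with the matching charset should see the same text as before.
Any test code that opens these files with a hard-coded UTF-16 charset would
need to be switched to UTF-8 at the same time.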


