[
https://issues.apache.org/jira/browse/PDFBOX-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562124#comment-16562124
]
Tilman Hausherr commented on PDFBOX-4280:
-----------------------------------------
I was able to extract the text of the file from the command line with version
1.8 by passing "-encoding utf8". If you are doing this from Java code, use
{{PDFTextStripper("utf8")}} as the constructor.
Does this work for you?
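For illustration only: one plausible reason the encoding option matters (an
assumption, not something confirmed in this issue) is that ☒ and ☐ have no
representation in a single-byte output encoding such as ISO-8859-1, so the
encoder substitutes '?'. The sketch below uses only the JDK, not PDFBox; the
class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CheckboxEncodingDemo {
    // Encode text to bytes in the given charset and decode it back,
    // mimicking extracted text being written to a file and read again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String extracted = "\u2612 yes \u2610 no"; // "☒ yes ☐ no"

        // The checkbox characters cannot be mapped and are lost.
        System.out.println(roundTrip(extracted, StandardCharsets.ISO_8859_1)); // ? yes ? no

        // UTF-8 can represent them, so the round trip is lossless.
        System.out.println(roundTrip(extracted, StandardCharsets.UTF_8).equals(extracted)); // true
    }
}
```

If this is what happens with your file, the text itself is extracted correctly
and only the output step loses the characters.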
Re 2.0.11: If your file is not too big, call {{new PDFParser(new
RandomAccessBuffer(IOUtils.toByteArray(inputStream)))}}, where inputStream
stands for your own stream. Don't forget to close the input stream after that.
Or make your life easy and just call {{PDDocument.load()}} on your stream
instead of constructing a PDFParser yourself, which has been obsolete for
years but still appears in many third-party "tutorials".
> PDFbox extracts checkboxes as question marks '?'
> ------------------------------------------------
>
> Key: PDFBOX-4280
> URL: https://issues.apache.org/jira/browse/PDFBOX-4280
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox, Parsing, Text extraction
> Affects Versions: 1.8.11
> Reporter: fayaz baig
> Priority: Major
> Attachments: Apache pdfbox issue.docx, test.pdf
>
>
> Hello,
> When I try to extract the checkbox details from the PDF, each checkbox is
> extracted as a ? instead of ☒ or ☐.
> The attached document contains the details.
>
> Please write to [[email protected]|mailto:[email protected]] if any further
> clarification is required.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)