[
https://issues.apache.org/jira/browse/PDFBOX-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alberto Sampayo updated PDFBOX-2061:
------------------------------------
Description:
Hi, when trying to extract text from a PDF that uses Identity-H encoding, the result is
ÙÑÎ×ß ÍÛÝÝ×ÑÒßÒ. I tried setting the fonts explicitly, but that throws an EmptyStackException:
{code}
Map<String,PDFont> pageFonts = null;
List<PDPage> pages = pdDoc.getDocumentCatalog().getAllPages();
// Note: each iteration overwrites pageFonts, so only the last page's fonts are kept.
for (PDPage page : pages) {
    pageFonts = page.getResources().getFonts();
}
pdfStripper.setFonts(pageFonts);
parsedText = pdfStripper.getText(pdDoc);
{code}
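For readers unfamiliar with this exception, here is a minimal, PDFBox-free sketch of how a java.util.EmptyStackException can arise in general: java.util.Stack.peek() throws it when the stack is empty. The SketchEngine class, its methods, and the String stand-in for PDFont are all hypothetical. The idea that setFonts reads from a resources stack that is only filled while a page is being processed is an assumption, not something confirmed in this report.

```java
import java.util.EmptyStackException;
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

public class EmptyStackSketch {

    /** Hypothetical stand-in for an engine that tracks per-stream resources on a stack. */
    static class SketchEngine {
        private final Stack<Map<String, String>> resourcesStack = new Stack<>();

        /** Opens a stream context; only after this call is there an entry to peek at. */
        void beginStream() {
            resourcesStack.push(new HashMap<>());
        }

        /** Stores fonts on the current context; peek() throws if no context is open. */
        void setFonts(Map<String, String> fonts) {
            resourcesStack.peek().putAll(fonts);
        }
    }

    /** Returns true if setFonts threw EmptyStackException, false otherwise. */
    static boolean setFontsThrows(boolean streamOpen) {
        SketchEngine engine = new SketchEngine();
        if (streamOpen) {
            engine.beginStream();
        }
        Map<String, String> fonts = new HashMap<>();
        fonts.put("F1", "placeholder font"); // a String stands in for PDFont here
        try {
            engine.setFonts(fonts);
            return false;
        } catch (EmptyStackException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("no stream open: throws = " + setFontsThrows(false));
        System.out.println("stream open:    throws = " + setFontsThrows(true));
    }
}
```

If the assumption holds, calling setFonts before any page is being processed would fail in the same way as the first case above.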
was:
Hi, when trying to extract text from a PDF with Identity-H encoding
{code}
Map<String,PDFont> pageFonts = null;
List<PDPage> pages = pdDoc.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
    pageFonts = page.getResources().getFonts();
}
pdfStripper.setFonts(pageFonts);
{code}
> java.util.EmptyStackException in PDFTextStripper setFonts
> ---------------------------------------------------------
>
> Key: PDFBOX-2061
> URL: https://issues.apache.org/jira/browse/PDFBOX-2061
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.0
> Reporter: Alberto Sampayo