[
https://issues.apache.org/jira/browse/PDFBOX-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler closed PDFBOX-139.
-------------------------------------
Resolution: Fixed
Assignee: Andreas Lehmkühler
The CMap-Parser was improved a long time ago and now all mentioned operators
are supported.
Set to closed.
> The CMapParser does not recognize essential cmap operators
> ----------------------------------------------------------
>
> Key: PDFBOX-139
> URL: https://issues.apache.org/jira/browse/PDFBOX-139
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Assignee: Andreas Lehmkühler
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1438028
> Originally submitted by vdimchev on 2006-02-24 03:48.
> The bug is directly related to the following bug I
> discovered in the database:
> [ 1208652 ] PDFTextStripper.writeText Exception:Unknown
> encoding for ..
> I'll try to exlain it again here and supply enough
> resources for its fix.
> The problem is that the current implementation of
> CMapParser class supports only the beginbfchar and
> beginbfrange operators.
> This is not enough and causes the invokation to
> PDFTextStripper.writeText() to throw IOException with
> the following message: Unknown encoding for 'Identity-
> V'.
> I also managed to produce the message: "Unknown
> encoding for '90ms-RKSJ-H'.
> The complete stacktrace is:
> java.io.IOException: Unknown encoding for 'Identity-V'
> at org.pdfbox.encoding.EncodingManager.
> getEncoding(EncodingManager.java:83)
> at org.pdfbox.pdmodel.font.PDFont.
> getEncoding(PDFont.java:627)
> at org.pdfbox.pdmodel.font.PDFont.
> encode(PDFont.java:476)
> at org.pdfbox.util.PDFStreamEngine.
> showString(PDFStreamEngine.java:332)
> at org.pdfbox.util.operator.ShowText.
> process(ShowText.java:66)
> at org.pdfbox.util.PDFStreamEngine.
> processOperator(PDFStreamEngine.java:494)
> at org.pdfbox.util.PDFStreamEngine.
> processSubStream(PDFStreamEngine.java:207)
> at org.pdfbox.util.PDFStreamEngine.
> processStream(PDFStreamEngine.java:160)
> at org.pdfbox.util.PDFTextStripper.
> processPage(PDFTextStripper.java:355)
> at org.pdfbox.util.PDFTextStripper.
> processPages(PDFTextStripper.java:268)
> at org.pdfbox.util.PDFTextStripper.
> writeText(PDFTextStripper.java:220)
> In fact the cause of this exception is that the
> CMapParser does not recognize the begincidchar and
> begincidrange operators (in the case of the 90ms-RKSJ-
> H) encoding and usecmap operator in the case of
> Identity-V encoding.
> The cmap files for these encodings are not properly
> parsed and the corresponding Cmap objects do not
> contain neither one nor two byte mappings, further the
> lookup() method returns null.
> I'll attach two samples for the 90ms-RKSJ-H encoding
> and one for the Identity-V encoding.
> I'll attach cmap reference also.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168711
> 5014.CIDFont_Spec.rar (application/octet-stream), 240282 bytes
> Reference, containing CMAP description
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168709
> ken1.pdf (application/pdf), 33713 bytes
> The Identity-V sample
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168708
> tp0404-2a.pdf (application/pdf), 11434 bytes
> The second 90ms-RKSJ-H sample
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1438028&file_id=168705
> nan_youkou.pdf (application/pdf), 7663 bytes
> The first 90ms-RKSJ-H sample
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira