[ https://issues.apache.org/jira/browse/PDFBOX-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Constantine Dokolas updated PDFBOX-6004:
----------------------------------------
Description:
I've encountered a PDF with a font named "SymbolMT" which defines its encoding
as {{SymbolSetEncoding}}. Using the debugger app, I get a warning
({{Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding}}) and
multiple {{No Unicode mapping ...}} warnings.
I couldn't find official documentation for this encoding, but the {{pdf.js}}
project implements it
[here|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207],
and that implementation looks correct at first glance.
Perhaps it's possible to support this encoding?
Notes:
* I haven't tested text extraction with PDFBox to see which code points are
generated, but Adobe Acrobat converts those codes to the box character (the
"unknown" character?).
* The pdfdebugger app font viewer says: "Encoding: BuiltInEncoding / built in
(TTF)"
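
To illustrate the kind of support being asked for, here is a minimal, language-neutral sketch (written in Python, not PDFBox code). It assumes, without confirmation from any official documentation, that {{SymbolSetEncoding}} can be treated as an alias of the standard Symbol font encoding. The table holds only a handful of entries, and every name in it ({{ALIASES}}, {{resolve_encoding}}, {{to_unicode}}) is hypothetical.

```python
# Hypothetical sketch: none of these names come from PDFBox or pdf.js.
# Partial code -> glyph-name table for the standard Symbol encoding
# (only a few illustrative entries, not the full 256-slot table).
SYMBOL_CODES = {
    0x20: "space",
    0x21: "exclam",
    0x22: "universal",
    0x24: "existential",
    0x61: "alpha",
    0x62: "beta",
}

# Glyph name -> Unicode character for the entries above.
GLYPH_TO_UNICODE = {
    "space": "\u0020",
    "exclam": "\u0021",
    "universal": "\u2200",
    "existential": "\u2203",
    "alpha": "\u03b1",
    "beta": "\u03b2",
}

ENCODINGS = {"SymbolEncoding": SYMBOL_CODES}

# ASSUMPTION: SymbolSetEncoding is handled as an alias of the Symbol encoding.
ALIASES = {"SymbolSetEncoding": "SymbolEncoding"}


def resolve_encoding(name):
    """Return the code table for an encoding name, or None if it is unknown."""
    canonical = ALIASES.get(name, name)
    return ENCODINGS.get(canonical)


def to_unicode(encoding_name, code):
    """Map a character code to Unicode, or None when no mapping exists."""
    table = resolve_encoding(encoding_name)
    if table is None:
        return None  # corresponds to the "Unknown encoding" warning
    glyph = table.get(code)
    return GLYPH_TO_UNICODE.get(glyph)  # None mirrors "No Unicode mapping"
```

With such an alias in place, {{to_unicode("SymbolSetEncoding", 0x61)}} yields the Greek small letter alpha, while an unrecognized encoding name still returns nothing instead of raising.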
was:
I've encountered a PDF with a font named "SymbolMT" which defines its encoding
as `SymbolSetEncoding`. Using the debugger app, I get a warning (`Warning
[PDSimpleFont] Unknown encoding: SymbolSetEncoding`) and multiple `No Unicode
mapping ...` warnings.
I couldn't find official documentation for this encoding, but the `pdf.js`
project has support for this encoding implemented
[here|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207]
and it looks correct at first glance.
Perhaps it's possible to support this encoding?
Note: I've not tested text extraction with PDFBox to see what codepoints are
generated, but Adobe Acrobat converts those codes to the box char (the
"unknown" char?)
> Support "SymbolSetEncoding" for fonts
> -------------------------------------
>
> Key: PDFBOX-6004
> URL: https://issues.apache.org/jira/browse/PDFBOX-6004
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.33
> Reporter: Constantine Dokolas
> Priority: Minor
>