[ https://issues.apache.org/jira/browse/PDFBOX-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950258#comment-17950258 ]

Tilman Hausherr commented on PDFBOX-6004:
-----------------------------------------

We have SymbolEncoding which seems to be the same. Please share the PDF.

> Support "SymbolSetEncoding" for fonts
> -------------------------------------
>
>                 Key: PDFBOX-6004
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6004
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.33
>            Reporter: Constantine Dokolas
>            Priority: Minor
>
> I've encountered a PDF with a font named "SymbolMT" which defines its
> encoding as {{SymbolSetEncoding}}. Using the debugger app, I get a
> warning ({{Warning [PDSimpleFont] Unknown encoding: SymbolSetEncoding}})
> and multiple {{No Unicode mapping ...}} warnings.
> I couldn't find official documentation for this encoding, but the {{pdf.js}}
> project has support for this encoding implemented
> [here|https://github.com/mozilla/pdf.js/blob/6f052312d625224173db36d3e661657a89cf1865/src/core/encodings.js#L207]
> and it looks correct at first glance.
> Perhaps it's possible to support this encoding?
> Notes
> * I've not tested text extraction with PDFBox to see what codepoints are
> generated, but Adobe Acrobat converts those codes to the box char (the
> "unknown" char?)
> * The pdfdebugger app font viewer says: "Encoding: BuiltInEncoding / built
> in (TTF)"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)