[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679513#comment-17679513
]
Vladimir Plizga commented on PDFBOX-4189:
-----------------------------------------
The current implementation is capable of parsing virtually any GSUB table.
However, in practice it only allows reading GSUB data from fonts for the Bengali
language, because Bengali is the only language currently
[listed|https://github.com/apache/pdfbox/blob/d90928e224cfe99e99a806a10045c9372549a7e7/fontbox/src/main/java/org/apache/fontbox/ttf/model/Language.java#L36]
in the *{{org.apache.fontbox.ttf.model.Language}}* enumeration. From a FontBox
user's point of view this is a critical limitation, because inspecting the
various OTF tables is one of the main use cases of the library.
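To illustrate the gating, here is a simplified, self-contained Java model. It is not FontBox code: the enum only mimics the idea of {{Language}} listing OpenType script tags per entry (the {{bng2}}/{{beng}} tags for Bengali are an assumption about the real entry), and {{selectLanguage}} is a hypothetical helper standing in for the GSUB reader's lookup. A font whose GSUB ScriptList has no tag known to the enum ends up with nothing to read.

```java
import java.util.List;
import java.util.Set;

/**
 * Simplified model (not the FontBox source) of how GSUB reading is gated by a
 * language enumeration: a font's script tags are only used if some entry lists them.
 */
public class GsubLanguageGate {

    // Mimics the idea of org.apache.fontbox.ttf.model.Language: each entry lists
    // the OpenType script tags it covers. Bengali is the only "real" entry here.
    enum Language {
        BENGALI("bng2", "beng"),
        UNSPECIFIED();

        private final List<String> scriptTags;

        Language(String... scriptTags) {
            this.scriptTags = List.of(scriptTags);
        }

        List<String> getScriptTags() {
            return scriptTags;
        }
    }

    // Hypothetical helper: returns the first entry whose script tags appear in the
    // font's GSUB ScriptList, or UNSPECIFIED if none match (no GSUB data exposed).
    static Language selectLanguage(Set<String> fontScriptTags) {
        for (Language language : Language.values()) {
            for (String tag : language.getScriptTags()) {
                if (fontScriptTags.contains(tag)) {
                    return language;
                }
            }
        }
        return Language.UNSPECIFIED;
    }

    public static void main(String[] args) {
        // A Bengali font is recognised...
        System.out.println(selectLanguage(Set.of("latn", "bng2"))); // BENGALI
        // ...but a Devanagari font yields nothing, even though its GSUB table is parseable.
        System.out.println(selectLanguage(Set.of("latn", "dev2"))); // UNSPECIFIED
    }
}
```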
The JavaDoc comment on the *{{org.apache.fontbox.ttf.model.Language}}*
enumeration
[states|https://github.com/apache/pdfbox/blob/d90928e224cfe99e99a806a10045c9372549a7e7/fontbox/src/main/java/org/apache/fontbox/ttf/model/Language.java#L25]
that:
?? In order to support a new language, you need to add it here and then
implement the {{GsubWorker}} for the given language and return the same from
the {{GsubWorkerFactory.getGsubWorker(org.apache.fontbox.ttf.CmapLookup,
GsubData)}}??
These instructions make sense for complete support of a new language, i.e.
including the ability to substitute glyphs according to the language's rules
(as the implementation in this issue does). However, for pure reading purposes
this is not necessary. Unfortunately, simply adding a new enumeration entry
doesn't work, because
*{{GsubWorkerFactory}}*
[throws|https://github.com/apache/pdfbox/blob/d90928e224cfe99e99a806a10045c9372549a7e7/fontbox/src/main/java/org/apache/fontbox/ttf/gsub/GsubWorkerFactory.java#L40]
an exception for any language that is not explicitly supported (i.e. any
language except Bengali).
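A minimal model of this bottleneck: the enum has gained a hypothetical {{DEVANAGARI}} entry, yet the factory still rejects it. {{GsubWorker}}, {{applyTransforms}} and the exception type and message below are simplified stand-ins chosen for illustration, not copied from FontBox.

```java
import java.util.List;

/**
 * Simplified model (not the FontBox source) of why adding an enum entry alone
 * is not enough: the factory only builds a worker for Bengali.
 */
public class WorkerFactoryGate {

    // Hypothetical enum after "just adding" a DEVANAGARI entry next to BENGALI.
    enum Language { BENGALI, DEVANAGARI, UNSPECIFIED }

    // Stand-in for a GSUB worker: maps the original glyph IDs to substituted ones.
    interface GsubWorker {
        List<Integer> applyTransforms(List<Integer> originalGlyphIds);
    }

    // Mimics the described factory behaviour: only explicitly supported languages
    // get a worker; every other language ends in an exception.
    static GsubWorker getGsubWorker(Language language) {
        switch (language) {
            case BENGALI:
                // Placeholder: a real Bengali worker would apply the font's substitutions.
                return originalGlyphIds -> originalGlyphIds;
            default:
                throw new UnsupportedOperationException(
                        "Language " + language + " is not supported");
        }
    }

    public static void main(String[] args) {
        System.out.println(getGsubWorker(Language.BENGALI).applyTransforms(List.of(10, 20)));
        try {
            // Even with a DEVANAGARI enum entry, no worker can be created for it.
            getGsubWorker(Language.DEVANAGARI);
        } catch (UnsupportedOperationException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```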
In order to allow reading GSUB table data for any script tag without
implementing a fully fledged {{GsubWorker}}, I'm proposing a (mostly)
additive change to GSUB handling: [https://github.com/apache/pdfbox/pull/153]
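One way to picture the general idea of separating reading from substitution, as a hypothetical sketch only (this is not the code from the pull request): reading is keyed by the raw OpenType script tag, and substitution falls back to a pass-through worker when no dedicated worker exists. {{readFeatures}}, {{workerFor}}, {{PASS_THROUGH}} and the sample tag data are all invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of decoupling GSUB reading from glyph substitution.
 * Not the pull request's code; it only illustrates the general idea.
 */
public class ReadAnyScript {

    // Stand-in for a GSUB worker: maps the original glyph IDs to substituted ones.
    interface GsubWorker {
        List<Integer> applyTransforms(List<Integer> originalGlyphIds);
    }

    // Hypothetical fallback for scripts without a dedicated worker: no substitution.
    static final GsubWorker PASS_THROUGH = originalGlyphIds -> originalGlyphIds;

    // Dedicated workers keyed by script tag; the lambda is a placeholder for
    // real Bengali substitution logic.
    static final Map<String, GsubWorker> WORKERS =
            Map.of("bng2", originalGlyphIds -> originalGlyphIds);

    // Reading path: return the (sorted) feature tags for ANY script tag present,
    // with no dependency on a language enum or on a worker existing.
    static Set<String> readFeatures(Map<String, Set<String>> gsubScripts, String scriptTag) {
        return new TreeSet<>(gsubScripts.getOrDefault(scriptTag, Set.of()));
    }

    // Substitution path: use a dedicated worker if one exists, else fall back.
    static GsubWorker workerFor(String scriptTag) {
        return WORKERS.getOrDefault(scriptTag, PASS_THROUGH);
    }

    public static void main(String[] args) {
        // Hypothetical parsed GSUB content: script tag -> feature tags.
        Map<String, Set<String>> gsubScripts = Map.of(
                "bng2", Set.of("akhn", "blwf", "pres"),
                "dev2", Set.of("akhn", "half", "rphf"));

        System.out.println(readFeatures(gsubScripts, "dev2")); // [akhn, half, rphf]
        System.out.println(workerFor("dev2") == PASS_THROUGH); // true
    }
}
```

The pass-through fallback is only one option; keeping the factory strict and exposing the parsed data through a separate read-only path would serve the inspection use case just as well.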
A more detailed explanation of the change is in the PR description.
I would be glad to hear your feedback on this proposal.
> Enable PDF creation with Indian languages, by reading and utilizing the GSUB
> table
> ----------------------------------------------------------------------------------
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf,
> BengaliPdfGenerationHelloWorld.java, bengali-example.pdf,
> bengali-example2.pdf, bengali-example3.pdf, bengali-word-lohit-bad.pdf,
> bengali-word-lohit-good.pdf, committed.patch, pdf-output.png, screenshot.png
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph
> substitution. The GSUB table has been read and used effectively to replace
> some compound words with their respective Glyphs. All tests are passing. I
> have tested this for the Bengali font. Please review these changes and let me
> know if it makes sense to incorporate these.