[jira] [Commented] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table

Vladimir Plizga (Jira) Sat, 21 Jan 2023 21:49:04 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679513#comment-17679513
 ]


Vladimir Plizga commented on PDFBOX-4189:
-----------------------------------------

The current implementation is capable of parsing virtually any GSUB table. 
However, in practice it can read GSUB data only from fonts for the Bengali 
language, because Bengali is the only language currently 
[listed|https://github.com/apache/pdfbox/blob/d90928e224cfe99e99a806a10045c9372549a7e7/fontbox/src/main/java/org/apache/fontbox/ttf/model/Language.java#L36]
 in the *{{org.apache.fontbox.ttf.model.Language}}* enumeration. From a FontBox 
user's point of view this is a critical limitation, because inspecting 
various OTF tables is one of the main use cases for the library.

The JavaDoc comment on the *{{org.apache.fontbox.ttf.model.Language}}* 
enumeration 
[states|https://github.com/apache/pdfbox/blob/d90928e224cfe99e99a806a10045c9372549a7e7/fontbox/src/main/java/org/apache/fontbox/ttf/model/Language.java#L25]
 that:

?? In order to support a new language, you need to add it here and then 
implement the {{GsubWorker}} for the given language and return the same from 
the {{GsubWorkerFactory.getGsubWorker(org.apache.fontbox.ttf.CmapLookup, 
GsubData)}}??

These instructions make sense for complete support of a new language, i.e. 
including the ability to substitute glyphs according to the language rules 
(as the implementation in this issue does). However, this is not necessary 
when the goal is only to read the table. Unfortunately, simply adding a 
new enumeration entry doesn't work because 
*{{GsubWorkerFactory}}* 
[throws|https://github.com/apache/pdfbox/blob/d90928e224cfe99e99a806a10045c9372549a7e7/fontbox/src/main/java/org/apache/fontbox/ttf/gsub/GsubWorkerFactory.java#L40]
 an exception for any language that is not explicitly supported (i.e. any 
language other than Bengali).
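
To make the limitation concrete, here is a minimal, self-contained sketch of 
the pattern described above. All types in it are simplified stand-ins written 
for illustration, not the actual FontBox classes: the {{DEVANAGARI}} entry, the 
reduced {{getGsubWorker(Language)}} signature and the exception type are 
assumptions.

```java
import java.util.Arrays;
import java.util.List;

class StrictFactorySketch {
    // Hypothetical stand-in for the Language enumeration.
    // DEVANAGARI plays the role of a newly added entry that has no worker yet.
    enum Language { BENGALI, DEVANAGARI }

    // Simplified stand-in for a GSUB worker: maps glyph IDs to glyph IDs.
    interface GsubWorker {
        List<Integer> applyTransforms(List<Integer> glyphIds);
    }

    // Simplified factory: only the explicitly handled language gets a worker.
    static GsubWorker getGsubWorker(Language language) {
        switch (language) {
            case BENGALI:
                // Placeholder for the real Bengali substitution logic.
                return glyphIds -> glyphIds;
            default:
                // Assumed exception type; the point is that the call fails.
                throw new UnsupportedOperationException(
                        "The language " + language + " is not yet supported");
        }
    }

    public static void main(String[] args) {
        System.out.println(getGsubWorker(Language.BENGALI)
                .applyTransforms(Arrays.asList(1, 2, 3)));
        try {
            getGsubWorker(Language.DEVANAGARI);
        } catch (UnsupportedOperationException e) {
            // Adding the enum entry alone is not enough: the factory rejects it.
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

In this sketch, adding an enumeration entry changes nothing for a caller who 
only wants to inspect the table: the factory still fails before any GSUB data 
can be returned.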
 
In order to allow reading GSUB table data for any script tag without 
implementing a fully fledged {{GsubWorker}}, I'm proposing a (mostly) 
additive change to GSUB handling: [https://github.com/apache/pdfbox/pull/153]
A more detailed explanation of the change is in the PR description.
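
As a rough illustration of what "reading without a fully fledged worker" could 
mean, the sketch below shows one possible shape of such a fallback. It is not 
taken from the PR and may differ from it substantially: the types are 
simplified stand-ins, and {{PassThroughGsubWorker}}, the {{DEVANAGARI}} entry 
and the reduced factory signature are hypothetical names chosen for this 
example.

```java
import java.util.Arrays;
import java.util.List;

class FallbackFactorySketch {
    // Hypothetical stand-in for the Language enumeration.
    enum Language { BENGALI, DEVANAGARI }

    // Simplified stand-in for a GSUB worker: maps glyph IDs to glyph IDs.
    interface GsubWorker {
        List<Integer> applyTransforms(List<Integer> glyphIds);
    }

    // Hypothetical fallback worker: performs no substitution at all.
    static final class PassThroughGsubWorker implements GsubWorker {
        @Override
        public List<Integer> applyTransforms(List<Integer> glyphIds) {
            return glyphIds;
        }
    }

    // Languages without dedicated rules get the fallback instead of an exception.
    static GsubWorker getGsubWorker(Language language) {
        switch (language) {
            case BENGALI:
                // Placeholder for the real Bengali substitution logic.
                return glyphIds -> glyphIds;
            default:
                return new PassThroughGsubWorker();
        }
    }

    public static void main(String[] args) {
        List<Integer> glyphs = Arrays.asList(10, 20, 30);
        GsubWorker worker = getGsubWorker(Language.DEVANAGARI);
        // The glyph IDs come back unchanged, and no exception is thrown.
        System.out.println(worker.applyTransforms(glyphs));
    }
}
```

The design idea in this sketch is that a missing language-specific worker 
degrades to "no substitution" instead of blocking access to the parsed GSUB 
data.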
 
I would be glad to see your feedback on this proposal.

> Enable PDF creation with Indian languages, by reading and utilizing the GSUB 
> table
> ----------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4189
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4189
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: FontBox, PDModel
>            Reporter: Palash Ray
>            Priority: Major
>         Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf, 
> BengaliPdfGenerationHelloWorld.java, bengali-example.pdf, 
> bengali-example2.pdf, bengali-example3.pdf, bengali-word-lohit-bad.pdf, 
> bengali-word-lohit-good.pdf, committed.patch, pdf-output.png, screenshot.png
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
