[ 
https://issues.apache.org/jira/browse/PDFBOX-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090222#comment-18090222
 ] 

Maruan Sahyoun commented on PDFBOX-4951:
----------------------------------------

I have been following the discussion of the PR and some of the changes. I'd 
support adding this, as I see its benefit for handling combining letters, 
which was the original intent, and also for handling foreign scripts.

Looks like the groundwork for a clean module split is already 90% complete 
because {{PDAbstractContentStream}}, {{PDAcroForm}}, and 
{{AppearanceGeneratorHelper}} only interact with the interfaces rather than 
concrete implementations.

However, placing {{GlyphLayoutProcessorAwt}} directly into the core module 
introduces a hard dependency on {{java.desktop}} (via {{java.awt.*}}). For 
users building minimal runtimes via {{jlink}}, running in restricted 
headless server environments, or working on Android variants, keeping core free 
of desktop dependencies is highly beneficial.

Additionally, as discussed, there is a strong interest in potentially 
supporting alternative text-shaping engines (like *Apache FOP*'s complex 
script layout) down the line. Keeping the SPI decoupled from the implementation 
allows us to easily introduce a {{pdfbox-layout-fop}} submodule later on 
without altering the core engine.

*Proposed Adjustment*

Since the SPI pattern is already beautifully established here, could we split 
the AWT-specific implementation into its own submodule?

Keep in {{pdfbox}} (core):

- {{GlyphLayoutProcessorInterface}}
- {{ContentStreamForGlyphLayoutInterface}}
- {{GlyphsAndPositions}}
- The wiring hooks in {{PDAbstractContentStream}} / {{PDAcroForm}}

Move to a new {{pdfbox-layout-awt}} submodule:

- {{GlyphLayoutProcessorAwt}}
- {{GlyphLayoutFontLoaderAwt}}

This keeps the core module lightweight and headless-friendly, while allowing 
users who need advanced shaping to simply pull in the {{pdfbox-layout-awt}} 
dependency.
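
To illustrate how core could stay free of {{java.desktop}} and still pick up 
an optional implementation, here is a minimal sketch based on 
{{java.util.ServiceLoader}}. This is only one possible wiring and not what the 
PR does: {{LayoutSpiSketch}}, {{LayoutProcessor}}, {{PassThroughProcessor}} and 
{{resolve}} are hypothetical names invented for this example and do not 
reflect the actual signatures of {{GlyphLayoutProcessorInterface}}.

```java
import java.util.ServiceLoader;

public class LayoutSpiSketch {

    /** Hypothetical stand-in for the SPI interface that would stay in core. */
    public interface LayoutProcessor {
        String name();

        /** Returns the text as it should be handed to the content stream. */
        String layout(String text);
    }

    /** Hypothetical core fallback: no shaping, no desktop classes involved. */
    public static final class PassThroughProcessor implements LayoutProcessor {
        @Override
        public String name() {
            return "pass-through";
        }

        @Override
        public String layout(String text) {
            return text;
        }
    }

    /** Picks the first discovered provider, else the core fallback. */
    public static LayoutProcessor resolve(Iterable<LayoutProcessor> discovered) {
        for (LayoutProcessor processor : discovered) {
            return processor;
        }
        return new PassThroughProcessor();
    }

    /**
     * Looks up providers registered by optional modules on the classpath
     * (for example an assumed AWT-based one in a separate submodule).
     */
    public static LayoutProcessor resolve() {
        return resolve(ServiceLoader.load(LayoutProcessor.class));
    }

    public static void main(String[] args) {
        // Without any registered provider this falls back to the core default.
        System.out.println(resolve().name());
    }
}
```

With a layout like this, an optional submodule would only need to ship an 
implementation plus a provider registration, and core would never reference 
{{java.awt.*}} at compile time.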

Other than that, +1 from my side.

> Sequences of DIN SPEC 91379 with combining letters are rendered incorrectly
> ---------------------------------------------------------------------------
>
>                 Key: PDFBOX-4951
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4951
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.21
>            Reporter: Volker Kunert
>            Priority: Major
>         Attachments: DIN_SPEC_91379_Sequences-aa.pdf, 
> DIN_SPEC_91379_Sequences-ab.pdf, DIN_SPEC_91379_Sequences-ac.pdf, 
> DIN_SPEC_91379_Sequences.txt, DefaultScriptProcessor.java, DejaVuSans.ttf, 
> DoGlyphLayoutBidi.pdf, DoGlyphLayoutDinSpec91379.pdf, 
> DoGlyphLayoutDinSpec91379Form.pdf, DoGlyphPositionBengali.pdf, 
> ExamplePdfboxFopPos-By-Tilman.pdf, ExamplePdfboxFopPos.java, 
> ExamplePdfboxFopPos.pdf, ExamplePdfboxFopPosForm.java, 
> ExamplePdfboxFopPosForm.pdf, FiraCode-Regular.ttf, 
> FontForge-Lohit-Bengali.png, TestPdfbox.java, TestPdfboxFop2.java, 
> TestPdfboxFop2.pdf, TestPdfboxJava2D.java, TestPdfboxJava2D.pdf, bidi-1.png, 
> bidi-2.png, bidi.png, example-PDFBOX-3147-NotoSansThaiLooped-Regular.png, 
> image-2026-05-23-16-16-53-442.png, image-2026-05-23-16-17-28-172.png, 
> image-2026-05-26-16-49-45-529.png, ligatures-kerning.png, 
> patch-2020-10-02.txt, pdfbox.patch, pdfbox.pdf, screenshot-1.png
>
>
> Accented letters composed of a Unicode base letter and a combining accent are 
> rendered incorrectly. E.g. with 0041 030B LATIN CAPITAL LETTER A WITH COMBINING 
> DOUBLE ACUTE ACCENT the accent appears at the right-hand side of the letter 
> A, not above the letter A.
> The position is wrong for most of the sequences defined in the following spec:
> DIN SPEC 91379: Characters in Unicode for the electronic processing of names 
> and data exchange in Europe; with digital attachment
>  [https://www.xoev.de/downloads-2316#StringLatin]
>  [https://www.din.de/de/wdc-beuth:din21:301228458]
>  
> The correct rendering should look like the output of hb-view 2.6.8, see files 
> DIN_SPEC_91379_Sequences*.pdf.
> The output of PDFBox is appended in pdfbox.pdf, which is created by running 
> TestPdfbox.java. The sequences are read from file 
> DIN_SPEC_91379_Sequences.txt.
>  
> Font used for testing: NotoSansMono-Regular.ttf, see 
> [https://www.google.com/get/noto/] 
> download: 
> [https://noto-website-2.storage.googleapis.com/pkgs/NotoSansMono-hinted.zip]
>  See also FOP-2969
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
