[jira] [Commented] (PDFBOX-4951) Sequences with combining letters are rendered incorrectly

Volker Kunert (Jira) Mon, 12 Oct 2020 13:16:52 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212647#comment-17212647
 ]


Volker Kunert commented on PDFBOX-4951:
---------------------------------------

1 The font must be loaded twice - yes, we have to load it twice because we use
  the positioning features via FOP's MultiByteFont. We can't store a reference
  to the MultiByteFont in PDType0Font, because the font is stored in a
  COSDictionary and recreated from it, losing any extra attributes in the
  process.

2 The widths of A̋ and Ž̧ are the same as those of A and Z for me; there is no
  new code involved.
      PDType0Font font = PDType0Font.load(pdDocument,
              new FileInputStream(fontFile), false);
      System.out.printf("%f %f%n", font.getStringWidth("A"),
              font.getStringWidth("A̋"));
      System.out.printf("%f %f%n", font.getStringWidth("Z"),
              font.getStringWidth("Ž̧"));
  Output:
      639,000000 639,000000
      572,000000 572,000000
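
As an aside, here is a minimal JDK-only sketch (no PDFBox involved) of what such a test string contains. The class name CombiningSequenceDump and its helper methods are made up for illustration. It lists the code points of a sequence and counts the combining marks that remain after NFC normalization. The equal widths above are consistent with the combining mark adding no advance width of its own, but this sketch does not measure widths.

```java
import java.text.Normalizer;

public class CombiningSequenceDump {

    /** Returns true for non-spacing, enclosing or spacing combining marks. */
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    /** Counts the combining marks left in the text after NFC normalization. */
    static long marksAfterNfc(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return nfc.codePoints()
                .filter(CombiningSequenceDump::isCombiningMark)
                .count();
    }

    public static void main(String[] args) {
        // A + COMBINING DOUBLE ACUTE ACCENT (from the issue description),
        // and Z + COMBINING CARON (chosen here only for contrast).
        String[] samples = { "A\u030B", "Z\u030C" };
        for (String s : samples) {
            StringBuilder sb = new StringBuilder();
            s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
            System.out.println(sb.toString().trim()
                    + " -> marks after NFC: " + marksAfterNfc(s));
        }
    }
}
```

Unicode has no precomposed character for A with double acute, so the mark in 0041 030B survives NFC and has to be placed by glyph positioning. Z with caron, by contrast, composes to the single code point U+017D.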

3 Which variant of Z plus accent is not OK? They look good to me.

4 The bug in FOP (FOP-2969) means, for example, that the accent is placed not
  above the current letter but above the following letter.

5 Bengali processing and FOP positioning both reorder the glyphs, so they
  can't work together at the moment.
  Integrating them, either on the basis of the current implementation or based
  on FOP, seems possible but needs a programmer who knows the Bengali language
  and script.

6 IMHO the user should be required to enable FOP positioning explicitly, so
  that it does not break other algorithms. Possibly it could be enabled for
  script latn.
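
A rough sketch of what such an explicit opt-in could look like, using only the JDK. Everything here is hypothetical: the class PositioningOptIn and its methods are not part of PDFBox or FOP, and mapping the OpenType script tag latn to Character.UnicodeScript.LATIN is an assumption made for illustration. Positioning is off unless the caller enables a script, and it is applied only to text that consists entirely of enabled scripts plus script-neutral characters such as combining marks.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

public class PositioningOptIn {

    // Hypothetical setting: scripts for which the caller has explicitly
    // enabled positioning. Empty by default, so nothing changes unless asked.
    private final Set<UnicodeScript> enabledScripts =
            EnumSet.noneOf(UnicodeScript.class);

    /** Explicitly enables positioning for one script. */
    public PositioningOptIn enableFor(UnicodeScript script) {
        enabledScripts.add(script);
        return this;
    }

    /**
     * Returns true only if positioning was enabled and every code point of
     * the text belongs to an enabled script or is script-neutral
     * (COMMON, e.g. digits; INHERITED, e.g. combining marks).
     */
    public boolean shouldPosition(String text) {
        if (enabledScripts.isEmpty() || text.isEmpty()) {
            return false;
        }
        return text.codePoints().allMatch(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            return script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED
                    || enabledScripts.contains(script);
        });
    }
}
```

With this sketch, new PositioningOptIn().enableFor(UnicodeScript.LATIN) would accept "A\u030B" but reject Bengali text, which would stay with the existing Bengali processing.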

7 I am preparing some small corrections to my code.



> Sequences with combining letters are rendered incorrectly
> ---------------------------------------------------------
>
>                 Key: PDFBOX-4951
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4951
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.21
>            Reporter: Volker Kunert
>            Priority: Major
>         Attachments: DIN_SPEC_91379_Sequences-aa.pdf, 
> DIN_SPEC_91379_Sequences-ab.pdf, DIN_SPEC_91379_Sequences-ac.pdf, 
> DIN_SPEC_91379_Sequences.txt, DefaultScriptProcessor.java, 
> ExamplePdfboxFopPos.java, ExamplePdfboxFopPos.pdf, 
> ExamplePdfboxFopPosForm.java, ExamplePdfboxFopPosForm.pdf, TestPdfbox.java, 
> TestPdfboxFop2.java, TestPdfboxFop2.pdf, TestPdfboxJava2D.java, 
> TestPdfboxJava2D.pdf, patch-2020-10-02.txt, pdfbox.pdf, screenshot-1.png
>
>
> Accented letters composed of a Unicode base letter and a combining accent are
> rendered incorrectly. E.g. with 0041 030B (LATIN CAPITAL LETTER A with
> COMBINING DOUBLE ACUTE ACCENT) the accent appears at the right-hand side of
> the letter A, not above the letter A.
> The position is wrong for most of the sequences defined in the following spec:
> DIN SPEC 91379: Characters in Unicode for the electronic processing of names
> and data exchange in Europe; with digital attachment
>  [https://www.xoev.de/downloads-2316#StringLatin]
>  [https://www.din.de/de/wdc-beuth:din21:301228458]
>  
> The correct rendering should look like the output of hb-view 2.6.8, see files 
> DIN_SPEC_91379_Sequences*.pdf.
> The output of PDFBox is appended in pdfbox.pdf, which is created by running 
> TestPdfbox.java. The sequences are read from file 
> DIN_SPEC_91379_Sequences.txt.
>  
> Font used for testing: NotoSansMono-Regular.ttf, see 
> [https://www.google.com/get/noto/] 
> download: 
> [https://noto-website-2.storage.googleapis.com/pkgs/NotoSansMono-hinted.zip]
>  See also FOP-2969
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

