[jira] [Comment Edited] (PDFBOX-4213) UNICODE fonts UTF8

Jeyan (Jira) Thu, 17 Mar 2022 13:05:06 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508390#comment-17508390
 ]


Jeyan edited comment on PDFBOX-4213 at 3/17/22, 8:04 PM:
---------------------------------------------------------

[~tilman] Thanks for your time.

OK, I am working on adding a GsubWorker implementation for Tamil. I am using 
3.0.0-RC1 because [~paawak]'s code is not available in 2.0.25.

In addition, I tried HarfBuzz, as suggested by an SO user [SO 
Link |https://stackoverflow.com/a/71325241/1341062] . It appears to work: if I 
provide a font and Unicode text as input to hb-shape, it returns glyph IDs in 
the expected order, with substitutions and reordering applied. As expected, it 
does only the glyph ID shaping part. I am looking into whether it can be used 
with PDFBox for complex glyph ID substitution and ordering.
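
To make the hb-shape result usable from Java, its text output would have to be parsed. The sketch below assumes hb-shape was run with --no-glyph-names and prints its usual bracketed list, where each item has the form gid=cluster+advance or gid=cluster@dx,dy+advance. The class and field names are hypothetical, and the sample string in the usage note is made up rather than real output for any font.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses one line of hb-shape output (numeric glyph IDs). */
final class HbShapeOutputParser {

    /** One shaped glyph; offsets and advance are in font units. */
    static final class ShapedGlyph {
        final int glyphId;
        final int cluster;
        final int xOffset;
        final int yOffset;
        final int xAdvance;

        ShapedGlyph(int glyphId, int cluster, int xOffset, int yOffset, int xAdvance) {
            this.glyphId = glyphId;
            this.cluster = cluster;
            this.xOffset = xOffset;
            this.yOffset = yOffset;
            this.xAdvance = xAdvance;
        }
    }

    static List<ShapedGlyph> parse(String line) {
        String s = line.trim();
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            throw new IllegalArgumentException("Not a bracketed glyph list: " + line);
        }
        List<ShapedGlyph> glyphs = new ArrayList<>();
        String body = s.substring(1, s.length() - 1);
        if (body.isEmpty()) {
            return glyphs;
        }
        for (String item : body.split("\\|")) {
            int eq = item.indexOf('=');
            int plus = item.indexOf('+', eq + 1);
            if (eq < 0 || plus < 0) {
                throw new IllegalArgumentException("Unexpected item: " + item);
            }
            int at = item.indexOf('@', eq + 1);
            boolean hasOffset = at >= 0 && at < plus;
            int glyphId = Integer.parseInt(item.substring(0, eq));
            int cluster = Integer.parseInt(item.substring(eq + 1, hasOffset ? at : plus));
            int dx = 0;
            int dy = 0;
            if (hasOffset) {
                // Offsets are written as "@dx,dy"; negative values carry a minus sign.
                String[] offsets = item.substring(at + 1, plus).split(",");
                dx = Integer.parseInt(offsets[0]);
                dy = Integer.parseInt(offsets[1]);
            }
            // Keep only the horizontal advance if a ",y" part follows it.
            String advance = item.substring(plus + 1);
            int comma = advance.indexOf(',');
            int xAdvance = Integer.parseInt(comma < 0 ? advance : advance.substring(0, comma));
            glyphs.add(new ShapedGlyph(glyphId, cluster, dx, dy, xAdvance));
        }
        return glyphs;
    }
}
```

For example, HbShapeOutputParser.parse("[45=0+600|12=1@10,-5+550]") yields two glyphs, the second with an offset of (10, -5). The cluster value points back to the position in the input text, which is what allows a reordered glyph to be traced to its source character.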

Suppose the sequence is as follows:
 # Parse the TTF/OTF font file - PDFBox 
 # Receive the input text (showText) - PDFBox 
 # Convert the input text into Unicode code points - PDFBox 
 # Get the GID/CID (cmap lookup, CID cmap, GSUB, GPOS) from the font file/parser 
object for the code points, in the required order - PDFBox 
 # Get the glyph from the font file using the GID/CID - PDFBox 
 # Embed the glyph subset - PDFBox 
 # Generate the byte stream for the writer to create PDF files - PDFBox
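
As an illustration of the third step, and of why the fourth needs "required ordering", here is a minimal sketch that turns the word from the issue report into code points. The class and method names are hypothetical. The Tamil vowel signs U+0BC6 to U+0BC8 are stored after their consonant but drawn to its left, so emitting one glyph per code point in storage order cannot give the right result.

```java
/** Hypothetical demo: logical (storage) order of a Tamil word. */
final class LogicalOrderDemo {

    // The word from the issue report, written with escapes to avoid encoding trouble.
    static final String WORD = "\u0BAA\u0BC7\u0BB8\u0BCD\u0BAA\u0BC1\u0B95\u0BCD";

    static int[] toCodePoints(String text) {
        return text.codePoints().toArray();
    }

    /** True for the Tamil vowel signs E, EE and AI, which render left of the consonant. */
    static boolean isPreBaseVowelSign(int codePoint) {
        return codePoint >= 0x0BC6 && codePoint <= 0x0BC8;
    }

    public static void main(String[] args) {
        int[] codePoints = toCodePoints(WORD);
        for (int i = 0; i < codePoints.length; i++) {
            String note = isPreBaseVowelSign(codePoints[i])
                    ? "  (drawn before the preceding consonant)"
                    : "";
            System.out.printf("%d: U+%04X%s%n", i, codePoints[i], note);
        }
    }
}
```

The second code point, U+0BC7, is flagged: it follows U+0BAA in the string but has to appear before it on the page.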

Can I assume that the fourth item in the list above can be replaced with the 
HarfBuzz shaping engine? HarfBuzz would take care of arranging the glyphs 
correctly, provided we send it the Unicode code points of the input text and 
the font file. PDFBox would then receive the ordered glyphs and continue with 
getting the glyphs from the font file, subsetting, and writing the byte stream.
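
One way to picture this replacement is as a small interface boundary around the fourth step. The sketch below is only an illustration: GlyphShaper, CmapOnlyShaper and ShapingPipeline are hypothetical names, not PDFBox or HarfBuzz API, and the byte encoding assumes an Identity-H style setup in which each glyph ID is written as two big-endian bytes.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical boundary for step 4: code points in, ordered glyph IDs out. */
interface GlyphShaper {
    int[] shape(int[] codePoints);
}

/** Baseline: one glyph per code point in logical order, no GSUB, no reordering. */
final class CmapOnlyShaper implements GlyphShaper {
    private final Map<Integer, Integer> cmap;

    CmapOnlyShaper(Map<Integer, Integer> cmap) {
        this.cmap = cmap;
    }

    @Override
    public int[] shape(int[] codePoints) {
        int[] glyphIds = new int[codePoints.length];
        for (int i = 0; i < codePoints.length; i++) {
            // Glyph 0 (.notdef) stands in for code points the font does not map.
            glyphIds[i] = cmap.getOrDefault(codePoints[i], 0);
        }
        return glyphIds;
    }
}

/** Steps 3 to 7 around the shaper, reduced to the data that flows between them. */
final class ShapingPipeline {

    static int[] glyphIds(String text, GlyphShaper shaper) {
        return shaper.shape(text.codePoints().toArray());
    }

    /** Glyphs a subset would need: everything the shaper returned, plus .notdef. */
    static Set<Integer> usedGlyphs(int[] glyphIds) {
        Set<Integer> used = new TreeSet<>();
        used.add(0);
        for (int glyphId : glyphIds) {
            used.add(glyphId);
        }
        return used;
    }

    /** Two big-endian bytes per glyph ID (assumption: Identity-H style encoding). */
    static byte[] encodeTwoByte(int[] glyphIds) {
        byte[] out = new byte[glyphIds.length * 2];
        for (int i = 0; i < glyphIds.length; i++) {
            out[2 * i] = (byte) (glyphIds[i] >> 8);
            out[2 * i + 1] = (byte) glyphIds[i];
        }
        return out;
    }
}
```

An external shaper would be a second GlyphShaper implementation that returns substituted and reordered glyph IDs. The point of the sketch is that subsetting has to be driven by the shaper's output, because substituted glyphs may have no code point of their own.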

What would be the challenges with this? I can think of the following:
 # Kerning, positioning?
 # Subsetting 
 # Integration with a C++ codebase 
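
On the positioning point, my understanding is that a PDF content stream can shift glyphs through the numbers in a TJ array, which are given in thousandths of a text space unit and subtracted from the current position. The sketch below converts shaper advances into such numbers. It is a rough illustration with hypothetical names, it assumes that the natural glyph widths and the font's unitsPerEm are known, and it covers horizontal advances only, not the x/y offsets that GPOS mark positioning would need.

```java
/** Hypothetical helper: turns shaped advances into TJ-style adjustments. */
final class TjAdjustments {

    /**
     * For each glyph, returns the number to place after it in a TJ array.
     * All widths and advances are in font units.
     */
    static float[] compute(int[] naturalWidths, int[] shapedAdvances, int unitsPerEm) {
        if (naturalWidths.length != shapedAdvances.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        if (unitsPerEm <= 0) {
            throw new IllegalArgumentException("unitsPerEm must be positive");
        }
        float[] adjustments = new float[naturalWidths.length];
        for (int i = 0; i < naturalWidths.length; i++) {
            // A positive value moves the next glyph closer, because the viewer
            // subtracts it from the glyph's own width.
            adjustments[i] = (naturalWidths[i] - shapedAdvances[i]) * 1000f / unitsPerEm;
        }
        return adjustments;
    }
}
```

With a 1000-unit em, a glyph whose natural width is 600 and whose shaped advance is 550 gets an adjustment of 50. Glyphs that the shaper leaves at their natural width get 0 and need no entry.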

Can you please comment on this when you get time?


was (Author: JIRAUSER284958):
[~tilman] Thanks for your time.

OK, I will have to add a GsubWorker implementation for Tamil. I have to start 
with 3.0.0-RC1 because [~paawak]'s code is not available in 2.0.25.

In addition, I tried HarfBuzz, as suggested by an SO user [SO 
Link |https://stackoverflow.com/a/71325241/1341062] . It appears to work: if I 
provide a font and Unicode text as input to hb-shape, it returns glyph IDs in 
the expected order, with substitutions and reordering applied. As expected, it 
does only the glyph ID shaping part. I am looking into whether it can be used 
with PDFBox for complex glyph ID substitution and ordering.

Suppose the sequence is as follows:
 # Parse the TTF/OTF font file - PDFBox 
 # Receive the input text (showText) - PDFBox 
 # Convert the input text into Unicode code points - PDFBox 
 # Get the GID/CID (cmap lookup, CID cmap, GSUB, GPOS) from the font file/parser 
object for the code points, in the required order - PDFBox 
 # Get the glyph from the font file using the GID/CID - PDFBox 
 # Embed the glyph subset - PDFBox 
 # Generate the byte stream for the writer to create PDF files - PDFBox

Can I assume that the fourth item in the list above can be replaced with the 
HarfBuzz shaping engine? HarfBuzz would take care of arranging the glyphs 
correctly, provided we send it the Unicode code points of the input text and 
the font file. PDFBox would then receive the ordered glyphs and continue with 
getting the glyphs from the font file, subsetting, and writing the byte stream.

What would be the challenges with this? I can think of the following:
 # Kerning, positioning?
 # Subsetting 
 # Integration with a C++ codebase 

Can you please comment on this when you get time?

> UNICODE fonts UTF8 
> -------------------
>
>                 Key: PDFBOX-4213
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4213
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, PDModel
>    Affects Versions: 2.0.7
>            Reporter: tritmain
>            Priority: Major
>         Attachments: pdf_utf_iss.png
>
>
> When we use a font with UTF-8 support in PDFBox with Tamil fonts, with 
> String testSting="  பேஸ்புக் " in the Java application, the output in the PDF 
> looks like the attached image pdf_utf_iss.png, which is wrong.
> Some other fonts work perfectly: "ஆஈஊஐஏளறனடணச"
> Please help us to resolve the issue.
>  
>  
> ----------
> File tamilFontFilePattinatharGist = new File(this.getServletContext().getRealPath("/fonts/GIST-TAM-OTPattinathar_N_Ship.ttf"));
> PDType0Font fontPattinatharGist = PDType0Font.load(document, tamilFontFilePattinatharGist); // Not ok with பேஸ்புக்
> contentStream.setFont(fontPattinatharGist, 15);
> String testSting = "ஆஈ பேஸ்புக் ஆஈஊஐஏளறனடணசஞ‍இஉஎகபமதநயழரலஙவொஓஔ\\r\\nஆஈஊஐஏளறனடணசஞ‍இஉஎகபமதநயழரலஙவொஓஔ";
> contentStream.showText(testSting);
> System.out.println(testSting);
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

