Hi Dilip,
I think something like the ArabicLigaturizer class is needed, but then
for Indian languages.
On 15/10/2011 1:16, [email protected] wrote:
Hi Paulo,
What can be done to use iText for Indian languages? As I've mentioned
in my earlier emails, I'm willing to put in time to implement one
Indian language to start with and contribute my discoveries as well as
code to the community.
Any direction in this matter is highly appreciated.
Dilip
*From:* [hidden email]
*Sent:* Thursday, October 13, 2011 2:17 PM
*To:* [hidden email]
*Subject:* Re: ligature implementation for Indian languages /
Devanagari script
I would like to pick up this thread...
Is there any way I could help with implementing Indian languages in
iText? Is there any documentation / code that I can refer to and
attempt to implement one Indian language to start with? I'll be more
than happy to contribute my work to the community.
Dilip
*From:* [hidden email]
*Sent:* Thursday, April 07, 2011 6:40 AM
*To:* [hidden email]
*Subject:* Re: ligature implementation for Indian languages /
Devanagari script
I'm working on a self-funded 'hobby' project. I guess we'll have to
wait for a party with funds who badly needs this done.
Dilip
*From:* [hidden email]
*Sent:* Wednesday, April 06, 2011 4:05 PM
*To:* [hidden email]
*Subject:* Re: ligature implementation for Indian languages /
Devanagari script
Indic ligatures are a lot more complex, not only in the number of possible
combinations but also, and probably more importantly, in that the
ligaturized representation has no corresponding Unicode code point.
This requires a GSUB table to provide the glyph id for the ligature.
iText has no capability to read this table (GPOS would also be nice to
have). The process to implement support for Indic scripts would be:
- have the rules for Indic ligatures
- decode the GSUB table in the font to get the glyph id of the ligature
- add the ligature, as a glyph id, to the output text
None of this is supported in iText for the moment and would take
several weeks to implement if we knew how (we can learn, no big deal)
and if someone was willing to pay for the development.
Paulo
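A minimal Java sketch of the start of the second step above: finding where the GSUB table lives in a TrueType/OpenType file by walking the table directory at the beginning of the file. The names SfntTableFinder, findTable and TableRecord are made up for illustration and are not iText API. The sketch assumes a single-font file (not a .ttc collection) and only locates the table; decoding the lookups inside GSUB is the larger job and is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of iText.
final class SfntTableFinder {

    /** Offset and length of one table inside the font file. */
    static final class TableRecord {
        final long offset;
        final long length;

        TableRecord(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Scans the table directory for a four-character tag such as "GSUB".
     * Returns null when the font has no such table.
     */
    static TableRecord findTable(byte[] font, String tag) {
        ByteBuffer buf = ByteBuffer.wrap(font);   // big-endian by default, like the font format
        buf.position(4);                          // skip sfntVersion (uint32)
        int numTables = buf.getShort() & 0xFFFF;  // uint16
        buf.position(12);                         // skip searchRange, entrySelector, rangeShift
        byte[] tagBytes = new byte[4];
        for (int i = 0; i < numTables; i++) {
            buf.get(tagBytes);                    // table tag
            buf.getInt();                         // checksum, not needed here
            long offset = buf.getInt() & 0xFFFFFFFFL;
            long length = buf.getInt() & 0xFFFFFFFFL;
            if (tag.equals(new String(tagBytes, StandardCharsets.US_ASCII))) {
                return new TableRecord(offset, length);
            }
        }
        return null;
    }
}
```

With the font bytes loaded into memory, SfntTableFinder.findTable(fontBytes, "GSUB") gives the byte range that a GSUB decoder would then have to parse; a null result means the font carries no substitution table at all.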
----- Original Message -----
*From:* [hidden email]
*To:* [hidden email]
*Sent:* Wednesday, April 06, 2011 10:55 PM
*Subject:* Re: [iText-questions] ligature implementation for
Indian languages / Devanagari script
I'll let Paulo comment since he wrote the Arabic shaper and knows
what's involved...
*From:* [hidden email]
*Sent:* Wednesday, April 06, 2011 2:03 PM
*To:* [hidden email]
*Subject:* Re: [iText-questions] ligature implementation for
Indian languages / Devanagari script
Would that be a common dictionary for all the Indian languages
that use Devanagari script or a separate one for each language
such as Hindi, Marathi, Gujarati, etc.? I believe you'll need a
separate one for each but then I could be wrong. I know these 3
Indian languages well enough to help out.
What's the format of this dictionary? Could you point to the
Arabic dictionary?
Thanks for so much interest in this subject. Can we make use of
the interest and momentum to get this done?
Dilip
*From:* [hidden email]
*Sent:* Wednesday, April 06, 2011 1:48 PM
*To:* [hidden email]
*Subject:* Re: ligature implementation for Indian languages /
Devanagari script
That's EXACTLY what is needed - the "dictionary" that tells iText
that when it sees a specific combination of codepoints to use a
different glyph than normal. iText has one for Arabic text, but
not for Devanagari (and other Indics). If you can build such a
table/dictionary, that would go a LONG WAY to getting support into
iText.
Leonard
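A rough Java sketch of the kind of "dictionary" described above: a map from code point sequences to glyph ids, applied left to right with the longest match winning. Everything here is an assumption for illustration. LigatureDictionary is not an iText class, the glyph id 500 is invented (real ids are font-specific), and real Indic shaping also involves reordering and context rules that a flat lookup does not cover. The example entry is the conjunct formed by KA (U+0915) + VIRAMA (U+094D) + SSA (U+0937).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, not part of iText.
final class LigatureDictionary {
    private final Map<String, Integer> ligatures = new HashMap<>();
    private int longestKey = 0; // longest entry, counted in code points

    /** Registers a code point sequence and the glyph id that should replace it. */
    void add(String sequence, int glyphId) {
        ligatures.put(sequence, glyphId);
        longestKey = Math.max(longestKey, sequence.codePointCount(0, sequence.length()));
    }

    /**
     * Walks the text left to right. At each position the longest matching
     * entry is emitted as "gid:N"; code points with no match are passed
     * through as "U+XXXX" so the caller can map them the usual way.
     */
    List<String> shape(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < cps.length) {
            int consumed = 0;
            for (int len = Math.min(longestKey, cps.length - i); len >= 1; len--) {
                Integer glyphId = ligatures.get(new String(cps, i, len));
                if (glyphId != null) {
                    out.add("gid:" + glyphId);
                    consumed = len;
                    break;
                }
            }
            if (consumed == 0) {
                out.add(String.format("U+%04X", cps[i]));
                consumed = 1;
            }
            i += consumed;
        }
        return out;
    }
}
```

After dict.add("\u0915\u094D\u0937", 500), shaping the four code points U+0915 U+094D U+0937 U+093E yields one ligature glyph followed by the untouched vowel sign.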
-----Original Message-----
From: John Kilbourne [mailto:[hidden email]]
Sent: Wednesday, April 06, 2011 1:15 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] ligature implementation for Indian
languages / Devanagari script
Thank you for your clarification.
I understand from your last paragraph that iText would need to
determine whether any contextual glyph shaping needs to be
performed and then find the relevant glyphs in the font file.
iText would not need to read the glyphs 'live' (as they are being
typed in) and change the rendering as subsequent characters are
typed; it just sees a finished sequence of Unicode (often
multi-)byte characters. Is it difficult to have a 'dictionary' of
character combinations within iText that relate the combinations
of Unicode characters ('codepoints' I think is the correct term)
to the appropriate glyphs (e.g. ? + ? = ???)? I would like to help
(because I would really like to use iText), or at least understand
this problem better.
----- Original Message -----
From: "Leonard Rosenthol" <[hidden email]>
To: "Post all your questions about iText here" <[hidden email]>
Sent: Wednesday, April 6, 2011 3:45:24 PM GMT -05:00 US/Canada
Eastern
Subject: Re: [iText-questions] ligature implementation for Indian
languages / Devanagari script
Roman (also sometimes called Latin) is a script used to write many
languages, including English, French, German, etc. This is codified
in the encoding ISO 8859-1 (also called ISO Latin 1 -
http://en.wikipedia.org/wiki/ISO/IEC_8859-1).
Devanagari is a script used for Hindi (and other Indic languages -
see http://en.wikipedia.org/wiki/Devanagari).
Fonts are simply a way to provide a set of glyphs (visual
representations of "letters" and "symbols"). They may or may not
have a correlation to a specific script or language. In most
cases today, fonts include glyphs for MANY languages & scripts
(eg. Unicode fonts).
A font CAN NOT automatically do anything! The software that lays
out the characters/code points MUST determine whether any
contextual glyph shaping needs to be performed and then find the
relevant glyphs in the font file. See
http://people.w3.org/rishida/docs/unicode-tutorial/part3#context-sensitive
which is just part of a full presentation on Unicode.
Hope that helps clarify things for you.
-----Original Message-----
From: John Kilbourne [mailto:[hidden email]]
Sent: Wednesday, April 06, 2011 12:34 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] ligature implementation for Indian
languages / Devanagari script
I wonder if the distinction between font and language and
something in between isn't involved here. English is a language,
Roman is not quite a font (I think), but Times-Roman would be a
font. Hindi is a language, Devanagari is not quite a font, and
Sanskrit 2003 (the font I use for Devanagari) is a font.
Anyway, here (http://www.wazu.jp/gallery/Fonts_Devanagari.html)
is a list of devanagari fonts showing the ligatures they
naturally produce. Sanskrit 2003 (the font) automatically renders
devanagari ligatures like
???, ???, ??? for
???+???, ???+???, and ??? + ???.
----- Original Message -----
From: "Leonard Rosenthol" <[hidden email]>
To: "Post all your questions about iText here" <[hidden email]>
Sent: Wednesday, April 6, 2011 2:49:09 PM GMT -05:00 US/Canada
Eastern
Subject: Re: [iText-questions] ligature implementation for Indian
languages / Devanagari script
The information about what two character codes/code points make up
a given ligature isn't encoded into a font. For example, there is
nothing that tells me that when I find 'f' and 'i' next to each
other in Roman/English text that I can turn that into an 'fi'
ligature. I have to have that type of logic written into my layout
engine.
Contextual languages such as Arabic and Devanagari make this even
more complex - but same concept...
Leonard
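As a small illustration of the 'f' + 'i' case above, here is a Java sketch of layout-side logic that swaps the pair for a single ligature. It relies on the fact that Latin has a dedicated code point for this ligature, U+FB01 (LATIN SMALL LIGATURE FI), so a plain string replacement is enough. The class name LatinLigatures is hypothetical, and the sketch assumes the chosen font actually contains a glyph for U+FB01.

```java
// Hypothetical helper, not part of iText.
final class LatinLigatures {

    /** Replaces every "fi" pair with the single code point U+FB01. */
    static String applyFi(String text) {
        return text.replace("fi", "\uFB01");
    }
}
```

The same shortcut is not available for Devanagari conjuncts, which have no code point of their own and therefore have to be addressed by glyph id.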
-----Original Message-----
From: John Kilbourne [mailto:[hidden email]]
Sent: Wednesday, April 06, 2011 11:41 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] ligature implementation for Indian
languages / Devanagari script
How is it that different devanagari fonts render ligatures
differently? It seems that the ligature information is found in
the font itself. Is there no way for iText to utilize the
information already contained in the particular font used to render
the text?
----- Original Message -----
From: "1T3XT BVBA" <[hidden email]>
To: [hidden email]
Sent: Sunday, April 3, 2011 3:01:34 AM GMT -05:00 US/Canada Eastern
Subject: Re: [iText-questions] ligature implementation for Indian
languages / Devanagari script
[hidden email] wrote:
> Does iText have ligature implementation for Indian languages such as Hindi,
> Gujarati, Marathi, etc?
No, none of the iText developers understand any Indic language.
You are always welcome to contribute code that supports ligatures in
Indic languages.
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples:
http://itextpdf.com/themes/keywords.php