[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438383#comment-16438383
]
Tilman Hausherr commented on PDFBOX-4189:
-----------------------------------------
Your patch is very much appreciated of course, thank you. It will probably
result in thousands new users / usages. This is a complex patch so expect this
to take some time before it is committed. See PDFBOX-4106 for an example of a
complex patch and the discussion.
Can you please also add an apache header to the .txt files? See the file
"pdfbox\src\main\resources\org\apache\pdfbox\resources\glyphlist\additional.txt".
I'd also need to know where this file came from, or whether you created it
yourself from other data; if yes, please include a comment how, and/or the code
that created the file.
About the commits:
- {color:#333333}What is the story of having different data for jdk7 and
jdk8?{color}
- BengaliPdfGenerationHelloWorld should be integrated into the
EmbeddedFonts.java example
- why a log4j2.xml ? We don't use log4j2 except in preflight where log4j is
used in Tests
- I think I understand why my example didn't work. You disabled subsetting.
But with subsetting the subsetter should "know" which glyphs are used. But we
do need subsetting because otherwise files might get huge
- The generated PDF file has trouble with text extraction: "আমি কোন পথ ীরর
লী ষ পুতুল পো গা ঋষি" i.e. there are some unknown glyphs.
- The move of GlyphsubstitutionTable breaks the API. Like I said in the PR, if
you keep the API as it is (only expand. not change existing methods) then your
change could be used for 2.0 too. The release of 3.0 could take years. The
release of 2.0.10 only a few months.
- There is a lot of logging done ("WARNUNG: oldValue: [52, 114] will be
overridden with newValue: [114, 52]"). This is scary and should be changed or
removed, It scares users and they create issues, thinking that something got
wrong. If you change it to debug, please include a comment what this is about.
See also the discussion in PDFBOX-4106, about{color:#333333} "Trying to
un-substitute a never-before-seen gid"{color}.
- Loosening scope restrictions is a bit of a no-no, as done in
[TTFDataStream.java|https://github.com/apache/pdfbox/pull/46/files#diff-894ae790d373c62634ceed941b264dc3]
,
[TTFTable.java|https://github.com/apache/pdfbox/pull/46/files#diff-355fd8e3330f392bdae0778f942dc124]
, and maybe elsewhere. As preached by "Effective Java", item 15: "make each
class or member as inaccessible as possible".
- Public methods should have a javadoc, same for classes. It doesn't have to
be big, just make it good enough for other people to understand what is done.
See also [https://pdfbox.apache.org/codingconventions.html] , I think most
conventions are already respected.
I have no yet done a review review of the code (looking side-by-side), so more
questions may be coming.
> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -----------------------------------------------------------------------------
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph
> substitution. The GSUB table has been read and used effectively to replace
> some compound words with their respective Glyphs. All tests are passing. I
> have tested this for the Bengali font. Please review these changes and let me
> know if it makes sense to incorporate these.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]