[
https://issues.apache.org/jira/browse/PDFBOX-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289572#comment-14289572
]
John Hewson edited comment on PDFBOX-2618 at 1/23/15 5:44 PM:
--------------------------------------------------------------
Perhaps I didn't make myself clear: this issue is feature creep and should
either be closed or addressed by adding an example to PDFBox. The complexity of
laying out even a single line of text using Unicode and an OpenType font cannot
be overstated - *entire* libraries are devoted to this task. We would need to
use at least ICU4J and some sort of HarfBuzz equivalent for Java (which doesn't
even exist - that's how hard it is to build). Even FOP doesn't get this right.
I don't want to see PDFBox turn from being a high-quality low-level PDF library
into a low-quality typesetting library. But this is exactly what will happen if
this issue is allowed to proceed. It's _guaranteed_ that there will be no end
to the JIRA issues opened once we add such a feature - when users can typeset
Unicode text, they come to depend on it and expect it not to fail when faced
with something complex or non-Western.
We need to rename this issue to better reflect what it's proposing. Here are
the choices:
- A) *"Rebuild ICU4J and HarfBuzz ourselves"*
- B) *"Build half-baked fundamentally broken Western typesetting into PDFBox".*
- C) Alternatively, we could write an example which uses either ICU4J or the
JDK's font handling (yes, this _will_ work - and is even slated to be replaced
with HarfBuzz in OpenJDK in a future release). It would be great to have such
an example!
A, B, or C?
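As a rough illustration of what the example in option C might start from, here is a minimal sketch that uses only the JDK: {{java.text.BreakIterator.getLineInstance}} supplies Unicode line-break opportunities and lines are filled greedily up to a maximum width. The class and method names are hypothetical, and the width function is an assumption standing in for real font metrics (with PDFBox 2.0 it could be backed by {{PDFont.getStringWidth(s) / 1000 * fontSize}}, wrapped because that method throws {{IOException}}). It does no shaping, bidi reordering or hyphenation and does not honour hard line breaks, which is exactly the hard part this comment warns about.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of PDFBox: greedy line filling on top of the
// JDK's Unicode-aware line BreakIterator. "width" is an assumed measuring
// function (e.g. backed by font metrics) returning the width of a string.
class JdkLineBreaker {

    static List<String> breakLines(String text, double maxWidth,
                                   ToDoubleFunction<String> width, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text); // resets the iterator to the first boundary (offset 0)
        List<String> lines = new ArrayList<>();
        int start = 0; // offset where the current line begins
        int fit = -1;  // last break opportunity at which the current line still fits
        for (int b = it.next(); b != BreakIterator.DONE; b = it.next()) {
            if (width.applyAsDouble(rtrim(text.substring(start, b))) <= maxWidth) {
                fit = b; // still fits: keep extending the current line
                continue;
            }
            if (fit != -1) {
                // close the line at the last opportunity that fitted
                lines.add(rtrim(text.substring(start, fit)));
                start = fit;
                fit = -1;
                if (width.applyAsDouble(rtrim(text.substring(start, b))) <= maxWidth) {
                    fit = b;
                    continue;
                }
            }
            // a single unbreakable segment wider than maxWidth: emit it as an overfull line
            lines.add(rtrim(text.substring(start, b)));
            start = b;
        }
        if (start < text.length()) {
            lines.add(rtrim(text.substring(start)));
        }
        return lines;
    }

    // Trailing whitespace at a break does not count towards the line width.
    private static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }
}
```

With a hypothetical fixed-pitch width of one unit per character, {{JdkLineBreaker.breakLines("The quick brown fox jumps over the lazy dog", 10, String::length, Locale.ENGLISH)}} yields the lines "The quick", "brown fox", "jumps over", "the lazy" and "dog".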
P.S.
{quote}
We also might want to check with our fop colleagues if there is some synergy
e.g. when it comes to line breaking algorithms and such. They are using an
enhanced version of Knuth and Plass AFAIK.
{quote}
This is *exactly* the kind of typesetting feature creep which we don't need.
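For context on the quoted suggestion: Knuth and Plass's algorithm chooses all the break points of a paragraph together, minimising a total cost (their "demerits"), instead of filling each line greedily. The full algorithm, with boxes, glue, penalties, stretch/shrink and hyphenation, is far too large for a snippet, so this sketch is the much simpler minimum-raggedness dynamic program, which shares only that whole-paragraph idea. It assumes fixed-pitch text (width = character count) and scores each line except the last by its squared trailing space; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimum-raggedness line breaking (a simplified relative of Knuth-Plass, not the real thing).
// Widths are character counts; words on a line are separated by one space.
class MinRaggedness {

    static List<String> breakLines(List<String> words, int maxWidth) {
        int n = words.size();
        long[] best = new long[n + 1]; // best[i]: minimal cost of setting words i..n-1 (best[n] = 0)
        int[] next = new int[n + 1];   // next[i]: first word of the line after the one starting at i
        for (int i = n - 1; i >= 0; i--) {
            best[i] = Long.MAX_VALUE;
            int lineWidth = -1; // width of words i..j joined by single spaces
            for (int j = i; j < n; j++) {
                lineWidth += words.get(j).length() + 1;
                if (j > i && lineWidth > maxWidth) {
                    break; // overfull; a single over-long word is still allowed on its own line
                }
                long slack = maxWidth - lineWidth;
                long lineCost = (j == n - 1) ? 0 : slack * slack; // the last line is free
                long total = lineCost + best[j + 1];
                if (total < best[i]) {
                    best[i] = total;
                    next[i] = j + 1;
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i = next[i]) {
            lines.add(String.join(" ", words.subList(i, next[i])));
        }
        return lines;
    }
}
```

For the words "aaa bb cc ddddd" and a width of 6, greedy filling produces "aaa bb" / "cc" / "ddddd" (cost 0² + 4² = 16), while this program picks "aaa" / "bb cc" / "ddddd" (cost 3² + 1² = 10), because it looks at the whole paragraph before committing to the first break.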
> Create paragraphs with PDFBox
> -----------------------------
>
> Key: PDFBOX-2618
> URL: https://issues.apache.org/jira/browse/PDFBOX-2618
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
>
> [~mkl] wrote this morning on stackoverflow on the topic about creating tables
> with PDFBox:
> {quote}I'm afraid all those samples IMO are merely proofs of concept, probably
> of use in limited use cases but by far not for generic use. PDFBox has its
> strengths, e.g. a quite versatile content extraction framework and a content
> rendering capability, but the absence of a proper layouting API is a serious
> weakness.{quote}
> To which I answered:
> {quote}I know... I just don't want to create another iText. We're not the
> Samwer brothers.{quote}
> But he's right. We could of course look at what iText offers and implement
> that on our own, that wouldn't even be illegal, but it wouldn't be nice. I've
> never looked at or used iText, except once when answering this:
> http://stackoverflow.com/a/26820598/535646
> IMO what we need to start with is a method to write a paragraph to a PDF. Such a
> method would have these parameters:
> - text
> - rectangle (or width and height from current position)
> Such a method would then output the text, break the lines at the right edge of
> the rectangle, and throw an exception if there isn't enough space.
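A minimal sketch of the method proposed above, under several assumptions: the names {{ParagraphLayout}}, {{layout}} and {{PlacedLine}} are hypothetical; words are split on whitespace and wrapped greedily; the first baseline sits one font size below the top edge (a crude stand-in for the font's ascent, descenders are ignored); and the width function again stands in for real font metrics. It only computes where each line goes, in PDF user-space coordinates with y growing upwards. With PDFBox 2.0, each placed line could then be drawn inside its own {{beginText()}}/{{endText()}} pair of a {{PDPageContentStream}}, using {{newLineAtOffset(x, y)}} and {{showText(text)}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// One line of the paragraph plus the baseline position where it should be drawn.
final class PlacedLine {
    final String text;
    final double x;
    final double y;

    PlacedLine(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

// Hypothetical API, not part of PDFBox.
final class ParagraphLayout {

    /**
     * Wraps text into the rectangle with lower-left corner (llx, lly) and the
     * given width and height. Throws IllegalArgumentException if a word is
     * wider than the rectangle or the lines do not fit vertically.
     */
    static List<PlacedLine> layout(String text, double llx, double lly,
                                   double width, double height,
                                   double fontSize, double leading,
                                   ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        String current = "";
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty or all-whitespace input
            }
            if (measure.applyAsDouble(word) > width) {
                throw new IllegalArgumentException("word wider than rectangle: " + word);
            }
            String candidate = current.isEmpty() ? word : current + " " + word;
            if (measure.applyAsDouble(candidate) <= width) {
                current = candidate;  // word fits on the current line
            } else {
                lines.add(current);   // close the line and start a new one
                current = word;
            }
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }

        // first baseline one font size below the top edge, then one leading per extra line
        double needed = lines.isEmpty() ? 0 : fontSize + (lines.size() - 1) * leading;
        if (needed > height) {
            throw new IllegalArgumentException(
                "text needs " + needed + " units of height but only " + height + " are available");
        }
        List<PlacedLine> placed = new ArrayList<>();
        double top = lly + height;
        for (int i = 0; i < lines.size(); i++) {
            placed.add(new PlacedLine(lines.get(i), llx, top - fontSize - i * leading));
        }
        return placed;
    }
}
```

Keeping the position computation separate from the drawing makes the layout testable without creating a PDF and leaves the choice of font metrics to the caller.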