[jira] [Comment Edited] (PDFBOX-2580) Decouple implementation specific forms handling from interactive.form PD Model

John Hewson (JIRA) Sat, 07 Feb 2015 19:40:01 -0800

    [ https://issues.apache.org/jira/browse/PDFBOX-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311082#comment-14311082 ]


John Hewson edited comment on PDFBOX-2580 at 2/8/15 3:39 AM:
-------------------------------------------------------------

Your proposal is built upon ideas which are almost the opposite of PDFBox's 
actual design, and your reply above contains some very large misunderstandings 
about the current structure of PDFBox.

{quote}
Appearance generation, block formatting, rich text handling, default field 
styles (and if we are to support annotations properly similar items for these) 
don't have a direct reference in the PDF specification or are incorporated into 
the spec. IMHO they don't belong to the PD model as to me PD also means that 
this is (well) defined in the PDF specification.
{quote}

There's a huge amount of code in PDFBox which isn't part of the PDF spec; it's 
probably what 50% of my effort goes into. There are easily several thousand 
lines of code in PDFBox devoted to this, and most of that code is in the PD 
model. Your description of the PD model is completely inaccurate: *PD _is_ the 
home of our non-standard PDF code*, and that's where it belongs. PD provides 
our high-level abstraction on top of the low-level COS objects, so it's the 
responsibility of PD to provide a clean interface to the user. It would be 
impossible to remove non-standard PDF code from the PD model, because without 
those workarounds the PD model could no longer function. For example, it would 
be impossible to have any sort of useful font API in PD were all the 
non-standard code to be removed.

The 3 levels in PDFBox's design already cover everything we need:

- COS - low-level raw PDF objects
- PD - normalized, high-level wrappers for PDF concepts (sometimes these wrap 
many COS objects, e.g. PDPageTree)
- util - multi-PDF utilities such as merging and splitting

There has to be an extremely good reason why new functionality absolutely 
cannot fit into that existing design.
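
As a rough illustration of the COS/PD split described above, here is a minimal, 
self-contained Java sketch. It does not use PDFBox's real classes: RawDict, 
TextFieldWrapper and getAlignment are hypothetical stand-ins, and the malformed 
"Q" (alignment) entry is an invented example of non-standard input. The point 
is only that the high-level wrapper is where the tolerant, non-spec handling 
lives, so callers never see the raw problem.

```java
import java.util.HashMap;
import java.util.Map;

public class LayerSketch {
    /** Low-level layer: raw key/value storage with no interpretation (hypothetical stand-in for a COS dictionary). */
    static final class RawDict {
        private final Map<String, Object> entries = new HashMap<>();
        void put(String key, Object value) { entries.put(key, value); }
        Object get(String key) { return entries.get(key); }
    }

    /** High-level layer: wraps a raw dictionary and hides malformed input (hypothetical stand-in for a PD class). */
    static final class TextFieldWrapper {
        private final RawDict dict;
        TextFieldWrapper(RawDict dict) { this.dict = dict; }

        /** Returns the alignment as 0, 1 or 2, falling back to 0 for missing, mistyped or out-of-range entries. */
        int getAlignment() {
            Object raw = dict.get("Q");
            int q = 0;
            if (raw instanceof Number) {
                q = ((Number) raw).intValue();
            } else if (raw instanceof String) {
                // Assumed repair rule: accept a number that was wrongly written as a string.
                try {
                    q = Integer.parseInt(((String) raw).trim());
                } catch (NumberFormatException e) {
                    q = 0;
                }
            }
            return (q < 0 || q > 2) ? 0 : q;
        }

        /** The raw layer stays reachable for callers who need it. */
        RawDict getRawDict() { return dict; }
    }

    public static void main(String[] args) {
        RawDict broken = new RawDict();
        broken.put("Q", "2"); // non-standard: a string where an integer is expected
        TextFieldWrapper field = new TextFieldWrapper(broken);
        System.out.println(field.getAlignment()); // prints 2
    }
}
```

In this sketch the repair logic sits inside the wrapper rather than in a 
separate package, which mirrors the argument that the high-level layer is 
responsible for presenting a clean interface.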

{quote}
By having that - especially the implementation specific parts of the appearance 
generation which are not defined in the PDF specification - in a separate 
package we differentiate the defined part from the undefined part because the 
undefined part is our own interpretation of how that shall be done.
{quote}

But *we don't have that*. What we have is PD, which performs all the high-level 
non-standard PDF repair and provides a consistent, clean and usable API. Nobody 
has ever suggested that PD shouldn't contain this kind of code. I don't know 
where you got that idea from, because it doesn't reflect the actual structure 
of PDFBox.

{quote}
As an added benefit it will allow to better modularize PDFBox. E.g. one could 
leave out the forms creation capability easily (but still has the ability to do 
it himself by using the PD model).
{quote}

I disagree that it's a useful goal; I've never heard of anybody who wants to do 
this. There would be a minimal impact on the size of the PDFBox jar and no 
impact on its dependencies. We can be mindful of such things, but they 
certainly shouldn't dictate the design of our API.

{quote}
Finally I do know about the Sonar issue. To me that's an intermediate issue as 
this only happens as appearance generation is currently triggered in 
.interactive.forms but handled by .services.forms. I had planned to remove that 
anyway so appearance generation will only be done via .services.forms. and no 
longer triggered from .interactive.forms. A lot of the Sonar issues AcroForms 
originally had before I started working on that package are already resolved.
{quote}

OK, that would remove the Sonar warning. However, what you'll have is a 
situation where the PD API provides a broken-by-design implementation of forms, 
while the real forms API lives outside the PD model, despite the fact that 
all other high-level per-PDF functionality in PDFBox is contained in the PD 
model. So the end result is to take something consistent and predictable and 
replace it with something arbitrary and unexpected.

You need to take some time to correct your misunderstandings about the 
structure of PDFBox, because unless you do so, you're not going to be able to 
make a meaningful proposal for improving the forms API. The main thing is to 
realise that improving PD is nearly always the answer: it worked for color 
spaces, fonts, and images, and it will work for forms.


> Decouple implementation specific forms handling from interactive.form PD Model
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2580
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2580
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>             Fix For: 2.0.0
>
>         Attachments: sonar.png
>
>
> The interactive.form PD model currently holds classes reflecting the various 
> fields intermixed with appearance generation and layout handling.
> In order to separate the PD model from the service of forms filling and 
> appearance generation this functionality shall be moved into a new package.


