[jira] [Comment Edited] (PDFBOX-2580) Decouple implementation specific forms handling from interactive.form PD Model

John Hewson (JIRA) Sat, 07 Feb 2015 21:34:27 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311082#comment-14311082
 ]


John Hewson edited comment on PDFBOX-2580 at 2/8/15 5:33 AM:
-------------------------------------------------------------

Your reply above contains some very significant misunderstandings about the 
current structure of PDFBox and the role of the PD model.

{quote}
Appearance generation, block formatting, rich text handling, default field 
styles (and if we are to support annotations properly similar items for these) 
don't have a direct reference in the PDF specification or are incorporated into 
the spec. IMHO they don't belong to the PD model as to me PD also means that 
this is (well) defined in the PDF specification.
{quote}

There's a large amount of code in PDFBox which isn't part of the PDF spec; it's 
probably what fifty percent of my effort goes into. There are easily several 
thousand lines of code in PDFBox devoted to this and most of that code is in 
the PD model. Your description of the PD model is not accurate: *PD _is_ the 
home of our non-standard PDF code* and its sole purpose is to provide a 
user-friendly high-level abstraction over the contents of a PDF file. PD is 
_not_ just a type safe wrapper around COS objects, it's much more than that. It 
would be impossible to remove non-standard PDF code from the PD model, because 
without those workarounds the PD model could no longer function. For example, 
it would be impossible to have any sort of useful font API in PD were all the 
non-standard code to be removed.

The 3 levels in PDFBox's design already cover everything we need:

- COS: low-level raw PDF objects
- PD: normalized, high-level wrappers for PDF concepts (sometimes these wrap 
many COS objects, e.g. PDPageTree)
- util: multi-PDF utilities such as merging and splitting
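
To illustrate the difference between the first two levels, here is a minimal Java sketch. It is not PDFBox code: RawDictionary and PageWrapper are hypothetical stand-ins for a COS-style dictionary and a PD-style wrapper, and the rule of falling back to 0 for an unusable Rotate entry is an assumption chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a COS-level dictionary: raw, untyped entries
// stored exactly as they were found in the file.
final class RawDictionary {
    private final Map<String, Object> entries = new HashMap<>();

    void put(String key, Object value) {
        entries.put(key, value);
    }

    Object get(String key) {
        return entries.get(key);
    }
}

// Hypothetical PD-level wrapper: a typed accessor that also tolerates
// missing or malformed entries instead of passing them on to the caller.
final class PageWrapper {
    private static final int DEFAULT_ROTATION = 0; // assumed fallback

    private final RawDictionary dictionary;

    PageWrapper(RawDictionary dictionary) {
        this.dictionary = dictionary;
    }

    // Returns the rotation normalized to 0, 90, 180 or 270.
    int getRotation() {
        Object raw = dictionary.get("Rotate");
        if (!(raw instanceof Number)) {
            return DEFAULT_ROTATION; // missing or wrong type
        }
        int degrees = ((Number) raw).intValue();
        if (degrees % 90 != 0) {
            return DEFAULT_ROTATION; // not a multiple of 90
        }
        return ((degrees % 360) + 360) % 360; // fold negatives and large values
    }
}

final class LayeringSketch {
    public static void main(String[] args) {
        RawDictionary raw = new RawDictionary();
        raw.put("Rotate", "ninety"); // malformed: a string instead of a number
        PageWrapper page = new PageWrapper(raw);
        System.out.println("raw entry: " + raw.get("Rotate"));
        System.out.println("normalized rotation: " + page.getRotation());
    }
}
```

The raw level hands back whatever the file contains, while the wrapper is the one place where the workaround lives, so callers only ever see a usable value.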

There has to be an extremely good reason why new functionality absolutely 
cannot fit into that existing design. There's enormous value in PD being the 
single and definitive user-friendly API for high-level PDF manipulation: it 
gives us consistency, discoverability, simplicity, predictability, and avoidance 
of surprises.

{quote}
By having that - especially the implementation specific parts of the appearance 
generation which are not defined in the PDF specification - in a separate 
package we differentiate the defined part from the undefined part because the 
undefined part is our own interpretation of how that shall be done.
{quote}

But *we don't have any such separation* of non-standard PDF handling code. What 
we have is PD, which performs all the high-level non-standard PDF repair and 
provides a consistent, clean and usable API. Nobody has ever suggested that PD 
shouldn't contain this kind of code. I don't know where you got that idea from, 
because it doesn't reflect the actual structure of PDFBox.

{quote}
As an added benefit it will allow to better modularize PDFBox. E.g. one could 
leave out the forms creation capability easily (but still has the ability to do 
it himself by using the PD model).
{quote}

I disagree that it's a useful goal. The impact on the size of the PDFBox jar 
would be negligible, nor would there be any reduction of external 
dependencies. I've not seen any user demand for this kind of modularisation 
(other kinds such as removing AWT rendering or ICU, sure, but not the kind of 
dismantling of the PD model which you propose).

{quote}
Finally I do know about the Sonar issue. To me that's an intermediate issue as 
this only happens as appearance generation is currently triggered in 
.interactive.forms but handled by .services.forms. I had planned to remove that 
anyway so appearance generation will only be done via .services.forms. and no 
longer triggered from .interactive.forms. A lot of the Sonar issues AcroForms 
originally had before I started working on that package are already resolved.
{quote}

Ok, that would remove the Sonar warning; however, what you'll have is a 
situation where the PD API provides a broken-by-design implementation of forms. 
Then the real forms API is in a separate non-PD model, in contrast to all other 
high-level single-PDF functionality in PDFBox which is contained in the PD 
model. The end result is to take something consistent and predictable and 
replace it with something arbitrary, inconsistent, and unexpected.
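
As a rough sketch of why the forms API reads more predictably when it lives in one place, consider the hypothetical Java class below. TextFieldSketch and its renderAppearance helper are invented for illustration and do not mirror PDFBox's actual classes, and the rendered string is only a placeholder loosely modeled on a text-showing operator sequence. The point is that setting a value and regenerating the appearance happen in a single call, so the field is never left with a stale appearance.

```java
// Hypothetical PD-style text field: the value and its appearance are kept
// in sync by the field itself rather than by a separate service.
final class TextFieldSketch {
    private String value = "";
    private String appearance = renderAppearance("");

    // Stores the new value and regenerates the appearance in one step.
    void setValue(String newValue) {
        value = (newValue == null) ? "" : newValue;
        appearance = renderAppearance(value);
    }

    String getValue() {
        return value;
    }

    String getAppearance() {
        return appearance;
    }

    // Placeholder rendering (an assumption for this sketch): real appearance
    // generation would need fonts, layout and string escaping.
    private static String renderAppearance(String text) {
        return "BT (" + text + ") Tj ET";
    }
}

final class FormsSketch {
    public static void main(String[] args) {
        TextFieldSketch field = new TextFieldSketch();
        field.setValue("Hello");
        System.out.println(field.getValue());
        System.out.println(field.getAppearance());
    }
}
```

With a split design, the caller would have to remember a second call into another package after every setValue, which is the kind of surprise the single PD API avoids.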

The PD model in PDFBox has been a strong point in its design, and it continues 
to evolve and serve the high-level needs of PDFBox users very well. Many new 
high-level features have been added in 2.0 which perform exactly the sort of 
tasks which you claim that PD doesn't and shouldn't do. Any proposal based on 
such an inaccurate characterisation of the PD model is going to be deeply 
flawed. When one realises that PD is where we provide our safe, user-facing API 
which tolerates malformed PDFs, then it becomes clear that improving PD is nearly 
always the best answer: it worked for color spaces, fonts, and images, and it 
will work for forms.


> Decouple implementation specific forms handling from interactive.form PD Model
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2580
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2580
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>             Fix For: 2.0.0
>
>         Attachments: sonar.png
>
>
> The interactive.form PD model currently holds classes reflecting the various 
> fields intermixed with appearance generation and layout handling.
> In order to separate the PD model from the service of forms filling and 
> appearance generation this functionality shall be moved into a new package.



