[ 
https://issues.apache.org/jira/browse/PDFBOX-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311670#comment-14311670
 ] 

John Hewson edited comment on PDFBOX-2580 at 2/9/15 12:35 AM:
--------------------------------------------------------------

Modularising PDFBox is one of the tasks which we have been working on in 2.0, 
though I wouldn't go as far as to call it a goal, as it doesn't achieve 
anything in its own right: the actual goals are things such as reducing the 
number of third party jars which we depend on, having PDFBox run on Android, or 
avoiding the use of AWT in server environments.

A great deal of modularisation has been done in 2.0, and we've pretty much 
achieved our modularisation goals, with the exception of removing AWT as a 
dependency, which isn't feasible at this stage. PDFBOX-586 provides a great 
overview of the modularisation which has been delivered in 2.0:

- examples have been moved to their own module
- lucene integration has been moved to the examples module
- ant integration has been moved to the examples module
- command line tools have been moved into their own module
- the PDFViewer Swing component has been moved into the tools module
- usage of ICU for complex text extraction has been replaced with Java's built-in support
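
As a rough illustration of the last point, here is a minimal sketch of the kind of text handling the JDK offers without ICU. It assumes that "Java's built-in support" refers to classes such as java.text.Normalizer and java.text.Bidi; the class and method names below are hypothetical and are not taken from the PDFBox code base.

```java
import java.text.Bidi;
import java.text.Normalizer;

// Hypothetical helper, not PDFBox code: shows JDK-only text handling.
class BuiltInTextSupport {

    // Apply NFKC normalisation, which folds compatibility characters
    // such as the ligature U+FB01 into their plain-letter equivalents.
    static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Report whether the text may contain right-to-left content and
    // would therefore need bidirectional analysis before reordering.
    static boolean needsBidi(String s) {
        return Bidi.requiresBidi(s.toCharArray(), 0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(normalize("\uFB01le"));      // ligature folded to plain letters
        System.out.println(needsBidi("abc"));           // Latin-only text
        System.out.println(needsBidi("\u05D0\u05D1"));  // Hebrew letters
    }
}
```

Both classes ship with the JDK, so a sketch like this adds no third party jar to the classpath, which is the point of the change listed above.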

The benefit of these changes is not the removal of a small amount of code from 
pdfbox core\* but the reduction in the jars which pdfbox core depends on, namely:

- lucene
- ant
- swing
- ICU

These are all very large dependencies. Changes of this kind are what we mean 
when we refer to modularisation, because they have tangible benefits and help 
users satisfy goals such as reducing the size of the PDFBox distribution from 
hundreds to tens of megabytes.

It's notable that we still haven't achieved one of our original modularisation 
goals, text extraction on Android (PDFBOX-586), because Android does not 
support AWT, and PDFBox core and FontBox depend deeply on AWT. It may simply 
not be practical to remove AWT from PDFBox core without causing significant 
collateral damage. Even text extraction has deep dependencies on AWT via PD's 
clipping paths and FontBox's fonts.

Returning to the topic of forms, any modularisation needs to be presented in 
terms of third party dependencies. 1) What jars would we no longer have to 
depend on in core if we move forms into their own module? 2) Conversely, what 
pdfbox jars would a user of a separate forms module be able to avoid depending 
on?

---
\* If you want small jar files, there are off-the-shelf tools which can perform 
dead code analysis and strip down and repack jar files, which will produce much 
better results than any kind of package structure which we come up with in 
PDFBox. What's harder is reducing third party dependencies, which is something 
that only we can do.


> Decouple implementation specific forms handling from interactive.form PD Model
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2580
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2580
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>             Fix For: 2.0.0
>
>         Attachments: sonar.png
>
>
> The interactive.form PD model currently holds classes reflecting the various 
> fields intermixed with appearance generation and layout handling.
> In order to separate the PD model from the service of forms filling and 
> appearance generation this functionality shall be moved into a new package.



