[ https://issues.apache.org/jira/browse/PDFBOX-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313265#comment-14313265 ]
John Hewson edited comment on PDFBOX-2580 at 2/10/15 12:42 AM:
---------------------------------------------------------------

{quote}
what I wanted to say is that they are not based on COS objects but are based on external specifications.
{quote}

Many aspects of PDF are based on external specifications: TrueType, Type1, CFF, ICC profiles, JPEG, JPX, etc., and we handle all of those in the PD model.

{quote}
XFA is deprecated in the current draft for PDF 2.
{quote}

The "standard 14" fonts are deprecated in ISO 32000, but we implement them in PD because they don't stop existing once they're deprecated. The fact is that we have to support XFA rich text fields, and that support has to be part of core PDFBox because core functionality such as rendering depends on it.

{quote}
If we ever support XFA I don't see that as part of PD. It's a huge spec with a different layout and rendering model to PDF. The same with rich text strings and CSS. [...] To support rich text / CSS we would need an HTML-like renderer and formatter. Is PD a good home for that - I don't think so.
{quote}

We don't need anything like the sophistication of HTML or full CSS. While the XFA spec is indeed large, we only need support for rich text strings and a tiny subset of CSS2 style attributes, which wouldn't be too hard to implement. Yes, there's a box model, but it's trivial compared to the full CSS box model, which is not part of XFA. The CSS2 subset is so minimal that a few regex-based classes could handle it, and the XML is simple enough to parse with Java's built-in classes.

The reason this has to be part of PD is that rendering is part of PD, and rendering XFA form fields is a core part of PDF that we need to support. There's no point in creating an external module for XFA if PD depends on it and it depends on PD: that's a cyclic dependency.

{quote}
I'd rather see that sitting in its own package or even subproject being referred to from PD where needed.
{quote}

That code might well go in its own package, but it would be a sub-package of PD. There's no reason to put it somewhere else if PD depends on it, because it's tightly coupled to PD. The key is not to introduce cyclic package dependencies, which is what Sonar was warning about, i.e. keep coupled classes in the same package unless there is a reason they absolutely must be elsewhere.

{quote}
If you look at it from a rendering perspective and you want to be able to render everything then you always need everything. But what about users who only want to merge documents or do text extraction?
{quote}

We already have APIs which provide those features. Perhaps you were thinking of some other goal, e.g. text extraction on Android? You'll need to elaborate on what the goal is before we can say whether or not the solution satisfies it. If your question is "what if users only want a subset of our API?", then, as I've said before, the best results will be obtained by using ProGuard, unless it's a dependency issue (e.g. AWT), in which case we need to address it as part of our modularisation. Remember, though, that there is no advantage in having a separate forms module which depends on PD: it will bring with it all the same baggage as PD, i.e. it can't be used in isolation.

{quote}
I'll start putting the missing functionality in later this week and from there we can rework naming/packaging as needed. If the decision is made to have the stuff in interactive.forms, my work will be based on PDAcroForm being the main entry point, keeping as much as possible package-private.
{quote}

Great. In the meantime, can you provide a list of the classes you're planning to add to interactive.forms and their responsibilities? The great thing about package-private is that we can evolve the API as needed and expose something public-facing only if it is strictly needed, and that can even happen post-2.0 without such changes being breaking.
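The claim that the CSS2 subset could be handled with a few regular expressions, and the XML with Java's built-in classes, can be illustrated with a minimal sketch. It assumes the rich text value arrives as an XHTML `<body>` fragment whose elements carry CSS declarations in `style` attributes. `RichTextSketch` and its `Run` type are hypothetical names, not PDFBox classes, and the sketch deliberately ignores layout, fonts and any XFA-specific style extensions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch (not PDFBox API): flattens an XHTML rich text body into styled runs. */
public class RichTextSketch {

    /** A run of text plus the CSS declarations in effect for it. */
    public static final class Run {
        public final String text;
        public final Map<String, String> style;

        Run(String text, Map<String, String> style) {
            this.text = text;
            this.style = style;
        }
    }

    // One CSS declaration "name : value", terminated by ';' or by the end of the attribute.
    private static final Pattern DECLARATION =
            Pattern.compile("\\s*([a-zA-Z-]+)\\s*:\\s*([^;]+?)\\s*(?:;|$)");

    /** Parses a style attribute such as "font-size:12pt; color:#ff0000" into an ordered map. */
    static Map<String, String> parseStyle(String style) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = DECLARATION.matcher(style);
        while (m.find()) {
            result.put(m.group(1).toLowerCase(), m.group(2));
        }
        return result;
    }

    /** Parses the fragment with the JDK's DOM parser and collects text runs in document order. */
    public static List<Run> parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPEs so untrusted form data cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        List<Run> runs = new ArrayList<>();
        collect(doc.getDocumentElement(), new LinkedHashMap<>(), runs);
        return runs;
    }

    // Each element copies its parent's declarations and overrides them with its own.
    private static void collect(Element element, Map<String, String> inherited, List<Run> runs) {
        Map<String, String> style = new LinkedHashMap<>(inherited);
        if (element.hasAttribute("style")) {
            style.putAll(parseStyle(element.getAttribute("style")));
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue();
                if (!text.isEmpty()) {
                    runs.add(new Run(text, style));
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, style, runs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p style=\"font-size:12pt; color:#ff0000\">Hello "
                + "<span style=\"font-weight:bold\">world</span></p></body>";
        for (Run run : parse(xhtml)) {
            System.out.println("'" + run.text + "' " + run.style);
        }
        // prints:
        // 'Hello ' {font-size=12pt, color=#ff0000}
        // 'world' {font-size=12pt, color=#ff0000, font-weight=bold}
    }
}
```

Copying the parent's declarations into each child is a simplified stand-in for CSS inheritance; a real implementation would still need to map these runs onto fonts and line layout when generating the field appearance.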
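The package-private approach described in the comment can be sketched in plain Java. In this hypothetical example, `AcroFormSketch` and `FieldLayout` are invented names rather than PDFBox classes, and simple character-count word wrapping stands in for real font-metric layout. Callers only see the public entry point, while the layout helper stays package-private.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical public entry point: the only type callers outside the package can use. */
public class AcroFormSketch {
    private final Map<String, String> values = new LinkedHashMap<>();

    public void setValue(String fieldName, String value) {
        values.put(fieldName, value);
    }

    /** Returns the lines an appearance for the field would draw (empty if the field is unset). */
    public List<String> appearanceLines(String fieldName, int maxChars) {
        return FieldLayout.wrap(values.getOrDefault(fieldName, ""), maxChars);
    }
}

/** Hypothetical package-private helper: can change between releases without breaking callers. */
final class FieldLayout {
    private FieldLayout() {
    }

    /** Greedy word wrap by character count; a word longer than maxChars gets its own line. */
    static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // splitting an empty string yields a single empty token
            }
            if (line.length() > 0 && line.length() + 1 + word.length() > maxChars) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

Because `FieldLayout` is invisible outside its package, replacing the character-count wrap with a font-aware one, or later promoting the helper to a public API, would not be a breaking change for code that only uses `AcroFormSketch`.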
> Decouple implementation specific forms handling from interactive.form PD Model
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2580
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2580
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: AcroForm
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>             Fix For: 2.0.0
>
>         Attachments: sonar.png
>
>
> The interactive.form PD model currently holds classes reflecting the various
> fields intermixed with appearance generation and layout handling.
> In order to separate the PD model from the service of forms filling and
> appearance generation this functionality shall be moved into a new package.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)