Hi Jeremias,

I'll address your concerns inline:

<snip/>

> First, I would like to suggest you start with listing relevant code
> portions on the wiki before anything else. Where is what? And what's the
> problem with each case? Right now there are only a few vague pointers
> and an underlying unhappiness with the various approaches.

My apologies for any lack of clarity on my part; the issues around URI
resolution are in no way due to any unhappiness on my part. In fact,
rather frustratingly, great effort has been made to have fallback
mechanisms to allow for ambiguity from the user (which I'm sure we all
know happens all too often). However, as far as I can tell, URI
resolution is a single problem and, as such, should have a single
solution. Now, I do appreciate there are nuances involved, but allowing
for them (which I'll discuss later), there should be a single URI
resolution mechanism.

The problem I'm trying to tackle here is: how do we sandbox FOP's file
access? With the current implementation, that isn't possible. There's
too much contingency, i.e. if this resolved URI doesn't exist, check
this one. As I've said, in the cloud we have to be very strict: we
cannot allow one user to gain access (intentionally or otherwise) to
another user's data.

<snip/>

> I think it can be useful to think about simplifying the use of
> URIResolver to new interface for resource resolution where only
> InputStreams are required. IMO, it should then still be possible to use
> a URIResolver to resolve those URIs. An adapter for URIResolvers should
> be possible to write.

I spent quite some time deliberating on which approach would be best:
returning an InputStream or a Source object. The problem is, the only
time FOP actually reads XML is when parsing SVG. Even reading the FO is
done by the JAXP transformer. So I do appreciate the JAXP system is
tried and tested, but using that API isn't the best approach, IMO. The
reason is that every time we want to convert a Source object to an
InputStream, we need to rewrite the code that does so, which is
non-trivial since that is where the URI is actually resolved. We could
cast Source to StreamSource, but that returns an InputStream anyway.

<snip/>

> I get the impression that you're suggesting that only a single base URI
> (on the FopFactory?) is required.
> In the past, we've had to add multiple
> base URIs precisely because there isn't a single base URI. Some URIs
> need to be resolved relative to the input FO document (base URI on
> FOUserAgent.base). Or they need to be resolved relative to the XSLT
> stylesheet in use (images may or may not be stored next to the XSLT
> stylesheets). Fonts (FontManager.fontBase) and hyphenation patterns
> (FopFactory.hyphenBase) may be at a different location respectively. Or
> they could simply be relative to the main configuration file
> (FopFactory.base). Granted, that adds complexity but also flexibility
> for those who need it.

So here's the problem: whichever client is calling FOP gives it a URI
resolver. All this resolver does is convert a URI to an InputStream; it
shouldn't need to hold any state (i.e. a base URI). Now, having all
these base URIs is going to get pretty confusing, no? I think all that's
needed is font-base and base (defined in the fop.xconf).

Without getting too much into the nuts and bolts, this is where the
wrapper comes in. The wrapper holds the state (defined by the user when
FopFactory is instantiated and/or in the fop.xconf), and it can resolve
against the base, giving the user-defined resolver an absolute URI to
read from. This would allow users to define their own URIs and a single
resolution mechanism. Not only does this give the security of
sandboxing, it also allows the full flexibility of URIs to be exploited.
The user can define their own schemes, queries etc., and the resource
being read doesn't even need to be on the file system. It could be in a
database, a remote resource, whatever, as long as it can be resolved and
converted to an InputStream.

>
> Looking at that, the signature "InputStream getInputStream(URI)" may be
> insufficient.
> Like in URIResolver, you may need to extend that to
> "InputStream getInputStream(URI resource, URI base)", so you can get the
> URI resolved against the applicable base URI of the context you're
> working in (fonts, hyph patterns, config files etc.). OTOH, we have some
> special resolution interfaces (like FontResolver) which don't have a
> base URI because it is implicit and handled by the caller. The various
> specialized resolver interfaces help decouple the various packages from
> neighbouring ones to reduce dependencies.

I think I've addressed most of these concerns above, but I believe the
user already defines a base with <font-base> or <base> in the fop.xconf.
So these should be used to resolve relative URIs.

In terms of decoupling, I don't think I could agree more. The fonts
packages especially are in dire need of some TLC, and extracting them to
their own module is what I've been pushing for. However, let's be
realistic here: as they stand, they're not a library. There is far too
much coupled to the rest of FOP, and giving them a URI resolver isn't
really adding much to the bindings. It's all done in a single class.
Also, because I plan on removing all the URI Strings, it would probably
actually help in making it more of a library. The fonts library
shouldn't have to care about URI resolution. You should give it an
InputStream and it should do what it does.

> In this context, I find it suboptimal when there are dependencies on
> org.apache.fop.apps from packages like "fonts", "pdf" or "hyph" because
> they have the potential to be used independently from FOP. More than
> once did I have to adjust changes that caused the PDF library to have
> unnecessary dependencies into new packages (ex. the FOUserAgent which
> even from its name doesn't have anything to do with a basic PDF library).
> In this spirit, I like how Victor Mote took his FOP fork (FOray) apart
> into multiple modules with clearly defined dependencies.
> We've had
> discussions about doing similar things but have not come to a consensus
> which is why we still have the huge, scary (for newbies), single source
> tree. Having the renderers in separate subprojects could allow people to
> scale FOP down to the subsets they need. Only a few really need AFP but
> it adds a lot of byte code to fop.jar. Having done a lot of OSGi on the
> past years, I have come to appreciate smaller JARs (Bundles in OSGi talk).
> This approach forces better package design and management of
> dependencies.

I think we're in danger of violently agreeing with each other here. My
plan is to move the URI resolver to XGC; as such, there'll be a single
resolver for the whole project. There will be no superfluous
dependencies floating around.

> In Batik land, the build produces a number of subsystem JARs besides the
> "all-jar" which we bundle with FOP. Having worked on Batik, I found it a
> challenge to deal with one huge file tree producing multiple JARs while
> keeping the dependencies in order. It's a bit easier in FOP but not
> everyone pays attention to this.
>
> Personally, I'd still love to see FOP split up into: core, util, hyph,
> fonts, pdf, afp, pcl, etc. and getting the XGC support and other stuff
> for SVG into Batik. Related to this:
> http://wiki.apache.org/xmlgraphics/XmlGraphicsCommonComponents
>
> But I'm getting off course...

Off course maybe, but I like the direction! I absolutely agree.

> As for "OutputStream getOutputStream(URI)", I would put that into a
> separate interface since IMO it mixes concerns. The input side should be
> easy to integrate with URIResolver, but the output side will produce a
> problem here. Usually, you only need one or the other, but rarely both
> (I think the font cache is an example of the combined case). When
> generating pages as PNG, you have a special case where we have to pass
> in a file name from which other file names are derived to produce
> multiple files (PNG is strictly single-page).
> That already warrants a
> special interface for that purpose which could use the
> "getOutputStream(URI)" interface (standard functionality currently in
> MultiFileRenderingUtil).

I respectfully disagree. URI resolution should address just that:
resolving URIs into an interface so that FOP can read bits/bytes. The
resolution mechanism should be the same regardless of whether you're
reading or writing.

> Finally, a few more words about Batik: Batik does not support
> URIResolvers or EntityResolvers which has hurt me more than once. You
> have to do tricks by either registering URL handlers or a
> ParsedURLProtocolHandler, both of which are registered in a static and
> otherwise inflexible fashion. Refactoring this would be a major tasks
> since, like URIResolvers in FOP, ParsedURL is used all over the Batik
> place. AbstractFOPImageElementBridge, for example, intercepts the
> ParsedURL (which should actually be considered a URI, not a URL) to load
> external images using the XGC image loader rather than Batik's own image
> support.

Yeah, I haven't actually looked at Batik yet. We'll cross that bridge
when we get to it. No doubt, it's going to be a barrel o' laughs. I also
haven't looked at XGC, which I do appreciate is something we need to
look at. My intention was to do this incrementally, addressing one thing
at a time, since it's so sensitive to change.

> In the end, I'd like to ask you:
> - to pay attention to package dependencies (keeping them at a minimum)
> - to avoid reducing the chance that FOP may be split up into clean
> modules in the future
> - to minimize backwards-incompatible API changes (removing
> FopFactory.setURIResolver() should not be necessary, for example)
> - to keep the ability to do plain URI -> InputStream resolution using
> URIResolver somehow.
> - to preserve the ability to use DOMSource and SAXSource as image
> sources, i.e. to not change the XGC image loaders.

I think, aside from some minor API changes, we are doing all of the
above apart from the last. I'll review XGC shortly and obviously put any
ideas in a public forum for further discussion.

<snip/>

Thanks a lot for raising your concerns here. I appreciate I may have
been a bit vague on the details on the wiki, but I've only just started
writing the actual code. I'll try to update the wiki with a bit more
information; my only worry is getting lost in minutiae.

Mehdi
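
P.S. To make the wrapper idea above a bit more concrete, here is a rough
sketch in Java. None of this is actual FOP or XGC code: the names
ResourceResolver, SandboxedResolver and SandboxDemo, the "tenant" URI
scheme, and the prefix check used to reject URIs outside the base are
all assumptions made purely for illustration. The only real API used is
java.net.URI.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-supplied resolver: absolute URI in, stream out, no state. */
interface ResourceResolver {
    InputStream getInputStream(URI absoluteUri) throws IOException;
}

/** Hypothetical wrapper: owns the base URI and delegates to the client's resolver. */
final class SandboxedResolver {
    private final URI base;
    private final ResourceResolver delegate;

    SandboxedResolver(URI base, ResourceResolver delegate) {
        // Assumption: the base must be an absolute, hierarchical "directory" URI.
        if (!base.isAbsolute() || base.isOpaque()
                || base.getPath() == null || !base.getPath().endsWith("/")) {
            throw new IllegalArgumentException("Base must be absolute and end with '/': " + base);
        }
        this.base = base.normalize();
        this.delegate = delegate;
    }

    /** Resolves a (possibly relative) URI against the base and removes "." / ".." segments. */
    URI resolve(URI uri) {
        return base.resolve(uri).normalize();
    }

    /** Resolves, refuses anything that ends up outside the base, then delegates. */
    InputStream getInputStream(URI uri) throws IOException {
        URI resolved = resolve(uri);
        // Assumption: a simple prefix check stands in for a real sandboxing policy.
        if (!resolved.toString().startsWith(base.toString())) {
            throw new IOException("Outside sandbox: " + resolved);
        }
        return delegate.getInputStream(resolved);
    }
}

final class SandboxDemo {
    public static void main(String[] args) throws IOException {
        // The resource need not live on the file system; here it is an in-memory map.
        Map<String, byte[]> store = new HashMap<>();
        store.put("tenant://user-a/docs/images/logo.txt",
                "logo".getBytes(StandardCharsets.UTF_8));

        ResourceResolver inMemory = uri -> {
            byte[] data = store.get(uri.toString());
            if (data == null) {
                throw new FileNotFoundException(uri.toString());
            }
            return new ByteArrayInputStream(data);
        };

        SandboxedResolver resolver =
                new SandboxedResolver(URI.create("tenant://user-a/docs/"), inMemory);

        // A relative URI is resolved against the base before the delegate sees it.
        try (InputStream in = resolver.getInputStream(URI.create("images/logo.txt"))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }

        // A URI that climbs out of the base is refused before any read happens.
        try {
            resolver.getInputStream(URI.create("../../user-b/secret.txt"));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the split is that the client's resolver stays stateless
and only ever sees absolute URIs, while the base lives in one place. A
string prefix check like this is not a complete security measure (it
does nothing about percent-encoded segments, for example), so treat it
as a placeholder for whatever policy we end up agreeing on.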
