Re: Thoughts about image handling

Max Berger Fri, 23 Jun 2006 00:18:17 -0700

Jeremias,

Actually, you left out the pre-loading of the image size. That's
important if you want to delay image loading until the rendering stage
(or avoid it altogether). Note that it will not always be necessary to
load an image. Sometimes only references to the images are put in the
output format. Currently, this only applies to RTF, but it could also
be used in PostScript and maybe AFP output. Special languages such as
PPML even go so far as to make it their prime purpose not to handle all
the image data but to provide the images on the target platform.

Good point. This requires parsing the header without parsing the whole image data. I know that ImageIO provides this capability, but I am not sure about JIMI and other solutions.

As a matter of fact, the ImageIO interface provides exactly the capabilities required: registering image readers, setting the image data source, and then getting all the meta-information without decoding the actual image. Unfortunately, ImageIO does not seem to support getting the "raw" image data, but that functionality is available in FOP's JpegImage.


So there are more steps:
- detect file format (may already read metadata and image if needed)
- read metadata (may read image if needed)
- read image data OR get raw image data.
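The first two steps can be sketched with the standard javax.imageio API. This assumes Java 7+ for try-with-resources, and the names ImageHeaderProbe and HeaderInfo are made up for illustration. Whether getWidth/getHeight really avoid decoding the pixels depends on the individual reader plug-in. The raw-data step is not covered, since ImageIO does not offer it.

```java
import java.awt.Dimension;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class ImageHeaderProbe {

    /** Result of "detect file format" + "read metadata"; the pixels are not decoded. */
    static final class HeaderInfo {
        final String formatName;
        final Dimension size;

        HeaderInfo(String formatName, Dimension size) {
            this.formatName = formatName;
            this.size = size;
        }
    }

    /** Returns null if no registered ImageIO reader recognizes the stream. */
    static HeaderInfo probe(InputStream in) {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                return null;
            }
            // Step 1: format detection; each registered reader SPI checks the leading bytes.
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return null;
            }
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly = true, ignoreMetadata = true: only header fields are wanted.
                reader.setInput(iis, true, true);
                // Step 2: for common formats these calls parse just the header.
                Dimension size = new Dimension(reader.getWidth(0), reader.getHeight(0));
                return new HeaderInfo(reader.getFormatName(), size);
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```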

Going towards Raster or RenderedImage for the in-memory representation
of the image is certainly a very welcome step.

What I'm missing a little is that certain images will be converted
before they are processed by the renderer. For example, Barcode4J
converts its barcodes to SVG, EPS or Java2D graphics depending on the
output format in use. Generally, each renderer will have different
preferences for how an image should be processed.


That is exactly the problem I see in the current renderers:

To render MathML or plan (in the examples) to PDF, they are first rendered as SVG, and the SVG is then rendered via Java2D into the PDF. I don't know how Barcode4J does it, but I would assume it is similar.

However, each of the renderers (or at least the AWT, PS, and PDF renderers) supports a Java2D compatibility interface, which is currently used for SVG images.


What I propose is:

- If there is a Java2D interface, offer it directly to all vector image providers.
- If there is no Java2D interface (such as in the RTF output), render the vector image into a bitmap image and use that. AWT has standard support for this; however, I do not know whether it works headless in 1.3.
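A minimal sketch of the bitmap fallback. It assumes a hypothetical Java2DPainter callback that vector image providers would implement; this is not an existing FOP interface. Painting into a BufferedImage needs no screen, so it works in headless mode on JDKs that support it (1.4 and later). Whether 1.3 can do this remains the open question above.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/** Hypothetical callback implemented by vector image providers (SVG, MathML, barcodes...). */
interface Java2DPainter {
    void paint(Graphics2D g2d, Rectangle2D area);
}

final class Rasterizer {

    /**
     * Fallback for outputs without a Java2D interface (e.g. RTF): paint the vector
     * image into an in-memory bitmap at the requested resolution.
     */
    static BufferedImage rasterize(Java2DPainter painter, double widthPt, double heightPt, int dpi) {
        double scale = dpi / 72.0; // 1 pt = 1/72 inch
        int w = (int) Math.ceil(widthPt * scale);
        int h = (int) Math.ceil(heightPt * scale);
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = img.createGraphics(); // off-screen, no display required
        try {
            g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
            g2d.scale(scale, scale); // the painter keeps working in points
            painter.paint(g2d, new Rectangle2D.Double(0, 0, widthPt, heightPt));
        } finally {
            g2d.dispose();
        }
        return img;
    }
}
```

The same painter object could be handed unchanged to a renderer that does offer a Java2D interface, which is the point of the first bullet.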

While PDF can embed TIFF
CCITT4 files directly, they have to be decoded for PCL. The ideal image
subsystem will also cache a preconverted image so the
conversion/decoding can be avoided next time the image is used.

OK. Here's an idea: use a Map from image URL to ImageInfo, where ImageInfo contains the MIME type, metadata, raw content and decoded content (if possible). Each of them may be null and will be loaded on demand. For the actual image data, a SoftReference may be used.
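A sketch of this cache, using hypothetical ImageCache and ImageInfo classes and a caller-supplied decoder function. Only the decoded image sits behind a SoftReference. The garbage collector may reclaim it under memory pressure, and it is then decoded again on the next request. The sketch is not thread-safe.

```java
import java.awt.image.RenderedImage;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-URL record; any field may still be null and is filled in on demand. */
final class ImageInfo {
    String mimeType;
    Map<String, Object> metadata;
    byte[] rawContent;
    // Decoded pixels can be large, so only a soft reference is kept; the GC may clear it.
    private SoftReference<RenderedImage> decoded = new SoftReference<>(null);

    /** Returns the decoded image, decoding it if it was never loaded or has been reclaimed. */
    RenderedImage getDecoded(Function<ImageInfo, RenderedImage> decoder) {
        RenderedImage img = decoded.get();
        if (img == null) {
            img = decoder.apply(this);
            decoded = new SoftReference<>(img);
        }
        return img;
    }
}

/** Hypothetical cache keyed by image URL (not thread-safe in this sketch). */
final class ImageCache {
    private final Map<String, ImageInfo> byUrl = new HashMap<>();

    /** Returns the (possibly still empty) record for a URL, creating it on first use. */
    ImageInfo get(String url) {
        return byUrl.computeIfAbsent(url, u -> new ImageInfo());
    }
}
```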

I don't think that'll work considering the above. I rather think the
Renderer will have to tell the image subsystem the preferred flavor of
the image. It will then receive the image in the right form if that is
possible.
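One way the renderer could state its preferred flavor is sketched below. It uses a hypothetical ImageFlavor enum and conversion table; neither exists in FOP. The renderer passes the flavors it accepts, best first, and gets back the first one the subsystem can deliver for the given image.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical representations an image can be delivered in. */
enum ImageFlavor { RAW_JPEG, RAW_EPS, SVG_DOM, GRAPHICS2D, RENDERED_IMAGE }

final class FlavorNegotiator {
    // Hypothetical table: flavors the subsystem can produce from a given source flavor.
    private final Map<ImageFlavor, Set<ImageFlavor>> conversions;

    FlavorNegotiator(Map<ImageFlavor, Set<ImageFlavor>> conversions) {
        this.conversions = conversions;
    }

    /** Returns the first preferred flavor that is the source itself or reachable from it. */
    Optional<ImageFlavor> negotiate(ImageFlavor source, List<ImageFlavor> rendererPreferences) {
        Set<ImageFlavor> reachable =
                conversions.getOrDefault(source, Collections.<ImageFlavor>emptySet());
        for (ImageFlavor wanted : rendererPreferences) {
            if (wanted == source || reachable.contains(wanted)) {
                return Optional.of(wanted);
            }
        }
        return Optional.empty(); // the image cannot be delivered in any acceptable form
    }
}
```

For instance, with a table saying SVG can become Graphics2D or a bitmap, a PDF renderer listing GRAPHICS2D before RENDERED_IMAGE would get Java2D calls. An RTF renderer listing only RAW_JPEG and RENDERED_IMAGE would get the bitmap.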


How about this (for bitmaps):

An image has a "native" format, which describes the raw stream if possible. Typical values are EPS, DCT, RLE, LZW, CCITT, and so on. The renderer can then check if it has support for this raw data type. If so, it will use it. If not, it will have to use the decoded Raster data.
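The raw-versus-decoded decision for bitmaps reduces to a set lookup. NativeFormat and RawPassthrough are hypothetical names. Each renderer would have to declare which encodings it can pass through unchanged; the PDF-like set below is only illustrative, based on PDF filters such as DCTDecode and CCITTFaxDecode.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical "native" encodings of an image's raw stream; NONE means no usable raw form. */
enum NativeFormat { NONE, EPS, DCT, RLE, LZW, CCITT }

final class RawPassthrough {
    enum Strategy { EMBED_RAW, DECODE_RASTER }

    // Illustrative only: encodings a PDF-like renderer might accept as-is.
    static final Set<NativeFormat> PDF_LIKE =
            EnumSet.of(NativeFormat.DCT, NativeFormat.CCITT, NativeFormat.LZW, NativeFormat.RLE);

    /** Use the raw stream if the renderer can embed that encoding, otherwise decode to a Raster. */
    static Strategy choose(NativeFormat imageFormat, Set<NativeFormat> rawFormatsAccepted) {
        if (imageFormat != NativeFormat.NONE && rawFormatsAccepted.contains(imageFormat)) {
            return Strategy.EMBED_RAW;
        }
        return Strategy.DECODE_RASTER;
    }
}
```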

We can provide additional compressors which will take raster data and provide raw data in one of the lossless formats (gzip, rle). They can be used by the renderer to reduce file size.
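A lossless compressor for decoded raster bytes can be built on java.util.zip.Deflater; the class name FlateCompressor is hypothetical. Deflater's default output is the zlib format that PDF's FlateDecode filter expects, not a gzip file. An RLE compressor would be a separate class.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

final class FlateCompressor {

    /** Losslessly compresses raster bytes into a zlib (Flate) stream. */
    static byte[] compress(byte[] rasterBytes) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(rasterBytes);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream(rasterBytes.length / 2 + 64);
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }
}
```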


To support extensibility, a registration mechanism is provided. Here
is the basic idea:

Java provides standard mechanisms to find all resources with a given
name in all classpath items. This makes it possible to find all
META-INF/MANIFEST.MF files in all JAR files on the classpath (1). These
files can be parsed using the standard Manifest functionality.

The files contain some attributes that describe classes used. For
image handlers, this could be a classname and the supported image
type. It may contain additional attributes, such as supported
subtypes (e.g. LZW for TIFF). Ideally the exact specification of
these attributes would be coordinated between FOP and FOray to
support reuse.

This information can be parsed once and stored.

This mechanism requires the user to change only the classpath, and
nothing else.
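The manifest-based registration could look roughly like this. The attribute names X-Image-Handler-Class and X-Image-Handler-Types are invented placeholders for whatever FOP and FOray would agree on. ClassLoader.getResources, Manifest and Attributes are standard JDK classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

final class ManifestImageHandlerScanner {
    // Hypothetical attribute names, to be agreed on between projects.
    static final Attributes.Name HANDLER_CLASS = new Attributes.Name("X-Image-Handler-Class");
    static final Attributes.Name HANDLER_TYPES = new Attributes.Name("X-Image-Handler-Types");

    /** One registration: handler class name plus the types/subtypes it claims. */
    static final class Entry {
        final String className;
        final List<String> types;

        Entry(String className, List<String> types) {
            this.className = className;
            this.types = types;
        }
    }

    /** Reads every META-INF/MANIFEST.MF visible to the loader; meant to run once, then cached. */
    static List<Entry> scan(ClassLoader loader) {
        List<Entry> result = new ArrayList<>();
        try {
            Enumeration<URL> manifests = loader.getResources("META-INF/MANIFEST.MF");
            while (manifests.hasMoreElements()) {
                try (InputStream in = manifests.nextElement().openStream()) {
                    Entry entry = parse(new Manifest(in));
                    if (entry != null) {
                        result.add(entry);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Returns null if the manifest does not declare an image handler. */
    static Entry parse(Manifest manifest) {
        Attributes main = manifest.getMainAttributes();
        String cls = main.getValue(HANDLER_CLASS);
        if (cls == null) {
            return null;
        }
        List<String> types = new ArrayList<>();
        String declared = main.getValue(HANDLER_TYPES);
        if (declared != null) {
            for (String t : declared.split(",")) { // comma-separated list, e.g. "image/tiff, ..."
                if (!t.trim().isEmpty()) {
                    types.add(t.trim());
                }
            }
        }
        return new Entry(cls.trim(), types);
    }
}
```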


OK, something like that sounds pretty good. It remains to be seen whether
the config needs to be in a file or rather in a factory class like we've
done before (example: AbstractRendererMaker). The only thing left
might be the question of how to handle priorities if two implementations
support the same kind of image.

What you describe here is already in use in FOP and Batik. We don't use
the MANIFEST.MF directly but the class name of the provider
class/interface. See [1] and [2]. Let's reuse what we already have.

[1] http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/META-INF/services/
[2] http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/util/Service.java?view=log
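For comparison, the META-INF/services convention uses one file per interface, named after that interface. The file lists one implementation class per line and allows '#' comments. The sketch below parses such a file and also shows java.util.ServiceLoader, the standard JDK lookup for the same convention since Java 6. ImageProviderSpi is a hypothetical interface, and the xmlgraphics Service class referenced in [2] is not used here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical SPI that an image provider would implement. */
interface ImageProviderSpi {
    String getSupportedMimeType();
}

final class ServiceDiscovery {

    /** Parses one services file: one class name per line, '#' starts a comment. */
    static List<String> parseServiceFile(Reader reader) {
        List<String> classNames = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash);
                }
                line = line.trim();
                if (!line.isEmpty()) {
                    classNames.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return classNames;
    }

    /** Java 6+: ServiceLoader finds, loads and instantiates all registered providers. */
    static List<ImageProviderSpi> loadProviders(ClassLoader loader) {
        List<ImageProviderSpi> providers = new ArrayList<>();
        for (ImageProviderSpi p : ServiceLoader.load(ImageProviderSpi.class, loader)) {
            providers.add(p);
        }
        return providers;
    }
}
```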

Ok. This solution is much better than the one that I had in mind. Especially since it is way more generic and already tested.

Since it does not provide support for additional meta-information, all of this must be extracted from factory code, which is feasible. In the case of ImageIO, extracting capabilities in code is actually better: ImageIO can then be asked which image types are actually supported by the local JDK.
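Asking ImageIO at runtime which formats the local JDK can read takes only a couple of static calls; ImageIOCapabilities is a hypothetical wrapper name. The result also includes any ImageIO plug-ins that were on the classpath when ImageIO registered its readers.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.imageio.ImageIO;

final class ImageIOCapabilities {

    /** All MIME types for which a reader is registered, normalized to lower case. */
    static Set<String> readableMimeTypes() {
        Set<String> types = new TreeSet<>();
        for (String t : ImageIO.getReaderMIMETypes()) {
            types.add(t.toLowerCase(Locale.ROOT));
        }
        return types;
    }

    /** True if at least one registered reader claims this MIME type. */
    static boolean canRead(String mimeType) {
        return ImageIO.getImageReadersByMIMEType(mimeType).hasNext();
    }
}
```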


To check which implementation to prefer, three things must be considered:

- capabilities: ImageIO's JPEG reader can read header data without reading the image data, whereas the existing JpegImage provides support for the raw stream but always reads the whole image. This may be a question of speed vs. size that can only be answered by a user configuration file. Some image providers may not support all subtypes, such as LZW-compressed TIFF files.

- speed of the implementation: this is a good question. Which one is faster, ImageIO or JIMI?

- It may (or may not) make sense to use different implementations in different steps of the process or in different renderers. The disadvantage is that the URL will have to be re-opened. It may (or may not) also make sense to use different implementations depending on the origin of the image: local images can be read as RandomAccessFiles, while remote images are usually forward-only streams.
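These considerations can be combined into a selection rule, sketched here with hypothetical ProviderDescriptor and ProviderSelector classes. Capabilities filter the candidates, a user-configured priority decides first, and header-only reading breaks ties. Speed is not measured; it enters only through the user priority.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical description of an image provider (e.g. ImageIO, JIMI, FOP's JpegImage). */
final class ProviderDescriptor {
    final String name;
    final String mimeType;
    final boolean headerOnly;                // can read size/metadata without decoding pixels
    final boolean rawStream;                 // can hand out the undecoded stream
    final List<String> unsupportedSubtypes;  // e.g. "lzw" for TIFF

    ProviderDescriptor(String name, String mimeType, boolean headerOnly, boolean rawStream,
            List<String> unsupportedSubtypes) {
        this.name = name;
        this.mimeType = mimeType;
        this.headerOnly = headerOnly;
        this.rawStream = rawStream;
        this.unsupportedSubtypes = unsupportedSubtypes;
    }
}

final class ProviderSelector {

    /** Returns the best provider for one processing step, or null if none qualifies. */
    static ProviderDescriptor select(List<ProviderDescriptor> candidates, String mimeType,
            String subtype, boolean needRaw, Map<String, Integer> userPriority) {
        // Higher user priority wins; among equals, header-only readers are preferred.
        Comparator<ProviderDescriptor> order =
                Comparator.comparingInt((ProviderDescriptor p) -> userPriority.getOrDefault(p.name, 0))
                        .thenComparingInt(p -> p.headerOnly ? 1 : 0);
        return candidates.stream()
                .filter(p -> p.mimeType.equals(mimeType))
                .filter(p -> subtype == null || !p.unsupportedSubtypes.contains(subtype))
                .filter(p -> !needRaw || p.rawStream)
                .max(order)
                .orElse(null);
    }
}
```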

Questions? Comments?

Have you seen this Wiki page? http://wiki.apache.org/xmlgraphics-fop/ImageSupport

I've just looked at it; it addresses pretty much the same issues covered in this mail.

I'm happy to see that you volunteer to work in this area. It's something
I've wanted to fix for a long time, but it always had a lower priority. I
envisioned a slightly different direction, as you can guess from my
comments, but this is still open for discussion.

I'd rather discuss and bounce ideas back and forth first than write tons of code that needs to be revised. This is not my main project either, but it is good to have something to divert myself with from time to time :)

Jeremias Maerki


Max Berger

