Hi,

I am currently thinking about the interface to use for reading OpenType files.
There are two possibilities:

- reading on top of an InputStream, or
- reading on top of a RandomAccessFile or FileChannel.

Currently the implementation in FOP uses the class FontFileReader, which expects an InputStream. But it immediately calls IOUtils.toByteArray(in) and works on that byte array instead, so it has to hold the complete file in memory.

FontBox, which is part of PDFBox, uses an abstract class called TTFDataStream with template methods. It has two implementations: RAFDataStream, which operates on top of a RandomAccessFile, and MemoryTTFDataStream, which operates on top of a byte array.

I started out using pure InputStreams. That means I implemented the whole OpenType file reading as a hierarchy of FilterInputStreams. At the lowest level I have my own DataInputStream, which wraps any InputStream and provides methods to read the basic OpenType data types, just like java.io.DataInputStream does for the Java data types. On top of that I have streams that read small-scale data structures, then streams that read whole tables, and finally a stream that reads the whole OpenType file.

To read an OpenType file, all you have to write is:

    InputStream in = ...
    OpenTypeFileInputStream otfIn = new OpenTypeFileInputStream(in);
    OpenTypeFile otf = otfIn.readOpenTypeFile();

In my opinion this system works really well. You can use any InputStream, the reading is decoupled from the OpenType classes themselves, and you can test pieces of the OpenType structure using only the individual streams.

But! My approach has one flaw: I need to seek extensively while reading an OpenType file. The whole file format consists of headers with offsets and data structures which one has to read from those offsets. To get this seeking to work with streams, I use mark(), reset() and skip(). My common approach at the beginning of such a structure is to mark, then read the header, and for every part reset to the start, mark again, skip to the offset and read the part.
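
To make the pattern concrete, here is a minimal sketch on a toy structure. The layout (a uint16 count, then uint16 offsets relative to the structure start, each pointing at one uint16 value) and the names MarkResetSkipExample, readParts and skipFully are assumptions made up for illustration. They are neither a real OpenType table nor the actual stream classes described above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical example: reads a toy structure made of an offset header plus parts. */
final class MarkResetSkipExample {

    /**
     * Assumed toy layout: uint16 count, then count uint16 offsets relative to
     * the start of the structure, each offset pointing at one uint16 value.
     */
    static int[] readParts(InputStream in, int readLimit) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        DataInputStream data = new DataInputStream(in);
        data.mark(readLimit);                 // remember the start of the structure
        int count = data.readUnsignedShort();
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = data.readUnsignedShort();
        }
        int[] parts = new int[count];
        for (int i = 0; i < count; i++) {
            data.reset();                     // back to the start of the structure
            data.mark(readLimit);             // mark again for the next part
            skipFully(data, offsets[i]);      // forward to the part
            parts[i] = data.readUnsignedShort();
        }
        return parts;
    }

    /** InputStream.skip may skip fewer bytes than requested, so loop. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                throw new IOException("could not skip " + n + " more bytes");
            }
            n -= skipped;
        }
    }
}
```

Note that the read limit passed to mark() has to cover the whole structure, which is exactly where the memory cost comes from when the underlying stream has to buffer everything since the mark.
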
But with this approach I end up holding the whole file in memory. To make it worse, the mark(), reset(), skip() interface doesn't support hierarchical marking. If I seek inside smaller-scale structures, the mark position of the larger-scale structure is overwritten. I don't think that it is possible to build hierarchical mark support on top of any markable InputStream. (Oh look, I did it later as I wrote this longish mail.) I think one has to reimplement BufferedInputStream, holding one's own byte array. In fact I did this on top of ByteArrayInputStream.

The key problem is that one can't get a position out of an InputStream, which is not surprising, because the stream concept has no notion of a position.

It is possible to read the parts in offset order. But there are duplicated offsets (more than one offset pointing to the same part), and there are parts that have to go into an array in a semantic order which doesn't have to be the offset order. So I first have to reorder the offsets to read the parts in offset order, and then I have to reorder the parts I read to get them back into the semantic order. It is still possible that the offsets are in fact always in the semantic order of the parts, but the spec doesn't say so.

I don't want to depend on RandomAccessFile or FileChannel, because I need to be able to test reading of substructures out of byte arrays.

What I need is an interface from which I can read bytes and which allows multiple relative seeks. By multiple relative seeks I mean something like multiple marks.

As I wrote this, I implemented such a thing inside my DataInputStream. There is now a method:

    public SkipHandle mark();

and the SkipHandle class looks like this:

    public class SkipHandle {
        private final long relativePos;
        public void skipTo(long offset);
    }

SkipHandle is a non-static inner class of DataInputStream. DataInputStream counts the bytes read and skipped, so it knows its current position.
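
A minimal sketch of how such a position-counting stream with handles could look. The class name SeekingInputStream and all method bodies are assumptions for illustration, not the actual implementation. Only the mark()/skipTo() outline and the relativePos field are taken from the description above.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch; the name SeekingInputStream is not from the original code. */
class SeekingInputStream extends FilterInputStream {

    private long position; // bytes consumed since the start of the stream

    SeekingInputStream(InputStream in) {
        super(in);
        if (!in.markSupported()) {
            throw new IllegalArgumentException("mark/reset support required");
        }
        in.mark(Integer.MAX_VALUE); // keep the start of the stream reachable
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) position++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) position += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        if (skipped > 0) position += skipped;
        return skipped;
    }

    // Standard marking is disabled so callers cannot overwrite the start mark.
    @Override public boolean markSupported() { return false; }
    @Override public void mark(int readLimit) { }
    @Override public void reset() throws IOException {
        throw new IOException("mark/reset not supported; use mark() handles");
    }

    /** Creates a handle anchored at the current position. */
    SkipHandle mark() { return new SkipHandle(); }

    final class SkipHandle {
        private final long relativePos = position;

        /** Moves the stream to relativePos + offset. */
        void skipTo(long offset) throws IOException {
            long target = relativePos + offset;
            if (target < position) {
                in.reset();                 // rewind to the start of the stream
                in.mark(Integer.MAX_VALUE);
                position = 0;
            }
            long remaining = target - position;
            while (remaining > 0) {
                long skipped = skip(remaining);
                if (skipped <= 0) throw new EOFException();
                remaining -= skipped;
            }
        }
    }
}
```

Because every handle stores an absolute position, a handle taken for an outer structure stays valid no matter how many handles are created and used for nested structures.
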
The SkipHandle gets the current stream position on creation, so that it is able to skip on the DataInputStream relative to its creation position. If the skip would be negative, SkipHandle resets the whole stream to the start (a normal mark is set when the DataInputStream is created) and skips forward afterwards.

It works, but I find it a little bit ugly. First, I have to set a mark(Integer.MAX_VALUE) on DataInputStream creation, because I always want to be able to reset the whole stream, but I don't have any information about how many bytes are still to come. Then I have to disable mark support on my DataInputStream so that nobody kills my own mark. But the biggest problem is that DataInputStream now has a non-standard mark(), skipTo() API. It's not like a normal FilterInputStream anymore. You can't use normal marking, because it's disabled, and you have to learn this new API instead.

Streams simply aren't the right API for reading stuff like OpenType files, which require massive seeking. But all the seekable APIs are tied to files. The TTFDataStream API of FontBox is completely custom, and I would like to avoid such things. So I simply don't know of a standard Java API which allows byte reading and seeking over an arbitrary source and throws IOExceptions from its methods.

What about NIO? I don't see any skipping or seeking on channels.

Any idea is welcome.

Best Regards
Alex

--
e-mail: alexanderk...@gmx.net
web: www.alexanderkiel.net