Re: Writing PDF Documents and other source code parts

Adrian Cumiskey Thu, 01 Oct 2009 10:19:41 -0700

Hi Alexander,

I feel your pain bro [1] - I recommend you have a good read through the mail archives. This is unfortunately a very familiar story when trying to add new features to the FOP code base. Fonts are probably the one area that is most in need of refactoring. I get a reassuring feeling that you are really trying to do things the right way, and I'm sure any improvements you are able to make there will be greatly appreciated. Good luck!


Adrian.

[1] http://fop-dev.markmail.org/message/re7w3pwvc64amuwq?q=god+object

Alexander Kiel wrote:

Hi,

I know my goal is to implement basic OpenType support for FOP. But from
font subsetting/embedding my eyes touched the actual PDF output
routines.

I think that this module needs refactoring. If you have a look at the
PDFWritable interface, there is a really ugly method. The method
outputInline takes an OutputStream and a Writer, which are related to
each other. The comment says that the writer is buffered and every time
you want to write something to the OutputStream, you have to flush the
Writer first. That's crude.

What is really needed is some output interface which is able to do both,
write chars and write bytes.
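
A minimal sketch of such an output interface, assuming a single sink that
encodes text itself so there is no second buffer to flush. The name
PdfOutput, its methods and the ISO-8859-1 encoding are assumptions made
for illustration, not part of FOP:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sink that accepts text and raw bytes in strict call order. */
final class PdfOutput {
    private final OutputStream out;
    private long position; // bytes written so far, e.g. for xref offsets

    PdfOutput(OutputStream out) {
        this.out = out;
    }

    /** Encodes the text (ISO-8859-1 is an assumption) and writes it at once. */
    void write(String text) throws IOException {
        write(text.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Writes raw bytes, e.g. the content of an embedded font stream. */
    void write(byte[] bytes) throws IOException {
        out.write(bytes);
        position += bytes.length;
    }

    /** Returns the number of bytes written through this sink. */
    long position() {
        return position;
    }
}
```

Because both write methods end up in the same OutputStream call, text and
binary data cannot get out of order and the caller never has to flush.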

I also had a look at PDFBox regarding writing PDFs. Maybe we shouldn't
refactor FOP's own, maybe somewhat legacy, PDF code. But I don't like the
PDFBox code either.

So I'm a bit helpless now. The problem is, regardless of what code I
see, let it be:

TTFSubSetFile

    This is all about reading a TrueType file, taking some glyph
    mapping (the glyphs used) into account and returning a byte array
    which contains the bytes of a TrueType file with the subset of
    glyphs. This thing extends TTFFile, which is about representing a
    TrueType file mixed with all the reading stuff. Here, reading,
    writing and representing some real world object are mixed in a
    really ugly way.

PDFFactory

    This class does two things: creating and registering PDF objects.
    A factory should only create objects. Then this class has nearly
    1800 lines of code. Maybe it is a factory of too many things?

    If I look at the method which interests me, "makeFontFile", the
    comment says: "Embeds a font.", but the method name is
    "makeFontFile". "makeFontFile" makes sense in a factory. But
    "Embeds a font." hints that this created font file is actually
    embedded in the PDF document. Then this method has nearly 100
    lines of code, which do all sorts of things that I can't
    understand quickly. In some line the TTFSubSetFile is created and
    the resulting bytes go into some PDFTTFStream - okay.

    So do not wonder about memory problems. Here you have whole
    300 kb+ fonts sitting in arrays.

MultiByteFont

    It seems to me that the MultiByteFont tracks the glyph usage:
    "getUsedGlyphs", "mapChar", "subSet". I always thought that
    fonts are immutable objects, representing a font program which
    can be shared all over the application. Enjoy building
    a common font source in FOP!
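
One way to keep the font itself immutable is to move the usage tracking
into a separate per-document object. This is only a sketch under that
assumption; SharedFont and GlyphUsage are hypothetical names and do not
exist in FOP:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical immutable font: maps code points to glyph indices only. */
final class SharedFont {
    private final Map<Integer, Integer> charToGlyph;

    SharedFont(Map<Integer, Integer> charToGlyph) {
        // Defensive copy, so later changes by the caller cannot leak in.
        this.charToGlyph =
                Collections.unmodifiableMap(new LinkedHashMap<>(charToGlyph));
    }

    /** Returns the glyph index for a code point, or 0 (.notdef) if unmapped. */
    int glyphFor(int codePoint) {
        Integer glyph = charToGlyph.get(codePoint);
        return glyph == null ? 0 : glyph;
    }
}

/** Hypothetical per-document tracker of the glyphs that were actually used. */
final class GlyphUsage {
    private final SharedFont font;
    private final Map<Integer, Integer> originalToSubset = new LinkedHashMap<>();

    GlyphUsage(SharedFont font) {
        this.font = font;
        originalToSubset.put(0, 0); // keep .notdef as glyph 0 of the subset
    }

    /** Maps a code point to its subset glyph index, registering it on first use. */
    int mapChar(int codePoint) {
        int glyph = font.glyphFor(codePoint);
        Integer subsetIndex = originalToSubset.get(glyph);
        if (subsetIndex == null) {
            subsetIndex = originalToSubset.size();
            originalToSubset.put(glyph, subsetIndex);
        }
        return subsetIndex;
    }

    /** Read-only view: original glyph index to subset glyph index. */
    Map<Integer, Integer> usedGlyphs() {
        return Collections.unmodifiableMap(originalToSubset);
    }
}
```

With this split, one SharedFont instance can serve many documents, and each
document hands its own GlyphUsage to whatever builds the subset.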

I don't know how I should integrate my own code into it. I think a lot
of refactoring is necessary here in order to get the FOP parts into some
state where I can integrate new code.

But I'm not sure where to start, not sure if there are enough tests. I
don't know the overall structure. I'm simply a bit helpless.

I have a nice fonts.opentype package here with 155 classes and 279 tests
covering 93 % of the classes and 80 % of the lines. I can already read
all of the TrueType metrics and OpenType kerning info. I have a class of
every entity of the OpenType spec and a Reader for every such class.
That means you can test reading every substructure alone. I think that
this is a really nice API for reading OpenType files.
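
To illustrate the entity-plus-Reader split described above, here is a
sketch for one small structure, a kerning pair of a format 0 'kern'
subtable (left glyph uint16, right glyph uint16, value int16, all
big-endian). The class names are hypothetical and are not the ones from
the fonts.opentype package; the writer is included only to show that it
can live beside the reader instead of inside the entity:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/** Hypothetical immutable entity: one kerning pair, no I/O code in it. */
final class KernPair {
    final int left;    // left glyph index (uint16)
    final int right;   // right glyph index (uint16)
    final short value; // kerning value in font design units (int16)

    KernPair(int left, int right, short value) {
        this.left = left;
        this.right = right;
        this.value = value;
    }
}

/** Hypothetical reader: knows only the binary layout of a KernPair. */
final class KernPairReader {
    KernPair read(DataInput in) throws IOException {
        // DataInput is big-endian, which matches the OpenType byte order.
        int left = in.readUnsignedShort();
        int right = in.readUnsignedShort();
        short value = in.readShort();
        return new KernPair(left, right, value);
    }
}

/** Hypothetical writer: the inverse of the reader, kept separate from it. */
final class KernPairWriter {
    void write(KernPair pair, DataOutput out) throws IOException {
        out.writeShort(pair.left);
        out.writeShort(pair.right);
        out.writeShort(pair.value);
    }
}
```

A test can then feed six bytes into KernPairReader without opening a
whole font file.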

So now that I have seen what TTFSubSetFile really has to do, I will start
adding write support for OpenType files. Then I will write some
manipulation routine which can build a subset of a file. But I don't like
to get the glyph mapping info for this manipulation from a MultiByteFont,
which should really be immutable.

I found it sufficient to write a KerningMapBuilder which stuffs kerning
pairs into a really nice double nested Map construction. As the comment
on CustomFont#replaceKerningMap says:

    the kerning map (Map<Integer, Map<Integer, Integer>, the integers are character codes)


Such a highly specialized, self-explaining, problem-oriented data
structure is spread all over the font system. Know your tools!
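
A dedicated type could keep the nested map out of the rest of the font
system. The following is only a sketch of that idea; KerningTable and its
Builder are hypothetical names, not existing FOP classes, and returning 0
for an unknown pair is an assumption:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical kerning type that hides the nested map behind one method. */
final class KerningTable {
    private final Map<Integer, Map<Integer, Integer>> pairs;

    private KerningTable(Map<Integer, Map<Integer, Integer>> pairs) {
        this.pairs = pairs;
    }

    /** Returns the kerning adjustment for a pair, or 0 if none is defined. */
    int adjustment(int left, int right) {
        Map<Integer, Integer> byRight = pairs.get(left);
        if (byRight == null) {
            return 0;
        }
        Integer value = byRight.get(right);
        return value == null ? 0 : value;
    }

    /** Collects pairs and produces a table that can no longer be changed. */
    static final class Builder {
        private final Map<Integer, Map<Integer, Integer>> pairs = new HashMap<>();

        Builder add(int left, int right, int value) {
            Map<Integer, Integer> byRight = pairs.get(left);
            if (byRight == null) {
                byRight = new HashMap<>();
                pairs.put(left, byRight);
            }
            byRight.put(right, value);
            return this;
        }

        KerningTable build() {
            // Deep copy, so later builder calls cannot change a built table.
            Map<Integer, Map<Integer, Integer>> copy = new HashMap<>();
            for (Map.Entry<Integer, Map<Integer, Integer>> e : pairs.entrySet()) {
                copy.put(e.getKey(), new HashMap<>(e.getValue()));
            }
            return new KerningTable(copy);
        }
    }
}
```

Callers would then ask table.adjustment(left, right) and never see a
Map<Integer, Map<Integer, Integer>> at all.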

So where to start?

Best Regards
Alex
