On 04.12.2008 23:25:57 Andreas Delmelle wrote:
> On 04 Dec 2008, at 23:05, egibler wrote:
> 
> >
> > I think we uncovered the culprit!
> >
> > The files we received in the 0.2 version used an 'OCRAExtended' font,
> > which was TrueType and Encoding: Ansi.
> >
> > The files we receive now with 0.95 have 'OCRAExtended', which is
> > TrueType (CID) and Encoding: Identity-H.
> >
> > From what I've read, the CID font is basically for foreign languages
> > with huge character sets.  Is this correct?
> 
> Basically, yes.

But not just that. It doesn't really take an "exotic" language for
CID fonts to become interesting.

> > Is there a reason that the guys who make the PDF need this CID font
> > in version 0.95?  The only characters used are numbers (0-9).
> 
> I'm not sure. I'll leave the detailed explanation to those who are  
> more familiar with that part of the code (if they decide to chime in),  
> but my best guess would be that the on-the-fly metrics generation  
> (introduced in later versions) somehow defaults to CID font-metrics (?)

Yes, we do CID fonts for TrueType fonts by default because that's the
most versatile approach. I was not aware that this can slow down
Acrobat's PDF concatenation this much. If Adobe had chosen to simply
embed the font multiple times (as other tools probably would), the whole
thing would be much faster. But the PDF file would obviously be bigger
in the end.

The thing with this choice is that we currently cannot switch from
single byte handling to multi byte handling when necessary. The
architecture of the font subsystem doesn't currently allow that. So we
have to make a decision beforehand which approach to take. And currently,
if you don't use an XML font metrics file, the choice is hard-coded to
CID fonts. Maybe that needs to be made configurable at least. I've
recorded an RFE in Bugzilla so this doesn't get forgotten:
https://issues.apache.org/bugzilla/show_bug.cgi?id=46348

> > I swapped the font out in my samples, and combining 200 one-page pdfs
> > dropped from 30+ minutes to a few seconds.  This would make all the
> > difference to us, if it is doable from the developers' perspective.
> >
> > What do you think?
> 
> Makes sense that this is such a drain. Using CID fonts means Acrobat  
> has to merge the glyph tables from 200 documents (and possibly change  
> references in the documents themselves too, accordingly).
> 
> If my above assumption is correct, then the issue could be alleviated  
> on the producing end. Even if no longer necessary, it is still  
> possible to manually generate a font-metrics file for the given font  
> in Ansi encoding, and reference that in the config file.

Right, that's one work-around: an XML font metrics file generated with
-ansi. The other option, since you're talking about an OCR font: you
could switch to the Type 1 variant in which case you're automatically in
the single byte realm and the problem would not appear.
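
For the first work-around, the configuration could look roughly like the
sketch below. Treat it as an illustration rather than a tested setup: the
file names and paths (OCRAEXT.TTF, ocraext.xml, /path/to/fonts) are
placeholders I made up, the classpath depends on your installation, and
the exact spelling of the TTFReader option (given here as "-enc ansi")
should be checked against the fonts page for your FOP version.

```xml
<?xml version="1.0"?>
<!--
  Step 1 (assumed command line, adjust jar names and paths):
    java -cp build/fop.jar:lib/avalon-framework.jar:lib/commons-logging.jar:lib/commons-io.jar
         org.apache.fop.fonts.apps.TTFReader -enc ansi OCRAEXT.TTF ocraext.xml
  This writes a single-byte (WinAnsi) metrics file instead of CID metrics.

  Step 2: reference that metrics file in the FOP configuration.
-->
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- Placeholder URLs: point these at the real metrics and font files. -->
        <font metrics-url="file:///path/to/fonts/ocraext.xml"
              embed-url="file:///path/to/fonts/OCRAEXT.TTF">
          <!-- The name must match the font-family used in the XSL-FO files. -->
          <font-triplet name="OCRAExtended" style="normal" weight="normal"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```

With an explicit metrics file like this, FOP uses the encoding recorded in
that file rather than the hard-coded CID default, so the font should end up
single-byte in the generated PDFs.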

> If they do not immediately know how, refer them to: 
> http://xmlgraphics.apache.org/fop/0.95/fonts.html
> 
> 
> HTH!
> 
> Cheers
> 
> Andreas
> 


Jeremias Maerki

