Hi all,

   My previous DPS patch had one big TODO item left: it was using the Adobe
encoding to display characters between 128 and 255. Well, here's a patch
that doesn't fix that yet, but does some of the infrastructure work.

   Basically, the "novelty" here is an object, PSUnicoder, which builds
custom encodings on the fly, and swaps them out if we use more than 236
glyphs at the same time. Each glyph can come from anywhere in the 16-bit
Unicode space.

   The upside is that if everything goes according to plan, we can display
(print) almost arbitrary characters.
   The downside is that the strings in the PostScript output look like line
noise, and won't be easily processable by the current Perl tools such as
ogonkify (but if practice follows theory, those shouldn't be needed anymore).

   Feedback from non-Latin-1 users would be very welcome; as dia doesn't yet
speak UTF-8 internally, you'll have to compile with
-DDEFAULT_8BIT_CHARSET=ISO-8859-n (where n is the appropriate value) to see
the effect.

   The patch here does two (for the moment) orthogonal things: on the one
hand, it uses Display PostScript on the screen (only if you configure with
--enable-dps), but with the Adobe encoding; on the other, it prints with
"utf8" encoding(s) (if the libunicode library is found).

   Also, if this patch doesn't build because of missing dependencies (gtkDPS
or libunicode), I'd be very happy to hear about it.

Side note about a side effect:
------------------------------
        converting all of dia's sample directory to EPS with the regular and
the new string engines gives interesting results:
muscat% cat sizes.txt
  36 demos-utf8/ER-demo.eps
  28 demos-utf8/SADT.eps
  28 demos-utf8/UML-demo.eps
  16 demos-utf8/chronograms.eps
  80 demos-utf8/grafcet.eps
  32 regular/ER-demo.eps
  28 regular/SADT.eps
  24 regular/UML-demo.eps
  20 regular/chronograms.eps
  56 regular/grafcet.eps

The newer files are a bit larger. This is expected: since we can't guess in
advance what the encoding tables will be, they may be rebuilt each time we
find new characters; the proper fix would be to have the main rendering loop
do a pre-rendering pass over all strings to collect their encodings, and
then do the actual rendering. However, running
  for i in */*.eps ; do
    echo -n "$i "
    /usr/bin/time --quiet -f "%Uuser %Ssystem %Eelapsed" --output="/tmp/foo" \
-- gs -dBATCH -dNOPAUSE -q $i
    cat /tmp/foo
  done
on them gives the following results:
demos-utf8/ER-demo.eps 2.78user 0.34system 0:06.77elapsed
demos-utf8/SADT.eps 2.27user 0.10system 0:03.16elapsed
demos-utf8/UML-demo.eps 2.38user 0.13system 0:03.38elapsed
demos-utf8/chronograms.eps 1.74user 0.12system 0:02.52elapsed
demos-utf8/grafcet.eps 2.91user 0.16system 0:03.83elapsed
regular/ER-demo.eps 6.66user 0.74system 0:10.33elapsed
regular/SADT.eps 6.05user 0.56system 0:08.48elapsed
regular/UML-demo.eps 6.25user 0.47system 0:08.99elapsed
regular/chronograms.eps 6.02user 0.37system 0:07.44elapsed
regular/grafcet.eps 6.34user 0.55system 0:09.42elapsed

that is, the new files render almost twice as fast as the older ones (this
is to be expected, as we don't load and re-encode all possible fonts;
rather, we load things on demand). Of course, extremely complex documents
will lose this advantage (and will probably fare much worse, thanks to
encoding shuffling).


Comments? Is it OK for me to commit this?

        -- Cyrille

-- 
Grumpf.

Reply via email to