Re: latex2html: A Ghostscript question

Ross Moore Mon, 15 Jun 1998 21:49:47 -0400

> 
> My first solution was to use emtex's drivers to generate a CROPPED PCX file
> (dvidrv+dvidot with om+ option), then using netpbm to go from PCX to PPM,
> PGM, PBM. This works fine but I have to think of people not having emtex
> (maybe working on UNIX).
>
> A solution has been proposed by Emmanuel BIGLER (Paris, Jussieu) using DVIPS
> and then Ghostscript with the ppmraw output device.

This is exactly the  technique that LaTeX2HTML has been using for years.
It has been refined  par excellence  in the current latex2html/pstoimg
combination.

Marek has already mentioned the -E option to  dvips  which creates an .eps
file of the *correct* size. This is an over-simplification of what -E actually
does. For the files created by LaTeX2HTML, this is not a vital part of the
processing done by  pstoimg .

The problem is that dvips calculates its bounding box using just the position
and character size information that TeX places into the .dvi file;
that is, information read from the .tfm files.

This does *not* take into account

 A. the sizes of graphics placed using \special commands;

 B. characters whose glyphs extend outside the .tfm size
        e.g. due to the slant in italics
             rounded bottoms that drop slightly below the baseline
             serifs extending to the left or right
             etc.

 C. the need to often have some extra white space above/below or left/right
    of the characters appearing in the image.

 D. the crunch case for C. is that HTML only allows alignments to the
    top/middle/bottom of an image.
    Thus it is necessary to expand a non-zero depth to equal the height
    so that middle alignment can be used.
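
Point D can be sketched in a few lines of Python. This is only an
illustration, assuming the height and depth are measured from the baseline
in the same unit; the function name is hypothetical and is not taken from
latex2html.

```python
def pad_for_middle_alignment(height, depth):
    """Return (height, depth) enlarged so the baseline sits at mid-height.

    Hypothetical helper: height is the extent above the baseline,
    depth the extent below it, both non-negative and in the same unit.
    """
    if depth == 0:
        # Nothing hangs below the baseline; bottom alignment works as is.
        return height, depth
    half = max(height, depth)
    # Equal extents above and below put the baseline at the image middle.
    return half, half
```

For instance, a formula of height 10 and depth 3 would be padded to a
depth of 10, so that middle alignment puts its baseline on the text line.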

The way to overcome these difficulties is to place rules, \vrule and \hrule
along with the stuff you really want, in order to define a rectangle which
encloses everything that you want in the image.
Having recognised this, and tinkered with TeX code to make it work,
the decision is whether to make these rules infinitely thin
or to leave them with a finite thickness, to be cropped off later.


If infinitely thin, then dvips -E would work, but you still need to give
Ghostscript a -g option (which has to be integer x integer).
However, typically there is a non-integer scale-factor.
The PPM bitmap may still need to be cropped by a few pixels to get it
exactly correct
--- but there is no way to determine what this should be,
since the cropping-bar information is now lost.
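
The rounding problem can be made concrete with a small Python sketch.
It assumes the image size is known in points and that the -g values must
be whole pixels; the function name and the choice to round up are my own
assumptions, not what pstoimg does.

```python
import math


def gs_geometry(width_pt, height_pt, dpi):
    """Return (-g string, leftover pixel fractions) for a size in points."""
    scale = dpi / 72.0          # pixels per PostScript point
    exact_w = width_pt * scale
    exact_h = height_pt * scale
    # -g accepts whole pixels only, so round up to avoid clipping ink.
    w, h = math.ceil(exact_w), math.ceil(exact_h)
    return "-g%dx%d" % (w, h), (w - exact_w, h - exact_h)
```

At 72 dpi a 75 x 98 point image maps exactly onto -g75x98, but at 100 dpi
the exact size is about 104.17 x 136.11 pixels, so the bitmap carries a
fraction of a pixel of slack on each axis that nothing later can identify.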

If a finite thickness is used, then this is included in the PPM image
created by Ghostscript. A strategy is needed to crop it off later.

When the cropping fails, then there is visual feedback that a problem exists.
It can usually be fixed by adding a little extra space at the left or right
of the text being imaged.
Sometimes space is needed above --- this is harder to fix.
If the problem is repeatable, then latex2html can be coded to recognise the
situation and add this extra space automatically.
This is precisely what I've been working on recently for fonts which
generate traditional Indic language letters and syllables.



> This works on big computers, but it appears that the generated PBM file is
> huge (resolution needed is 360DPI), so that PNMCROP runs... out of memory
> on my Win95/DOS window. A big problem.

Well, if you need that resolution you have to expect large images.
This is true no matter what system you work on: Mac/PC/Unix.
As you well know, bit-mapped graphics grow as the square of the resolution.
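
To put numbers on the square law, here is a short Python sketch. The page
size (US letter, 8.5 x 11 inches) is an assumption chosen only for
illustration.

```python
def pixel_count(width_in, height_in, dpi):
    """Number of pixels in a bitmap of the given size in inches."""
    return round(width_in * dpi) * round(height_in * dpi)


# US-letter page (8.5 x 11 inches), an assumed size for illustration.
low = pixel_count(8.5, 11, 72)
high = pixel_count(8.5, 11, 360)
ratio = high / low   # equals (360 / 72) ** 2
```

Going from 72 dpi to 360 dpi multiplies the resolution by 5 and the number
of pixels by 25.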


If PNMCROP is the only tool that has problems with the image size,
then it is quite simple to

  1. make the cropping-bars of zero thickness  (2 edits required)
  2. suppress the size estimate that latex2html passes to  pstoimg
        via a -geometry switch   (4 edits required)
  3. ensure that  dvips uses the  -E  option, if supported on your system.
        ( 0 or 1 edit required)

Try this, and report whether it solves your problem and gives satisfactory
images. It should, since the cropping-bar strategy is still being used.
You may end up with letters raised slightly, rather than sitting square
on the base-line.

 
> Looking at pstoimg I find that it calls GS with the following (obscure) options:
>
>   -  -g75x98   => what does this mean ("man gs" is not very explicit) and which are
>      the units of numbers 75 and 98 (or other numbers in -g option)

Geometry, i.e. the width x height of the output. Ghostscript itself reads
the -g numbers as device pixels. A size in points gets scaled by
resolution/72  to get number of pixels, then multiplied by the color depth
to determine how much memory is required to hold the final bitmap.
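
As a rough sketch of that estimate in Python (the function name is
hypothetical, and real memory use will include overhead that is ignored
here):

```python
def bitmap_bytes(width_pt, height_pt, dpi, bits_per_pixel):
    """Rough memory needed for the raster, ignoring any overhead."""
    scale = dpi / 72.0                      # pixels per point
    width_px = round(width_pt * scale)
    height_px = round(height_pt * scale)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8            # round up to whole bytes
```

With these assumptions a 75 x 98 point image at 72 dpi and 24 bits per
pixel needs 75 * 98 * 3 = 22050 bytes.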

>   -  -dTextAlphaBits=4  => meaning?
 
Allow 4 bits for the color of text pixels,
i.e. turn on 'anti-aliasing' and use 16 = 2^4 levels of color (usually gray-shades).
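
For orientation, a Ghostscript call using these options could be assembled
as in the Python sketch below. This is not the actual pstoimg invocation;
the file names and the helper are placeholders, and the list is only built,
not executed.

```python
def gs_command(ps_file, out_file, width_px, height_px, dpi=72, alpha_bits=4):
    """Build (but do not run) a Ghostscript argument list."""
    return [
        "gs",
        "-q", "-dNOPAUSE", "-dBATCH",       # run non-interactively
        "-sDEVICE=ppmraw",                  # raw PPM output device
        "-r%d" % dpi,                       # resolution in dots per inch
        "-g%dx%d" % (width_px, height_px),  # output size
        "-dTextAlphaBits=%d" % alpha_bits,  # 2**alpha_bits text shades
        "-sOutputFile=%s" % out_file,
        ps_file,
    ]


cmd = gs_command("img.ps", "img.ppm", 75, 98)
```

The resulting list can be handed to subprocess.run(cmd) on a system where
gs is installed.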

>   -  GS>-67 -739 translate  => I understand these are limits to the part to be
>      translated, but which are the units (pixels, inches, ?)?
 
Translates the image to the bottom-left corner of the page.
 LaTeX2HTML/LaTeX/dvips  puts it near the top-left, with margin.
The numbers are in PostScript's default units, points (1/72 inch).
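
In other words, the two numbers are the negated coordinates of the image's
lower-left corner. A minimal Python sketch, with a hypothetical helper name
and assuming that corner is already known:

```python
def translate_to_origin(llx, lly):
    """PostScript line moving the point (llx, lly) to the origin (0, 0)."""
    # translate shifts the origin, so the offsets are the negated corner.
    return "%g %g translate" % (-llx, -lly)


line = translate_to_origin(67, 739)
```

Here an image whose lower-left corner sits at (67, 739) yields the line
quoted in the question.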


> Can somebody explain that (or give me a pointer to a detailed description of
> GhostScript specifications)?
 
The documentation all comes with Ghostscript. There is a lot of it.


> Last question: is there a way of telling GS to produce an output ALREADY cropped to
> the non-blank rectangle of the image, in the same way as done by dviscr+dvidot?


Nope, not easily.
If dviscr/dvidot get this right, it is because they generate a larger image
and crop to the smallest rectangle containing ink.
This is not necessarily good enough for LaTeX2HTML, since you frequently want
to retain some white space.
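
The difference between a tight crop and one that keeps some white space can
be sketched in Python. This is an illustration on a plain list of rows, not
the code used by dvidot, pnmcrop or pstoimg; the names are hypothetical and
0 is assumed to mean white, as in PBM.

```python
def crop_box(pixels, margin=0, white=0):
    """Return (left, top, right, bottom) of the ink, grown by margin.

    pixels is a list of equal-length rows; any value != white is ink.
    The box is inclusive and clipped to the image; None if no ink.
    """
    rows = [y for y, row in enumerate(pixels) if any(p != white for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] != white for row in pixels)]
    height, width = len(pixels), len(pixels[0])
    return (max(cols[0] - margin, 0), max(rows[0] - margin, 0),
            min(cols[-1] + margin, width - 1),
            min(rows[-1] + margin, height - 1))


img = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
tight = crop_box(img)             # smallest rectangle containing ink
padded = crop_box(img, margin=1)  # same box with one pixel of white kept
```

A fixed margin is the simplest choice; it still cannot know how much space
a particular image really wants, which is what the cropping-bars encode.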

The same thing could probably be done in Ghostscript using a tricky
PostScript program  --- but it isn't obvious that you would get what is
really needed, in all cases.


Hopefully this explains all the issues,
and gives you ideas on how to proceed to meet your requirements.



All the best,


        Ross Moore