>
>
> Okay, I've got a suggestion for a fix for some of the problems I
> mentioned earlier. Please bear with me through the length of this
> email --- the actual problems are really fairly minor. My final
> recommendation is to include all the fixes Ross Moore suggested
> earlier, and add an asterisk or two!
>
Thanks Michael.
Here I'll explain how the code works...
...perhaps not perfectly. ;-)
> >
> > # Move all LaTeX comments into a local list
> > ! # s/([ \t]*(^|[^\\]))(%.*)\n[ \t]*/print "%";
> > ! s/([ \t]*(^|[^\\]))(%.*(\n[ \t]*|$))/print "%";
This finds % characters not preceded by a \ (i.e. not \%)
as possible candidates for comments.
If it is a comment, then everything up to the following \n
constitutes the comment. White space beginning the *next* line
is then ignored by TeX; i.e. spaces and tabs, and other unknown
characters which we assume do not actually occur in our nice
clean LaTeX source --- 'cos we checked it in LaTeX, right ?
That is what the top (commented) line did. But here's the rub...
What if this part of the source is meant to be within a verbatim-like environment ?
At this stage we are not yet processing environments,
so the stuff that might be discarded needs to be saved, in case it turns out
later to be needed. Hence the replacement action for the pattern-match is...
> > $comments{++$global{'verbatim_counter'}} = "$3";
> > &write_mydb("verbatim", $global{'verbatim_counter'}, $3);
> > "$1$comment_mark".$global{'verbatim_counter'}."\n"/mge;
> >
This saves the comment + subsequent spaces; in fact it is saved twice,
once locally, and again globally in a database.
A marker is placed, followed by a unique identifying number,
finishing with a \n, which serves as a delimiter, and also prevents
multiple *commented* lines from concatenating into a single huge line,
potentially overflowing string buffers, etc.
(Yes, that overflow has been observed, during earlier development.)
OK, so my first suggested fix included the \n , and subsequent space,
with the saved comments --- clearly necessary.
It also copes with the possibility of a % on the final line of the source,
where there need be no following \n , just the end-of-file.
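To make the save-and-mark step concrete, here is a rough sketch in Python
(not the actual Perl, and the marker string is made up): an unescaped %
starts a comment, and the comment, its \n, and the next line's leading
whitespace are all stashed under a fresh number, leaving a marker
terminated by \n in their place.

```python
import re

# Hypothetical marker; the real script uses its own $comment_mark string.
comment_mark = "<tex2html_comment_mark>"
comments = {}
counter = 0

def hide_comments(source):
    """Replace each LaTeX comment with a numbered marker ending in \\n."""
    def stash(match):
        global counter
        counter += 1
        # Save the comment, its newline, and the next line's leading
        # whitespace, in case a verbatim-like environment needs them back.
        comments[counter] = match.group(3)
        return "%s%s%d\n" % (match.group(1), comment_mark, counter)

    # A '%' not preceded by a backslash starts a comment; the comment runs
    # to end-of-line (or end-of-file), plus the next line's indentation.
    return re.sub(r'([ \t]*(^|[^\\]))(%.*(\n[ \t]*|$))',
                  stash, source, flags=re.M)

src = "some text % a comment\n   next line\n"
hidden = hide_comments(src)
# hidden has the marker where the comment was, and the indentation is gone:
#   "some text <tex2html_comment_mark>1\nnext line\n"
```

Note how the saved entry ends in "\n   ", so restoring it later puts back
both the line break and the indentation that TeX would have ignored.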
OK, now. Here we are processing an environment, and encounter the previously
inserted comment-markers.
Being inside a verbatim-like environment, the comments have to be put
back into place:
> > # re-insert comments
> > ! $contents =~ s/$comment_mark(\d+)\n/$comments{$1}/g;
> > # $contents =~ s/$comment_mark(\d+)/$verbatim{$1}/g;
> >
Note how my suggested fix now matches and discards the \n that is supposed to
occur after the numbered comment-marker.
The same code occurs again, in the situation of an \input or \include
command having occurred within the environment, so that LaTeX2HTML
has split the total input source into several pieces, stored temporarily
in different files --- just a technical gizmo that used to be necessary for
improved performance.
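The restore step can be sketched the same way (again Python, with the same
made-up marker string): the marker, its number, and the delimiting \n are
all consumed together, and the saved text, which already carries its own
\n and indentation, goes back in their place.

```python
import re

# Hypothetical marker; the real script uses its own $comment_mark string.
comment_mark = "<tex2html_comment_mark>"
# As saved by the earlier pass: comment + \n + next line's indentation.
comments = {1: "% a comment\n   "}

def restore_comments(contents):
    # Match the delimiting \n after the marker as well; without consuming
    # it, a stray blank line is left inside verbatim-like environments.
    return re.sub(re.escape(comment_mark) + r'(\d+)\n',
                  lambda m: comments[int(m.group(1))], contents)

hidden = "some text " + comment_mark + "1\nnext line\n"
restored = restore_comments(hidden)
# restored == "some text % a comment\n   next line\n"
```

So hide followed by restore round-trips the verbatim source exactly.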
> > } else {
> > print "Cannot find \\end{$env}\n";
> > ! $after =~ s/$comment_mark(\d+)\n/$comments{$1}/g;
> > # $after =~ s/$comment_mark(\d+)/$verbatim{$1}/g;
> > if ($env =~ /rawhtml|$keepcomments_rx/i) {
> > $after = &revert_to_raw_tex($contents);
> > ***************
>
> This fixes the alignment problem, but creates a new problem, described
> in Bug 1a in the file above. While not understanding all of the logic
> involved in the least, I guessed and was able to fix Bug 1a by adding
> an asterisk as follows:
> > # re-insert comments
> > ! $contents =~ s/$comment_mark(\d+)\n*/$comments{$1}/g;
> ^^^ - here
Strange why this * is needed. The \n is supposed to always be there.
Also, the * may match too many \n s, resulting in some needed ones being
discarded. Instead, I'd suggest:
$contents =~ s/$comment_mark(\d+)\n?/$comments{$1}/g;
^^^ --- matches at most 1 occurrence.
However I don't yet understand how the \n could get lost !
Oh-oh, maybe it gets swallowed with the \end{verbatim} command,
because generally we want:
... some text....\end{<environment>}
and
...some text....
\end{<environment>}
to give identical results.
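The difference between the two quantifiers is easy to show in isolation
(Python here, with a dummy marker): \n* greedily swallows every following
newline, including blank lines the source meant to keep, while \n?
discards at most the single delimiter.

```python
import re

# Dummy marker followed by a blank line that should survive.
text = "MARK1\n\nkept blank line above\n"

star = re.sub(r'MARK(\d+)\n*', 'X', text)   # '*' eats both newlines
qmark = re.sub(r'MARK(\d+)\n?', 'X', text)  # '?' eats only the delimiter

# star  == "Xkept blank line above\n"      -- blank line lost
# qmark == "X\nkept blank line above\n"    -- blank line kept
```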
> You might also need one as follows, I don't know:
>
> > print "Cannot find \\end{$env}\n";
> > ! $after =~ s/$comment_mark(\d+)\n*/$comments{$1}/g;
> ^^^ - here
Yes, but use ? not * .
> There still seems to be an extra vertical space (carriage return) in
> the output after a single line verbatim environment (as in Bug 1a),
> but that doesn't bother me a whole lot. I imagine that Ross might be
> able to fix that too.
That is probably a browser thing, concerning how it displays:
...some text...</PRE>
and
...some text ...
</PRE>
Are these the same ?
Or do we get an extra line of vertical space ?
Should LaTeX2HTML produce the former, or the (more readable in the HTML file) latter?
> Now, on to Bug 2. It is shown by this file:
>
....
>
> Ross suggested this fix:
>
> > Yes, add the missing '.' in the main latex2html script, as follows:
> >
> > sub do_cmd_the_appendix {
> > local($val,$level) = (0,@_[0]);
> > if ($level == 3) { $val=$global{'section'} }
> > elsif ($level == 2) { $val=$global{'chapter'} }
> > join('', &fAlph($val), '.', @_[1]);
> > ^^^^^__________ here !!!
> > }
>
> I *incorrectly* reported that this didn't fix the problem. In reality,
> it *does* fix the problem.
Ah, that's nice to hear. ;-)
The only explanation that I could dream up for this not working
was way too complicated to be easily fixable. ;-)
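For what the one-character fix actually does: the join builds the appendix
counter from the alphabetic appendix letter, the separator, and whatever
text follows, so with the '.' present you get "A.3" rather than "A3".
A throwaway Python sketch (falph is just a stand-in for &fAlph, and
'rest' stands in for the trailing text in @_[1]):

```python
def falph(n):
    # Stand-in for &fAlph: 1 -> 'A', 2 -> 'B', ...
    return chr(ord('A') + n - 1)

def the_appendix(val, rest):
    # The added '.' is the whole fix; without it the pieces run together.
    return ''.join([falph(val), '.', rest])

broken = ''.join([falph(1), '3'])   # what the old code produced: 'A3'
fixed = the_appendix(1, '3')        # with the fix: 'A.3'
```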
> So, Ross's fixes should be incorporated in the new version (they're
> not currently there), along with my asterisk suggestion (or some
^^^^^^^^--- question-mark
> better fix to Bug 1a by someone who knows more about it).
> BTW, is it time to freeze L2H V98.2 and let new updates modify a
> V99.1 ? Or maybe we should just call the latest version 99.1 ?
I support this idea.
With the next set of patches that I, or anyone else makes,
we should change the name to V99.1 (beta) .
Sorry the documentation isn't ready yet for the frames feature.
Uli Wortmann has been testing this extensively and has offered
a few useful modifications and minor fixes.
If this is done by the end-of-this-month,
we can probably still make it for the TeX-Live4 CD-ROM.
It's doubtful that we can have an executable version that can be run
directly off the CD --- it'll still require installation first.
Marek, any comments ?
Hope this helps,
Ross Moore