Here's a Linux (Ubuntu) system administration question for you. What
is the best html -> latex converter?  And what is the best way to help
LyX users access the htmltolatex program?

Explanation:

I never tried to import an html file until yesterday, when I got an
error from LyX indicating that java could not find htmltolatex.jar.
While tracking that down, I was surprised to find that the LyX
converter preferences assumed I had htmltolatex installed.  The LyX
preference was set to "java htmltolatex.jar --input $$i --output $$o".

I don't know why it is set that way!  I don't have htmltolatex.jar
installed, so obviously it fails.

I'm reading the LyX configure script, and I see it checks for 3
possible converters: "html2latex", then "gnuhtml2latex", and then
"htmltolatex".
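To see which of those three is actually installed on a given machine,
something like the following could be used. It is only a sketch: the
function name first_available is made up for illustration, and this is
not the code LyX's configure script itself runs. It simply walks the
candidates in the order listed above and prints the first one found on
the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not LyX's own code): print the first command from
# the argument list that exists on PATH, and return non-zero if none do.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Same order as the three converters named above.
first_available html2latex gnuhtml2latex htmltolatex \
    || echo "no html -> latex converter found on PATH" >&2
```

Being first in that list only means the converter is tried first; it
says nothing about output quality.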

Are these supposed to be in order of quality?  I tried gnuhtml2latex
first because there is an Ubuntu package for it. The imported file
didn't look great in LyX.

It appears to me that html2latex is a Perl script, and its homepage
was last edited in 1998.

Is htmltolatex better?  Its web page is more up-to-date. Either that
means it is not done yet or that it has great new features :)

The SourceForge page for htmltolatex is
http://htmltolatex.sourceforge.net.  The download link there points to
a tarball that has no installation instructions for system
administrators.  How is a person supposed to install this?

$ ls
build.xml  config.xml  gpl.txt      htmltolatex.jar  LICENCE.txt  README.txt  src
classes    config.xsd  htmltolatex  javadoc          manual       samples

The file "htmltolatex" is a shell script that calls java on the
indicated file, and it accesses htmltolatex.jar.  Here's what I mean:

=======================

$ cat htmltolatex
#!/bin/sh

if [ $# -lt 1 ]; then
        echo "Usage: $0 -input <input-HTML-file> -output <output-LaTeX-file> [-css <css-file-assigned-to-input file>] [-config <configuration-file>]"
        exit 1
fi


java -jar htmltolatex.jar $@


======================

I tested the C programmer approach: put the htmltolatex script
somewhere in the PATH, put the htmltolatex.jar file somewhere like
/usr/share/htmltolatex, and then edit the htmltolatex script to use
the full path to the jar:

java -jar /usr/share/htmltolatex/htmltolatex.jar $@

That approach failed because the program could not find other files it wants:

Error: Cannot convert file
----------------------------------------
An error occurred whilst running htmltolatex -input 'News.html' -output 'News.tex'
Fatal error: Can't load configuration.
/home/pauljohn/config.xml (No such file or directory)
Error: Cannot convert file
----------------------------------------
An error occurred whilst running htmltolatex -input 'New.html' -output 'New.tex'

Then I copied config.xml into /usr/share/htmltolatex and modified the
script by adding a -config file option.

java -jar /usr/share/htmltolatex/htmltolatex.jar $@ -config /usr/share/htmltolatex/config.xml
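For a multiuser system, the modified wrapper might be sketched as
below. The variables HTMLTOLATEX_HOME and JAVA and the function name
run_htmltolatex are my own inventions for illustration; they are not
part of htmltolatex. The only options used are the -input, -output and
-config ones shown in the script's usage message. The sketch assumes
htmltolatex.jar and config.xml have both been copied into
/usr/share/htmltolatex.

```shell
#!/bin/sh
# Sketch of a system-wide wrapper, e.g. saved as /usr/local/bin/htmltolatex.
# HTMLTOLATEX_HOME and JAVA are hypothetical override variables, not
# something htmltolatex itself knows about.
HTMLTOLATEX_HOME="${HTMLTOLATEX_HOME:-/usr/share/htmltolatex}"
JAVA="${JAVA:-java}"

run_htmltolatex() {
    # Quoting "$@" keeps file names that contain spaces as single arguments.
    "$JAVA" -jar "$HTMLTOLATEX_HOME/htmltolatex.jar" "$@" \
        -config "$HTMLTOLATEX_HOME/config.xml"
}

# Only run the converter when arguments were given.
if [ "$#" -ge 1 ]; then
    run_htmltolatex "$@"
fi
```

Using absolute paths for both the jar and config.xml means the wrapper
no longer depends on the directory it is called from, which is what
went wrong when LyX ran it from my home directory. I have not checked
what happens if the caller also passes its own -config option.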

Hooray, it runs from LyX with no crash.  I have no way of knowing if
it will work in other test cases.   The LaTeX markup does include
images, which is encouraging. HTML enumerated and bullet lists do come
into LyX correctly. However, LyX can't compile the document. It
complains about an undefined option in this ERT:

\href{http://pj.freefaculty.org}{thing}

I see the LyX Document->Settings->PDF Properties menu has a hyperref
option, and once I enable that support, the document will
compile.  That's awesome.
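A quick way to tell in advance whether a converted file will hit this
problem is to look for \href in the .tex output and see whether
hyperref is loaded anywhere. The function name needs_hyperref is
hypothetical, and the check is a rough grep, not a real LaTeX parser.

```shell
#!/bin/sh
# Hypothetical check: succeed if the given .tex file uses \href but
# never loads the hyperref package via \usepackage.
needs_hyperref() {
    grep -q '\\href{' "$1" && ! grep -q '\\usepackage.*{hyperref}' "$1"
}

# Example use:
#   needs_hyperref News.tex && echo "enable hyperref support in LyX"
```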

It's a little encouraging, but still troublesome.  Am I taking the best
approach?  It reminds me of a time about 5 years ago when I was trying
to generate HTML from LyX.  The default converters were tex4ht or
latex2html or something like that, and we were debating how to
configure those programs, when somebody spoke up to say that "hevea"
works much better than either of those.

Anyway, I wonder if people who have wrestled with html -> latex will
speak up and let us know which html to latex converter works best.
And if it is the Java one, htmltolatex, can we hear how you install
it on a multiuser system?

PJ

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
