I'm evaluating elinks as a candidate component for a toolchain in which I have to translate html to plain text while obeying some house-style rules (namely, for the production of ebooks for Project Gutenberg). For instance, I have to render all <strong>tags</strong> as *tags*, add four blank lines before any <h2> and two after, add white-space indentation in some places, replace <span dir="rtl"></span> with Unicode directional characters, and so on.

Elinks gives me a much better solution than the other html-to-text tools I've assessed (w3m, links2, lynx, html2text.py, netrik), because it honors the css to some extent. However, I have some questions about what is supported and what is not, and to what extent the rendering can be customized, if possible without hacking the source code.

Some of the transformations I need for the required text rendering could in fact be done by preliminary regexp substitutions in the source html and/or subsequent substitutions in the result. However, the toolchain would be a lot more streamlined if I could just add some lines to my css and let elinks do the work. For instance,

    strong:before { content: '*'; }
    strong:after { content: '*'; }

would take care of the <strong> example above.
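
For comparison, the regexp fallback I have in mind would look roughly like the Python sketch below. The function name `presubstitute` is just my own placeholder, and the choice of U+202B/U+202C as the directional characters is an assumption on my part, not something dictated by elinks. Being regexp-based, it does not cope with nested <span> elements.

```python
import re

# Assumed mapping (my own choice, nothing elinks-specific):
#   <strong>x</strong>       -> *x*
#   <span dir="rtl">x</span> -> RLE x PDF
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

STRONG = re.compile(r"<strong\b[^>]*>(.*?)</strong\s*>",
                    re.IGNORECASE | re.DOTALL)
RTL_SPAN = re.compile(
    r"<span\b[^>]*\bdir\s*=\s*['\"]rtl['\"][^>]*>(.*?)</span\s*>",
    re.IGNORECASE | re.DOTALL,
)


def presubstitute(html):
    """Rewrite house-style markup before the html is handed to elinks.

    Regexp-based, so nested <span> elements are NOT handled correctly.
    """
    html = STRONG.sub(r"*\1*", html)
    html = RTL_SPAN.sub(lambda m: RLE + m.group(1) + PDF, html)
    return html
```

It works, but it means maintaining a separate list of substitutions next to the css, which is what I would like to avoid.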

I'm typically using elinks 0.12pre5 with

    elinks -dump-width 80 -no-numbering -no-references -dump 1 "$1.html" > "$1.txt"
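
In case it helps to see how this would sit in the toolchain, here is a minimal sketch of a Python wrapper around the same command line, with one "subsequent substitution" applied to the result: removing the left margin of blanks that the dump puts on every line. The helper names (`elinks_command`, `strip_margin`, `dump`) are hypothetical, and the default margin of 4 is only what I observe in my own output.

```python
import subprocess


def elinks_command(html_path, width=80):
    """Build the same elinks invocation as the shell line above."""
    return ["elinks", "-dump-width", str(width), "-no-numbering",
            "-no-references", "-dump", "1", html_path]


def strip_margin(text, margin=4):
    """Remove up to `margin` leading blanks (and trailing blanks) per line."""
    out = []
    for line in text.splitlines():
        lead = len(line) - len(line.lstrip(" "))
        out.append(line[min(lead, margin):].rstrip())
    return "\n".join(out) + "\n"


def dump(html_path, txt_path):
    """Run elinks on html_path and write the de-indented text to txt_path."""
    result = subprocess.run(elinks_command(html_path), check=True,
                            capture_output=True, text=True)
    with open(txt_path, "w", encoding="utf-8") as handle:
        handle.write(strip_margin(result.stdout))
```

Something like `dump("book.html", "book.txt")` would then replace the shell line, with `book.html` standing in for whatever file is being converted.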

The first batch of questions which comes up is:

- Are the :before and :after css selectors honored? [not as far as I can see; could they be?]

- Are the margin, padding and text-indent properties honored in some way, e.g. by adding an appropriate number of blanks? [ditto]

- Why does -dump produce text with 4 blanks at the beginning of each line, and can this be changed?

- How can I change the number of blank lines before and after <h1>, <h2>, ...?

- When elinks dumps, is it using a particular @media selector?

- Is the unstable snapshot more advanced in any respect relevant to me?

Thanks in advance for any hint and for providing this nice piece of software.

Enrico
_______________________________________________
elinks-users mailing list
elinks-users@linuxfromscratch.org
http://linuxfromscratch.org/mailman/listinfo/elinks-users
