Hi Greg,

On 2024-03-09 15:34, Greg Reagle wrote:
I have an epub ebook.  It is a novel, but when I get this process working, I 
want to repeat it for any epub ebook.

I want to read it, with formatting (such as underline or italics), with less.  
I am happy to use any software that exists in the process, but I MUST use less 
in the end to read it.  The terminal emulators that I use are usually st, 
xterm, and termux.  All of them are capable of colored text and underlining and 
so forth, and I want to take advantage of this.

Pandoc does a very good job converting epub to html, and it looks good with 
w3m, however when I use w3m in a pipe, the output is truly *plain* text, 
meaning there are no escape codes for formatting.  Same story with elinks.  Is 
it possible to get either of these programs, or some other program, to dump 
html to text *with* escape codes?

Since I could not get HTML to work, I went with man format.  Amazing.  Pandoc 
automatically chooses man format for output based on the '.1' extension in the 
following:
     pandoc --standalone -o City_of_Truth-Morrow.1 City_of_Truth-Morrow.epub
Remember to use standalone option or it won't work.  Then
     man --local-file --pager 'less -ir' City_of_Truth-Morrow.1
It looks great!  (for text only on a terminal)  It has bold and underlined 
text.  From there I can use less 's' command to save the formatted text to a 
file.

There might be a better or more direct way of achieving this goal, but this is 
what I figured out for now.  And the rationale is this:  I already know and 
love less.  There is no good reason for me to learn the user interface of a 
different program like an epub reader or an html reader to read a book that 
does not have graphics, diagrams, pictures, and/or custom formatting.

Just modify your workflow slightly and you are good:

Option 1: use w3m

pandoc -s -t html City_of_Truth-Morrow.epub | w3m -T text/html

Option 2: use man/less

pandoc -t man City_of_Truth-Morrow.epub | man -l -

Option 3: save as html for future use

pandoc -s -o City_of_Truth-Morrow.html City_of_Truth-Morrow.epub

This saves your epub as html. Whenever you want to view it, use your favorite browser, e.g. w3m, with all its features.

Option 4: save as man

pandoc -s -t man -o City_of_Truth-Morrow.man City_of_Truth-Morrow.epub

Whenever you view it, use: man -l City_of_Truth-Morrow.man
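
Since you want to repeat this for any epub, Option 2 could be wrapped in a small shell function. This is only a sketch: the name epubless is made up, it assumes pandoc and a man that accepts -l and -P (as man-db does), and the less flags are a guess at what you like. The -s is optional here (see the notes below); it only adds the man page header.

```shell
# Hypothetical helper, not part of pandoc or man: read an epub in less.
epubless() {
    # Expect exactly one readable file argument.
    if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
        echo "usage: epubless FILE.epub" >&2
        return 2
    fi
    # pandoc converts the epub to man format on stdout;
    # "man -l -" formats stdin, and -P selects the pager.
    pandoc -s -t man "$1" | man -l -P 'less -iR' -
}
```

Put it in your shell startup file and call it as: epubless City_of_Truth-Morrow.epub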

- - -

Some notes:

The reason you lose formatting when saving from less(1) or w3m is that these programs deliberately do not save the terminal control characters that produce the markup. Line breaks and terminal control sequences are generated on demand, depending on the type and size of the terminal (window), so a saved copy would display differently (and look weird) on any terminal that differs from the one it was saved from.

The -s (--standalone) option for pandoc is not required for man page output. For html (and other formats) pandoc outputs only the <body> content by default; the -s option wraps this into a complete <html> document.

Best Regards,


  Georg

