Re: [dev] reading an epub book with less: adventures in text processing

2024-03-11 Thread Viktor Grigorov
Rather late to the party, and I've already forgotten the initial email. Nevertheless, I'll mention the program I use most: epub2txt.[0] It's not perfect, but compared to calibre's ebook-convert and everything else written in C that I found on GitHub, Codeberg, or GitLab, it's the best. A once-over with an
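A minimal sketch of reading an EPUB as plain text in less with the tool Viktor recommends. It assumes epub2txt takes the EPUB path as its argument and prints the text to standard output (check the project behind [0], which is not reproduced here). The fallback to pandoc's plain-text writer and the function name `epub_text_less` are hypothetical additions, not from the thread.

```shell
# Hypothetical helper (not from the thread): page an EPUB as plain text.
# Assumes epub2txt takes the EPUB path and prints text to stdout; if it is
# not installed, fall back to pandoc's plain-text writer.
epub_text_less() {
    if command -v epub2txt >/dev/null 2>&1; then
        epub2txt "$1"
    else
        pandoc -t plain "$1"
    fi | less
}
```

Usage: `epub_text_less novel.epub`. Plain text drops italics and underline, so this suits cases where readable prose matters more than markup.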

Re: [dev] reading an epub book with less: adventures in text processing

2024-03-11 Thread Κρακ Άουτ
On 2024-03-11 17:44 Greg Reagle wrote:
> Now my next question is, what is the tool that does the *best* job of
> turning a PDF book into a readable text document? Via html or
> docbook or markdown or whatever--doesn't matter. My previous
> experience trying things out to achieve this goal is
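This excerpt does not include an answer to the PDF question, but Hiltjo's suggestion elsewhere in the thread names MuPDF and Ghostscript. A minimal sketch, assuming `mutool draw` accepts `-F txt` to select its text output and that the local Ghostscript build includes the `txtwrite` device; `pdf_to_txt` is a hypothetical name. It only recovers a text layer that already exists in the PDF: scanned books need OCR, and the output may still need cleanup (running headers, hyphenation).

```shell
# Hypothetical helper (not from the thread): extract the text layer of a PDF,
# trying MuPDF's mutool first and falling back to Ghostscript.
pdf_to_txt() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_to_txt BOOK.pdf OUT.txt" >&2
        return 2
    fi
    if command -v mutool >/dev/null 2>&1; then
        # mutool draw renders the pages; -F txt selects the text output format.
        mutool draw -F txt -o "$2" "$1"
    elif command -v gs >/dev/null 2>&1; then
        # -o implies -dBATCH -dNOPAUSE; txtwrite writes extracted text.
        gs -q -sDEVICE=txtwrite -o "$2" "$1"
    else
        echo "pdf_to_txt: need mutool or gs" >&2
        return 127
    fi
}
```

Usage: `pdf_to_txt book.pdf book.txt && less book.txt`.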

Re: [dev] reading an epub book with less: adventures in text processing

2024-03-11 Thread Greg Reagle
On Sat, Mar 9, 2024, at 1:15 PM, Greg Minshall wrote:
> for some personal tastes/usage cases, this, using pandoc's `-t`
> option, might be minor-ly simpler:
>
> man --local-file --pager 'less -ir' \
> <(pandoc --standalone -t man \
>

Re: [dev] reading an epub book with less: adventures in text processing

2024-03-11 Thread Greg Reagle
On Sat, Mar 9, 2024, at 4:06 PM, Georg Lehner wrote:
> Option 1: use w3m
[snip]

All great commands. Thank you.

> The reason you lose formatting when saving from less(1) or w3m is that
> these programs on purpose do not save the terminal control characters
> which are doing the markup. Line
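Georg's point is that the bold and underline live in terminal control characters, which get dropped on save. One way to keep them in a file, sketched under assumptions: man-db's `man` honours the `MAN_KEEP_FORMATTING` environment variable, which retains formatting characters it would otherwise strip when output goes to a file or pipe. Whether the retained markup is backspace overstriking or ANSI SGR escapes depends on the local groff configuration; less renders overstriking by default, and `-R` additionally passes SGR sequences through. The function name `epub_save_formatted` is hypothetical.

```shell
# Hypothetical helper (not from the thread): save a formatted copy of an EPUB
# so that `less -R OUT.txt` can show bold/underline later.
# Assumes pandoc and man-db; MAN_KEEP_FORMATTING is man-db's switch for keeping
# formatting characters that man otherwise strips when stdout is not a terminal.
epub_save_formatted() {
    if [ "$#" -ne 2 ]; then
        echo "usage: epub_save_formatted BOOK.epub OUT.txt" >&2
        return 2
    fi
    # stdout is a file here, so man writes the formatted page instead of paging it.
    pandoc --standalone -t man "$1" |
        MAN_KEEP_FORMATTING=1 man --local-file - > "$2"
}
```

Usage: `epub_save_formatted novel.epub novel.txt && less -R novel.txt`.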

Re: [dev] reading an epub book with less: adventures in text processing

2024-03-11 Thread Greg Reagle
On Sat, Mar 9, 2024, at 11:33 AM, Hiltjo Posthuma wrote:
> Maybe mupdf/mutools or the Ghostscript tools or qpdf?

Yes, thank you for this excellent advice. I tried "mutool convert", but I am more satisfied with pandoc's output, for both text and HTML output (from epub).
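A minimal sketch of producing both of the pandoc outputs Greg prefers from one EPUB: plain text for less and standalone HTML for a browser such as w3m. The function name `epub_to_txt_html` and the naming scheme (output files next to the input, `.epub` suffix replaced) are hypothetical choices, not from the thread.

```shell
# Hypothetical helper (not from the thread): write BOOK.txt and BOOK.html
# next to BOOK.epub using pandoc's plain-text and HTML writers.
epub_to_txt_html() {
    src=$1
    base=${src%.epub}   # strip a trailing .epub suffix, if present
    pandoc -t plain "$src" -o "$base.txt" &&
        pandoc --standalone -t html "$src" -o "$base.html"
}
```

Usage: `epub_to_txt_html novel.epub && less novel.txt`, or `w3m novel.html` for the HTML version. `--standalone` makes the HTML a complete document rather than a fragment.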

Re: [dev] reading an epub book with less: adventures in text processing

2024-03-09 Thread Georg Lehner
Hi Greg,

On 2024-03-09 15:34, Greg Reagle wrote:
> I have an epub ebook. It is a novel, but when I get this process working, I want to repeat it for any epub ebook.
>
> I want to read it, with formatting (such as underline or italics), with less. I am happy to use any software that exists in the

Re: [dev] reading an epub book with less: adventures in text processing

2024-03-09 Thread Greg Minshall
Greg, thanks for this! For some personal tastes/usage cases, this, using pandoc's `-t` option, might be minor-ly simpler:

    man --local-file --pager 'less -ir' \
    <(pandoc --standalone -t man \
    2015.31233.Arab-Geographers-Knowledge-Of-Southern-India.epub) | less

and,
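Greg Minshall's pipeline can be wrapped in a small shell function for reuse on any EPUB. A minimal sketch, assuming pandoc's EPUB reader and man-db's `man`, whose `--local-file` mode reads roff from standard input when given `-`. Piping into `-` replaces the process substitution `<(...)`, so the function also runs under a plain POSIX sh. The name `epubless` is hypothetical.

```shell
# Hypothetical helper (not from the thread): page an EPUB through man and less.
# Assumes pandoc (EPUB reader, roff "man" writer) and man-db's man, whose
# --local-file mode reads roff from standard input when given "-".
epubless() {
    if [ "$#" -ne 1 ]; then
        echo "usage: epubless BOOK.epub" >&2
        return 2
    fi
    # pandoc turns the EPUB into a standalone man page; man formats it
    # and, because stdout is a terminal, hands the result to the pager.
    pandoc --standalone -t man "$1" |
        man --local-file --pager 'less -ir' -
}
```

Usage: `epubless 2015.31233.Arab-Geographers-Knowledge-Of-Southern-India.epub`. The function is meant to be run with its output on a terminal; redirected output bypasses the pager.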

Re: [dev] reading an epub book with less: adventures in text processing

2024-03-09 Thread Hiltjo Posthuma
On Sat, Mar 09, 2024 at 09:34:12AM -0500, Greg Reagle wrote:
> I have an epub ebook. It is a novel, but when I get this process working, I
> want to repeat it for any epub ebook.
>
> I want to read it, with formatting (such as underline or italics), with less.
> I am happy to use any software

[dev] reading an epub book with less: adventures in text processing

2024-03-09 Thread Greg Reagle
I have an epub ebook. It is a novel, but when I get this process working, I want to repeat it for any epub ebook.

I want to read it, with formatting (such as underline or italics), with less. I am happy to use any software that exists in the process, but I MUST use less in the end to read