On 2011-12-04, Steve Litt wrote:
> Hi all,

> I'm making this thread in hopes that everyone who knows something on 
> the subject will add something, and at the end we'll all know how to 
> go from LyX to Kindle.

> I'm quickly coming to the conclusion that the best way to do it is to 
> write a post-processor for Alex's HTML. Simple, modular, and I can do 
> it myself (and obviously make it free software). The post-processor 
> might need to read the LyX file itself to get a few bits of information 
> not in Alex's HTML file.

> So far I've discovered that LyX pagebreaks don't translate to Kindle 
> page breaks (start at top of reader). Therefore, to every <h1> should 
> be added the property style="page-break-before: always;". That way 
> every chapter starts at the top of the reader.

I think this would be better done with a single global CSS rule than 
with a style attribute on every <h1>.
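
The post-processor could add such a rule roughly like this. This is only a 
sketch: the function name add_global_rule is made up, it assumes the HTML has 
a literal </head> tag, and it assumes the Kindle converter honours a 
stylesheet rule the same way it honours the inline style attribute.

```python
# Hypothetical helper: put one global rule into the document head
# instead of adding a style attribute to every <h1>.
PAGE_BREAK_RULE = "h1 { page-break-before: always; }"


def add_global_rule(html, rule=PAGE_BREAK_RULE):
    """Return html with a <style> element holding rule before </head>."""
    style = '<style type="text/css">%s</style>\n' % rule
    head_end = html.find("</head>")
    if head_end == -1:
        # Assumption: the exported file always has a head element.
        raise ValueError("no </head> tag found")
    return html[:head_end] + style + html[head_end:]


if __name__ == "__main__":
    sample = ("<html><head><title>t</title></head>"
              "<body><h1>One</h1></body></html>")
    print(add_global_rule(sample))
```

The headings themselves stay untouched, so the rule can later be changed or 
dropped in one place.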

> I assume the official interpreter of the LyX project is Python, and if 
> that assumption's true I'll make the postprocessor in Python. This 
> post processing will be a heck of a lot easier if someone can point me 
> to a good XML or HTML parser for Python, so I can look at nodes and 
> attributes instead of trying to parse tags. I've noticed Alex's HTML 
> appears very standard, with matching start and end tags and the like. 
> This should theoretically make my job easier. So anyone have any 
> suggestions for HTML or XML parsers for Python?

BeautifulSoup (http://pypi.python.org/pypi/BeautifulSoup/3.2.0), 
or the standard library xml.* submodules:

xml.dom
xml.dom.minidom
xml.dom.pulldom
xml.etree.ElementTree
xml.parsers.expat
xml.sax
xml.sax.handler
xml.sax.saxutils
xml.sax.xmlreader
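
For well-formed input, xml.etree.ElementTree is probably the quickest way to 
look at nodes and attributes. A minimal sketch, assuming the file parses as 
XML and declares no namespace; the sample markup and the function name 
chapter_headings are invented for illustration and are not taken from Alex's 
output.

```python
import xml.etree.ElementTree as ET

# Invented sample; real exported HTML will look different.
SAMPLE = """<html>
<head><title>Example</title></head>
<body>
<h1 class="chapter" id="ch1">First chapter</h1>
<p>Some text.</p>
<h1 class="chapter" id="ch2">Second chapter</h1>
</body>
</html>"""


def chapter_headings(xhtml):
    """Return (id, text) pairs for every <h1> in document order."""
    root = ET.fromstring(xhtml)
    return [(h.get("id"), "".join(h.itertext()).strip())
            for h in root.iter("h1")]


if __name__ == "__main__":
    for anchor, title in chapter_headings(SAMPLE):
        print(anchor, title)
```

If the file declares the XHTML namespace, the tag names come back qualified 
as {http://www.w3.org/1999/xhtml}h1 and the lookup has to be adjusted. Input 
that is not well-formed XML would need BeautifulSoup instead.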

> Other things I've noticed:

> 1) The LyX table of contents, when translated to HTML and then to 
> Kindle format, crashes the Kindle previewer, so it must be removed 
> from the HTML file and used to create an NCX TOC.

> 2) With the Kindle you probably don't want the title page, so there 
> should be a post-processor option to capture the author, title and 
> date, and then remove everything from the Title or Author 
> environmented text to the next <h1>.
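
Once the headings are collected, the navMap part of an NCX file can be 
generated with ElementTree as well. Again only a sketch: build_nav_map and 
the file name book.html are made up, the headings are assumed to be 
(id, title) pairs, and a complete NCX file needs more than this fragment 
(the ncx root element with its namespace, head, docTitle).

```python
import xml.etree.ElementTree as ET


def build_nav_map(headings, content_file="book.html"):
    """Build an NCX <navMap> element from (anchor_id, title) pairs."""
    nav_map = ET.Element("navMap")
    for order, (anchor, title) in enumerate(headings, start=1):
        # One navPoint per chapter heading, numbered in reading order.
        point = ET.SubElement(nav_map, "navPoint",
                              {"id": "nav%d" % order,
                               "playOrder": str(order)})
        label = ET.SubElement(point, "navLabel")
        ET.SubElement(label, "text").text = title
        # The target is the HTML file plus the heading's id as fragment.
        ET.SubElement(point, "content",
                      {"src": "%s#%s" % (content_file, anchor)})
    return nav_map


if __name__ == "__main__":
    nav = build_nav_map([("ch1", "First chapter"),
                         ("ch2", "Second chapter")])
    print(ET.tostring(nav).decode("ascii"))
```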

> Hopefully this thread will serve as an accumulation of knowledge 
> resulting in a post-processor. The post processor may simply serve as 
> a stepping stone to a "real conversion". I've found that often the 
> best specification for the right solution is obtained by implementing 
> and evaluating a quick and dirty solution.

Keep up the good work.

Günter
