On 2011-12-04, Steve Litt wrote:

> Hi all,

> I'm making this thread in hopes that everyone who knows something on
> the subject will add something, and at the end we'll all know how to
> go from LyX to Kindle.

> I'm quickly coming to the conclusion that the best way to do it is to
> write a post-processor for Alex's HTML. Simple, modular, and I can do
> it myself (and obviously make it free software). The post-processor
> might need to read the LyX file itself to get a few bits of information
> not in Alex's HTML file.

> So far I've discovered that LyX pagebreaks don't translate to Kindle
> page breaks (start at top of reader). Therefore, to every <h1> should
> be added the property style="page-break-before: always;". That way
> every chapter starts at the top of the reader.

I think this could/should be done better in a global CSS rule.

> I assume the official interpreter of the LyX project is Python, and if
> that assumption's true I'll make the postprocessor in Python. This
> post processing will be a heck of a lot easier if someone can point me
> to a good XML or HTML parser for Python, so I can look at nodes and
> attributes instead of trying to parse tags. I've noticed Alex's HTML
> appears very standard, with matching start and end tags and the like.
> This should theoretically make my job easier. So anyone have any
> suggestions for HTML or XML parsers for Python?

beautifulsoup
http://pypi.python.org/pypi/BeautifulSoup/3.2.0

and the standard library xml.* submodules:

  xml.dom
  xml.dom.minidom
  xml.dom.pulldom
  xml.etree.ElementTree
  xml.parsers.expat
  xml.sax
  xml.sax.handler
  xml.sax.saxutils
  xml.sax.xmlreader

> Other things I've noticed:

> 1) The LyX table of contents, when translated to HTML and then to
> Kindle format, crashes the Kindle previewer, so it must be removed
> from the HTML file and used to create an NCX TOC.

> 2) With the Kindle you probably don't want the title page, so there
> should be a post-processor option to capture the author, title and
> date, and then remove everything from the Title or Author
> environmented text to the next <h1>.

> Hopefully this thread will serve as an accumulation of knowledge
> resulting in a post-processor. The post processor may simply serve as
> a stepping stone to a "real conversion". I've found that often the
> best specification for the right solution is obtained by implementing
> and evaluating a quick and dirty solution.

Keep up the work.

Günter
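
A minimal sketch of the global CSS rule suggested above, using
xml.etree.ElementTree from the standard library. It is only an
illustration under assumptions: the exported file is taken to be
well-formed XHTML that the XML parser accepts (undeclared entities such
as &nbsp; would need extra handling), and the names add_global_css,
local_name and CSS_RULE are hypothetical, not part of LyX or of any
existing tool.

```python
import xml.etree.ElementTree as ET

# Assumed rule: start every chapter heading on a fresh Kindle page.
CSS_RULE = "h1 { page-break-before: always; }"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def add_global_css(xhtml_text, css=CSS_RULE):
    """Return xhtml_text with a <style> element appended to its <head>.

    Assumes the input is well-formed XML (XHTML); HTML with unclosed
    tags makes ET.fromstring raise ParseError.
    """
    root = ET.fromstring(xhtml_text)
    head = next((el for el in root.iter() if local_name(el.tag) == "head"), None)
    if head is None:
        raise ValueError("no <head> element found")
    # Reuse the namespace of <head> (empty string if there is none).
    ns = head.tag[: -len("head")]
    if ns:
        # Serialize that namespace as the default one, not as 'ns0:'.
        ET.register_namespace("", ns[1:-1])
    style = ET.SubElement(head, ns + "style", {"type": "text/css"})
    style.text = css
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<html><head><title>Demo</title></head>"
        "<body><h1>Chapter 1</h1><p>Text</p></body></html>"
    )
    print(add_global_css(sample))
```

One rule in <head> covers every <h1>, so the post-processor does not
have to touch each heading, and the page-break behaviour can be changed
later in a single place.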
