request for comments on EPUB exporting

Josh Hieronymus Fri, 30 Aug 2013 20:32:31 -0700

Hi everyone,

I'm working on exporting LyX documents to EPUB as part of my Google Summer
of Code project, and I'd like to invite you to try out my current
implementation, which can be found in the "epub/master" branch of the gsoc
repository (g...@git.lyx.org:gsoc.git). The export process begins by
exporting the document to XHTML via LyXHTML, then converting the XHTML to
EPUB with the scripts in lib/scripts/epub.
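
To make the last step concrete, here is a minimal sketch of packaging an EPUB container in Python. It is not the code in lib/scripts/epub; the function name write_epub and the OEBPS/ file layout in the usage note are assumptions made for illustration. The one fixed rule it shows comes from the EPUB container format: the "mimetype" entry must be the first file in the archive and must be stored uncompressed.

```python
import zipfile


def write_epub(epub_path, files):
    """Write an EPUB archive; `files` maps archive paths to str or bytes.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # Everything else (container.xml, OPF, NCX, XHTML) may be deflated.
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A call might look like write_epub("book.epub", {"META-INF/container.xml": container_xml, "OEBPS/content.opf": opf_xml, "OEBPS/toc.ncx": ncx_xml, "OEBPS/content.xhtml": xhtml}), where the four values are strings produced elsewhere.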


Right now, documents will successfully export to EPUB 2.0.1, with the
following caveats:
- Almost all metadata fields (author, book id, etc.) are filled in with
default values. Only the title field is taken from the XHTML file from
which the EPUB is converted.
- No intra-document navigation is implemented; the document is just one
long page.
- MathML isn't part of the EPUB 2.0.1 standard, so the document output
settings should be set to output math as images.
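
The metadata behaviour in the first caveat (title from the XHTML, defaults for everything else) could be sketched roughly as follows. This is an illustration, not the branch's actual code: build_metadata, the default values, and the assumption that the title sits in the XHTML <title> element are all hypothetical, and a real LyXHTML file with a DOCTYPE or named entities may need a more forgiving parser.

```python
import uuid
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_metadata(xhtml_text, language="en", identifier=None):
    """Return an OPF <metadata> element with title, language, identifier."""
    root = ET.fromstring(xhtml_text)
    # Assumption: the exported XHTML carries the title in <head><title>.
    title_el = root.find(".//{%s}title" % XHTML_NS)
    title = (title_el.text or "").strip() if title_el is not None else ""
    if not title:
        title = "Untitled"  # placeholder default
    if identifier is None:
        # Placeholder default: a random UUID-based identifier.
        identifier = "urn:uuid:" + str(uuid.uuid4())

    metadata = ET.Element("{%s}metadata" % OPF_NS)
    for tag, value in (("title", title),
                       ("language", language),
                       ("identifier", identifier)):
        ET.SubElement(metadata, "{%s}%s" % (DC_NS, tag)).text = value
    # The package's unique-identifier attribute refers to this id.
    metadata.find("{%s}identifier" % DC_NS).set("id", "BookId")
    return metadata
```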

What I'd like to implement soon:
- Extracting other metadata fields from the document. The required fields
are language, title, and identifier. The title field is taken from the
document, but not the language or the identifier. I'm taking the title
from the first paragraph to use the "title" inset, but there aren't
corresponding insets for the other elements, so I'm not sure of the best
way or ways to get the rest of the info. (There's an inset for author, but
the author name is needed in both reading order and "file-as" order, and
there's only one author inset.) One thought is to create custom insets, and
another is to ask for the information via the document settings.
- Intra-document navigation. In order to skip around within the document,
add bookmarks, etc., navigation information needs to be added to the
toc.ncx file within the EPUB archive. Which locations in the document
should be added to the list of navigable points is not obvious. I read
(at http://www.gbenthien.net/Kindle%20and%20EPUB/ncx.html) that some
e-readers only work with at most one depth level: only parts, or only
chapters, or only sections, and so on. I'm not sure whether this is
correct. Either way, we can't assume what depth the user wants in the
table of contents, so this is probably something we should ask.
It's probably easiest to pull the navigation info straight from the
document's table of contents, but I don't know if this info is available in
the exported XHTML file without appearing visibly.
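
One possible way to build the navigation entries, sketched here under assumptions: instead of reading the document's table of contents, it scans the exported XHTML for heading elements (h1 to h6) up to a user-chosen depth and nests them into an NCX navMap. The function names, the content.xhtml file name, and the assumption that headings carry id attributes are all hypothetical; headings without an id are skipped.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
HEADING_LEVELS = {"{%s}h%d" % (XHTML_NS, n): n for n in range(1, 7)}


def collect_headings(xhtml_text, max_depth):
    """Return (level, label, anchor) for headings no deeper than max_depth."""
    headings = []
    for el in ET.fromstring(xhtml_text).iter():
        level = HEADING_LEVELS.get(el.tag)
        anchor = el.get("id")
        if level is None or level > max_depth or not anchor:
            continue
        # Flatten nested markup and collapse whitespace in the label.
        label = " ".join("".join(el.itertext()).split())
        headings.append((level, label, anchor))
    return headings


def build_nav_map(headings, content_file="content.xhtml"):
    """Nest headings into an NCX <navMap> element."""
    nav_map = ET.Element("{%s}navMap" % NCX_NS)
    stack = [(0, nav_map)]  # (heading level, parent element)
    for order, (level, label, anchor) in enumerate(headings, start=1):
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1][0] >= level:
            stack.pop()
        point = ET.SubElement(stack[-1][1], "{%s}navPoint" % NCX_NS,
                              {"id": "navpoint-%d" % order,
                               "playOrder": str(order)})
        nav_label = ET.SubElement(point, "{%s}navLabel" % NCX_NS)
        ET.SubElement(nav_label, "{%s}text" % NCX_NS).text = label
        ET.SubElement(point, "{%s}content" % NCX_NS,
                      {"src": "%s#%s" % (content_file, anchor)})
        stack.append((level, point))
    return nav_map
```

Passing max_depth=1 yields a flat list, which would suit readers that really do handle only one level.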

What I'd like to implement at some point:
- optional conversion of images to SVG format
  Note: Vector-based graphics scale better than raster-based graphics,
  making them well-suited for electronic media.
  Note: EPUB specifications require compliant e-readers to support SVG.
  Note: Older versions of some browsers (primarily IE) don't support SVG.
  Note: Preliminary searches turn up a package named dvisvgm
  (http://www.ctan.org/pkg/dvisvgm) that converts DVI to SVG, and it's
  licensed under the GPL v3 or later.
- ability to split large XHTML files into smaller ones
  Note: Splitting large XHTML files should boost the performance of the
  converted EPUB documents.
- allow selection of an image for front cover artwork
  Note: Amazon requires JPEG or TIFF format for front cover artwork.

I'd love to hear any thoughts, comments, and suggestions you all have,
especially if you encounter any bugs or see something important I'm
overlooking.

Thanks,
Josh
