On Wed, 11 May 2011, Alberto Barranco Ramón wrote:
We are looking for an .epub parser. We considered Tika at the beginning, but
while testing we realized that Tika doesn't parse .xhtml files for now. At the
moment only .html files are parsed. I saw a TODO in the source code of
EpubContentParser.java, which says:

/**
* Parser for EPUB OPS <code>*.html</code> files.
*
* For the time being, assume XHTML (TODO: DTBook)
*/

To me, that comment says Tika handles only XHTML content. (What matters isn't the file extension but what's inside the file.)

What happens when you try giving the parser one of your xhtml epub files?
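
If it helps to separate "wrong extension" from "wrong content", here is a minimal sketch that uses only the JDK (not Tika) to check whether each .html/.xhtml entry in an epub is well-formed XML. The class name EpubEntryCheck and its methods are hypothetical names made up for this illustration, and this is not how Tika itself decides what to parse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of Tika.
class EpubEntryCheck {

    /** Returns true if the bytes parse as well-formed XML, whatever the file was named. */
    static boolean isWellFormedXml(byte[] content) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Hand back an empty document so the XHTML DTD is never fetched.
                    return new InputSource(new StringReader(""));
                }
            };
            factory.newSAXParser().parse(new ByteArrayInputStream(content), handler);
            return true;
        } catch (Exception e) {
            // Any parse failure means the content is not well-formed XML.
            return false;
        }
    }

    /** Prints, for every .htm/.html/.xhtml entry in the epub, whether it is well-formed XML. */
    static void report(String epubPath) throws IOException {
        try (ZipFile zip = new ZipFile(epubPath)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                String name = entry.getName().toLowerCase();
                boolean looksLikeMarkup =
                        name.endsWith(".htm") || name.endsWith(".html") || name.endsWith(".xhtml");
                if (entry.isDirectory() || !looksLikeMarkup) {
                    continue;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    System.out.println(entry.getName() + " -> well-formed XML: "
                            + isWellFormedXml(in.readAllBytes()));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EpubEntryCheck <file.epub>");
            return;
        }
        report(args[0]);
    }
}
```

Run it against one of your epub files. If the .xhtml entries come back as well-formed XML, then the content is what the comment above expects, and the next step is to see what the parser actually does with that file.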

Nick
