[ https://issues.apache.org/jira/browse/TIKA-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018310#comment-17018310 ]
Tim Allison commented on TIKA-3027:
-----------------------------------
FBReader has no problem with at least one of these files.
> Consider using html parser instead of xml parser for epub contents
> ------------------------------------------------------------------
>
> Key: TIKA-3027
> URL: https://issues.apache.org/jira/browse/TIKA-3027
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: testEPUB_html.epub
>
>
> We have a good number of EPUB files in our regression set whose "xhtml" content
> files cause problems for the XML parser. Should we switch to the HTMLParser?
>
> To name a few:
> {noformat}
> commoncrawl3/6H/6HAGP5DFUKFYPUAUBPZ6NX54LUT6H5YO
> commoncrawl3/LR/LR53ZVY5VR4BILUK27LGKROTBMVQ4YMV
> commoncrawl3/Q4/Q4F2HATL7V5A6AYDJKZYNXV4AU6NXRMX
> commoncrawl3/7I/7I6CKCIX75V22UNG7YPUVL6O2F3WVUTF
> commoncrawl3/PF/PFYKV55F57N46PQJXAPZDEXCGJ54W26N
> commoncrawl3/QK/QKVFV2QCCPXCQT27ZKRTOTTA5PHLFLIE
> commoncrawl3/XB/XBUNGEOTNUBZ4EDHIEXRR5NW2PWF4WNN
> commoncrawl3/72/72CJJQCXYVNIBX6O2M2AEJOHUZJUK625 {noformat}
> I'm attaching 6HA... (renamed).
>
> The few that I've tried to open cause errors in iBooks and don't open at
> all. I'll try a few other readers.
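The ticket doesn't say what the XML parse failures actually are. As a hypothetical illustration only, assuming they are the kind of HTML-isms that a strict XML parser rejects but an HTML parser tolerates (an unclosed void tag like `<br>`, or an HTML named entity like `&nbsp;` with no DTD to declare it), here is a minimal sketch using only the JDK's `javax.xml.parsers` API. It is not Tika code; `XhtmlWellFormedCheck` and `isWellFormedXml` are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks whether a content document is well-formed XML.
public class XhtmlWellFormedCheck {

    // Returns true if the JDK's built-in non-validating parser accepts the
    // content as well-formed XML, false if parsing fails for any reason.
    public static boolean isWellFormedXml(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) { }

                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // ParserConfigurationException, SAXException or IOException
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed XHTML: <br/> is self-closed and no HTML-only entities are used.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello<br/></p></body></html>";
        // HTML-style void tag: <br> is never closed, so </p> does not match it.
        String voidTag = "<html><body><p>Hello<br></p></body></html>";
        // HTML named entity with no DTD declaring it.
        String entity = "<html><body><p>a&nbsp;b</p></body></html>";

        System.out.println("xhtml:   " + isWellFormedXml(xhtml));
        System.out.println("voidTag: " + isWellFormedXml(voidTag));
        System.out.println("entity:  " + isWellFormedXml(entity));
    }
}
```

Running `main` prints `true` for the first sample and `false` for the other two. A lenient HTML parser would typically recover from both of the failing cases, which is the trade-off the question above is asking about.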
--
This message was sent by Atlassian Jira
(v8.3.4#803005)