Koutsoulis Philippe created TIKA-1138: -----------------------------------------
Summary: I got empty body and empty title with some documents
Key: TIKA-1138
URL: https://issues.apache.org/jira/browse/TIKA-1138
Project: Tika
Issue Type: Bug
Components: general
Affects Versions: 1.3
Environment: Windows 7 (my desktop)
Reporter: Koutsoulis Philippe

*+Tested version:+* Apache Tika 1.3 (with the Apache Tika GUI)

Hi all,

I get an empty body and an empty title with some documents. Do you have any idea why?

*+Extract from my "Structured Text"+*
{noformat}
<?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
<head>
...
<title/>
</head>
<body/></html>
{noformat}

*+Files to reproduce+*
[http://www.justice.gouv.fr/art_pix/declaration_sexe_20091016.xls]
[http://ge.ch/ssco_gestats/excel/deinfo_par_ht2004.xls]
[http://homepage.swissonline.ch/ccvaf1/stock_divers/palmares_ccvaf.xls]
[http://top1000.anthologeek.net/participants.current.txt]
[http://ge.ch/ssco_gestats/excel/refona_par_ht2006.xls]
[http://www.rad.fr/solupro.xls]
[http://www.pfynschiessen.ch/TClassementgroupeinvite.xls]
[http://www.gregdonner.org/workbench/wb_31rev.txt]

(i) No errors in the logs :(