https://bz.apache.org/bugzilla/show_bug.cgi?id=57847
Nick Burch <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #2 from Nick Burch <[email protected]> ---
For slightly complicated reasons, we have two different .doc -> .html
converters: one in the POI codebase (WordToHtmlConverter) and one in the Tika
codebase (org.apache.tika.parser.microsoft.WordExtractor).

It would be great if you could try the same file with Apache Tika and see
whether it manages to get the lists out. (Grab the tika-app jar and run it with
--html for a quick way to check.)

If Apache Tika does it right, we can hopefully bring the logic over to the
AbstractWordConverter family of converters. If not, we can look to fix it in
both at the same time!
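
For anyone repeating the check suggested above, here is a minimal sketch of scripting the tika-app run so the HTML output can be saved and compared with what WordToHtmlConverter produces. The jar name and file paths are placeholders (the real jar name depends on the version downloaded), and the helper names are hypothetical; only the `--html` option comes from the comment itself.

```python
import subprocess
from pathlib import Path


def tika_html_command(tika_jar, doc_path):
    """Build the command line for tika-app's --html mode.

    Both arguments are placeholders supplied by the caller; nothing
    here assumes a particular Tika version or file name.
    """
    return ["java", "-jar", str(tika_jar), "--html", str(doc_path)]


def convert_with_tika(tika_jar, doc_path, out_path):
    """Run tika-app and save whatever it prints on stdout to out_path."""
    result = subprocess.run(
        tika_html_command(tika_jar, doc_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if java or Tika fails
    )
    Path(out_path).write_bytes(result.stdout)
    return out_path
```

Calling `convert_with_tika("tika-app.jar", "problem.doc", "tika-output.html")` (all three names being stand-ins) would leave a file that can be opened in a browser to see whether the lists survived the conversion.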
