https://bz.apache.org/bugzilla/show_bug.cgi?id=61354
--- Comment #3 from Tim Allison <[email protected]> ---
Karthik, thank you for sharing a patch and a triggering document!

PJ, thank you for fixing this so quickly!

As a side note, Tika's experimental SAX parser for docx does extract everything. This is exactly one of the reasons I added it: if we don't account for structural rarities, we'll still get the text. With our DOM model, we look for specific things in specific places (see also TIKA-1130).

Make no mistake, we need to fix our DOM parser when people find problems, and I'm grateful that you opened this!
