https://bz.apache.org/bugzilla/show_bug.cgi?id=61354

--- Comment #3 from Tim Allison <[email protected]> ---
Karthik, Thank you for sharing a patch and triggering document!  PJ, thank you
for fixing this so quickly!

As a side note, Tika's experimental SAX parser for docx does extract
everything; and this is exactly one of the reasons that I added it -- so that
if we don't account for structural rarities, we'll still get the text.  With
our DOM model, we're looking for some specific things in specific places (see
also TIKA-1130).

Make no mistake, we need to fix our DOM parser when people find problems, and
I'm grateful that you opened this!
