To close the loop and share my gratitude publicly... Thank you, Dominik, for transferring 41k docx/dotx files (5GB) to our regression corpus!
I’ve already found a number of “areas for improvement” in Tika's experimental
docx SAX parser, and a few areas for improvement in POI's XWPFDocument/DOM
parser… all thanks to your documents and your Common Crawl code.
Thank you!
Cheers,
Tim
