I finally got around to comparing the experimental SAX parser over on Tika with the POI/DOM-based parser for docx on the 170k docx files we have.
Reports: http://162.242.228.174/reports/dom_vs_sax_docx.tar.gz
The SAX parser gives fewer exceptions and more content. Both differences are only slight, but overall, this looks promising.
