I finally got around to comparing the experimental SAX parser over on Tika with
the POI/DOM-based parser for docx on the 170k docx files we have.

http://162.242.228.174/reports/dom_vs_sax_docx.tar.gz

The SAX parser gives fewer exceptions and more content. Both differences are
only slight, but overall, this looks promising.
