Wouldn't shock me to find out the duplication is at the XML level. Word saving the same text in two different ways under the same parent would be completely within my jaded expectations.

On Thu, Aug 31, 2017 at 11:37 AM Allison, Timothy B. <[email protected]> wrote:

> I ran the regression tests against docx, and I'm finding no problematic
> new exceptions. We are extracting some new text in the phonetic/ruby runs
> (great!). However, I am finding some duplication of content within
> textboxes(? may be other sources ?). I need to figure out if this is at
> the POI level or the Tika level.
>
> Reports are here:
> http://162.242.228.174/reports/poi-3.17-rc2-docx.tar.gz
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:[email protected]]
> Sent: Wednesday, August 30, 2017 8:05 PM
> To: POI Developers List <[email protected]>
> Subject: RE: [VOTE] Apache POI 3.17 release (RC2)
>
> I’ll run regression tests at least against our .docx tonight to make sure
> I didn’t wreck anything with 61470.
