I didn't find any showstoppers. Are we ready for Chris to roll 1.14-rc1?
Some notes:

- We're getting quite a few new attachments: 315k (mostly from newly recognized mbox and MSOffice files).
- New mimetypes: mbox, text/calendar, x-sh, vnd.djvu, dbf, and many more.
- The upgraded copy of icu4j is misidentifying a handful of files as UTF-16[LB]E.
- We're missing a small amount of text from custom PPT templates (known issue).
- We're getting quite a few new exceptions for attachments that weren't formerly extracted. These are unknown embedded objects that are being misidentified as PSD, other image files, or TTF.
- We're getting quite a few new exceptions for files that are now correctly identified as "x-ms-asx"; the exceptions occur because these files contain invalid XML.

-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Wednesday, September 28, 2016 1:34 PM
To: [email protected]
Subject: RE: Tika 1.14?

All,

I finished running the regression tests. I have just started going through the results. Reports are available here:

https://github.com/tballison/share/blob/master/tika_comparisons/reports_1_14-trunk_vs_1_13.zip

-----Original Message-----
From: Chris Mattmann [mailto:[email protected]]
Sent: Thursday, September 22, 2016 12:25 PM
To: [email protected]
Subject: Re: Tika 1.14?

Sounds great to me, Tim. If you tell me when the tests are done, I’d be happy to RC a release!

On 9/21/16, 11:31 AM, "Allison, Timothy B." <[email protected]> wrote:

    All,

    PDFBox 2.0.3 is now integrated; I'm about to push the integration with POI-3.15. I have a few cleanup things I'd like to take care of.

    Any other items for 1.14?

    Should we aim for Mon 26th for final code changes for 1.14? I can run the regression tests, and then maybe we could cut the release candidate some time mid to end of next week?

    Best,

    Tim
