I finally had a chance to look through the results of the first regression run.
I made a few trivial changes to our parsers and to tika-eval.
We appear to have many more exceptions in files parsed by our CompressorParser,
but this is a reporting artifact, not a real regression: the exception is now
reported on the container file rather than on an attachment, and tika-eval
wasn't matching A and B correctly.
There is a regression that's been fixed in PDFBox trunk (PDFBOX-3717), but I
don't see that as a blocker.
We have new exceptions in the new parsers (EMF, WMF, .xlsb, WordPerfect), but
that's because we're actually parsing those now. :)
All else looks to be in decent shape.
Chris and Team and All,
Let me know when you're ready for me to kick off the next regression run.
Cheers,
Tim
-----Original Message-----
From: Mattmann, Chris A (3010) [mailto:[email protected]]
Sent: Wednesday, April 26, 2017 12:48 PM
To: [email protected]
Subject: Re: Tika 1.15
Thank you!
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010) Manager, NSF
& Open Source Projects Formulation and Development Offices (8212) NASA Jet
Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS) Adjunct Associate
Professor, Computer Science Department University of Southern California, Los
Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On 4/26/17, 9:35 AM, "Allison, Timothy B." <[email protected]> wrote:
Oh. Ok. Will wait, then?
-----Original Message-----
From: Mattmann, Chris A (3010) [mailto:[email protected]]
Sent: Wednesday, April 26, 2017 11:38 AM
To: [email protected]
Subject: Re: Tika 1.15
I want to see if I can get in the VideoRecognition parser, and also the
Sentiment one.
I hope to get it done in the next day or so. Thanks.
On 4/26/17, 7:54 AM, "Allison, Timothy B." <[email protected]> wrote:
With the added TSD parser, I think I should rerun the regression
testing. I also fixed 2099, so we'll benefit from a rerun.
Anything else before I rerun the regression testing?
Any problems observed in the first run?