Sounds good. W00t!

-----Original Message-----
From: Chris Mattmann [mailto:[email protected]]
Sent: Monday, May 1, 2017 4:57 PM
To: [email protected]
Subject: Re: Tika 1.15
Thanks Tim. I am going to try to get tika-dl added (if possible), and also try the Sentiment Parser next. If I can get one or both of those in (in the next day or so), I will give you the heads-up to begin testing. Video recognition is in!

On 5/1/17, 12:42 PM, "Allison, Timothy B." <[email protected]> wrote:

I finally had a chance to look through the results of the first regression run. I made a few trivial changes to our parsers and to tika-eval.

We appear to have many more exceptions in files parsed by our CompressorParser, but this is an artifact of reporting, not of reality: the exception now shows up in the container file rather than in an attachment, and tika-eval wasn't matching A and B correctly.

There is a regression that has been fixed in PDFBox trunk (PDFBOX-3717), but I don't see it as a blocker.

We have new exceptions in the new parsers (EMF, WMF, .xlsb, WordPerfect), but that's because we're actually parsing those files now. :)

All else looks to be in decent shape.

Chris and Team and All, let me know when you're ready for me to kick off the next regression run.

Cheers,

Tim

-----Original Message-----
From: Mattmann, Chris A (3010) [mailto:[email protected]]
Sent: Wednesday, April 26, 2017 12:48 PM
To: [email protected]
Subject: Re: Tika 1.15

Thank you!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 4/26/17, 9:35 AM, "Allison, Timothy B." <[email protected]> wrote:

Oh. OK. Will wait, then?

-----Original Message-----
From: Mattmann, Chris A (3010) [mailto:[email protected]]
Sent: Wednesday, April 26, 2017 11:38 AM
To: [email protected]
Subject: Re: Tika 1.15

I want to see if I can get the VideoRecognition parser in, and also the Sentiment one. I hope to get it done in the next day or so. Thanks.

On 4/26/17, 7:54 AM, "Allison, Timothy B." <[email protected]> wrote:

With the added TSD parser, I think I should rerun the regression testing.
Given that, I've also fixed 2099, so we'll benefit from a rerun. Anything else before I rerun the regression testing? Any problems observed in the first run?
