Oh sorry, I thought I had sent this to the dev list. Forwarding...

Luis

---------- Forwarded message ----------
From: Allison, Timothy B. <talli...@mitre.org>
Date: 2017-12-07 14:10 GMT-02:00
Subject: RE: Tika 1.17?
To: "lfcnas...@gmail.com" <lfcnas...@gmail.com>


Agreed.  Thank you!  Do you mind sharing this with the list?



*From:* Luís Filipe Nassif [mailto:lfcnas...@gmail.com]
*Sent:* Thursday, December 7, 2017 10:26 AM
*To:* Allison, Timothy B. <talli...@mitre.org>
*Subject:* RE: Tika 1.17?



Hi Tim,



I don't think it is a blocker, maybe a minor regression, given that we are
much better off with 20x more fixed exceptions. I sent it just so that we are
aware. There are ~40 new exceptions with pdf, and 20x more fixed ones, so
my vote is to go for 1.17!



Luis





Em 7 de dez de 2017 11:47 AM, "Allison, Timothy B." <talli...@mitre.org>
escreveu:

Thank you, Luís!  Given where POI is in its dev cycle, should we go for a
release of 1.17 now and then push for a 1.17.1 as soon as POI fixes this?
Should we revert to 3.17-beta1? (wait, we can't do this because of a bug
that prevents parsing of pptx in Solr)

Or is this grave enough to wait a few months before we release 1.17?

I found a zip/mime detection issue that we need to fix at the Tika level,
but that fix is trivial.


-----Original Message-----
From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com]
Sent: Wednesday, December 6, 2017 9:30 AM
To: dev@tika.apache.org
Subject: Re: Tika 1.17?

Hi Tim,

I've had a brief look at the exceptions folder. It seems we are much better
with ppt (4677 fixed exceptions) and pdf (7798), but there are 208 new
exceptions with ppt. I did not check the files to see if they are
corrupted, but some common tokens were lost. Below is the most common new
stack trace:

org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the
class for type with id 1000 on class class org.apache.poi.hslf.record.Document
:
java.lang.reflect.InvocationTargetException
Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't
instantiate the class for type with id 1010 on class class
org.apache.poi.hslf.record.Environment :
java.lang.reflect.InvocationTargetException
Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't
instantiate the class for type with id 2005 on class class
org.apache.poi.hslf.record.FontCollection :
java.lang.reflect.InvocationTargetException
Cause was : java.lang.IllegalArgumentException: typeface can't be null nor
empty at org.apache.poi.hslf.record.Record.createRecordForType(
Record.java:186)
at org.apache.poi.hslf.record.Record.buildRecordAtOffset(Record.java:104)
at
org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(
HSLFSlideShowImpl.java:279)
at
org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(
HSLFSlideShowImpl.java:260)
at
org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(
HSLFSlideShowImpl.java:166)
at
org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:181)
at
org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:78)
at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:179)
at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
at
org.apache.tika.parser.RecursiveParserWrapper.parse(
RecursiveParserWrapper.java:158)
at
org.apache.tika.batch.FileResourceConsumer.parse(
FileResourceConsumer.java:406)
at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
at
org.apache.tika.batch.FileResourceConsumer._processFileResource(
FileResourceConsumer.java:181)
at
org.apache.tika.batch.FileResourceConsumer.call(
FileResourceConsumer.java:115)
at
org.apache.tika.batch.FileResourceConsumer.call(
FileResourceConsumer.java:50)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor283.newInstance(Unknown Source)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
... 25 more
Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't
instantiate the class for type with id 1010 on class class
org.apache.poi.hslf.record.Environment :
java.lang.reflect.InvocationTargetException
Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't
instantiate the class for type with id 2005 on class class
org.apache.poi.hslf.record.FontCollection :
java.lang.reflect.InvocationTargetException
Cause was : java.lang.IllegalArgumentException: typeface can't be null nor
empty at org.apache.poi.hslf.record.Record.createRecordForType(
Record.java:186)
at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
at org.apache.poi.hslf.record.Document.<init>(Document.java:133)
... 29 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor285.newInstance(Unknown Source)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
... 31 more
Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't
instantiate the class for type with id 2005 on class class
org.apache.poi.hslf.record.FontCollection :
java.lang.reflect.InvocationTargetException
Cause was : java.lang.IllegalArgumentException: typeface can't be null nor
empty at org.apache.poi.hslf.record.Record.createRecordForType(
Record.java:186)
at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
at org.apache.poi.hslf.record.Environment.<init>(Environment.java:54)
... 35 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor286.newInstance(Unknown Source)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
... 37 more
Caused by: java.lang.IllegalArgumentException: typeface can't be null nor
empty at
org.apache.poi.hslf.usermodel.HSLFFontInfo.setTypeface(
HSLFFontInfo.java:129)
at org.apache.poi.hslf.usermodel.HSLFFontInfo.<init>(HSLFFontInfo.java:74)
at org.apache.poi.hslf.record.FontCollection.<init>(FontCollection.java:47)
... 41 more
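
For anyone triaging traces like the one above, the informative part is the innermost "Caused by" (here the IllegalArgumentException thrown from HSLFFontInfo.setTypeface), not the outer HSLFException wrappers. A minimal sketch of unwrapping to that root cause follows. The class name RootCause is hypothetical and is not part of Tika or POI; it only uses the standard Throwable.getCause() call.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical helper, not part of Tika or POI.
public class RootCause {

    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable of(Throwable t) {
        Throwable current = t;
        // Assumes the cause chain is acyclic, as it is for ordinary wrapped exceptions.
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild a small version of the nesting seen in the trace:
        // wrapper -> InvocationTargetException -> IllegalArgumentException.
        Throwable inner = new IllegalArgumentException("typeface can't be null nor empty");
        Throwable outer = new RuntimeException("Couldn't instantiate the class",
                new InvocationTargetException(inner));
        System.out.println(RootCause.of(outer).getMessage());
    }
}
```

Grouping failures by the class and message of the root cause, rather than by the outermost exception, would likely make it easier to tell whether the 208 new ppt exceptions share a single origin.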


2017-12-05 21:44 GMT-02:00 Allison, Timothy B. <talli...@mitre.org>:

> Reports are here:
>
> http://162.242.228.174/reports/reports_Tika1_16V1_17.zip
>
> I haven't had a chance to look.  Tomorrow...
>
> Let me know what you find.
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Wednesday, November 29, 2017 1:08 PM
> To: dev@tika.apache.org
> Subject: RE: Tika 1.17?
>
> +1
>
> -----Original Message-----
> From: Chris Mattmann [mailto:mattm...@apache.org]
> Sent: Wednesday, November 29, 2017 12:57 PM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.17?
>
> Thanks so much for fixing this. It worked during MEMEX and then I
> think has since fallen out of date and perhaps I committed Zarana’s
> code wrong or something. Will be great to get this working!
>
>
>
> On 11/29/17, 9:54 AM, "David Meikle" <loo...@gmail.com> wrote:
>
>     I am thinking TIKA-2385. I've got a resized image that I can commit
>     tonight that should close this one off.
>
>     Cheers,
>     Dave
>
>
>     On 29 Nov 2017 14:42, "Allison, Timothy B." <talli...@mitre.org> wrote:
>
>     Many thanks to Bob for help on TIKA-2502!
>
>     Anything else we want to put into 1.17 before I run the regression
>     tests?
>
>     -----Original Message-----
>     From: Allison, Timothy B. [mailto:talli...@mitre.org]
>     Sent: Monday, November 13, 2017 1:42 PM
>     To: dev@tika.apache.org
>     Subject: RE: Tika 1.17?
>
>     Y.  You're right.  Thank you!
>
>     I think I've been avoiding that because there were some regressions in
>     metadata-extractor last I looked at this.  Let's hope those are gone in
>     2.10.1.
>
>     -----Original Message-----
>     From: Tyler Bui-Palsulich [mailto:tpalsul...@apache.org]
>     Sent: Sunday, November 12, 2017 2:54 PM
>     To: dev@tika.apache.org
>     Subject: RE: Tika 1.17?
>
>     TIKA-2486 might be worth blocking on since there is a CVE.
>
>     Tyler
>
>     On Nov 6, 2017 5:26 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>
>     > Y.  I'm happy enough to wait a few more days.  I wasn't able to kick
>     > off the regression tests last week.  Should I wait for the new parsers
>     > to run the regression tests?
>     >
>     > -----Original Message-----
>     > From: David Meikle [mailto:loo...@gmail.com]
>     > Sent: Friday, November 3, 2017 7:42 PM
>     > To: dev@tika.apache.org
>     > Subject: Re: Tika 1.17?
>     >
>     > Sounds good. I have a couple of new parsers I would like to slot in
>     > but not had a chance the last few months. Will go for it over the
>     > weekend, if that works for you Tim.
>     >
>     > Cheers,
>     > Dave
>     >
>     >
>     >
>     > On 3 November 2017 at 15:19, Mattmann, Chris A (3010) <
>     > chris.a.mattm...@jpl.nasa.gov> wrote:
>     >
>     > > Let’s make it so (
>     > >
>     > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     > ++++++++++++++
>     > > Chris Mattmann, Ph.D.
>     > > Principal Data Scientist, Engineering Administrative Office (3010)
>     > > Manager, NSF & Open Source Projects Formulation and Development
>     > > Offices (8212)
>     > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>     > > Office: 180-503E, Mailstop: 180-503
>     > > Email: chris.a.mattm...@nasa.gov
>     > > WWW:  http://sunset.usc.edu/~mattmann/
>     > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     > ++++++++++++++
>     > > Director, Information Retrieval and Data Science Group (IRDS)
>     > > Adjunct Associate Professor, Computer Science Department
>     > > University of Southern California, Los Angeles, CA 90089 USA
>     > > WWW: http://irds.usc.edu/
>     > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     > ++++++++++++++
>     > >
>     > >
>     > >
>     > > On 11/3/17, 7:35 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>     > >
>     > >     All,
>     > >
>     > >     PDFBox 2.0.8 is now integrated.  I want to fix TIKA-2490 before
>     > > we release 1.17.  Are there other issues that are blockers or you'd
>     > > like to fix before 1.17 (TIKA-2471, maybe?)?
>     > >
>     > >     I plan to run initial large scale regression tests shortly for
>     > > rfc822 and mbox because of TIKA-2478.  I'll run the full regression
>     > > tests before cutting the RC, but I want to focus on those for now.
>     > > Other requests?
>     > >
>     > >     Cheers,
>     > >
>     > >                 Tim
>     > >
>     > >
>     > >
>     >
>
>
>
>
