I'm OK with trying the fix in 1.8 (or 1.7 if people feel strongly). As Nick just recommended, I'll try adding metadata extraction to Tesseract soon, then adding the extensible solution in 1.8.
Tyler

On Thu, Dec 18, 2014 at 11:58 PM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

I haven't tried my hand at it; I've been super busy. Tyler, if you have a chance, go for it. I think that's the remaining blocker.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Tyler Palsulich <tpalsul...@gmail.com>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Thursday, December 18, 2014 at 12:54 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: Re: 1.7 release?

Hi All,

It's been a few months, so I just want to follow up on this thread. We've resolved/closed 51 issues for v1.7 [0]. There are two on JIRA marked as 1.7 (TIKA-1465 and TIKA-894). Do we still want to aim for 1.7 with TIKA-1445? Has anyone tried their hand at the suggested (significant) fix?

Are there any other issues someone would like to fit in?

Cheers,
Tyler

[0] - https://issues.apache.org/jira/browse/TIKA/fixforversion/12327096/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel

On Tue, Oct 28, 2014 at 1:46 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

Thanks Tim, saw your patch and am looking now.
-----Original Message-----
From: <Allison>, "Timothy B." <talli...@mitre.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Monday, October 27, 2014 at 12:30 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: RE: 1.7 release?

Sounds good. As long as the default behavior remains the same, I'm happy. I'm going to play with a combination of your patch and Tyler's and see what the ramifications are for embedded docs.

To confirm, the OCR integration is fantastic. Thank you and Tyler!

Best,

Tim

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Friday, October 24, 2014 5:36 PM
To: dev@tika.apache.org
Subject: Re: 1.7 release?

Hey Tim,

What do you think about my existing patch for 1445? For example, to just call all the parsers? I thought I was seeing behavior that was slow because of that, but it turned out to be Tesseract and my machine at the time?

I think my patch for 1445 may be enough, and we should get the metadata, I think? Thoughts?

I honestly think we need to deliver Tesseract in 1.7. We're close. I'll even take it upon myself to try and experiment with the idea of multiple parsers being called.
I think a simple solution to the metadata key conflict issue is simply to have a policy to add values (by default) and replace if a property is set in ParseContext. Some simple updates to CompositeParser would allow this.

Thoughts?

Cheers,
Chris

-----Original Message-----
From: <Allison>, "Timothy B." <talli...@mitre.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Friday, October 24, 2014 at 2:24 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: RE: 1.7 release?

Sorry for coming late to the game on the implications of TIKA-1445. I don't want to hold up the release of 1.7.

However, would it be possible to return to the legacy default behavior of extracting metadata from images?

We can then document on the OCR parser page on the wiki that you need to install Tesseract _and_ make a change in the parser/mime config file. If you want this new capability, it will take a small bit of work until we solve TIKA-1445.

I worry that the current behavior of 1.7 would be surprising to most non-dev users (well, even to at least one dev :) ).
Cheers,

Tim

________________________________________
From: Oleg Tikhonov [olegtikho...@gmail.com]
Sent: Friday, October 24, 2014 2:24 PM
To: dev@tika.apache.org
Subject: Re: 1.7 release?

Hi Tyler,
don't mention it.

Cheers,
Oleg

On Oct 24, 2014 8:02 PM, "Tyler Palsulich" <tpalsul...@gmail.com> wrote:

Thank you for the help, Oleg! I just resolved TIKA-1422. So, are there any other issues anyone would like to resolve before a new release?

Thanks,
Tyler

On Tue, Oct 21, 2014 at 2:42 AM, Oleg Tikhonov <olegtikho...@gmail.com> wrote:

Sorry!!!

On Tue, Oct 21, 2014 at 9:37 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

Thanks Oleg, will try tomorrow, Los Angeles time for me!
-----Original Message-----
From: Oleg Tikhonov <o...@apache.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Monday, October 20, 2014 at 11:20 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: Re: 1.7 release?

Please try the newest patch.

Cheers,
Oleg

On Tue, Oct 21, 2014 at 9:08 AM, Oleg Tikhonov <olegtikho...@gmail.com> wrote:

Taken. Thanks. In progress...

On Tue, Oct 21, 2014 at 8:54 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

Trunk is the current checkout/branch:

http://svn.apache.org/repos/asf/tika/trunk
-----Original Message-----
From: Oleg Tikhonov <olegtikho...@gmail.com>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Monday, October 20, 2014 at 10:16 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: Re: 1.7 release?

Hi, I can try this one.
What is trunk?

Thanks,
Oleg

On Tue, Oct 21, 2014 at 6:21 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

Hmm, any idea why this is failing on Windows? Tyler P. and I were talking the other day; maybe we shouldn't run the tests from TIKA-1422 unless Tesseract is installed? Thoughts?
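The idea of running the TIKA-1422 tests only when Tesseract is installed could be handled with a small availability check. This is a minimal sketch, not Tika's actual test code: `TesseractAvailability` and `OcrTestGuard` are hypothetical names, and the check assumes the `tesseract` binary is located through the `PATH` environment variable (with a `.exe` suffix on Windows).

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper (not part of Tika): decides whether OCR tests can run
// by looking for a tesseract executable in the directories listed on PATH.
final class TesseractAvailability {

    private TesseractAvailability() {}

    // Returns true if an executable named `executable` (also `executable.exe`
    // on Windows) exists in any directory of the given PATH-style string.
    static boolean isOnPath(String executable, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return false;
        }
        boolean windows = System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        String[] candidates = windows
                ? new String[] {executable + ".exe", executable}
                : new String[] {executable};
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as the one in "a::b"
            }
            for (String name : candidates) {
                File file = new File(dir, name);
                if (file.isFile() && file.canExecute()) {
                    return true;
                }
            }
        }
        return false;
    }

    // Assumption: tesseract is installed under its default executable name.
    static boolean isTesseractInstalled() {
        return isOnPath("tesseract", System.getenv("PATH"));
    }
}

// Example guard: skip the OCR checks instead of failing when tesseract is absent.
final class OcrTestGuard {
    public static void main(String[] args) {
        if (!TesseractAvailability.isTesseractInstalled()) {
            System.out.println("tesseract not found on PATH; skipping OCR tests");
            return;
        }
        System.out.println("tesseract found; running OCR tests");
        // ... OCR test cases would run here ...
    }
}
```

In a JUnit 4 test class, the boolean from `isTesseractInstalled()` would more naturally be passed to `org.junit.Assume.assumeTrue(...)`, so that the OCR tests are reported as skipped rather than failed on machines (such as the Windows builds mentioned here) without Tesseract.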
-----Original Message-----
From: Hong-Thai Nguyen <thaicha...@gmail.com>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Thursday, October 16, 2014 at 2:03 AM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: Re: 1.7 release?

Hi Andrzej,

We are impatient for the 1.7 release too.
I'm having a problem compiling TIKA-1422 on my machine. If anyone can build it successfully on Windows, I have no objection to releasing 1.7.

Thanks,

On Thu, Oct 16, 2014 at 10:51 AM, Andrzej Białecki <a...@getopt.org> wrote:

Hi,

Any news on the 1.7 release?
Or at least a 1.6.1 release that includes the fix for broken ODF parsing...

---
Best regards,

Andrzej Bialecki

--
--------------
Hong-Thai
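The metadata-conflict policy proposed in the Oct 24 message (add values by default, replace when a property is set in ParseContext), combined with the idea of calling every applicable parser for TIKA-1445, could be sketched roughly as follows. This is an illustration under stated assumptions, not Tika's API: `SimpleMetadata`, `ConflictPolicy`, `MetadataSource`, `MultiParserMerge` and `MergeDemo` are hypothetical stand-ins for `Metadata`, a `ParseContext` setting, a `Parser` and `CompositeParser`. The document is passed as a `byte[]` because an `InputStream` can normally be consumed only once, so calling several parsers implies buffering the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Tika's Metadata:
// each key maps to an ordered list of values.
final class SimpleMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns a copy so callers cannot modify the stored list.
    List<String> getValues(String key) {
        List<String> v = values.get(key);
        return v == null ? new ArrayList<>() : new ArrayList<>(v);
    }

    Map<String, List<String>> entries() {
        return values;
    }
}

// Hypothetical policy flag; in the proposal it would be read from ParseContext.
enum ConflictPolicy { ADD, REPLACE }

// Hypothetical stand-in for a parser that only yields metadata.
interface MetadataSource {
    SimpleMetadata extract(byte[] document);
}

final class MultiParserMerge {
    private MultiParserMerge() {}

    // Runs every source on the same buffered document and merges the results.
    static SimpleMetadata runAll(List<MetadataSource> sources, byte[] document,
                                 ConflictPolicy policy) {
        SimpleMetadata merged = new SimpleMetadata();
        for (MetadataSource source : sources) {
            merge(merged, source.extract(document), policy);
        }
        return merged;
    }

    // ADD (the default) keeps earlier values and appends new ones;
    // REPLACE lets a later source overwrite any key it also produced.
    static void merge(SimpleMetadata target, SimpleMetadata incoming, ConflictPolicy policy) {
        for (Map.Entry<String, List<String>> e : incoming.entries().entrySet()) {
            if (policy == ConflictPolicy.REPLACE) {
                target.entries().put(e.getKey(), new ArrayList<>(e.getValue()));
            } else {
                for (String value : e.getValue()) {
                    target.add(e.getKey(), value);
                }
            }
        }
    }
}

final class MergeDemo {
    public static void main(String[] args) {
        List<MetadataSource> sources = new ArrayList<>();
        sources.add(bytes -> { // e.g. an image-metadata parser
            SimpleMetadata m = new SimpleMetadata();
            m.add("width", "640");
            m.add("producer", "exif");
            return m;
        });
        sources.add(bytes -> { // e.g. an OCR parser
            SimpleMetadata m = new SimpleMetadata();
            m.add("producer", "ocr");
            return m;
        });
        byte[] document = new byte[0];
        // prints [exif, ocr]
        System.out.println(MultiParserMerge.runAll(sources, document, ConflictPolicy.ADD).getValues("producer"));
        // prints [ocr]
        System.out.println(MultiParserMerge.runAll(sources, document, ConflictPolicy.REPLACE).getValues("producer"));
    }
}
```

Because ADD is the default, keys produced by only one parser (such as `width` here) survive under either policy, which matches the concern that existing image-metadata behavior should not change when OCR is added. Deduplicating identical values under ADD would be a further design choice not covered by this sketch.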