Config dumper would be most appreciated in tika-examples!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: <Allison>, "Timothy B." <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, September 18, 2014 10:19 AM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: RE: How to exclude a mimetype in tika?

>Speaking of which...last time I went looking for an example of an
>up-to-date tika config file, it was hard to find (thank you, jboss and
>https://wiki.csc.calpoly.edu/DocuCategMontano/browser/Parser/tika-config.xml).
>
>Should I add a DefaultTikaConfigDumper to the examples module that would
>dump a default tika config with the current version of Tika so that
>people can dump it and then modify it?
>
>Or, did I just plain miss an already existing example on our website/wiki?
>
>Best,
>
>          Tim
>
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>Sent: Thursday, September 18, 2014 12:56 PM
>To: [email protected]
>Cc: [email protected]
>Subject: Re: How to exclude a mimetype in tika?
>
>+1 Tim, I believe so?
>
>
>-----Original Message-----
>From: <Allison>, "Timothy B." <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Thursday, September 18, 2014 7:45 AM
>To: "[email protected]" <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Subject: FW: How to exclude a mimetype in tika?
>
>>Tika Colleagues (Tika'ers, Tikis?),
>>
>>Is this the right answer:
>>
>>Drop the relevant parsers from the tika.config file and make sure to
>>point solr to this file in your solr request handler definition:
>><str name="tika.config">/my/path/to/tika.config</str>?
>>
>>I only have experience as a programmatic user of Tika and would use a
>>DocumentSelector, but would the above work?
>>
>>-----Original Message-----
>>From: keeblerh [mailto:[email protected]]
>>Sent: Thursday, September 18, 2014 10:15 AM
>>To: [email protected]
>>Subject: Re: How to exclude a mimetype in tika?
>>
>>eShard wrote
>>> Good afternoon,
>>> I'm using solr 4.0 Final
>>> I need movies "hidden" in zip files that need to be excluded from the
>>> index.
>>> I can't filter movies on the crawler because then I would have to
>>> exclude all zip files.
>>> I was told I can have tika skip the movies.
>>> the details are escaping me at this point.
>>> How do I exclude a file in the tika configuration?
>>> I assume it's something I add in the update/extract handler but I'm not
>>> sure.
>>>
>>> Thanks,
>>
>>I am having the same issue. I need to exclude some mime types from the
>>zip files and am using SOLR 4.8. Did you ever get an answer to this? Thanks.
>>
>>
>>
>>--
>>View this message in context:
>>http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168p4159676.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
>
