A config dumper would be most appreciated in tika-examples!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: <Allison>, "Timothy B." <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, September 18, 2014 10:19 AM
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: RE: How to exclude a mimetype in tika?

>Speaking of which...last time I went looking for an example of an
>up-to-date tika config file, it was hard to find (thank you, jboss and
>https://wiki.csc.calpoly.edu/DocuCategMontano/browser/Parser/tika-config.xml).
>
>Should I add a DefaultTikaConfigDumper to the examples module that would
>dump a default tika config with the current version of Tika so that
>people can dump it and then modify it?
>
>Or, did I just plain miss an already existing example on our website/wiki?
>
>Best,
>
>            Tim
>
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>Sent: Thursday, September 18, 2014 12:56 PM
>To: [email protected]
>Cc: [email protected]
>Subject: Re: How to exclude a mimetype in tika?
>
>+1 Tim, I believe so?
>
>-----Original Message-----
>From: <Allison>, "Timothy B." <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Thursday, September 18, 2014 7:45 AM
>To: "[email protected]" <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Subject: FW: How to exclude a mimetype in tika?
>
>>Tika Colleagues (Tika'ers, Tikis?),
>>
>>Is this the right answer:
>>
>>Drop the relevant parsers from the tika.config file and make sure to
>>point Solr to this file in your Solr request handler definition:
>><str name="tika.config">/my/path/to/tika.config</str>?
>>
>>I only have experience as a programmatic user of Tika and would use a
>>DocumentSelector, but would the above work?
>>
>>-----Original Message-----
>>From: keeblerh [mailto:[email protected]]
>>Sent: Thursday, September 18, 2014 10:15 AM
>>To: [email protected]
>>Subject: Re: How to exclude a mimetype in tika?
>>
>>eShard wrote
>>> Good afternoon,
>>> I'm using Solr 4.0 Final.
>>> I have movies "hidden" in zip files that need to be excluded from the
>>> index.
>>> I can't filter movies on the crawler because then I would have to
>>> exclude all zip files.
>>> I was told I can have Tika skip the movies.
>>> The details are escaping me at this point.
>>> How do I exclude a file in the Tika configuration?
>>> I assume it's something I add in the update/extract handler but I'm not
>>> sure.
>>> 
>>> Thanks,
>>
>>I am having the same issue.  I need to exclude some mime types from the
>>zip files and am using Solr 4.8.  Did you ever get an answer to this?
>>Thanks.
>>
>>
>>
>>--
>>View this message in context:
>>http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168p4159676.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
>
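
For anyone following up on the question above, here is a minimal sketch of the kind of tika.config file the thread discusses: one that lists parsers explicitly and simply leaves the video parsers out. The three parser classes shown do exist in Tika, but which ones to list is an illustrative assumption, not a recommended set, and nothing in this thread confirms that the approach actually keeps movies inside zip files out of a Solr index.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the parser selection below is an assumption for illustration. -->
<properties>
  <parsers>
    <!-- Container parser, kept so that zip entries are still visited -->
    <parser class="org.apache.tika.parser.pkg.PackageParser"/>
    <!-- Examples of document parsers to keep -->
    <parser class="org.apache.tika.parser.pdf.PDFParser"/>
    <parser class="org.apache.tika.parser.txt.TXTParser"/>
    <!-- Video parsers are deliberately not listed here -->
  </parsers>
</properties>
```

Solr would then be pointed at this file with the <str name="tika.config"> line quoted above. Whether an entry with no matching parser is skipped entirely, or still shows up by file name in the container's output, is worth checking against the Tika version in use.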
