Hmm, if your mime-types.xml in Tika has that MIME type, it should call the right parser.
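
To illustrate the point above, here is a toy Java model of "detect the type, then dispatch to a parser". A file name is matched against glob patterns, which is the role a glob entry in mime-types.xml plays. The detected type is then looked up in a parser table. This is not Tika's API. The tables, the parser labels, and the `FallbackParser` name are hypothetical, and real Tika detection also weighs magic bytes and type hints. The sketch only shows why a type missing from the MIME registry ends up with generic parsing rather than a format-specific parser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Toy model of MIME-driven parser selection (hypothetical, NOT Tika's API).
 * GLOBS stands in for glob entries in a mime-types.xml; PARSERS stands in
 * for the set of MIME types each registered parser declares it supports.
 */
public class ParserDispatch {
    private static final Map<String, String> GLOBS = new LinkedHashMap<>();
    private static final Map<String, String> PARSERS = new LinkedHashMap<>();

    static {
        // Hypothetical registry entries, for illustration only.
        GLOBS.put(".hdr", "application/envi.hdr");
        GLOBS.put(".txt", "text/plain");
        PARSERS.put("application/envi.hdr", "EnviHeaderParser");
        PARSERS.put("text/plain", "TextParser");
    }

    /** Detect a type from the file extension; unknown extensions become octet-stream. */
    static String detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : GLOBS.entrySet()) {
            if (lower.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "application/octet-stream";
    }

    /** Pick the parser label registered for the detected type, else the generic fallback. */
    static String selectParser(String fileName) {
        return PARSERS.getOrDefault(detect(fileName), "FallbackParser");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"scene.hdr", "notes.txt", "scene.bin"}) {
            System.out.println(name + " -> " + detect(name) + " -> " + selectParser(name));
        }
    }
}
```

Running `main` prints one line per file name, showing the detected type and the chosen parser label. Deleting the `.hdr` glob entry makes `scene.hdr` fall through to `FallbackParser`, which is the kind of symptom described in the quoted message.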

Alternatives:
- Try the AutoDetectCrawler, where you can feed it your own MIME repo mapping.
- Use the ExternMetExtractor and wire it up to call Tika from the command line or tika-python, then customize as needed.

Cheers,
Chris

From: lewis john mcgibbney <lewi...@apache.org>
Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Thursday, October 18, 2018 at 9:41 PM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Forcing invocation of specific Tika parser when running TikaCmdLineMetExtractor

Hi Folks,

I asked a similar question a while back... but I don't think I communicated it clearly enough.
I'm running the crawler_launcher as follows:

./crawler_launcher --filemgrUrl http://localhost:9000 \
  --operation --launchMetCrawler \
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
  --productPath /usr/local/coal-sds-deploy/data/staging \
  --metExtractor org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor \
  --metExtractorConfig /usr/local/coal-sds-deploy/crawler/etc/tika_aviris_hdr.properties

The product is parsed and ingested into File Manager; however, Tika only uses the org.apache.tika.parser.DefaultParser... which is not sufficient, as I am working with application/envi.hdr files, which are rich in metadata.

The --metExtractorConfig file contains the following primitive metadata:

ProductType=GenericFile
Content-type=application/envi.hdr

And yes, the 'Content-type=application/envi.hdr' is successfully added to the metadata record in File Manager. I am just not sure how to force Tika to invoke a specific parser.

Thanks for any help,
Lewis

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc