Hmm, if the mime-types.xml in Tika defines that MIME type, it should call the
right parser.
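
A quick way to check that is to look for the type in the MIME definitions file. The snippet below is only a sketch: the inline XML is a made-up sample in the mime-info/mime-type/glob layout that Tika uses, and globs_for_type is a hypothetical helper, not part of Tika or OODT.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the mime-info layout; a real check would read the
# mime-types.xml that your Tika install actually loads.
SAMPLE_MIME_XML = """<mime-info>
  <mime-type type="application/envi.hdr">
    <glob pattern="*.hdr"/>
  </mime-type>
  <mime-type type="text/plain">
    <glob pattern="*.txt"/>
  </mime-type>
</mime-info>
"""


def globs_for_type(xml_text, mime_type):
    """Return the glob patterns declared for mime_type, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("mime-type"):
        if entry.get("type") == mime_type:
            return [g.get("pattern") for g in entry.findall("glob")]
    return None


if __name__ == "__main__":
    print(globs_for_type(SAMPLE_MIME_XML, "application/envi.hdr"))
```

If the helper returns None for application/envi.hdr, the type is not declared in that file, so detection cannot route the file to a type-specific parser.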

 

Alternatives:

 

Try the AutoDetectProductCrawler, where you can feed it your own MIME repo mapping.

Use the ExternMetExtractor and wire it up to call Tika from the command line or
through tika-python, then customize as needed.
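
For the second option, the external script could look roughly like the sketch below. Everything here is an assumption for illustration: the helper names are hypothetical, the cas:metadata layout is the keyval/key/val form as I remember it, and the pinned fields are just the two from your properties file. The only Tika-specific piece is the tika-app --json option, which prints the extracted metadata as JSON.

```python
import json
from xml.sax.saxutils import escape


def tika_command(tika_jar, product_path):
    """Build the argv for tika-app; --json prints the metadata as JSON."""
    return ["java", "-jar", tika_jar, "--json", product_path]


def customize(tika_json_text):
    """Parse Tika's JSON output and pin the fields from the properties file."""
    metadata = json.loads(tika_json_text)
    metadata["ProductType"] = "GenericFile"
    metadata["Content-type"] = "application/envi.hdr"
    return metadata


def to_met_xml(metadata):
    """Render a flat dict as a cas:metadata document (layout is an assumption)."""
    lines = ['<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">']
    for key in sorted(metadata):
        values = metadata[key]
        if not isinstance(values, list):
            values = [values]  # treat a scalar as a single-valued key
        lines.append("  <keyval>")
        lines.append("    <key>%s</key>" % escape(str(key)))
        for value in values:
            lines.append("    <val>%s</val>" % escape(str(value)))
        lines.append("  </keyval>")
    lines.append("</cas:metadata>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Canned JSON stands in for real tika-app output in this sketch.
    sample = '{"Content-Type": "text/plain", "samples": "614"}'
    print(to_met_xml(customize(sample)))
```

In a real wrapper you would run the command with subprocess.run(tika_command(jar, path), capture_output=True, text=True, check=True), pass its stdout to customize, and write the rendered XML to the .met file that the ExternMetExtractor config points at.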

 

Cheers,

Chris

From: lewis john mcgibbney <lewi...@apache.org>
Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Thursday, October 18, 2018 at 9:41 PM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Forcing invocation of specific Tika parser when running TikaCmdLineMetExtractor

 

Hi Folks,

I asked a similar question a while back, but I don't think I
communicated it clearly enough.

I'm running the crawler_launcher as follows:

 

./crawler_launcher --filemgrUrl http://localhost:9000 --operation \
  --launchMetCrawler --clientTransferer \
  org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
  --productPath /usr/local/coal-sds-deploy/data/staging --metExtractor \
  org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor \
  --metExtractorConfig \
  /usr/local/coal-sds-deploy/crawler/etc/tika_aviris_hdr.properties

 

The product is parsed and ingested into File Manager; however, Tika only
uses the org.apache.tika.parser.DefaultParser, which is not sufficient because
I am working with application/envi.hdr files, which are rich in metadata.

 

The --metExtractorConfig file contains the following primitive metadata:

ProductType=GenericFile
Content-type=application/envi.hdr

 

And yes, the 'Content-type=application/envi.hdr' entry is successfully added to
the metadata record in File Manager. I am just not sure how to force Tika
to invoke a specific parser.

 

Thanks for any help,

Lewis

 

 

 

-- 

http://home.apache.org/~lewismc/

http://people.apache.org/keys/committer/lewismc

 
