Hi Val,

Hmm... Is there a particular (wrong) mime-type that keeps getting detected (like text/plain, or something)? I'm curious if the type is just returning a default. Or, is it a seemingly random file type?

What are the contents of your mime-types.xml file? If it's different from https://raw.githubusercontent.com/apache/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml, can you try copying it over?
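
To see quickly whether the local file really differs from the Tika one, a minimal sketch along these lines could list the mime-type entries that appear in only one of the two files. It uses only the JDK's XML parser, not Tika or OODT. The class name MimeTypeDiff and its methods are hypothetical, and it assumes both files declare types as <mime-type type="..."> elements, the way tika-mimetypes.xml does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of OODT or Tika.
public class MimeTypeDiff {

    // Collects the "type" attribute of every <mime-type> element in the stream.
    // Assumption: the file uses <mime-type type="..."> elements.
    static Set<String> readTypes(InputStream in) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            NodeList nodes = doc.getElementsByTagName("mime-type");
            Set<String> types = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String type = ((Element) nodes.item(i)).getAttribute("type");
                if (!type.isEmpty()) {
                    types.add(type);
                }
            }
            return types;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse mime-types XML", e);
        }
    }

    // Returns the entries of 'a' that are not present in 'b', sorted.
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Usage: java MimeTypeDiff <local mime-types.xml> <reference tika-mimetypes.xml>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java MimeTypeDiff <local.xml> <reference.xml>");
            return;
        }
        try (InputStream local = Files.newInputStream(Paths.get(args[0]));
             InputStream reference = Files.newInputStream(Paths.get(args[1]))) {
            Set<String> localTypes = readTypes(local);
            Set<String> referenceTypes = readTypes(reference);
            System.out.println("Only in local file:     " + onlyIn(localTypes, referenceTypes));
            System.out.println("Only in reference file: " + onlyIn(referenceTypes, localTypes));
        }
    }
}
```

This only compares which types are declared. It does not compare the glob or magic rules inside each entry, so two files can list the same types and still detect files differently.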
I'm not sure I'll be able to replicate your error on my computer without a bit of difficulty. Do you think there is any way you could create a JUnit test case with the problem?

Tyler

On Wed, Jan 21, 2015 at 1:26 PM, Mallder, Valerie <[email protected]> wrote:

> Hi Tyler,
>
> I have been looking into an issue that cropped up in my OODT system when
> I upgraded to OODT 0.8. The issue is that my AutoDetectProductCrawler, which is
> launched from a PGETaskInstance, is unable to determine the mime-type for my
> product files. I am using the same filemgr/etc/mime-types.xml file that I
> was using with OODT 0.7, and I am using the same
> oodt/extensions/policy/mime-extractor-map.xml file that I was using with
> OODT 0.7, but now, in MimeTypeRepo::getExtractorSpecsForFile, the call to
> this.mimeRepo.getMimeType(file) is returning the wrong mime-types for all
> of my files, and so the AutoDetectProductCrawler is telling me I have no
> extractor specs for my files.
>
> I noticed that you did some work on MimeTypeUtils for OODT-630 in OODT
> 0.8. At first glance, it doesn't look like any of this work would be
> directly responsible. Can you think of anything that might be causing this
> to happen? I don't know anything about Tika. Do I need to make any changes
> to my policy files to remain compatible? Just looking for clues on how to
> resolve this. I have verified, by adding log messages throughout the code,
> that prior to launching the AutoDetectProductCrawler, all of the policy
> files are read correctly. The MimeExtractorConfigReader is reading the
> correct mime-extractor-map.xml file, and it is calling setMimeRepoFile with
> the correct mime-types.xml file, and it is setting the correct extractor
> config file, etc. But once AutoDetectProductCrawler starts crawling, it tries
> to call getExtractorSpecsForFile but determines the wrong mime type and then
> can't find the extractor spec.
>
> Thanks,
> Val
>
> Valerie A. Mallder
> New Horizons Deputy Mission System Engineer
> The Johns Hopkins University/Applied Physics Laboratory
> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> 240-228-7846 (Office) 410-504-2233 (Blackberry)
