Hi Tyler, Yes, this fix did take care of my problem. Thanks so much!
Chris, if you want to make a new OODT 0.8.1, be sure to also include the fix for OODT-805 below in the radix installation. My system is back up and running now.

Thanks,
Val

Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory

> -----Original Message-----
> From: Tyler Palsulich [mailto:[email protected]]
> Sent: Thursday, January 22, 2015 10:35 PM
> To: dev
> Subject: Re: FW: Tyler - I may need your help
>
> Hi Val,
>
> Please see OODT-805 and
> https://github.com/apache/oodt/commit/cf1220d4ac66ccefc8e510c62fb6b38cf529ffb2
> for what I believe is the fix.
>
> Can you make the MimeTypeUtils changes locally or try out trunk?
>
> Let me know!
> Tyler
>
> On Thu, Jan 22, 2015 at 5:40 PM, Tyler Palsulich <[email protected]> wrote:
>
> > Hi Val,
> >
> > Yes, I think you've hit the nail on the head -- if Tika isn't passed
> > your updated mimetypes configuration file (with your custom types),
> > then those files will not be properly identified. I'll look into this
> > issue more tonight and hopefully find a fix. :)
> >
> > > by default tika only knows about xml files, text files,
> > > application/octet-stream files.
> > I'm not sure what you mean by this? Tika knows about much more than
> > that, but is there an OODT config that overrides that?
> >
> > > I'm a newbie with Java and I can't guarantee I would be able to
> > > build a JUnit test program very easily. But I will continue to
> > > investigate and see what I can do.
> > No worries! :) If you have time and want to try your hand at it, the
> > best way to learn is by looking at the existing tests, like in
> > https://github.com/apache/oodt/blob/trunk/metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java
> > .
> >
> > Have a good night,
> > Tyler
> >
> > On Thu, Jan 22, 2015 at 2:22 PM, Mallder, Valerie <[email protected]> wrote:
> >
> >> Hi Tyler,
> >>
> >> Can you tell me more about the tika-mimetypes.xml file? Is this a new
> >> 'required' file? I'm not 100% sure about this yet, but it seems to
> >> me that, since MimeTypeUtils.java instantiates Tika with the default
> >> constructor, and never explicitly tells Tika which mime-types file to
> >> use (even though the correct mime-types.xml file is passed to the
> >> MimeTypeUtils constructor from MimeExtractorRepo), there is no place
> >> where the contents of my mime-types.xml file are being read and stored
> >> in Tika's MimeTypeRegistry, and by default tika only knows about
> >> xml files, text files, application/octet-stream files.
> >>
> >> I will keep looking at this tomorrow and verify which file
> >> is passed to Tika's MimeTypesFactory class, but I have to head home
> >> now.
> >>
> >> Val
> >>
> >> Valerie A. Mallder
> >> New Horizons Deputy Mission System Engineer
> >> Johns Hopkins University/Applied Physics Laboratory
> >>
> >>
> >> -----Original Message-----
> >> From: Mallder, Valerie
> >> Sent: Thursday, January 22, 2015 11:42 AM
> >> To: dev
> >> Subject: RE: Tyler - I may need your help
> >>
> >> Hi Tyler,
> >>
> >> I have defined a few custom mime types in my
> >> filemgr/etc/mime-types.xml file. The contents of my file look
> >> exactly like the contents of
> >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> >> with the addition of project-specific mime-types.
> >> The tika-mimetypes.xml file you pointed me to has ~2000 additional
> >> lines in it as compared to the
> >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> >> file and the
> >> http://svn.apache.org/viewvc/oodt/tags/0.8/mvn/archetypes/radix/src/main/resources/archetype-resources/filemgr/src/main/resources/etc/mime-types.xml
> >> file. So, it is definitely different than the one I've
> >> been using. But, I copied it over and added my mime types to it, and
> >> it didn't help. The mime types it is returning are 'reasonable'
> >> mime-types to return, they are just not the mime-types that I defined
> >> them as. For instance, I have *.sfdu files and *.out files that
> >> contain binary data, and tika says they are
> >> "application/octet-stream" files. I also have *.ecsv files that
> >> contain text, and tika says they are "text/plain" files.
> >>
> >> But here are the mime-types I defined for these files for my project,
> >> and these are the mime-types that I have defined extractors for. None
> >> of these filename extensions "*.out, *.ecsv, and *.sfdu" are defined
> >> elsewhere in the mime-types.xml file.
> >>
> >> <mime-type type="product/fei-out">
> >>   <glob pattern="*.out"/>
> >> </mime-type>
> >>
> >> <mime-type type="product/fei-ecsv">
> >>   <glob pattern="*.ecsv"/>
> >> </mime-type>
> >>
> >> <mime-type type="product/fei-sfdu">
> >>   <glob pattern="*.sfdu"/>
> >> </mime-type>
> >>
> >> I'm a newbie with Java and I can't guarantee I would be able to build
> >> a JUnit test program very easily. But I will continue to investigate
> >> and see what I can do.
> >>
> >> Thanks!
> >>
> >> Val
> >>
> >> Valerie A. Mallder
> >> New Horizons Deputy Mission System Engineer
> >> Johns Hopkins University/Applied Physics Laboratory
> >>
> >>
> >> > -----Original Message-----
> >> > From: Tyler Palsulich [mailto:[email protected]]
> >> > Sent: Wednesday, January 21, 2015 5:13 PM
> >> > To: dev
> >> > Subject: Re: Tyler - I may need your help
> >> >
> >> > Hi Val,
> >> >
> >> > Hmm... Is there a particular (wrong) mime-type that keeps getting
> >> > detected (like text/plain, or something)? I'm curious if the type
> >> > is just returning a default. Or, is it a seemingly random file
> >> > type? What are the contents of your mime-types.xml file?
> >> > If it's different than
> >> > https://raw.githubusercontent.com/apache/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml,
> >> > can you try copying it over?
> >> >
> >> > I'm not sure I'll be able to replicate your error on my computer
> >> > without a bit of difficulty. Do you think there is any way you
> >> > could create a JUnit test case with the problem?
> >> >
> >> > Tyler
> >> >
> >> >
> >> > On Wed, Jan 21, 2015 at 1:26 PM, Mallder, Valerie <[email protected]> wrote:
> >> >
> >> > > Hi Tyler,
> >> > >
> >> > > I have been looking into an issue that cropped up in my OODT
> >> > > system when I upgraded to OODT 0.8. The issue is, my
> >> > > AutoDetectProductCrawler, which is launched from a
> >> > > PGETaskInstance, is unable to determine the mime-type for my
> >> > > product files. I am using the same filemgr/etc/mime-types.xml
> >> > > file that I was using with OODT 0.7, and I am using the same
> >> > > oodt/extensions/policy/mime-extractor-map.xml file that I was
> >> > > using with OODT 0.7, but now, in
> >> > > MimeTypeRepo::getExtractorSpecsForFile, the call to
> >> > > this.mimeRepo.getMimeType(file) is returning the wrong mime-types
> >> > > for all of my files, and so the AutoDetectProductCrawler is
> >> > > telling me I have no extractor specs for my files.
> >> > >
> >> > > I noticed that you did some work on MimeTypeUtils for OODT-630 in
> >> > > OODT 0.8. At first glance, it doesn't look like any of this work
> >> > > would be directly responsible. Can you think of anything that
> >> > > might be causing this to happen? I don't know anything about
> >> > > tika. Do I need to make any changes to my policy files to remain
> >> > > compatible? Just looking for clues on how to resolve this. I have
> >> > > verified by adding log messages throughout the code that, prior to
> >> > > launching the AutoDetectProductCrawler, all of the policy files
> >> > > are read correctly.
> >> > > The MimeExtractorConfigReader is reading the correct
> >> > > mime-extractor-map.xml file, and it is calling setMimeRepoFile
> >> > > with the correct mime-types.xml file, and it is setting the
> >> > > correct extractor config file, etc. But, once
> >> > > AutoDetectProductCrawler starts crawling, it tries to
> >> > > getExtractorSpecsForFile but determines the wrong mime type and
> >> > > then can't find the extractor spec.
> >> > >
> >> > > Thanks,
> >> > > Val
> >> > >
> >> > > Valerie A. Mallder
> >> > >
> >> > > New Horizons Deputy Mission System Engineer The Johns Hopkins
> >> > > University/Applied Physics Laboratory
> >> > > 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> >> > > 240-228-7846 (Office) 410-504-2233 (Blackberry)
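
For anyone reading this thread in the archive, the behaviour Val expected from her custom glob entries can be sketched with the JDK alone. The class below is a hypothetical stand-in: the names GlobMimeLookup, loadGlobs, and detect are made up for this example, and the <mime-info> wrapper element in the sample XML is an assumption. It is not Tika's detector and not OODT's MimeTypeUtils. It only illustrates the point made in the thread: if the custom mime-types.xml is actually read, *.out, *.ecsv, and *.sfdu resolve to the product/fei-* types, and if it is never read, lookups fall through to a generic default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not Tika's or OODT's implementation.
class GlobMimeLookup {

    // Generic type returned when no glob matches (an assumption of this sketch).
    static final String FALLBACK = "application/octet-stream";

    // Reads <mime-type type="..."><glob pattern="..."/></mime-type> entries
    // into a map from glob pattern to mime type.
    static Map<String, String> loadGlobs(String xml) {
        Map<String, String> globs = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList types = doc.getElementsByTagName("mime-type");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                NodeList patterns = type.getElementsByTagName("glob");
                for (int j = 0; j < patterns.getLength(); j++) {
                    Element glob = (Element) patterns.item(j);
                    globs.put(glob.getAttribute("pattern"), type.getAttribute("type"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse mime-types XML", e);
        }
        return globs;
    }

    // Handles only simple "*.ext" suffix globs, which is all the thread uses.
    static String detect(Map<String, String> globs, String fileName) {
        for (Map.Entry<String, String> entry : globs.entrySet()) {
            String pattern = entry.getKey();
            if (pattern.startsWith("*") && fileName.endsWith(pattern.substring(1))) {
                return entry.getValue();
            }
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        // The three custom types from the thread, wrapped in an assumed root element.
        String xml = "<mime-info>"
                + "<mime-type type=\"product/fei-out\"><glob pattern=\"*.out\"/></mime-type>"
                + "<mime-type type=\"product/fei-ecsv\"><glob pattern=\"*.ecsv\"/></mime-type>"
                + "<mime-type type=\"product/fei-sfdu\"><glob pattern=\"*.sfdu\"/></mime-type>"
                + "</mime-info>";
        Map<String, String> custom = loadGlobs(xml);
        Map<String, String> neverLoaded = new LinkedHashMap<>();

        // File names here are invented for the demonstration.
        System.out.println(detect(custom, "sample.sfdu"));
        System.out.println(detect(neverLoaded, "sample.sfdu"));
    }
}
```

The second call, with an empty map, reproduces the symptom in miniature: the custom definitions exist but were never handed to the lookup, so only the generic type comes back. Real detection also inspects file content, which is presumably why the thread saw "text/plain" for the *.ecsv files rather than the binary default.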
