Hi Tyler,
I have defined a few custom mime types in my filemgr/etc/mime-types.xml file.
The contents of my file look exactly like the contents of
http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
with the addition of project-specific mime-types. The tika-mimetypes.xml
file you pointed me to has ~2000 additional lines in it as compared to the
http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
file and the
http://svn.apache.org/viewvc/oodt/tags/0.8/mvn/archetypes/radix/src/main/resources/archetype-resources/filemgr/src/main/resources/etc/mime-types.xml
file. So it is definitely different from the one I've been using. I copied it
over and added my mime types to it, but that didn't help. The mime types being
returned are 'reasonable' ones; they are just not the mime-types that I
defined for these files. For instance, I have *.sfdu files and *.out files that
contain binary data, and tika says they are "application/octet-stream" files.
I also have *.ecsv files that contain text, and tika says they are "text/plain"
files.
Here are the mime-types I defined for these files for my project; these are the
mime-types that I have defined extractors for. None of these filename
extensions (*.out, *.ecsv, and *.sfdu) is defined anywhere else in the
mime-types.xml file.
<mime-type type="product/fei-out">
    <glob pattern="*.out"/>
</mime-type>
<mime-type type="product/fei-ecsv">
    <glob pattern="*.ecsv"/>
</mime-type>
<mime-type type="product/fei-sfdu">
    <glob pattern="*.sfdu"/>
</mime-type>
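As a sanity check that does not involve tika at all, a minimal sketch along the following lines could confirm that the three definitions above parse as XML and that each glob matches the intended extension. It uses only the JDK. The class name GlobCheck, the sample file names, and the <mime-info> wrapper element are assumptions made for illustration, and the simple suffix matching is a stand-in, not tika's actual detection logic.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: checks the custom glob definitions without using tika.
public class GlobCheck {

    // The three project-specific definitions, wrapped in an assumed
    // <mime-info> root element so they form one well-formed document.
    public static final String SAMPLE_XML =
        "<mime-info>"
      + "<mime-type type=\"product/fei-out\"><glob pattern=\"*.out\"/></mime-type>"
      + "<mime-type type=\"product/fei-ecsv\"><glob pattern=\"*.ecsv\"/></mime-type>"
      + "<mime-type type=\"product/fei-sfdu\"><glob pattern=\"*.sfdu\"/></mime-type>"
      + "</mime-info>";

    // Parses the XML and returns a map of glob pattern -> mime type name.
    public static Map<String, String> loadGlobs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> globs = new LinkedHashMap<>();
            NodeList types = doc.getElementsByTagName("mime-type");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                NodeList patterns = type.getElementsByTagName("glob");
                for (int j = 0; j < patterns.getLength(); j++) {
                    Element glob = (Element) patterns.item(j);
                    globs.put(glob.getAttribute("pattern"), type.getAttribute("type"));
                }
            }
            return globs;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse mime-types XML", e);
        }
    }

    // Simplified matching: only handles patterns of the form "*<suffix>".
    // Returns the first matching type name, or null if nothing matches.
    public static String typeFor(String fileName, Map<String, String> globs) {
        for (Map.Entry<String, String> entry : globs.entrySet()) {
            String pattern = entry.getKey();
            if (pattern.startsWith("*") && fileName.endsWith(pattern.substring(1))) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> globs = loadGlobs(SAMPLE_XML);
        // Sample file names are made up for this check.
        for (String name : new String[] {"a.out", "b.ecsv", "c.sfdu", "d.txt"}) {
            System.out.println(name + " -> " + typeFor(name, globs));
        }
    }
}
```

If a check like this maps each extension to its product/fei-* type, the definitions themselves are probably fine, and the problem more likely lies in how the file is loaded or how detection is performed in 0.8.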
I'm a newbie with Java and I can't guarantee I would be able to build a JUnit
test program very easily. But I will continue to investigate and see what I can
do.
Thanks!
Val
Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory
> -----Original Message-----
> From: Tyler Palsulich [mailto:[email protected]]
> Sent: Wednesday, January 21, 2015 5:13 PM
> To: dev
> Subject: Re: Tyler - I may need your help
>
> Hi Val,
>
> Hmm... Is there a particular (wrong) mime-type that keeps getting detected
> (like text/plain, or something)? I'm curious if the type is just returning a
> default. Or, is it a seemingly random file type? What are the contents of
> your mime-types.xml file? If it's different than
> https://raw.githubusercontent.com/apache/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml,
> can you try copying it over?
>
> I'm not sure I'll be able to replicate your error on my computer without a
> bit of difficulty. Do you think there is any way you could create a JUnit
> test case with the problem?
>
> Tyler
>
>
> On Wed, Jan 21, 2015 at 1:26 PM, Mallder, Valerie <[email protected]> wrote:
>
> > Hi Tyler,
> >
> > I have been looking into an issue that cropped up in my OODT system
> > when I upgraded to OODT 0.8. The issue is, my
> > AutoDetectProductCrawler, which is launched from a PGETaskInstance is
> > unable to determine the mime-type for my product files. I am using
> > the same filemgr/etc/mime-types.xml file that I was using with OODT
> > 0.7, and I am using the same
> > oodt/extensions/policy/mime-extractor-map.xml file that I was using
> > with OODT 0.7, but now, in MimeTypeRepo::getExtractorSpecsForFile, the
> > call to
> > this.mimeRepo.getMimeType(file) is returning the wrong mime-types for
> > all of my files, and so the AutoDetectProductCrawler is telling me I
> > have no extractor specs for my files.
> >
> > I noticed that you did some work on MimeTypeUtils for OODT-630 in OODT
> > 0.8. At first glance, it doesn't look like any of this work would be
> > directly responsible. Can you think of anything that might be causing
> > this to happen? I don't know anything about tika. Do I need to make
> > any changes to my policy files to remain compatible? Just looking for
> > clues on how to resolve this. I have verified by adding log messages
> > throughout the code that, prior to launching the
> > AutoDetectProductCrawler, all of the policy files are read correctly.
> > The MimeExtractorConfigReader is reading the correct
> > mime-extractor-map.xml file, and it is calling setMimeRepoFile with the
> > correct mime-types.xml file, and it is setting the correct extractor
> > config file, etc. But, once AutoDetectProductCrawler starts crawling
> > it tries to getExtractorSpecsForFile but determines the wrong mime type
> > and then can't find the extractor spec.
> >
> > Thanks,
> > Val
> >
> >
> >
> > Valerie A. Mallder
> >
> > New Horizons Deputy Mission System Engineer
> > The Johns Hopkins University/Applied Physics Laboratory
> > 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> > 240-228-7846 (Office) 410-504-2233 (Blackberry)
> >
> >