[..snip..]
>     return type.getName();
> }
>
>
> The NPE was being thrown on the last line, so I did some tracing and
> found out that the call to MimeType.clean(typeName) [typeName <-
> "text/html"] worked fine, but the next line caused a problem. The
> this.mimeTypes.getRepository.forName(cleanedMimeType) was returning
> null. My problem was that I downloaded the trunk and it didn't have a
> MimeUtils anymore so I had no way to trace this.

Yes, this class was removed as part of NUTCH-562. Its usage was replaced with
the class of the same name within the Tika API, which is based on the Nutch
API for mime types.

>
> Anyway, after an hour or so of banging my head against the wall I
> realized the update to Nutch didn't have the correct .xml file
> describing mime types in the conf/ directory. Thus, I unzipped the Tika
> jar, grabbed the .xml file and changed nutch-default.xml to point to
> that xml for mime types and it started working.

This is strange: as part of the patch for NUTCH-562, a file called
tika-mimetypes.xml was committed to the conf/ folder within the trunk. Do you
not have this file? The nutch-default.xml file within the conf/ folder in the
Nutch trunk points to tika-mimetypes.xml, so that should have worked. I'm
wondering if you had an old version of the conf/ directory and neglected to
run svn up on it?

>
> Sorry again for being so vague. I'm not sure if I should submit a JIRA
> issue for this, but I'm happy to do so if anyone else has seen this issue.

No problem: let's discuss the JIRA issue once we get an answer to the above
questions. Thanks for being more descriptive; I look forward to your
response.

Cheers,
  Chris

>
> Thanks,
> Ned
>
>
> Chris Mattmann wrote:
>> Hi Ned,
>>
>> Glad to see you're poking around with the Tika software and its use in
>> Nutch. To start, you probably want to go to the website for Tika:
>>
>> http://incubator.apache.org/tika/
>>
>> On that website, you should see the links to the SVN repository. The
>> version of Tika that was used was a version that I built the same day I
>> committed the fix for NUTCH-562:
>>
>> http://issues.apache.org/jira/browse/NUTCH-562
>>
>> Which appears to be a version of Tika built on October 8th.
>> The API for the
>> mime framework has changed a bit since then (to its betterment), however, I
>> neglected to upgrade the Nutch API because of the strong objection I
>> received from Andrzej and input from Dennis Kubes regarding the use of the
>> Tika API in Nutch. I stand by my email I sent in reply to the objections:
>>
>> http://www.nabble.com/forum/ViewPost.jtp?post=13142174&framed=y
>>
>> However, out of respect for the other committers, I neglected to make any
>> updates to the Nutch use of the Tika API since I never heard back from
>> anyone after my response.
>>
>> That said, could you be a bit more specific Ned as to the exact problem
>> you're having, e.g., "I tried visiting this site (URL here), the content
>> type was (content/type here), and then it got into Content.java, and on line
>> XXX it seems that the MimeType is getting set to null when it tries to...".
>> With that info, I could probably help you quite a bit more. Also, depending
>> upon how the rest of the Nutch committers want to handle the use of Tika
>> (revert and remain stagnant, or use Tika and leverage the updates we're
>> making to the Mime framework there), then we could come up with a strategy
>> to help you out with the issue you're having.
>>
>> Thanks!
>>
>> Cheers,
>> Chris
>>
>>
>>
>> On 11/6/07 3:47 PM, "Ned Rockson" <[EMAIL PROTECTED]> wrote:
>>
>>
>>> I think there may be a bug in the Content.java when it tries to convert
>>> the textual representation of the type to a MimeType. It always returns
>>> null. I'm trying to fix it but I can't find an API for Tika (or even
>>> src). Can someone point me in the right direction?
>>>
>>> Thanks,
>>> Ned
>>>
>>
>> ______________________________________________
>> Chris Mattmann, Ph.D.
>> [EMAIL PROTECTED]
>> _________________________________________________
>> Jet Propulsion Laboratory            Pasadena, CA
>> Office: 171-266B                     Mailstop: 171-246
>> _______________________________________________________
>>
>> Disclaimer: The opinions presented within are my own and do not reflect
>> those of either NASA, JPL, or the California Institute of Technology.
>>
>

______________________________________________
Chris Mattmann, Ph.D.
[EMAIL PROTECTED]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
