[ http://issues.apache.org/jira/browse/NUTCH-33?page=comments#action_62195 ]

John Xing commented on NUTCH-33:
--------------------------------
Just skimmed the code. The XML approach looks good. Two minor comments:

(1) Make the magic check optional, controlled by a boolean property such as
    mime.type.magic (true/false) in nutch-default.xml.
(2) Use org.apache.nutch.util.mime.

I think there is code in Hari Kodungallu's tarball that covers
primary/sub types.

Thanks,

John

> MIME content type detector (using magic char sequences)
> -------------------------------------------------------
>
>          Key: NUTCH-33
>          URL: http://issues.apache.org/jira/browse/NUTCH-33
>      Project: Nutch
>         Type: New Feature
>     Reporter: Jerome Charron
>     Priority: Minor
>  Attachments: NUTCH-33.patch, mime-types.tar.gz
>
> An extension-based content-type detector is not sufficient in some cases.
> The solution is to add a content type detector based on magic char
> sequences, as in Apache httpd for instance.
> (Note: I created this issue only to keep a trace, but I'm currently working
> on it)
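
For illustration only, the optional magic check from comment (1) could look roughly like the sketch below. This is not the code in NUTCH-33.patch: the class MagicSketch, the method detect and its parameters are hypothetical names, and the boolean argument merely stands in for a value that would be read from a property such as mime.type.magic. The byte signatures used (PDF, GIF, PNG) are the well-known file prefixes for those formats.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a magic-sequence check; not the NUTCH-33 patch. */
final class MagicSketch {

    // Magic prefix -> MIME type. The map is only iterated, never looked up
    // by key, so using byte[] keys is safe here. LinkedHashMap keeps the
    // check order stable.
    private static final Map<byte[], String> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf");
        MAGIC.put("GIF8".getBytes(StandardCharsets.US_ASCII), "image/gif");
        MAGIC.put(new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'},
                  "image/png");
    }

    /**
     * Returns the type found by the magic check when it is enabled and a
     * known prefix matches; otherwise returns the extension-based guess.
     *
     * @param content       leading bytes of the fetched content (may be null)
     * @param extensionType type guessed from the file extension (assumed input)
     * @param magicEnabled  stand-in for a property such as mime.type.magic
     */
    static String detect(byte[] content, String extensionType, boolean magicEnabled) {
        if (magicEnabled && content != null) {
            for (Map.Entry<byte[], String> e : MAGIC.entrySet()) {
                if (startsWith(content, e.getKey())) {
                    return e.getValue();
                }
            }
        }
        return extensionType;
    }

    // True when data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the flag off, the extension-based guess is returned untouched, so the check can be disabled without changing the existing behaviour.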
