[ 
https://issues.apache.org/jira/browse/NUTCH-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann closed NUTCH-562.
-----------------------------------


- Patch applied to trunk in r583016

> Port mime type framework to use Tika mime detection framework
> -------------------------------------------------------------
>
>                 Key: NUTCH-562
>                 URL: https://issues.apache.org/jira/browse/NUTCH-562
>             Project: Nutch
>          Issue Type: Improvement
>          Components: mime_type_detector
>    Affects Versions: 1.0.0
>         Environment: MacBook Pro, Intel Core Duo 2.0 GHz, 2.0 GB RAM, Mac OS 
> X 10.4, although the improvement is independent of the environment
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>         Attachments: NUTCH-562.Mattmann.patch.txt, tika-0.1-dev.jar
>
>
> With Tika (http://incubator.apache.org/tika/) nearing a stable 0.1 release 
> candidate, I think it would be a good time to patch Nutch to use Tika's mime 
> detection system (an improvement over the existing Nutch one written 
> primarily by Jerome). Tika's mime system is based on the mime system from 
> Freedesktop.org and includes several improvements over the existing Nutch 
> mime system, such as:
> 1. reliable XML-based content detection (a clear issue plaguing Nutch for 
> some time now), including the ability to distinguish between RSS, XML, Atom, etc.
> 2. mime magic pattern matching, including support for multiple patterns
> 3. glob pattern matching (ability to support more than one pattern)
> I'll put together a patch and then attach it to the list once it's relatively 
> stable.
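
The three detection strategies listed in the issue (XML root-element checks, magic byte patterns, and file-name globs) can be illustrated with a minimal, self-contained sketch. This is not Tika's or Nutch's API: the class name MimeSketch, the detect method, and the small pattern tables are hypothetical and exist only to show how the strategies might be layered, with content-based checks tried before the name-based fallback.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the Tika or Nutch implementation.
class MimeSketch {
    // Assumed magic table: leading bytes -> MIME type (several patterns may map to one type).
    private static final Map<String, String> MAGIC = new LinkedHashMap<>();
    // Assumed glob table, simplified to file-name suffixes -> MIME type.
    private static final Map<String, String> GLOBS = new LinkedHashMap<>();

    static {
        MAGIC.put("%PDF-", "application/pdf");
        MAGIC.put("GIF87a", "image/gif");
        MAGIC.put("GIF89a", "image/gif");
        GLOBS.put(".pdf", "application/pdf");
        GLOBS.put(".rss", "application/rss+xml");
        GLOBS.put(".atom", "application/atom+xml");
        GLOBS.put(".xml", "application/xml");
    }

    static String detect(String name, byte[] data) {
        // Only inspect the first bytes; ISO-8859-1 maps each byte to one char.
        int len = Math.min(data.length, 512);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);

        // 1. Magic patterns: compare the start of the content with known signatures.
        for (Map.Entry<String, String> e : MAGIC.entrySet()) {
            if (head.startsWith(e.getKey())) {
                return e.getValue();
            }
        }

        // 2. XML content: look for a known root element to tell RSS and Atom from plain XML.
        String text = head.trim();
        if (text.startsWith("<")) {
            if (text.contains("<rss")) return "application/rss+xml";
            if (text.contains("<feed")) return "application/atom+xml";
            if (text.startsWith("<?xml")) return "application/xml";
        }

        // 3. Glob fallback: match on the file name when the content is inconclusive.
        String lower = name.toLowerCase();
        for (Map.Entry<String, String> e : GLOBS.entrySet()) {
            if (lower.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] rss = "<?xml version=\"1.0\"?><rss version=\"2.0\"></rss>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(detect("feed.xml", rss));
        System.out.println(detect("notes.atom", new byte[0]));
    }
}
```

In this sketch a document named feed.xml whose content has an rss root element is reported as application/rss+xml rather than application/xml, which is the kind of distinction the issue describes as hard to make from the file name alone.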

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
