[ https://issues.apache.org/jira/browse/NUTCH-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann closed NUTCH-562.
-----------------------------------

- Patch applied to trunk in r583016.

> Port mime type framework to use Tika mime detection framework
> -------------------------------------------------------------
>
>                 Key: NUTCH-562
>                 URL: https://issues.apache.org/jira/browse/NUTCH-562
>             Project: Nutch
>          Issue Type: Improvement
>          Components: mime_type_detector
>    Affects Versions: 1.0.0
>         Environment: MacBook Pro, Intel Core Duo 2.0 GHz, 2.0 GB RAM, Mac OS X 10.4
>                      (although the improvement is independent of environment)
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>         Attachments: NUTCH-562.Mattmann.patch.txt, tika-0.1-dev.jar
>
>
> With Tika (http://incubator.apache.org/tika/) nearing a stable 0.1 release
> candidate, I think it would be a good time to patch Nutch to use Tika's mime
> detection system (an improvement over the existing Nutch one, written
> primarily by Jerome). Tika's mime system is based on the mime system from
> Freedesktop.org and includes several improvements over the existing Nutch
> mime system, such as:
> 1. reliable XML-based content detection (a clear issue plaguing Nutch for
>    some time now), including the ability to distinguish between RSS, XML,
>    ATOM, etc.
> 2. mime magic pattern matching, including support for multiple patterns
> 3. glob pattern matching (with support for more than one pattern)
> I'll put together a patch and attach it to the list once it's relatively
> stable.