[ http://issues.apache.org/jira/browse/NUTCH-57?page=all ]

Jerome Charron updated NUTCH-57:
--------------------------------

    Attachment: NUTCH-57-050509.patch

The problem: optional Content-Type parameters were not removed from the 
subtype, so an exception was raised when the validity of the subtype was 
checked.

The patch:
* Removes the optional parameters from the content-type subtype.
* Adds unit tests to verify the fix.
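
For readers without the patch at hand, here is a minimal sketch of the idea, 
not the code in NUTCH-57-050509.patch. The class and method names 
(ContentTypeSketch, stripParameters, subType) are hypothetical, and the 
sketch assumes that optional parameters are everything after the first ';' 
in the Content-Type value.

```java
// Hypothetical illustration only; not taken from the Nutch source or the patch.
final class ContentTypeSketch {

    private ContentTypeSketch() {}

    /** Returns the "type/subtype" part of a Content-Type value, without parameters. */
    static String stripParameters(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Assumption: parameters start at the first ';'.
        int semicolon = contentType.indexOf(';');
        String bare = (semicolon >= 0) ? contentType.substring(0, semicolon) : contentType;
        return bare.trim();
    }

    /** Returns only the subtype, or an empty string if there is none. */
    static String subType(String contentType) {
        String bare = stripParameters(contentType);
        if (bare == null) {
            return "";
        }
        int slash = bare.indexOf('/');
        return (slash >= 0) ? bare.substring(slash + 1).trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(subType("text/plain; charset=ISO-8859-1"));
        System.out.println(subType("text/html"));
    }
}
```

With the parameters stripped first, a subtype validity check would see only 
the bare subtype rather than the subtype followed by its parameters.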

> text and html files unrecognized
> --------------------------------
>
>          Key: NUTCH-57
>          URL: http://issues.apache.org/jira/browse/NUTCH-57
>      Project: Nutch
>         Type: Bug
>   Components: indexer
>  Environment: Nutch 0.7Dev 
>     Reporter: Marc Delerue
>  Attachments: NUTCH-57-050509.patch
>
> While crawling:
> http://XXX.XXX.XXX.XXX/yyyyy.txt
> org.apache.nutch.util.mime.MimeTypeException: invalid Sub Type plain
> and
> http://XXX.XXX.XXX.XXX/yyyyy.html
> org.apache.nutch.util.mime.MimeTypeException: invalid Sub Type html
> The html and text files are fetched but not indexed.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
