[ http://issues.apache.org/jira/browse/NUTCH-57?page=all ]
Jerome Charron updated NUTCH-57:
--------------------------------
Attachment: NUTCH-57-050509.patch
The problem: optional ContentType parameters were not removed from the
subtype, so an exception was raised when the validity of the subtype was
checked.
The patch:
* Removes the optional parameters from the content-type subtype.
* Adds some unit tests to verify the fix.
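
For illustration only, a minimal sketch of the idea behind the fix, not the
code from NUTCH-57-050509.patch. It assumes the optional parameters follow
the subtype after a ';' (for example a charset parameter, which is a guess
about what the affected servers sent). The class and method names are
hypothetical and are not part of org.apache.nutch.util.mime.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not taken from the Nutch code base.
final class ContentTypeSketch {

    private ContentTypeSketch() {}

    /** Returns "type/subtype" with any optional ";name=value" parameters removed. */
    static String stripParameters(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Everything from the first ';' onwards is treated as parameters.
        int semicolon = contentType.indexOf(';');
        String bare = (semicolon >= 0) ? contentType.substring(0, semicolon) : contentType;
        // Media type names are case-insensitive, so normalize to lower case.
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the subtype only, or "" when there is no '/' separator. */
    static String subType(String contentType) {
        String bare = stripParameters(contentType);
        if (bare == null) {
            return "";
        }
        int slash = bare.indexOf('/');
        return (slash >= 0) ? bare.substring(slash + 1) : "";
    }
}
```

With this sketch, a subtype validity check would see "plain" or "html"
rather than the subtype with its parameters still attached.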
> text and html files unrecognized
> --------------------------------
>
> Key: NUTCH-57
> URL: http://issues.apache.org/jira/browse/NUTCH-57
> Project: Nutch
> Type: Bug
> Components: indexer
> Environment: Nutch 0.7Dev
> Reporter: Marc Delerue
> Attachments: NUTCH-57-050509.patch
>
> While crawling:
> http://XXX.XXX.XXX.XXX/yyyyy.txt
> org.apache.nutch.util.mime.MimeTypeException: invalid Sub Type plain
> and
> http://XXX.XXX.XXX.XXX/yyyyy.html
> org.apache.nutch.util.mime.MimeTypeException: invalid Sub Type html
> The html and text files are fetched but not indexed.