[
https://issues.apache.org/jira/browse/NUTCH-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634129#comment-13634129
]
Julien Nioche commented on NUTCH-1558:
--------------------------------------
This patch is based on an antique version of Nutch (1.2) and refers to JSP
files. I won't have the time to look at it in detail within the next 48 hours,
but since it is not a trivial one I'd suggest that you wait to get some proper
reviews of it before committing.
At a quick glance, and apart from the fact that it calls
e.printStackTrace(), there seems to be quite a lot added to Content. I
suspect that some of it is taken from somewhere else, in which case we
should factor the code out, not duplicate it; e.g. the method
sniffCharacterEncoding() probably already lives somewhere else.
I see regular expressions being used in the Content class, which feels quite
wrong. The patch probably does what it is supposed to, but probably not in the
right way.
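To illustrate the kind of factoring meant here, a minimal sketch follows. It
assumes a hypothetical helper class (EncodingSniffer is an invented name, not
an existing Nutch class) that holds the regular expression in one place,
falls back to a caller-supplied charset, and logs failures instead of calling
e.printStackTrace(). It is not taken from the patch or from Nutch itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): keeps charset sniffing in a single
// utility so that data classes such as Content do not need their own regexes.
final class EncodingSniffer {
    private static final Logger LOG =
        Logger.getLogger(EncodingSniffer.class.getName());

    // Matches charset=... inside a <meta> tag; compiled once and reused.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta[^>]*charset\\s*=\\s*[\"']?([a-zA-Z0-9_:.\\-]+)",
        Pattern.CASE_INSENSITIVE);

    // Only the start of the page is inspected (assumed limit for this sketch).
    private static final int SNIFF_LIMIT = 2048;

    private EncodingSniffer() {}

    /** Returns the charset declared in the page, or the fallback if none is usable. */
    static Charset sniff(byte[] content, Charset fallback) {
        int len = Math.min(content.length, SNIFF_LIMIT);
        // ISO-8859-1 maps every byte to one char, so this decoding cannot fail.
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            LOG.log(Level.WARNING, "Unusable charset name in page: " + m.group(1), e);
            return fallback;
        }
    }
}
```

A caller would pass the charset taken from the HTTP Content-Type header as the
fallback, or compare the two results and decide which one to trust.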
BTW I am all for using GitHub, but I'd rather have patches converted to the SVN
format and attached to JIRA.
Thanks
Julien
> CharEncodingForConversion in ParseData's ParseMeta, not in ParseData's
> ContentMeta
> ----------------------------------------------------------------------------------
>
> Key: NUTCH-1558
> URL: https://issues.apache.org/jira/browse/NUTCH-1558
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.7
>
>
> This patch from GitHub user ysc fixes two bugs related to character encoding:
> * CharEncodingForConversion in ParseData's ParseMeta, not in ParseData's
> ContentMeta
> * if the HTTP response header Content-Type returns the wrong encoding, then
> get the encoding from the original content of the page
> Information about this pull request is here: http://s.apache.org/VOP
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira