[ https://issues.apache.org/jira/browse/NUTCH-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015888#comment-13015888 ]

Markus Jelsma commented on NUTCH-974:
-------------------------------------

Hi,

Nutch 1.1 and 1.2 don't ship with the same configuration. Crawling and parsing 
work as expected when using the configuration shipped with each release 
together with the bin/nutch shell commands.

> Parsing Error in Nutch 1.2 on Windows7
> --------------------------------------
>
>                 Key: NUTCH-974
>                 URL: https://issues.apache.org/jira/browse/NUTCH-974
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.2
>         Environment: Windows7 64-bit, Cygwin 1.7.9-1
>            Reporter: Niksa Jakovljevic
>            Assignee: Markus Jelsma
>
> Hello World example of crawling does not work with Nutch 1.2 libs, but works 
> fine with Nutch 1.1 libs. Note that same configuration is used in both Nutch 
> 1.2 and Nutch 1.1.
> Nutch 1.2 always throws following exception:
> 2011-04-01 16:33:45,177 WARN  parse.ParseUtil - Unable to successfully parse 
> content http://www.test.com/ of type text/html
> 2011-04-01 16:33:45,177 WARN  fetcher.Fetcher - Error parsing: 
> http://www.test.com/: failed(2,200): org.apache.nutch.parse.ParseException: 
> Unable to successfully parse content
> Thanks,
> Niksa Jakovljevic

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
