[ https://issues.apache.org/jira/browse/NUTCH-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-745.
----------------------------------------

    Resolution: Invalid

close of legacy issue

> MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run
> ------------------------------------------------------------------------
>
>                 Key: NUTCH-745
>                 URL: https://issues.apache.org/jira/browse/NUTCH-745
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>         Environment: JDK1.6 + tomcat 6 + Eclipse3.3 + nutch 1.0
>            Reporter: jcore_XiaTian
>
> MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run
>
>   public ParseResult getParse(Content content) {
>     return ParseResult.createParseResult(content.getUrl(),
>         new ParseStatus(ParseStatus.FAILED,
>             ParseStatus.FAILED_MISSING_CONTENT,
>             "No textual content available").getEmptyParse(conf));
>
>     // return null;
>   }
>
> ========nutch-site.xml=======
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)</value>
>   <description><![CDATA[
>
>   ]]> </description>
> </property>
>
> ==========parse-plugins.xml============
> <mimeType name="text/html">
>   <plugin id="parse-myHtml" />
>   <plugin id="parse-html" />
> </mimeType>
> <alias name="parse-myHtml"
>        extension-id="org.apache.nutch.parse.html.MyHtmlParser" />
>
> ===src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java========
>   public ParseResult getParse(Content content) {
>     .....
>     // cannot run the code:
>     ParseResult filteredParse = this.htmlParseFilters.filter(content,
>         parseResult,
>         metaTags, root);
>     .......
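
For readers following the report: the behaviour described is what one would expect if the parser chain for a MIME type stops at the first plugin that returns a non-null result, so that a FAILED result from parse-myHtml prevents parse-html (and its parse filters) from ever running. The sketch below is a simplified model of that fallback rule only, not Nutch's actual code. ParserChainSketch, SketchParser and parseWithFallback are hypothetical names, and plain strings stand in for ParseResult.

```java
import java.util.Arrays;
import java.util.List;

public class ParserChainSketch {

    /** Hypothetical stand-in for a parser plugin: returns a result, or null to decline. */
    interface SketchParser {
        String parse(String content);
    }

    /**
     * Tries each parser in configured order. Assumption: the first non-null
     * result ends the chain, even if that result represents a failure.
     */
    static String parseWithFallback(List<SketchParser> parsers, String content) {
        for (SketchParser parser : parsers) {
            String result = parser.parse(content);
            if (result != null) {
                return result; // later parsers are never consulted
            }
        }
        return null; // every parser declined
    }

    public static void main(String[] args) {
        // Mirrors the reported getParse: always returns a non-null failure.
        SketchParser failingCustom = c -> "FAILED: No textual content available";
        // Mirrors the commented-out "return null" variant.
        SketchParser decliningCustom = c -> null;
        // Stands in for the stock HTML parser further down the list.
        SketchParser html = c -> "parsed by html: " + c;

        System.out.println(parseWithFallback(Arrays.asList(failingCustom, html), "<p>hi</p>"));
        System.out.println(parseWithFallback(Arrays.asList(decliningCustom, html), "<p>hi</p>"));
    }
}
```

Under this assumption, listing the custom parser first and having it return a failed result hides every parser after it, while returning null lets the next parser in the list handle the content.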