[ http://issues.apache.org/jira/browse/NUTCH-89?page=all ]
Michael Nebel updated NUTCH-89:
-------------------------------
Attachment: parse-rss.20050910.patch
> parse-rss null pointer exception
> --------------------------------
>
> Key: NUTCH-89
> URL: http://issues.apache.org/jira/browse/NUTCH-89
> Project: Nutch
> Type: Bug
> Components: fetcher
> Versions: 0.7, 0.8-dev
> Reporter: Michael Nebel
> Attachments: parse-rss.20050910.patch
>
> The RSS parser throws an exception. The cause is a syntax error in the page.
> When it hits such a page, the parser tries to add an outlink with null as the anchor.
> The anchor of an outlink must not be null.
> java.lang.NullPointerException
> at org.apache.nutch.io.UTF8.writeString(UTF8.java:236)
> at org.apache.nutch.parse.Outlink.write(Outlink.java:51)
> at org.apache.nutch.parse.ParseData.write(ParseData.java:111)
> at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
> at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
> at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:281)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)
> Exception in thread "main" java.lang.RuntimeException: SEVERE error logged. Exiting fetcher.
> at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:354)
> at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:488)
> at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:140)
> I suggest the following patch:
> Index: src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java
> ===================================================================
> --- src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java (revision 279397)
> +++ src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java (working copy)
> @@ -157,11 +157,13 @@
> if (r.getLink() != null) {
> try {
> // get the outlink
> - theOutlinks.add(new Outlink(r.getLink(), r
> - .getDescription()));
> + if (r.getDescription()!= null ) {
> + theOutlinks.add(new Outlink(r.getLink(), r.getDescription()));
> + } else {
> + theOutlinks.add(new Outlink(r.getLink(), ""));
> + }
> } catch (MalformedURLException e) {
> - LOG
> - .info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> + LOG.info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> + r.getLink()
> + ": Attempting to continue processing outlinks");
> e.printStackTrace();
> @@ -185,12 +187,13 @@
>
> if (whichLink != null) {
> try {
> - theOutlinks.add(new Outlink(whichLink, theRSSItem
> - .getDescription()));
> -
> + if (theRSSItem.getDescription()!=null) {
> + theOutlinks.add(new Outlink(whichLink, theRSSItem.getDescription()));
> + } else {
> + theOutlinks.add(new Outlink(whichLink, ""));
> + }
> } catch (MalformedURLException e) {
> - LOG
> - .info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> + LOG.info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> + whichLink
> + ": Attempting to continue processing outlinks");
> e.printStackTrace();
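
The null guard that the patch adds in both hunks can be sketched in isolation. This is a minimal, self-contained illustration and not the Nutch code: the names AnchorGuard, anchorOrEmpty, Link, and collect are hypothetical, and Link is only a stand-in for the (URL, anchor) pair that an Outlink carries. It assumes, as the patch does, that an empty string is an acceptable anchor when the RSS item has no description.

```java
import java.util.ArrayList;
import java.util.List;

public class AnchorGuard {
    // Hypothetical helper (not part of Nutch): map a missing description to "".
    static String anchorOrEmpty(String description) {
        return description != null ? description : "";
    }

    // Stand-in for a (URL, anchor) pair; the real Outlink class is not used here.
    static final class Link {
        final String url;
        final String anchor;

        Link(String url, String anchor) {
            this.url = url;
            this.anchor = anchor;
        }
    }

    // Each item is {link, description}; either element may be null.
    static List<Link> collect(String[][] items) {
        List<Link> out = new ArrayList<>();
        for (String[] item : items) {
            if (item[0] != null) { // items without a link are skipped
                out.add(new Link(item[0], anchorOrEmpty(item[1])));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] items = {
            {"http://example.org/a", "First item"},
            {"http://example.org/b", null}, // description missing in the feed
        };
        for (Link l : collect(items)) {
            System.out.println(l.url + " -> [" + l.anchor + "]");
        }
    }
}
```

Putting the check in one helper keeps the two call sites identical, so neither can pass a null anchor on to serialization.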