Hi guys,

Has anyone installed parse-rss and tried to fetch RSS feeds? It failed on my side. I enabled the plugin and the fetch succeeded, but the RSS parser did not work. My feed is http://www.craigslist.org/evs/index.rss
Here is the error:

    org.apache.nutch.fetcher.Fetcher$FetcherThread [11] - fetch okay, but can't parse http://beijing.craigslist.org/jjj/index.rss, reason: failed(2,203): Content-Type not text/html: application/xml; charset=ISO-8859-1

The content type is application/xml. Mattmann's comment is this:

    // check that contentType is one we can handle
    String contentType = content.getContentType();
    if (contentType != null
        && (!contentType.startsWith("text/xml") && !contentType.startsWith("application/rss+xml")))
      return new ParseStatus(ParseStatus.FAILED_INVALID_FORMAT,
          "Content-Type not text/xml or application/rss+xml: " + contentType).getEmptyParse();

So it does not support the "application/xml" content type yet?

Thanks
/Jack

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
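P.S. To make the question concrete, here is a minimal standalone sketch of the kind of check I have in mind, with "application/xml" added to the accepted prefixes. The class name FeedContentTypeCheck, the method isAcceptedFeedType, and the prefix list are my own hypothetical names and are not part of parse-rss or Nutch; I have not verified that this is how the plugin should be changed. Note also that the logged message says "not text/html" while the quoted check says "not text/xml or application/rss+xml", so this only illustrates the check itself.

```java
import java.util.Locale;

public class FeedContentTypeCheck {
    // Hypothetical list (assumption, not from parse-rss): the two prefixes
    // from the quoted check plus "application/xml".
    private static final String[] ACCEPTED_PREFIXES = {
        "text/xml", "application/rss+xml", "application/xml"
    };

    // Returns true when the Content-Type header value starts with one of the
    // accepted prefixes. A null value is accepted, mirroring the quoted check,
    // which only rejects non-null content types.
    public static boolean isAcceptedFeedType(String contentType) {
        if (contentType == null) {
            return true;
        }
        String normalized = contentType.trim().toLowerCase(Locale.ROOT);
        for (String prefix : ACCEPTED_PREFIXES) {
            if (normalized.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The header value from the error above.
        System.out.println(isAcceptedFeedType("application/xml; charset=ISO-8859-1")); // prints true
        System.out.println(isAcceptedFeedType("text/html")); // prints false
    }
}
```

With a check like this, the "application/xml; charset=ISO-8859-1" value from my feed would pass, while "text/html" would still be rejected.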
