[ https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1053:
----------------------------------------
Attachment: seed.txt
I attach a seed file which I've used with the crawl command to parse and index
several feed URLs. Using the crawl command, the only warning in my logs was the
following:
{code}
2011-10-10 22:10:37,853 WARN parse.ParserFactory - ParserFactory:Plugin: org.apache.nutch.parse.feed.FeedParser mapped to contentType application/rss+xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: application/rss+xml
{code}
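The warning means two configuration sources disagree: parse-plugins.xml maps application/rss+xml to FeedParser, while the plugin's own plugin.xml does not list that content type. The following is a simplified, hypothetical model of that consistency check. The class and method names are invented for illustration and are not Nutch's ParserFactory API, and treating "*" as a wildcard is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the check behind the ParserFactory warning.
 * This is NOT Nutch's actual implementation.
 */
public class ContentTypeCheck {

    /**
     * @param mappings content type -> parser ids, as configured in parse-plugins.xml
     * @param declared parser id -> content types its plugin.xml claims to support
     * @return one warning per mapping that the plugin does not claim to support
     */
    static List<String> findMismatches(Map<String, List<String>> mappings,
                                       Map<String, Set<String>> declared) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : mappings.entrySet()) {
            String contentType = e.getKey();
            for (String parserId : e.getValue()) {
                Set<String> claimed = declared.getOrDefault(parserId, Set.of());
                // "*" is treated as a wildcard claim (assumption for this sketch)
                if (!claimed.contains(contentType) && !claimed.contains("*")) {
                    warnings.add(parserId + " mapped to contentType " + contentType
                            + " via parse-plugins.xml, but its plugin.xml file does not"
                            + " claim to support contentType: " + contentType);
                }
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, List<String>> mappings = Map.of(
                "application/rss+xml", List.of("org.apache.nutch.parse.feed.FeedParser"));
        // Hypothetical plugin.xml declaration that omits application/rss+xml
        Map<String, Set<String>> declared = Map.of(
                "org.apache.nutch.parse.feed.FeedParser", Set.of("text/xml"));
        findMismatches(mappings, declared).forEach(System.out::println);
    }
}
```

If this model is right, the warning should go away once the plugin's declared content types include application/rss+xml, or once parse-plugins.xml stops mapping that type to the plugin.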
Additionally, I've tried to parse the feeds from the command line, but I get
the following error. Any thoughts? Can you give a use case or a URL that
reproduces the problem you mention with the RSS parser?
{code}
lewis@lewis:~/ASF/trunk/runtime/local$ bin/nutch plugin feed org.apache.nutch.parse.feed.FeedParser http://feeds.bbci.co.uk/news/scotland/rss.xml
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.nutch.plugin.PluginRepository.main(PluginRepository.java:421)
Caused by: java.io.FileNotFoundException: http:/feeds.bbci.co.uk/news/scotland/rss.xml (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:106)
	at org.apache.nutch.parse.feed.FeedParser.main(FeedParser.java:209)
	... 5 more
{code}
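The `Caused by` line suggests that `FeedParser.main` treats its argument as a local file path (it opens a `FileInputStream`), so the URL is never fetched; `PluginRepository.main` invokes that method via reflection, which is why the error surfaces wrapped in an `InvocationTargetException`. The single slash in `http:/feeds...` is consistent with `java.io.File` normalizing the path on Unix-like systems by collapsing repeated separators. The sketch below uses hypothetical helper names (`FeedInput`, `looksLikeUrl`, `open`), not part of Nutch, to show that normalization and one way a command-line entry point could accept either a URL or a local file. With the current code, a presumably simpler workaround is to download the feed first and pass the saved file's path.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helpers for illustration; not part of Nutch. */
public class FeedInput {

    /** Returns true if the argument has an http or https scheme (assumed heuristic). */
    static boolean looksLikeUrl(String arg) {
        try {
            String scheme = new URI(arg).getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        } catch (URISyntaxException e) {
            // e.g. Windows paths with backslashes are not valid URIs
            return false;
        }
    }

    /** Opens a URL as a network stream, or anything else as a local file. */
    static InputStream open(String arg) throws IOException {
        if (looksLikeUrl(arg)) {
            return URI.create(arg).toURL().openStream();
        }
        return new FileInputStream(arg);
    }

    public static void main(String[] args) {
        String arg = "http://feeds.bbci.co.uk/news/scotland/rss.xml";
        // On Unix-like systems java.io.File collapses "//" into "/",
        // matching the "http:/feeds..." path in the exception above.
        System.out.println(new File(arg).getPath());
        System.out.println(looksLikeUrl(arg));
    }
}
```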
> Parsing of RSS feeds fails
> ---------------------------
>
> Key: NUTCH-1053
> URL: https://issues.apache.org/jira/browse/NUTCH-1053
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4
> Reporter: Julien Nioche
> Assignee: Julien Nioche
> Fix For: 1.4
>
> Attachments: seed.txt
>
>
> See discussion on
> http://lucene.472066.n3.nabble.com/RSS-feed-parsing-on-Nutch-1-3-td3166487.html
> Have been able to reproduce the problem and will look into it