[ 
https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1053:
----------------------------------------

    Attachment: seed.txt

I attach a seed file which I've used with the crawl command to parse and index 
several feed URLs. Using the crawl command, the only warning in my logs was as 
follows:
{code}
2011-10-10 22:10:37,853 WARN  parse.ParserFactory - ParserFactory:Plugin: org.apache.nutch.parse.feed.FeedParser mapped to contentType application/rss+xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: application/rss+xml
{code} 

Additionally, I've used the command line to attempt to parse the feeds, but I'm 
getting the exception below. Any thoughts? Can you give a use case or a URL which 
will reproduce the problem you mention with the RSS parser?
{code}
lewis@lewis:~/ASF/trunk/runtime/local$ bin/nutch plugin feed org.apache.nutch.parse.feed.FeedParser http://feeds.bbci.co.uk/news/scotland/rss.xml
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.nutch.plugin.PluginRepository.main(PluginRepository.java:421)
Caused by: java.io.FileNotFoundException: http:/feeds.bbci.co.uk/news/scotland/rss.xml (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.nutch.parse.feed.FeedParser.main(FeedParser.java:209)
        ... 5 more
{code}
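
One possible reading of the trace above: FeedParser.main hands its argument to 
FileInputStream, so the feed URL is being treated as a local file path rather 
than fetched over HTTP. The collapsed "http:/" in the exception message is 
consistent with java.io.File normalising the path on a Unix-like system. The 
small sketch below only illustrates that behaviour; the class and method names 
(UrlAsFilePath, asFilePath, opensAsLocalFile) are made up for the example and 
are not part of Nutch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical demo class, not Nutch code: shows what happens when a URL
// string is used where a local file path is expected.
class UrlAsFilePath {

    // Returns the path java.io.File derives from the argument. On Unix-like
    // systems duplicate separators are collapsed during normalisation.
    static String asFilePath(String arg) {
        return new File(arg).getPath();
    }

    // Returns true only if the argument can be opened as a local file.
    static boolean opensAsLocalFile(String arg) {
        try (FileInputStream in = new FileInputStream(new File(arg))) {
            return true;
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        String url = "http://feeds.bbci.co.uk/news/scotland/rss.xml";
        System.out.println("path seen by File: " + asFilePath(url));
        System.out.println("opens as local file: " + opensAsLocalFile(url));
    }
}
```

If that reading is right, saving the feed to disk first and passing the local 
path to the same command might be enough to exercise the parser from the 
command line, though I have not verified this against FeedParser itself.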
                
> Parsing of RSS feeds fails 
> ---------------------------
>
>                 Key: NUTCH-1053
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1053
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.4
>
>         Attachments: seed.txt
>
>
> See discussion on 
> http://lucene.472066.n3.nabble.com/RSS-feed-parsing-on-Nutch-1-3-td3166487.html
> Have been able to reproduce the problem and will look into it

