[ 
https://issues.apache.org/jira/browse/NUTCH-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517962#comment-16517962
 ] 

Sebastian Nagel commented on NUTCH-2605:
----------------------------------------

{quote}
given that this metadata piece was generated by the feed parser
{quote}
That's an optimistic assumption. The feed parser uses generic keys to pass
metadata from the parser to the indexing filter: "author", "tag", "published",
"updated", "feed". If another (custom) parser or parse filter plugin uses the
same key names, the metadata values may not have the format the feed indexing
filter expects. The feed plugin should
- use more specific keys in parse metadata (e.g. "feed.author")
- and also catch possible runtime exceptions
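
To illustrate both points, here is a minimal, self-contained sketch. It is not
the actual FeedIndexingFilter code: the class name, the "feed.published" key
and the plain Map standing in for Nutch's parse metadata are all hypothetical,
and it assumes the filter expects the value to be epoch milliseconds.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FeedMetadataSketch {

    // Hypothetical namespaced key: the "feed." prefix makes a clash with
    // other parser or parse filter plugins less likely.
    public static final String FEED_PUBLISHED = "feed.published";

    /**
     * Reads the publication date from the (stand-in) parse metadata.
     * Assumes the value is epoch milliseconds; anything else is skipped
     * instead of letting the exception abort the indexing job.
     */
    public static Optional<Date> parsePublished(Map<String, String> parseMeta) {
        String raw = parseMeta.get(FEED_PUBLISHED);
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(new Date(Long.parseLong(raw.trim())));
        } catch (NumberFormatException e) {
            // Value was not written in the expected format: ignore the field.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<>();

        parseMeta.put(FEED_PUBLISHED, "1529366400000");
        System.out.println(parsePublished(parseMeta).isPresent());

        // A date in string format, as described in the issue.
        parseMeta.put(FEED_PUBLISHED, "Tue, 19 Jun 2018 00:00:00 GMT");
        System.out.println(parsePublished(parseMeta).isPresent());
    }
}
```

With this approach a malformed value only drops the one field from the indexed
document. A real filter would probably also log the offending value and URL.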

But this is only one plausible explanation. [~ArkadiKosmynin], could you share 
a document (and also the Nutch version and configuration) to reproduce this 
problem?

> The Feed plugin causes a NumberFormatException
> ----------------------------------------------
>
>                 Key: NUTCH-2605
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2605
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, parser, plugin
>    Affects Versions: 1.14
>            Reporter: Arkadi Kosmynin
>            Priority: Major
>             Fix For: 1.15
>
>
> The Feed plugin seems to have a major problem. The line 102 in
> FeedIndexingFilter.java generated a NumberFormatException (which caused the 
> failure of the entire crawling process!) because it was trying to parse a 
> date in string format, not a number. Given that this metadata piece was 
> generated by the feed parser (same plugin), it seems that the plugin is in 
> disagreement with itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
