[
https://issues.apache.org/jira/browse/NUTCH-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517962#comment-16517962
]
Sebastian Nagel commented on NUTCH-2605:
----------------------------------------
{quote}
given that this metadata piece was generated by the feed parser
{quote}
That's an optimistic assumption. The feed parser uses generic keys to pass metadata
from parser to indexing filter: "author", "tag", "published", "updated",
"feed". If another (custom) parser or parse filter plugin uses the same key
names, the metadata values may look unexpectedly different. The feed plugin
should
- use more specific keys in parse metadata (e.g. "feed.author")
- and also catch possible runtime exceptions
But this is only one plausible explanation. [~ArkadiKosmynin], could you share
a document (and also the Nutch version and configuration) to reproduce this
problem?
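
To illustrate the two suggestions above, here is a minimal sketch. It is not the actual FeedIndexingFilter code: a plain Map stands in for Nutch's parse metadata, the class name, the method name and the key "feed.published" are hypothetical, and it assumes the feed parser stores the date as epoch milliseconds in string form.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: a plain Map stands in for Nutch's parse metadata.
class FeedMetadataSketch {

    // Assumed namespaced key, so that another plugin writing the generic
    // key "published" cannot overwrite the value set by the feed parser.
    static final String PUBLISHED_KEY = "feed.published";

    // Returns the value as epoch milliseconds, or empty if the key is
    // missing or the value is not a number.
    static OptionalLong parseEpochMillis(Map<String, String> metadata, String key) {
        String raw = metadata.get(key);
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            // Skip this field instead of letting the exception abort the job.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        // Value written by the feed parser under the namespaced key.
        metadata.put(PUBLISHED_KEY, "1529452800000");
        // Value written by some other plugin under the generic key.
        metadata.put("published", "Wed, 20 Jun 2018 00:00:00 GMT");

        System.out.println(parseEpochMillis(metadata, PUBLISHED_KEY).isPresent());
        System.out.println(parseEpochMillis(metadata, "published").isPresent());
    }
}
```

With this shape, a date string stored under the generic key by another plugin is simply skipped by the indexing filter rather than failing the whole crawl.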
> The Feed plugin causes a NumberFormatException
> ----------------------------------------------
>
> Key: NUTCH-2605
> URL: https://issues.apache.org/jira/browse/NUTCH-2605
> Project: Nutch
> Issue Type: Bug
> Components: indexer, parser, plugin
> Affects Versions: 1.14
> Reporter: Arkadi Kosmynin
> Priority: Major
> Fix For: 1.15
>
>
> The Feed plugin seems to have a major problem. The line 102 in
> FeedIndexingFilter.java generated a NumberFormatException (which caused the
> failure of the entire crawling process!) because it was trying to parse a
> date in string format, not a number. Given that this metadata piece was
> generated by the feed parser (same plugin), it seems that the plugin is in
> disagreement with itself.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)