Julien Nioche created NUTCH-1777:
------------------------------------
Summary: Fetcher not getting all the entries in input
Key: NUTCH-1777
URL: https://issues.apache.org/jira/browse/NUTCH-1777
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 2.2.1
Reporter: Julien Nioche
Fix For: 2.3
See the comments in [NUTCH-1714]:
bq. The Generator marks 50K entries with GENERATE_MARK, but the Fetcher shows
only 49,461 as Map Input Records (and the same number as Reduce Input Records),
so it looks like we are not getting all the records we should be getting. I dumped
the content of the table before fetching and it contains the right number of
entries, i.e. 50K.
This was noticed after applying [NUTCH-1714] and [NUTCH-1674] but could also
have been the case before that.
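
To put a number on the gap and narrow down which entries are lost, something like the following could help. It is only a sketch: the two key lists are hypothetical inputs (for example, row keys taken from the table dump and keys logged by the fetcher mappers), and this issue does not say how they would be collected.

```python
def shortfall(expected, observed):
    """Return (missing_count, missing_percent) for two record counts."""
    missing = expected - observed
    return missing, 100.0 * missing / expected


def missing_keys(generated, fetched):
    """Return the keys marked for fetching that the fetcher never received.

    Both arguments are hypothetical iterables of row keys; they are not
    produced by any Nutch tool in this form.
    """
    return sorted(set(generated) - set(fetched))


# Counts reported above: 50K generated, 49,461 map input records.
missing, pct = shortfall(50_000, 49_461)
print(f"{missing} records missing ({pct:.2f}% of input)")
# prints "539 records missing (1.08% of input)"
```

Comparing key sets rather than totals would show whether the missing entries share a pattern, such as a common host or key range.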