[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993515#comment-13993515 ]
Julien Nioche commented on NUTCH-1714:
--------------------------------------
A few other things I noticed in my test crawl:
* The Generator marks 50K entries with GENERATE_MARK, but the Fetcher shows only
49,461 Map input records (and the same number of Reduce input records), so it
looks like we are not getting all the records we should be getting. I dumped
the content of the table before fetching and it contains the right number of
entries, i.e. 50K.
* The Generator displayed 'generated batch id: 1399626659-15643 containing 0
URLs' even though, as I just explained, it marked 50K entries correctly.
* The dump of the webtable contains 'markers:
org.apache.gora.persistency.impl.DirtyMapWrapper@eb173c'. It should display the
marker values rather than the wrapper's default object string.
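
To illustrate the last point: output of the form 'ClassName@hexhash' is what
Java's default Object.toString() produces, so one plausible cause is a map
wrapper that does not override toString(). The sketch below is only an
illustration under that assumption. OpaqueMapWrapper, PrintableMapWrapper and
the 'generate_mark' key are hypothetical names made up for the example, not
Gora or Nutch classes, and this is not a claim about how DirtyMapWrapper is
actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MarkerDumpSketch {

    // Hypothetical wrapper that does NOT override toString(): printing it
    // falls back to Object.toString(), i.e. "<class name>@<hex hash code>".
    static final class OpaqueMapWrapper<K, V> {
        private final Map<K, V> delegate;

        OpaqueMapWrapper(Map<K, V> delegate) {
            this.delegate = delegate;
        }
    }

    // Hypothetical wrapper that delegates toString() to the wrapped map,
    // so a dump shows the entries themselves.
    static final class PrintableMapWrapper<K, V> {
        private final Map<K, V> delegate;

        PrintableMapWrapper(Map<K, V> delegate) {
            this.delegate = delegate;
        }

        @Override
        public String toString() {
            return delegate.toString();
        }
    }

    public static void main(String[] args) {
        // "generate_mark" is a placeholder key chosen for this example;
        // the batch id is the one reported by the Generator above.
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("generate_mark", "1399626659-15643");

        // Prints something like "markers: MarkerDumpSketch$OpaqueMapWrapper@1b6d3586"
        System.out.println("markers: " + new OpaqueMapWrapper<>(markers));

        // Prints "markers: {generate_mark=1399626659-15643}"
        System.out.println("markers: " + new PrintableMapWrapper<>(markers));
    }
}
```

If that is indeed the cause, the dump code could alternatively copy the wrapped
entries into a plain map before printing, instead of changing the wrapper.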
Thanks
Julien
> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
> Key: NUTCH-1714
> URL: https://issues.apache.org/jira/browse/NUTCH-1714
> Project: Nutch
> Issue Type: Improvement
> Reporter: Alparslan Avcı
> Assignee: Alparslan Avcı
> Fix For: 2.3
>
> Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch,
> NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the
> details in this issue.