[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985436#comment-13985436
]
Julien Nioche commented on NUTCH-1714:
--------------------------------------
Hi [~alparslan.avci]
I have been trying your patch and found several issues. They might not be
directly caused by it but could be related to Gora 0.4. BTW, can I suggest
changing the title of this issue to "Upgrade to Gora 0.4" now that it has been
released? I am running a crawl on 1.3M URLs in pseudo-distributed mode with
HBase.
* The mappers report no intermediate progress: their completion status jumps
from 0% straight to 100% for the tasks taking their input from GORA, i.e. not
the injection
* The whole content of the webtable seems to be taken as input for mapreduce. I
assumed this would no longer be the case with [GORA-119], and that the fetch
step, for instance, would get only the entries marked by the Generator. There
is [NUTCH-1674], but judging by its title it should only add the batchID to the
filters.
* ./nutch readdb -crawlId MYCRAWLIDHERE -stats reports 0 docs, even though I
can see the corresponding table in HBase.
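
To illustrate the second point, here is a minimal sketch in plain Java of the
selection the fetch step was expected to receive: only rows carrying the
Generator's mark for the current batch, not the whole webtable. Everything in
it is an assumption made for illustration: the class BatchFilterSketch, the
marker key "generate_mark", the row keys and the in-memory map standing in for
the webtable are all hypothetical and are not the Nutch or Gora API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Nutch or Gora.
final class BatchFilterSketch {

    // Assumed name of the marker the Generator writes on the rows it selects.
    static final String GENERATE_MARK = "generate_mark";

    // Returns the keys of the rows whose generator mark equals batchId.
    // The webtable is modelled as: row key -> (marker name -> marker value).
    static List<String> rowsForFetch(Map<String, Map<String, String>> webtable,
                                     String batchId) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : webtable.entrySet()) {
            String mark = row.getValue().get(GENERATE_MARK);
            // Rows with no mark, or a mark from another batch, are skipped.
            if (batchId.equals(mark)) {
                selected.add(row.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> webtable = new LinkedHashMap<>();

        Map<String, String> currentBatch = new HashMap<>();
        currentBatch.put(GENERATE_MARK, "batch-1");
        webtable.put("row-a", currentBatch);

        // Never selected by the Generator: no mark at all.
        webtable.put("row-b", new HashMap<>());

        Map<String, String> olderBatch = new HashMap<>();
        olderBatch.put(GENERATE_MARK, "batch-0");
        webtable.put("row-c", olderBatch);

        System.out.println(webtable.size() + " rows in table, selected for fetch: "
                + rowsForFetch(webtable, "batch-1"));
    }
}
```

In this sketch the filter still walks every row, which is what a mapper-side
check amounts to. A filter pushed down to the datastore would avoid handing the
unmarked rows to mapreduce in the first place.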
Thanks! Julien
> Nutch 2.x upgrade to use GORA_94 branch
> ---------------------------------------
>
> Key: NUTCH-1714
> URL: https://issues.apache.org/jira/browse/NUTCH-1714
> Project: Nutch
> Issue Type: Improvement
> Reporter: Alparslan Avcı
> Assignee: Alparslan Avcı
> Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch,
> NUTCH-1714v2.patch, NUTCH-1714v4.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the
> details in this issue.
--
This message was sent by Atlassian JIRA
(v6.2#6252)