[ 
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985436#comment-13985436
 ] 

Julien Nioche commented on NUTCH-1714:
--------------------------------------

Hi [~alparslan.avci]

I have been trying your patch and found several issues. They might not be 
directly caused by it but could be related to Gora 0.4. BTW can I suggest a 
change of title for this issue to "Upgrade to Gora 0.4" now that it has been 
released? I am running a crawl on 1.3M URLs in pseudo-distributed mode with 
HBase.

* Mapper progress is not reported: for the tasks taking their input from GORA 
(i.e. everything but the injection), the completion status jumps from 0% 
straight to 100%.
* The whole content of the webtable seems to be taken as input for MapReduce. I 
assumed this would no longer be the case with [GORA-119], and that the fetch 
step, for instance, would get only the entries marked by the Generator. There 
is [NUTCH-1674], but judging by its title it should only add the batchID to 
the filters.
* ./nutch readdb -crawlId MYCRAWLIDHERE -stats reports 0 docs, although I can 
see the corresponding table in HBase.

Thanks! Julien

 

> Nutch 2.x upgrade to use GORA_94 branch
> ---------------------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, 
> NUTCH-1714v2.patch, NUTCH-1714v4.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the 
> details in this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
