[ 
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986370#comment-13986370
 ] 

Lewis John McGibbney commented on NUTCH-1714:
---------------------------------------------

Hey [~jnioche], thanks for looking into the patch. Answers below:
bq. •There is no progression of the complete status of mappers: they go from 
0% to 100% for the tasks taking the input from GORA, i.e. not the injection
Honestly, I have no idea here... we need to find out WTF is wrong 
bq. •The whole content of the webtable seems to be taken as input for 
mapreduce. I assumed it wouldn't be the case for GORA-119 and that the fetch 
step for instance would get only the entries marked by the Generator. There is 
NUTCH-1674 but this should only add the batchID to the filters according to its 
title.
OK, so I wonder whether this patch _just_ upgrades to Gora 0.4, or whether it 
upgrades to 0.4 _and_ switches to the new *filter* API |0|. I suspect the 
former. I need to look into the patch... which unfortunately I cannot do right 
now :( If so, then we need to open a separate issue to upgrade to the filter 
API as well. This will not be difficult, as we know which tools use the 
existing Query API.
bq. •./nutch readdb -crawlId MYCRAWLIDHERE -stats gets 0 docs but I can see the 
corresponding table in HBase.
OK, so when we read the XML mappings (e.g. gora-hbase-mapping.xml) and 
*initialize* a Gora datastore, the table is created regardless of whether any 
data is written or read. Are you expecting to see records? Or are you just 
surprised that the table is there but contains no records?
 
|0| 
https://svn.apache.org/repos/asf/gora/trunk/gora-core/src/main/java/org/apache/gora/filter/

> Nutch 2.x upgrade to use GORA_94 branch
> ---------------------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, 
> NUTCH-1714v2.patch, NUTCH-1714v4.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the 
> details in this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
