[ 
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994098#comment-13994098
 ] 

Lewis John McGibbney commented on NUTCH-1714:
---------------------------------------------

Hi [~alparslan.avci] and [~jnioche]: some comments
1.
bq. About this problem, I think it is not about gora-hbase-0.4 and exists from 
the beginning of Gora project for HBase.
There is nothing 'bad' here; what is wrong will become clear if you look at 
the following code:
https://github.com/apache/gora/blob/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/query/HBaseScannerResult.java#L65

2.
bq. The code has changed since the last patch and we are now getting : 
[~jnioche], this is addressed in my new patch... must have been a trivial 
mistake/revert on [~alparslan.avci]'s patch :)

3. 
[~jnioche]
bq. when the parser fails. This is due to status.getArgs() returning null. 
I've now hopefully fixed this in my new patch. 

4. 
[~jnioche]
bq. WebTableReader should also remove the dirty field in processDumpJob 
{code:title=WebTableReader.java|borderStyle=solid}
    WebPage page = new WebPage();
    ArrayList<String> queryFields = new ArrayList<String>();
    for (int i = 1; i < WebPage._ALL_FIELDS.length; i++) {
      queryFields.add(page.getSchema().getFields().toString());
    }
    query.setFields((String[]) queryFields.toArray());
{code}
I am not particularly happy with this (and I am actively testing it, so I still 
have my own comments to pass on). If you can suggest a better way to remove the 
Field at position 0 in the array, then we can go with that. I also don't really 
like the cast within the call to query.setFields. WDYT?
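
One possible answer to the question above, as a minimal sketch: assuming 
WebPage._ALL_FIELDS is a String[] whose element 0 is the dirty field, the array 
can be copied from index 1 with Arrays.copyOfRange, which needs neither the 
loop nor the cast. Casting the Object[] returned by the no-argument 
ArrayList.toArray() to String[] compiles but throws a ClassCastException at 
runtime, so avoiding it is more than a style point. The class FieldSelection, 
the method withoutFirst and the ALL_FIELDS values below are hypothetical 
stand-ins for illustration, not Nutch or Gora code.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; not part of Nutch or Gora.
final class FieldSelection {

  // Assumed to mirror WebPage._ALL_FIELDS: index 0 holds the dirty-state field.
  static final String[] ALL_FIELDS = {"__g__dirty", "baseUrl", "status", "fetchTime"};

  // Returns a copy of the given array without its first element.
  static String[] withoutFirst(String[] fields) {
    if (fields.length == 0) {
      // copyOfRange(fields, 1, 0) would throw, so handle the empty case here.
      return new String[0];
    }
    return Arrays.copyOfRange(fields, 1, fields.length);
  }

  public static void main(String[] args) {
    String[] queryFields = withoutFirst(ALL_FIELDS);
    // In WebTableReader the result would be handed to query.setFields(queryFields).
    System.out.println(Arrays.toString(queryFields));
  }
}
```

If a List is still preferred, queryFields.toArray(new String[queryFields.size()]) 
returns a real String[] and also removes the need for the cast.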

[~jnioche], regarding your most recent _observations_, I will also add to these 
once I've seen my crawler(s) running for a bit longer over a number of 
different scenarios.
Thanks for the comments; these are excellent. This is not particularly easy, 
as Gora 0.4 was a MAJOR release with many changes across all back ends. 
Persistence is something we need to get right, so I don't mind taking the time 
to do so.


> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, 
> NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the 
> details in this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
