[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994098#comment-13994098
]
Lewis John McGibbney commented on NUTCH-1714:
---------------------------------------------
Hi [~alparslan.avci] and [~jnioche]: some comments
1.
bq. About this problem, I think it is not about gora-hbase-0.4 and exists from
the beginning of Gora project for HBase.
There is nothing 'bad' here; what is wrong will become clear if you look at
the following code:
https://github.com/apache/gora/blob/trunk/gora-hbase/src/main/java/org/apache/gora/hbase/query/HBaseScannerResult.java#L65
2.
bq. The code has changed since the last patch and we are now getting :
[~jnioche], this is addressed in my new patch... must have been a trivial
mistake/revert on [~alparslan.avci]'s patch :)
3.
[~jnioche]
bq. when the parser fails. This is due to status.getArgs() returning null.
I've now hopefully fixed this in my new patch.
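For illustration only, a minimal sketch of the kind of guard that avoids a failure when getArgs() returns null. This is not the code from the patch: the Status class below is a hypothetical stand-in for the real parse status type, and argsOrEmpty is an assumed helper name.

```java
import java.util.Collections;
import java.util.List;

public class NullSafeArgs {
    // Hypothetical stand-in for a parse status whose args may never have been set.
    public static class Status {
        private final List<String> args;

        public Status(List<String> args) {
            this.args = args;
        }

        public List<String> getArgs() {
            return args;
        }
    }

    // Returns the status args, or an empty list when getArgs() yields null,
    // so callers can iterate without a null check of their own.
    public static List<String> argsOrEmpty(Status status) {
        List<String> args = status.getArgs();
        return args == null ? Collections.<String>emptyList() : args;
    }

    public static void main(String[] unused) {
        Status failed = new Status(null);
        // Safe even though no args were set on the failed status.
        System.out.println("arg count: " + argsOrEmpty(failed).size());
    }
}
```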
4.
[~jnioche]
bq. WebTableReader should also remove the dirty field in processDumpJob
{code:title=WebTableReader.java|borderStyle=solid}
WebPage page = new WebPage();
ArrayList<String> queryFields = new ArrayList<String>();
for (int i = 1; i < WebPage._ALL_FIELDS.length; i++) {
  queryFields.add(page.getSchema().getFields().toString());
}
query.setFields((String[]) queryFields.toArray());
{code}
I am not particularly happy with this (and I am actively testing it, so I
still have my own comments to pass on). If you can suggest a better way to
remove the field at position 0 in the array, then we can go with that. I also
don't really like the cast within the call to query.setFields. WDYT?
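One possible alternative, offered as a sketch rather than as what the patch does: copy the field-name array from index 1 onward, which drops the entry at position 0 and yields a String[] directly with no cast. Note that a cast of the form (String[]) list.toArray() fails at runtime with a ClassCastException, because the no-argument toArray() returns an Object[]. The array contents below are placeholder names, and ALL_FIELDS is only a stand-in for WebPage._ALL_FIELDS, assumed here to be a String[].

```java
import java.util.Arrays;

public class SkipFirstField {
    // Placeholder stand-in for WebPage._ALL_FIELDS; the names are illustrative only.
    static final String[] ALL_FIELDS = {"dirtyMarker", "baseUrl", "status", "fetchTime"};

    // Returns a copy of the given array without its first element.
    // An empty or single-element input yields an empty array.
    public static String[] withoutFirst(String[] fields) {
        if (fields.length <= 1) {
            return new String[0];
        }
        return Arrays.copyOfRange(fields, 1, fields.length);
    }

    public static void main(String[] unused) {
        String[] queryFields = withoutFirst(ALL_FIELDS);
        // In the real code this array would be handed to query.setFields(...).
        System.out.println(Arrays.toString(queryFields));
    }
}
```

If a list is still preferred for building the names, queryFields.toArray(new String[0]) returns a typed String[] and also avoids the cast.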
[~jnioche], regarding your most recent _observations_, I will also add to these
once I've seen my crawler(s) running for a bit longer over a number of
different scenarios.
Thanks for the comments; they are excellent. This is not particularly easy,
as Gora 0.4 was a MAJOR release with many changes across all back ends.
Persistence is something we need to get right, so I don't mind taking the
time to do so.
> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
> Key: NUTCH-1714
> URL: https://issues.apache.org/jira/browse/NUTCH-1714
> Project: Nutch
> Issue Type: Improvement
> Reporter: Alparslan Avcı
> Assignee: Alparslan Avcı
> Fix For: 2.3
>
> Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch,
> NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the
> details in this issue.