[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987602#comment-13987602 ]
Julien Nioche commented on NUTCH-1714:
--------------------------------------
bq. I do not know if you have tested the patch, but it fixes the problem with
the last update.
I did test it (hence my assertion that it did not work), but I must have done
something wrong, which is not surprising given that I had various patches
applied to the code. I tried again from a clean copy of the repo and it does
indeed solve the issue. Thanks.
bq. The reason for the readdb problem is that it tries to get all fields from
the webpage table, and it uses the WebPage._ALL_FIELDS array to do this.
However, this array also contains the __gdirty field, which is used to track
the dirty fields of the persistent class. This field is not stored in the
database, so when the db is queried with it, no results are returned.
Thanks for the explanation
bq. In the patch I have removed the __gdirty field directly from the fields sent
to the query, since it is always at the first position of the _ALL_FIELDS
array. This fixes the problem. However, I will also send a mail to dev@gora to
discuss whether we should remove this field from the persistent class's
_ALL_FIELDS array. Then we could use WebPage._ALL_FIELDS directly here.
Good idea.
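A minimal sketch of the two options discussed above, in plain Java with no Gora dependency. The class QueryFields, its methods, and the short sample field list are hypothetical stand-ins; the real WebPage._ALL_FIELDS array is generated by Gora and contains many more entries. dropFirst mirrors the patch (skip index 0, assuming the dirty field is always first), while dropByName is a position-independent alternative.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for building the list of fields sent to a query.
 * ALL_FIELDS mimics a Gora-generated _ALL_FIELDS array (assumed contents).
 */
class QueryFields {

    // Dirty-tracking field name as given in the discussion; it exists only on
    // the in-memory persistent object and is never stored in the database.
    static final String DIRTY_FIELD = "__gdirty";

    // Hypothetical sample array; the dirty field sits at index 0, as described.
    static final String[] ALL_FIELDS = {DIRTY_FIELD, "baseUrl", "status", "fetchTime"};

    /** Patch-style approach: drop the first entry, relying on the dirty field always being first. */
    static String[] dropFirst(String[] allFields) {
        return Arrays.copyOfRange(allFields, 1, allFields.length);
    }

    /** Position-independent alternative: drop the dirty field by name wherever it appears. */
    static String[] dropByName(String[] allFields) {
        return Arrays.stream(allFields)
                .filter(f -> !DIRTY_FIELD.equals(f))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Both print the storable fields only: [baseUrl, status, fetchTime]
        System.out.println(Arrays.toString(dropFirst(ALL_FIELDS)));
        System.out.println(Arrays.toString(dropByName(ALL_FIELDS)));
    }
}
```

The positional version silently returns the wrong fields if the generated array order ever changes, which is one argument for removing the dirty field from _ALL_FIELDS on the Gora side so callers can pass the array unmodified.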
I will comment on the filtering in NUTCH-1674 and do more testing before I
commit this patch.
Thanks for your work!
Julien
> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
> Key: NUTCH-1714
> URL: https://issues.apache.org/jira/browse/NUTCH-1714
> Project: Nutch
> Issue Type: Improvement
> Reporter: Alparslan Avcı
> Assignee: Alparslan Avcı
> Fix For: 2.3
>
> Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch,
> NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> The Nutch upgrade for the GORA_94 branch needs to be implemented. We can
> discuss the details in this issue.
--
This message was sent by Atlassian JIRA
(v6.2#6252)