[
https://issues.apache.org/jira/browse/NUTCH-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170827#comment-14170827
]
Damien Raude-Morvan commented on NUTCH-1843:
--------------------------------------------
Hi all,
Thanks for your feedback, [~lewismc] and [~kirilme]!
I was able to reproduce the given stacktrace on my own workstation.
Here are the results of my investigation:
* Utf8 is already handled in MongoStore by the #put and #get methods. As already
noted by [Renato Marroquín Mogrovejo on
gora-dev|http://www.mail-archive.com/dev%40gora.apache.org/msg05360.html], I
was able to store a Nutch WebPage without any major issues.
* As spotted by [Lewis John
McGibbney|http://www.mail-archive.com/dev%40gora.apache.org/msg05356.html],
the issue comes from the handling of Utf8 values as operands of
MapFieldValueFilter. I will provide a patch to Gora so that the MongoStore
implementation of filtering converts Utf8 to String.
But since "operands" is of type "Object", we might receive other types of
objects, which might not be handled correctly.
I think we need to provide more robust handling of data types in Gora filters.
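To illustrate the idea, here is a minimal sketch of what normalizing filter operands could look like. It is not the actual Gora patch: the class OperandNormalizer and its methods are hypothetical names, and the sketch assumes that Avro's Utf8 implements CharSequence, so any CharSequence operand is converted to a String while unknown types are rejected explicitly instead of being passed on silently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Gora): normalizes filter operands
 * before they are used to build a datastore query.
 */
final class OperandNormalizer {

    private OperandNormalizer() {
        // static helper, no instances
    }

    /** Converts one operand to a plain Java type, or fails loudly. */
    static Object normalize(Object operand) {
        if (operand == null) {
            return null;
        }
        // Plain types that a driver can typically serialize as-is.
        if (operand instanceof String
                || operand instanceof Number
                || operand instanceof Boolean) {
            return operand;
        }
        // Assumption: Avro's Utf8 is a CharSequence, so it lands here.
        if (operand instanceof CharSequence) {
            return operand.toString();
        }
        // Anything else is reported rather than silently mishandled.
        throw new IllegalArgumentException(
                "Unsupported filter operand type: "
                        + operand.getClass().getName());
    }

    /** Normalizes every operand of a filter, preserving order. */
    static List<Object> normalizeAll(List<?> operands) {
        List<Object> result = new ArrayList<>(operands.size());
        for (Object operand : operands) {
            result.add(normalize(operand));
        }
        return result;
    }
}
```

In such a sketch, the MongoStore filtering code would call OperandNormalizer.normalizeAll(...) on the filter operands before building the query. Failing on unknown types is only one possible design choice; falling back to toString() would be more lenient but could hide mapping errors.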
Regards,
> Upgrade to Gora 0.5
> -------------------
>
> Key: NUTCH-1843
> URL: https://issues.apache.org/jira/browse/NUTCH-1843
> Project: Nutch
> Issue Type: Improvement
> Components: build, storage
> Reporter: Lewis John McGibbney
> Assignee: Talat UYARER
> Fix For: 2.3
>
> Attachments: NUTCH-1843.patch, NUTCH-1843v2.patch
>
>
> We just released Gora 0.5
> http://www.mail-archive.com/dev%40gora.apache.org/msg05236.html
> We should upgrade before releasing Nutch 2.3
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)