[ https://issues.apache.org/jira/browse/NUTCH-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170827#comment-14170827 ]

Damien Raude-Morvan commented on NUTCH-1843:
--------------------------------------------

Hi all,

Thanks for your feedback, [~lewismc] and [~kirilme]!
I was able to reproduce the given stacktrace on my own workstation.

Here are the results of my investigation:
* Utf8 is already handled by MongoStore in its #put and #get methods. As 
[Renato Marroquín Mogrovejo already noted on 
gora-dev|http://www.mail-archive.com/dev%40gora.apache.org/msg05360.html], I 
was able to store a Nutch WebPage without any major issues.
* As [Lewis John 
McGibbney|http://www.mail-archive.com/dev%40gora.apache.org/msg05356.html] 
spotted, the issue comes from the handling of Utf8 values as operands of 
MapFieldValueFilter. I will provide a patch to Gora to ensure that MongoStore's 
filtering implementation converts Utf8 to String.
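A minimal sketch of the Utf8-to-String conversion described above, assuming the filter's operands are available as a list of objects. The class `OperandNormalizer` and method `toStrings` are hypothetical, not Gora API. It relies on the fact that Avro's `Utf8` implements `CharSequence`, so the sketch does not need Avro on the classpath: any `CharSequence` operand is replaced by its `String` form, and other values are passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Gora): converts CharSequence operands,
 * such as Avro's Utf8, into plain java.lang.String before they are used
 * to build a MongoDB query.
 */
final class OperandNormalizer {

    private OperandNormalizer() {}

    /** Returns a new list in which every CharSequence operand is replaced by its String form. */
    static List<Object> toStrings(List<?> operands) {
        List<Object> result = new ArrayList<>(operands.size());
        for (Object op : operands) {
            // Utf8 -> String; a String's toString() returns the same String.
            // Non-CharSequence values (numbers, null, ...) are left untouched.
            result.add(op instanceof CharSequence ? op.toString() : op);
        }
        return result;
    }
}
```

A MongoStore filter implementation could apply this to the operand list before building its query; how the operands are obtained from MapFieldValueFilter is not shown here.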

However, since "operands" is of type "Object", we might receive other types of 
objects, which might not be handled correctly.
I think we need more robust handling of data types in Gora filters.
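One possible direction for more robust operand handling is sketched below: instead of passing an arbitrary `Object` through, the store normalizes the types it knows and fails loudly on anything else. The class `StrictOperands` is hypothetical, and the set of "directly usable" types (String, Boolean, Number, Date, null) is an assumption about what the underlying store accepts, not a statement about Gora or the MongoDB driver.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.List;

/**
 * Hypothetical sketch (not Gora API): known operand types are normalized,
 * anything else is rejected with an explicit error instead of silently
 * producing a wrong query.
 */
final class StrictOperands {

    private StrictOperands() {}

    static Object normalize(Object op) {
        // Assumption: these types can be handed to the query builder as-is.
        if (op == null || op instanceof String || op instanceof Boolean
                || op instanceof Number || op instanceof Date) {
            return op;
        }
        // Covers Avro's Utf8 and any other CharSequence implementation.
        if (op instanceof CharSequence) {
            return op.toString();
        }
        // Normalize each element of a collection operand recursively.
        if (op instanceof Collection) {
            List<Object> out = new ArrayList<>();
            for (Object element : (Collection<?>) op) {
                out.add(normalize(element));
            }
            return out;
        }
        throw new IllegalArgumentException(
                "Unsupported filter operand type: " + op.getClass().getName());
    }
}
```

The design choice here is to make unsupported types an immediate, descriptive failure rather than a query that silently matches nothing.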

Regards,

> Upgrade to Gora 0.5
> -------------------
>
>                 Key: NUTCH-1843
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1843
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build, storage
>            Reporter: Lewis John McGibbney
>            Assignee: Talat UYARER
>             Fix For: 2.3
>
>         Attachments: NUTCH-1843.patch, NUTCH-1843v2.patch
>
>
> We just released Gora 0.5 
> http://www.mail-archive.com/dev%40gora.apache.org/msg05236.html
> We should upgrade before releasing Nutch 2.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
