[ https://issues.apache.org/jira/browse/NUTCH-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-676:
--------------------------------

    Attachment: NUTCH-676_v2.patch

Patch for the issue.

Bumps the CrawlDatum version and starts using o.a.h.io.MapWritable in CrawlDatum.
Compatibility is preserved by keeping Nutch's MapWritable around and adding extra
code that reads from the Nutch MapWritable if the CrawlDatum version is 6.
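
To illustrate the compatibility approach described above, here is a minimal, self-contained sketch of version-gated reading. It is not the code from the patch: the class VersionedRecord, both byte layouts, and the new version number 7 are assumptions made up for illustration. Only the legacy version 6 comes from the comment above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a versioned record; not the CrawlDatum code from the patch. */
class VersionedRecord {
    static final byte LEGACY_VERSION = 6; // version that still carries the old map layout
    static final byte CUR_VERSION = 7;    // assumed value after the version bump

    final Map<String, String> metaData = new LinkedHashMap<>();

    /** New records are always written in the current layout. */
    void write(DataOutput out) throws IOException {
        out.writeByte(CUR_VERSION);
        out.writeInt(metaData.size());
        for (Map.Entry<String, String> e : metaData.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    /** Reads the version byte first, then picks the matching map reader. */
    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        metaData.clear();
        if (version == LEGACY_VERSION) {
            readLegacyMap(in);
        } else if (version == CUR_VERSION) {
            readCurrentMap(in);
        } else {
            throw new IOException("Unsupported record version: " + version);
        }
    }

    // Invented "current" layout: entry count, then key/value pairs.
    private void readCurrentMap(DataInput in) throws IOException {
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            metaData.put(key, in.readUTF());
        }
    }

    // Invented "legacy" layout: each pair is preceded by true; false ends the list.
    private void readLegacyMap(DataInput in) throws IOException {
        while (in.readBoolean()) {
            String key = in.readUTF();
            metaData.put(key, in.readUTF());
        }
    }

    /** Helper: encodes a map the way a version 6 writer would in this sketch. */
    static byte[] legacyBytes(Map<String, String> map) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(LEGACY_VERSION);
            for (Map.Entry<String, String> e : map.entrySet()) {
                out.writeBoolean(true);
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.writeBoolean(false);
            return buf.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Helper: serializes a record in the current layout. */
    static byte[] toBytes(VersionedRecord record) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            record.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Helper: deserializes a record written in either layout. */
    static VersionedRecord fromBytes(byte[] data) {
        try {
            VersionedRecord record = new VersionedRecord();
            record.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return record;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

In this sketch, old data is upgraded lazily: a version 6 record is read through the legacy path and comes out in the current layout the next time it is written.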

Also changes CrawlDatum#toString, as Hadoop's MapWritable does not have a good
toString method.
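
As a rough idea of what formatting the entries by hand can look like, here is a small sketch. It is not the toString code from the patch: the class MetaDataFormatter and the "key: value" output format are assumptions for illustration. It works on any java.util.Map, and Hadoop's MapWritable implements that interface, so the same loop should apply to it.

```java
import java.util.Map;

/** Hypothetical helper; the names and output format are illustrative only. */
class MetaDataFormatter {
    /** Renders a map as "key: value" pairs, one per line, in iteration order. */
    static String format(Map<?, ?> metaData) {
        StringBuilder buf = new StringBuilder();
        for (Map.Entry<?, ?> e : metaData.entrySet()) {
            // append(Object) falls back to each key's and value's own toString
            buf.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return buf.toString();
    }
}
```

A toString method could call such a helper on its metadata map and append the result to the rest of its output.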

> MapWritable is written inefficiently and confusingly
> ----------------------------------------------------
>
>                 Key: NUTCH-676
>                 URL: https://issues.apache.org/jira/browse/NUTCH-676
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Todd Lipcon
>            Priority: Minor
>         Attachments: 
> 0001-NUTCH-676-Replace-MapWritable-implementation-with-t.patch, 
> NUTCH-676_v2.patch
>
>
> The MapWritable implementation in o.a.n.crawl is written confusingly - it 
> maintains its own internal linked list, which I think may have a bug somewhere 
> (I'm getting an NPE in certain cases in the code, though it's hard to track 
> down).
> Can anyone comment as to why MapWritable is written the way it is, rather 
> than just using a HashMap or a LinkedHashMap if consistent ordering is 
> important? I imagine that would improve performance.
> What about just using the Hadoop MapWritable? Obviously that would break some 
> backwards compatibility but it may be a good idea at some point to reduce 
> confusion (I didn't realize that Nutch had its own impl until a few minutes 
> ago)
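
For reference, the LinkedHashMap alternative raised in the description could look roughly like the sketch below. It is only an illustration under assumptions: the class LinkedMapRecord, its String-only keys and values, and its byte layout are invented here and match neither the Nutch nor the Hadoop MapWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical map record backed by LinkedHashMap instead of a hand-rolled linked list. */
class LinkedMapRecord {
    // LinkedHashMap iterates in insertion order, so serialized output is deterministic.
    private final Map<String, String> entries = new LinkedHashMap<>();

    void put(String key, String value) { entries.put(key, value); }

    String get(String key) { return entries.get(key); }

    /** Keys in insertion order. */
    List<String> keys() { return new ArrayList<>(entries.keySet()); }

    void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    void readFields(DataInput in) throws IOException {
        entries.clear();
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            entries.put(key, in.readUTF());
        }
    }

    /** Helper: writes the record to a byte array and reads it back into a new instance. */
    static LinkedMapRecord roundTrip(LinkedMapRecord src) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));
            LinkedMapRecord copy = new LinkedMapRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

The ordering guarantee comes entirely from the standard collection, so there is no custom list bookkeeping left to get wrong.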

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
