[ https://issues.apache.org/jira/browse/HADOOP-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571049#action_12571049 ]
Hadoop QA commented on HADOOP-2853:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376122/sequenceWritable-v3.patch
against trunk revision 619744.

    @author +1.  The patch does not contain any @author tags.

    tests included +1.  The patch appears to include 4 new or modified tests.

    javadoc -1.  The javadoc tool appears to have generated 1 warning message.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warning.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/console

This message is automatically generated.

> Add Writable for very large lists of key / value pairs
> ------------------------------------------------------
>
>                 Key: HADOOP-2853
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2853
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Andrzej Bialecki
>             Fix For: 0.17.0
>
>         Attachments: sequenceWritable-v1.patch, sequenceWritable-v2.patch,
>                      sequenceWritable-v3.patch
>
>
> Some map-reduce jobs need to aggregate and process very long lists as a
> single value. This usually happens when keys from a large domain are mapped
> into a small domain, and their associated values cannot be aggregated into a
> few values but must be preserved as members of a large list. Currently this
> can be implemented with a MapWritable or an ArrayWritable; however, Hadoop
> then needs to deserialize the current key and value completely into memory,
> which for extremely large values causes frequent OOM exceptions. This
> approach also works only for lists of relatively small size (e.g. 1000
> records).
> This patch is an implementation of a Writable that can handle arbitrarily
> long lists. Initially it keeps an internal buffer (which can be
> (de)serialized in the ordinary way), and if the list size exceeds a certain
> threshold it is spilled to an external SequenceFile (hence the name) on a
> configured FileSystem. The content of this Writable can be iterated, and the
> data is pulled either from the internal buffer or from the external file in
> a transparent way.
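
To make the buffer-then-spill behavior concrete, here is a minimal,
hypothetical Java sketch of the approach the description outlines: pairs are
buffered in memory, and once a threshold is crossed the buffer is drained to
an external SequenceFile, from which iteration then proceeds transparently.
The class and member names (SpillableList, spillThreshold, spillPath) and the
use of Text keys and values are illustrative assumptions, not code from the
attached patch; the Writable (de)serialization of the internal buffer is
omitted for brevity.

{code:java}
// Hypothetical sketch only; names and details are not taken from the patch.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SpillableList {

  /** Callback used to iterate over the stored pairs. */
  public interface PairCallback {
    void handle(Text key, Text value) throws IOException;
  }

  private final List<Text> keys = new ArrayList<Text>();
  private final List<Text> values = new ArrayList<Text>();
  private final int spillThreshold;         // max pairs kept in memory
  private final FileSystem fs;
  private final Configuration conf;
  private final Path spillPath;             // location of the external file
  private SequenceFile.Writer spillWriter;  // non-null once we have spilled

  public SpillableList(Configuration conf, Path spillPath, int spillThreshold)
      throws IOException {
    this.conf = conf;
    this.fs = FileSystem.get(conf);
    this.spillPath = spillPath;
    this.spillThreshold = spillThreshold;
  }

  /** Append a pair; spill everything to the SequenceFile once the buffer is full. */
  public void add(Text key, Text value) throws IOException {
    if (spillWriter == null && keys.size() < spillThreshold) {
      keys.add(key);
      values.add(value);
      return;
    }
    if (spillWriter == null) {
      // Threshold exceeded: open the external SequenceFile and drain the buffer.
      spillWriter = SequenceFile.createWriter(fs, conf, spillPath,
          Text.class, Text.class);
      for (int i = 0; i < keys.size(); i++) {
        spillWriter.append(keys.get(i), values.get(i));
      }
      keys.clear();
      values.clear();
    }
    spillWriter.append(key, value);
  }

  /** Iterate over all pairs, reading from memory or the spill file transparently. */
  public void forEach(PairCallback cb) throws IOException {
    if (spillWriter == null) {              // never spilled: read from memory
      for (int i = 0; i < keys.size(); i++) {
        cb.handle(keys.get(i), values.get(i));
      }
      return;
    }
    spillWriter.close();                    // simplification: iteration ends the write phase
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, spillPath, conf);
    try {
      Text key = new Text();
      Text value = new Text();
      while (reader.next(key, value)) {
        cb.handle(key, value);
      }
    } finally {
      reader.close();
    }
  }
}
{code}

A caller would append pairs with add(key, value) and later replay them with
forEach(...), without knowing whether the data still lives in the in-memory
buffer or has been spilled to the external file; per the description, the
actual patch additionally lets the buffered portion be (de)serialized like
any other Writable.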