[
https://issues.apache.org/jira/browse/HADOOP-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573133#action_12573133
]
Hadoop QA commented on HADOOP-2853:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376611/sequenceWritable-v5.patch
against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 4 new or modified tests.
javadoc -1. The javadoc tool appears to have generated 1 warning message.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit +1. The applied patch does not generate any new release audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1854/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1854/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1854/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1854/console
This message is automatically generated.
> Add Writable for very large lists of key / value pairs
> ------------------------------------------------------
>
> Key: HADOOP-2853
> URL: https://issues.apache.org/jira/browse/HADOOP-2853
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Affects Versions: 0.17.0
> Reporter: Andrzej Bialecki
> Fix For: 0.17.0
>
> Attachments: sequenceWritable-v1.patch, sequenceWritable-v2.patch,
> sequenceWritable-v3.patch, sequenceWritable-v4.patch,
> sequenceWritable-v5.patch
>
>
> Some map-reduce jobs need to aggregate and process very long lists as a
> single value. This usually happens when keys from a large domain are mapped
> into a small domain, and their associated values cannot be aggregated into
> a few values but must be preserved as members of a large list. Currently
> this can be implemented with a MapWritable or ArrayWritable; however, Hadoop
> needs to deserialize the current key and value completely into memory, which
> for extremely large values causes frequent OOM exceptions. In practice this
> approach works only for relatively small lists (e.g. around 1000 records).
> This patch implements a Writable that can handle arbitrarily long lists.
> Initially it keeps the data in an internal buffer (which can be
> (de)serialized in the ordinary way); if the list size exceeds a certain
> threshold, the data is spilled to an external SequenceFile (hence the name)
> on a configured FileSystem. The content of this Writable can be iterated,
> and the data is pulled either from the internal buffer or from the external
> file transparently.
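> As a minimal sketch of the spill-on-threshold idea described above (the
> class, method, and field names here are illustrative assumptions, not the
> actual API of the attached patch):
>
>   import java.io.IOException;
>   import java.util.ArrayList;
>   import java.util.List;
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.io.SequenceFile;
>   import org.apache.hadoop.io.Text;
>
>   /**
>    * Illustrative sketch only: buffer key/value pairs in memory and
>    * spill them to an external SequenceFile once a size threshold is
>    * crossed.
>    */
>   public class SpillingPairList {
>     private final List<Text> keys = new ArrayList<Text>();
>     private final List<Text> vals = new ArrayList<Text>();
>     private final Configuration conf;
>     private final FileSystem fs;
>     private final Path spillPath;      // file on the configured FileSystem
>     private final int threshold;       // max pairs held in memory
>     private SequenceFile.Writer spill; // null until the first spill
>
>     public SpillingPairList(Configuration conf, FileSystem fs,
>                             Path spillPath, int threshold) {
>       this.conf = conf;
>       this.fs = fs;
>       this.spillPath = spillPath;
>       this.threshold = threshold;
>     }
>
>     public void add(Text key, Text val) throws IOException {
>       if (spill == null && keys.size() < threshold) {
>         keys.add(key);                 // still below threshold: buffer
>         vals.add(val);
>         return;
>       }
>       if (spill == null) {             // threshold crossed: open the file
>         spill = SequenceFile.createWriter(fs, conf, spillPath,
>                                           Text.class, Text.class);
>         for (int i = 0; i < keys.size(); i++) {
>           spill.append(keys.get(i), vals.get(i)); // drain the buffer
>         }
>         keys.clear();
>         vals.clear();
>       }
>       spill.append(key, val);          // further pairs go to the file
>     }
>
>     public void close() throws IOException {
>       if (spill != null) {
>         spill.close();
>       }
>     }
>   }
>
> Reading back would then iterate the in-memory lists when no spill occurred,
> or wrap a SequenceFile.Reader over spillPath otherwise, so the caller sees
> a single iterator either way.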
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.