[ https://issues.apache.org/jira/browse/HADOOP-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571049#action_12571049 ]
Hadoop QA commented on HADOOP-2853:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376122/sequenceWritable-v3.patch
against trunk revision 619744.

    @author +1.  The patch does not contain any @author tags.

    tests included +1.  The patch appears to include 4 new or modified tests.

    javadoc -1.  The javadoc tool appears to have generated 1 warning message.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warning.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1827/console

This message is automatically generated.

> Add Writable for very large lists of key / value pairs
> ------------------------------------------------------
>
>                 Key: HADOOP-2853
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2853
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>    Affects Versions: 0.17.0
>            Reporter: Andrzej Bialecki
>             Fix For: 0.17.0
>
>         Attachments: sequenceWritable-v1.patch, sequenceWritable-v2.patch,
>                      sequenceWritable-v3.patch
>
>
> Some map-reduce jobs need to aggregate and process very long lists as a
> single value. This usually happens when keys from a large domain are mapped
> into a small domain, and their associated values cannot be aggregated into a
> few values but must be preserved as members of a large list. Currently this
> can be implemented with a MapWritable or an ArrayWritable; however, Hadoop
> then needs to deserialize the current key and value completely into memory,
> which for extremely large values causes frequent OOM exceptions. This
> approach also works only for lists of relatively small size (e.g. 1000
> records).
> This patch is an implementation of a Writable that can handle arbitrarily
> long lists. Initially it keeps an internal buffer (which can be
> (de)serialized in the ordinary way), and if the list size exceeds a certain
> threshold it is spilled to an external SequenceFile (hence the name) on a
> configured FileSystem. The content of this Writable can be iterated, and the
> data is pulled either from the internal buffer or from the external file in
> a transparent way.
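
To make the buffer-then-spill behavior concrete, here is a minimal,
hypothetical Java sketch of the approach the description outlines: pairs are
buffered in memory, and once a threshold is crossed the buffer is drained to
an external SequenceFile, from which iteration then proceeds transparently.
The class and member names (SpillableList, spillThreshold, spillPath) and the
use of Text keys and values are illustrative assumptions, not code from the
attached patch; the Writable (de)serialization of the internal buffer is
omitted for brevity.

{code:java}
// Hypothetical sketch only; names and details are not taken from the patch.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SpillableList {

  /** Callback used to iterate over the stored pairs. */
  public interface PairCallback {
    void handle(Text key, Text value) throws IOException;
  }

  private final List<Text> keys = new ArrayList<Text>();
  private final List<Text> values = new ArrayList<Text>();
  private final int spillThreshold;         // max pairs kept in memory
  private final FileSystem fs;
  private final Configuration conf;
  private final Path spillPath;             // location of the external file
  private SequenceFile.Writer spillWriter;  // non-null once we have spilled

  public SpillableList(Configuration conf, Path spillPath, int spillThreshold)
      throws IOException {
    this.conf = conf;
    this.fs = FileSystem.get(conf);
    this.spillPath = spillPath;
    this.spillThreshold = spillThreshold;
  }

  /** Append a pair; spill everything to the SequenceFile once the buffer is full. */
  public void add(Text key, Text value) throws IOException {
    if (spillWriter == null && keys.size() < spillThreshold) {
      keys.add(key);
      values.add(value);
      return;
    }
    if (spillWriter == null) {
      // Threshold exceeded: open the external SequenceFile and drain the buffer.
      spillWriter = SequenceFile.createWriter(fs, conf, spillPath,
          Text.class, Text.class);
      for (int i = 0; i < keys.size(); i++) {
        spillWriter.append(keys.get(i), values.get(i));
      }
      keys.clear();
      values.clear();
    }
    spillWriter.append(key, value);
  }

  /** Iterate over all pairs, reading from memory or the spill file transparently. */
  public void forEach(PairCallback cb) throws IOException {
    if (spillWriter == null) {              // never spilled: read from memory
      for (int i = 0; i < keys.size(); i++) {
        cb.handle(keys.get(i), values.get(i));
      }
      return;
    }
    spillWriter.close();                    // simplification: iteration ends the write phase
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, spillPath, conf);
    try {
      Text key = new Text();
      Text value = new Text();
      while (reader.next(key, value)) {
        cb.handle(key, value);
      }
    } finally {
      reader.close();
    }
  }
}
{code}

A caller would append pairs with add(key, value) and later replay them with
forEach(...), without knowing whether the data still lives in the in-memory
buffer or has been spilled to the external file; per the description, the
actual patch additionally lets the buffered portion be (de)serialized like
any other Writable.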