Hello Owen,

> Keys and values can be large. They are certainly capped above by
> Java's 2GB limit on byte arrays. More practically, you will have
> problems running out of memory with keys or values of 100 MB. There is
> no restriction that a key/value pair fits in a single hdfs block, but
> performance would suffer. (In particular, the FileInputFormats split
> at block sized chunks, which means you will have maps that scan an
> entire block without processing anything.)

Thanks for the quick reply.

Could you perhaps elaborate on that 100 MB limit? Is it a consequence of the Java VM heap size? If so, could it be raised, for example, by setting mapred.child.java.opts to '-Xmx512m' so that each child task gets a 512 MB heap?

Regards,

Leon Mergen
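
P.S. To make the question concrete, here is a rough sketch of the arithmetic I have in mind. It is plain Java, not Hadoop code: the class HeapCheck, the helper fitsInHeap, and the number of in-memory copies of a record are all assumptions of mine for illustration, since I do not know how many copies of a key or value a task actually holds at once. Runtime.getRuntime().maxMemory() reports the heap ceiling the JVM was started with (roughly the -Xmx value).

```java
public class HeapCheck {
    // Hypothetical helper: true if `copies` in-memory copies of one record
    // fit within a heap budget of `heapBytes`. The number of copies a task
    // really holds is an assumption, not something taken from Hadoop.
    static boolean fitsInHeap(long recordBytes, int copies, long heapBytes) {
        return recordBytes * copies <= heapBytes;
    }

    public static void main(String[] args) {
        long record = 100L * 1024 * 1024;               // a 100 MB key or value
        long maxHeap = Runtime.getRuntime().maxMemory(); // ceiling set via -Xmx

        System.out.println("JVM max heap (bytes): " + maxHeap);
        System.out.println("1 copy fits:  " + fitsInHeap(record, 1, maxHeap));
        System.out.println("3 copies fit: " + fitsInHeap(record, 3, maxHeap));

        // Independent of heap size, a single byte[] cannot exceed
        // Integer.MAX_VALUE elements (the 2GB cap mentioned above).
        System.out.println("byte[] length cap: " + Integer.MAX_VALUE);
    }
}
```

Running this with 'java -Xmx512m HeapCheck' should show whether a few copies of a 100 MB record would fit under the heap size I proposed. It says nothing about what else the task keeps in memory.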