For hive.mapjoin.cache.numrows, I found this in hive/conf/hive-default.xml:
<property>
  <name>hive.mapjoin.cache.numrows</name>
  <value>25000</value>
  <description>How many rows should be cached by jdbm for map join. </description>
</property>

hive.mapjoin.size.key is missing from hive-default.xml; can you create a JIRA issue for that?

JVS

On Aug 19, 2010, at 1:07 AM, Ted Xu wrote:

Hi all,

I found two parameters that have something to do with map join, namely:

hive.mapjoin.cache.numrows
hive.mapjoin.size.key

I can't find any formal documentation on these two parameters. My guess is that "hive.mapjoin.cache.numrows" sets the maximum row count of the small table in a map join, and that rows beyond that limit are discarded. Once, when I used a map join with a table of 50,000+ rows, some records were not joined, and I fixed the problem by increasing "hive.mapjoin.cache.numrows". However, I sometimes still get an OOM exception even when "hive.mapjoin.cache.numrows" is not set (by default it is 25000, I believe).

Please explain the usage of these parameters if you know. Thanks.

--
Best Regards,
Ted Xu
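For reference, the workaround Ted describes (raising hive.mapjoin.cache.numrows) would presumably be applied as a site-level override in hive-site.xml, using the same <property> format as the hive-default.xml entry quoted above. This is a minimal sketch: the value 60000 is a hypothetical example chosen only to exceed the 50,000+ row table mentioned in the question, not a recommendation from this thread. A per-session override such as `set hive.mapjoin.cache.numrows=60000;` in the Hive CLI should have the same effect for a single session.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical override of the default (25000) from hive-default.xml.
       60000 is an illustrative value only; size it to your own small table. -->
  <property>
    <name>hive.mapjoin.cache.numrows</name>
    <value>60000</value>
  </property>
</configuration>
```

Whether a larger value also raises memory pressure on the mappers (and so relates to the OOM errors above) is not confirmed anywhere in this thread.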
