Thanks John, I'll create an issue for that.

PS: So in mapjoin, only the first 25000 rows of the small table are cached by default, am I right? If the small table has more than 25000 rows, will we silently lose part of the data, without any warning or exception?
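
For reference, the workaround I described earlier looks roughly like the sketch below. The table and column names (big_table, small_table, id, name) are made up for illustration, and 100000 is only an example value chosen to exceed the small table's row count. I have not confirmed that this is how the parameter is meant to be used.

```sql
-- Raise the map-join row cache above the default of 25000 (example value).
set hive.mapjoin.cache.numrows=100000;

-- Hypothetical tables: small_table (alias s) is the side loaded for the map join.
SELECT /*+ MAPJOIN(s) */ b.id, s.name
FROM big_table b
JOIN small_table s ON (b.id = s.id);
```
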
On August 20, 2010 at 4:56 AM, John Sichi <[email protected]> wrote:

> For hive.mapjoin.cache.numrows, I found this in hive/conf/hive-default.xml:
>
> <property>
>   <name>hive.mapjoin.cache.numrows</name>
>   <value>25000</value>
>   <description>How many rows should be cached by jdbm for map join.
>   </description>
> </property>
>
> hive.mapjoin.size is missing from hive-default.xml; can you create a JIRA
> issue for that?
>
> JVS
>
> On Aug 19, 2010, at 1:07 AM, Ted Xu wrote:
>
> Hi all,
>
> I found 2 parameters which have something to do with mapjoin, that is :
>
> hive.mapjoin.cache.numrows
> hive.mapjoin.size.key
>
> I can't find any formal document on that 2 parameters.
>
> I guess "hive.mapjoin.cache.numrows" sets the maximum row count of the
> small table in map join, and rows more than that setting will be disposed.
> Once I use map join with a 50000+ rows table, some records can't be joined,
> and I fixed the problem by increasing "hive.mapjoin.cache.numrows".
>
> However, sometimes I still get OOM exception even if the
> "hive.mapjoin.cache.numrows" parameter is not set (by default, 25000 I
> guess).
>
> Please explain me the usage of the parameters if you know, thanks.
>
> --
> Best Regards,
> Ted Xu

--
Best Regards,
Ted Xu
