Re: Mapjoin parameters?

Ted Xu Thu, 19 Aug 2010 18:45:02 -0700

Thanks John, I'll create an issue for that.

PS: So in mapjoin, only the first 25000 rows of the small table are cached
by default, am I right? If the small table has more than 25000 rows, will
we lose part of the data without any warning or exception?
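
A minimal sketch of the workaround described in the quoted message below, raising hive.mapjoin.cache.numrows for the session before running the map join. The table names (big_table, small_table), the join key, and the value 100000 are hypothetical placeholders, not values from the actual job:

```sql
-- Raise the cache row count for this session only (100000 is an arbitrary example value).
SET hive.mapjoin.cache.numrows=100000;

-- Map join with the small table named in the MAPJOIN hint.
-- big_table, small_table, key and val are placeholder names.
SELECT /*+ MAPJOIN(s) */ b.key, b.val, s.val
FROM big_table b
JOIN small_table s ON (b.key = s.key);
```

Running SET with only the parameter name (SET hive.mapjoin.cache.numrows;) prints the value currently in effect, which is a quick way to check the default.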


On Aug 20, 2010, at 4:56 AM, John Sichi <[email protected]> wrote:

> For hive.mapjoin.cache.numrows, I found this in hive/conf/hive-default.xml:
>
> <property>
>   <name>hive.mapjoin.cache.numrows</name>
>   <value>25000</value>
>   <description>How many rows should be cached by jdbm for map join.
> </description>
> </property>
>
> hive.mapjoin.size is missing from hive-default.xml; can you create a JIRA
> issue for that?
>
> JVS
>
> On Aug 19, 2010, at 1:07 AM, Ted Xu wrote:
>
> Hi all,
>
> I found two parameters related to mapjoin:
>
> hive.mapjoin.cache.numrows
> hive.mapjoin.size.key
>
> I can't find any formal documentation on these two parameters.
>
> I guess "hive.mapjoin.cache.numrows" sets the maximum row count of the
> small table in a map join, and rows beyond that limit are discarded.
> Once, when I used a map join with a table of 50000+ rows, some records
> were not joined, and I fixed the problem by increasing
> "hive.mapjoin.cache.numrows".
>
> However, I sometimes still get an OOM exception even when the
> "hive.mapjoin.cache.numrows" parameter is not set (so it defaults to
> 25000, I guess).
>
> Please explain the usage of these parameters if you know it. Thanks.
>
> --
> Best Regards,
> Ted Xu
>
>
>


-- 
Best Regards,
Ted Xu
