Hi,

Thank you for the report!
Can you open a JIRA for this issue? It sounds like a bug.

Brock

On Fri, Nov 1, 2013 at 2:23 AM, Mehant Baid <baid.meh...@gmail.com> wrote:
> Hey Folks,
>
> Could you please take a look at the problem below? We are hitting
> OutOfMemoryErrors while joining tables that are not managed by Hive.
>
> Would appreciate any feedback.
>
> Thanks
> Mehant
>
> On 10/7/13 12:04 PM, Mehant Baid wrote:
>
>> Hey Folks,
>>
>> We are using Hive 0.11 and are hitting java.lang.OutOfMemoryError. The
>> problem seems to be in CommonJoinResolver.java (processCurrentTask()).
>> In this function we try to convert a map-reduce join into a map join if
>> 'n-1' of the tables involved in an 'n'-way join have a size below a
>> certain threshold.
>>
>> If the tables are managed by Hive we have accurate sizes for each table
>> and can apply this optimization, but if the tables are created using
>> storage handlers (HBaseStorageHandler in our case) the size is reported
>> as zero. Because of this we assume the optimization applies and convert
>> the map-reduce join to a map join, building an in-memory hash table for
>> all the keys. Since the table created via the storage handler is large,
>> it does not fit in memory and we hit the error.
>>
>> Should I open a JIRA for this? One way to fix it is to set the size of
>> the table created via a storage handler equal to the map-join threshold.
>> That way the table would be selected as the big table, and we could
>> still proceed with the optimization if the other tables in the join have
>> sizes below the threshold. If there were multiple such big tables, the
>> optimization would simply be turned off.
>>
>> Thanks
>> Mehant

--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
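
[Editorial addendum] To make the failure mode described above concrete, here is a minimal, self-contained Java sketch of the size-based selection logic and of the fix Mehant proposes. It is NOT the actual Hive source: JoinTable, pickBigTable, othersFitBelowThreshold, and withEffectiveSize are made-up names for illustration, and the real conversion lives in CommonJoinResolver.processCurrentTask(). The sketch shows why a reported size of zero keeps a large storage-handler table from ever being chosen as the big (streamed) table, and how treating an unknown size as equal to the map-join threshold forces it to be picked as the big table instead of being hashed into memory.

import java.util.Arrays;
import java.util.List;

public class MapJoinSizeSketch {

    // A table in the join and its size in bytes. Storage-handler tables
    // (e.g. via HBaseStorageHandler) report 0 because Hive has no stats.
    static class JoinTable {
        final String name;
        final long sizeBytes;
        JoinTable(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    // The big (streamed) table is the one with the largest reported size;
    // the others are hashed into memory. A reported size of 0 means a large
    // storage-handler table is never chosen as big -- the bug.
    static int pickBigTable(List<JoinTable> tables) {
        int big = 0;
        for (int i = 1; i < tables.size(); i++) {
            if (tables.get(i).sizeBytes > tables.get(big).sizeBytes) {
                big = i;
            }
        }
        return big;
    }

    // The map join is only safe if every hashed table is below the threshold.
    static boolean othersFitBelowThreshold(List<JoinTable> tables, int big,
                                           long threshold) {
        for (int i = 0; i < tables.size(); i++) {
            if (i != big && tables.get(i).sizeBytes >= threshold) {
                return false;
            }
        }
        return true;
    }

    // The fix proposed in the thread: treat a reported size of 0 as equal to
    // the map-join threshold, so the storage-handler table is selected as the
    // big table rather than hashed into memory.
    static JoinTable withEffectiveSize(JoinTable t, long threshold) {
        return t.sizeBytes == 0 ? new JoinTable(t.name, threshold) : t;
    }

    public static void main(String[] args) {
        long threshold = 25L * 1024 * 1024; // e.g. hive.mapjoin.smalltable.filesize
        List<JoinTable> tables = Arrays.asList(
                new JoinTable("dim_small", 1L * 1024 * 1024), // managed by Hive
                new JoinTable("hbase_large", 0));             // storage handler, size unknown

        // Buggy behaviour: dim_small (1 MB > 0 bytes) is picked as the big
        // table, so the large HBase table gets hashed into memory -> OOM.
        int big = pickBigTable(tables);
        System.out.println("buggy big table: " + tables.get(big).name);

        // With the fix, hbase_large gets an effective size equal to the
        // threshold and is picked as the big table; dim_small is safely hashed.
        List<JoinTable> fixed = Arrays.asList(
                withEffectiveSize(tables.get(0), threshold),
                withEffectiveSize(tables.get(1), threshold));
        int fixedBig = pickBigTable(fixed);
        System.out.println("fixed big table: " + fixed.get(fixedBig).name
                + ", others fit: "
                + othersFitBelowThreshold(fixed, fixedBig, threshold));
    }
}

Running the sketch prints "dim_small" as the big table before the fix (so the HBase table would be hashed) and "hbase_large" after it, which matches the behaviour and remedy described in the thread.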