RE: Hive local mode on by default for 0.6.0

Joydeep Sen Sarma Mon, 09 Aug 2010 18:30:28 -0700

We enabled a feature called 'auto-local mode' (HIVE-1408). The query processor
looks at the size of the input and decides dynamically whether a job can be
executed in local mode. For a multi-job query, this decision is made separately
for each job.
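A rough sketch of the per-job decision described above, for readers who want to see the shape of it. This is not the HIVE-1408 code. The `Job` record, the `can_run_locally`/`plan` helpers and the `MAX_INPUT_BYTES`/`MAX_MAP_TASKS` thresholds are hypothetical placeholders. Hive's real thresholds and their defaults are not stated in this thread. The point is only that each job in a multi-job query is judged on its own input, so one query can mix local and cluster stages.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds (assumption): Hive has its own settings for these,
# whose names and defaults are not given in this thread.
MAX_INPUT_BYTES = 128 * 1024 * 1024
MAX_MAP_TASKS = 4


@dataclass
class Job:
    """Hypothetical description of one map-reduce job in a compiled query."""
    name: str
    input_bytes: int
    num_map_tasks: int
    num_reduce_tasks: int


def can_run_locally(job: Job, auto_local: bool = True) -> bool:
    """Decide for a single job whether local-mode execution looks safe."""
    if not auto_local:  # roughly: set hive.exec.mode.local.auto=false
        return False
    return (job.input_bytes < MAX_INPUT_BYTES
            and job.num_map_tasks < MAX_MAP_TASKS
            and job.num_reduce_tasks <= 1)


def plan(jobs: List[Job], auto_local: bool = True) -> List[str]:
    """Per-job decision for a multi-job query: each stage is judged on its own input."""
    return ["local" if can_run_locally(j, auto_local) else "cluster" for j in jobs]
```

For example, a query that first scans a small table and then joins against a large one could run its first job locally and its second on the cluster. Switching the option off sends every job to the cluster.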


We enabled it by default in trunk so that it gets some coverage. Local mode
support in 0.6 has some bugs (in fact, a big part of this JIRA was a
comprehensive test for local mode and small fixes for the bugs it
uncovered). The relevant option is:

set hive.exec.mode.local.auto=<true/false>


I have been a little worried about enabling this by default; we can turn it
off if required. The case that worries me most is a lot of users referring
to scripts (via TRANSFORM clauses) that are available only on the cluster nodes
and not on the client node. Another assumption is that mapred.local.dir is set
to a value that is valid on the client side (which may not be the case if the
same Hadoop config is shared across the client and server sides).

I promise to add some documentation about this to the wiki ASAP.

-----Original Message-----
From: Edward Capriolo [mailto:[email protected]] 
Sent: Monday, August 09, 2010 2:22 PM
To: <[email protected]>
Subject: Hive local mode on by default for 0.6.0

I already caught someone on IRC who was very surprised by the local
mode in Hive trunk. Is local mode on by default?

Do you think the 0.6.0 release should have this on by default? There
have been a few issues like HIVE-1520, and it seems like letting this
out in the wild without users actively turning it on might surface edge
cases and complications.

Regards,
Edward
