Skew Join Optimization in Hive

Shantian Purkad Tue, 07 Jun 2011 12:36:39 -0700

Hi,

I have a query that joins 12 different tables (most of them left outer joins), 
and it takes almost 3 hours. 90% of that time is spent in a single reducer, 
which receives the bulk of the data.


How can I get around this and get a fair distribution of data across all 
reducers? I tried enabling the skew join optimization, but I get the 
NullPointerException below after the first step of the job has executed.
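
For reference, skew join is normally switched on with session settings along the lines below. This is only a sketch, not the exact commands from the failing session: the threshold shown is Hive's documented default, and the table and column names in the commented-out query are hypothetical placeholders.

```sql
-- Sketch only: standard Hive session settings, not copied from the failing run.
SET hive.optimize.skewjoin=true;   -- enable the runtime skew join optimization
SET hive.skewjoin.key=100000;      -- rows per join key above which the key is treated as skewed (Hive default)

-- The multi-table join would then be run in the same session, for example:
-- SELECT ... FROM fact_table f LEFT OUTER JOIN dim_table d ON (f.key = d.key) ...;
-- (fact_table, dim_table and key are hypothetical placeholders)
```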

Any suggestions or ideas would be of great help.

Thanks,
Shantian

2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
Ended Job = job_201106071542_0010
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask
hive>
