Hi, I have a query that joins 12 different tables (most of them left outer joins), and it takes almost 3 hours. 90% of the time is taken by a single reducer, which is getting the bulk of the data to process.
How can I get around this and get a fair distribution of data across all reducers? I tried enabling the skewjoin optimization, but I get the NPE below after the first step of the job is executed. Any suggestions or ideas would be of great help.

Thanks,
Shantian

2011-06-07 19:22:28,923 Stage-11 map = 100%, reduce = 85%
2011-06-07 19:22:30,932 Stage-11 map = 100%, reduce = 100%
Ended Job = job_201106071542_0010
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask
hive>
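
For context, the skewjoin optimization mentioned above is normally switched on per session with settings along the following lines. This is only a sketch: the exact settings and values used in the failing run are not stated in this message, and the threshold shown is simply Hive's documented default, not a tuned value.

```sql
-- Assumed session settings (not taken from the failing run).
-- Turn on runtime skew join handling.
set hive.optimize.skewjoin=true;
-- A join key with more rows than this threshold is treated as skewed
-- and handled in a follow-up map join (100000 is the default).
set hive.skewjoin.key=100000;
```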