[
https://issues.apache.org/jira/browse/HIVE-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321789#comment-16321789
]
Hengyu Dai commented on HIVE-18441:
-----------------------------------
stack trace:
{code:java}
java.lang.NullPointerException
at org.apache.hadoop.hive.shims.Hadoop23Shims$1.listStatus(Hadoop23Shims.java:134)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:319)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:425)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:532)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:518)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:416)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
{code}
> NullPointerException due to NullScanOptimizer doesn't compatible with Hadoop
> 2.2
> --------------------------------------------------------------------------------
>
> Key: HIVE-18441
> URL: https://issues.apache.org/jira/browse/HIVE-18441
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 2.1.1, 2.2.0, 2.3.0
> Reporter: Hengyu Dai
> Attachments: HIVE-18441.patch, hadoop2.2.jpg, hadoop2.9.jpg
>
>
> Hive 2.x is not compatible with Hadoop 2.2 when NullScanOptimizer is applied
> (the same problem may exist in other Hadoop versions too).
> Here is the listStatus() method in Hadoop23Shims.java:
> {code:java}
> protected List<FileStatus> listStatus(JobContext job) throws IOException {
>   List<FileStatus> result = super.listStatus(job);
>   Iterator<FileStatus> it = result.iterator();
>   while (it.hasNext()) {
>     FileStatus stat = it.next();
>     if (!stat.isFile() || (stat.getLen() == 0 &&
>         !stat.getPath().toUri().getScheme().equals("nullscan"))) {
>       it.remove();
>     }
>   }
>   return result;
> }
> {code}
> The first line, "super.listStatus(job)", returns different FileStatus objects
> on Hadoop 2.2 and Hadoop 2.9.
> I tested Hive 2.1 with Hadoop 2.2 and Hive 2.1 with Hadoop 2.9; the NPE occurs
> only in Hive 2.1 with Hadoop 2.2.
> My test SQL is:
> {code:sql}
> select * from (select key from src where false) a left outer join (select key
> from srcpart limit 0) b on a.key=b.key;
> {code}
> It comes from optimize_nullscan.q; the tables src and srcpart used in the SQL
> are created by q_test_init.sql.
> The problem is that in Hadoop 2.2, super.listStatus(job) returns a FileStatus
> object whose "Path" field does not contain a scheme for a "nullscan" path, so
> "stat.getPath().toUri().getScheme()" in the if statement returns null, and
> calling equals("nullscan") on that null value throws an NPE.
> In contrast, in Hadoop 2.9, super.listStatus(job) returns a valid Path whose
> scheme is "nullscan".
> Debug screenshots from Hadoop 2.2 and Hadoop 2.9 are attached. They show that
> the result list returned by super.listStatus(job) differs: Hadoop 2.2 gets
> "/default.srcpart/part..." and Hadoop 2.9 gets
> "nullscan://null/default.srcpart/part..."
> (This bug does not occur with normal paths like "hdfs://...".)
> We should handle the case where stat.getPath().toUri().getScheme() returns
> null.
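
One way to tolerate a null scheme is to put the string constant on the left of equals(), so a missing scheme simply compares as "not nullscan" instead of throwing. The following is only a minimal sketch under assumptions, not the attached patch: the class NullSafeSchemeCheck and its methods isNullScan and shouldDrop are hypothetical names, java.net.URI stands in for the result of stat.getPath().toUri(), and the example paths are illustrative.

```java
import java.net.URI;

// Hypothetical helper, not part of Hive: illustrates a null-safe scheme check.
final class NullSafeSchemeCheck {

    // Constant-first equals() returns false for a null scheme instead of throwing.
    static boolean isNullScan(URI uri) {
        return "nullscan".equals(uri.getScheme());
    }

    // Mirrors the filter condition in listStatus(): drop entries that are not
    // files, and zero-length files unless they are nullscan paths.
    static boolean shouldDrop(boolean isFile, long length, URI uri) {
        return !isFile || (length == 0 && !isNullScan(uri));
    }

    public static void main(String[] args) {
        // Illustrative paths only; the real paths in the report are elided.
        URI noScheme = URI.create("/default.srcpart/example");
        URI nullScan = URI.create("nullscan://null/default.srcpart/example");

        System.out.println("scheme of scheme-less path: " + noScheme.getScheme());
        System.out.println("isNullScan(noScheme) = " + isNullScan(noScheme));
        System.out.println("isNullScan(nullScan) = " + isNullScan(nullScan));
        System.out.println("shouldDrop(noScheme) = " + shouldDrop(true, 0, noScheme));
        System.out.println("shouldDrop(nullScan) = " + shouldDrop(true, 0, nullScan));
    }
}
```

With this check the NPE goes away, but a zero-length entry whose path has no scheme is filtered out like any other empty file. Whether that is the desired outcome for null-scan inputs on Hadoop 2.2 is a separate question that this sketch does not answer.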