Hengyu Dai created HIVE-18441:
---------------------------------

             Summary: NullPointerException due to NullScanOptimizer doesn't compatible with Hadoop 2.2
                 Key: HIVE-18441
                 URL: https://issues.apache.org/jira/browse/HIVE-18441
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 2.3.0, 2.2.0, 2.1.1
            Reporter: Hengyu Dai

Hive 2.x is not compatible with Hadoop 2.2 when NullScanOptimizer is applied (other Hadoop versions may have the same problem).

Here is the listStatus() method in Hadoop23Shims.java:

{code:java}
protected List<FileStatus> listStatus(JobContext job) throws IOException {
  List<FileStatus> result = super.listStatus(job);
  Iterator<FileStatus> it = result.iterator();
  while (it.hasNext()) {
    FileStatus stat = it.next();
    if (!stat.isFile() || (stat.getLen() == 0 && !stat.getPath().toUri().getScheme().equals("nullscan"))) {
      it.remove();
    }
  }
  return result;
}
{code}

The first line, super.listStatus(job), returns different FileStatus objects on Hadoop 2.2 and Hadoop 2.9. I tested Hive 2.1 with Hadoop 2.2 and Hive 2.1 with Hadoop 2.9; the NPE occurs only in Hive 2.1 with Hadoop 2.2.

My test SQL is:

{code:sql}
select * from (select key from src where false) a left outer join (select key from srcpart limit 0) b on a.key=b.key;
{code}

It is taken from optimize_nullscan.q; the tables src and srcpart used in the SQL are created by q_test_init.sql.

The problem is that in Hadoop 2.2, super.listStatus(job) returns a FileStatus object whose Path has no scheme for a "nullscan" path. As a result, stat.getPath().toUri().getScheme() in the if statement returns null, and calling equals("nullscan") on that null value throws a NullPointerException. In contrast, in Hadoop 2.9 super.listStatus(job) returns a valid Path whose scheme is "nullscan".

Debug screenshots from Hadoop 2.2 and Hadoop 2.9 are attached. They show that the result lists returned by super.listStatus(job) differ: Hadoop 2.2 gets "/default.srcpart/part..." and Hadoop 2.9 gets "nullscan://null/default.srcpart/part...". (This bug does not occur with normal paths such as "hdfs://...".)

We should handle the case where stat.getPath().toUri().getScheme() returns null.
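
One possible way to tolerate a null scheme is to put the string constant on the left of the comparison, since "nullscan".equals(null) simply returns false. The following is only a minimal sketch of that idea and not the actual Hive patch: it uses java.net.URI in place of Hadoop's Path and FileStatus, and the class NullScanSchemeCheck, the helpers isNullScan and shouldRemove, and the shortened sample paths are all hypothetical names introduced for illustration.

```java
import java.net.URI;

// Hypothetical stand-in for the filter condition in Hadoop23Shims.listStatus().
class NullScanSchemeCheck {

    // Null-safe scheme test: the constant is the receiver, so a null
    // scheme yields false instead of a NullPointerException.
    static boolean isNullScan(URI uri) {
        return "nullscan".equals(uri.getScheme());
    }

    // Same shape as the original condition: drop entries that are not files,
    // and drop empty files unless they live under the "nullscan" scheme.
    static boolean shouldRemove(boolean isFile, long len, URI uri) {
        return !isFile || (len == 0 && !isNullScan(uri));
    }

    public static void main(String[] args) {
        // Illustrative paths only; the real ones in the report are elided.
        URI withScheme = URI.create("nullscan://null/default.srcpart/part");
        URI withoutScheme = URI.create("/default.srcpart/part");

        System.out.println(isNullScan(withScheme));
        System.out.println(isNullScan(withoutScheme));
        System.out.println(shouldRemove(true, 0L, withoutScheme));
    }
}
```

Note that this only removes the NullPointerException. With such a check, a zero-length entry whose path has lost its scheme (as observed on Hadoop 2.2) would be filtered out like any other empty file, so whether the query result is still correct on Hadoop 2.2 would need to be verified separately.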