Hengyu Dai created HIVE-18441:
---------------------------------

             Summary: NullPointerException due to NullScanOptimizer doesn't 
compatible with Hadoop 2.2
                 Key: HIVE-18441
                 URL: https://issues.apache.org/jira/browse/HIVE-18441
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 2.3.0, 2.2.0, 2.1.1
            Reporter: Hengyu Dai


Hive 2.x is not compatible with Hadoop 2.2 when NullScanOptimizer is applied 
(other Hadoop versions may have the same problem).
Here is the listStatus() method in Hadoop23Shims.java:

{code:java}
protected List<FileStatus> listStatus(JobContext job) throws IOException {
  List<FileStatus> result = super.listStatus(job);
  Iterator<FileStatus> it = result.iterator();
  while (it.hasNext()) {
    FileStatus stat = it.next();
    if (!stat.isFile() || (stat.getLen() == 0
        && !stat.getPath().toUri().getScheme().equals("nullscan"))) {
      it.remove();
    }
  }
  return result;
}
{code}
The first line, super.listStatus(job), returns different FileStatus objects 
under Hadoop 2.2 and Hadoop 2.9.

I tested Hive 2.1 with Hadoop 2.2 and Hive 2.1 with Hadoop 2.9; the NPE occurs 
in Hive 2.1 with Hadoop 2.2.
My test SQL is:
{code:sql}
select * from (select key from src where false) a
left outer join (select key from srcpart limit 0) b on a.key=b.key;
{code}
It comes from optimize_nullscan.q; the tables src and srcpart used in the SQL 
are created by q_test_init.sql.

The problem is that in Hadoop 2.2, super.listStatus(job) returns a FileStatus 
object whose Path has no scheme for a "nullscan" path. As a result, 
stat.getPath().toUri().getScheme() in the if statement returns null, and 
calling equals("nullscan") on that null reference throws an NPE.
In contrast, in Hadoop 2.9, super.listStatus(job) returns a valid Path whose 
scheme is "nullscan".

Debugger screenshots from Hadoop 2.2 and Hadoop 2.9 are attached. They show 
that the list returned by super.listStatus(job) differs: Hadoop 2.2 gets 
"/default.srcpart/part..." while Hadoop 2.9 gets 
"nullscan://null/default.srcpart/part...".
(This bug does not occur with normal paths such as "hdfs://...".)
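
The difference can be reproduced without Hadoop, since Path.toUri() returns a 
java.net.URI. The following is only a minimal sketch: the class name, the 
helper schemeOf, and the ".../example" paths are hypothetical stand-ins for 
the truncated paths above, not values taken from the debugger.

```java
import java.net.URI;

class SchemeDemo {
    // Returns the URI scheme of the given path string, or null if it has none.
    static String schemeOf(String path) {
        return URI.create(path).getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical paths shaped like the two cases described in this report.
        String withScheme = "nullscan://null/default.srcpart/example";
        String withoutScheme = "/default.srcpart/example";

        System.out.println("with scheme:    " + schemeOf(withScheme));
        System.out.println("without scheme: " + schemeOf(withoutScheme));

        try {
            // Same call shape as the if statement in listStatus().
            boolean unused = schemeOf(withoutScheme).equals("nullscan");
            System.out.println("no exception, result = " + unused);
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the Hadoop 2.2 case");
        }
    }
}
```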

We should handle the case where stat.getPath().toUri().getScheme() returns null.
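
One possible way to do that, shown here only as a sketch and not as the actual 
patch, is to put the constant on the left-hand side of equals() so that a null 
scheme is simply treated as "not nullscan". The class and the helpers 
isNullScan and shouldRemove are hypothetical; they mirror the condition in 
listStatus() using plain java.net.URI values instead of FileStatus.

```java
import java.net.URI;

class NullSafeSchemeCheck {
    // Null-safe: "nullscan".equals(null) is false instead of throwing.
    static boolean isNullScan(URI uri) {
        return "nullscan".equals(uri.getScheme());
    }

    // Mirrors the filter in listStatus(): drop entries that are not files,
    // or that are empty and do not use the nullscan scheme.
    static boolean shouldRemove(boolean isFile, long len, URI uri) {
        return !isFile || (len == 0 && !isNullScan(uri));
    }

    public static void main(String[] args) {
        URI noScheme = URI.create("/default.srcpart/example");           // hypothetical
        URI nullScan = URI.create("nullscan://null/default.srcpart/example"); // hypothetical
        System.out.println(shouldRemove(true, 0, noScheme));
        System.out.println(shouldRemove(true, 0, nullScan));
    }
}
```

This only avoids the NPE. With a check like this, an empty scheme-less entry 
as returned on Hadoop 2.2 would be filtered out, so whether such entries should 
instead be kept for the null scan is a separate question that this sketch does 
not settle.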
