Redis Liu created HIVE-22936:
--------------------------------

             Summary: NPE in SymbolicInputFormat
                 Key: HIVE-22936
                 URL: https://issues.apache.org/jira/browse/HIVE-22936
             Project: Hive
          Issue Type: Bug
          Components: File Formats
    Affects Versions: 3.1.2
            Reporter: Redis Liu
         Attachments: npe-symbolic-inputformat.patch

h2. Symptom

I was running Hive over an AWS S3 Inventory report, which uses SymlinkTextInputFormat. Each line of the symlink file is the fully qualified S3 URL of a data file, for example:

{code:java}
s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE
s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD
{code}

With the following setting:

{code:java}
set hive.rework.mapredwork=true;
{code}

the job fails with a *NullPointerException* and no stack trace.

h2. Cause

A symlink file may contain arbitrary fully qualified file system paths, but SymbolicInputFormat uses the default FS instance to get the status of the data files. That lookup fails (and returns null) when the scheme of a data file differs from the scheme of Hive's default FS.

Code point: [https://github.com/apache/hive/blob/cfc12f05f0c034f9aad149960e58d40902e0dcfe/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java#L78]

{code:java}
// "fileSystem" may not be able to list status for the given file URI.
FileStatus[] matches = fileSystem.globStatus(new Path(line));
{code}

h2. Fix

Please see the attached npe-symbolic-inputformat.patch.
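
To illustrate the cause described above, here is a minimal, dependency-free sketch of the scheme mismatch. It is not the attached patch, whose contents are not reproduced in this message. The class name SymlinkTargets and both method names are hypothetical, and "hdfs" as the default scheme is only an assumption for the example. In Hadoop itself, Path.getFileSystem(Configuration) resolves a file system from a path's own scheme, so resolving the file system per symlink line in that way, rather than reusing the default instance, is one plausible direction for a fix.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hive or Hadoop.
final class SymlinkTargets {

    // Returns the scheme that should serve a symlink target: the target's own
    // scheme when the line is fully qualified, otherwise the default scheme.
    static String schemeOf(String target, String defaultScheme) {
        String scheme = URI.create(target).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    // Collects the symlink lines whose scheme differs from the default one.
    // These are the lines a default-FS lookup would be unable to resolve.
    static List<String> foreignTargets(List<String> lines, String defaultScheme) {
        List<String> foreign = new ArrayList<>();
        for (String line : lines) {
            if (!schemeOf(line, defaultScheme).equalsIgnoreCase(defaultScheme)) {
                foreign.add(line);
            }
        }
        return foreign;
    }

    public static void main(String[] args) {
        // The two S3 lines come from the report; the unqualified path is made up.
        List<String> lines = Arrays.asList(
                "s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE",
                "s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD",
                "/warehouse/example/part-00000");
        // "hdfs" is an assumed default scheme for this example.
        for (String line : foreignTargets(lines, "hdfs")) {
            System.out.println("needs its own file system: " + line);
        }
    }
}
```

A check of this kind only detects the mismatch. Any real fix would also have to handle a null result from globStatus, since the report states that the failing lookup returns null.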