[
https://issues.apache.org/jira/browse/HIVE-22936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-22936:
----------------------------------
Labels: pull-request-available (was: )
> NPE in SymbolicInputFormat
> --------------------------
>
> Key: HIVE-22936
> URL: https://issues.apache.org/jira/browse/HIVE-22936
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 3.1.2
> Reporter: Redis Liu
> Priority: Major
> Labels: pull-request-available
> Attachments: npe-symbolic-inputformat.patch
>
>
> h2. Symptom
> I was running Hive over an AWS S3 Inventory report, which uses
> SymlinkTextInputFormat. Each line of the symlink file is the fully qualified
> S3 URL of a data file, for example:
> {code:java}
> s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE
> s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD{code}
> When I have the following setting:
> {code:java}
> set hive.rework.mapredwork=true;
> {code}
> The job fails with a *NullPointerException* and no stack trace.
> h2. Cause
> The symlink file may contain arbitrary fully qualified FS paths, but
> SymbolicInputFormat uses the default FS instance to get the status of the
> data files. That lookup fails (and returns null) when the scheme of a data
> file differs from that of Hive's default FS.
> Code point:
> [https://github.com/apache/hive/blob/cfc12f05f0c034f9aad149960e58d40902e0dcfe/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java#L78]
> {code:java}
> // "fileSystem" may not be able to list status for the given file URI.
> FileStatus[] matches = fileSystem.globStatus(new Path(line));{code}
> h2. Fix
> Please see the attached npe-symbolic-inputformat.patch.
>
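The scheme mismatch described under Cause can be illustrated with a small, standard-library-only sketch. It is not the attached patch and does not use Hadoop classes: the class name `SymlinkSchemeCheck`, the method `needsOwnFileSystem`, and the `hdfs://namenode:8020` default FS URI are all hypothetical names chosen for illustration. It only shows how a symlink line such as the S3 URLs above can be recognized as belonging to a filesystem other than the default one. In Hadoop itself, the usual way to obtain the filesystem that owns a path is `Path.getFileSystem(Configuration)`, and the result of `globStatus` would presumably also need a null check.

```java
import java.net.URI;

public class SymlinkSchemeCheck {

    // Returns true when the symlink target names a scheme that differs from
    // the default filesystem's scheme, i.e. the default FS instance is the
    // wrong one to ask for this file's status.
    static boolean needsOwnFileSystem(String target, URI defaultFs) {
        String scheme = URI.create(target.trim()).getScheme();
        // A path without a scheme is resolved against the default filesystem.
        return scheme != null && !scheme.equalsIgnoreCase(defaultFs.getScheme());
    }

    public static void main(String[] args) {
        // Hypothetical default FS; a real job would read it from its configuration.
        URI defaultFs = URI.create("hdfs://namenode:8020");

        String[] symlinkLines = {
            "s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE",
            "/warehouse/symlink_table/data.txt" // hypothetical scheme-less path
        };

        for (String line : symlinkLines) {
            System.out.println(line + " -> needs own FS: "
                    + needsOwnFileSystem(line, defaultFs));
        }
    }
}
```

With the hypothetical `hdfs` default, the S3 line is flagged and the scheme-less path is not, which matches the condition under which the report says the lookup returns null.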
--
This message was sent by Atlassian Jira
(v8.3.4#803005)