[ 
https://issues.apache.org/jira/browse/HIVE-22936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22936:
----------------------------------
    Labels: pull-request-available  (was: )

> NPE in SymbolicInputFormat
> --------------------------
>
>                 Key: HIVE-22936
>                 URL: https://issues.apache.org/jira/browse/HIVE-22936
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 3.1.2
>            Reporter: Redis Liu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: npe-symbolic-inputformat.patch
>
>
> h2. Symptom
> I was running Hive over an AWS S3 Inventory report, which uses 
> SymlinkTextInputFormat. The symlink file contains the fully qualified S3 URL 
> of each data file, for example:
> {code:java}
> s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE
> s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD{code}
> When I have the following setting:
> {code:java}
> set hive.rework.mapredwork=true;  
> {code}
> The job fails with a *NullPointerException* and no stack trace.
> h2. Cause
> The symlink file may contain arbitrary fully qualified FS paths, but 
> SymbolicInputFormat uses the default FS instance to get the status of the 
> data files. That call fails (and returns null) when the scheme of a data file 
> differs from that of Hive's default FS.
> Relevant code:
> [https://github.com/apache/hive/blob/cfc12f05f0c034f9aad149960e58d40902e0dcfe/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java#L78]
> {code:java}
>               // "fileSystem" may not be able to list status for given file uri.
>               FileStatus[] matches = fileSystem.globStatus(new Path(line));{code}
> h2. Fix
> Please see the attached npe-symbolic-inputformat.patch.
>  
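
The scheme mismatch described under "Cause" can be illustrated with a small standalone sketch. It uses only java.net.URI, not the Hadoop API, and it is not the content of the attached patch, which is not shown in this message. The class name, the helper servedByDefaultFs, the hdfs://namenode:8020 default FS and the /warehouse/... path are all hypothetical. In Hadoop terms, the corresponding remedy is presumably to obtain the file system from each path (Path.getFileSystem(Configuration)) instead of reusing the default instance; that is an assumption about the fix, not a description of it.

```java
import java.net.URI;
import java.util.Objects;

public class SymlinkSchemeCheck {

    /**
     * Returns true when a file system rooted at defaultFs can be expected to
     * serve target: either the target carries no scheme of its own, or its
     * scheme and authority match those of the default.
     */
    static boolean servedByDefaultFs(URI defaultFs, URI target) {
        if (target.getScheme() == null) {
            return true; // unqualified path: resolved against the default FS
        }
        return target.getScheme().equalsIgnoreCase(defaultFs.getScheme())
                && Objects.equals(target.getAuthority(), defaultFs.getAuthority());
    }

    public static void main(String[] args) {
        // Hypothetical default FS; the issue only says it differs from s3://.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        String[] symlinkLines = {
            "s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE", // from the report
            "/warehouse/data/part-00000"                                  // hypothetical
        };
        for (String line : symlinkLines) {
            URI target = URI.create(line);
            String verdict = servedByDefaultFs(defaultFs, target)
                    ? "default FS"
                    : "needs its own FileSystem";
            System.out.println(line + " -> " + verdict);
        }
    }
}
```

A line for which the check returns false is one where reusing the default FS instance would hit the failure described above, so the status lookup for that line would need a file system resolved from the line's own scheme.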



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
