W.H. created HADOOP-12999:
-----------------------------
Summary: NPE when accessing (meta-)data via Hive query from S3 bucket
Key: HADOOP-12999
URL: https://issues.apache.org/jira/browse/HADOOP-12999
Project: Hadoop Common
Issue Type: Bug
Components: tools
Affects Versions: 2.7.2, 2.8.0
Environment: JDK8, Hive 2.0.0, Hadoop 2.7.2; also happens with Hadoop 2.8.0-SNAPSHOT (git revision ab67b50543e2e9dc48f2dcc00de18c2e2c6b4647)
Reporter: W.H.
Querying data stored in S3 via Hive 2.0.0 causes an NPE. The exception occurs when the Hive Metastore uses the hadoop-aws tools to query the bucket structure in S3.
Example Hive query:
{code}
create external table if not exists test_table_2 (
id STRING, name STRING
)
LOCATION 's3://my-bucket/test/test_insert2/';
{code}
The required bucket folder exists in S3. (There is also a $folder$ entry on the same directory level, a common workaround for S3 tools that cannot handle empty folders):
{code}
$ s3cmd ls s3://my-bucket/test/test_insert2
DIR s3://my-bucket/test/test_insert2/
2016-04-04 10:44 0 s3://my-bucket/test/test_insert2_$folder$
{code}
The following is an excerpt of the stack trace from the Hive console log:
{code}
exec.DDLTask (DDLTask.java:failed(541)) - org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.NullPointerException)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:783)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4032)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:322)
...
Caused by: java.lang.NullPointerException
at org.apache.hadoop.fs.s3.S3FileSystem.makeAbsolute(S3FileSystem.java:132)
at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:342)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:518)
at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:201)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1317)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1369)
... 39 more
{code}
After digging into the code, it appears that the root cause of this issue is not in Hive but in the way hadoop-aws queries the bucket information from S3. We have verified that the Path parameter passed into org.apache.hadoop.hive.common.FileUtils.mkdir(...) is indeed NOT null.
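The trace points at S3FileSystem.makeAbsolute. One plausible failure mode (an assumption on our part, not verified against the S3FileSystem source) is that relative-path resolution dereferences a working-directory field that was never initialized. The following standalone sketch uses hypothetical stand-in code, not the real Hadoop classes, to illustrate how that pattern produces exactly this kind of NPE:

```java
// Hypothetical stand-in for the suspected failure mode in makeAbsolute():
// resolving a path against a working directory that is still null.
public class MakeAbsoluteSketch {

    // Simplified stand-in for Path resolution inside a FileSystem.
    static String makeAbsolute(String workingDir, String path) {
        if (path.startsWith("/")) {
            return path; // already absolute, working dir not consulted
        }
        // Equivalent of new Path(workingDir, path): dereferences
        // workingDir, so a null working directory throws an NPE here.
        return workingDir.endsWith("/") ? workingDir + path
                                        : workingDir + "/" + path;
    }

    public static void main(String[] args) {
        // Works once the working directory has been set:
        System.out.println(makeAbsolute("/user/hive", "test/test_insert2"));
        // Throws java.lang.NullPointerException when it has not:
        makeAbsolute(null, "test/test_insert2");
    }
}
```

If this guess is right, the fix would be to ensure the working directory is always set during S3FileSystem initialization, or to null-check before resolving.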
The issue can be easily reproduced using the following standalone piece of code:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Set up the legacy s3:// filesystem with credentials.
FileSystem fs = new org.apache.hadoop.fs.s3.S3FileSystem();
Configuration conf = new Configuration();
conf.set("fs.s3.awsAccessKeyId", "...");
conf.set("fs.s3.awsSecretAccessKey", "...");
String url = "s3://my-bucket/test/test_insert2/";
fs.initialize(new URI(url), conf);

// mkdir() calls fs.exists() -> getFileStatus() -> makeAbsolute(), which throws the NPE.
Path f1 = new Path(url);
boolean inheritPerms = true;
org.apache.hadoop.hive.common.FileUtils.mkdir(fs, f1, inheritPerms, conf);
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)