Teng Qiu created SPARK-2700:
-------------------------------
Summary: Hidden files (such as .impala_insert_staging) should be
filtered out by sqlContext.parquetFile
Key: SPARK-2700
URL: https://issues.apache.org/jira/browse/SPARK-2700
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 1.0.1
Reporter: Teng Qiu
When a table is created in Impala, a hidden folder named .impala_insert_staging is
created inside the table's folder.
If we then load such a table through the Spark SQL API sqlContext.parquetFile,
this hidden folder causes trouble: Spark tries to read Parquet metadata (the file footer)
from the folder as if it were a data file, and fails with the following exception:
{code:borderStyle=solid}
Caused by: java.io.IOException: Could not read footer for file
FileStatus{path=hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging;
isDirectory=true; modification_time=1406333729252; access_time=0; owner=hdfs;
group=hdfs; permission=rwxr-xr-x; isSymlink=false}
...
...
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is
not a file: /user/hive/warehouse/parquet_strings/.impala_insert_staging
{code}
The Impala side does not consider this their problem, so maybe we should filter
out such hidden folders and files when reading Parquet tables.
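
To make the proposal concrete, here is a minimal sketch of the filtering rule in plain Python. It is not Spark's code and not a proposed patch: the helper names (is_hidden, visible_entries), the hidden_prefixes parameter and the part-00000.parquet file name are hypothetical, introduced only for illustration. It assumes that "hidden" means the last path component starts with a dot; whether other prefixes should also be skipped is left open.

```python
from posixpath import basename


def is_hidden(path, hidden_prefixes=(".",)):
    """Return True if the last component of `path` starts with a hidden prefix.

    Assumption: a leading "." marks a hidden entry. Pass more prefixes
    if a different convention is wanted.
    """
    name = basename(path.rstrip("/"))
    return name.startswith(tuple(hidden_prefixes))


def visible_entries(paths, hidden_prefixes=(".",)):
    """Keep only the paths whose last component is not hidden."""
    return [p for p in paths if not is_hidden(p, hidden_prefixes)]


# Directory listing modeled on the stack trace above; the data file
# name is made up for the example.
listing = [
    "/user/hive/warehouse/parquet_strings/.impala_insert_staging",
    "/user/hive/warehouse/parquet_strings/part-00000.parquet",
]

kept = visible_entries(listing)
print(kept)
```

With a rule like this applied to the directory listing before any footer is read, the .impala_insert_staging folder would never be handed to the Parquet footer reader.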