Teng Qiu created SPARK-2700:
-------------------------------

             Summary: Hidden files (such as .impala_insert_staging) should be 
filtered out by sqlContext.parquetFile
                 Key: SPARK-2700
                 URL: https://issues.apache.org/jira/browse/SPARK-2700
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 1.0.1
            Reporter: Teng Qiu


When a table is created in Impala, a hidden folder named .impala_insert_staging 
is created in the table's directory.

If we load such a table with the Spark SQL API sqlContext.parquetFile, this 
hidden folder causes trouble: Spark tries to read Parquet metadata from it as 
if it were a data file, and you will see this exception:

{code:borderStyle=solid}
Caused by: java.io.IOException: Could not read footer for file 
FileStatus{path=hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging;
 isDirectory=true; modification_time=1406333729252; access_time=0; owner=hdfs; 
group=hdfs; permission=rwxr-xr-x; isSymlink=false}
...
...
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is 
not a file: /user/hive/warehouse/parquet_strings/.impala_insert_staging
{code}

The Impala side does not consider this their problem, so maybe we should filter 
out these hidden folders/files when reading Parquet tables.
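
To make the proposal concrete, here is a minimal sketch of the filtering idea 
in plain Python. It is not Spark's actual code and not a proposed patch. It 
assumes that "hidden" means the last path component starts with a dot. The 
helper names is_hidden and visible_paths and the file name part-00000.parquet 
are made up for illustration.

```python
import posixpath
from urllib.parse import urlparse


def is_hidden(path: str) -> bool:
    """Return True if the last path component starts with a dot.

    Assumption: a leading "." is what marks a file or folder as hidden,
    as with .impala_insert_staging in this report.
    """
    # Drop the scheme and authority (e.g. hdfs://xxx:8020), keep the path part.
    name = posixpath.basename(urlparse(path).path.rstrip("/"))
    return name.startswith(".")


def visible_paths(paths):
    """Keep only the entries that are not hidden, preserving order."""
    return [p for p in paths if not is_hidden(p)]


# Hypothetical listing of the table directory from the stack trace above;
# the part-00000.parquet entry is invented for this example.
listing = [
    "hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging",
    "hdfs://xxx:8020/user/hive/warehouse/parquet_strings/part-00000.parquet",
]

print(visible_paths(listing))
```

The same check would presumably have to run on the directory listing before 
any Parquet footer is read, so that the staging folder is never opened as a 
file.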



--
This message was sent by Atlassian JIRA
(v6.2#6252)
