[jira] [Commented] (SPARK-2700) Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile

Sean Owen (JIRA) Sat, 26 Jul 2014 02:02:33 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075309#comment-14075309
 ]


Sean Owen commented on SPARK-2700:
----------------------------------

(As a general aside: yes, applications should not read hidden "." files in 
HDFS by default. The convention is the same as on Linux; it is not specific 
to Impala.)
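
To illustrate the convention, here is a minimal sketch of a hidden-name filter applied to a directory listing. It is not Spark's or Hadoop's actual code: the helper names `is_hidden` and `visible_paths`, the `prefixes` parameter, and the `part-00000.parquet` entry are hypothetical and only there to make the example self-contained.

```python
import posixpath

# Assumption: a path is "hidden" when its last component starts with ".",
# mirroring the Linux dotfile convention described above.
HIDDEN_PREFIXES = (".",)


def is_hidden(path, prefixes=HIDDEN_PREFIXES):
    """Return True if the last component of `path` starts with a hidden prefix."""
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith(tuple(prefixes))


def visible_paths(paths, prefixes=HIDDEN_PREFIXES):
    """Keep only the entries of a directory listing that are not hidden."""
    return [p for p in paths if not is_hidden(p, prefixes)]


# Hypothetical listing of the table directory from the report.
listing = [
    "/user/hive/warehouse/parquet_strings/.impala_insert_staging",
    "/user/hive/warehouse/parquet_strings/part-00000.parquet",
]

print(visible_paths(listing))
# ['/user/hive/warehouse/parquet_strings/part-00000.parquet']
```

The check looks only at the last path component, so it would have to be applied at each directory level while listing; a file nested inside a hidden directory is not rejected by name alone.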

> Hidden files (such as .impala_insert_staging) should be filtered out by 
> sqlContext.parquetFile
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2700
>                 URL: https://issues.apache.org/jira/browse/SPARK-2700
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.0.1
>            Reporter: Teng Qiu
>
> When a table is created in Impala, a hidden folder, .impala_insert_staging, 
> is created in the table's folder.
> If we load such a table with the Spark SQL API sqlContext.parquetFile, this 
> hidden folder causes trouble: Spark tries to read metadata from the folder, 
> and you will see this exception:
> {code:borderStyle=solid}
> Caused by: java.io.IOException: Could not read footer for file 
> FileStatus{path=hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging;
>  isDirectory=true; modification_time=1406333729252; access_time=0; 
> owner=hdfs; group=hdfs; permission=rwxr-xr-x; isSymlink=false}
> ...
> ...
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is 
> not a file: /user/hive/warehouse/parquet_strings/.impala_insert_staging
> {code}
> The Impala side does not consider this their problem: 
> https://issues.cloudera.org/browse/IMPALA-837 (IMPALA-837 Delete 
> .impala_insert_staging directory after INSERT),
> so maybe we should filter out such hidden folders/files when reading Parquet 
> tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
