[ https://issues.apache.org/jira/browse/SPARK-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091256#comment-14091256 ]
Teng Qiu commented on SPARK-2700:
---------------------------------

Hi [~srowen] and [~marmbrus], what do you think about this patch? Should it be merged for 1.1.0?

> Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2700
>                 URL: https://issues.apache.org/jira/browse/SPARK-2700
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.0.1
>            Reporter: Teng Qiu
>
> When a table is created in Impala, a hidden folder, .impala_insert_staging, is created in the table's folder.
> If we load such a table with the Spark SQL API sqlContext.parquetFile, this hidden folder causes trouble: Spark tries to read metadata from it, and you will see this exception:
> {code:borderStyle=solid}
> Caused by: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging; isDirectory=true; modification_time=1406333729252; access_time=0; owner=hdfs; group=hdfs; permission=rwxr-xr-x; isSymlink=false}
> ...
> ...
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /user/hive/warehouse/parquet_strings/.impala_insert_staging
> {code}
> The Impala side does not consider this their problem:
> https://issues.cloudera.org/browse/IMPALA-837 (IMPALA-837 Delete .impala_insert_staging directory after INSERT)
> So maybe we should filter out these hidden folders/files when reading Parquet tables.
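
To illustrate the kind of filtering proposed in the issue, here is a minimal sketch in plain Python. It is not the patch attached to SPARK-2700 and does not use any Spark or Hadoop API; the helper names `is_hidden` and `visible_children` and the `part-00000.parquet` file name are hypothetical. It assumes "hidden" means only that the last path component starts with a dot, as `.impala_insert_staging` does.

```python
import posixpath


def is_hidden(path: str) -> bool:
    """Return True if the last path component starts with a dot.

    Assumption: dot-prefixed names are the only entries treated as hidden.
    """
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith(".")


def visible_children(paths):
    """Keep only the entries whose last component is not hidden."""
    return [p for p in paths if not is_hidden(p)]


# Hypothetical directory listing for the table from the stack trace.
listing = [
    "/user/hive/warehouse/parquet_strings/.impala_insert_staging",
    "/user/hive/warehouse/parquet_strings/part-00000.parquet",  # made-up data file
]

# Only the non-hidden entry would be handed to the Parquet footer reader.
to_read = visible_children(listing)
```

Applying a filter like this before any footer is read would keep the staging directory out of the metadata lookup, which is where the exception in the issue is raised.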