Marius Soutier created SPARK-6648:
-------------------------------------
Summary: Reading Parquet files with different sub-files doesn't work
Key: SPARK-6648
URL: https://issues.apache.org/jira/browse/SPARK-6648
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.1
Reporter: Marius Soutier
When reading multiple Parquet files via
sqlContext.parquetFile(/path/1.parquet,/path/2.parquet), the read fails if the
Parquet files were written with different coalesce values. The error is:
ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path <path>
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
I haven't tested with Spark 1.3 yet but will report back after upgrading to
1.3.1 (as soon as it's released).
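
For anyone trying to reproduce this, the scenario can be sketched roughly as
below. This is only a sketch against the Spark 1.2.x API, not the code from the
reporter's application: the Record case class, the /tmp/spark-6648 paths, the
partition counts, and passing both paths as one comma-separated string are all
assumptions made for illustration. It may or may not trigger the exception
above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type, used only for this sketch.
case class Record(id: Int, value: String)

object ParquetCoalesceSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SPARK-6648-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Spark 1.2.x: implicit conversion from an RDD of case classes to a SchemaRDD.
    import sqlContext.createSchemaRDD

    // Start with 8 partitions so that coalesce can reduce to different counts.
    val records = sc.parallelize(1 to 100, 8).map(i => Record(i, s"v$i"))

    // Write the same data twice with a different number of output partitions.
    // Both output paths are hypothetical and must not exist beforehand.
    records.coalesce(1).saveAsParquetFile("/tmp/spark-6648/1.parquet")
    records.coalesce(4).saveAsParquetFile("/tmp/spark-6648/2.parquet")

    // Assumption: both paths are handed over as a single comma-separated string,
    // since parquetFile takes one String argument in 1.2.x.
    val both = sqlContext.parquetFile("/tmp/spark-6648/1.parquet,/tmp/spark-6648/2.parquet")
    println(both.count())

    sc.stop()
  }
}
```

Reading each of the two paths separately with parquetFile, in addition to the
combined read, should help narrow down whether the failure depends on the
combination of paths or on one of the outputs alone.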