[ https://issues.apache.org/jira/browse/SPARK-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marius Soutier updated SPARK-6648:
----------------------------------
Description:
When reading from multiple Parquet files (via
sqlContext.parquetFile(/path/1.parquet,/path/2.parquet)) while one of the
Parquet files is being overwritten using a different coalesce (e.g. one
contains only part-r-1.parquet, the other also contains part-r-2.parquet and
part-r-3.parquet), the read fails with:
ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path <path>
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
I haven't tested with Spark 1.3 yet but will report back after upgrading to
1.3.1 (as soon as it's released).
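
A minimal sketch of the setup described above, written against the Spark
1.2.x API. The Record case class, the object name, the joinPaths helper and
the /tmp/spark-6648 paths are hypothetical and exist only for illustration.
The sketch writes two Parquet directories with different coalesce values and
then issues the same kind of comma-separated read. It does not perform the
concurrent overwrite, so on its own it is not expected to trigger the error
reliably.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type, used only to have something to write.
case class Record(id: Int)

object Spark6648Sketch {
  // Joins directories into the comma-separated form used in the report.
  def joinPaths(paths: Seq[String]): String = paths.mkString(",")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SPARK-6648-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Spark 1.2.x implicit conversion: RDD of case classes -> SchemaRDD.
    import sqlContext.createSchemaRDD

    val fs = FileSystem.get(sc.hadoopConfiguration)

    // Deletes any previous output at `path`, then writes `parts` partitions.
    // Between the delete and the end of the write the directory is missing
    // or incomplete; whether that window is the cause of the reported error
    // is an assumption, not something the report confirms.
    def write(path: String, parts: Int): Unit = {
      fs.delete(new Path(path), true)
      sc.parallelize(1 to 1000, 4).map(Record(_)).coalesce(parts).saveAsParquetFile(path)
    }

    // Hypothetical locations standing in for /path/1.parquet and /path/2.parquet.
    val p1 = "/tmp/spark-6648/1.parquet"
    val p2 = "/tmp/spark-6648/2.parquet"
    write(p1, 1) // one partition
    write(p2, 3) // three partitions

    // The read from the report: a single comma-separated path string.
    val data = sqlContext.parquetFile(joinPaths(Seq(p1, p2)))
    println(data.count())

    sc.stop()
  }
}
```

To approximate the reported scenario, write(p2, ...) would have to run in
another thread or process while parquetFile is being called.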
was:
When reading from multiple Parquet files (via
sqlContext.parquetFile(/path/1.parquet,/path/2.parquet)), if the Parquet files
were created using a different coalesce (e.g. one contains only
part-r-1.parquet, the other also contains part-r-2.parquet and
part-r-3.parquet), the read fails with:
ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path <path>
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
I haven't tested with Spark 1.3 yet but will report back after upgrading to
1.3.1 (as soon as it's released).
> Reading Parquet files with different sub-files doesn't work
> -----------------------------------------------------------
>
> Key: SPARK-6648
> URL: https://issues.apache.org/jira/browse/SPARK-6648
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.1
> Reporter: Marius Soutier