Re: 1.3.1: Persisting RDD in parquet - Conflicting partition column names

2015-04-30 Thread Cheng Lian

Did you have a directory layout like this?

base/
 | data-file-2.parquet
 | batch_id=1/
 |  | data-file-1.parquet

Cheng
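Under the layout Cheng describes, the base directory contributes no partition columns (ArrayBuffer()), while batch_id=1/ contributes one (ArrayBuffer(batch_id)). Those are the same two column lists the assertion reports. The following is a minimal, self-contained Scala model of that check. It assumes partition discovery works roughly like this: each leaf directory's trailing name=value path segments become its partition columns, and every leaf directory must agree on them. This is a simplification for illustration, not Spark's ParquetRelation2 code, and the object and method names are hypothetical.

```scala
// Hypothetical, simplified model of partition-column discovery (not Spark's code).
object PartitionDiscoverySketch {
  /** Partition column names encoded in a leaf directory path relative to the base.
    * Walks from the leaf upward and stops at the first segment that is not name=value. */
  def partitionColumns(relativeDir: String): List[String] =
    relativeDir.split("/").toList
      .filter(_.nonEmpty)          // "" (the base directory itself) yields no segments
      .reverse                     // start from the leaf
      .takeWhile(_.contains("="))  // keep only trailing name=value segments
      .reverse
      .map(_.takeWhile(_ != '='))  // "batch_id=1" -> "batch_id"

  /** Distinct leaf directories (relative to the base) of the given data files. */
  def leafDirs(files: Seq[String]): List[String] =
    files.map(_.split("/").dropRight(1).mkString("/")).distinct.toList

  /** Right(columns) if all leaf directories agree; otherwise Left(every distinct
    * column list), analogous to the "Conflicting partition column names" assertion. */
  def resolve(files: Seq[String]): Either[List[List[String]], List[String]] = {
    val distinctColumns = leafDirs(files).map(partitionColumns).distinct
    if (distinctColumns.size <= 1) Right(distinctColumns.headOption.getOrElse(Nil))
    else Left(distinctColumns)
  }

  def main(args: Array[String]): Unit = {
    // Layout from the thread: one file directly under base/, one under batch_id=1/.
    val mixed = Seq("data-file-2.parquet", "batch_id=1/data-file-1.parquet")
    // Consistent layout: every file sits under a batch_id=N/ directory.
    val consistent = Seq("batch_id=1/data-file-1.parquet", "batch_id=2/data-file-2.parquet")
    println(resolve(mixed))
    println(resolve(consistent))
  }
}
```

In this model, running `main` prints `Left(List(List(), List(batch_id)))` for the mixed layout and `Right(List(batch_id))` for the consistent one. The model only passes when files do not sit both directly under the base and inside batch_id=N/ subdirectories.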

On 4/28/15 11:20 AM, sranga wrote:

Hi

I am getting the following error when persisting an RDD in Parquet format to
an S3 location. This code worked in Spark 1.2 but fails in 1.3.1.
Any help is appreciated.

Caused by: java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
ArrayBuffer(batch_id)
ArrayBuffer()
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.parquet.ParquetRelation2$.resolvePartitions(newParquet.scala:933)
    at org.apache.spark.sql.parquet.ParquetRelation2$.parsePartitions(newParquet.scala:851)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:311)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:303)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:303)
    at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:692)
    at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
    at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
    at org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:995)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/1-3-1-Persisting-RDD-in-parquet-Conflicting-partition-column-names-tp22678.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org








Re: 1.3.1: Persisting RDD in parquet - Conflicting partition column names

2015-04-28 Thread ayan guha
Can you show your code please?


1.3.1: Persisting RDD in parquet - Conflicting partition column names

2015-04-27 Thread sranga
Hi

I am getting the following error when persisting an RDD in Parquet format to
an S3 location. This code worked in Spark 1.2 but fails in 1.3.1.
Any help is appreciated.

Caused by: java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
ArrayBuffer(batch_id)
ArrayBuffer()
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.parquet.ParquetRelation2$.resolvePartitions(newParquet.scala:933)
    at org.apache.spark.sql.parquet.ParquetRelation2$.parsePartitions(newParquet.scala:851)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:311)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$7.apply(newParquet.scala:303)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:303)
    at org.apache.spark.sql.parquet.ParquetRelation2.insert(newParquet.scala:692)
    at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:129)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
    at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
    at org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:995)


