I suspect that Bookings and Customerdetails both have a PolicyType
field; one is a string and the other is an int.
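If that's the case, one possible fix (a sketch only; the column name and target type are assumed from the error message below) is to cast one side so the two PolicyType columns agree before the join and save:

```scala
import org.apache.spark.sql.types.IntegerType

// Cast the string-typed PolicyType to int so both sides match.
// Note: on Spark 1.4+ withColumn replaces a same-named column; on 1.3
// you may need to drop/rename the old column instead.
val bookingsFixed = Bookings.withColumn(
  "PolicyType", Bookings("PolicyType").cast(IntegerType))
```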
Cheng
On 6/8/15 9:15 PM, Bipin Nag wrote:
Hi Jeetendra, Cheng
I am using the following code for the join:
// Load both tables (the default data source is Parquet)
val Bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val Customerdetails = sqlContext.load("/home/administrator/stageddata/Customerdetails")

// Keep only customers created in April 2015
val CD = Customerdetails.
  where($"CreatedOn" > "2015-04-01 00:00:00.0").
  where($"CreatedOn" < "2015-05-01 00:00:00.0")

// Bookings by CD: rename Bookings.ID so it does not clash after the join
val r1 = Bookings.withColumnRenamed("ID", "ID2")

// Left-join bookings onto the filtered customers
val r2 = CD.join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")

r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")
@Cheng I am not appending the joined table to an existing Parquet
file; it is a new file.
@Jeetendra I have a rather large Parquet file and it also contains some
confidential data. Can you tell me what you need to check in it?
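The schemas alone are usually enough to confirm a type conflict, so the data itself need not be shared. A quick check (table names assumed from the code above):

```scala
// Print the inferred schema of each side and compare the PolicyType entries
Bookings.printSchema()
Customerdetails.printSchema()

// Or inspect just that one field programmatically
Bookings.schema.fields.filter(_.name == "PolicyType").foreach(println)
```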
Thanks
On 8 June 2015 at 16:47, Jeetendra Gangele <gangele...@gmail.com> wrote:
When are you loading these Parquet files?
Can you please share the code where you pass the Parquet files
to Spark?
On 8 June 2015 at 16:39, Cheng Lian <lian.cs....@gmail.com> wrote:
Are you appending the joined DataFrame whose PolicyType is
string to an existing Parquet file whose PolicyType is int?
The exception indicates that Parquet found a column with
conflicting data types.
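For illustration only (hypothetical path and data), writing an int-typed column and then appending a string-typed column of the same name reproduces this class of failure:

```scala
import org.apache.spark.sql.SaveMode
import sqlContext.implicits._

val asInt = Seq((1, 10)).toDF("ID", "PolicyType")      // PolicyType: int
asInt.saveAsParquetFile("/tmp/policies")

val asString = Seq((2, "10")).toDF("ID", "PolicyType") // PolicyType: string
// Appending the string-typed frame to the int-typed file fails with
// "incompatible types: optional binary PolicyType (UTF8) != optional int32 PolicyType"
asString.save("/tmp/policies", "parquet", SaveMode.Append)
```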
Cheng
On 6/8/15 5:29 PM, bipin wrote:
Hi, I get this error message when saving a table:
parquet.io.ParquetDecodingException:
The requested schema is not compatible
with the file schema. incompatible types: optional binary
PolicyType (UTF8)
!= optional int32 PolicyType
at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
at parquet.schema.MessageType.accept(MessageType.java:55)
at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:137)
at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:157)
at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I joined two tables, both loaded from Parquet files; the joined table
throws this error when saved. I could not find anything about
this error. Could this be a bug?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Error-in-using-saveAsParquetFile-tp23204.html
Sent from the Apache Spark User List mailing list archive
at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
--
Hi,
Find my attached resume. I have around 7 years of total work
experience. I worked for Amazon and Expedia in my previous
assignments, and I am currently working with a start-up technology
company called Insideview in Hyderabad.
Regards
Jeetendra