Cheng, you were right: both tables have a PolicyType field with conflicting
types, and the save works once I remove the field from either one. I should
have checked the types beforehand. What confused me is that Spark attempted
the join and only threw the error midway through, rather than failing up
front; the validation isn't quite there yet. Thanks for the help.
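
For anyone who hits this later: an alternative to removing the field is to
cast it to one type before the join. A rough sketch, assuming Spark 1.4+
(where withColumn replaces an existing column of the same name) and guessing
that Bookings is the side holding the int column:

    import sqlContext.implicits._

    // Cast the int PolicyType to string so both sides of the join agree;
    // cast the other side instead if the types are reversed in your case.
    val bookingsFixed =
      Bookings.withColumn("PolicyType", $"PolicyType".cast("string"))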

On Mon, Jun 8, 2015 at 8:29 PM Cheng Lian <lian.cs....@gmail.com> wrote:

>  I suspect that Bookings and Customerdetails both have a PolicyType field:
> one is a string and the other is an int.
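>
> A quick way to confirm is to compare the declared types (a sketch, using
> the two DataFrames from your snippet below):
>
>     Bookings.printSchema()
>     Customerdetails.printSchema()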
>
>
> Cheng
>
>
> On 6/8/15 9:15 PM, Bipin Nag wrote:
>
>  Hi Jeetendra, Cheng
>
>  I am using the following code for the join:
>
> import sqlContext.implicits._ // needed for the $"column" syntax below
>
> val Bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
> val Customerdetails =
>     sqlContext.load("/home/administrator/stageddata/Customerdetails")
>
> // Customer details created during April 2015
> val CD = Customerdetails.
>     where($"CreatedOn" > "2015-04-01 00:00:00.0").
>     where($"CreatedOn" < "2015-05-01 00:00:00.0")
>
> // Bookings by CD: rename ID so the join key has a distinct name
> val r1 = Bookings.
>     withColumnRenamed("ID", "ID2")
> val r2 = CD.
>     join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")
>
> r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")
>
>  @Cheng I am not appending the joined table to an existing Parquet file;
> it is a new file.
>  @Jeetendra I have a rather large Parquet file, and it also contains some
> confidential data. Can you tell me what you need to check in it?
>
>  Thanks
>
>
> On 8 June 2015 at 16:47, Jeetendra Gangele <gangele...@gmail.com> wrote:
>
>> When are you loading these Parquet files?
>> Can you please share the code where you pass the Parquet files to
>> Spark?
>>
>> On 8 June 2015 at 16:39, Cheng Lian <lian.cs....@gmail.com> wrote:
>>
>>> Are you appending the joined DataFrame whose PolicyType is string to an
>>> existing Parquet file whose PolicyType is int? The exception indicates that
>>> Parquet found a column with conflicting data types.
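>>>
>>> If so, you can check the schema of the existing file directly; a sketch
>>> (substitute the real path, which I don't know):
>>>
>>>     sqlContext.parquetFile("/path/to/existing").printSchema()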
>>>
>>> Cheng
>>>
>>>
>>> On 6/8/15 5:29 PM, bipin wrote:
>>>
>>>> Hi, I get this error message when saving a table:
>>>>
>>>> parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: optional binary PolicyType (UTF8) != optional int32 PolicyType
>>>>         at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
>>>>         at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
>>>>         at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
>>>>         at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
>>>>         at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
>>>>         at parquet.schema.MessageType.accept(MessageType.java:55)
>>>>         at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
>>>>         at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:137)
>>>>         at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:157)
>>>>         at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
>>>>         at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
>>>>         at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
>>>>         at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
>>>>         at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
>>>>         at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
>>>>         at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>>>         at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> I joined two tables, both loaded from Parquet files, and saving the
>>>> joined table throws this error. I could not find anything about this
>>>> error. Could this be a bug?
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
