[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127021#comment-15127021 ]

Cheng Lian edited comment on SPARK-13101 at 2/1/16 9:03 PM:
------------------------------------------------------------

[~deenar] Just had an offline discussion with [~marmbrus] and [~cloud_fan]. In 
the end, we agreed that nullability should only be treated as an optimization 
rather than as part of the type system. So in your case, this kind of conversion 
should be allowed as long as the underlying data doesn't actually contain nulls. 
The proposed changes are:
# Allow converting nullable DF fields to non-nullable DS fields
# Add a runtime nullability check for array fields and map fields (currently we 
only perform this check for product type fields)

I also like the idea of adding an option to disable the non-nullable-to-nullable 
Parquet writer conversion behavior, but that can be tracked in a separate ticket.
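
A rough sketch of the user-facing behavior this would enable (illustrative only, 
not the final implementation):
{code}
// valuations is stored as array<double> with containsNull = true
val ds = sqlContext.table("valuations").as[Valuation]

// With change 1, the .as[Valuation] above no longer fails analysis just because
// the DF element type is nullable while Seq[Double] is not.
// With change 2, decoding throws at runtime if a row actually contains a null
// array element, instead of silently producing an incorrect value.
ds.collect()
{code}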


> Dataset complex types mapping to DataFrame  (element nullability) mismatch
> --------------------------------------------------------------------------
>
>                 Key: SPARK-13101
>                 URL: https://issues.apache.org/jira/browse/SPARK-13101
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Deenar Toraskar
>            Priority: Blocker
>
> There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By 
> default, a Scala {{Seq\[Double\]}} is mapped by Spark to an ArrayType with 
> nullable elements:
> {noformat}
>  |-- valuations: array (nullable = true)
>  |    |-- element: double (containsNull = true)
> {noformat}
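> One way to confirm the element nullability programmatically (a small sketch, 
> assuming the {{valuations}} table above):
> {code}
> import org.apache.spark.sql.types.ArrayType
> // The array column reports containsNull = true even though the case class
> // field is a plain Seq[Double].
> val arrayType = sqlContext.table("valuations").schema("valuations")
>   .dataType.asInstanceOf[ArrayType]
> arrayType.containsNull   // true
> {code}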
> This could be read back as a Dataset in Spark 1.6.0:
> {code}
>     val df = sqlContext.table("valuations").as[Valuation]
> {code}
> But with Spark 1.6.1 the same call fails with:
> {code}
>     val df = sqlContext.table("valuations").as[Valuation]
> org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as 
> array<double>)' due to data type mismatch: cannot cast 
> ArrayType(DoubleType,true) to ArrayType(DoubleType,false);
> {code}
> Here are the classes I am using:
> {code}
> case class Valuation(tradeId : String,
>                      counterparty: String,
>                      nettingAgreement: String,
>                      wrongWay: Boolean,
>                      valuations : Seq[Double], /* one per scenario */
>                      timeInterval: Int,
>                      jobId: String)  /* used for hdfs partitioning */
> val vals : Seq[Valuation] = Seq()
> val valsDF = sqlContext.sparkContext.parallelize(vals).toDF
> valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations")
> {code}
> Even the following gives the same result:
> {code}
> val valsDF = vals.toDS.toDF
> {code}
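> A possible workaround sketch (not part of the original report; it assumes the 
> data really contains no null elements) is to rebuild the schema with 
> non-nullable array elements before converting to a Dataset:
> {code}
> import org.apache.spark.sql.types._
> // Mark the "valuations" array elements as non-nullable so the cast to
> // ArrayType(DoubleType, false) required by Seq[Double] can succeed.
> val df = sqlContext.table("valuations")
> val patchedSchema = StructType(df.schema.map {
>   case f @ StructField("valuations", ArrayType(DoubleType, _), _, _) =>
>     f.copy(dataType = ArrayType(DoubleType, containsNull = false))
>   case f => f
> })
> val ds = sqlContext.createDataFrame(df.rdd, patchedSchema).as[Valuation]
> {code}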


