Hyukjin Kwon created SPARK-18246:
------------------------------------

             Summary: Throw an exception before execution for unsupported 
types in JSON, CSV and text functionalities
                 Key: SPARK-18246
                 URL: https://issues.apache.org/jira/browse/SPARK-18246
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Hyukjin Kwon


* Case 1

{code}
import org.apache.spark.sql.types._

val rdd = spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""")
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(rdd).show()
{code}

This should throw an exception before execution.


* Case 2

{code}
val path = "/tmp/a"
spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""").saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(path).show()
{code}

This should throw an exception before execution.

* Case 3

{code}
val path = "/tmp/b"
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").csv(path).show()
{code}

This should throw an exception before execution.
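
The up-front check asked for in Cases 1 to 3 could be sketched roughly as below. This is only an illustration: {{SchemaCheck}}, {{isSupported}} and {{verifySchema}} are hypothetical names, not Spark APIs, and the set of rejected types is assumed to be just {{CalendarIntervalType}}. The idea is to walk the user-supplied schema when the read is planned and fail there, before any task runs.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: rejects a schema containing a type
// that a text-based format (JSON, CSV) cannot represent.
object SchemaCheck {
  // Assumption for illustration: only CalendarIntervalType is unsupported.
  private def isSupported(dt: DataType): Boolean = dt match {
    case CalendarIntervalType => false
    case ArrayType(elementType, _) => isSupported(elementType)
    case MapType(keyType, valueType, _) =>
      isSupported(keyType) && isSupported(valueType)
    case st: StructType => st.fields.forall(f => isSupported(f.dataType))
    case _ => true
  }

  // Throws as soon as an unsupported column is found; returns Unit otherwise.
  def verifySchema(format: String, schema: StructType): Unit = {
    schema.fields.foreach { field =>
      if (!isSupported(field.dataType)) {
        throw new UnsupportedOperationException(
          s"$format data source does not support " +
            s"${field.dataType.simpleString} for column '${field.name}'")
      }
    }
  }
}
```

With such a helper, {{SchemaCheck.verifySchema("JSON", schema)}} would fail for the schemas used above without reading any data, regardless of the {{mode}} option.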

* Case 4

{code}
val path = "/tmp/c"
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", LongType)
spark.read.schema(schema).text(path).show()
{code}

This should throw an exception before execution rather than printing incorrect 
values:

{code}
+-----------+
|          a|
+-----------+
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476739|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
+-----------+
{code}
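
For Case 4, a check along the following lines would reject the schema before reading. It is a sketch under the assumption that the text source should accept exactly one string column; {{TextSchemaCheck}} is a hypothetical name, not a Spark API.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: the text source yields one string
// per line, so any other user-supplied schema is rejected up front.
object TextSchemaCheck {
  def verify(schema: StructType): Unit = {
    // require throws IllegalArgumentException when the condition is false.
    require(schema.length == 1,
      s"Text data source supports only a single column, got ${schema.length}")
    val dataType = schema.head.dataType
    require(dataType == StringType,
      s"Text data source supports only a string column, got ${dataType.simpleString}")
  }
}
```

As a side observation, 68719476738 equals {{(16L << 32) | 2L}} and 68719476739 equals {{(16L << 32) | 3L}}, which would be consistent with an offset-and-size word for a 2-byte or 3-byte string being read back as a long. That reading is an inference and is not stated in this report.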


* Case 5

{code}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("""{"a" 1}""").toDS()
val schema = new StructType().add("a", CalendarIntervalType)
df.select(from_json($"value", schema)).show()
{code}

prints

{code}
+-------------------+
|jsontostruct(value)|
+-------------------+
|               null|
+-------------------+
{code}

This should throw an analysis exception, as {{CalendarIntervalType}} is not 
supported.


By comparison, {{to_json}} already throws an analysis error for this type, for example:

{code}
val df = Seq(Tuple1(Tuple1("interval -3 month 7 hours"))).toDF("a")
  .select(struct($"a._1".cast(CalendarIntervalType).as("a")).as("c"))
df.select(to_json($"c")).collect()
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
