Hyukjin Kwon created SPARK-18246:
------------------------------------
Summary: Throw an exception before execution for unsupported
types in JSON, CSV and text functionalities
Key: SPARK-18246
URL: https://issues.apache.org/jira/browse/SPARK-18246
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Hyukjin Kwon
* Case 1
{code}
import org.apache.spark.sql.types._

val rdd = spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""")
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(rdd).show()
{code}
should throw an exception before execution.
* Case 2
{code}
import org.apache.spark.sql.types._

val path = "/tmp/a"
spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""").saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(path).show()
{code}
should throw an exception before execution.
* Case 3
{code}
import org.apache.spark.sql.types._

val path = "/tmp/b"
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").csv(path).show()
{code}
should throw an exception before execution.
* Case 4
{code}
import org.apache.spark.sql.types._

val path = "/tmp/c"
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", LongType)
spark.read.schema(schema).text(path).show()
{code}
should throw an exception before execution, rather than printing incorrect
values such as:
{code}
+-----------+
| a|
+-----------+
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476739|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
+-----------+
{code}
* Case 5
{code}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import spark.implicits._
val df = Seq("""{"a" 1}""").toDS()
val schema = new StructType().add("a", CalendarIntervalType)
df.select(from_json($"value", schema)).show()
{code}
prints:
{code}
+-------------------+
|jsontostruct(value)|
+-------------------+
| null|
+-------------------+
{code}
This should throw an {{AnalysisException}}, as {{CalendarIntervalType}} is not
supported. Likewise, {{to_json}} should throw an analysis error; for example,
{code}
val df = Seq(Tuple1(Tuple1("interval -3 month 7 hours"))).toDF("a")
.select(struct($"a._1".cast(CalendarIntervalType).as("a")).as("c"))
df.select(to_json($"c")).collect()
{code}
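The cases above all come down to one missing step: validating the user-supplied schema against the types a source can actually produce, before any job runs. A minimal sketch of such a check follows; the names ({{verifySchema}}, {{isSupported}}) and the tiny {{DataType}} ADT are hypothetical stand-ins for illustration, not Spark's actual internal API.

{code}
// Hypothetical miniature of Spark's DataType hierarchy, for illustration only.
sealed trait DataType
case object StringType extends DataType
case object LongType extends DataType
case object CalendarIntervalType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

// Walk the schema and fail fast on any type the data source cannot produce,
// recursing into nested structs so deeply nested unsupported types are caught too.
def verifySchema(schema: StructType, isSupported: DataType => Boolean): Unit =
  schema.fields.foreach { field =>
    field.dataType match {
      case nested: StructType => verifySchema(nested, isSupported)
      case t if isSupported(t) => // OK
      case t =>
        throw new IllegalArgumentException(
          s"Field '${field.name}': $t is not supported by this data source")
    }
  }
{code}

Each source would supply its own predicate: the JSON and CSV readers would reject {{CalendarIntervalType}}, while the text source would accept only {{StringType}}.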