[ https://issues.apache.org/jira/browse/SPARK-34441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285618#comment-17285618 ]

Jean-Francis Roy edited comment on SPARK-34441 at 2/17/21, 3:17 AM:
--------------------------------------------------------------------

[~hyukjin.kwon] of course, here is an example:

{code:java}
scala> case class Foo(a: String)
scala> val ds = List("", "{", "{}", """{"a"}""", """{"a": "bar"}""", """{"a": 42}""").toDS
scala> import org.apache.spark.sql.functions._
scala> import org.apache.spark.sql.types._
scala> ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType))))).show()
+------------+---------+
|       value|converted|
+------------+---------+
|            |     null|
|           {|       []|
|          {}|       []|
|       {"a"}|       []|
|{"a": "bar"}|    [bar]|
|   {"a": 42}|     [42]|
+------------+---------+
{code}
The output above shows that malformed JSON now often results in a struct with `null` fields rather than in a `null` value, which is a significant change of behavior between Spark 2 and Spark 3. The documentation still describes the Spark 2 behavior.

Moreover, I cannot reproduce Spark 2's behavior: I do want malformed input to be converted to null.
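
A possible workaround (only a sketch: it treats a parsed struct whose fields are all null as malformed, so it also nulls out valid inputs such as {"a": null}) would be to null out the struct manually after a permissive parse:
{code:java}
// Sketch of a workaround, not built-in behaviour: parse permissively, then
// keep the struct only when its field is non-null, so malformed rows end up
// as null. Caveat: this also nulls out valid inputs such as {"a": null}.
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val schema = StructType(Array(StructField("a", StringType)))
val parsed = from_json($"value", schema)

ds.withColumn("converted", when(parsed.getField("a").isNotNull, parsed)).show()
{code}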

I can make `from_json` throw an exception using the `FAILFAST` mode:

{code:java}
scala> ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType))), Map("mode" -> "FAILFAST"))).show()
{code}

But I cannot use the `DROPMALFORMED` mode as it is not supported:
{code:java}
scala> ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType))), Map("mode" -> "DROPMALFORMED"))).show()
java.lang.IllegalArgumentException: from_json() doesn't support the DROPMALFORMED mode. Acceptable modes are PERMISSIVE and FAILFAST.
{code}
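
Something along these lines could roughly emulate `DROPMALFORMED` after a permissive parse (again only a sketch, with the same caveat about inputs whose fields are genuinely null):
{code:java}
// Sketch: emulate DROPMALFORMED by parsing permissively and then dropping
// rows whose parsed field came back null.
// Caveat: this also drops valid rows such as {"a": null}.
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

ds.withColumn("converted", from_json($"value", StructType(Array(StructField("a", StringType)))))
  .filter($"converted.a".isNotNull)
  .show()
{code}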

> from_json documentation is wrong about malformed JSONs output
> -------------------------------------------------------------
>
>                 Key: SPARK-34441
>                 URL: https://issues.apache.org/jira/browse/SPARK-34441
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 3.0.0, 3.0.1
>            Reporter: Jean-Francis Roy
>            Priority: Minor
>
> The documentation of the `from_json` function states that malformed JSON will
> return a `null` value, which is no longer the case after
> https://issues.apache.org/jira/browse/SPARK-25243.
>  


