Maxim Gekk created SPARK-26303:
----------------------------------

             Summary: Return partial results for bad JSON records
                 Key: SPARK-26303
                 URL: https://issues.apache.org/jira/browse/SPARK-26303
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Maxim Gekk


Currently, the JSON datasource and JSON functions return a row with all nulls 
for a malformed JSON string in PERMISSIVE mode when the specified schema is a 
struct type. All nulls are returned even if some of the fields were parsed and 
converted to the desired types successfully. This ticket aims to solve the 
problem by returning the fields that were already parsed. The corrupt-record 
column, specified via the JSON option `columnNameOfCorruptRecord` or the SQL 
config, should contain the whole original JSON string. 

For example, if the input has one JSON string:
{code:json}
{"a":0.1,"b":{},"c":"def"}
{code}
and the specified schema is:
{code:sql}
a DOUBLE, b ARRAY<INT>, c STRING, _corrupt_record STRING
{code}
then the expected output of `from_json` in PERMISSIVE mode is:
{code}
+---+----+---+--------------------------+
|a  |b   |c  |_corrupt_record           |
+---+----+---+--------------------------+
|0.1|null|def|{"a":0.1,"b":{},"c":"def"}|
+---+----+---+--------------------------+
{code}
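
The proposed semantics can be illustrated with a small standalone sketch in plain Python. This is not Spark code and not how Spark's JSON parser is implemented; the names `SCHEMA`, `parse_permissive`, and the converter helpers are hypothetical and exist only for this illustration. It assumes the record is syntactically valid JSON and that only a per-field type conversion fails, as in the example above.

```python
import json

CORRUPT_COLUMN = "_corrupt_record"  # name taken from the example schema


def to_double(value):
    # Accept JSON numbers only (bool is excluded because it subclasses int).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("not a number")
    return float(value)


def to_int_array(value):
    # Accept a JSON array whose elements are all integers.
    if not isinstance(value, list):
        raise ValueError("not an array")
    if any(isinstance(x, bool) or not isinstance(x, int) for x in value):
        raise ValueError("not an array of ints")
    return value


def to_string(value):
    if not isinstance(value, str):
        raise ValueError("not a string")
    return value


# Hypothetical stand-in for the schema `a DOUBLE, b ARRAY<INT>, c STRING`.
SCHEMA = {"a": to_double, "b": to_int_array, "c": to_string}


def parse_permissive(raw):
    """Return a row dict, keeping every field that converts successfully."""
    row = {name: None for name in SCHEMA}
    row[CORRUPT_COLUMN] = None
    try:
        obj = json.loads(raw)
        if not isinstance(obj, dict):
            raise ValueError("top-level value is not an object")
    except ValueError:
        # Nothing could be parsed: all nulls plus the original string.
        row[CORRUPT_COLUMN] = raw
        return row
    for name, convert in SCHEMA.items():
        if obj.get(name) is None:
            continue  # missing or null field stays null
        try:
            row[name] = convert(obj[name])
        except ValueError:
            # Keep the fields parsed so far and record the whole input.
            row[CORRUPT_COLUMN] = raw
    return row


record = '{"a":0.1,"b":{},"c":"def"}'
print(parse_permissive(record))
```

For the record above, `a` and `c` keep their parsed values, `b` is null because `{}` is not an array, and `_corrupt_record` holds the whole original string, which matches the expected table.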



