[ 
https://issues.apache.org/jira/browse/SPARK-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cazen Lee updated SPARK-12537:
------------------------------
       Priority: Major  (was: Minor)
    Description: 
We can provide an option that controls whether the JSON parser accepts 
backslash quoting of any character.

For example, if a JSON file contains a backslash escape that is not listed in 
the JSON backslash quoting specification, the record is returned as a 
_corrupt_record:

{code:title=JSON File|borderStyle=solid}
{"name": "Cazen Lee", "price": "$10"}
{"name": "John Doe", "price": "\$20"}
{"name": "Tracy", "price": "$10"}
{code}

The second record lands in _corrupt_record, and its name and price are null:
{code}
scala> df.show
+--------------------+---------+-----+
|     _corrupt_record|     name|price|
+--------------------+---------+-----+
|                null|Cazen Lee|  $10|
|{"name": "John Do...|     null| null|
|                null|    Tracy|  $10|
+--------------------+---------+-----+
{code}
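
To illustrate why only the second record is rejected: the JSON specification (RFC 8259, section 7) allows a backslash to be followed only by ", \, /, b, f, n, r, t, or u. The sketch below is a simplified, hypothetical checker (the object and method names are not part of Spark or of this patch) that reports any other escaped character in a line. It scans the whole line rather than only string literals and does not validate the hex digits after \u.

```scala
object JsonEscapeCheck {
  // Characters that may legally follow a backslash in a JSON string (RFC 8259, section 7).
  private val allowed: Set[Char] = Set('"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u')

  // Returns every character that is backslash-escaped but not in the allowed set.
  // Simplification: the whole line is scanned, not just the string literals.
  def illegalEscapes(line: String): Seq[Char] = {
    val found = scala.collection.mutable.ArrayBuffer.empty[Char]
    var i = 0
    while (i < line.length) {
      if (line.charAt(i) == '\\' && i + 1 < line.length) {
        val next = line.charAt(i + 1)
        if (!allowed.contains(next)) found += next
        i += 2 // skip the escaped character so an escaped backslash is not read twice
      } else {
        i += 1
      }
    }
    found.toSeq
  }
}
```

Calling `JsonEscapeCheck.illegalEscapes` on the John Doe line from the file above should report the `$` character, while the other two lines contain no backslash at all and yield an empty result.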

After applying this patch, we can enable the 
allowBackslashEscapingAnyCharacter option as shown below:

{code}
scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
df: org.apache.spark.sql.DataFrame = [name: string, price: string]

scala> df.show
+---------+-----+
|     name|price|
+---------+-----+
|Cazen Lee|  $10|
| John Doe|  $20|
|    Tracy|  $10|
+---------+-----+
{code}
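
The behaviour can also be reproduced at the parser level. The following is a minimal sketch, assuming the option is forwarded to Jackson's `JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER` (that mapping is an assumption here, not something this message states). The object `BackslashEscapeSketch` and the helper `readPrice` are hypothetical names, and jackson-core 2.x is assumed to be on the classpath, as it is in spark-shell.

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParser, JsonToken}
import scala.util.Try

object BackslashEscapeSketch {
  // Reads the "price" field of a one-line JSON record.
  // Returns a Failure if Jackson rejects the record.
  def readPrice(line: String, allowAnyEscape: Boolean): Try[String] = Try {
    val factory = new JsonFactory()
    // Toggle acceptance of backslash escapes outside the JSON specification.
    factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, allowAnyEscape)
    val parser = factory.createParser(line)
    try {
      var price: String = null
      while (parser.nextToken() != null) {
        if (parser.getCurrentToken == JsonToken.FIELD_NAME && parser.getCurrentName == "price") {
          parser.nextToken()      // advance to the field value
          price = parser.getText  // decoding the string is where a bad escape is detected
        }
      }
      price
    } finally {
      parser.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Triple quotes keep the backslash literal, matching the second line of the sample file.
    val line = """{"name": "John Doe", "price": "\$20"}"""
    println(readPrice(line, allowAnyEscape = false))
    println(readPrice(line, allowAnyEscape = true))
  }
}
```

With the feature disabled, the call should fail with a Jackson parse exception, which corresponds to the record ending up in _corrupt_record. With it enabled, the backslash is dropped and the value is read as $20, matching the table above.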

This issue is similar to HIVE-11825 and HIVE-12717.


  was:
We can provides the option to choose JSON parser can be enabled to accept 
quoting of all character or not.

For example, if JSON file that includes not listed by JSON backslash quoting 
specification, it returns corrupt_record

{code:title=JSON File|borderStyle=solid}
{"name": "Cazen Lee", "price": "$10"}
{"name": "John Doe", "price": "\$20"}
{"name": "Tracy", "price": "$10"}
<code>

corrupt_record(returns null)
<code>
scala> df.show
+--------------------+---------+-----+
|     _corrupt_record|     name|price|
+--------------------+---------+-----+
|                null|Cazen Lee|  $10|
|{"name": "John Do...|     null| null|
|                null|    Tracy|  $10|
+--------------------+---------+-----+
<code>

And after apply this patch, we can enable allowBackslashEscapingAnyCharacter 
option like below

<code>
scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", 
"true").json("/user/Cazen/test/test2.txt")
df: org.apache.spark.sql.DataFrame = [name: string, price: string]

scala> df.show
+---------+-----+
|     name|price|
+---------+-----+
|Cazen Lee|  $10|
| John Doe|  $20|
|    Tracy|  $10|
+---------+-----+
<code>

This issue similar to HIVE-11825, HIVE-12717.


> Add option to accept quoting of all character backslash quoting mechanism
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12537
>                 URL: https://issues.apache.org/jira/browse/SPARK-12537
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Cazen Lee
>
> We can provide an option that controls whether the JSON parser accepts 
> backslash quoting of any character.
> For example, if a JSON file contains a backslash escape that is not listed in 
> the JSON backslash quoting specification, the record is returned as a 
> _corrupt_record:
> {code:title=JSON File|borderStyle=solid}
> {"name": "Cazen Lee", "price": "$10"}
> {"name": "John Doe", "price": "\$20"}
> {"name": "Tracy", "price": "$10"}
> {code}
> The second record lands in _corrupt_record, and its name and price are null:
> {code}
> scala> df.show
> +--------------------+---------+-----+
> |     _corrupt_record|     name|price|
> +--------------------+---------+-----+
> |                null|Cazen Lee|  $10|
> |{"name": "John Do...|     null| null|
> |                null|    Tracy|  $10|
> +--------------------+---------+-----+
> {code}
> After applying this patch, we can enable the 
> allowBackslashEscapingAnyCharacter option as shown below:
> {code}
> scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
> df: org.apache.spark.sql.DataFrame = [name: string, price: string]
> scala> df.show
> +---------+-----+
> |     name|price|
> +---------+-----+
> |Cazen Lee|  $10|
> | John Doe|  $20|
> |    Tracy|  $10|
> +---------+-----+
> {code}
> This issue is similar to HIVE-11825 and HIVE-12717.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
