[
https://issues.apache.org/jira/browse/SPARK-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Hao updated SPARK-4856:
-----------------------------
Description:
We have data like:
{noformat}
TestSQLContext.sparkContext.parallelize(
  """{"ip":"27.31.100.29","headers":{"Host":"1.abc.com","Charset":"UTF-8"}}""" ::
  """{"ip":"27.31.100.29","headers":{}}""" ::
  """{"ip":"27.31.100.29","headers":""}""" :: Nil)
{noformat}
Because the empty string in "headers" (line 3) is inferred as StringType,
schema inference ignores the real nested data type (the struct type of
"headers" in line 1), and "headers" ends up as StringType, which is not what
we expect.
was:
We have data like:
{noformat}
TestSQLContext.sparkContext.parallelize(
  """{"ip":"27.31.100.29","headers":{"Host":"1.abc.com","Charset":"UTF-8"}}""" ::
  """{"ip":"27.31.100.29","headers":{}}""" ::
  """{"ip":"27.31.100.29","headers":""}""" :: Nil)
{noformat}
Because the empty "headers" values (in lines 2 and 3) are considered String at
the beginning, schema inference ignores the real nested data type (the struct
type of "headers" in line 1) and also treats "headers" in line 1 as
StringType, which is not what we expect.
> Null & empty string should not be considered as StringType at beginning in
> Json schema inferring
> -----------------------------------------------------------------------------------------------
>
> Key: SPARK-4856
> URL: https://issues.apache.org/jira/browse/SPARK-4856
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Cheng Hao
>
> We have data like:
> {noformat}
> TestSQLContext.sparkContext.parallelize(
>   """{"ip":"27.31.100.29","headers":{"Host":"1.abc.com","Charset":"UTF-8"}}""" ::
>   """{"ip":"27.31.100.29","headers":{}}""" ::
>   """{"ip":"27.31.100.29","headers":""}""" :: Nil)
> {noformat}
> Because the empty string in "headers" (line 3) is inferred as StringType,
> schema inference ignores the real nested data type (the struct type of
> "headers" in line 1), and "headers" ends up as StringType, which is not what
> we expect.
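
The inference problem described above can be illustrated with a small,
self-contained model. This is only a sketch under stated assumptions, not
Spark's actual implementation: the names JType, NullT, StringT, StructT,
InferSketch, merge and typeOfString are all hypothetical, and the rule that
conflicting types fall back to a string type is an assumption of the model.
The idea shown is that if an empty string is typed like null, merging it with
a struct keeps the struct.

```scala
// Hypothetical, simplified model of per-key type merging during schema
// inference. None of these names come from Spark.
sealed trait JType
case object NullT extends JType
case object StringT extends JType
final case class StructT(fields: Map[String, JType]) extends JType

object InferSketch {
  // Merge the types observed for the same key in two records.
  def merge(a: JType, b: JType): JType = (a, b) match {
    case (NullT, t) => t                      // null carries no type information
    case (t, NullT) => t
    case (StructT(f1), StructT(f2)) =>        // merge structs field by field
      val keys = f1.keySet ++ f2.keySet
      StructT(keys.map { k =>
        k -> merge(f1.getOrElse(k, NullT), f2.getOrElse(k, NullT))
      }.toMap)
    case (x, y) if x == y => x
    case _ => StringT                         // assumed fallback on conflict
  }

  // Assumption being illustrated: an empty string is typed like null,
  // so it cannot override a more specific type seen in another record.
  def typeOfString(s: String): JType = if (s.isEmpty) NullT else StringT
}
```

With the three "headers" values from the example, reducing
List[JType](StructT(Map("Host" -> StringT, "Charset" -> StringT)),
StructT(Map.empty), InferSketch.typeOfString("")) with InferSketch.merge keeps
the struct type in this model. If the empty string were typed as StringT
instead, the same merge would collapse "headers" to StringT, which corresponds
to the behavior reported here.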