[ 
https://issues.apache.org/jira/browse/SPARK-23448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed ZAROUI updated SPARK-23448:
---------------------------------
    Description: 
I have the following JSON file that contains some noisy data (a String instead of an 
Array):

 
{code:java}
{"attr1":"val1","attr2":"[\"val2\"]"}
{"attr1":"val1","attr2":["val2"]}
{code}
I need to specify the schema programmatically like this:

 
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

implicit val spark = SparkSession
  .builder()
  .master("local[*]")
  .config("spark.ui.enabled", false)
  .config("spark.sql.caseSensitive", "True")
  .getOrCreate()
import spark.implicits._

val schema = StructType(Seq(
  StructField("attr1", StringType, nullable = true),
  StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true)))

spark.read.schema(schema).json(input).collect().foreach(println)
{code}
The result given by this code is:
{code:java}
[null,null]
[val1,WrappedArray(val2)]
{code}
Instead of putting null only in the corrupted column, all columns of the first record 
are null.
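
For reference, a minimal sketch of a possible workaround (not the reporter's code): declaring a corrupt-record column in the schema lets the JSON reader's PERMISSIVE mode keep the raw malformed line in that column. The {{_corrupt_record}} column name and the {{input}} path are assumptions carried over from the snippet above.
{code:java}
// Sketch of a possible workaround (assumes the same `spark` session and `input` path as above).
// Declaring a corrupt-record column lets PERMISSIVE mode park the raw malformed line there.
import org.apache.spark.sql.types._

val schemaWithCorrupt = StructType(Seq(
  StructField("attr1", StringType, nullable = true),
  StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true),
  StructField("_corrupt_record", StringType, nullable = true)))

spark.read
  .schema(schemaWithCorrupt)
  .option("mode", "PERMISSIVE")                           // default mode: keep malformed rows instead of dropping them
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json(input)
  .show(truncate = false)
{code}
With the sample file above, the row containing the quoted array should then surface its original text in {{_corrupt_record}}, which makes it possible to filter or repair those lines instead of losing the whole record silently.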

 

 

  was:
I have the following JSON file that contains some noisy data (a String instead of an 
Array):

 
{code:java}
{"attr1":"val1","attr2":["val2"]} 
{"attr1":"val1","attr2":"[\"val2\"]"}
{code}
I need to specify the schema programmatically like this:

 
{code:java}
implicit val spark = SparkSession
  .builder()
  .master("local[*]")
  .config("spark.ui.enabled", false)
  .config("spark.sql.caseSensitive", "True")
  .getOrCreate()
import spark.implicits._

val schema = StructType(Seq(
  StructField("attr1", StringType, nullable = true),
  StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true)))

spark.read.schema(schema).json(input).collect().foreach(println)
{code}
The result given by this code is:
{code:java}
[null,null]
[val1,WrappedArray(val2)]
{code}
Instead of putting null only in the corrupted column, all columns of the first record 
are null.

 

 


> Data encoding problem when not finding the right type
> -----------------------------------------------------
>
>                 Key: SPARK-23448
>                 URL: https://issues.apache.org/jira/browse/SPARK-23448
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: Tested locally in linux machine
>            Reporter: Ahmed ZAROUI
>            Priority: Major
>
> I have the following JSON file that contains some noisy data (a String instead 
> of an Array):
>  
> {code:java}
> {"attr1":"val1","attr2":"[\"val2\"]"}
> {"attr1":"val1","attr2":["val2"]}
> {code}
> I need to specify the schema programmatically like this:
>  
> {code:java}
> implicit val spark = SparkSession
>   .builder()
>   .master("local[*]")
>   .config("spark.ui.enabled", false)
>   .config("spark.sql.caseSensitive", "True")
>   .getOrCreate()
> import spark.implicits._
> val schema = StructType(Seq(
>   StructField("attr1", StringType, nullable = true),
>   StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true)))
> spark.read.schema(schema).json(input).collect().foreach(println)
> {code}
> The result given by this code is:
> {code:java}
> [null,null]
> [val1,WrappedArray(val2)]
> {code}
> Instead of putting null only in the corrupted column, all columns of the first record 
> are null.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
