[ https://issues.apache.org/jira/browse/SPARK-23448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371120#comment-16371120 ]
Apache Spark commented on SPARK-23448:
--------------------------------------

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/20648

> Dataframe returns wrong result when column don't respect datatype
> -----------------------------------------------------------------
>
>                 Key: SPARK-23448
>                 URL: https://issues.apache.org/jira/browse/SPARK-23448
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>         Environment: Local
>            Reporter: Ahmed ZAROUI
>            Priority: Major
>
> I have the following JSON file that contains some noisy data (a String instead of an Array):
>
> {code:java}
> {"attr1":"val1","attr2":"[\"val2\"]"}
> {"attr1":"val1","attr2":["val2"]}
> {code}
>
> I need to specify the schema programmatically, like this:
>
> {code:scala}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.types._
>
> implicit val spark = SparkSession
>   .builder()
>   .master("local[*]")
>   .config("spark.ui.enabled", false)
>   .config("spark.sql.caseSensitive", "True")
>   .getOrCreate()
> import spark.implicits._
>
> // Path to the JSON file shown above (hypothetical name; the report does not give it).
> val input = "input.json"
>
> val schema = StructType(
>   Seq(StructField("attr1", StringType, true),
>       StructField("attr2", ArrayType(StringType, true), true)))
> spark.read.schema(schema).json(input).collect().foreach(println)
> {code}
>
> The result given by this code is:
>
> {code:java}
> [null,null]
> [val1,WrappedArray(val2)]
> {code}
>
> Instead of putting null only in the corrupted column (attr2), Spark sets all columns of the first record to null.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
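The report does not show how to find out which input record was rejected. A minimal sketch, assuming a spark-shell-style Scala session on Spark 2.x, is to add a string column to the schema and name it through the existing `columnNameOfCorruptRecord` reader option, so that in `PERMISSIVE` mode the raw text of a record that cannot be converted to the schema is kept. The column name `_corrupt_record`, the temporary file, and the `rows` value are illustrative choices, not part of the report. This does not necessarily give the per-column nulls the reporter expected; it only makes the rejected record visible.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-23448-corrupt-record") // illustrative app name
  .getOrCreate()

// Write the two sample records from the report to a temporary JSON Lines file
// (hypothetical location; the report reads from an unspecified `input`).
val input = Files.createTempFile("spark-23448", ".json")
val content =
  """{"attr1":"val1","attr2":"[\"val2\"]"}
    |{"attr1":"val1","attr2":["val2"]}
    |""".stripMargin
Files.write(input, content.getBytes(StandardCharsets.UTF_8))

// Same schema as in the report, plus a string column that receives the raw
// text of any record the parser could not convert to the declared types.
val schema = StructType(Seq(
  StructField("attr1", StringType, nullable = true),
  StructField("attr2", ArrayType(StringType, containsNull = true), nullable = true),
  StructField("_corrupt_record", StringType, nullable = true)))

val rows: Array[Row] = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE") // the default mode, stated explicitly
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json(input.toString)
  .collect()

rows.foreach(println)
```

Filtering on `_corrupt_record IS NOT NULL` is then one way to separate the noisy records from the well-typed ones before further processing.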