[jira] [Created] (SPARK-44991) Spark json datasource reader and fromJson api having inconsistent behavior

nirav patel (Jira) Mon, 28 Aug 2023 12:13:21 -0700

nirav patel created SPARK-44991:
-----------------------------------

             Summary: Spark json datasource reader and fromJson api having 
inconsistent behavior
                 Key: SPARK-44991
                 URL: https://issues.apache.org/jira/browse/SPARK-44991
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.2
            Reporter: nirav patel



The Spark JSON reader can infer the data type of each field. I am ingesting 
millions of data points and generating a `DataFrameA`. I notice that schema 
inference marks a field containing many integers and empty strings as Long. 
That behavior is fine: I don't set `primitivesAsString` because I do want 
proper primitive type inference. I store `DataFrameA` in `TableA`.

However, this inference behavior is not respected by the `fromJson` API when I 
try to write new data to `TableA` using the schema obtained through inference. 
That is, if I read a chunk of input data using 
`spark.read.schema(fromJson(getStruct(TableA))).json('/path/to/more/data')`, 
the reader complains that an empty string cannot be cast to Long. 
`getStruct(TableA)` is a pseudo-method that somehow returns the `struct` of 
TableA's schema.
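
To make the failing input concrete, here is a minimal plain-Python sketch (no Spark needed) of records that mix integers and empty strings, together with one possible pre-processing workaround. It is not part of the original report. The helper `blank_to_null`, the field names, and `TABLE_A_SCHEMA` are hypothetical; the schema dict is only assumed to follow the layout that PySpark's `StructType.jsonValue()` produces. The helper looks at top-level fields only.

```python
import json

# Hypothetical schema for TableA, shaped like StructType.jsonValue() output.
TABLE_A_SCHEMA = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
    ],
}

# Simple numeric type names as they appear in Spark's schema JSON.
NUMERIC_TYPES = {"byte", "short", "integer", "long", "float", "double"}


def blank_to_null(line, schema):
    """Parse one JSON line; replace "" with null in top-level numeric fields."""
    record = json.loads(line)
    for field in schema["fields"]:
        type_name = field["type"]
        # Nested types are dicts in the schema JSON; this sketch skips them.
        if not isinstance(type_name, str) or type_name not in NUMERIC_TYPES:
            continue
        if record.get(field["name"]) == "":
            record[field["name"]] = None
    return json.dumps(record)


# Sample input: the second record has an empty string where a Long is expected.
lines = ['{"id": 1, "name": "a"}', '{"id": "", "name": ""}']
cleaned = [blank_to_null(line, TABLE_A_SCHEMA) for line in lines]
```

After this step the empty string in the `id` field becomes a JSON null, while the empty string in the string field `name` is left untouched. Whether cleaning the input like this is acceptable depends on the pipeline; it only sidesteps the mismatch described above and does not explain it.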



