nirav patel created SPARK-44991:
-----------------------------------
Summary: Spark json datasource reader and fromJson api having
inconsistent behavior
Key: SPARK-44991
URL: https://issues.apache.org/jira/browse/SPARK-44991
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.3.2
Reporter: nirav patel
The Spark JSON reader can infer the data type of each field. I am ingesting
millions of data points and generating a `DataFrameA`. I notice that schema
inference marks a field containing many integers and empty strings as a Long.
That behavior is fine with me: I don't set `primitivesAsString` because I do
want proper primitive type inference. I store `DataFrameA` into `TableA`.
Now, this inference behavior is not respected by the `fromJson` API when I try
to write new data to `TableA` using the schema produced by my inference
approach. That is, if I read a chunk of input data using
`spark.read.schema(fromJson(getStruct(TableA))).json('/path/to/more/data')`
the reader complains that an empty string cannot be cast to Long.
`getStruct(TableA)` is a pseudo-method that returns the `struct` of TableA's
schema somehow.
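To make the mismatch concrete, here is a minimal plain-Python sketch of the two behaviors as described in this report. It is a toy model, not Spark's implementation: the helpers `infer_type` and `parse_as_long` and the field name `id` are hypothetical, and the rule "empty strings are ignored during inference but rejected under an explicit Long schema" is taken from the description above rather than from Spark's source.

```python
import json


def infer_type(values):
    """Toy inference: None and "" carry no type information (assumption from the report)."""
    seen = set()
    for v in values:
        if v is None or v == "":
            continue  # skipped, so a column of ints and "" still infers as long
        if isinstance(v, bool):
            seen.add("boolean")
        elif isinstance(v, int):
            seen.add("long")
        elif isinstance(v, float):
            seen.add("double")
        else:
            seen.add("string")
    if seen == {"long"}:
        return "long"
    if seen and seen <= {"long", "double"}:
        return "double"
    return "string"  # fallback for mixed or empty columns in this toy model


def parse_as_long(value):
    """Toy strict parse under an explicit long schema: "" is rejected."""
    if value is None:
        return None
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise ValueError(f"cannot cast {value!r} to long")


# Hypothetical sample input: one JSON object per line, with an empty string mixed in.
lines = ['{"id": 1}', '{"id": ""}', '{"id": 42}']
values = [json.loads(line)["id"] for line in lines]

# Path 1: inference ignores the empty string.
inferred = infer_type(values)

# Path 2: re-reading the same values with the inferred type applied strictly.
errors = []
for v in values:
    try:
        parse_as_long(v)
    except ValueError as exc:
        errors.append(str(exc))

print(inferred)
print(errors)
```

Under these assumptions the same input is accepted by the first path and rejected by the second, which is the inconsistency being reported.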