[ https://issues.apache.org/jira/browse/SPARK-44991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nirav patel updated SPARK-44991:
--------------------------------
Summary: Spark json schema inference and fromJson api having inconsistent
behavior (was: Spark json datasource reader and fromJson api having
inconsistent behavior)
> Spark json schema inference and fromJson api having inconsistent behavior
> -------------------------------------------------------------------------
>
> Key: SPARK-44991
> URL: https://issues.apache.org/jira/browse/SPARK-44991
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: nirav patel
> Priority: Major
>
> The Spark JSON reader can infer the data type of each field. I am ingesting
> millions of data points and generating a `DataFrameA`. I notice that schema
> inference marks a field containing many integers and empty strings as Long.
> That behavior is fine, since I don't set `primitivesAsString` because I do
> want primitive type inference. I then store `DataFrameA` into `TableA`.
>
> However, this inference behavior is not respected by the `fromJson` or
> `from_json` API when I try to write new data to `TableA`. That is, if I read
> a chunk of input data using
> `spark.read.schema(fromJson(getStruct(TableA))).json('/path/to/more/data')`,
> the reader complains that an empty string cannot be cast to Long.
> `getStruct(TableA)` is a pseudo-method that somehow returns the `struct` of
> TableA's schema, and `/path/to/more/data` contains empty strings as values
> for this field.
>
> I think that if the reader doesn't complain about empty strings during
> schema inference, it shouldn't complain when reading without inference
> either. Maybe treat an empty string as null, just as schema inference does,
> or at least provide an additional option, treatEmptyAsNull, so that the
> behavior is more explicit for application users?
>
> PS: I marked this as a bug, but it may be better suited as an improvement.
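
One possible stopgap, while no such option exists, is to normalize the input before it reaches the reader, so that empty strings arrive as JSON null. The sketch below is plain Python with no Spark dependency. It assumes line-delimited JSON input, and the helper names `empty_strings_to_null` and `normalize_json_lines` are hypothetical, not part of any Spark API.

```python
import json
from typing import Any, Iterable, Iterator


def empty_strings_to_null(value: Any) -> Any:
    """Recursively replace empty-string values with None (JSON null)."""
    if isinstance(value, dict):
        return {k: empty_strings_to_null(v) for k, v in value.items()}
    if isinstance(value, list):
        return [empty_strings_to_null(v) for v in value]
    return None if value == "" else value


def normalize_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-blank JSON line with empty strings rewritten to null."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield json.dumps(empty_strings_to_null(json.loads(line)))


if __name__ == "__main__":
    # Hypothetical sample records: "count" mixes integers and empty strings.
    rows = ['{"id": 1, "count": ""}', '{"id": 2, "count": 7}']
    for out in normalize_json_lines(rows):
        print(out)
```

Note that this rewrites every empty string, including those in fields that are genuinely string-typed, so it only fits data where an empty string and null mean the same thing.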