Qiegang Long created SPARK-55568:
------------------------------------
Summary: Separate schema construction from field statistics collection
Key: SPARK-55568
URL: https://issues.apache.org/jira/browse/SPARK-55568
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.1.1, 4.1.0
Reporter: Qiegang Long
Variant shredding schema inference is expensive and can take well over 100 ms
per file. The proposed optimization separates field statistics collection from
schema construction, which eliminates repeated schema merges and intermediate
allocations.
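
The idea can be illustrated with a minimal Python sketch. This is not Spark's
implementation: the function names (type_name, collect_stats, build_schema), the
type labels, and the min_fraction threshold are all hypothetical and chosen only
for illustration. Phase 1 makes a single pass over the rows and records type
counts per field path in a flat map, without building or merging any schema
objects. Phase 2 builds the schema once from the aggregated counts.

```python
from collections import Counter, defaultdict


def type_name(value):
    """Map a Python value to a coarse type label (labels are hypothetical)."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "null"


def collect_stats(rows):
    """Phase 1: count observed types per field path; no schema is built here."""
    stats = defaultdict(Counter)
    for row in rows:
        stack = [((), row)]
        while stack:
            path, value = stack.pop()
            for key, child in value.items():
                child_path = path + (key,)
                stats[child_path][type_name(child)] += 1
                if isinstance(child, dict):
                    stack.append((child_path, child))
    return stats


def build_schema(stats, total_rows, min_fraction=0.5):
    """Phase 2: construct the schema once from the aggregated statistics.

    A field is kept when its most common type appears in at least
    min_fraction of the rows (an assumed rule, for illustration only).
    """
    schema = {}
    # Sorting tuples places every parent path before its children.
    for path in sorted(stats):
        type_label, hits = stats[path].most_common(1)[0]
        if type_label == "null" or hits / total_rows < min_fraction:
            continue
        parent = schema
        for key in path[:-1]:
            parent = parent.get(key)
            if not isinstance(parent, dict):
                break  # an ancestor was dropped or is not an object
        else:
            parent[path[-1]] = {} if type_label == "object" else type_label
    return schema


rows = [
    {"id": 1, "user": {"name": "a", "age": 30}},
    {"id": 2, "user": {"name": "b"}, "tag": "x"},
    {"id": 3, "user": {"name": "c", "age": 41}},
]
stats = collect_stats(rows)
schema = build_schema(stats, total_rows=len(rows))
print(schema)
```

In this sketch the per-row work is limited to counter updates, and the only
schema object allocated is the final one. A merge-per-row approach would instead
build a small schema for each row and merge it into a running result, which is
the repeated merging and intermediate allocation the proposal aims to avoid.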
--
This message was sent by Atlassian Jira
(v8.20.10#820010)