[
https://issues.apache.org/jira/browse/SPARK-55568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059080#comment-18059080
]
Qiegang Long commented on SPARK-55568:
--------------------------------------
Hi [~David Cashman] & [~cloud_fan],
As you wrote the original version, could you take a look at my pull request to
see if the enhancement makes sense?
[https://github.com/apache/spark/pull/54343]
Thanks!
> Separate schema construction from field statistics collection
> -------------------------------------------------------------
>
> Key: SPARK-55568
> URL: https://issues.apache.org/jira/browse/SPARK-55568
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.1.0, 4.1.1
> Reporter: Qiegang Long
> Priority: Major
> Labels: pull-request-available
>
> Variant shredding schema inference is expensive and can take well over 100 ms
> per file. This issue proposes an optimization that separates field statistics
> collection from schema construction, eliminating repeated schema merges and
> intermediate allocations.