[
https://issues.apache.org/jira/browse/HUDI-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-8556:
--------------------------------------
Description:
Currently, out of the box, we generate column stats for all top-level fields.
This can be prohibitively expensive for wide tables with, for example, 1000
columns, so we should limit it to, say, the first 32 top-level columns for good
out-of-the-box performance.
Users will still have the option to override this if need be.
Let's add a config to drive this; out of the box, we can set it to 32.
was:
Currently, out of the box, we generate column stats for all top-level fields.
This can be prohibitively expensive for wide tables with, for example, 1000
columns, so we should limit it to, say, the first 32 top-level columns for good
out-of-the-box performance.
Users will still have the option to override this if need be.
> Trim the number of columns to generate col stats out of the box
> ---------------------------------------------------------------
>
> Key: HUDI-8556
> URL: https://issues.apache.org/jira/browse/HUDI-8556
> Project: Apache Hudi
> Issue Type: Improvement
> Components: dataskipping, metadata
> Reporter: sivabalan narayanan
> Assignee: Jonathan Vexler
> Priority: Blocker
> Fix For: 1.0.0
>
>
> Currently, out of the box, we generate column stats for all top-level fields.
> This can be prohibitively expensive for wide tables with, for example, 1000
> columns, so we should limit it to, say, the first 32 top-level columns for
> good out-of-the-box performance.
> Users will still have the option to override this if need be.
>
> Let's add a config to drive this; out of the box, we can set it to 32.
>
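
The selection rule described above can be sketched roughly as follows. This is only an illustration of the proposed behavior, not Hudi code: the function name `columns_to_index` and the parameters `explicit_columns` and `max_columns` are hypothetical, and the assumption that a user-supplied column list takes precedence over the cap is one reading of "users will have an option to override".

```python
from typing import List, Optional, Sequence

# Default proposed in the issue; the config key that would carry it is not named here.
DEFAULT_MAX_COLUMNS = 32


def columns_to_index(
    top_level_fields: Sequence[str],
    explicit_columns: Optional[Sequence[str]] = None,  # hypothetical user override
    max_columns: int = DEFAULT_MAX_COLUMNS,            # hypothetical cap setting
) -> List[str]:
    """Pick the top-level columns for which column stats would be generated."""
    if max_columns < 0:
        raise ValueError("max_columns must be non-negative")
    if explicit_columns:
        # Assumption: an explicit list wins over the cap; unknown names are dropped.
        known = set(top_level_fields)
        return [c for c in explicit_columns if c in known]
    # Default path: keep only the first max_columns fields, in schema order.
    return list(top_level_fields[:max_columns])


if __name__ == "__main__":
    wide_schema = [f"col_{i}" for i in range(1000)]
    selected = columns_to_index(wide_schema)
    print(len(selected), selected[0], selected[-1])
```

With a 1000-column schema and no override, only the first 32 fields in schema order are kept, so the cost of generating stats no longer grows with table width unless the user raises the cap or names the columns explicitly.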