Alexander Behm created IMPALA-6024:
--------------------------------------
Summary: Add minimum sample size for COMPUTE STATS TABLESAMPLE
Key: IMPALA-6024
URL: https://issues.apache.org/jira/browse/IMPALA-6024
Project: IMPALA
Issue Type: Sub-task
Components: Frontend
Affects Versions: Impala 2.10.0
Reporter: Alexander Behm
Assignee: Alexander Behm
We should introduce a minimum sample size in bytes for COMPUTE STATS
TABLESAMPLE. Reasons:
* For small tables, sampling does not make sense. Accurate stats can be obtained
cheaply without sampling.
* Very small sample sizes rarely make sense: some minimum amount of data is
required to get meaningful stats.
I think a 1GB minimum might be a good choice. Ideally, this minimum sample
size would be configurable.
Many other DBMSs support stats collection with sampling, and in many cases they
require a minimum sample size to get any meaningful stats.
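To make the proposal concrete, here is a minimal sketch of how such a minimum could be applied. It is an illustration only, not Impala's implementation: the function name, the parameter names, the 1 GiB default, and the integer sampling percentage are all assumptions made for this example.

```python
GIB = 1024 ** 3  # interprets the proposed "1GB" as 1 GiB; an assumption


def effective_sample_bytes(table_bytes, sample_perc, min_sample_bytes=GIB):
    """Return how many bytes to scan for stats (hypothetical helper).

    table_bytes      -- total size of the table in bytes
    sample_perc      -- requested sampling percentage, 0..100
    min_sample_bytes -- configurable lower bound on the sample size
    """
    if table_bytes < 0 or min_sample_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if not 0 <= sample_perc <= 100:
        raise ValueError("sample_perc must be between 0 and 100")

    # Size of the sample the user asked for.
    requested = table_bytes * sample_perc // 100
    # Raise the sample to the minimum, but never beyond the table itself.
    # A table smaller than the minimum is therefore scanned in full,
    # which is equivalent to not sampling at all.
    return min(table_bytes, max(requested, min_sample_bytes))


# A 500 MiB table sampled at 10% falls below the minimum: scan everything.
small = effective_sample_bytes(500 * 1024 ** 2, 10)
# A 100 GiB table sampled at 10% is well above the minimum: sample 10 GiB.
large = effective_sample_bytes(100 * GIB, 10)
# A 5 GiB table sampled at 1% is bumped up to the 1 GiB minimum.
bumped = effective_sample_bytes(5 * GIB, 1)
```

With this rule both reasons above are covered by one expression: small tables get full, accurate stats, and tiny samples on larger tables are raised to the minimum.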