[
https://issues.apache.org/jira/browse/FLINK-28939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582854#comment-17582854
]
Yunhong Zheng edited comment on FLINK-28939 at 8/22/22 9:58 AM:
----------------------------------------------------------------
Hi [~godfreyhe], I have tested `analyze table` based on the examples in the [PR
docs|https://github.com/apache/flink/pull/20506]. Almost all cases met
expectations, but the following problems were found:
# When a String/varchar column `a` contains a `null` value, `Analyze table xxx
FOR ALL COLUMNS/ COLUMNS a` may throw an error:
{code:java}
[ERROR] Could not execute SQL statement. Reason:
org.apache.thrift.protocol.TProtocolException: Required field 'maxColLen' is
unset! Struct:StringColumnStatsData(maxColLen:0, avgColLen:0.0, numNulls:1,
numDVs:0)
{code}
# If three columns `a, b, c` already have column stats and I analyze only
column `a` using `Analyze table xxx FOR COLUMNS a`, the existing column stats
of `b, c` are reset to empty. (Is this expected?)
# For a partitioned table in a Hive catalog, after `Analyze table xxx FOR
ALL COLUMNS`, the result of the Hive statement `desc formatted orders
amount;` (which doesn't specify a partition) is wrong. I think the reason is
that this FLIP has no logic, like 'FlinkRecomputeStatisticsProgram', to merge
the column stats and write them to the catalog.
# When the error in `1.` is thrown, I find that some of the column stats are
successfully written to the catalog and some are not. Is this expected? Will
this produce incorrect column stats?
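To make the question in `2.` concrete, here is a minimal sketch of the two possible update semantics. It is only an illustration and not Flink's actual code: the class `ColumnStatsUpdate`, its methods, and the use of a plain map from column name to a single stat value (for example a distinct-value count) are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Flink.
class ColumnStatsUpdate {

    // Replace semantics: the stored stats are discarded, so columns that
    // were not analyzed this time (b, c) end up with no stats at all.
    static Map<String, Long> replace(Map<String, Long> existing, Map<String, Long> analyzed) {
        return new TreeMap<>(analyzed);
    }

    // Merge semantics: only the analyzed columns are overwritten,
    // stats of the other columns are kept as they were.
    static Map<String, Long> merge(Map<String, Long> existing, Map<String, Long> analyzed) {
        Map<String, Long> result = new TreeMap<>(existing);
        result.putAll(analyzed);
        return result;
    }

    public static void main(String[] args) {
        // Existing stats for columns a, b, c (made-up values).
        Map<String, Long> existing = new TreeMap<>();
        existing.put("a", 10L);
        existing.put("b", 20L);
        existing.put("c", 30L);

        // Result of analyzing only column a.
        Map<String, Long> analyzed = new TreeMap<>();
        analyzed.put("a", 12L);

        System.out.println("replace: " + replace(existing, analyzed).keySet()); // [a]
        System.out.println("merge:   " + merge(existing, analyzed).keySet());   // [a, b, c]
    }
}
```

The behavior I observed matches `replace`; if the stats of `b, c` are expected to survive, something like `merge` would be needed before writing to the catalog.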
Those are the problems I found. I will send screenshots of some successful
examples after these problems are fixed. Thank you for your contribution,
[~godfreyhe].
> Release Testing: Verify FLIP-241 ANALYZE TABLE
> ----------------------------------------------
>
> Key: FLINK-28939
> URL: https://issues.apache.org/jira/browse/FLINK-28939
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / API, Table SQL / Planner
> Affects Versions: 1.16.0
> Reporter: godfrey he
> Assignee: Yunhong Zheng
> Priority: Blocker
> Labels: release-testing
> Fix For: 1.16.0
>
>
> This issue aims to verify FLIP-240:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> We can verify it in the SQL client after we build the flink-dist package.
> 1. create a partitioned table and a non-partitioned table (with/without
> computed column/metadata column, with different columns), and then insert
> some data
> 2. verify the different statements; please refer to the FLIP doc examples
> 3. verify the result in the catalog. Currently, the {{describe extended}}
> statement does not support showing the statistics in the catalog, so we
> should write some code to get the statistics from the catalog, or we can use
> the Hive CLI if the catalog is a Hive catalog
> 4. verify the unsupported cases:
> 4.1 analyze a non-existent table
> 4.2 analyze a view
> 4.3 analyze a partitioned table with a non-existent partition
> 4.4 analyze a non-partitioned table with a partition
> 4.5 analyze a non-existent column
> 4.6 analyze a computed column
> 4.7 analyze a metadata column
--
This message was sent by Atlassian Jira
(v8.20.10#820010)