[jira] [Comment Edited] (FLINK-28939) Release Testing: Verify FLIP-241 ANALYZE TABLE

Yunhong Zheng (Jira) Mon, 22 Aug 2022 02:59:57 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-28939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582854#comment-17582854
 ]


Yunhong Zheng edited comment on FLINK-28939 at 8/22/22 9:58 AM:
----------------------------------------------------------------

Hi [~godfreyhe], I have tested `analyze table` based on the examples in the 
[PR docs|https://github.com/apache/flink/pull/20506]. Almost all cases met 
expectations, but I found the following problems:
 # When a String/varchar column `a` contains a `null` value, `Analyze table 
xxx FOR ALL COLUMNS` or `Analyze table xxx FOR COLUMNS a` may throw an error:
{code:java}
[ERROR] Could not execute SQL statement. Reason:
org.apache.thrift.protocol.TProtocolException: Required field 'maxColLen' is 
unset! Struct:StringColumnStatsData(maxColLen:0, avgColLen:0.0, numNulls:1, 
numDVs:0) {code}

 # If three columns `a, b, c` already have column stats and I analyze only 
column `a` using `Analyze table xxx FOR COLUMNS a`, the existing column stats 
of `b, c` are reset to empty. (Is this expected?)
 # For a partitioned table in a Hive catalog, after `Analyze table xxx FOR 
ALL COLUMNS`, the result of the Hive statement `desc formatted orders 
amount;` (which doesn't specify a partition) is wrong. I think the reason is 
that this FLIP has no column stats merge logic, like 
'FlinkRecomputeStatisticsProgram', that merges the stats and writes them to 
the catalog.
 # When the error in `1.` is thrown, I find that some of the column stats are 
successfully written to the catalog and some are not. Is this expected? Will 
this produce incorrect column stats?
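
To make the "merge logic" in item 3 concrete, here is a minimal sketch of 
folding the per-partition stats of one string column into a single 
table-level entry. It is not the code of 'FlinkRecomputeStatisticsProgram' or 
of Hive. The class names and the `rowCount` field are hypothetical, the other 
field names are borrowed from the error output in item 1, and weighting 
`avgColLen` by non-null rows is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Flink or Hive code. */
final class ColumnStatsMergeSketch {

    /** Stats of one string column in one partition (rowCount is an assumed extra field). */
    static final class StringColStats {
        final long rowCount;
        final long numNulls;
        final long numDVs;
        final long maxColLen;
        final double avgColLen;

        StringColStats(long rowCount, long numNulls, long numDVs,
                       long maxColLen, double avgColLen) {
            this.rowCount = rowCount;
            this.numNulls = numNulls;
            this.numDVs = numDVs;
            this.maxColLen = maxColLen;
            this.avgColLen = avgColLen;
        }
    }

    /** Folds per-partition stats into one table-level entry. */
    static StringColStats merge(List<StringColStats> partitions) {
        long rows = 0, nulls = 0, ndv = 0, maxLen = 0, nonNullRows = 0;
        double totalLen = 0.0;
        for (StringColStats p : partitions) {
            rows += p.rowCount;
            nulls += p.numNulls;
            // Distinct values can overlap across partitions, so the true count lies
            // between the largest per-partition value and the sum. This sketch keeps
            // the lower bound.
            ndv = Math.max(ndv, p.numDVs);
            maxLen = Math.max(maxLen, p.maxColLen);
            // Assumption: avgColLen is an average over non-null rows only.
            long nonNull = p.rowCount - p.numNulls;
            totalLen += p.avgColLen * nonNull;
            nonNullRows += nonNull;
        }
        double avgLen = nonNullRows == 0 ? 0.0 : totalLen / nonNullRows;
        return new StringColStats(rows, nulls, ndv, maxLen, avgLen);
    }

    public static void main(String[] args) {
        StringColStats merged = merge(Arrays.asList(
                new StringColStats(4, 1, 3, 5, 4.0),
                new StringColStats(2, 0, 2, 8, 7.0)));
        System.out.println("numNulls=" + merged.numNulls
                + ", numDVs>=" + merged.numDVs
                + ", maxColLen=" + merged.maxColLen
                + ", avgColLen=" + merged.avgColLen);
    }
}
```

Without a step like this, a table-level query such as `desc formatted orders 
amount;` has nothing to read even though every partition has stats.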

These are the problems I found. I will send screenshots of some successful 
examples after these problems are fixed. Thank you for your contribution, 
[~godfreyhe].


> Release Testing: Verify FLIP-241 ANALYZE TABLE
> ----------------------------------------------
>
>                 Key: FLINK-28939
>                 URL: https://issues.apache.org/jira/browse/FLINK-28939
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API, Table SQL / Planner
>    Affects Versions: 1.16.0
>            Reporter: godfrey he
>            Assignee: Yunhong Zheng
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.16.0
>
>
> This issue aims to verify FLIP-240: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481
> We can verify it in SQL client after we build the flink-dist package. 
> 1. create a partitioned table and a non-partitioned table (with/without 
> computed columns/metadata columns, with different columns), and then insert 
> some data
> 2. verify the different statements, please refer to the FLIP doc examples
> 3. verify the result in the catalog. Currently, the {{describe extended}} 
> statement does not support showing the statistics in the catalog, so we 
> should write some code to get the statistics from the catalog, or we can 
> use the hive cli if the catalog is a hive catalog
> 4. verify the unsupported cases:
> 4.1. analyze a nonexistent table
> 4.2. analyze a view
> 4.3. analyze a partitioned table with a nonexistent partition
> 4.4. analyze a non-partitioned table with a partition
> 4.5. analyze a nonexistent column
> 4.6. analyze a computed column
> 4.7. analyze a metadata column



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
