[ 
https://issues.apache.org/jira/browse/HUDI-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937133#comment-17937133
 ] 

Sagar Sumit commented on HUDI-8384:
-----------------------------------

In `TestColumnStatsIndex`, the following tests cover the scenarios listed in 
the description:
testMetadataColumnStatsIndexInitializationWithUpserts, 
testMetadataColumnStatsIndexCompactionWithSQL: cases 1, 4, 6
testMetadataColumnStatsIndexInitializationWithRollbacks, testMORDeleteBlocks: 
case 2
testPartitionStatsWithClustering: case 3
testColStatsWithCleanCOW: case 7
Case 10: we have tests for most of the write operations; the exceptions are 
insert_overwrite and insert_overwrite_table.
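
As a quick cross-check of the mapping above, the remaining cases can be derived with plain set arithmetic. This is only a bookkeeping sketch: the test names and case numbers are copied from this comment, while the variable names are made up for illustration and nothing here touches Hudi itself.

```python
# Cases 1..10 from the issue description.
ALL_CASES = set(range(1, 11))

# Cases fully covered by existing tests in TestColumnStatsIndex,
# as listed in this comment.
covered_by = {
    ("testMetadataColumnStatsIndexInitializationWithUpserts",
     "testMetadataColumnStatsIndexCompactionWithSQL"): {1, 4, 6},
    ("testMetadataColumnStatsIndexInitializationWithRollbacks",
     "testMORDeleteBlocks"): {2},
    ("testPartitionStatsWithClustering",): {3},
    ("testColStatsWithCleanCOW",): {7},
}

# Case 10 is only partially covered (insert_overwrite and
# insert_overwrite_table are missing), so it still counts as remaining.
partially_covered = {10}

fully_covered = set().union(*covered_by.values())
remaining = sorted(ALL_CASES - fully_covered)

print("remaining cases:", remaining)
```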
 
In summary, we need the following tests:
 
Case 5: Let's add a test for async compaction and validate, i.e. some log files 
are added to a new phantom file slice and the stats remain intact.
Case 8: Let's trigger rollbacks and validate, i.e. insert, then update 
(partially failed). Validate that only stats pertaining to the inserts are 
reflected. Trigger a rollback and validate that the stats are still the same. 
Retry the updates; the stats should then reflect the updated records. Here, we 
can extend `testMetadataColumnStatsIndexInitializationWithRollbacks` to do 
updates after the rollback.
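
The Case 8 sequence can be sketched with a toy model to make the expected assertions concrete. This is not Hudi code: `ToyColStats`, its methods, the record keys, and the single integer column are all hypothetical, and the model simply assumes that stats are computed from committed writes only. The real test would go through the Hudi writer and the metadata table instead.

```python
class ToyColStats:
    """Toy stand-in for column stats over one integer column.

    Hypothetical illustration only, not a Hudi API.
    """

    def __init__(self):
        self.committed = {}   # record key -> value, visible to readers
        self.inflight = None  # records of a write that has not committed

    def begin_write(self, records):
        self.inflight = dict(records)

    def commit(self):
        self.committed.update(self.inflight)
        self.inflight = None

    def rollback(self):
        # Discard the partially failed write; committed data is untouched.
        self.inflight = None

    def stats(self):
        # (min, max) over committed values only.
        values = self.committed.values()
        return (min(values), max(values))


index = ToyColStats()

# 1. Insert and commit.
index.begin_write({"k1": 10, "k2": 20, "k3": 30})
index.commit()
stats_after_insert = index.stats()

# 2. Update that fails part-way: it is started but never committed.
index.begin_write({"k1": 5, "k3": 99})
stats_with_failed_update = index.stats()

# 3. Roll back the failed update.
index.rollback()
stats_after_rollback = index.stats()

# 4. Retry the same updates and commit this time.
index.begin_write({"k1": 5, "k3": 99})
index.commit()
stats_after_retry = index.stats()

# Only the inserts are reflected until the retried update commits.
assert stats_after_insert == (10, 30)
assert stats_with_failed_update == stats_after_insert
assert stats_after_rollback == stats_after_insert
assert stats_after_retry == (5, 99)
```

The update values are chosen so that both the min and the max move, which makes a leaked partial write visible in the stats.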

Case 9: Let's add one long-running test, i.e. with 20+ commits and aggressive 
cleaning and archival, just for sanity. Alternatively, if we can enable all 
kinds of indexes in an existing sanity test, we should be good.

Case 10: only for insert_overwrite and insert_overwrite_table.
 
 

> Write functional tests for Cols stats partition 
> ------------------------------------------------
>
>                 Key: HUDI-8384
>                 URL: https://issues.apache.org/jira/browse/HUDI-8384
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: Sagar Sumit
>            Priority: Critical
>             Fix For: 1.0.2
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
>  
> We need to ensure that we cover the following cases for basic col stats 
> certification:
>  # Insert a few records and validate. Update the same records and validate 
> that the updates are reflected. Repeat the updates and validate the stats. 
> For MOR, trigger compaction and validate.
>  # For MOR, let's ensure we cover all log block types (data blocks, delete 
> blocks, and rollback blocks).
>  # Trigger clustering on top of 1 and validate the stats. a. For MOR, let's 
> trigger clustering before compaction and also after compaction. Ensure that 
> no stats are available for the replaced file groups.
>  # Insert a few records, update them, and delete a subset of records such 
> that the min and max values are affected. Validate.
>  # Let's add a test for async compaction and validate, i.e. some log files 
> are added to a new phantom file slice and the stats remain intact.
>  # Let's have a test for a non-partitioned table.
>  # Trigger a clean and ensure the cleaned-up files are deleted from col 
> stats. It should not even return null stats.
>  # Let's trigger rollbacks and validate, i.e. insert, then update (partially 
> failed). Validate that only stats pertaining to the inserts are reflected. 
> Trigger a rollback and validate that the stats are still the same. Retry the 
> updates; the stats should then reflect the updated records.
>  # Let's add one long-running test, i.e. with 20+ commits and aggressive 
> cleaning and archival, just for sanity. Alternatively, if we can enable all 
> kinds of indexes in an existing sanity test, we should be good.
>  # Let's test all write operations: bulk_insert, insert, upsert, delete, 
> insert_overwrite, insert_overwrite_table, delete_partition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
