[
https://issues.apache.org/jira/browse/HUDI-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937133#comment-17937133
]
Sagar Sumit commented on HUDI-8384:
-----------------------------------
In `TestColumnStatsIndex`, the following tests cover the scenarios listed in
the description:
testMetadataColumnStatsIndexInitializationWithUpserts,
testMetadataColumnStatsIndexCompactionWithSQL: cases 1, 4, 6
testMetadataColumnStatsIndexInitializationWithRollbacks, testMORDeleteBlocks:
case 2
testPartitionStatsWithClustering: case 3
testColStatsWithCleanCOW: case 7
Case 10: we have tests for most of the write operations, except
insert_overwrite and insert_overwrite_table.
In summary, we need the following tests:
Case 5: let's add a test for async compaction and validate, i.e. some log
files are added to a new phantom file slice and the stats are intact.
Case 8: let's trigger rollbacks and validate, i.e. insert, then update
(partially failed). Validate that only stats pertaining to the inserts are
reflected. Trigger a rollback and validate that the stats are still the same.
Retry the updates; the stats should then reflect the updated records. Here,
we can extend `testMetadataColumnStatsIndexInitializationWithRollbacks` to do
updates after the rollback.
Case 9: let's add one long-running test, i.e. with 20+ commits and an
aggressive cleaner and archival, just for sanity. Alternatively, if we can
enable all kinds of indexes in an existing sanity test, we should be good.
Case 10: only for insert_overwrite and insert_overwrite_table.
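
To make the expected assertions for Case 8 concrete, here is a minimal sketch
of the sequence as a toy model in Python. It does not use Hudi's API:
`ToyColStatsTable`, its methods (`stage`, `complete`, `rollback`,
`col_stats`), the file group id `fg-1`, and the commit times are all
hypothetical names chosen for illustration. The only assumption it encodes is
that column stats reflect completed commits, so a partially failed update and
its rollback leave the stats unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColStatsTable:
    """Toy stand-in for a table with a column stats index (not Hudi's API)."""

    # file group id -> list of (commit_time, column values), completed commits only
    slices: dict = field(default_factory=dict)
    # staged writes of a commit that has not completed yet
    inflight: dict = field(default_factory=dict)

    def stage(self, commit_time, writes):
        """Stage writes (file group id -> column values) without completing them."""
        self.inflight = {"time": commit_time, "writes": writes}

    def complete(self):
        """Complete the staged commit, making its file slices visible."""
        for file_group, values in self.inflight["writes"].items():
            self.slices.setdefault(file_group, []).append(
                (self.inflight["time"], values)
            )
        self.inflight = {}

    def rollback(self):
        """Discard the staged commit; completed slices are untouched."""
        self.inflight = {}

    def col_stats(self):
        """Return (min, max) per file group from the latest completed slice."""
        stats = {}
        for file_group, versions in self.slices.items():
            _, values = max(versions, key=lambda version: version[0])
            stats[file_group] = (min(values), max(values))
        return stats


table = ToyColStatsTable()

# 1. Insert completes.
table.stage("001", {"fg-1": [10, 20, 30]})
table.complete()
after_insert = table.col_stats()

# 2. Update fails partway: it is staged but never completed.
table.stage("002", {"fg-1": [10, 20, 99]})
after_failed_update = table.col_stats()

# 3. Rollback of the failed update.
table.rollback()
after_rollback = table.col_stats()

# 4. Retry of the update completes.
table.stage("003", {"fg-1": [10, 20, 99]})
table.complete()
after_retry = table.col_stats()

# The four validation points described in Case 8.
assert after_failed_update == after_insert
assert after_rollback == after_insert
assert after_retry != after_insert
```

In the real test, the same four checkpoints would be validated against the
column stats partition of the metadata table rather than against an in-memory
dictionary.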
> Write functional tests for Cols stats partition
> ------------------------------------------------
>
> Key: HUDI-8384
> URL: https://issues.apache.org/jira/browse/HUDI-8384
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: Sagar Sumit
> Priority: Critical
> Fix For: 1.0.2
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
>
> We need to ensure that we cover the following cases for basic col stats
> certification:
> # Insert a few records and validate. Update the same records and validate
> that the updates are reflected. Repeat the updates and validate the stats.
> For MOR, trigger compaction and validate.
> # For MOR, let's ensure we cover all log block types (data blocks, delete
> blocks, and rollback blocks).
> # Trigger clustering on top of 1 and validate the stats. a. For MOR, let's
> trigger clustering before compaction and also after compaction. Ensure that
> no stats are available for the replaced file groups.
> # Insert a few records, update them, and delete a subset of records, which
> should impact the min and max values. Validate.
> # Let's add a test for async compaction and validate, i.e. some log files are
> added to a new phantom file slice and the stats are intact.
> # Let's have a test for a non-partitioned table.
> # Trigger clean and ensure cleaned-up files are deleted from col stats.
> It should not even return null stats.
> # Let's trigger rollbacks and validate, i.e. insert, then update (partially
> failed). Validate that only stats pertaining to the inserts are reflected.
> Trigger a rollback and validate that the stats are still the same. Retry the
> updates; the stats should then reflect the updated records.
> # Let's add one long-running test, i.e. with 20+ commits and an aggressive
> cleaner and archival, just for sanity. Alternatively, if we can enable all
> kinds of indexes in an existing sanity test, we should be good.
> # Let's test all write operations: bulk_insert, insert, upsert, delete,
> insert_overwrite, insert_overwrite_table, delete_partition.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)