hudi-bot opened a new issue, #17291:
URL: https://github.com/apache/hudi/issues/17291

   We need to cover the following cases for basic col stats certification:
    1. Insert a few records and validate. Update the same records and validate
       that the updates are reflected. Repeat the updates and validate the stats.
       For MOR, trigger compaction and validate.
    2. For MOR, ensure we cover all log block types (data blocks, delete blocks,
       and rollback blocks).
    3. Trigger clustering on top of case 1 and validate the stats.
       a. For MOR, trigger clustering both before and after compaction. Ensure
          that no stats are available for the replaced file groups.
    4. Insert a few records, update them, and delete a subset of records chosen
       so that the min and max values change. Validate.
    5. Add a test for async compaction and validate, i.e. some log files are
       added to a new phantom file slice and the stats are intact.
    6. Add a test for a non-partitioned table.
    7. Trigger clean and ensure that cleaned-up files are deleted from col
       stats. Lookups should not even return null stats for them.
    8. Trigger rollbacks and validate, i.e. insert, then update (partially
       failed). Validate that only stats pertaining to the inserts are
       reflected. Trigger a rollback and validate that the stats are still the
       same. Retry the updates; the stats should now reflect the updated
       records.
    9. Add one long-running test, i.e. with 20+ commits and an aggressive
       cleaner and archival, just for sanity. Alternatively, if we can enable
       all kinds of indexes in an existing sanity test, we should be good.
    10. Test all write operations: bulk_insert, insert, upsert, delete,
        insert_overwrite, insert_overwrite_table, delete_partition.
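
Most of the cases above reduce to the same check: recompute the expected min, max, and null count from the records that should currently be live, then compare with what the index reports. A minimal sketch of that expectation logic follows. `ColStats`, `expected_stats`, and `ToyTable` are hypothetical names for illustration, not Hudi classes, and the in-memory table only stands in for a real snapshot read.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ColStats:
    """Expected table-level stats for one column (toy model, not a Hudi type)."""
    min_value: Optional[int]
    max_value: Optional[int]
    null_count: int
    value_count: int


def expected_stats(values: Iterable[Optional[int]]) -> ColStats:
    """Recompute stats from the live values of a single column."""
    vals = list(values)
    non_null = [v for v in vals if v is not None]
    return ColStats(
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
        null_count=len(vals) - len(non_null),
        value_count=len(vals),
    )


class ToyTable:
    """Hypothetical in-memory stand-in for a table: record key -> column value."""

    def __init__(self) -> None:
        self.rows: Dict[str, Optional[int]] = {}

    def upsert(self, records: Dict[str, Optional[int]]) -> None:
        # Inserts new keys and overwrites existing ones.
        self.rows.update(records)

    def delete(self, keys: Iterable[str]) -> None:
        # Missing keys are ignored.
        for key in keys:
            self.rows.pop(key, None)

    def stats(self) -> ColStats:
        return expected_stats(self.rows.values())


# Scenario from case 4: insert, update, then delete the records holding
# the current min and max.
table = ToyTable()
table.upsert({"k1": 10, "k2": 20, "k3": 30, "k4": None})  # insert
after_insert = table.stats()
table.upsert({"k3": 50})                                  # update raises the max
after_update = table.stats()
table.delete(["k1", "k3"])                                # removes min and max
after_delete = table.stats()
```

In a real test the `after_*` values would be compared against stats read back from the col stats index once they are aggregated across the latest file slices. How the index stores and merges per-file ranges is not modeled here.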
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-8384
   - Type: Sub-task
   - Parent: https://issues.apache.org/jira/browse/HUDI-8727
   - Fix version(s):
     - 1.1.0
   
   
   ---
   
   
   ## Comments
   
   18/Oct/24 00:17;shivnarayan;Here are the follow-ups from the audit we did 
on col stats design and pruning. 
   
   https://issues.apache.org/jira/browse/HUDI-8388 
   
   https://issues.apache.org/jira/browse/HUDI-8389 
   
   https://issues.apache.org/jira/browse/HUDI-8390 
   
    
   
   
   ---
   
   20/Mar/25 14:56;codope;In `TestColumnStatsIndex`, the following tests cover 
the scenarios listed in the description:
   testMetadataColumnStatsIndexInitializationWithUpserts, 
testMetadataColumnStatsIndexCompactionWithSQL: cases 1, 4, 6
   testMetadataColumnStatsIndexInitializationWithRollbacks, 
testMORDeleteBlocks: case 2
   testPartitionStatsWithClustering: case 3
   testColStatsWithCleanCOW: case 7
   Case 10: we have tests for most of the write operations except 
insert_overwrite and insert_overwrite_table.
    
   In summary, we still need the following tests:
    
   Case 5: Add a test for async compaction and validate, i.e. some log files 
are added to a new phantom file slice and the stats are intact.
   
   Case 8: Trigger rollbacks and validate, i.e. insert, then update (partially 
failed). Validate that only stats pertaining to the inserts are reflected. 
Trigger a rollback and validate that the stats are still the same. Retry the 
updates; the stats should now reflect the updated records. Here, we can extend 
`testMetadataColumnStatsIndexInitializationWithRollbacks` to do updates after 
the rollback.
   
   Case 9: Add one long-running test, i.e. with 20+ commits and an aggressive 
cleaner and archival, just for sanity. Alternatively, if we can enable all 
kinds of indexes in an existing sanity test, we should be good.
   
   Case 10: Only for insert_overwrite and insert_overwrite_table.
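
For cases 9 and 10, the writer options for a long-running run could look roughly like the sketch below. The helper `long_running_options` is hypothetical, and the option keys are written as I recall them from the Hudi configuration reference, so they should be checked against the version under test. The retention values are only an example of "aggressive" settings, chosen so that the archival minimum stays above the cleaner retention.

```python
# Write operations listed in case 10 of the description.
WRITE_OPERATIONS = [
    "bulk_insert", "insert", "upsert", "delete",
    "insert_overwrite", "insert_overwrite_table", "delete_partition",
]


def long_running_options(operation: str, table_type: str = "MERGE_ON_READ") -> dict:
    """Build writer options for one commit of a long-running sanity test.

    Assumption: key names follow the Hudi configuration reference; verify
    them against the Hudi version being tested.
    """
    if operation not in WRITE_OPERATIONS:
        raise ValueError(f"unknown write operation: {operation}")
    return {
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.table.type": table_type,
        # Metadata table with the column stats index enabled.
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": "true",
        # Aggressive cleaning: retain only the latest commit.
        "hoodie.clean.automatic": "true",
        "hoodie.cleaner.commits.retained": "1",
        # Aggressive archival: keep the active timeline very short.
        "hoodie.keep.min.commits": "2",
        "hoodie.keep.max.commits": "3",
    }


# One options dict per write operation, e.g. to rotate through 20+ commits.
options_per_operation = {op: long_running_options(op) for op in WRITE_OPERATIONS}
```

Rotating through `options_per_operation` across 20+ commits would exercise each write operation while clean and archival run frequently.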
    

