[ 
https://issues.apache.org/jira/browse/HUDI-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-8360:
---------------------------------
    Labels: pull-request-available  (was: )

> Add functional test for Secondary Index
> ---------------------------------------
>
>                 Key: HUDI-8360
>                 URL: https://issues.apache.org/jira/browse/HUDI-8360
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Lin Liu
>            Assignee: Lin Liu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>   Original Estimate: 10h
>  Remaining Estimate: 10h
>
>  
> We need to ensure that we cover the following cases:
>  # Insert a few records and validate. Update the same records and validate 
> that the updates are reflected. Repeat the updates and validate the stats. 
> For MOR, trigger compaction and validate.
>  # Trigger clustering on top of case 1 and validate the stats. For MOR, 
> trigger clustering both before and after compaction. Ensure that no stats 
> are available for the replaced file groups.
>  # Insert a few records, update them, and delete a subset of records chosen 
> so that the min and max values change. Validate.
>  # Add a test for async compaction and validate, i.e. some log files are 
> added to a new phantom file slice and the stats remain intact.
>  # Add a test for a non-partitioned table.
>  # Trigger rollbacks and validate, i.e. insert, then update (partially 
> failed). Validate that only stats pertaining to the inserts are reflected. 
> Trigger a rollback and validate that the stats are still the same. Retry 
> the updates; the stats should then reflect the updated records.
>  # Add one long-running test, i.e. with 20+ commits and aggressive cleaner 
> and archival settings, just for sanity. Alternatively, enabling all kinds 
> of indexes in an existing sanity test would also be fine.
>  # Test all write operations: bulk_insert, insert, upsert, delete, 
> insert_overwrite, insert_overwrite_table, delete_partition.
>  # Add a test for a non-partitioned dataset as well (for the unmerged log 
> record reading flow).
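
Case 3 in the list above depends on deletes that remove the records currently holding the min and max. As a rough illustration of what the expected stats look like, here is a minimal plain-Python model. It is not Hudi code: `column_stats`, the `price` column, and the record layout are hypothetical names chosen only for this sketch.

```python
def column_stats(records, column):
    """Return (min, max) of `column` over the live records, or None if empty."""
    values = [r[column] for r in records.values()]
    return (min(values), max(values)) if values else None

# primary key -> record; stands in for the table's live records
table = {f"k{i}": {"price": i * 10} for i in range(1, 6)}   # prices 10..50
assert column_stats(table, "price") == (10, 50)

# update: move one record outside the current range
table["k3"] = {"price": 70}
assert column_stats(table, "price") == (10, 70)

# delete the records that hold the current min and max
del table["k1"]
del table["k3"]
assert column_stats(table, "price") == (20, 50)
```

A real functional test would compare values like these against the stats read back from the table after each commit.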
>  
> The tests above are generic for any index. 
> For the secondary index specifically, let's try to cover the scenarios below. 
> a. Add 100 entries with a mix of secondary index values, then:
>    update one of them to another secondary index value that already exists.
>    update one of them to a new secondary index value.
>    delete one record whose secondary index value is also referenced by 
> other primary keys. 
>    delete one record whose secondary index value is not referenced by any 
> other primary key. 
> b. Add 100 entries with a mix of secondary index values, then:
>     update a subset of records for a given secondary index value to another. 
>     delete a subset of records for a given secondary index value. 
>     insert new records with new secondary index values. 
> c. Add 100 entries with a mix of secondary index values, then:
>     update a subset of records for a given secondary index value to another. 
>     delete a subset of records for a given secondary index value. 
>     insert new records whose secondary index values overlap with those 
> updated and deleted in this batch. 
> d. Add 100 entries with just one secondary index value, then:
>     update a subset to N different secondary index values. 
>     delete a subset of them. 
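
To make the expected outcome of scenario (a) concrete, the sketch below models a secondary index as a mapping from secondary value to the set of primary keys that carry it. This is a toy in-memory model under stated assumptions, not the Hudi implementation: `SecondaryIndexModel`, the `pk*` keys, and the `city_*` values are all hypothetical.

```python
from collections import defaultdict

class SecondaryIndexModel:
    """Toy model: tracks which primary keys carry each secondary value."""

    def __init__(self):
        self.records = {}               # primary key -> secondary value
        self.index = defaultdict(set)   # secondary value -> primary keys

    def upsert(self, key, sec_value):
        if key in self.records:
            self._unlink(key, self.records[key])
        self.records[key] = sec_value
        self.index[sec_value].add(key)

    def delete(self, key):
        self._unlink(key, self.records.pop(key))

    def _unlink(self, key, sec_value):
        keys = self.index[sec_value]
        keys.discard(key)
        if not keys:                    # drop values that no key references
            del self.index[sec_value]

model = SecondaryIndexModel()
for i in range(100):
    model.upsert(f"pk{i}", f"city_{i % 10}")   # 10 keys per secondary value

model.upsert("pk0", "city_1")      # update to a value that already exists
model.upsert("pk2", "city_new")    # update to a brand-new value
model.delete("pk3")                # city_3 is still referenced by other keys
model.delete("pk2")                # city_new was referenced only by pk2

assert len(model.index["city_0"]) == 9
assert len(model.index["city_1"]) == 11
assert len(model.index["city_3"]) == 9
assert "city_new" not in model.index
```

The same model can be driven with the batches from scenarios (b) through (d) to derive the expected index contents for each.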
>  
> On the validation front: 
> Be sure to read back all records and match the entire expected list rather 
> than looking up just one entry. 
> Ensure that metadata table (MDT) compaction kicks in, and then repeat the 
> validation. 
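
One way to picture the difference between a single-entry lookup and a full comparison is the sketch below. It is plain Python over dictionaries, with hypothetical helper names (`expected_index`, `index_mismatches`); how the actual index contents are read back from the table is left out and would depend on the test harness.

```python
def expected_index(records):
    """Rebuild secondary value -> set of primary keys from the full record set."""
    expected = {}
    for key, sec_value in records.items():
        expected.setdefault(sec_value, set()).add(key)
    return expected

def index_mismatches(actual_index, records):
    """Return the secondary values whose key sets differ from the expected ones."""
    expected = expected_index(records)
    actual = {v: set(keys) for v, keys in actual_index.items() if keys}
    return sorted(v for v in expected.keys() | actual.keys()
                  if expected.get(v) != actual.get(v))

records = {"k1": "a", "k2": "a", "k3": "b"}
# "c" is a stale entry that should have been removed by an earlier delete
stale_index = {"a": {"k1", "k2"}, "b": {"k3"}, "c": {"k9"}}

assert stale_index["a"] == {"k1", "k2"}                   # a single lookup passes
assert index_mismatches(stale_index, records) == ["c"]    # full comparison does not
```

The same full comparison would be rerun once compaction has happened on the metadata table.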
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
