[
https://issues.apache.org/jira/browse/HUDI-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-8360:
---------------------------------
Labels: pull-request-available (was: )
> Add functional test for Secondary Index
> ---------------------------------------
>
> Key: HUDI-8360
> URL: https://issues.apache.org/jira/browse/HUDI-8360
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Lin Liu
> Assignee: Lin Liu
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Original Estimate: 10h
> Remaining Estimate: 10h
>
>
> We need to ensure that we cover the following cases:
> # Insert a few records and validate. Update the same records and validate
> that the updates are reflected. Repeat the updates and validate stats.
> For MOR, trigger compaction and validate.
> # Trigger clustering on top of 1 and validate stats. For MOR, trigger
> clustering both before and after compaction. Ensure that no stats are
> available for the replaced file groups.
> # Insert a few records, update them, and delete a subset of records chosen
> so that the min and max values are affected. Validate.
> # Add a test for async compaction and validate, i.e. some log files are
> added to a new phantom file slice and stats stay intact.
> # Add a test for a non-partitioned table.
> # Trigger rollbacks and validate, i.e. insert, then update (partially
> failed). Validate that only stats pertaining to the inserts are reflected.
> Trigger a rollback and validate that stats are still the same. Retry the
> updates; stats should now reflect the updated records.
> # Add one long-running test, i.e. 20+ commits with an aggressive cleaner
> and archival, just for sanity. Alternatively, enabling all index types in
> an existing sanity test would also suffice.
> # Test all write operations: bulk_insert, insert, upsert, delete,
> insert_overwrite, insert_overwrite_table, delete_partition.
> # Add a test for a non-partitioned dataset as well (for the unmerged log
> record reading flow).
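For the min/max case above, the expected column stats after a delete can be modeled independently of Hudi. A minimal illustrative sketch (the record shape and `column_stats` helper are hypothetical, not Hudi code):

```python
# Illustrative model of column-stats (min/max) maintenance, not Hudi's
# implementation: stats are recomputed over the live records of a file group.

def column_stats(records):
    """Compute min/max over the 'ts' column of live records."""
    values = [r["ts"] for r in records]
    return {"min": min(values), "max": max(values)}

records = [{"key": i, "ts": i * 10} for i in range(1, 6)]  # ts: 10..50
assert column_stats(records) == {"min": 10, "max": 50}

# Delete the records holding the current min and max: stats must tighten,
# which is exactly what the test should assert against the index.
records = [r for r in records if r["key"] not in (1, 5)]
assert column_stats(records) == {"min": 20, "max": 40}
```

A real test would read the stats back from the metadata table instead of recomputing them locally, but the expected values follow this model.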
>
> The tests above are generic for any index.
> For testing the secondary index specifically, let's try to cover the
> scenarios below.
> a. Add 100 entries with a mix of secondary index values. Then:
> update 1 of them to another secondary index value that already exists;
> update 1 of them to a new secondary index value;
> delete one record whose secondary index value is referenced by other
> primary keys;
> delete one record whose secondary index value is not referenced by any
> other primary key.
> b. Add 100 entries with a mix of secondary index values. Then:
> update a subset of records from one secondary index value to another;
> delete a subset of records for a given secondary index value;
> insert new records with a new secondary index value.
> c. Add 100 entries with a mix of secondary index values. Then:
> update a subset of records from one secondary index value to another;
> delete a subset of records for a given secondary index value;
> insert new records overlapping with the secondary index value that was
> updated and deleted in this batch.
> d. Add 100 entries with just 1 secondary index value. Then:
> update a subset to N distinct secondary index values;
> delete a subset of them.
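The expected semantics in scenario a can be sketched by modeling the secondary index as a reverse mapping from secondary value to the set of referencing record keys (an illustrative model only, not Hudi's on-disk layout):

```python
from collections import defaultdict

# Illustrative model: secondary index value -> set of record keys.
index = defaultdict(set)

def insert(key, sec):
    index[sec].add(key)

def update(key, old_sec, new_sec):
    # Moving a record's secondary value removes the old mapping entry
    # and drops the secondary value entirely once no key references it.
    index[old_sec].discard(key)
    if not index[old_sec]:
        del index[old_sec]
    index[new_sec].add(key)

def delete(key, sec):
    index[sec].discard(key)
    if not index[sec]:
        del index[sec]

# Scenario a: 100 entries with a mix of secondary values (5 values, 20 keys each).
for k in range(100):
    insert(k, f"sec-{k % 5}")

update(0, "sec-0", "sec-1")    # move to an already-existing secondary value
update(1, "sec-1", "sec-new")  # move to a brand-new secondary value
delete(2, "sec-2")             # sec-2 is still referenced by other keys
assert "sec-2" in index

delete(1, "sec-new")           # sec-new was referenced by no other key
assert "sec-new" not in index
```

The test should assert both outcomes: a secondary value survives a delete while other primary keys still reference it, and disappears once its last referencing key is deleted.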
>
> On the validation front:
> Read back all records and match the entire expected list rather than
> performing just a single-entry lookup.
> Ensure MDT compaction kicks in, then repeat the validation.
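The full-list validation can be expressed as one order-insensitive comparison (a sketch; the `validate` helper and tuple shape are hypothetical):

```python
# Validation sketch: compare the entire read-back result against the complete
# expected list, instead of probing a single entry. Sorting makes the
# comparison order-insensitive.

def validate(read_back, expected):
    assert sorted(read_back) == sorted(expected), (
        f"mismatch: got {sorted(read_back)}, want {sorted(expected)}")

# Passes: same (record key, secondary value) pairs, different order.
validate([("k2", "sec-b"), ("k1", "sec-a")],
         [("k1", "sec-a"), ("k2", "sec-b")])
```

Running the same `validate` call before and after MDT compaction covers the second validation requirement.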
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)