[
https://issues.apache.org/jira/browse/HUDI-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412730#comment-17412730
]
sivabalan narayanan commented on HUDI-2405:
-------------------------------------------
[~rxu]:
I checked the design proposal. It definitely looks good and is the way we want
to go. A couple of comments:
1. I feel APIs like with10Records3PartitionsAsCommits() are a tad rigid. I am
OK with having them, but we should also have APIs that let users dictate the
partitions and just pass a count of files. Something like
testTable.insert(commitInstant, operationType, list of new partitions to add,
list of partitions to insert/update, files to be added per partition). I feel
this will be very useful for writing tests around specific partitions, such as
updates, insert_overwrite, etc.
2. Not sure if it's implicit in the attached doc, but I would like to ensure we
have two different sets of APIs: one set that is just about metadata management
with empty files, and another set that operates on actual records.
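
To make comment 1 concrete, here is a rough in-memory sketch of a partition-driven, metadata-only insert. Everything in it is hypothetical and for illustration only: the class name TestTableSketch, the OperationType enum, the file naming scheme, and the way insert_overwrite is modeled (dropping a partition's existing file names) are assumptions, not the existing HoodieTestTable API or Hudi's actual semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch; not the real Hudi test table. */
class TestTableSketch {

    /** Hypothetical stand-in for the write operation type. */
    enum OperationType { INSERT, UPSERT, INSERT_OVERWRITE }

    // partition path -> names of the (empty) files recorded for it
    private final Map<String, List<String>> filesByPartition = new LinkedHashMap<>();
    // commit instants in the order they were applied
    private final List<String> timeline = new ArrayList<>();

    /** Metadata-only insert: the caller dictates partitions and a file count. */
    TestTableSketch insert(String commitInstant, OperationType operationType,
                           List<String> newPartitions, List<String> existingPartitions,
                           int filesPerPartition) {
        for (String p : newPartitions) {
            if (filesByPartition.containsKey(p)) {
                throw new IllegalArgumentException("Partition already exists: " + p);
            }
            filesByPartition.put(p, new ArrayList<>());
        }
        for (String p : existingPartitions) {
            if (!filesByPartition.containsKey(p)) {
                throw new IllegalArgumentException("Unknown partition: " + p);
            }
            // Assumption: model insert_overwrite by dropping the old file names.
            if (operationType == OperationType.INSERT_OVERWRITE) {
                filesByPartition.get(p).clear();
            }
        }
        List<String> touched = new ArrayList<>(newPartitions);
        touched.addAll(existingPartitions);
        for (String p : touched) {
            for (int i = 0; i < filesPerPartition; i++) {
                // Only file names are tracked; no records are written.
                filesByPartition.get(p).add("file-" + commitInstant + "-" + i);
            }
        }
        timeline.add(commitInstant);
        return this;
    }

    /** Returns a copy of the file names recorded for a partition. */
    List<String> listFiles(String partition) {
        List<String> files = filesByPartition.get(partition);
        return files == null ? new ArrayList<>() : new ArrayList<>(files);
    }

    /** Returns a copy of the commit instants applied so far. */
    List<String> timeline() {
        return new ArrayList<>(timeline);
    }

    public static void main(String[] args) {
        TestTableSketch table = new TestTableSketch()
            .insert("001", OperationType.INSERT,
                    Arrays.asList("p1", "p2"), Collections.<String>emptyList(), 2)
            .insert("002", OperationType.UPSERT,
                    Arrays.asList("p3"), Arrays.asList("p1"), 1);
        System.out.println(table.listFiles("p1"));
        System.out.println(table.timeline());
    }
}
```

A record-based API set (comment 2) would sit next to this one rather than overload it, so tests that only care about the timeline and file listings never have to build records.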
> HoodieTest tables enhancement
> -----------------------------
>
> Key: HUDI-2405
> URL: https://issues.apache.org/jira/browse/HUDI-2405
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Assignee: Raymond Xu
> Priority: Major
>
> [WIP design
> doc|https://lucid.app/publicSegments/view/563d5afe-919a-4d3b-8933-bb764a89f512/image.jpeg]
>
>
>
> * Objective: test the metadata table for files and timeline integrity.
> ** Manipulate commits and transitions. Empty files should do. Ability to
> sync to the metadata table. Commit metadata is the crux here.
> *** - Commit/DeltaCommit
> *** - Compaction
> *** - Cleaning
> *** - ReplaceCommit/Clustering
> *** - Savepoint/delete savepoint/restore savepoint
> *** - Rollback
> *** - Restore
> ** We will list using this test table and verify data integrity.
>
> Also, enhance to support actual records.
> Objective: test the whole of Hoodie for data integrity. Record-to-file
> locations are user defined or test driven.
> * Updates and deletes: should we let callers pass in HoodieRecords with
> proper file locations and write them directly?
> * Should work for inserts, upserts, deletes, compaction, clustering,
> rollback.
> * How would the cleaner plan and compaction plan pan out?
> * Can we maintain in-memory state and simulate updates, etc.? Anyway, it's
> not distributed, right? We are testing just functionality.
>
> Document what we miss testing in the actual code path if we start using these
> test tables for testing.
> * e.g. index.
> * partitioner.
> * write handles (create, append, merge).
> * ...