[ 
https://issues.apache.org/jira/browse/HUDI-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2405:
-----------------------------
    Description: 
Design doc

https://lucid.app/lucidchart/invitations/accept/inv_b0797e55-8e60-473d-be04-ac2003269db2
 
 * Objective: test the metadata table for files and timeline integrity.
 ** Manipulate commits and state transitions; empty files should suffice. The
test table needs the ability to sync to the metadata table. Commit metadata is
the crux here. Actions to cover:
 *** Commit/DeltaCommit
 *** Compaction
 *** Cleaning
 *** ReplaceCommit/Clustering
 *** Savepoint/delete savepoint/restore savepoint
 *** Rollback
 *** Restore
 ** We will list files using this test table and verify data integrity.
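
The idea above (manipulate commit metadata over empty files, then list and verify) could be prototyped roughly as follows. This is only a sketch in Python, not Hudi code: the names SketchTestTable, Instant, add_commit, add_clean, add_replace_commit, add_rollback and list_files are all hypothetical, and it assumes that a rollback simply makes the target instant invisible to listing.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


def _as_sets(files: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Copy a partition -> file-names mapping into partition -> set."""
    return {partition: set(names) for partition, names in files.items()}


@dataclass
class Instant:
    """One timeline entry; only metadata is kept, no file contents (hypothetical)."""
    time: str
    action: str
    added: Dict[str, Set[str]] = field(default_factory=dict)
    removed: Dict[str, Set[str]] = field(default_factory=dict)
    rollback_of: Optional[str] = None


class SketchTestTable:
    """Hypothetical in-memory stand-in for a test table; not a Hudi class."""

    def __init__(self) -> None:
        self.timeline: List[Instant] = []

    def _append(self, instant: Instant) -> "SketchTestTable":
        # Assumption: instant times are equal-width strings, so string
        # comparison gives timeline order.
        if self.timeline and instant.time <= self.timeline[-1].time:
            raise ValueError("instant times must be strictly increasing")
        self.timeline.append(instant)
        return self

    def add_commit(self, time, files, action="commit"):
        return self._append(Instant(time, action, added=_as_sets(files)))

    def add_clean(self, time, files):
        return self._append(Instant(time, "clean", removed=_as_sets(files)))

    def add_replace_commit(self, time, new_files, replaced_files):
        return self._append(Instant(time, "replacecommit",
                                    added=_as_sets(new_files),
                                    removed=_as_sets(replaced_files)))

    def add_rollback(self, time, target_time):
        if all(i.time != target_time for i in self.timeline):
            raise ValueError(f"no instant {target_time} to roll back")
        return self._append(Instant(time, "rollback", rollback_of=target_time))

    def list_files(self, partition: str) -> List[str]:
        """Replay the timeline, skipping rolled-back instants."""
        rolled_back = {i.rollback_of for i in self.timeline if i.rollback_of}
        files: Set[str] = set()
        for instant in self.timeline:
            if instant.time in rolled_back:
                continue
            files |= instant.added.get(partition, set())
            files -= instant.removed.get(partition, set())
        return sorted(files)
```

A test would build a timeline with these calls and compare list_files() for each partition against what the metadata table reports. Compaction, savepoint and restore are left out of the sketch.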

 

Also, enhance the test table to support actual records.

Objective: test the whole of Hoodie for data integrity. The mapping of records
to file locations is user-defined or test-driven.
 * Updates and deletes: should we let callers pass in HoodieRecords with the
proper file location and write them directly?
 * It should work for inserts, upserts, deletes, compaction, clustering, and
rollback.
 * How would the cleaner plan and the compaction plan pan out?
 * Can we maintain in-memory state and simulate updates, etc.? It is not
distributed anyway; we are testing just functionality.
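
One possible shape for the in-memory state mentioned above, as a Python sketch only. InMemoryRecordState and its methods are hypothetical names, not Hudi APIs. It assumes that the caller supplies the file location for each insert and that an update stays in the file where the record already lives.

```python
import copy
from typing import Dict, Optional, Tuple

Location = Tuple[str, str]  # (partition path, file id); hypothetical shape


class InMemoryRecordState:
    """Hypothetical expected-state tracker for tests; not a Hudi class."""

    def __init__(self) -> None:
        # record key -> (location, payload)
        self._records: Dict[str, Tuple[Location, dict]] = {}

    def upsert(self, key: str, payload: dict,
               location: Optional[Location] = None) -> None:
        if key in self._records:
            current_location, _ = self._records[key]
            # Assumption: an update never moves a record to another file.
            if location is not None and location != current_location:
                raise ValueError(f"{key} already lives in {current_location}")
            self._records[key] = (current_location, payload)
        else:
            if location is None:
                raise ValueError("an insert needs a caller-supplied location")
            self._records[key] = (location, payload)

    def delete(self, key: str) -> None:
        self._records.pop(key, None)

    def snapshot(self) -> Dict[str, Tuple[Location, dict]]:
        """Capture the state before a write so a rollback can be simulated."""
        return copy.deepcopy(self._records)

    def restore(self, snapshot: Dict[str, Tuple[Location, dict]]) -> None:
        self._records = copy.deepcopy(snapshot)

    def expected_records_by_file(self) -> Dict[Location, Dict[str, dict]]:
        """Group the expected records by location for comparison with a read."""
        by_file: Dict[Location, Dict[str, dict]] = {}
        for key, (location, payload) in self._records.items():
            by_file.setdefault(location, {})[key] = payload
        return by_file
```

A test would apply the same inserts, upserts and deletes to the table under test and to this state, then compare what is read back per file with expected_records_by_file(). Compaction and clustering, which rewrite files, would need extra handling that the sketch does not attempt.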

 

Document what we miss testing in the actual code path if we start using these
test tables for testing, for example:
 * index.
 * partitioner.
 * write handles (create, append, merge).
 * ...

  was:
[WIP design doc|https://lucid.app/publicSegments/view/563d5afe-919a-4d3b-8933-bb764a89f512/image.jpeg]



> HoodieTest tables enhancement
> -----------------------------
>
>                 Key: HUDI-2405
>                 URL: https://issues.apache.org/jira/browse/HUDI-2405
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Assignee: Raymond Xu
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)