[ 
https://issues.apache.org/jira/browse/HUDI-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911305#comment-17911305
 ] 

Y Ethan Guo commented on HUDI-8832:
-----------------------------------

The current test flow looks good to me. A few more aspects to test and cover:
 * Make sure the test covers both small file handling and log files (for the MOR 
table type); to explicitly generate log files and avoid small file handling, set 
[hoodie.merge.small.file.group.candidates.limit|https://hudi.apache.org/docs/configurations#hoodiemergesmallfilegroupcandidateslimit]=0
 (for both merge modes);
 * Test COW table type (for both merge modes);
 * For EVENT_TIME_ORDERING, test the case where the incoming record has the same 
ordering value as the record on storage and should update or delete it. The 
expected behavior is that when the ordering values are equal, the record from 
the latest batch wins.  Cover UPDATE, DELETE, and MERGE INTO statements.
 * For EVENT_TIME_ORDERING, test the SQL UPDATE statement (1) without a precombine 
field (this should follow commit time ordering, i.e., updates overwrite the 
latest), (2) with the precombine field SET to a value both higher and lower than 
existing values (e.g., storage has records with precombine values 100 and 102, 
and the SET statement assigns the precombine field to 101; the expected behavior 
is that each record is combined based on its own precombine value);
 * For EVENT_TIME_ORDERING, it looks like if the incoming update has a 
smaller ordering value, MERGE INTO behaves unexpectedly by showing it in 
the snapshot query (double-check whether this is a bug);
 * For EVENT_TIME_ORDERING and the MERGE INTO statement, test deletes without 
providing the precombine field; such deletes should follow commit time 
ordering;
 * Right now the merge mode is set via "set 
hoodie.record.merge.mode=COMMIT_TIME_ORDERING". Could you also test with the 
merge mode as a table config, i.e., passing it in the CREATE TABLE 
statement ("recordMergeMode=xyz"), without setting the config at the Spark session?
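
To illustrate the last two points, a minimal sketch of what such a test could look like in Spark SQL (table and column names here are hypothetical, not from this issue; the table properties follow Hudi's SQL DDL conventions):

{code:sql}
-- Force log file generation on MOR by disabling small file handling.
set hoodie.merge.small.file.group.candidates.limit=0;

-- Merge mode set as a table config in CREATE TABLE, not at the session level.
CREATE TABLE hudi_mor_tbl (
  id INT,
  name STRING,
  price DOUBLE,
  ts BIGINT
) USING hudi
TBLPROPERTIES (
  type = 'mor',
  primaryKey = 'id',
  preCombineField = 'ts',
  recordMergeMode = 'EVENT_TIME_ORDERING'
);

-- Tie on the ordering value: incoming ts equals the ts on storage;
-- expected: the record from the latest batch wins.
MERGE INTO hudi_mor_tbl t
USING (SELECT 1 AS id, 'a1_new' AS name, 20.0 AS price, 100 AS ts) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *;
{code}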

Also, could we transform this into a functional test or change an existing test 
to cover the same flow?

 

> Manual testing all DML meets requirement
> ----------------------------------------
>
>                 Key: HUDI-8832
>                 URL: https://issues.apache.org/jira/browse/HUDI-8832
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Davis Zhang
>            Assignee: Davis Zhang
>            Priority: Blocker
>             Fix For: 1.0.1
>
>         Attachments: COMMIT_TIME_ORDERING.txt, EVENT_TIME_ORDERING.txt
>
>   Original Estimate: 4h
>          Time Spent: 1.5h
>  Remaining Estimate: 2.5h
>
> 0.5 days
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
