[
https://issues.apache.org/jira/browse/HUDI-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911305#comment-17911305
]
Y Ethan Guo commented on HUDI-8832:
-----------------------------------
The current test flow looks good to me. A few more aspects to test and cover:
* Make sure the test covers both small file handling and log files (for MOR
tables); to explicitly generate log files and avoid small file handling, set
[hoodie.merge.small.file.group.candidates.limit|https://hudi.apache.org/docs/configurations#hoodiemergesmallfilegroupcandidateslimit]=0
(for both merge modes);
* Test COW table type (for both merge modes);
* For EVENT_TIME_ORDERING, test the case where the incoming record has the same
ordering value as the record on storage and is an update or a delete. The
expected behavior is that, when the ordering values are equal, the record from
the latest batch is picked. Cover UPDATE, DELETE, and MERGE INTO statements;
* For EVENT_TIME_ORDERING, test the SQL UPDATE statement (1) without the
precombine field (this should follow commit time ordering, i.e., updates
overwrite the latest), and (2) with the precombine field SET to a value that is
higher than some stored values and lower than others (e.g., storage has records
with precombine values 100 and 102, and the SET statement assigns 101; the
expected behavior is that each record is combined against its own stored
precombine value, so the record at 100 is updated while the record at 102 keeps
its stored value);
* For EVENT_TIME_ORDERING, MERGE INTO appears to behave unexpectedly when the
incoming update has a smaller ordering value: the update still shows up in the
snapshot query (double check whether this is a bug);
* For EVENT_TIME_ORDERING and the MERGE INTO statement, test deletes without
providing the precombine field; such deletes should follow commit time
ordering;
* Right now the merge mode is set at the Spark session level with "set
hoodie.record.merge.mode=COMMIT_TIME_ORDERING". Could you also test with the
merge mode set as a table config, i.e., passed in the CREATE TABLE statement
("recordMergeMode=xyz"), without setting the config in the Spark session?
Also, could we transform this into a functional test or change an existing test
to cover the same flow?
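When turning the checklist above into assertions, it may help to encode the expected outcomes in a tiny reference model first. The following is a minimal Python sketch under stated assumptions: `Record`, `merge`, and `apply_batch` are hypothetical names, the model ignores table type and file layout (base vs. log files, small file handling), and it is not Hudi's merge implementation. It only captures the ordering rules described above: under EVENT_TIME_ORDERING a higher ordering value wins, ties go to the latest batch, and statements that omit the precombine value fall back to commit time ordering.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical reference model of the EXPECTED merge outcomes from the checklist.
# It is not Hudi code; it only derives what a snapshot query should return.


@dataclass(frozen=True)
class Record:
    key: str
    ordering: Optional[int]  # precombine value; None if the statement omits it
    value: str
    is_delete: bool = False


def merge(stored: Optional[Record], incoming: Record, mode: str) -> Optional[Record]:
    """Return the surviving record for one key, or None if the key ends up deleted."""
    if mode not in ("COMMIT_TIME_ORDERING", "EVENT_TIME_ORDERING"):
        raise ValueError(f"unknown merge mode: {mode}")
    if mode == "COMMIT_TIME_ORDERING" or stored is None or incoming.ordering is None:
        # Commit time ordering, a new key, or no precombine value supplied:
        # the latest write wins.
        winner = incoming
    elif stored.ordering is None or incoming.ordering >= stored.ordering:
        # Event time ordering: a higher value wins; a tie goes to the latest batch.
        winner = incoming
    else:
        # Event time ordering: an incoming record with a smaller value must not win.
        winner = stored
    return None if winner.is_delete else winner


def apply_batch(table: Dict[str, Record], batch: Iterable[Record], mode: str) -> Dict[str, Record]:
    """Apply one write batch to a snapshot (key -> record) and return a new snapshot."""
    snapshot = dict(table)
    for rec in batch:
        merged = merge(snapshot.get(rec.key), rec, mode)
        if merged is None:
            snapshot.pop(rec.key, None)
        else:
            snapshot[rec.key] = merged
    return snapshot


# Storage holds precombine values 100 and 102; an UPDATE ... SET <precombine> = 101
# touches both keys.
storage = {"a": Record("a", 100, "old-a"), "b": Record("b", 102, "old-b")}
update = [Record("a", 101, "new-a"), Record("b", 101, "new-b")]
event = apply_batch(storage, update, "EVENT_TIME_ORDERING")
commit = apply_batch(storage, update, "COMMIT_TIME_ORDERING")
print(event["a"].value, event["b"].value)    # new-a old-b
print(commit["a"].value, commit["b"].value)  # new-a new-b
```

The same helpers can express the other cases: a tie such as `Record("a", 100, "tie-a")` should replace the stored row, a MERGE INTO update with a smaller value such as `Record("b", 50, "low-b")` should leave `old-b` visible, and a delete with `ordering=None` should remove the key regardless of the stored value.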
> Manual testing all DML meets requirement
> ----------------------------------------
>
> Key: HUDI-8832
> URL: https://issues.apache.org/jira/browse/HUDI-8832
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Davis Zhang
> Assignee: Davis Zhang
> Priority: Blocker
> Fix For: 1.0.1
>
> Attachments: COMMIT_TIME_ORDERING.txt, EVENT_TIME_ORDERING.txt
>
> Original Estimate: 4h
> Time Spent: 1.5h
> Remaining Estimate: 2.5h
>
> 0.5 days
>