[https://issues.apache.org/jira/browse/HUDI-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631065#comment-17631065]
Raymond Xu commented on HUDI-5093:
----------------------------------
The time-consuming part mainly comes from
`org.apache.hudi.common.testutils.HoodieMetadataTestTable#doWriteOperation`,
which invokes the Spark metadata writer to update the metadata table upon each
new commit. This is a necessary step, so I don't think there is much room to
optimize here. If we implemented a Java metadata writer, it might be faster,
but we would lose test coverage of the Spark metadata writer, which is the
major use case.
WDYT? [~guoyihua][~shivnarayan]
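To make the trade-off concrete, here is a minimal sketch of the idea being discussed: hiding the metadata writer behind an interface so test tables could plug in a lightweight Java writer instead of the Spark-backed one. All class and method names below (`MetadataWriter`, `SparkMetadataWriter`, `JavaMetadataWriter`, `TestTable`) are hypothetical illustrations, not actual Hudi APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the metadata-table update step.
interface MetadataWriter {
    void update(String commitTime);
}

// Stand-in for the Spark-backed writer: exercises the main code path
// (the major use case), but each update is expensive in tests.
class SparkMetadataWriter implements MetadataWriter {
    final List<String> commits = new ArrayList<>();
    @Override
    public void update(String commitTime) {
        // A real implementation would drive a Spark job here.
        commits.add(commitTime);
    }
}

// Stand-in for a plain-Java writer: cheaper per commit, but it would
// not cover the Spark metadata writer at all.
class JavaMetadataWriter implements MetadataWriter {
    final List<String> commits = new ArrayList<>();
    @Override
    public void update(String commitTime) {
        commits.add(commitTime);
    }
}

// Analogue of the test table: every write operation updates metadata
// through whichever writer was injected.
class TestTable {
    private final MetadataWriter writer;
    TestTable(MetadataWriter writer) { this.writer = writer; }

    String doWriteOperation(String commitTime) {
        writer.update(commitTime);
        return commitTime;
    }
}
```

The point of the sketch is that speed and coverage pull in opposite directions: injecting `JavaMetadataWriter` would cut per-commit cost, but every test doing so stops exercising the Spark path that production relies on.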
> Revisiting doWriteOperation for preparing test data
> ---------------------------------------------------
>
> Key: HUDI-5093
> URL: https://issues.apache.org/jira/browse/HUDI-5093
> Project: Apache Hudi
> Issue Type: Improvement
> Components: tests-ci
> Reporter: Ethan Guo
> Assignee: Raymond Xu
> Priority: Major
> Fix For: 0.12.2
>
>
> [This
> method|https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java#L910]
> in HoodieTestTable is used to create commits, and some test methods
> create over 10 commits. Each call takes 3-4 seconds locally for me, so if
> we could cut this down to 1-2 seconds we would see a big testing
> performance improvement.
> public HoodieCommitMetadata doWriteOperation(String commitTime,
> WriteOperationType operationType,
--
This message was sent by Atlassian Jira
(v8.20.10#820010)