[ https://issues.apache.org/jira/browse/HUDI-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631065#comment-17631065 ]

Raymond Xu commented on HUDI-5093:
----------------------------------

The time-consuming part mainly comes from 
`org.apache.hudi.common.testutils.HoodieMetadataTestTable#doWriteOperation`, 
which invokes the Spark metadata writer to update the metadata table upon each new commit.

This is a necessary step, and I don't think there is much room for optimization here. 
If we implemented a Java metadata writer, it might be faster, but we would lose 
coverage of the Spark metadata writer, which is the major use case. 
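To confirm that the metadata-table update dominates the 3-4 seconds per commit, a simple timing loop around the call can help. Below is a minimal sketch; the `doWriteOperation` stub is a hypothetical placeholder for the real `HoodieMetadataTestTable#doWriteOperation` call, which requires a full Hudi test setup:

```java
public class CommitTimingSketch {

  // Hypothetical stand-in for HoodieMetadataTestTable#doWriteOperation.
  // In the Hudi test suite this would create a commit and trigger the
  // Spark metadata writer; here it is an empty stub for illustration.
  static void doWriteOperation(String commitTime) {
    // no-op placeholder
  }

  public static void main(String[] args) {
    int commits = 10; // some tests create over 10 commits
    long start = System.nanoTime();
    for (int i = 0; i < commits; i++) {
      doWriteOperation(String.format("%03d", i));
    }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    // Per-commit average tells us whether the metadata update is the bottleneck.
    System.out.println("avg ms/commit = " + (elapsedMs / (double) commits));
  }
}
```

Running this with the real call inside the test harness would show how much of each commit is spent in the metadata writer versus the rest of the test scaffolding.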

WDYT? [~guoyihua][~shivnarayan]

> Revisiting doWriteOperation for preparing test data
> ---------------------------------------------------
>
>                 Key: HUDI-5093
>                 URL: https://issues.apache.org/jira/browse/HUDI-5093
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: tests-ci
>            Reporter: Ethan Guo
>            Assignee: Raymond Xu
>            Priority: Major
>             Fix For: 0.12.2
>
>
> [This 
> method|https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java#L910]
>  in HoodieTestTable is used to create commits, and some test methods will 
> create over 10 commits. Each call takes 3-4 seconds locally for me, so if 
> we could cut this down to 1-2 seconds we would see a big testing performance 
> improvement.
> public HoodieCommitMetadata doWriteOperation(String commitTime, 
> WriteOperationType operationType,



--
This message was sent by Atlassian Jira
(v8.20.10#820010)