[
https://issues.apache.org/jira/browse/HUDI-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hui An resolved HUDI-5692.
--------------------------
> SpillableMapBasePath should be lazily loaded
> --------------------------------------------
>
> Key: HUDI-5692
> URL: https://issues.apache.org/jira/browse/HUDI-5692
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
> Labels: pull-request-available
>
> If we use {{withInferFunction}} to set the default value of
> {{SPILLABLE_MAP_BASE_PATH}}, the default value is written into
> {{HoodieWriteConfig}}'s {{properties}} and serialized to all executors. If the
> driver and the executors do not share the same temporary location (e.g.
> driver: /mnt/disk1, executor: /mnt/disk2), the executor fails to create the
> spillable map path, because the directory /mnt/disk1 does not exist on the
> executor machine.
> {code:java}
> Caused by: org.apache.hudi.exception.HoodieIOException: Unable to create :/mnt/ssd/0/yarn/nm-local-dir/usercache/test/appcache/application_1673593627114_3970647/hudi-BITCASK-e3741235-6571-4112-8b20-271408148238
> at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMap(ExternalSpillableMap.java:119)
> at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMapNumEntries(ExternalSpillableMap.java:138)
> at org.apache.hudi.io.HoodieMergeHandle.init(HoodieMergeHandle.java:268)
> at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:129)
> at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:121)
> at org.apache.hudi.io.HoodieConcatHandle.<init>(HoodieConcatHandle.java:81)
> at org.apache.hudi.io.HoodieMergeHandleFactory.create(HoodieMergeHandleFactory.java:60)
> at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:386)
> at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:363)
> at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:330)
> ... 29 more
> Caused by: java.io.IOException: Unable to create :/mnt/ssd/0/yarn/nm-local-dir/usercache/test/appcache/application_1673593627114_3970647/hudi-BITCASK-e3741235-6571-4112-8b20-271408148238
> at org.apache.hudi.common.util.FileIOUtils.mkdir(FileIOUtils.java:70)
> at org.apache.hudi.common.util.collection.DiskMap.<init>(DiskMap.java:55)
> at org.apache.hudi.common.util.collection.BitCaskDiskMap.<init>(BitCaskDiskMap.java:98)
> at org.apache.hudi.common.util.collection.ExternalSpillableMap.getDiskBasedMap(ExternalSpillableMap.java:116)
> ... 38 more
>
> {code}
> A better solution is to compute the temporary location lazily, at the time
> {{getSpillableMapBasePath}} is called, so it is resolved on the machine that
> actually uses it.
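
For illustration, the lazy approach described above might look roughly like the sketch below. This is not the actual Hudi patch: {{SketchWriteConfig}}, the supplier parameter, and the key constant are hypothetical stand-ins, and only {{getSpillableMapBasePath}} borrows its name from the ticket. The idea is that only an explicitly configured path is stored in the serialized properties, while the default is resolved on whichever JVM asks for it.

```java
import java.io.Serializable;
import java.util.Properties;
import java.util.function.Supplier;

public class LazySpillablePathSketch {

    // Key name assumed for illustration only.
    static final String SPILLABLE_MAP_BASE_PATH_KEY = "hoodie.memory.spillable.map.path";

    /** Hypothetical stand-in for a write config that is shipped to executors. */
    static class SketchWriteConfig implements Serializable {
        private final Properties props = new Properties();

        // Only an explicit user setting is stored, so only that gets serialized.
        void setSpillableMapBasePath(String path) {
            props.setProperty(SPILLABLE_MAP_BASE_PATH_KEY, path);
        }

        // The default is not stored; it is computed on each call by the caller's JVM.
        String getSpillableMapBasePath(Supplier<String> localTempDir) {
            String explicit = props.getProperty(SPILLABLE_MAP_BASE_PATH_KEY);
            return explicit != null ? explicit : localTempDir.get();
        }

        // Convenience overload that falls back to this JVM's temp directory.
        String getSpillableMapBasePath() {
            return getSpillableMapBasePath(() -> System.getProperty("java.io.tmpdir"));
        }

        boolean hasStoredPath() {
            return props.containsKey(SPILLABLE_MAP_BASE_PATH_KEY);
        }
    }

    public static void main(String[] args) {
        SketchWriteConfig config = new SketchWriteConfig();

        // Simulate the driver and an executor with different local directories.
        System.out.println(config.getSpillableMapBasePath(() -> "/mnt/disk1"));
        System.out.println(config.getSpillableMapBasePath(() -> "/mnt/disk2"));

        // An explicit setting still wins everywhere.
        config.setSpillableMapBasePath("/data/spill");
        System.out.println(config.getSpillableMapBasePath(() -> "/mnt/disk2"));
    }
}
```

With this shape, the driver-side default (/mnt/disk1 in the example from the description) never reaches the executor, because nothing is written to the properties unless the user sets the path explicitly.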