slfan1989 commented on code in PR #9065:
URL: https://github.com/apache/hudi/pull/9065#discussion_r1244702211
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java:
##########
@@ -140,7 +144,9 @@ private HoodieMemoryConfig() {
public static String getDefaultSpillableMapBasePath() {
String[] localDirs = FileIOUtils.getConfiguredLocalDirs();
- return (localDirs != null && localDirs.length > 0) ? localDirs[0] : "/tmp/";
+ List<String> localDirLists = Arrays.asList(localDirs);
+ Collections.shuffle(localDirLists);
+ return !localDirLists.isEmpty() ? localDirLists.get(0) : "/tmp/";
Review Comment:
Thank you very much for helping with the code review! `LOCAL_DIRS` is an
environment variable that YARN sets when it starts a container. YARN fills it
only with directories that passed its disk health checks, so disks that are
full or inaccessible are already excluded.
- Hudi#FileIOUtils#getYarnLocalDirs
```
private static String getYarnLocalDirs() {
  String localDirs = System.getenv("LOCAL_DIRS");
  if (localDirs == null) {
    throw new HoodieIOException("Yarn Local dirs can't be empty");
  }
  return localDirs;
}
```
- YARN
1. LinuxContainerExecutor#buildContainerRuntimeContext#setLocalDir
```
private ContainerRuntimeContext buildContainerRuntimeContext(
    ContainerStartContext ctx, Path pidFilePath, String resourcesOptions,
    String tcCommandFile, List<String> numaArgs) {
  .....
  Container container = ctx.getContainer();
  ContainerRuntimeContext.Builder builder = new ContainerRuntimeContext
      .Builder(container);
  if (prefixCommands.size() > 0) {
    builder.setExecutionAttribute(CONTAINER_LAUNCH_PREFIX_COMMANDS,
        prefixCommands);
  }
  builder.setExecutionAttribute(LOCALIZED_RESOURCES,
      ctx.getLocalizedResources())
      .setExecutionAttribute(RUN_AS_USER, getRunAsUser(ctx.getUser()))
      .setExecutionAttribute(USER, ctx.getUser())
      .setExecutionAttribute(APPID, ctx.getAppId())
      .....
      .setExecutionAttribute(NM_PRIVATE_TRUSTSTORE_PATH,
          ctx.getNmPrivateTruststorePath())
      .setExecutionAttribute(PID_FILE_PATH, pidFilePath)
      // (**) LOCAL_DIRS is set here from the container's local dirs
      .setExecutionAttribute(LOCAL_DIRS, ctx.getLocalDirs())
  .....
  if (tcCommandFile != null) {
    builder.setExecutionAttribute(TC_COMMAND_FILE, tcCommandFile);
  }
  return builder.build();
}
```
2. ContainerLaunch#call#prepareLocalDirs
```
// dirsHandler is LocalDirsHandlerService
List<String> localDirs = dirsHandler.getLocalDirs();
if (truststore != null) {
  addTruststoreVars(environment, containerWorkDir);
}
```
3. LocalDirsHandlerService#getLocalDirs
```
public List<String> getLocalDirs() {
  return localDirs.getGoodDirs();
}
```
4. DirectoryCollection#checkDirs
```
switch (entry.getValue().cause) {
  case DISK_FULL:
    fullDirs.add(entry.getKey());
    break;
  case OTHER:
    errorDirs.add(entry.getKey());
    break;
  default:
    LOG.warn(entry.getValue().cause + " is unknown for disk error.");
    break;
}
```
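To make step 4 easier to follow, here is a simplified model of that classification. It is only a sketch and not YARN's actual implementation: the `DirHealthSketch` class, the `Probe` type, and the `classify` and `goodDirs` methods are hypothetical names, and the utilization threshold is passed in as a plain parameter. The point is that a directory which is not writable or is too full never reaches the "good" list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DirHealthSketch {
  // Mirrors the causes seen in the switch above, plus a healthy state.
  public enum Status { GOOD, DISK_FULL, OTHER }

  // Hypothetical result of probing a single directory.
  public static final class Probe {
    final boolean writable;
    final double usedPercent;

    public Probe(boolean writable, double usedPercent) {
      this.writable = writable;
      this.usedPercent = usedPercent;
    }
  }

  // A directory that cannot be written to is an error; one above the
  // utilization threshold is considered full; anything else is good.
  public static Status classify(Probe probe, double maxUsedPercent) {
    if (!probe.writable) {
      return Status.OTHER;
    }
    if (probe.usedPercent > maxUsedPercent) {
      return Status.DISK_FULL;
    }
    return Status.GOOD;
  }

  // Only directories classified as GOOD are returned, in iteration order.
  public static List<String> goodDirs(Map<String, Probe> probes,
                                      double maxUsedPercent) {
    List<String> good = new ArrayList<>();
    for (Map.Entry<String, Probe> entry : probes.entrySet()) {
      if (classify(entry.getValue(), maxUsedPercent) == Status.GOOD) {
        good.add(entry.getKey());
      }
    }
    return good;
  }
}
```

In this model, only the output of `goodDirs` would be handed to the container, which is the property the Hudi side relies on.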
YARN's NodeManager periodically checks all the disks it manages and excludes
faulty ones, for example disks whose utilization is too high or disks that
cannot be written to. `LOCAL_DIRS` therefore only lists healthy directories,
so randomly selecting one of them in Hudi should meet our requirements.
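
As a rough illustration of that selection (not the code in this PR), the sketch below picks one entry at random from a `LOCAL_DIRS`-style value and falls back to `/tmp/` when the value is missing or empty. The class and method names are hypothetical, and it assumes the value is a comma-separated list of directories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SpillDirPickerSketch {
  // Same fallback as in the diff above.
  public static final String FALLBACK = "/tmp/";

  // Assumption: localDirsEnv is a comma-separated list of directories,
  // or null when the process is not running inside a YARN container.
  public static String pick(String localDirsEnv, Random random) {
    if (localDirsEnv == null) {
      return FALLBACK;
    }
    List<String> dirs = new ArrayList<>();
    for (String dir : localDirsEnv.split(",")) {
      String trimmed = dir.trim();
      if (!trimmed.isEmpty()) {
        dirs.add(trimmed);
      }
    }
    if (dirs.isEmpty()) {
      return FALLBACK;
    }
    // Choose one index instead of shuffling the whole list.
    return dirs.get(random.nextInt(dirs.size()));
  }
}
```

A call such as `SpillDirPickerSketch.pick(System.getenv("LOCAL_DIRS"), new Random())` would spread spillable map files across the healthy disks, and the explicit null check avoids a `NullPointerException` when no local dirs are configured.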
@danny0405 @XuQianJin-Stars