xyu opened a new pull request, #7182:
URL: https://github.com/apache/hadoop/pull/7182

   The `whitelistedEnv()` function writes a bash `export` for the environment 
variable using bash's default-value syntax, e.g.:
   
   ```
   export MY_VAR=${MY_VAR:-"value sent in NM env"}
   ```
   
   This makes checking whether `MY_VAR` exists in the environment passed into 
the YARN launch redundant. The downside of this extra check is that some 
envvars related to Docker YARN LCE containers appear to be set to empty 
strings, so the check finds them and the NM default is never applied.
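
To make the default-value semantics concrete, here is a minimal sketch of how the `:-` expansion behaves for unset, empty, and non-empty variables. The variable names and values are hypothetical and chosen only for illustration; this is not the launch script that YARN actually generates.

```shell
# Hypothetical stand-in for the value the NM would supply.
NM_DEFAULT="value sent in NM env"

# Case 1: the variable is unset, so ':-' falls back to the default.
unset MY_VAR
unset_result="${MY_VAR:-$NM_DEFAULT}"

# Case 2: the variable is set to an empty string; ':-' still uses the default.
MY_VAR=""
empty_result="${MY_VAR:-$NM_DEFAULT}"

# Case 3: the variable has a non-empty value, which is kept.
MY_VAR="value from container env"
set_result="${MY_VAR:-$NM_DEFAULT}"

# Contrast: '-' without the colon substitutes only when the variable is unset,
# so an empty string would be kept as-is.
MY_VAR=""
no_colon_result="${MY_VAR-$NM_DEFAULT}"

printf 'unset:    %s\n' "$unset_result"
printf 'empty:    %s\n' "$empty_result"
printf 'set:      %s\n' "$set_result"
printf 'no colon: [%s]\n' "$no_colon_result"
```

Because `:-` already treats unset and empty the same way, the shell expansion on its own would cover the empty-string case.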
   
   For example `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` appears to always be set 
to an empty string unless explicitly set. In my test environment I have 
something like the following in my `yarn-env.sh`:
   
   ```
   export YARN_CONTAINER_RUNTIME_DOCKER_IMAGE='hadoop-yarn-lce:some-tag'
   ```
   
   Checking `/proc/{PID}/environ` of the NodeManager process shows that 
`YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` is set as expected.
   
   I also have `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` set as a whitelisted 
envvar with `yarn.nodemanager.env-whitelist` in `yarn-site.xml`.
   
   When I try to submit a job to be run within Docker with something like:
   
   ```
   yarn \
     org.apache.hadoop.yarn.applications.distributedshell.Client \
     -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
     -shell_command sleep -shell_args 10 \
     -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker
   ```
   
   My expectation is that the default Docker image value set in the NM would be 
used. However, I get the following error because the empty string causes 
`containsKey()` to return true, which skips loading the NM default value:
   
   ```
   Exception from container-launch.
   Container id: container_e5300_1732314455042_0036_01_000002
   Exit code: -1
   Exception message: YARN_CONTAINER_RUNTIME_DOCKER_IMAGE not set!
   Shell error output: <unknown>
   Shell output: <unknown>
   
   [2024-11-22 23:54:04.582]Container exited with a non-zero exit code -1.
   ```
   
   This issue also happens with `YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS` and 
perhaps other envvars as well. Interestingly, it does not happen with 
`YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME`.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

