Hello Hadoopy friends,

Everyone is discussing how Mythos is going to find all the bugs and the
world will end in two months. Before everyone thought the world was going to
end, I came to the conclusion that RHEL/CentOS will die, and I started
focusing on making Hadoop work on Alpine and musl libc.

Along the way I have found a number of "pointer" issues.

https://github.com/apache/hadoop/pull/8184

I have explained in agonizing detail why it should go forward.
------------------------------------

@cnauroth <https://github.com/cnauroth> I don't know how you can say you
cannot reproduce this.

   1. I have published exact instructions, including a Docker image, that
   reproduce this.
   2. The full output is in the ticket. The directory ownership does not
   allow the appcache directory to be created.
   3. There are more dangling pointers in the code (I'm fixing serious
   leaks in what is supposed to be 'secure' code).
   4. The code is CLEARLY not tested. As my PR shows, the method being
   called is never executed by the gunit tests, so what people think it
   does and what it actually does are different things. Go into the C code
   and put an exit(0) in create_app_dir, then run the tests: nothing
   changes, because create_app_dir is NEVER called in the tests.

To summarize again: if you run the NodeManager as yarn:hadoop, which is how
you are supposed to run it, it won't be able to create this appcache
directory.

Here are the full configs and Docker setups that show the problem:
https://github.com/edwardcapriolo/edgy-ansible/tree/main/imaging/hadoop/compositions/ha_rm_zk_pki_tls

This is a script that tries to launch a MapReduce job as the user "auser":
https://github.com/edwardcapriolo/edgy-ansible/blob/main/imaging/hadoop/compositions/ha_rm_zk_pki_tls/enter_auser.sh

You won't see the problem until you launch a MapReduce job and the
NodeManager attempts to create an appcache directory. Not all jobs do this.

2026-01-12T19:51:25.467962032Z Can't create directory /yarn-root/nm-local-dir/usercache/auser/appcache - Permission denied
drwxr-s---    1 auser    hadoop         0 Jan 12 20:38 auser

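It can't create the directory because the group permissions on the "auser"
user directory are "r-s" (read and execute plus setgid, but no write), so
the NodeManager, running in group hadoop, can't create a directory inside it.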

Remember that there are many ways to run the LinuxContainerExecutor (LCE),
e.g. launching jobs as nobody or as a fixed user.

> Setting aside the possibility of other justifications, I'm opposed to
> relaxing permissions to 770.
> It sounds like a security compromise that shouldn't be required. It
> simply shouldn't be necessary.

I will say that LCE has a lot of security issues, and a tunable config
option is the least of its worries. For example, I fixed ANOTHER leak here:

else if (primary_app_dir == NULL) {
  primary_app_dir = app_dir;  /* alias, not a copy: both now point at one allocation */

It points one pointer at another, then frees the first and returns the
second! The caller is left holding a dangling pointer, and that is
undefined behavior.

As for not being able to reproduce it: I'll gladly get on Teams or Google
Chat and run the entire thing over SSH.

At *worst*, this fix is implemented as a "feature knob". It is not *forcing*
770 on everyone; it only gives this setup (the canonical one, which should
work) the ability to work. At *best*, it adds one well-defined config option
and fixes another instance of undefined pointer behavior (which has probably
been there for years).
-----------------------------------------------------------

Please let me know where to go from here.
