Hello Hadoopy friends,

Everyone is discussing how Mythos is going to find all the bugs and the world will end in two months. Before everyone thought the world was going to end, I came to the conclusion that RHEL/CentOS will die, and I started focusing on making Hadoop work on Alpine and musl libc.
Along the way I have found a number of "pointer" issues.

https://github.com/apache/hadoop/pull/8184

I have explained in agonizing detail why it should go forward.

------------------------------------

@cnauroth <https://github.com/cnauroth> I don't know how you can say you cannot reproduce this.

1. I have published exact instructions, including a Docker image that shows this.
2. The output is in detail in the ticket. The directory ownership does not allow the appcache directory to be created.
3. There are more dangling pointers in the code (I'm fixing serious leaks in what is supposed to be 'secure' code).
4. The code is CLEARLY not tested. You can see from my PR that the method being called is never executed by the gunit tests, so what people think it does and what it actually does are different things. Go into the C code and just put exit 0, then run the tests: the function create_app_dir is NEVER called in tests.

Again, to summarize: if you run the nodemanager as yarn:hadoop, which is how you are supposed to run it, it won't be able to create this appcache directory.

Here are the full configs and Docker images that show the problem:
https://github.com/edwardcapriolo/edgy-ansible/tree/main/imaging/hadoop/compositions/ha_rm_zk_pki_tls

This is a process that tries to launch a map-reduce job as the user "auser":
https://github.com/edwardcapriolo/edgy-ansible/blob/main/imaging/hadoop/compositions/ha_rm_zk_pki_tls/enter_auser.sh

You won't see the problem until you go to launch a map-reduce job and the nodemanager attempts to create an appcache. Not all jobs do this.

2026-01-12T19:51:25.467962032Z Can't create directory /yarn-root/nm-local-dir/usercache/auser/appcache - Permission denied
drwxr-s--- 1 auser hadoop 0 Jan 12 20:38 auser

It can't create the directory because the "auser" user directory has "r-s" for the group (no group write bit), so the nodemanager can't create a directory inside it. Remember, there are many ways to run LCE (launching jobs as nobody or as a fixed user).
Setting aside the possibility of other justifications, I'm opposed to relaxing permissions to 770. It sounds like a security compromise that shouldn't be required. It simply shouldn't be necessary.

I will say that LCE has a lot of security issues, and a tunable config option is the least of the worries. For example, I fixed ANOTHER leak here:

    else if (primary_app_dir == NULL) {
      primary_app_dir = app_dir;

Pointing one pointer at another, then freeing the first and returning the second! That is undefined behavior now.

As for not reproducing: I'll gladly get on Teams or Google Chat, and I can run the entire thing over SSH.

At *worst*, this fix is implemented as a "feature knob". It is not *forcing* 770 on everyone, only giving this setup (the canonical one that should work) the ability to work.

At *best*, it is adding one well-defined config option and fixing another undefined pointer behavior (that has probably been there for years).

-----------------------------------------------------------

Please let me know where to go.
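For reference, the aliasing pattern described above (assigning app_dir to primary_app_dir, freeing app_dir, then returning primary_app_dir) can be avoided by handing ownership over explicitly. This is a minimal standalone sketch, not the actual container-executor code and not necessarily the fix in the PR. The functions copy_string and pick_primary_dir and their signatures are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of s, or NULL. */
char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *out = malloc(len);
    if (out != NULL) {
        memcpy(out, s, len);
    }
    return out;
}

/*
 * Hypothetical stand-in for a loop over candidate directories.
 * Returns an owned copy of the first candidate, or NULL if there is
 * none or an allocation fails. The caller must free the result.
 */
char *pick_primary_dir(const char *const *candidates, size_t count) {
    char *primary_app_dir = NULL;
    for (size_t i = 0; i < count; i++) {
        char *app_dir = copy_string(candidates[i]);
        if (app_dir == NULL) {
            free(primary_app_dir);
            return NULL;
        }
        if (primary_app_dir == NULL) {
            primary_app_dir = app_dir; /* ownership moves to primary_app_dir */
            app_dir = NULL;            /* drop the alias so it is not freed below */
        }
        free(app_dir);                 /* free(NULL) is a no-op */
    }
    return primary_app_dir;
}
```

Without the app_dir = NULL line, the free at the end of the iteration would release the same memory that primary_app_dir still points to, and the caller would receive a dangling pointer.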
