Chris, thank you very much for your quick response.
My colleague (who raised the issue) tried to use that functionality, but it wasn't a good fit for Kubernetes because it didn't let us remap ports. We bind to a constant port inside the pod and let Kubernetes assign us an exposed NodePort, but HDFS is hard-coded to always advertise the port of the bound socket.

The multihoming feature approaches it from a kind of backwards angle ("instead of binding to the advertised address, bind to this!"). Kafka (and maybe others) do it the other way around: "instead of advertising the bound address, advertise this".

We need to be careful not to implement the same thing twice, I agree. But the port functionality is definitely missing.

We just wanted to make sure that this is something worthwhile (we believe so) before starting the proper implementation/proposal.

Cheers,
Lars

On Fri, Jun 24, 2022 at 6:31 PM Chris Nauroth <cnaur...@apache.org> wrote:

> Hello Lars,
>
> I can't say I've personally run HDFS on Kubernetes with Kerberos enabled.
> However, some of the issues you raise sound like they have some overlap
> with the HDFS multi-homing features:
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
>
> Have you seen this? Does anything look helpful there?
>
> Chris Nauroth
>
>
> On Fri, Jun 24, 2022 at 4:55 AM Lars Francke <lars.fran...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> we're trying to get HDFS running in Kubernetes using Kerberos.
>> This has some challenges as you might expect.
>> We have created an issue for that including a spike:
>> https://issues.apache.org/jira/browse/HDFS-16577
>>
>> Currently (as of 3.2.2, but reading through the release notes this doesn't
>> seem to have changed since then) DataNodes use the same properties for
>> deciding which port to bind each service to, as for deciding which ports
>> are included in the `DatanodeRegistration` sent to the NameNode. Further,
>> NameNodes overwrite the DataNode's IP address with the incoming address
>> during registration.
>>
>> Both of these prevent external users from connecting to DataNodes that are
>> hosted behind some sort of NAT (such as Kubernetes).
>>
>> We'd go ahead with a proper implementation/PR but we thought about asking
>> for comments/feedback first. Maybe someone else has already done some work
>> here that we might have missed etc.
>>
>> Thank you!
>>
>> Cheers,
>> Lars
>>
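
For readers following the thread, here is a minimal sketch of the bind-versus-advertise distinction discussed above. It is not HDFS code: the `Endpoint` type, the `endpoint_to_advertise` helper, and the host and port values are all hypothetical and only illustrate the idea of "instead of advertising the bound address, advertise this".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """A host/port pair (hypothetical type, for illustration only)."""
    host: str
    port: int


def endpoint_to_advertise(bound: Endpoint,
                          advertised_override: Optional[Endpoint] = None) -> Endpoint:
    """Pick the endpoint a service reports to its coordinator.

    Without an override, the bound socket is advertised, which is the
    behaviour the thread describes as hard-coded. With an override, the
    service still binds to `bound` but tells clients to use the override.
    """
    return advertised_override if advertised_override is not None else bound


# Illustrative values only: a constant port inside the pod, and an
# externally reachable node address with a Kubernetes-assigned NodePort.
bound = Endpoint("0.0.0.0", 9866)
node_port = Endpoint("node-1.example.com", 31234)

# No override: external clients would be told to use the in-pod port.
print(endpoint_to_advertise(bound))

# With an override: external clients are told to use the NodePort.
print(endpoint_to_advertise(bound, node_port))
```

The point of the sketch is that the bound endpoint and the advertised endpoint are separate inputs, so a port remapped by NAT can be reported without changing what the process binds to.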