Chris,

thank you very much for your quick response.

My colleague (who raised the issue) tried to use that functionality, but it
wasn't a good fit for Kubernetes because it doesn't let us remap ports.
We bind to a constant port inside the pod and let Kubernetes assign us an
exposed NodePort, but HDFS is hard-coded to always advertise the port of the
bound socket.

The multihoming feature approaches it from a kind of backwards angle
("instead of binding to the advertised address, bind to this!").
Kafka (and maybe others) does it the other way around: "instead of
advertising the bound address, advertise this".
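
To make the distinction concrete, here is a rough sketch of the "advertise
this" approach. It is written in Python purely for illustration; the
`Endpoint` type, the `advertised_endpoint` helper, the host name and the
NodePort number are all made up and do not correspond to any existing HDFS
code or configuration property.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def advertised_endpoint(
    bound: Endpoint,
    advertised_host: Optional[str] = None,
    advertised_port: Optional[int] = None,
) -> Endpoint:
    """Return the endpoint a service would report to its registry.

    Hypothetical helper: without overrides it reports the bound socket
    (the behaviour described above); with overrides it reports the
    externally reachable address instead.
    """
    return Endpoint(
        host=advertised_host if advertised_host is not None else bound.host,
        port=advertised_port if advertised_port is not None else bound.port,
    )


# Socket bound to a constant port inside the pod (illustrative values).
bound = Endpoint("0.0.0.0", 9866)

# No override: the bound address and port are advertised as-is.
unchanged = advertised_endpoint(bound)

# With override: advertise the node's address and the assigned NodePort.
external = advertised_endpoint(
    bound, advertised_host="node-1.example.com", advertised_port=31234
)
```

The bind address stays untouched in both cases; only what gets reported
changes, which is the part we are missing for ports today.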

We need to be careful to not implement the same thing twice, I agree. But
the port functionality is definitely missing.

We just wanted to make sure that this is something worthwhile (we believe
so) before starting the proper implementation/proposal.

Cheers,
Lars




On Fri, Jun 24, 2022 at 6:31 PM Chris Nauroth <cnaur...@apache.org> wrote:

> Hello Lars,
>
> I can't say I've personally run HDFS on Kubernetes with Kerberos enabled.
> However, some of the issues you raise sound like they have some overlap
> with the HDFS multi-homing features:
>
>
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
>
> Have you seen this? Does anything look helpful there?
>
> Chris Nauroth
>
>
> On Fri, Jun 24, 2022 at 4:55 AM Lars Francke <lars.fran...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> we're trying to get HDFS running in Kubernetes using Kerberos.
>> This has some challenges as you might expect.
>> We have created an issue for that including a spike:
>> https://issues.apache.org/jira/browse/HDFS-16577
>>
>> Currently (as of 3.2.2, but reading through the release notes this doesn't
>> seem to have changed since then) DataNodes use the same properties for
>> deciding which port to bind each service to, as for deciding which ports
>> are included in the `DatanodeRegistration` sent to the NameNode. Further,
>> NameNodes overwrite the DataNode's IP address with the incoming address
>> during registration.
>>
>> Both of these prevent external users from connecting to DataNodes that are
>> hosted behind some sort of NAT (such as Kubernetes).
>>
>> We'd go ahead with a proper implementation/PR but we thought about asking
>> for comments/feedback first. Maybe someone else has already done some work
>> here that we might have missed etc.
>>
>> Thank you!
>>
>> Cheers,
>> Lars
>>
>
