We are trying to secure our HDFS installation by blocking outside access to all the ports that HDFS uses.
Unfortunately, it's not possible to give our machines private IPs (don't ask me why), so we have started to
compile a list of the ports HDFS uses so that we can specifically block traffic to them. So far we have found that we can
configure the following ports:
dfs.datanode.http.address – 50075
dfs.datanode.address – 50010
dfs.datanode.ipc.address – 50020
However, netstat -ltp shows that the HDFS datanode also listens on another, seemingly random port. So far we have been
unable to determine what that port is used for or how to pin it to a fixed port. Can anyone help with this?
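
In case it helps anyone reproduce this, here is a minimal sketch of how the listening ports can be compared against the three configured ones listed above. It assumes numeric output in the style of `netstat -ltnp` and that the datanode shows up under the program name `java`. The function name `unexpected_ports`, the sample output, and the port 38211 are made up for illustration and are not taken from our cluster.

```python
# Ports we already know how to configure (from the list above).
KNOWN_PORTS = {50075, 50010, 50020}


def unexpected_ports(netstat_output, program="java", known=KNOWN_PORTS):
    """Return sorted listening TCP ports of `program` that are not in `known`.

    Assumes the column layout of `netstat -ltnp`:
    proto, recv-q, send-q, local address, foreign address, state, pid/program.
    """
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a TCP socket line.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        # The last column looks like "4242/java".
        if not fields[6].endswith("/" + program):
            continue
        # The local address looks like "0.0.0.0:50010" or ":::50075".
        port_text = fields[3].rsplit(":", 1)[-1]
        if port_text.isdigit():
            found.add(int(port_text))
    return sorted(found - set(known))


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      4242/java
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      4242/java
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      4242/java
tcp        0      0 0.0.0.0:38211           0.0.0.0:*               LISTEN      4242/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      911/sshd
"""

print(unexpected_ports(SAMPLE))  # prints [38211]
```

On a real node the text would come from the actual netstat output rather than the sample string. If several Java processes run on the same machine, filtering by the datanode's PID instead of the program name would be more precise.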