Hi Xuefu, I agree regarding HS2, since HS2 usually runs on a gateway or service node inside the cluster environment. In my case, it is actually about additional security. A separate edge node (not running HS2; HS2 runs on another box) is used for the Hive CLI. We don't allow data/worker nodes to talk to the edge node on random ports: all ports must be registered or explicitly specified, and monitored. That's why I am asking for this feature. Otherwise, opening up ports 1024-65535 from the data/worker nodes to the edge node is a bad idea and bad practice for network security. :(
________________________________________
From: Xuefu Zhang <xzh...@cloudera.com>
Sent: Monday, October 19, 2015 1:12 PM
To: dev@hive.apache.org
Subject: Re: Hard Coded 0 to assign RPC Server port number when hive.execution.engine=spark

Hi Andrew,

RpcServer is an instance launched for each user session. In the case of the Hive CLI, which is for a single user, what you said makes sense and the port number can be configurable. In the context of HS2, however, there are multiple user sessions and the total is unknown in advance. While the +1 scheme works, there can still be a band of ports that might eventually be opened.

From a different perspective, we expect that either the Hive CLI or HS2 resides on a gateway node, which is in the same network as the data/worker nodes. In this configuration, the firewall issue you mentioned doesn't apply. Such a configuration is what we usually see with our enterprise customers, and it is what we recommend. I'm not sure why you would want your Hive users to launch the Hive CLI anywhere outside your cluster, which doesn't seem secure if security is your concern.

Thanks,
Xuefu

On Mon, Oct 19, 2015 at 7:20 AM, Andrew Lee <alee...@hotmail.com> wrote:
> Hi All,
>
> I notice that in
>
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
>
> the port number is assigned 0, which means it will be a random port
> every time the RPC server is created to talk to Spark in the same session.
>
> Is there any reason why this port number is not a configurable property that
> follows the same +1 rule if the port is taken, just like Spark's
> configuration for the Spark driver, etc.? Because of this, it is hard to
> configure a firewall between the HiveCLI RPC server and Spark due to the
> unpredictable port numbers. In other words, users need to open the whole
> Hive port range from the data nodes => HiveCLI (edge node).
>
> this.channel = new ServerBootstrap()
>     .group(group)
>     .channel(NioServerSocketChannel.class)
>     .childHandler(new ChannelInitializer<SocketChannel>() {
>       @Override
>       public void initChannel(SocketChannel ch) throws Exception {
>         SaslServerHandler saslHandler = new SaslServerHandler(config);
>         final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>         saslHandler.rpc = newRpc;
>
>         Runnable cancelTask = new Runnable() {
>           @Override
>           public void run() {
>             LOG.warn("Timed out waiting for hello from client.");
>             newRpc.close();
>           }
>         };
>         saslHandler.cancelTask = group.schedule(cancelTask,
>             RpcServer.this.config.getServerConnectTimeoutMs(),
>             TimeUnit.MILLISECONDS);
>       }
>     })
>     .option(ChannelOption.SO_BACKLOG, 1)
>     .option(ChannelOption.SO_REUSEADDR, true)
>     .childOption(ChannelOption.SO_KEEPALIVE, true)
>     .bind(0)
>     .sync()
>     .channel();
> this.port = ((InetSocketAddress) channel.localAddress()).getPort();
>
> Appreciate any feedback, and whether a JIRA is required to keep track of
> this conversation. Thanks.
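For illustration, here is a minimal, self-contained sketch of the "+1 if the port is taken" probing scheme Andrew proposes, using plain java.net.ServerSocket rather than Netty (the real RpcServer binds via ServerBootstrap); the class and method names are hypothetical, not part of Hive:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeBinder {
    // Try each port in [low, high] in order; return a socket bound to the
    // first free one. This mirrors the proposed "+1" fallback: if a port is
    // taken, move on to port + 1 instead of failing or binding port 0.
    static ServerSocket bindInRange(int low, int high) throws IOException {
        for (int port = low; port <= high; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                // Port in use; fall through and probe the next one.
            }
        }
        throw new IOException("No free port in range " + low + "-" + high);
    }

    public static void main(String[] args) throws IOException {
        // First bind takes the lowest free port in the range; a second bind
        // while the first is still open lands on the next free port.
        ServerSocket first = bindInRange(31000, 31010);
        ServerSocket second = bindInRange(31000, 31010);
        System.out.println(first.getLocalPort() + " " + second.getLocalPort());
        second.close();
        first.close();
    }
}
```

With a scheme like this, a firewall only needs to allow the configured band (here 31000-31010) from the data/worker nodes to the edge node, instead of the whole ephemeral range that bind(0) implies.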