[
https://issues.apache.org/jira/browse/CASSANDRA-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612492#comment-14612492
]
Aleksey Yeschenko commented on CASSANDRA-9603:
----------------------------------------------
bq. I'm not a reviewer. Wrong jon. =)
That's what I get for switching to something else during field autocomplete (:
> Expose private listen_address in system.local
> ---------------------------------------------
>
> Key: CASSANDRA-9603
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9603
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Piotr Kołaczkowski
> Assignee: Carl Yeksigian
> Fix For: 2.2.0 rc2, 2.1.8, 2.0.17
>
>
> We had hoped CASSANDRA-9436 would add it, but it added only rpc_address
> rather than both rpc_address *and* listen_address. We really need
> listen_address here, because we need the private IP that C* binds to.
> Knowing it, we could better match Spark nodes to C* nodes and process data
> locally in environments where rpc_address != listen_address, such as EC2.
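Once exposed, the private address could be read alongside the existing column with a simple query against the system table (column name `listen_address` assumed here, per this ticket's summary):

```sql
SELECT rpc_address, listen_address FROM system.local;
```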
> See, Spark does not know rpc addresses, nor does it have a concept of a
> broadcast address. It only knows the hostname / IP each of its workers binds
> to; in cloud environments, these are private IPs. So if we give Spark a set
> of C* nodes identified by rpc_address, Spark doesn't recognize them as
> belonging to the same cluster. It treats them as "remote" nodes and has no
> idea where to send tasks for data locality.
> Current situation (example):
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com,
> 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [node1.blah.ec2.com,
> node2.blah.ec2.com, node3.blah.ec2.com]
> What the application sends to Spark for execution:
> Task1 - please execute on node1.blah.ec2.com
> Task2 - please execute on node2.blah.ec2.com
> Task3 - please execute on node3.blah.ec2.com
> How Spark understands it: "I have no idea what node1.blah.ec2.com is, let's
> assign Task1 to a *random* node" :(
> Expected:
> Spark worker nodes: [10.0.0.1, 10.0.0.2, 10.0.0.3]
> C* nodes: [10.0.0.1 / node1.blah.ec2.com, 10.0.0.2 / node2.blah.ec2.com,
> 10.0.0.3 / node3.blah.ec2.com]
> What the application knows about the cluster: [10.0.0.1 / node1.blah.ec2.com,
> 10.0.0.2 / node2.blah.ec2.com, 10.0.0.3 / node3.blah.ec2.com]
> What the application sends to Spark for execution:
> Task1 - please execute on node1.blah.ec2.com or 10.0.0.1
> Task2 - please execute on node2.blah.ec2.com or 10.0.0.2
> Task3 - please execute on node3.blah.ec2.com or 10.0.0.3
> How Spark understands it: "10.0.0.1? - I have a worker on that node, let's
> put Task1 there"
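The current vs. expected scenarios above boil down to a simple matching step. A minimal sketch (illustrative only, not Spark Cassandra Connector code; all names are made up) of how knowing listen_address lets a driver pick the co-located worker:

```python
def preferred_workers(cassandra_nodes, spark_workers):
    """For each C* node, find the co-located Spark worker, if any.

    cassandra_nodes: dict mapping rpc_address (public hostname) -> listen_address
    spark_workers:   set of private IPs the Spark workers bind to
    """
    matches = {}
    for rpc_addr, listen_addr in cassandra_nodes.items():
        # With only rpc_address available, the public hostname matches no
        # worker IP and the task falls back to a random node.  Knowing
        # listen_address lets us hand Spark an address it recognizes.
        matches[rpc_addr] = listen_addr if listen_addr in spark_workers else None
    return matches

nodes = {
    "node1.blah.ec2.com": "10.0.0.1",
    "node2.blah.ec2.com": "10.0.0.2",
    "node3.blah.ec2.com": "10.0.0.3",
}
workers = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}

# Every node resolves to a local worker, so each task can run where its data is.
print(preferred_workers(nodes, workers))
```

With only the public hostnames (the "current situation"), every lookup misses and the map is all None, which is exactly the random-placement behavior described above.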
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)