[ 
https://issues.apache.org/jira/browse/CASSANDRA-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870934#action_12870934
 ] 

Stu Hood commented on CASSANDRA-1124:
-------------------------------------

On closer inspection, there doesn't appear to be any way to specify the 
rack/datacenter from the InputFormat. Hadoop uses a DNSToSwitchMapping to 
resolve a hostname's rack location: implementations don't always use DNS, but 
they always run on the JobTracker.

The options for optimally running Hadoop and Cassandra together appear to be: 
run Hadoop TaskTrackers on all of the Cassandra nodes (no need for datanodes) or 
extend/script a DNSToSwitchMapping that makes RPC calls to Cassandra nodes for 
EndPointSnitch information.
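
For the second option, a rough sketch of what such a mapping might look like 
follows. This is only an illustration under stated assumptions, not a tested 
implementation: SwitchMapping is a local stand-in that mirrors the 
resolve(List<String>) method of Hadoop's DNSToSwitchMapping so the sketch 
compiles without Hadoop on the classpath, SnitchLookup is a hypothetical 
wrapper around whatever RPC call would ask a Cassandra node for its snitch 
data, and the fallback path is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in mirroring resolve(List<String>) from Hadoop's
// org.apache.hadoop.net.DNSToSwitchMapping; a real version would implement that.
interface SwitchMapping {
    List<String> resolve(List<String> names);
}

// Hypothetical: hides the RPC call that asks Cassandra for snitch information.
// Returns {datacenter, rack} for a host, or null if the host is unknown.
interface SnitchLookup {
    String[] locate(String host);
}

class CassandraSwitchMapping implements SwitchMapping {
    // Placeholder fallback, kept at the same depth as the resolved paths.
    static final String FALLBACK = "/default-dc/default-rack";

    private final SnitchLookup lookup;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CassandraSwitchMapping(SnitchLookup lookup) {
        this.lookup = lookup;
    }

    // Returns one network path per input name, in the same order.
    @Override
    public List<String> resolve(List<String> names) {
        List<String> paths = new ArrayList<>(names.size());
        for (String name : names) {
            String path = cache.get(name);
            if (path == null) {
                path = pathFor(name);
                // Cache only successful lookups so failures are retried later.
                if (!FALLBACK.equals(path)) {
                    cache.put(name, path);
                }
            }
            paths.add(path);
        }
        return paths;
    }

    // Builds "/<datacenter>/<rack>", or the fallback if the lookup fails.
    private String pathFor(String host) {
        String[] loc;
        try {
            loc = lookup.locate(host);
        } catch (RuntimeException e) {
            loc = null;
        }
        if (loc == null || loc.length != 2) {
            return FALLBACK;
        }
        return "/" + loc[0] + "/" + loc[1];
    }
}
```

Since the mapping runs on the JobTracker, the lookup would have to be able to 
reach the Cassandra nodes from there, and caching successful lookups keeps 
the number of RPC calls down.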

> Improve Cassandra to MapReduce locality sharing
> -----------------------------------------------
>
>                 Key: CASSANDRA-1124
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1124
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>            Priority: Minor
>
> Currently, the Hadoop integration only passes the data's local node 
> information (ColumnFamilyRecordReader-RowIterator-getLocation).  Hadoop can 
> take advantage of full locality, and it's possible that we have full locality 
> configured in Cassandra.
> So this improvement is for adding the full locality of the data into the 
> String in a way that Hadoop can make use of with its Job/Task Trackers.
> This would allow tasks to potentially run on the same rack and/or datacenter 
> as the data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
