[
https://issues.apache.org/jira/browse/CASSANDRA-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870877#action_12870877
]
Jeremy Hanna commented on CASSANDRA-1124:
-----------------------------------------
Based on the InputSplit.getLocations documentation, it looks like the return
value is just a list of hostnames.
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/InputSplit.html
But we're still trying to find where Hadoop looks for a nearby node, i.e. one in
the same rack or datacenter, when it can't use the node the data is local to.
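
To make the idea concrete, here is a minimal sketch of how a plain list of
hostnames could still carry some locality information, by listing the closest
replica first. Everything in it is hypothetical: LocalitySketch, Replica,
distance and orderedLocations are illustration-only names and not part of
Cassandra or Hadoop, and the rack and datacenter values are assumed to come
from whatever topology information Cassandra has configured. Whether Hadoop
gives any weight to the order of the returned hostnames is exactly the open
question above, so treat the ordering as an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Cassandra or Hadoop.
class LocalitySketch {

    // Hypothetical view of one replica endpoint: its hostname plus the rack
    // and datacenter that Cassandra's topology configuration assigns to it.
    static final class Replica {
        final String host;
        final String rack;
        final String datacenter;

        Replica(String host, String rack, String datacenter) {
            this.host = host;
            this.rack = rack;
            this.datacenter = datacenter;
        }
    }

    // 0 = same host, 1 = same rack, 2 = same datacenter, 3 = anywhere else.
    static int distance(Replica local, Replica other) {
        if (local.host.equals(other.host)) {
            return 0;
        }
        if (local.datacenter.equals(other.datacenter)) {
            return local.rack.equals(other.rack) ? 1 : 2;
        }
        return 3;
    }

    // Returns plain hostnames, closest first, in the same shape
    // (a String[] of hostnames) that InputSplit.getLocations() returns.
    static String[] orderedLocations(Replica local, List<Replica> replicas) {
        List<Replica> sorted = new ArrayList<>(replicas);
        // List.sort is stable, so replicas at equal distance keep their order.
        sorted.sort(Comparator.comparingInt((Replica r) -> distance(local, r)));
        String[] hosts = new String[sorted.size()];
        for (int i = 0; i < hosts.length; i++) {
            hosts[i] = sorted.get(i).host;
        }
        return hosts;
    }
}
```

The sketch only reorders hostnames; it does not encode rack or datacenter
names in the strings themselves, since the documentation describes the return
value as hostnames only.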
> Improve Cassandra to MapReduce locality sharing
> -----------------------------------------------
>
> Key: CASSANDRA-1124
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1124
> Project: Cassandra
> Issue Type: Improvement
> Components: Hadoop
> Reporter: Jeremy Hanna
> Priority: Minor
>
> Currently, the Hadoop integration only passes the data's local node
> information (ColumnFamilyRecordReader-RowIterator-getLocation). Hadoop can
> take advantage of full locality (rack and datacenter), and that information
> may already be configured in Cassandra.
> This improvement is to add the data's full locality to the location String
> in a form that Hadoop can use with its Job/Task Trackers.
> That would allow jobs to run in the same rack and/or datacenter as the data
> when possible.