[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046450#comment-13046450 ]
Mck SembWever commented on CASSANDRA-2388:
------------------------------------------
I have now tested this on data with RF=2.
It seems to work OK as far as I can see.
One side effect of this patch is that configuring
ConfigHelper.setInitialAddress(conf, "localhost"), which used to work, no longer
works for tasks trying to run on the down node.
ColumnFamilyRecordReader.getLocations() throws a ConnectException when trying to
call describe_datacenter(..), which causes the task to fail. Hadoop then re-runs
the task on another node and the job eventually completes, but the fallback to a
replica is never used.
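A minimal sketch of the intended fallback, assuming a hypothetical HostConnector
interface that stands in for the real connection setup (it is not a Cassandra or
Hadoop API). Each replica location of the split is tried in turn, and the task
fails only when every replica refuses the connection.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for "open a connection to this host"; not a real Cassandra API.
interface HostConnector<C> {
    C connect(String host) throws IOException;
}

final class ReplicaFallback {
    private ReplicaFallback() {}

    /**
     * Tries each replica location of a split in order and returns the first
     * connection that succeeds. Throws only if every location fails.
     */
    static <C> C connectToAnyReplica(List<String> locations, HostConnector<C> connector)
            throws IOException {
        List<String> failures = new ArrayList<>();
        for (String host : locations) {
            try {
                return connector.connect(host);
            } catch (IOException e) {
                // java.net.ConnectException (a down node) is an IOException:
                // record it and move on to the next replica.
                failures.add(host + ": " + e.getMessage());
            }
        }
        throw new IOException("All replicas failed for split: " + failures);
    }
}
```

With this shape a down node costs one failed connection attempt instead of a
failed task and a Hadoop re-run.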
If initialAddress is hardcoded to a single node, the job is no longer
decentralised.
I would like to allow a comma-separated list in initialAddress, for example
"localhost, node01, node02, node03". This would give preference to localhost
while avoiding any centralisation.
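A rough sketch of how such a value could be parsed and used, assuming the list
order expresses preference. InitialAddresses, parse and firstReachable are
hypothetical names for illustration only, and the isUp predicate stands in for
an actual connection attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class InitialAddresses {
    private InitialAddresses() {}

    /** Splits "localhost, node01, node02" into [localhost, node01, node02], keeping order. */
    static List<String> parse(String initialAddress) {
        List<String> hosts = new ArrayList<>();
        for (String part : initialAddress.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return Collections.unmodifiableList(hosts);
    }

    /** Returns the first host (in preference order) that the predicate reports as up. */
    static Optional<String> firstReachable(List<String> hosts, Predicate<String> isUp) {
        for (String host : hosts) {
            if (isUp.test(host)) {
                return Optional.of(host);
            }
        }
        return Optional.empty();
    }
}
```

Putting localhost first keeps each task talking to its own node in the common
case, while the other entries are only consulted when localhost is down.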
I would also like to make ColumnFamilyRecordReader.getLocations() return an
iterator instead of an array.
The createConnection(..) and client.describe_datacenter(..) calls are
unnecessary overhead when all nodes (or at least the first endpoint location) are
up, and could be avoided by lazy-loading the list.
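A minimal sketch of the lazy-loading idea, assuming the fallback list is hidden
behind a Supplier. LazyLocations is a hypothetical class, and the supplier is
where the createConnection(..) and describe_datacenter(..) work would go. The
first endpoint is returned without any remote call; the supplier runs only if
the caller asks for a second location.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

final class LazyLocations implements Iterator<String> {
    private final String first;
    // Hypothetical: would wrap createConnection(..) + describe_datacenter(..);
    // expected to return the remaining candidate endpoints (excluding first).
    private final Supplier<List<String>> fallbackSupplier;
    private boolean firstReturned = false;
    private Iterator<String> fallbacks = null; // null until actually needed

    LazyLocations(String first, Supplier<List<String>> fallbackSupplier) {
        this.first = first;
        this.fallbackSupplier = fallbackSupplier;
    }

    @Override
    public boolean hasNext() {
        if (!firstReturned) {
            return true; // first endpoint is always available, no remote call
        }
        if (fallbacks == null) {
            fallbacks = fallbackSupplier.get().iterator(); // expensive path, run at most once
        }
        return fallbacks.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (!firstReturned) {
            firstReturned = true;
            return first;
        }
        return fallbacks.next();
    }
}
```

A caller that connects to the first location and stops iterating never pays for
the extra connection or the describe_datacenter(..) call.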
> ColumnFamilyRecordReader fails for a given split because a host is down, even
> if records could reasonably be read from other replica.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-2388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Reporter: Eldon Stegall
> Assignee: Mck SembWever
> Labels: hadoop, inputformat
> Fix For: 0.8.1
>
> Attachments: 0002_On_TException_try_next_split.patch,
> CASSANDRA-2388.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We
> should try multiple locations for a given split.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira