[
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056524#comment-13056524
]
Mck SembWever commented on CASSANDRA-2388:
------------------------------------------
bq. It looks like there's a ton of effort put in to avoiding making
sortByProximity work w/ non-local nodes
Because it's only when that local node is down that we actually need to sort...
When/if DynamicEndpointSnitch's limitation is fixed (so that it can sort by
non-local nodes), CassandraServer.java need not bypass it. But this won't
simplify the code in CFRR (ColumnFamilyRecordReader). Now that CFIF
(ColumnFamilyInputFormat) supports multiple initialAddresses, the method
sortEndpointsByProximity(..) in CFIF can be rewritten (i.e. a connection to any
initialAddress is all we need; no need to mess around with trying to connect
through replicas to find information about replicas...)
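To illustrate the "any connection to any initialAddress is all we need" idea, here is a minimal sketch. It is not the CFIF code: the class InitialAddressPicker, the method firstReachable, and the canConnect predicate are hypothetical names standing in for whatever actually opens the Thrift connection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of Cassandra.
class InitialAddressPicker {
    /**
     * Returns the first configured initial address that accepts a connection,
     * or an empty Optional if none of them do.
     * canConnect is an assumed stand-in for a real connection attempt.
     */
    static Optional<String> firstReachable(List<String> initialAddresses,
                                           Predicate<String> canConnect) {
        for (String address : initialAddresses) {
            if (canConnect.test(address)) {
                return Optional.of(address);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // Pretend the first node is down.
        Predicate<String> canConnect = a -> !a.equals("10.0.0.1");
        System.out.println(firstReachable(initial, canConnect)); // prints Optional[10.0.0.2]
    }
}
```

Whichever address is returned would then be the node asked to sort the replicas, so no connection through a replica is needed.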
bq. Wait, why do we even care? "local node" IS the right host to sort against
No. "initialAddress" is the right node to sort against. And it should be "local
node". And then we don't care about the replica.
But when "initialAddress" is down, we connect to another c* node at random to
find out which of the replicas we know about are 1) up, 2) closest, and 3) in
the same dc. That random c* node then becomes the "local node", and the call
needs to be {{snitch.sortByProximity(initialAddress, addresses)}}.
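The ordering described above can be sketched roughly as follows. This is only an illustration under assumptions: Node, its latencyScore field, and the order method are hypothetical and are not the snitch API, and placing same-dc replicas ahead of closer remote ones is one possible reading of the three criteria.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model, not Cassandra classes.
class ReplicaOrdering {
    static final class Node {
        final String address;
        final String dc;
        final boolean up;
        final int latencyScore; // assumed: lower means closer to the contact node

        Node(String address, String dc, boolean up, int latencyScore) {
            this.address = address;
            this.dc = dc;
            this.up = up;
            this.latencyScore = latencyScore;
        }
    }

    /**
     * Drops replicas that are down, then orders the rest relative to the
     * contact node: same-dc replicas first, then by ascending latencyScore.
     */
    static List<String> order(Node contact, List<Node> replicas) {
        List<Node> live = new ArrayList<>();
        for (Node r : replicas) {
            if (r.up) {
                live.add(r);
            }
        }
        live.sort(Comparator
                .comparing((Node r) -> !r.dc.equals(contact.dc)) // false (same dc) sorts first
                .thenComparingInt(r -> r.latencyScore));
        List<String> addresses = new ArrayList<>();
        for (Node r : live) {
            addresses.add(r.address);
        }
        return addresses;
    }
}
```

The point of the sketch is only that the reference node is whichever node we managed to connect to, not the replica itself.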
But yes... the CFRR code is contorted. In many ways I prefer the simplicity of
the first patch (both in API and in implementation) despite it not being "as
correct". I thought of this "fallback to replica" as a last resort to keep the
m/r job running, rather than an actively used feature where
DynamicEndpointSnitch's scores will maximise performance. But then I'm only
thinking in terms of a small c* cluster, and I am certainly naive about what
performance gains these scores can give...
> ColumnFamilyRecordReader fails for a given split because a host is down, even
> if records could reasonably be read from other replica.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-2388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 0.7.6, 0.8.0
> Reporter: Eldon Stegall
> Assignee: Jeremy Hanna
> Labels: hadoop, inputformat
> Fix For: 0.7.7, 0.8.2
>
> Attachments: 0002_On_TException_try_next_split.patch,
> CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We
> should try multiple locations for a given split.