[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104179#comment-13104179
 ] 

T Jake Luciani commented on CASSANDRA-2388:
-------------------------------------------

I just want to confirm what this ticket is about.

The JT has a list of endpoints for a given split.
When a task runs, it may or may not be on one of those nodes.
If other tasks are already running on all of those replicas, the JT may put 
the task on a remote node.

So we need to decide which endpoint to connect to given the chance that nodes 
are down.

1. Check if the node running CFRR is one of the replicas (we have this). This 
means the JT has assigned a data-local task (good).
2. If none of the replicas is local, pick another one.
3. If the connection fails, try one of the other nodes.
4. Try to avoid endpoints in a different DC.
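
The four steps above can be sketched roughly as follows. This is only an 
illustration, not the patch attached to this ticket: the class and method 
names, the dcByEndpoint map, and the tryConnect callback are all hypothetical, 
and the sketch assumes DC information is available somehow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper, not part of Cassandra.
final class EndpointOrdering {

    // Steps 1, 2 and 4: local replica first, then replicas in the local DC,
    // then replicas in other DCs as a last resort.
    // dcByEndpoint is an assumed endpoint -> DC name lookup.
    static List<String> orderEndpoints(List<String> replicas,
                                       String localHost,
                                       Map<String, String> dcByEndpoint) {
        String localDc = dcByEndpoint.get(localHost);
        List<String> local = new ArrayList<String>();
        List<String> sameDc = new ArrayList<String>();
        List<String> otherDc = new ArrayList<String>();
        for (String replica : replicas) {
            if (replica.equals(localHost)) {
                local.add(replica);
            } else if (localDc != null && localDc.equals(dcByEndpoint.get(replica))) {
                sameDc.add(replica);
            } else {
                otherDc.add(replica);
            }
        }
        List<String> ordered = new ArrayList<String>(local);
        ordered.addAll(sameDc);
        ordered.addAll(otherDc);
        return ordered;
    }

    // Step 3: walk the ordered list and return the first endpoint that accepts
    // a connection. tryConnect stands in for opening the real client connection.
    static String connectFirstAvailable(List<String> ordered,
                                        Predicate<String> tryConnect) {
        for (String endpoint : ordered) {
            if (tryConnect.test(endpoint)) {
                return endpoint;
            }
        }
        throw new IllegalStateException("no replica reachable: " + ordered);
    }
}
```

If the local host's DC is unknown, every non-local replica lands in the last 
group, so all replicas are still tried, just without a DC preference.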

The biggest problem is 4.  Maybe the way to do this is to change the getSplits 
logic to never return replicas in another DC.  I think this would require 
adding DC info to the describe_ring call.  Then we only need to worry about 1-3.
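
A minimal sketch of that getSplits-side filtering, assuming describe_ring (or 
something like it) could hand back a DC name per endpoint. The class, method, 
and dcByEndpoint map are hypothetical. Falling back to the full replica list 
when no local-DC replica exists is an added assumption here, so that a split 
is never left without locations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cassandra.
final class SplitLocationFilter {

    // Keep only the replicas that live in localDc.
    // dcByEndpoint is an assumed endpoint -> DC name lookup.
    static List<String> keepLocalDc(List<String> replicas,
                                    String localDc,
                                    Map<String, String> dcByEndpoint) {
        List<String> kept = new ArrayList<String>();
        for (String replica : replicas) {
            if (localDc.equals(dcByEndpoint.get(replica))) {
                kept.add(replica);
            }
        }
        // Assumption: if nothing is left, return all replicas rather than none.
        return kept.isEmpty() ? new ArrayList<String>(replicas) : kept;
    }
}
```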


> ColumnFamilyRecordReader fails for a given split because a host is down, even 
> if records could reasonably be read from other replica.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.6
>            Reporter: Eldon Stegall
>            Assignee: Mck SembWever
>              Labels: hadoop, inputformat
>             Fix For: 0.8.6
>
>         Attachments: 0002_On_TException_try_next_split.patch, 
> CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
> CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
> CASSANDRA-2388.patch
>
>
> ColumnFamilyRecordReader only tries the first location for a given split. We 
> should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
