[ 
https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989032#comment-12989032
 ] 

Aaron Morton commented on CASSANDRA-2081:
-----------------------------------------

I've sort of stumbled onto something similar with an 0.7 install. I need to go 
home now so cannot dig any deeper and rule out human error, but this is what I 
have.

5 node 0.7.0 install

1) Load data in using

python stress.py -d jb-cass1,jb-cass2,jb-cass3,jb-cass4,jb-cass5 -o insert -n 
1000000 -e QUORUM -t 10 -i 1 -l 3
(use all 5 users, insert 1,000,000 rows with RF 3 and QUORUM and 10 threads, 
report progress every second)

2) Read back using 

python stress.py -d jb-cass2,jb-cass3,jb-cass4,jb-cass5 -o read -n 1000000 -e 
QUORUM -t 10 -i 1
(note that jb-cass1 is removed from the list)

3) make big bang

Once the read has run a few seconds I ran "reboot -f" on node 1. I expect the 
read operations to complete, output was 

11270,1315,1315,0.00839671943578,9
11631,361,361,0.00746133188792,11
11631,0,0,NaN,12
11631,0,0,NaN,13
11631,0,0,NaN,14
11631,0,0,NaN,15
11631,0,0,NaN,16
11631,0,0,NaN,17
11631,0,0,NaN,18
11631,0,0,NaN,19
Process Reader-10:
Traceback (most recent call last):
  File "/vol/apps/python-2.6.4_64/lib/python2.6/multiprocessing/process.py", 
line 232, in _bootstrap
    self.run()
  File "stress.py", line 279, in run
    r = self.cclient.get_slice(key, parent, p, consistency)
  File 
"/local1/frameworks/cassandra/apache-cassandra-0.7.0-src/contrib/py_stress/cassandra/Cassandra.py",
 line 432, in get_slice
    return self.recv_get_slice()
  File 
"/local1/frameworks/cassandra/apache-cassandra-0.7.0-src/contrib/py_stress/cassandra/Cassandra.py",
 line 462, in recv_get_slice
    raise result.te

All clients died. stress.py is not setting a timeout on the thrift socket, so 
am guessing this is server side.

I was running DEBUG on all the nodes (but had turned off the line numbers), 
this is from one. the 114.63 machine is obviously the one I killed. 


DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,186 StorageService.java (line 
org.apache.cassandra.service.StorageService) Sorted endpoints are 
/192.168.114.63,jb08.wetafx.co.nz/192.168.114.67,/192.168.114.64
DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,186 QuorumResponseHandler.java 
(line org.apache.cassandra.service.QuorumResponseHandler) QuorumResponseHandler 
blocking for 2 responses
DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,186 StorageProxy.java (line 
org.apache.cassandra.service.StorageProxy) strongread reading digest for 
SliceFromReadCommand(table='Keyspace1', key='30323334343534', 
column_parent='QueryPath(columnFamilyName='Standard1', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=5) from 
[email protected]/192.168.114.67
DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,187 StorageProxy.java (line 
org.apache.cassandra.service.StorageProxy) strongread reading data for 
SliceFromReadCommand(table='Keyspace1', key='30323334343534', 
column_parent='QueryPath(columnFamilyName='Standard1', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=5) from 
6623@/192.168.114.63
DEBUG [pool-1-thread-2] 2011-02-01 17:14:08,187 StorageProxy.java (line 
org.apache.cassandra.service.StorageProxy) strongread reading digest for 
SliceFromReadCommand(table='Keyspace1', key='30323334343534', 
column_parent='QueryPath(columnFamilyName='Standard1', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=5) from 
6624@/192.168.114.64
DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line 
org.apache.cassandra.db.filter.SliceQueryFilter) collecting 0 of 5: 
4330:false:34@1296532428248604
DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line 
org.apache.cassandra.db.filter.SliceQueryFilter) collecting 1 of 5: 
4331:false:34@1296532428248637
DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line 
org.apache.cassandra.db.filter.SliceQueryFilter) collecting 2 of 5: 
4332:false:34@1296532428248640
DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line 
org.apache.cassandra.db.filter.SliceQueryFilter) collecting 3 of 5: 
4333:false:34@1296532428248642
DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 SliceQueryFilter.java (line 
org.apache.cassandra.db.filter.SliceQueryFilter) collecting 4 of 5: 
4334:false:34@1296532428248656
DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 ReadVerbHandler.java (line 
org.apache.cassandra.db.ReadVerbHandler) digest is 
220b82e28c2bb4be869c168243d75f01
DEBUG [ReadStage:19] 2011-02-01 17:14:08,187 ReadVerbHandler.java (line 
org.apache.cassandra.db.ReadVerbHandler) Read key 30323334343534; sending 
response to 
[email protected]/192.168.114.67
DEBUG [RequestResponseStage:13] 2011-02-01 17:14:08,188 
ResponseVerbHandler.java (line org.apache.cassandra.net.ResponseVerbHandler) 
Processing response on a callback from 
[email protected]/192.168.114.67
DEBUG [RequestResponseStage:13] 2011-02-01 17:14:08,188 
ReadResponseResolver.java (line 
org.apache.cassandra.service.ReadResponseResolver) Preprocessed digest response
DEBUG [RequestResponseStage:16] 2011-02-01 17:14:08,188 
ResponseVerbHandler.java (line org.apache.cassandra.net.ResponseVerbHandler) 
Processing response on a callback from 
7D8FA1FD-A2FE-6A54-7BB0-3B129206D1E1@/192.168.114.64
DEBUG [RequestResponseStage:16] 2011-02-01 17:14:08,188 
ReadResponseResolver.java (line 
org.apache.cassandra.service.ReadResponseResolver) Preprocessed digest response
 INFO [ScheduledTasks:1] 2011-02-01 17:14:15,438 Gossiper.java (line 
org.apache.cassandra.gms.Gossiper) InetAddress /192.168.114.63 is now dead.
DEBUG [ScheduledTasks:1] 2011-02-01 17:14:15,440 MessagingService.java (line 
org.apache.cassandra.net.MessagingService) Resetting pool for /192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:14:17,442 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [pool-1-thread-1] 2011-02-01 17:14:18,183 CassandraServer.java (line 
org.apache.cassandra.thrift.CassandraServer) ... timed out
DEBUG [pool-1-thread-2] 2011-02-01 17:14:18,189 CassandraServer.java (line 
org.apache.cassandra.thrift.CassandraServer) ... timed out
DEBUG [pool-1-thread-1] 2011-02-01 17:14:18,232 ClientState.java (line 
org.apache.cassandra.service.ClientState) logged out: #<User allow_all 
groups=[]>
DEBUG [pool-1-thread-2] 2011-02-01 17:14:18,232 ClientState.java (line 
org.apache.cassandra.service.ClientState) logged out: #<User allow_all 
groups=[]>
DEBUG [ScheduledTasks:1] 2011-02-01 17:14:29,811 StorageLoadBalancer.java (line 
org.apache.cassandra.service.StorageLoadBalancer) Disseminating load info ...
DEBUG [ScheduledTasks:1] 2011-02-01 17:15:29,814 StorageLoadBalancer.java (line 
org.apache.cassandra.service.StorageLoadBalancer) Disseminating load info ...
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:15:53,637 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:16:08,667 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:16:23,697 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [ScheduledTasks:1] 2011-02-01 17:16:29,816 StorageLoadBalancer.java (line 
org.apache.cassandra.service.StorageLoadBalancer) Disseminating load info ...
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:16:36,723 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:16:50,751 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:17:03,775 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:17:19,807 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [ScheduledTasks:1] 2011-02-01 17:17:29,818 StorageLoadBalancer.java (line 
org.apache.cassandra.service.StorageLoadBalancer) Disseminating load info ...
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:17:32,834 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:17:42,852 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63
DEBUG [WRITE-/192.168.114.63] 2011-02-01 17:17:56,880 
OutboundTcpConnection.java (line 
org.apache.cassandra.net.OutboundTcpConnection) attempting to connect to 
/192.168.114.63


need to leave work now so may not be able to get further into this until 
tomorrow. 

> Consistency QUORUM does not work anymore (hector:Could not fullfill request 
> on this host)
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2081
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2081
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: linux, hector + cassandra
>            Reporter: Thibaut
>            Priority: Blocker
>             Fix For: 0.7.1
>
>
> I'm using apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25.
> Using consistency level Quorum won't work anymore (tested it on read). 
> Consisteny level ONE still works though
> I have tried this with one dead node in my cluster.
> If I restart cassandra with an older svn revision 
> (apache-cassandra-2011-01-28_20-06-01.jar), I can access the cluster with 
> consistency level QUORUM again, while still using 
> apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25 in my application.
> 11/01/31 19:54:38 ERROR connection.CassandraHostRetryService: Downed 
> intr1n18(192.168.0.18):9160 host still appears to be down: Unable to open 
> transport to intr1n18(192.168.0.18):9160 , java.net.NoRouteToHostException: 
> No route to host
> 11/01/31 19:54:38 INFO connection.CassandraHostRetryService: Downed Host 
> retry status false with host: intr1n18(192.168.0.18):9160
> 11/01/31 19:54:45 ERROR connection.HConnectionManager: Could not fullfill 
> request on this host CassandraClient<intr1n11:9160-483>
> intr1n11 is marked as up however and I can also access the node through the 
> cassandra cli.
> 192.168.0.1     Up     Normal  8.02 GB         5.00%   0cc
> 192.168.0.2     Up     Normal  7.96 GB         5.00%   199
> 192.168.0.3     Up     Normal  8.24 GB         5.00%   266
> 192.168.0.4     Up     Normal  4.94 GB         5.00%   333
> 192.168.0.5     Up     Normal  5.02 GB         5.00%   400
> 192.168.0.6     Up     Normal  5 GB            5.00%   4cc
> 192.168.0.7     Up     Normal  5.1 GB          5.00%   599
> 192.168.0.8     Up     Normal  5.07 GB         5.00%   666
> 192.168.0.9     Up     Normal  4.78 GB         5.00%   733
> 192.168.0.10    Up     Normal  4.34 GB         5.00%   7ff
> 192.168.0.11    Up     Normal  5.01 GB         5.00%   8cc
> 192.168.0.12    Up     Normal  5.31 GB         5.00%   999
> 192.168.0.13    Up     Normal  5.56 GB         5.00%   a66
> 192.168.0.14    Up     Normal  5.82 GB         5.00%   b33
> 192.168.0.15    Up     Normal  5.57 GB         5.00%   c00
> 192.168.0.16    Up     Normal  5.03 GB         5.00%   ccc
> 192.168.0.17    Up     Normal  4.77 GB         5.00%   d99
> 192.168.0.18    Down   Normal  ?               5.00%   e66
> 192.168.0.19    Up     Normal  4.78 GB         5.00%   f33
> 192.168.0.20    Up     Normal  4.83 GB         5.00%   ffffffffffffffff

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to