I tried to reproduce the problem on a smaller test cluster.
It was rather easy; the cluster contains 4 servers.
Log fragment from the restarted node (10.2.3.38):

DEBUG [pool-1-thread-64] 2009-10-15 14:18:16,290 CassandraServer.java (line 214) get_slice
DEBUG [pool-1-thread-64] 2009-10-15 14:18:16,290 StorageProxy.java (line 239) weakreadlocal reading SliceFromReadCommand(table='Keyspace1', key='0000000000000000000000000000000000849706', column_parent='QueryPath(columnFamilyName='Super1', superColumnName='[...@6ca50fbe', columnName='null')', start='1', finish='0', reversed=true, count=2)
DEBUG [pool-1-thread-64] 2009-10-15 14:18:16,290 StorageProxy.java (line 251) weakreadremote reading SliceFromReadCommand(table='Keyspace1', key='0000000000000000000000000000000000849706', column_parent='QueryPath(columnFamilyName='Super1', superColumnName='[...@6ca50fbe', columnName='null')', start='1', finish='0', reversed=true, count=2) from [email protected]:7000
...
ERROR [pool-1-thread-64] 2009-10-15 14:18:21,281 Cassandra.java (line 679) Internal error processing get_slice
java.lang.RuntimeException: error reading key 0000000000000000000000000000000000849706
    at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:265)
    at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:312)
    at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:95)
    at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:177)
    at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:252)
    at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:215)
    at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
    at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.util.concurrent.TimeoutException: Operation timed out.
    at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
    at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:261)
    ... 11 more

Log fragment from 10.2.3.40:
DEBUG [ROW-READ-STAGE:4] 2009-10-15 14:18:16,308 ReadVerbHandler.java (line 100) Read key 0000000000000000000000000000000000849706; sending response to [email protected]:7000
....
DEBUG [CONSISTENCY-MANAGER:2] 2009-10-15 14:18:16,308 ConsistencyManager.java (line 168) Reading consistency digest for 0000000000000000000000000000000000849706 from 527...@[10.3.2.39:7000, 10.3.2.41:7000]

I have the full logs, but they are about half a gigabyte for each node. If needed, I can put them somewhere accessible over HTTP.
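
With logs of that size, it may help to pull out only the lines for the failing key and measure the gap between the read being sent and the error. The following is a minimal Python sketch, not part of the attached scripts; the function names are hypothetical and the timestamp pattern is assumed from the fragments above.

```python
import re
from datetime import datetime

# Timestamp format as seen in the log fragments, e.g. "2009-10-15 14:18:16,290".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def parse_timestamp(line):
    """Return the first timestamp found in a log line, or None if there is none."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")


def lines_for_key(lines, key):
    """Yield only the log lines that mention the given row key."""
    for line in lines:
        if key in line:
            yield line.rstrip("\n")


def elapsed_seconds(first_line, last_line):
    """Seconds between the timestamps of two log lines."""
    delta = parse_timestamp(last_line) - parse_timestamp(first_line)
    return delta.total_seconds()
```

Passing an open file handle as `lines` reads the log one line at a time, so the whole file never has to fit in memory. For the fragments above, the gap between the weakreadremote line (14:18:16,290) and the ERROR line (14:18:21,281) is about 4.99 seconds.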

How to reproduce:
- configure a cluster of 4 nodes, with these changes in storage-conf.xml:
  <ReplicationFactor>3</ReplicationFactor>
  <FlushMinThreads>8</FlushMinThreads>
  <FlushMaxThreads>16</FlushMaxThreads>
- edit the attached scripts to use the correct node IPs
- run perl writecluster.pl -c 8 and wait 10-20 minutes
- run perl readcluster.pl
- look at the error :)

--
Teodor Sigaev                                   E-mail: [email protected]
                                                   WWW: http://www.sigaev.ru/

Attachment: writecluster.pl
Description: Perl program

Attachment: readcluster.pl
Description: Perl program
