I forgot to say that we use version 1.2.2 (we'll upgrade soon, but I didn't see 
any change related to this in CHANGES.txt).
-- 
Cyril SCETBON

On 27 Jan 2014, at 12:01, Cyril Scetbon <cyril.scet...@free.fr> wrote:

> Hi,
> 
> When a node has crashed for system reasons, it takes more than an hour to 
> come back into the ring. During this time, no other node sees it:
> 
> Datacenter: b1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns   Host ID                               Rack
> DN  XXXXXXXXXX     ?          256     3.8%   7b3d0ac4-bdf6-4e09-8a11-9794b1481c95  b05
> DN  XXXXXXXXXX     ?          256     3.1%   3a1172df-0260-4398-a008-05dc77e9f763  c03
> DN  XXXXXXXXXX     ?          256     3.7%   9e3cfd48-5697-4150-898e-b176d0eed4a0  b05
> DN  XXXXXXXXXX     ?          256     3.7%   347df11c-0d83-429c-a7a0-8d20c21a075a  c09
> DN  XXXXXXXXXX     ?          256     3.8%   d4083488-c614-4786-851b-e50a407d61a9  c03
> DN  XXXXXXXXXX     ?          256     3.7%   5a50d537-08fb-48cb-b8a0-829acb05b72e  b08
> DN  XXXXXXXXXX     ?          256     3.6%   a309c0da-aee8-4fed-aa9c-16ae103e42d3  c09
> DN  XXXXXXXXXX     ?          256     3.5%   41ff6e09-fb84-46f5-9efd-33f6ade49d7f  b08
> DN  XXXXXXXXXX     ?          256     3.2%   ad3ba9a2-5fe4-4208-b5ae-4f1a40942bb9  b08
> DN  XXXXXXXXXX     ?          256     3.4%   40140f99-e1b0-4fe0-93d2-cafdde05151f  c09
> DN  XXXXXXXXXX     ?          256     3.4%   f0c37b06-a335-49ab-819f-603945507ee9  b05
> DN  XXXXXXXXXX     ?          256     3.4%   ef1df7f6-5ae9-4ebf-bb14-e1373fc451ea  c03
> Datacenter: s1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns   Host ID                               Rack
> DN  XXXXXXXXXX     ?          256     3.5%   bbfbc2bb-dbee-4221-804d-9cc0760cc440  k09
> UN  XXXXXXXXXX     113.13 GB  256     3.5%   f6e41cf7-fffa-4a24-bc2d-0325051afd8f  h05
> DN  XXXXXXXXXX     ?          256     3.8%   3f66cbb2-427b-4bb0-8521-9789ce4358fa  h05
> DN  XXXXXXXXXX     ?          256     3.5%   c4763e28-48cf-4581-b576-6c0b06924ec6  h05
> DN  XXXXXXXXXX     ?          256     3.2%   8edb1155-990a-4946-a251-bb4cb4c59552  b05
> DN  XXXXXXXXXX     ?          256     4.2%   695adecd-4d49-412b-94db-cf695e3b5298  h05
> DN  XXXXXXXXXX     ?          256     3.8%   60b9b784-25ec-4a5a-ac76-d1c19bb2be72  c05
> DN  XXXXXXXXXX     ?          256     3.8%   3cf22978-c8d9-474e-8f6d-dbbcb1d7784e  b05
> DN  XXXXXXXXXX     ?          256     3.5%   4cfb5924-ea62-465b-8e39-b0b77a809422  k09
> DN  XXXXXXXXXX     ?          256     3.3%   08d99c7c-fb6b-4731-8408-27afb6aa79e5  k09
> DN  XXXXXXXXXX     ?          256     3.1%   1fb09426-3191-46f5-ab54-c2b6e980fcfe  k09
> DN  XXXXXXXXXX     ?          256     3.5%   79f64055-2681-43d2-a8e3-375ca9d6b771  c05
> DN  XXXXXXXXXX     ?          256     3.7%   88a8c59e-4dc9-47b2-b7d7-bb422199fa76  b05
> DN  XXXXXXXXXX     ?          256     3.7%   1d6ef3e5-76bc-4cac-9151-bbfd5b5e7e0e  c05
> DN  XXXXXXXXXX     ?          256     3.4%   79cf98d7-3bfe-4a94-97bd-95837dbe7623  c05
> DN  XXXXXXXXXX     ?          256     4.1%   541cd94b-1f94-47a4-83d3-66ed3ffe222d  b05
> 
> There is nothing noticeable in the logs, even in debug mode:
> 
>  INFO [main] 2014-01-27 10:00:21,706 TServerCustomFactory.java (line 47) Using synchronous/threadpool thrift server on 0.0.0.0 : 9160
>  INFO [Thread-8] 2014-01-27 10:00:21,707 ThriftServer.java (line 110) Listening for thrift clients...
>  WARN [NonPeriodicTasks:1] 2014-01-27 10:00:31,765 Password4LevelAuthenticator.java (line 205) PasswordAuthenticator skipped default user setup: some nodes were not ready
>  WARN [NonPeriodicTasks:1] 2014-01-27 10:00:31,794 Auth.java (line 207) Skipped default superuser setup: some nodes were not ready
> 
> The top threads are RMI threads:
> 
> <Screen Shot 2014-01-27 at 11.23.29.png>
> 
> More than one hour later, we see:
> 
> DEBUG [Thread-3964] 2014-01-27 11:24:18,856 IncomingTcpConnection.java (line 75) Connection version 6 from /XXXXXXXXXX
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 IncomingTcpConnection.java (line 112) Upgrading incoming connection to be compressed
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 IncomingTcpConnection.java (line 120) Max version for /XXXXXXXXXX is 6
> DEBUG [Thread-3964] 2014-01-27 11:24:18,857 MessagingService.java (line 805) Setting version 6 for /XXXXXXXXXX
> DEBUG [Thread-3964] 2014-01-27 11:24:18,858 IncomingTcpConnection.java (line 129) set version for /XXXXXXXXXX to 6
> DEBUG [Thread-3964] 2014-01-27 11:24:18,862 MessagingService.java (line 812) Reseting version for /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,867 IncomingTcpConnection.java (line 75) Connection version 6 from /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,867 IncomingTcpConnection.java (line 112) Upgrading incoming connection to be compressed
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 IncomingTcpConnection.java (line 120) Max version for /XXXXXXXXXX is 6
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 MessagingService.java (line 805) Setting version 6 for /XXXXXXXXXX
> DEBUG [Thread-3965] 2014-01-27 11:24:18,869 IncomingTcpConnection.java (line 129) set version for /XXXXXXXXXX to 6
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,876 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,878 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> DEBUG [GossipStage:1] 2014-01-27 11:24:18,878 Gossiper.java (line 722) Clearing interval times for /XXXXXXXXXX due to generation change
> 
> We hit this issue only when the system crashes.
> 
> Any idea of a possible cause, or is this a known behaviour?
>  -- 
> Cyril SCETBON
> 
