Update: I noticed that one PG remained in a scrubbing state from the first 
day I hit the issue until I rebooted the node and the problem disappeared.
Could this have caused the behaviour I described in the quoted mail below?
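
In case it helps, this is roughly how I spotted it. A minimal sketch of the 
checks; the pg id 3.1f below is only an example:

    # list PGs whose state contains "scrubbing"
    ceph pg dump pgs_brief | grep -i scrub

    # inspect a suspect PG in detail (example pg id)
    ceph pg 3.1f query

    # temporarily prevent new scrubs while debugging
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # and re-enable them once done
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub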


> On 09 Nov 2017, at 15:55, Matteo Dacrema <mdacr...@enter.eu> wrote:
> 
> Hi all,
> 
> I’ve experienced a strange issue with my cluster.
> The cluster is composed of 10 HDD nodes, each with 20 HDDs plus 4 journal 
> devices, plus 4 SSD nodes with 5 SSDs each.
> All the nodes sit behind 3 monitors and are split across 2 different CRUSH 
> maps (one for HDD, one for SSD).
> The whole cluster is on 10.2.7 (Jewel).
> 
> About 20 days ago I started noticing that long backups were hanging with 
> "task jbd2/vdc1-8:555 blocked for more than 120 seconds" on the HDD CRUSH 
> map.
> A few days ago another VM started to show high iowait without doing any 
> IOPS, also on the HDD CRUSH map.
> 
> Today about a hundred VMs were unable to read from or write to many 
> volumes, all of them on the HDD CRUSH map. Ceph health was OK and no 
> significant log entries were found.
> Not all the VMs experienced the problem; meanwhile, IOPS on the journals 
> and HDDs were very low, even though I could still drive significant IOPS 
> from the working VMs.
> 
> After two hours of debugging I decided to reboot one of the OSD nodes, 
> and the cluster started to respond again. Now the OSD node is back in the 
> cluster and the problem has disappeared.
> 
> Can someone help me understand what happened?
> I see strange entries in the log files, like:
> 
> accept replacing existing (lossy) channel (new one lossy=1)
> fault with nothing to send, going to standby
> leveldb manual compact 
> 
> I can share any logs that might help to identify the issue.
> 
> Thank you.
> Regards,
> 
> Matteo
> 
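
P.S. In case it's useful to anyone hitting the same symptoms (cluster 
reports HEALTH_OK but VMs hang on I/O), these are the checks I'd run on the 
OSD nodes next time before rebooting anything; osd.12 is only an example id:

    # requests currently stuck inside this OSD daemon
    ceph daemon osd.12 dump_ops_in_flight

    # recently completed slow requests, with per-step timings
    ceph daemon osd.12 dump_historic_ops

    # cluster-wide summary of any slow/blocked requests
    ceph health detail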

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
