I also have problems keeping time in sync on VMware virtual machines. The problems occur mostly when the VM host is oversubscribed, or when I'm doing stress tests. I ended up disabling ntpd in the guests and enabling host time sync through the VMware guest tools. All of my VMware hosts run ntpd, using the same NTP servers.

That's my development cluster. For production, I'm using ntpd on real servers.



*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>


On 6/18/13 05:41 , Da Chun wrote:

Thanks, Craig!
The umount works.

About the time skew: the log says the time difference should be less than 50ms. I set up one of my nodes as the time server, and the others sync their time with it. I don't know why the system time still changes frequently, especially after a reboot. Maybe it's because all my nodes are VMware virtual machines and the software clock is not accurate enough.
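
As a rough illustration of that 50 ms limit, here is a minimal Python sketch that flags nodes whose clock offset exceeds it. The node names come from the monmap quoted in this thread; the offset values are invented for illustration, and `skewed_nodes` is a hypothetical helper, not part of Ceph. In practice the offsets would have to be measured, for example by each node's NTP client.

```python
# Limit taken from the log message mentioned above (50 ms).
MAX_SKEW_SECONDS = 0.05


def skewed_nodes(offsets, limit=MAX_SKEW_SECONDS):
    """Return the sorted names of nodes whose absolute clock offset
    (in seconds, relative to the time server) exceeds the limit."""
    return sorted(name for name, offset in offsets.items() if abs(offset) > limit)


# Hypothetical offsets in seconds; these are NOT measurements from the thread.
example_offsets = {
    "ceph-node4": 0.000,
    "ceph-node5": 0.120,
    "ceph-node6": -0.010,
}

print(skewed_nodes(example_offsets))  # ['ceph-node5']
```

The absolute value matters because a clock that runs behind the time server is just as much of a problem as one that runs ahead.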

------------------ Original ------------------
*From: * "Craig Lewis"<[email protected]>;
*Date: * Tue, Jun 18, 2013 05:34 AM
*To: * "ceph-users"<[email protected]>;
*Subject: * Re: [ceph-users] How to remove /var/lib/ceph/osd/ceph-2?

If you followed the standard setup, each OSD is its own disk + filesystem. /var/lib/ceph/osd/ceph-2 is in use as the mount point for the osd.2 filesystem. Double-check by examining the output of the `mount` command.

I get the same error when I try to rename a directory that's used as a mount point.

Try `umount /var/lib/ceph/osd/ceph-2` instead of the mv and rm. The fuser command is telling you that the kernel has a filesystem mounted in that directory. Nothing else appears to be using it, so the umount should complete successfully.
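
If you want to script that check, a minimal Python sketch could look like the following. It only runs `umount` when the directory really is a mount point. The helper name `unmount_if_mounted` and its `run` parameter are assumptions made for this example (the parameter exists so the function can be exercised without actually unmounting anything); `os.path.ismount` and `subprocess.run` are standard library calls.

```python
import os
import subprocess


def unmount_if_mounted(path, run=subprocess.run):
    """Unmount `path` only if it is currently a mount point.

    Returns True if umount was invoked, False if `path` was not mounted.
    `run` defaults to subprocess.run; pass a stub to do a dry run.
    """
    if not os.path.ismount(path):
        return False
    # check=True raises CalledProcessError if umount fails (e.g. target busy).
    run(["umount", path], check=True)
    return True


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    unmount_if_mounted("/var/lib/ceph/osd/ceph-2",
                       run=lambda cmd, **kwargs: print("would run:", cmd))
```

Once the function returns True and umount has succeeded, the directory is an ordinary empty directory again and the mv or rm should go through.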


Also, you should fix that time skew on mon.ceph-node5. The mailing list archives have several posts with good answers.


On 6/15/2013 2:14 AM, Da Chun wrote:
Hi all,
On Ubuntu 13.04 with ceph 0.61.3.
I want to remove osd.2 from my cluster. The following steps were performed:
root@ceph-node6:~# ceph osd out osd.2
marked out osd.2.
root@ceph-node6:~# ceph -w
   health HEALTH_WARN clock skew detected on mon.ceph-node5
   monmap e1: 3 mons at {ceph-node4=172.18.46.34:6789/0,ceph-node5=172.18.46.35:6789/0,ceph-node6=172.18.46.36:6789/0}, election epoch 124, quorum 0,1,2 ceph-node4,ceph-node5,ceph-node6
   osdmap e414: 6 osds: 5 up, 5 in
   pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
   mdsmap e102: 1/1/1 up {0=ceph-node4=up:active}

2013-06-15 16:55:22.096059 mon.0 [INF] pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail
^C
root@ceph-node6:~# stop ceph-osd id=2
ceph-osd stop/waiting
root@ceph-node6:~# ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
root@ceph-node6:~# ceph auth del osd.2
updated
root@ceph-node6:~# ceph osd rm 2
removed osd.2
root@ceph-node6:~# mv /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2.bak
mv: cannot move ‘/var/lib/ceph/osd/ceph-2’ to ‘/var/lib/ceph/osd/ceph-2.bak’: Device or resource busy

Everything worked until the last step, removing the osd.2 directory /var/lib/ceph/osd/ceph-2.
root@ceph-node6:~# fuser -v /var/lib/ceph/osd/ceph-2
                     USER  PID ACCESS COMMAND
/var/lib/ceph/osd/ceph-2:
root kernel mount /var/lib/ceph/osd/ceph-2 ////////////////// What does this mean?
root@ceph-node6:~# lsof +D /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#

I restarted the system, and found that the osd.2 daemon was still running:
root@ceph-node6:~# ps aux | grep osd
root 1264 1.4 12.3 550940 125732 ? Ssl 16:41 0:20 /usr/bin/ceph-osd --cluster=ceph -i 2 -f
root 2876 0.0 0.0 4440 628 ? Ss 16:44 0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root 2877 4.9 18.2 613780 185676 ? Sl 16:44 1:04 /usr/bin/ceph-osd --cluster=ceph -i 5 -f

I had to use this workaround:
root@ceph-node6:~# rm -rf /var/lib/ceph/osd/ceph-2
rm: cannot remove ‘/var/lib/ceph/osd/ceph-2’: Device or resource busy
root@ceph-node6:~# ls /var/lib/ceph/osd/ceph-2
root@ceph-node6:~# shutdown -r now
....
root@ceph-node6:~# ps aux | grep osd
root 1416 0.0 0.0 4440 628 ? Ss 17:10 0:00 /bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" -f /bin/sh
root 1417 8.9 5.8 468052 59868 ? Sl 17:10 0:02 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
root@ceph-node6:~# rm -r /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#

Any idea? HELP!



_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

