Bill Kenworthy <billk <at> iinet.net.au> writes:
> > The main thing keeping me away from CephFS is that it has no mechanism
> > for resolving silent corruption. Btrfs underneath it would obviously
> > help, though not for failure modes that involve CephFS itself. I'd
> > feel a lot better if CephFS had some way of determining which copy was
> > the right one other than "the master server always wins."

The "Giant" version 0.87 is a major release with many new fixes; it may
have the features you need. The ongoing releases are currently up to
v0.91. The readings look promising, but I agree it needs to be tested
with non-critical data.

http://ceph.com/docs/master/release-notes/#v0-87-giant
http://ceph.com/docs/master/release-notes/#notable-changes

> Forget ceph on btrfs for the moment - the COW kills it stone dead after
> real use. When running a small handful of VMs on a raid1 with ceph -
> sloooooooooooow :)

I'm staying away from VMs. It's spark on top of mesos I'm after. Maybe
docker or another container solution, down the road.

I read that some are using an SSD with raid 1 and bcache to improve
performance and stability a bit. I do not want to add SSD to the mix
right now, as the (3) node development systems all have 32 G of ram.

> You can turn off COW and go single on btrfs to speed it up but bugs in
> ceph and btrfs lose data real fast!

Interesting idea, since I'll have raid1 underneath each node. I'll need
to dig into this idea a bit more.

> ceph itself (my last setup trashed itself 6 months ago and I've given
> up!) will only work under real use/heavy loads with lots of discrete
> systems, ideally 10G network, and small disks to spread the failure
> domain. Using 3 hosts and 2x2g disks per host wasn't near big enough :(
> Its design means that small scale trials just wont work.

Huh. My systems are FX8350 (8-core) processors running at 4GHz with
32 G ram. Water coolers will allow me to crank up the speed (when/if
needed) to 5 or 6 GHz. Not Intel, but not low end either.

> Its not designed for small scale/low end hardware, no matter how
> attractive the idea is :(

Supposedly there are tools to measure/monitor ceph better now. That is
one of the things I need to research: how to manage the small cluster
better and back off the throughput/load while monitoring performance on
a variety of different tasks. Definitely not a production usage.

I certainly appreciate your ceph experiences. I filed a bug with the
version request for Giant v0.87. Did you run the 9999 version? What
versions did you experiment with?

I hope to set up Ansible to facilitate rapid installations of a variety
of gentoo systems used for cluster or ceph testing. That way
configurations should be able to "reboot" after bad failures. Did the
failures you experienced with Ceph require the gentoo-btrfs based
systems to be completely reinstalled from scratch, or just purging the
disk of Ceph and reconfiguring Ceph? I'm hoping to "configure ceph" in
such a way that failures do not corrupt the gentoo-btrfs installation
and only require repair to ceph, so your comments on that strategy are
most welcome.

> BillK

James

