Re: [ceph-users] Ceph Cluster Failures

Christian Balzer Thu, 16 Mar 2017 20:26:09 -0700

Hello,

On Fri, 17 Mar 2017 02:51:48 +0000 Rich Rocque wrote:


> Hi,
> 
> 
> I talked with the person in charge about your initial feedback and questions. 
> The thought is to switch to a new setup and I was asked to pass it on and ask 
> for thoughts on whether this would be sufficient or not.
>
I assume from the "new setup" wording that the current problematic one is
also on AWS, so I'd advise doing a proper analysis there before moving to
something "new".

If you search the ML archives you'll find a (few) others who have done
similar things, and as far as I can recall none were particularly successful.

A virtualized Ceph is going to be harder to get "right" than a HW based
one, doubly so when dealing with AWS network vagaries.
I'm unsure whether an AWS region can consist of multiple DCs; if so, the
latencies when doing writes would be bad, but then again it seems your use
case is very read-heavy.
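
A rough way to put numbers on that concern is to time TCP connects between
instances in different AZs. The sketch below is only an illustration: the
peer addresses are hypothetical placeholders, port 6789 is the default MON
port, and a TCP connect time only approximates one network round trip. It is
not a measurement of Ceph write latency.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Time `samples` TCP connects to host:port, returning milliseconds each."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # The connect completes after roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        results.append(elapsed * 1000.0)
    return results


def summarize(latencies_ms):
    """Reduce a list of latencies to min / median / max."""
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }


if __name__ == "__main__":
    # Hypothetical MON addresses, one per AZ; replace with real ones.
    peers = {
        "az-a": ("10.0.1.10", 6789),
        "az-b": ("10.0.2.10", 6789),
        "az-c": ("10.0.3.10", 6789),
    }
    for name, (host, port) in peers.items():
        try:
            stats = summarize(tcp_connect_latency_ms(host, port))
        except OSError as exc:
            print(f"{name}: unreachable ({exc})")
            continue
        print(f"{name}: min={stats['min']:.2f}ms "
              f"median={stats['median']:.2f}ms max={stats['max']:.2f}ms")
```

Run from each node against the others, this shows whether the cross-AZ
numbers differ noticeably from the same-AZ ones.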

That all said, the specs for your proposal look good from a (virtual) HW
perspective. 

Christian
 
> 
> Use case:
> Overview: Need to provide shared storage/high-availability for (usually) 
> low-volume web server instances using distributed, POSIX-compliant 
> filesystem, running in Amazon Web Services. Database storage is not part of 
> the cluster.
> Logic: We know Ceph is probably overkill for our current use (and probably 
> also for my future use), so why Ceph? Its performance, when using CephFS, 
> and its ability to support RBD (if we ever move to a container approach for 
> web servers). I’ve tried Amazon EFS (NFS-as-a-service) and GlusterFS (both 
> NFS and native client), and because of the number of small files we’re 
> working with, something that takes ~15sec. in Ceph takes several minutes 
> using other NFS or GlusterFS solutions.
> Current Load: ~100 connected clients accessing ~20GB data of e-commerce 
> related website source software.
> Expected Future Load: ~5,000 connected clients accessing ~1TB of data
> 
> Ceph Clients:
> Primary Role: Web server & load balancer w/ SSL termination
> Hardware Configuration: 1vCPU, 512MB ram, Ubuntu 16.04 LTS (per 
> website/domain/subdomain: 2ea t2.nano instances, load balanced behind 
> haproxy, rarely manually-scaling up with new instances during expected load 
> spikes. After initial “hits,” most of the website stays in local cache, 
> resulting in generally-few iops against the Ceph cluster.)
> 
> Ceph Clusters:
> Overall: 3 Co-located Clusters across 9 servers, spanning 3 AWS Availability 
> Zones in a single region. 3 MDS per-cluster, 3 MON per cluster, 2 OSD per 
> cluster.
> Hardware Configuration (MON/MDS): r4.large instance class, 2vCPU, ~15GB ram, 
> “up to 10Gbit” network (“Enhanced Networking” enabled), EBS / SSD for root 
> (not provisioned-IOPS), Ubuntu 16.04 LTS
> Hardware Configuration (OSD): i3.large instance class, 2vCPU, ~15GB ram, “up 
> to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root (not 
> provisioned-IOPS, but “EBS optimized” for bandwidth), ~475GB NVMe attached, 
> ephemeral storage for OSD (co-locating journal and data)
> 
> Proposed Layout:
> AZ “A”:
> 
>   *   Server A-MM (r4.large instance):
>      *   Mon.A & MDS.A for Cluster X
>      *   Mon.A & MDS.A for Cluster Y
>      *   Mon.A & MDS.A for Cluster Z
>   *   Server A-OSD-1 (i3.large instance):
>      *   OSD.0 for Cluster X
>   *   Server A-OSD-2 (i3.large instance):
>      *   OSD.0 for Cluster Z
> 
> 
> AZ “B”:
> 
>   *   Server B-MM (r4.large instance):
>      *   Mon.B & MDS.B for Cluster X
>      *   Mon.B & MDS.B for Cluster Y
>      *   Mon.B & MDS.B for Cluster Z
>   *   Server B-OSD-1 (i3.large instance):
>      *   OSD.1 for Cluster X
>   *   Server B-OSD-2 (i3.large instance):
>      *   OSD.0 for Cluster Y
> 
> 
> AZ “C”:
> 
>   *   Server C-MM (r4.large instance):
>      *   Mon.C & MDS.C for Cluster X
>      *   Mon.C & MDS.C for Cluster Y
>      *   Mon.C & MDS.C for Cluster Z
>   *   Server C-OSD-1 (i3.large instance):
>      *   OSD.1 for Cluster Y
>   *   Server C-OSD-2 (i3.large instance):
>      *   OSD.1 for Cluster Z
> 
> 
> Alternative Layout:
> Split, by half, the NVMe storage between 2 OSDs, and provide 3ea OSDs per 
> cluster for higher availability at the expense of disk read-write 
> performance, and increase the number of clusters to 4.
> 
> 
> Thank you for your time,
> 
> Rich
> 
> ________________________________
> From: Christian Balzer <ch...@gol.com>
> Sent: Thursday, March 16, 2017 2:30:49 AM
> To: Ceph Users
> Cc: Robin H. Johnson; Rich Rocque
> Subject: Re: [ceph-users] Ceph Cluster Failures
> 
> 
> Hello,
> 
> On Thu, 16 Mar 2017 02:44:29 +0000 Robin H. Johnson wrote:
> 
> > On Thu, Mar 16, 2017 at 02:22:08AM +0000, Rich Rocque wrote:  
> > > Has anyone else run into this or have any suggestions on how to remedy 
> > > it?  
> > We need a LOT more info.
> >  
> Indeed.
> 
> > > After a couple months of almost no issues, our Ceph cluster has
> > > started to have frequent failures. Just this week it's failed about
> > > three times.
> > >
> > > The issue appears to be that an MDS or Monitor will fail and then all
> > > clients hang. After that, all clients need to be forcibly restarted.  
> > - Can you define monitor 'failing' in this case?
> > - What do the logs contain?
> > - Is it running out of memory?
> > - Can you turn up the debug level?
> > - Has your cluster experienced continual growth and now might be
> >   undersized in some regard?
> >  
> A single MON failure should not cause any problems to boot.
> 
> Please also post the output of "ceph -s", "ceph osd tree" and
> "ceph osd pool ls detail".
> 
> > > The architecture for our setup is:  
> > Are these virtual machines? The overall specs seem rather like VM
> > instances rather than hardware.
> >  
> There are small servers like that, but a valid question indeed.
> In particular, if it is dedicated HW, FULL specs.
> 
> > > 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers  
> > What sort of SSD are the monitor datastores on? ('mon data' in the
> > config)
> >  
> He doesn't mention SSDs in the MON/MDS context, so we could be looking at
> something even slower. FULL SPECS.
> 
> 4GB RAM would be fine for a single MON, but combined with MDS it may
> be a bit tight.
> 
> > > 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers  
> > 12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
> > How many OSD servers, what SSDs?
> >  
> I think he means 12 individual servers. Again, there are micro servers
> like that around, like:
> https://www.supermicro.com.tw/products/system/2U/2015/SYS-2015TA-HTRF.cfm
> 
> IF the SSDs are decent, CPU may be tight but 1GB RAM for a combination of
> OS _and_ OSD is way too little for my taste and experience.
> 
> Christian
> 
> > What is the network setup & connectivity between them (hopefully
> > 10Gbit).
> >  
> 
> 
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com    Global OnLine Japan/Rakuten Communications
> http://www.gol.com/


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
