I suspect a config like this, where you only have 3 OSDs per node, would be more manageable than something denser.

I.e., theoretically a single E5-2697v3 is enough to run 36 OSDs in a 4U Supermicro chassis for a semi-dense converged solution. You could attempt to restrict the OSDs to one socket and then use a second E5-2697v3 for VMs. Maybe after you've got cgroups set up properly, and if you've otherwise balanced things, it would work out OK. I question, though, how much you really benefit by doing this rather than running a 36-drive storage server with lower-bin CPUs and a second 1U box for VMs (of which you don't need as many, because you can dedicate both sockets to VMs).
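For what it's worth, a rough sketch of what that socket restriction could look like with numactl (the node and OSD numbers here are made up; check numactl --hardware on the actual box first):

    # show which cores and memory banks belong to each socket / NUMA node
    numactl --hardware

    # hypothetical: start one OSD bound to socket 0's cores and memory,
    # leaving socket 1 free for the VMs
    numactl --cpunodebind=0 --membind=0 ceph-osd -i 0 --cluster ceph

You'd still want cgroups on top of that to cap memory, plus the VMs pinned to the other node via libvirt's cpuset/numatune, which is exactly the kind of fiddling I'm arguing may not pay for itself.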

It probably depends quite a bit on how memory-, network-, and disk-intensive the VMs are, but my take is that it's better to err on the side of simplicity rather than making things overly complicated. Every second you spend screwing around trying to make the setup work right eats into any savings you might gain by going with the converged setup.

Mark

On 03/26/2015 10:12 AM, Quentin Hartman wrote:
I run a converged OpenStack / ceph cluster with 14 1U nodes. Each has 1
SSD (OS / journals), 3 1TB spinners (1 OSD each), 16 HT cores, 10Gb NICs
for the ceph network, and 72GB of RAM. I configure OpenStack to leave
3GB of RAM unused on each node for OSD / OS overhead. All the VMs are
backed by
ceph volumes and things generally work very well. I would prefer a
dedicated storage layer simply because it seems more "right", but I
can't say that any of the common concerns of using this kind of setup
have come up for me. Aside from shaving off that 3GB of RAM, my
deployment isn't any more complex than a split stack deployment would
be. After running like this for the better part of a year, I would have
a hard time honestly making a real business case for the extra hardware
a split stack cluster would require.
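(One way to carve out that headroom, though not necessarily the exact knob in use here, is nova's reserved_host_memory_mb on each compute node:

    # /etc/nova/nova.conf -- hypothetical value: 3GB expressed in MB
    [DEFAULT]
    reserved_host_memory_mb = 3072

The scheduler then sees that much less RAM on the host, so no guests get packed into the space the OSDs and OS need.)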

QH

On Thu, Mar 26, 2015 at 6:57 AM, Mark Nelson <mnel...@redhat.com> wrote:

    It's kind of a philosophical question.  Technically there's nothing
    that prevents you from putting ceph and the hypervisor on the same
    boxes. It's a question of whether or not potential cost savings are
    worth increased risk of failure and contention.  You can minimize
    those things through various means (cgroups, restricting NUMA nodes,
    etc).  What is more difficult is isolating disk IO contention (say
    if you want local SSDs for VMs), memory bus and QPI contention,
    network contention, etc. If the VMs are working really hard you can
    restrict them to their own socket, and you can even restrict memory
    usage to the local socket, but what about remote socket network or
    disk IO? (you will almost certainly want these things on the ceph
    socket)  I wonder as well about increased risk of hardware failure
    with the increased load, but I don't have any statistics.
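    To make that concrete, here's a hypothetical libvirt snippet that
    pins a guest's vCPUs and memory to the second socket (node 1) so the
    OSDs keep node 0 to themselves; the core numbers are invented and
    depend entirely on the host topology:

        <vcpu placement='static' cpuset='14-27'>8</vcpu>
        <numatune>
          <memory mode='strict' nodeset='1'/>
        </numatune>

    Even pinned like that, a guest talking to a NIC or HBA that hangs
    off the other socket still crosses QPI, which is the part that's
    hard to isolate.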

    I'm guessing if you spent enough time at it you could make it work
    relatively well, but at least personally I question how beneficial
    it really is after all of that.  If you are going for cost savings,
    I suspect efficient compute and storage node designs will be nearly
    as good with much less complexity.

    Mark


    On 03/26/2015 07:11 AM, Wido den Hollander wrote:

        On 26-03-15 12:04, Stefan Priebe - Profihost AG wrote:

            Hi Wido,
            On 26.03.2015 at 11:59, Wido den Hollander wrote:

                On 26-03-15 11:52, Stefan Priebe - Profihost AG wrote:

                    Hi,

                    in the past I read pretty often that it's not a good
                    idea to run ceph and qemu / the hypervisors on the
                    same nodes.

                    But why is this a bad idea? You save space and can
                    better use the resources you have in the nodes anyway.


                Memory pressure during recovery *might* become a
                problem. If you make sure that you don't allocate more
                than, let's say, 50% for the guests it could work.


            Hmm, are you sure? I've never seen problems like that.
            Currently I run each ceph node with 64GB of memory and each
            hypervisor node with around 512GB to 1TB of RAM and 48 cores.


        Yes, it can happen. Your machines may have enough memory, but if
        you overprovision them it still can.

                Using cgroups you could also prevent the OSDs from
                eating up all the memory or CPU.
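                For example (hypothetical numbers), with the libcgroup
                tools you could cap all the OSDs as a group:

                    cgcreate -g memory,cpu:ceph-osd
                    cgset -r memory.limit_in_bytes=24G ceph-osd
                    cgset -r cpu.shares=512 ceph-osd
                    cgclassify -g memory,cpu:ceph-osd $(pidof ceph-osd)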

            I've never seen an OSD do such crazy things.


        Again, it really depends on the available memory and CPU. If you
        buy big
        machines for this purpose it probably won't be a problem.

            Stefan

                So technically it could work, but memory and CPU
                pressure is something which might give you problems.

                    Stefan

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
