Glad I could help. Proxmox is a prebuilt KVM/QEMU hypervisor with Ceph
integration that may be worth looking into. Booting from RBDs is definitely
possible. There should be some resources in the Ceph documentation, or
people on the mailing list who already know how to do it.

On Thu, Feb 15, 2018, 7:01 AM Egoitz Aurrekoetxea <ego...@sarenet.es> wrote:

> Good morning David!!
>
>
> First of all, I wanted to thank you hugely for the mail you sent
> yesterday. You don't receive this kind of advice from an expert in the
> area every day. I printed the mail and read it slowly to understand it
> properly.
>
> Basically I wanted to confirm that there is no single point of failure,
> and to get your opinion and ideas about hypervisors.
>
>
> I'm trying KVM now. Although Qemu is able to create that kind of disk
> image, I'm not totally sure whether it can boot from them, which would be
> very useful for us. In Xen I managed to access the cluster space through
> krbd, but as you said yesterday, that is not the most optimized config,
> because it should be done through librbd to be fully optimal. I'm not
> really sure it can boot from those disks, although I have read that it
> can. The CentOS packages seem to have rbd support built in, so in fact I
> can create disks, but I am just not able to boot.
>
>
> Well, I assume this last topic will finally be clarified on the KVM/Qemu
> list.
>
>
> Just wanted to send all my gratitude for your help :)
>
>
> Thanks mate,
>
> Cheers,
>
>
>
> On 2018-02-14 16:47, David Turner wrote:
>
> First off to answer your questions about mons, you need to understand that
> they work in a Paxos Quorum.  What that means is that there needs to be a
> majority of Mons that agree that they are in charge.  This is why an even
> number of mons is a bad idea, as they can potentially split themselves in
> half.  For this case, let's say you have 3 mons.  2 of them need to be up
> and communicating for them to agree that they can respond to clients.  If
> the third mon is online, but networking troubles are keeping it from
> communicating with the other 2 mons, it will realize that it isn't a part
> of the quorum and will refuse to respond to anyone that asks it questions.
> I think there might be some logic for allowing 1 mon to manage the cluster,
> but I think that works best if the other mons properly shut down informing
> the other mons that they are going offline so it isn't up to a vote for who
> is in charge.
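
The majority rule described above can be sketched in a few lines of Python.
This is only an illustration of the arithmetic, not Ceph code; the function
names are made up for the example.

```python
def quorum_size(num_mons: int) -> int:
    """Smallest number of mons that forms a strict majority."""
    return num_mons // 2 + 1


def tolerated_failures(num_mons: int) -> int:
    """How many mons can be unreachable while a majority still exists."""
    return num_mons - quorum_size(num_mons)


def has_quorum(num_mons: int, reachable: int) -> bool:
    """True if the reachable mons are enough to form a majority."""
    return reachable >= quorum_size(num_mons)


if __name__ == "__main__":
    # Compare odd and even monitor counts side by side.
    for n in range(1, 7):
        print(f"{n} mons: quorum={quorum_size(n)}, "
              f"can lose={tolerated_failures(n)}")
```

Under this rule, going from 3 to 4 mons does not let you lose any more of
them, and a 4-mon cluster split 2/2 by a network partition has no majority
on either side.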
>
> Lifecycle of a client and a mon.  When a client first communicates with a
> Ceph cluster it uses the mon_host setting in its ceph.conf file to know who
> the mons are.  It goes through the list until it gets one that will
> authoritatively respond for the cluster and give it the osd map.  Now that
> it has an osd map it can start communicating with all of the osds in the
> cluster, reading, writing, mounting, etc.  This is usually where a client
> stops talking to mons.  As a client is talking with osds, the osds will
> respond back with updated maps if there are any.  This change was made in
> the Hammer release of Ceph.  Before that, all map updates were handled by
> the mons, which was a huge burden on them and kept a cluster from growing
> larger than about 1,000 osds because the mons couldn't handle managing
> the maps for any more osds.  In Hammer, and still happening today,
> osds started updating each other's osd maps as they communicated with each
> other.  If anything is confused as to which map to use, they still ask the
> mon and the mon will tell them the right one.
>
> If a mon goes down, then the rest of the mon_host list will be used to
> know who to contact.  It might fail on a down mon, but it will retry and
> get to one that is online.  Mons are the keepers of cephx auth keys and
> map versions,
> but other than that, they really don't impact performance much.  Everything
> else is handled by the algorithms in the osd map that tell a client where
> all objects and osds are in the cluster and the majority of map updates
> will come from the communication with the osds.
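
A toy model of the lookup described above, as a Python sketch. It does not
use any real Ceph client API: `ask` is a hypothetical callback standing in
for "contact this mon and request the osd map", and the host names are
invented. A down mon is modelled as a ConnectionError, and a mon that is
outside the quorum is modelled as returning None.

```python
from typing import Callable, Optional, Sequence, Tuple


def first_responding_mon(
    mon_hosts: Sequence[str],
    ask: Callable[[str], Optional[dict]],
) -> Optional[Tuple[str, dict]]:
    """Walk the mon_host list and return the first (host, osd_map) pair
    from a mon that answers authoritatively, or None if none does."""
    for host in mon_hosts:
        try:
            osd_map = ask(host)
        except ConnectionError:
            continue  # mon is down: move on to the next entry
        if osd_map is not None:
            return host, osd_map  # this mon is in quorum and answered
    return None


def fake_ask(host: str) -> Optional[dict]:
    """Hypothetical cluster: mon-a is down, mon-b is out of quorum."""
    if host == "mon-a":
        raise ConnectionError("mon-a is unreachable")
    if host == "mon-b":
        return None
    return {"epoch": 42}


if __name__ == "__main__":
    print(first_responding_mon(["mon-a", "mon-b", "mon-c"], fake_ask))
```

Once a map has been obtained this way, the client in this model would talk
to the osds directly and would not need the mons again for ordinary reads
and writes.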
>
> Back to VMs and librbd vs krbd (which is /dev/rbd* devices).  The kernel
> driver does not have feature parity with Ceph.  Even the latest kernel does
> not support all Ceph RBD features and you will have to disable them in your
> cluster.  This disables things like object map which is how Ceph keeps
> track of which objects do and don't exist in an RBD.  Without object map
> Ceph has to assume that every object that can exist in an RBD does.  With
> object map, if you delete an RBD Ceph issues a delete to only the objects
> that exist; without it Ceph has to attempt to delete every object
> regardless of whether it exists.  Checking the used space of an RBD with
> object map
> is instant, checking it without object map can take several minutes on RBDs
> that are only 100GB in size (this is even worse if you are using snapshots
> as it has to check for every object that can possibly exist on the RBD
> itself as well as the snapshots).
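
To put rough numbers on the object map point, here is a small Python sketch.
It assumes the default RBD object size of 4 MiB (an assumption, since the
image in question may have been created with a different size), and the
function names are invented for the example.

```python
OBJECT_SIZE = 4 * 1024**2  # assumed default RBD object size: 4 MiB


def max_objects(image_bytes: int, object_bytes: int = OBJECT_SIZE) -> int:
    """Number of objects that could exist in an image (ceiling division)."""
    return -(-image_bytes // object_bytes)


def delete_ops(image_bytes: int, existing_objects: int,
               has_object_map: bool) -> int:
    """Deletes issued when removing the image: only the existing objects
    with an object map, every possible object without one."""
    if has_object_map:
        return existing_objects
    return max_objects(image_bytes)


if __name__ == "__main__":
    size = 100 * 1024**3  # the 100GB example, taken as 100 GiB
    print("possible objects:", max_objects(size))
    print("deletes with object map:", delete_ops(size, 1000, True))
    print("deletes without object map:", delete_ops(size, 1000, False))
```

In this model a mostly empty 100 GiB image with 1,000 objects needs 1,000
deletes with object map and 25,600 attempted deletes without it.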
>
> librbd has feature parity with Ceph because it is updated with, and is the
> same version as, every Ceph release.  krbd is still trying to implement
> RBD features
> released over a year ago.  I prefer to use the Ceph libraries as often as
> possible, then the fuse drivers (except rbd-fuse because it is slower than
> dirt), and if I have no other choice then I'll use the kernel drivers.
> When it comes to choosing a hypervisor for hosting VMs on RBDs, there is no
> question in my mind that I would only look at options that use librbd.
>
> On Tue, Feb 13, 2018 at 6:13 PM Egoitz Aurrekoetxea <ego...@sarenet.es>
> wrote:
>
>> Hi David!!
>>
>> Thanks a lot for your answer. But what happens when you have, say, two
>> or more monitors and one of them becomes unresponsive? Is another one
>> used after a timeout? And what happens when a client wants to access some
>> data, needs to query a monitor to find out where it is, and the monitor
>> does not answer? Is an unresponsive monitor skipped for subsequent
>> queries about where data lives in the cluster?
>>
>> So, put another way: in terms of performance, you wouldn't use any
>> solution that doesn't go through librbd? Is performance poor when using
>> mounted /dev/rbdX devices? Or do you mean in terms of data integrity?
>>
>> I was planning to use Xen with Ceph, but after your advice... 😀. Would
>> you definitely go with KVM?
>>
>> Thanks a lot again 😉
>> Cheers,
>>
>>
>> Egoitz,
>>
>> On 13 Feb 2018, at 20:19, David Turner <drakonst...@gmail.com>
>> wrote:
>>
>> Monitors are not required for accessing data from the Ceph cluster.
>> Clients will ask a monitor for a current OSD map and then use that OSD map
>> to communicate with the OSDs directly for all reads and writes.  The map
>> includes the crush map which has all of the information a client needs to
>> know where every object is in the cluster.  Having 3 mons is a good number
>> for small deployments.  5 mons gives better redundancy in the monitor
>> quorum.  Always avoid an even number of mons.
>>
>> librbd is definitely the way to go for accessing RBDs for a hypervisor as
>> opposed to fuse or krbd.  For a quick and easy hypervisor using Ceph, I
>> like Proxmox.  It natively has the ability to use KVM with Ceph without
>> having to configure it yourself.  It comes with a nice gui as well to see
>> the console screen for your VMs.  It also has a fairly simple guide to
>> cluster hypervisors together to provide HA support for your VMs.  For
>> larger scale VM deployments, OpenStack is probably the way I would go.
>>
>> On Tue, Feb 13, 2018 at 2:11 PM Egoitz Aurrekoetxea <ego...@sarenet.es>
>> wrote:
>>
>>> Good afternoon,
>>>
>>> As I'm new to Ceph, I was wondering what the most proper way would be to
>>> use it with the Xen hypervisor (on a plain Linux installation, CentOS for
>>> instance). I have read that the least proper way is to just mount the
>>> /dev/rbdX device on a mount point and expose that space to the
>>> hypervisor, but I find it pretty easy and it seems stable. It doesn't
>>> seem to perform badly either. Is it better to use, for instance, librbd
>>> with KVM? Does it perform better?
>>>
>>> By the way, it seems to use the monitor node in order to access the
>>> space in the OSD cluster. I have also read that Ceph was designed with
>>> no single points of failure in mind, but is it possible to configure
>>> several monitor nodes, so that after a very short timeout or similar the
>>> file system is accessed through the other nodes? What would be the most
>>> proper way of configuring this so that a machine does not lose its
>>> storage if the monitor fails? Could you please point me in the right
>>> direction? Perhaps with several monitors or....
>>>
>>> By the way, if you think it would be better to use another hypervisor
>>> or config (with librados or whatever) with Ceph, could you please
>>> suggest that too? Help the newbie :p :) :)
>>>
>>> Best regards,
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>