For gluster, when files are written into it through a mounted gluster
network filesystem, it writes a lot of metadata for each object so that it
knows everything it needs to about it for replication purposes. If you put
the data manually on the brick, then it won't be able to sync.
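
A rough sketch of the difference (volume name, host, and paths below are
made up; adjust to your setup):

    # write through the gluster mount so the replication metadata gets created
    mount -t glusterfs server1:/gv0 /mnt/gv0
    rsync -a /old/data/ /mnt/gv0/

    # do NOT rsync straight onto the brick path (e.g. /bricks/brick1/gv0);
    # files placed there bypass gluster and will not replicate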

Correct, 3 mons, 2 mds, and 3 osd nodes is a good place to start. You can
choose to use erasure coding with a 2:1 setup (default if you create the
pool with options for erasure coding) or a replica setup with size 3
(default configuration).
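
For example, something along these lines (pool names and pg counts are just
placeholders):

    # replicated pool, size 3 is the default
    ceph osd pool create cephfs_data 128 128 replicated

    # 2:1 erasure coded profile and a pool that uses it
    ceph osd erasure-code-profile set ec-21 k=2 m=1
    ceph osd pool create ec_data 128 128 erasure ec-21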

The mds data is stored in the cluster.  I have an erasure coded cephfs that
has 9TB of data in it and the mds service uses 8k on disk (the size of the
folder and the keyring).  This is in my home cluster and I run each node
with 3 osds, a mon, and an mds.  I have replica pools and erasure coded
pools based on which is right for the job.
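
If it helps, creating the filesystem itself looks roughly like this (names
and pg counts are placeholders again, and the metadata pool should be a
replicated one):

    ceph osd pool create cephfs_metadata 64 64 replicated
    ceph fs new cephfs cephfs_metadata cephfs_data

    # the mds keeps almost nothing locally; the id is usually the hostname,
    # but that depends on how the mds was deployed
    du -sh /var/lib/ceph/mds/ceph-$(hostname -s)/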

Failover of the mds works seamlessly for the clients.  The docs recommend
against hyper-converging services because if you do not have enough system
resources, your daemons can crash/hang due to resource contention.  The
times you will run into resource contention are while your cluster isn't
healthy. Most ceph daemons can use 2-3x more memory while the cluster isn't
healthy as opposed to while it's health_ok.
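
If you want the standby mds to follow the active one, standby-replay is one
way to set it up; a minimal ceph.conf sketch, assuming two mds daemons named
a and b (the names are made up):

    [mds.b]
        mds standby replay = true
        mds standby for rank = 0

Just make sure the nodes are sized for the unhealthy case rather than the
health_ok case.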

On Thu, May 4, 2017, 4:17 AM Marcus <[email protected]> wrote:

> Thank you very much for your answer David, just what I was after!
> Just some additional questions to make it clear to me.
> The mds do not need to be in odd numbers?
> They can be set up 1, 2, 3, 4 and so on, as needed?
>
> You made the basics clear to me so when I set up my first ceph fs I need
> as a start:
> 3 mons, 2 mds and 3 osds. (To be able to avoid a single point of failure)
>
> Is there a clear ratio/relation/approximation between osds and mds?
> If I have, say, 100TB of disk for osds, do I need X GB of disk for mds?
>
> About gluster, my machines are set up in a gluster cluster today, but the
> reason for thinking about ceph fs for these machines instead is that I have
> problems with replication that I have not been able to solve. Secondly,
> we are getting indications from our organisation that data use will expand
> very quickly, and that is where I see that ceph fs will suit us: easy to
> expand as needed.
> Thanks to your description of gluster I will be able to reconfigure my
> gluster cluster and rsync to the mounted cluster. I have used rsync
> directly to the hard drive, and now it is obvious that this does not work
> (it worked fine as a single distributed server, but not as a replica). I
> just haven't got this tip from anybody else. Thanks again!
>
> We will start using ceph fs, because this goes hand in hand with our
> future needs.
>
> Best regards
> Marcus
>
>
>
>
> On 04/05/17 06:30, David Turner wrote:
>
> The clients will need to be able to contact the mons and the osds.  NEVER
> use 2 mons.  Mons are a quorum and work best with odd numbers (1, 3, 5,
> etc).  1 mon is better than 2 mons.  It is better to remove the raid and
> put the individual disks as OSDs.  Ceph handles the redundancy through
> replica copies.  It is much better to have a third node for failure domain
> reasons so you can have 3 copies of your data and have 1 in each of the 3
> servers.  The OSDs store their information in broken up objects divvied up
> into PGs that are assigned to the OSDs.  You would need to set up CephFS
> and rsync the data into it to migrate the data into ceph.
>
> I don't usually recommend this, but you might prefer Gluster.  You would
> use the raided disks as the brick in each node.  Set it up to have 2 copies
> (better to have 3 but you only have 2 nodes).  Each server can be used to
> NFS map the gluster mount point.  The files are stored as flat files on the
> bricks, but you would still need to create the gluster first and then rsync
> the data into the mounted gluster instead of directly onto the disk.  With
> this you don't have to worry about the mon service, mds service, osd
> services, balancing the crush map, etc.  Gluster of course has its own
> complexities and limitations, but it might be closer to what you're looking
> for right now.
>
> On Wed, May 3, 2017 at 4:06 PM Marcus Pedersén <[email protected]>
> wrote:
>
>> Hello everybody!
>>
>> I am a newbie on ceph and I really like it and want to try it out.
>> I have a couple of thoughts and questions after reading documentation and
>> need some help to see that I am on the right path.
>>
>> Today I have two file servers in production that I want to start my ceph
>> fs on and expand from that.
>> I want these servers to function as a failover cluster and as I see it I
>> will be able to do it with ceph.
>>
>> To get a failover cluster without a single point of failure I need at
>> least 2 monitors, 2 mds and 2 osd (my existing file servers), right?
>> Today, both of the file servers use a raid on 8 disks. Do I format my
>> raid xfs and run my osds on the raid?
>> Or do I split up my raid and add the disks directly to the osds?
>>
>> When I connect clients to my ceph fs, are they talking to the mds or are
>> the clients talking to the osds directly as well?
>> If the clients just talk to the mds, then the osds and the monitor can be in
>> a separate network and the mds connected both to the client network and the
>> local "ceph" network.
>>
>> Today, we have about 11TB of data on these file servers, how do I move the
>> data to the ceph fs? Is it possible to rsync to one of the osd disks, start
>> the osd daemon and let it replicate itself?
>>
>> Is it possible to set up the ceph fs with 2 mds, 2 monitors and 1 osd and
>> add the second osd later?
>> This is to be able to have one file server in production, configure ceph and
>> test with the other, swap to the ceph system, and when it is up and running
>> add the second osd.
>>
>> Of course I will test this out before I bring it to production.
>>
>> Many thanks in advance!
>>
>> Best regards
>> Marcus
>>
>>
>>
>
> --
> ------------------------------
> *Marcus Pedersén*
> *System administrator*
>
>
> *Interbull Centre*
> Department of Animal Breeding & Genetics — SLU
> Box 7023, SE-750 07
> Uppsala, Sweden
>
> Visiting address:
> Room 55614, Ulls väg 26, Ultuna
> Uppsala
> Sweden
>
> Tel: +46-(0)18-67 1962
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
