Or is it possible to mount one OSD directly for read access to the files?

v
On Sun, Nov 11, 2018 at 1:47 PM Vlad Kopylov <[email protected]> wrote:
> Maybe it is possible if done via a gateway NFS export?
> Do the gateway settings allow read OSD selection?
>
> v
>
> On Sun, Nov 11, 2018 at 1:01 AM Martin Verges <[email protected]> wrote:
>
>> Hello Vlad,
>>
>> if you want to read from the same data, then it is not possible (as far as I know).
>>
>> --
>> Martin Verges
>> Managing director
>>
>> Mobile: +49 174 9335695
>> E-Mail: [email protected]
>> Chat: https://t.me/MartinVerges
>>
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>>
>> Web: https://croit.io
>> YouTube: https://goo.gl/PGE1Bx
>>
>> On Sat, Nov 10, 2018 at 03:47 Vlad Kopylov <[email protected]> wrote:
>>
>>> Maybe I missed something, but the FS explicitly selects the pools to put
>>> files and metadata in, like I did below.
>>> So if I create new pools, the data in them will be different. If I apply
>>> the rule dc1_primary to the cfs_data pool, and a client from dc3 connects
>>> to fs t01, it will start using the dc1 hosts.
>>>
>>> ceph osd pool create cfs_data 100
>>> ceph osd pool create cfs_meta 100
>>> ceph fs new t01 cfs_data cfs_meta
>>> sudo mount -t ceph ceph1:6789:/ /mnt/t01 -o name=admin,secretfile=/home/mciadmin/admin.secret
>>>
>>> rule dc1_primary {
>>>         id 1
>>>         type replicated
>>>         min_size 1
>>>         max_size 10
>>>         step take dc1
>>>         step chooseleaf firstn 1 type host
>>>         step emit
>>>         step take dc2
>>>         step chooseleaf firstn -2 type host
>>>         step emit
>>>         step take dc3
>>>         step chooseleaf firstn -2 type host
>>>         step emit
>>> }
>>>
>>> On Fri, Nov 9, 2018 at 9:32 PM Vlad Kopylov <[email protected]> wrote:
>>>
>>>> Just to confirm - it will still place 3 copies, one in each datacenter?
>>>> I thought this map was to select where to write to; I guess it does the
>>>> write replication on the back end.
>>>>
>>>> I thought pools are completely separate and clients would not see each
>>>> other's data?
>>>>
>>>> Thank you Martin!
>>>>
>>>> On Fri, Nov 9, 2018 at 2:10 PM Martin Verges <[email protected]> wrote:
>>>>
>>>>> Hello Vlad,
>>>>>
>>>>> you can create something like this:
>>>>>
>>>>> rule dc1_primary_dc2_secondary {
>>>>>         id 1
>>>>>         type replicated
>>>>>         min_size 1
>>>>>         max_size 10
>>>>>         step take dc1
>>>>>         step chooseleaf firstn 1 type host
>>>>>         step emit
>>>>>         step take dc2
>>>>>         step chooseleaf firstn 1 type host
>>>>>         step emit
>>>>>         step take dc3
>>>>>         step chooseleaf firstn -2 type host
>>>>>         step emit
>>>>> }
>>>>>
>>>>> rule dc2_primary_dc1_secondary {
>>>>>         id 2
>>>>>         type replicated
>>>>>         min_size 1
>>>>>         max_size 10
>>>>>         step take dc2
>>>>>         step chooseleaf firstn 1 type host
>>>>>         step emit
>>>>>         step take dc1
>>>>>         step chooseleaf firstn 1 type host
>>>>>         step emit
>>>>>         step take dc3
>>>>>         step chooseleaf firstn -2 type host
>>>>>         step emit
>>>>> }
>>>>>
>>>>> After you have added such crush rules, you can configure the pools:
>>>>>
>>>>> ~ $ ceph osd pool set <pool_for_dc1> crush_rule dc1_primary_dc2_secondary
>>>>> ~ $ ceph osd pool set <pool_for_dc2> crush_rule dc2_primary_dc1_secondary
>>>>>
>>>>> Now you place the workload from dc1 in the dc1 pool and the workload
>>>>> from dc2 in the dc2 pool. You could also use HDDs with SSD journals (if
>>>>> your workload isn't that write-intensive) and save some money in dc3,
>>>>> as your clients would always read from an SSD and write to the hybrid setup.
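>>>>>
>>>>> If the pools do not exist yet, something along these lines should also
>>>>> work (an untested sketch - the pool names and PG counts are only
>>>>> placeholders, pick values that fit your cluster):
>>>>>
>>>>> ~ $ ceph osd pool create pool_for_dc1 64 64 replicated dc1_primary_dc2_secondary
>>>>> ~ $ ceph osd pool create pool_for_dc2 64 64 replicated dc2_primary_dc1_secondary
>>>>> ~ $ ceph osd pool get pool_for_dc1 crush_rule
>>>>>
>>>>> Creating the pool with the rule name attached saves the separate
>>>>> "ceph osd pool set ... crush_rule ..." step, and the final "get" just
>>>>> confirms which rule the pool ended up with.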
>>>>>
>>>>> Btw., all of this can be done with a few simple clicks through our web
>>>>> frontend. Even if you want to export it via CephFS / NFS / ..., it is
>>>>> possible to set it at a per-folder level. Feel free to take a look at
>>>>> https://www.youtube.com/watch?v=V33f7ipw9d4 to see how easy it can be.
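>>>>>
>>>>> With CephFS you could, for example, add the two pools as additional
>>>>> data pools and pin one directory per datacenter to "its" pool via a
>>>>> file layout. A rough, untested sketch (fs name, pool and directory
>>>>> names are just placeholders for your setup):
>>>>>
>>>>> ~ $ ceph fs add_data_pool t01 pool_for_dc1
>>>>> ~ $ ceph fs add_data_pool t01 pool_for_dc2
>>>>> ~ $ setfattr -n ceph.dir.layout.pool -v pool_for_dc1 /mnt/t01/dc1
>>>>> ~ $ setfattr -n ceph.dir.layout.pool -v pool_for_dc2 /mnt/t01/dc2
>>>>>
>>>>> New files created below /mnt/t01/dc1 then have their objects stored
>>>>> according to the dc1 rule, while existing files keep their old layout.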
>>>>>
>>>>> --
>>>>> Martin Verges
>>>>> Managing director
>>>>>
>>>>> Mobile: +49 174 9335695
>>>>> E-Mail: [email protected]
>>>>> Chat: https://t.me/MartinVerges
>>>>>
>>>>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>>>>> CEO: Martin Verges - VAT-ID: DE310638492
>>>>> Com. register: Amtsgericht Munich HRB 231263
>>>>>
>>>>> Web: https://croit.io
>>>>> YouTube: https://goo.gl/PGE1Bx
>>>>>
>>>>> 2018-11-09 17:35 GMT+01:00 Vlad Kopylov <[email protected]>:
>>>>> > Please disregard the pg status; one of the test VMs was down for some
>>>>> > time and it is healing.
>>>>> > The only question is how to make it read from the proper datacenter,
>>>>> > if you have an example.
>>>>> >
>>>>> > Thanks
>>>>> >
>>>>> > On Fri, Nov 9, 2018 at 11:28 AM Vlad Kopylov <[email protected]> wrote:
>>>>> >>
>>>>> >> Martin, thank you for the tip.
>>>>> >> Googling for ceph crush rule examples doesn't give much on rules, just
>>>>> >> static placement of buckets. This all seems to be about placing data,
>>>>> >> not about giving a client in a specific datacenter the proper read OSD.
>>>>> >>
>>>>> >> Maybe something is wrong with the placement groups?
>>>>> >>
>>>>> >> I added the datacenters dc1, dc2 and dc3. The current replicated_rule is:
>>>>> >>
>>>>> >> rule replicated_rule {
>>>>> >>         id 0
>>>>> >>         type replicated
>>>>> >>         min_size 1
>>>>> >>         max_size 10
>>>>> >>         step take default
>>>>> >>         step chooseleaf firstn 0 type host
>>>>> >>         step emit
>>>>> >> }
>>>>> >>
>>>>> >> # buckets
>>>>> >> host ceph1 {
>>>>> >>         id -3            # do not change unnecessarily
>>>>> >>         id -2 class ssd  # do not change unnecessarily
>>>>> >>         # weight 1.000
>>>>> >>         alg straw2
>>>>> >>         hash 0  # rjenkins1
>>>>> >>         item osd.0 weight 1.000
>>>>> >> }
>>>>> >> datacenter dc1 {
>>>>> >>         id -9            # do not change unnecessarily
>>>>> >>         id -4 class ssd  # do not change unnecessarily
>>>>> >>         # weight 1.000
>>>>> >>         alg straw2
>>>>> >>         hash 0  # rjenkins1
>>>>> >>         item ceph1 weight 1.000
>>>>> >> }
>>>>> >> host ceph2 {
>>>>> >>         id -5            # do not change unnecessarily
>>>>> >>         id -6 class ssd  # do not change unnecessarily
>>>>> >>         # weight 1.000
>>>>> >>         alg straw2
>>>>> >>         hash 0  # rjenkins1
>>>>> >>         item osd.1 weight 1.000
>>>>> >> }
>>>>> >> datacenter dc2 {
>>>>> >>         id -10           # do not change unnecessarily
>>>>> >>         id -8 class ssd  # do not change unnecessarily
>>>>> >>         # weight 1.000
>>>>> >>         alg straw2
>>>>> >>         hash 0  # rjenkins1
>>>>> >>         item ceph2 weight 1.000
>>>>> >> }
>>>>> >> host ceph3 {
>>>>> >>         id -7            # do not change unnecessarily
>>>>> >>         id -12 class ssd # do not change unnecessarily
>>>>> >>         # weight 1.000
>>>>> >>         alg straw2
>>>>> >>         hash 0  # rjenkins1
>>>>> >>         item osd.2 weight 1.000
>>>>> >> }
>>>>> >> datacenter dc3 {
>>>>> >>         id -11           # do not change unnecessarily
>>>>> >>         id -13 class ssd # do not change unnecessarily
>>>>> >>         # weight 1.000
>>>>> >>         alg straw2
>>>>> >>         hash 0  # rjenkins1
>>>>> >>         item ceph3 weight 1.000
>>>>> >> }
>>>>> >> root default {
>>>>> >>         id -1            # do not change unnecessarily
>>>>> >>         id -14 class ssd # do not change unnecessarily
>>>>> >>         # weight 3.000
>>>>> >>         alg straw2
>>>>> >>         hash 0  # rjenkins1
>>>>> >>         item dc1 weight 1.000
>>>>> >>         item dc2 weight 1.000
>>>>> >>         item dc3 weight 1.000
>>>>> >> }
>>>>> >>
>>>>> >> # ceph pg dump
>>>>> >> dumped all
>>>>> >> version 29433
>>>>> >> stamp 2018-11-09 11:23:44.510872
>>>>> >> last_osdmap_epoch 0
>>>>> >> last_pg_scan 0
>>>>> >> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>>>>> >> 1.5f 0 0 0 0 0 0 0 0 active+clean 2018-11-09 04:35:32.320607 0'0 544:1317 [0,2,1] 0 [0,2,1] 0 0'0 2018-11-09 04:35:32.320561 0'0 2018-11-04 11:55:54.756115 0
>>>>> >> 2.5c 143 0 143 0 0 19490267 461 461 active+undersized+degraded 2018-11-08 19:02:03.873218 508'461 544:2100 [2,1] 2 [2,1] 2 290'380 2018-11-07 18:58:43.043719 64'120 2018-11-05 14:21:49.256324 0
>>>>> >> .....
>>>>> >> sum 15239 0 2053 2659 0 2157615019 58286 58286
>>>>> >>
>>>>> >> OSD_STAT USED    AVAIL  TOTAL  HB_PEERS PG_SUM PRIMARY_PG_SUM
>>>>> >> 2        3.7 GiB 28 GiB 32 GiB [0,1]    200    73
>>>>> >> 1        3.7 GiB 28 GiB 32 GiB [0,2]    200    58
>>>>> >> 0        3.7 GiB 28 GiB 32 GiB [1,2]    173    69
>>>>> >> sum      11 GiB  85 GiB 96 GiB
>>>>> >>
>>>>> >> # ceph pg map 2.5c
>>>>> >> osdmap e545 pg 2.5c (2.5c) -> up [2,1] acting [2,1]
>>>>> >>
>>>>> >> # ceph pg map 1.5f
>>>>> >> osdmap e547 pg 1.5f (1.5f) -> up [0,2,1] acting [0,2,1]
>>>>> >>
>>>>> >> On Fri, Nov 9, 2018 at 2:21 AM Martin Verges <[email protected]> wrote:
>>>>> >>>
>>>>> >>> Hello Vlad,
>>>>> >>>
>>>>> >>> Ceph clients connect to the primary OSD of each PG. If you create one
>>>>> >>> crush rule for building1 and one for building2 that each take their
>>>>> >>> first OSD from the respective building, reads from a pool will always
>>>>> >>> go to the same building (if the cluster is healthy), and only write
>>>>> >>> requests get replicated to the other building.
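>>>>> >>>
>>>>> >>> If you want to double-check where the primary ends up before touching
>>>>> >>> the pools, you can test a rule offline against the compiled crush map.
>>>>> >>> A rough sketch (file names and the rule id are placeholders, and for a
>>>>> >>> replicated pool the first OSD in each mapping is the primary):
>>>>> >>>
>>>>> >>> ~ $ ceph osd getcrushmap -o crushmap.bin
>>>>> >>> ~ $ crushtool -d crushmap.bin -o crushmap.txt      # edit rules here
>>>>> >>> ~ $ crushtool -c crushmap.txt -o crushmap.new
>>>>> >>> ~ $ crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-mappings
>>>>> >>> ~ $ ceph osd setcrushmap -i crushmap.new
>>>>> >>>
>>>>> >>> The --show-mappings output lists the OSDs in the order CRUSH selects
>>>>> >>> them, so you can see at a glance whether the dc1 rule really puts a
>>>>> >>> dc1 OSD first.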
>>>>> >>>
>>>>> >>> --
>>>>> >>> Martin Verges
>>>>> >>> Managing director
>>>>> >>>
>>>>> >>> Mobile: +49 174 9335695
>>>>> >>> E-Mail: [email protected]
>>>>> >>> Chat: https://t.me/MartinVerges
>>>>> >>>
>>>>> >>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>>>>> >>> CEO: Martin Verges - VAT-ID: DE310638492
>>>>> >>> Com. register: Amtsgericht Munich HRB 231263
>>>>> >>>
>>>>> >>> Web: https://croit.io
>>>>> >>> YouTube: https://goo.gl/PGE1Bx
>>>>> >>>
>>>>> >>> 2018-11-09 4:54 GMT+01:00 Vlad Kopylov <[email protected]>:
>>>>> >>> > I am trying to test replicated Ceph with servers in different
>>>>> >>> > buildings, and I have a read problem.
>>>>> >>> > Reads from one building go to OSDs in the other building and vice
>>>>> >>> > versa, making reads slower than writes - as slow as the slowest node.
>>>>> >>> >
>>>>> >>> > Is there a way to
>>>>> >>> > - disable parallel reads (so it reads only from the same OSD node
>>>>> >>> >   where the mon is);
>>>>> >>> > - or give each client a read restriction per OSD;
>>>>> >>> > - or strictly specify the read OSD on mount;
>>>>> >>> > - or set a node read-delay cap (for example, if a node's latency is
>>>>> >>> >   above 2 ms, do not read from it while other replicas are available);
>>>>> >>> > - or place clients on the crush map, so that an OSD in the same
>>>>> >>> >   datacenter as the client is preferred and data is pulled from it?
>>>>> >>> >
>>>>> >>> > Mounting with the kernel client, latest Mimic.
>>>>> >>> >
>>>>> >>> > Thank you!
>>>>> >>> >
>>>>> >>> > Vlad
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
