There is a ZFS ocf_heartbeat resource agent, which lets you import/export a zpool and fail it over with PCS:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/ZFS

This ZFS-based NAS guide is also a good reference, except that it explains building a NAS (exported over NFS) rather than Lustre:
https://github.com/ewwhite/zfs-ha/wiki

Once the import/export from the failed server to the live server is done, it is straightforward to mount the ZFS-backed Lustre OST with the "mount -t lustre <poolname>/<ostname> <mntpoint>" command. This can be integrated with the ZFS heartbeat agent above; a rough sketch follows.
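For illustration only, here is a rough, untested sketch of what the pcs side could look like; the resource, pool, dataset, and mount point names are placeholders, not taken from your setup:

# zpool import/export handled by the ClusterLabs ZFS agent
pcs resource create ost0-pool ocf:heartbeat:ZFS pool=ostpool0 \
    op start timeout=90s op stop timeout=90s

# ZFS-backed Lustre OST mounted through the generic Filesystem agent
pcs resource create ost0-mount ocf:heartbeat:Filesystem \
    device=ostpool0/ost0 directory=/lustre/ost0 fstype=lustre

# group them so the pool is imported before the OST is mounted,
# and both fail over together
pcs resource group add ost0-ha ost0-pool ost0-mount

You will also want fencing (STONITH) in place before trusting automatic failover, since importing the same zpool on two nodes at once will corrupt it.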
On Fri, Jan 4, 2019 at 2:42 PM ANS <[email protected]> wrote:

> Thank you Jongwoo Han for the detailed explanation.
>
> Can anyone let me know how I can configure HA for the ZFS pools on CentOS
> 7.4? It is different from the normal Lustre HA.
>
> Thanks,
> ANS
>
> On Wed, Jan 2, 2019 at 9:20 PM Jongwoo Han <[email protected]> wrote:
>
>> 1. The "zpool list" command shows the zpool size as the sum of all
>> physical drives. When you create a raidz2 volume (which is equivalent to
>> RAID6) from 10 * 6TB drives, the zpool size counts up to 60TB (54TiB),
>> while the usable space reported by "lfs df" is 8 * 6TB = 48TB (about
>> 42TiB). Since "lfs df -h" shows the TiB size rounded down, you will see
>> even less; try "df -H" on the OSS.
>>
>> 2. A zpool-based MDT assigns inodes dynamically, unlike an ext4-based
>> MDT. The total number of inodes will grow as you create more files; this
>> becomes clearly visible after multi-million files are created. Try
>> recording the current inode count and comparing it after creating a lot
>> of metadata.
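To make point 2 concrete: on a client you can record the inode counts before and after a metadata-heavy run and compare them, for example like this (/home is simply the mount point from the output further down):

lfs df -i /home    # note the MDT inode totals now
# ... create a few million files ...
lfs df -i /home    # on a ZFS MDT the reported inode total itself will have grown

The same comparison on an ldiskfs MDT would show a fixed inode total.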
>> On Wed, Jan 2, 2019 at 4:06 PM ANS <[email protected]> wrote:
>>
>>> Thank you Jeff. I got the solution for this: it is the variation in ZFS
>>> rather than in Lustre, because of the parity involved.
>>>
>>> But the metadata should occupy 60% for inode creation, which is not
>>> happening with ZFS compared with ext4/ldiskfs.
>>>
>>> Thanks,
>>> ANS
>>>
>>> On Tue, Jan 1, 2019 at 1:05 PM ANS <[email protected]> wrote:
>>>
>>>> Thank you Jeff. I have created Lustre on ZFS freshly and nobody else
>>>> has access to it. When I mounted it on a client it showed around a
>>>> 40TB variation from the actual space.
>>>>
>>>> So what could be the reason for this variation in size?
>>>>
>>>> Thanks,
>>>> ANS
>>>>
>>>> On Tue, Jan 1, 2019 at 12:21 PM Jeff Johnson <[email protected]> wrote:
>>>>
>>>>> Very forward versions...especially on ZFS.
>>>>>
>>>>> You build OST volumes in a pool. If no other volumes are defined in a
>>>>> pool, then 100% of that pool will be available for the OST volume, but
>>>>> the way ZFS works, the capacity doesn't really belong to the OST volume
>>>>> until blocks are allocated for writes. So you have a pool of a known
>>>>> size and you're the admin. As long as nobody else can create a ZFS
>>>>> volume in that pool, all of the capacity in that pool will eventually
>>>>> go to the OST as new writes occur. Keep in mind that the same pool can
>>>>> contain multiple snapshots (if created), so the pool is a "potential
>>>>> capacity", but that capacity could be concurrently allocated to OST
>>>>> volume writes, snapshots, and other ZFS volumes (if created).
>>>>>
>>>>> —Jeff
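Related to Jeff's point about snapshots and other datasets competing for the pool: if you ever need to check whether anything besides the OST dataset is consuming space, something like the following on the OSS will show it (the pool name is only an example):

zfs list -r -o space lustre-data      # per-dataset space breakdown: data, snapshots, children
zfs list -t snapshot -r lustre-data   # list any snapshots held in the pool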
>>>>> On Mon, Dec 31, 2018 at 22:20 ANS <[email protected]> wrote:
>>>>>
>>>>>> Thanks Jeff. Currently I am using:
>>>>>>
>>>>>> modinfo zfs | grep version
>>>>>> version:     0.8.0-rc2
>>>>>> rhelversion: 7.4
>>>>>>
>>>>>> lfs --version
>>>>>> lfs 2.12.0
>>>>>>
>>>>>> And this is a fresh install. So is there any other way to show that
>>>>>> the complete zpool LUN has been allocated to Lustre alone?
>>>>>>
>>>>>> Thanks,
>>>>>> ANS
>>>>>>
>>>>>> On Tue, Jan 1, 2019 at 11:44 AM Jeff Johnson <[email protected]> wrote:
>>>>>>
>>>>>>> ANS,
>>>>>>>
>>>>>>> Lustre on top of ZFS has to estimate capacities, and it is fairly
>>>>>>> far off when the OSTs are new and empty. As objects are written to
>>>>>>> the OSTs and capacity is consumed, the capacity sizing becomes more
>>>>>>> accurate. At the beginning it is so far off that it appears to be an
>>>>>>> error.
>>>>>>>
>>>>>>> What version are you running? Some patches have been added to make
>>>>>>> this calculation more accurate.
>>>>>>>
>>>>>>> —Jeff
>>>>>>>
>>>>>>> On Mon, Dec 31, 2018 at 22:08 ANS <[email protected]> wrote:
>>>>>>>
>>>>>>>> Dear Team,
>>>>>>>>
>>>>>>>> I am trying to configure Lustre with ZFS as the backend file system,
>>>>>>>> with 2 servers in HA. But after compiling and creating the zfs pools:
>>>>>>>>
>>>>>>>> zpool list
>>>>>>>> NAME          SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
>>>>>>>> lustre-data   54.5T  25.8M  54.5T  -        16.0E     0%    0%   1.00x  ONLINE  -
>>>>>>>> lustre-data1  54.5T  25.1M  54.5T  -        16.0E     0%    0%   1.00x  ONLINE  -
>>>>>>>> lustre-data2  54.5T  25.8M  54.5T  -        16.0E     0%    0%   1.00x  ONLINE  -
>>>>>>>> lustre-data3  54.5T  25.8M  54.5T  -        16.0E     0%    0%   1.00x  ONLINE  -
>>>>>>>> lustre-meta   832G   3.50M  832G   -        16.0E     0%    0%   1.00x  ONLINE  -
>>>>>>>>
>>>>>>>> and when mounted on the client:
>>>>>>>>
>>>>>>>> lfs df -h
>>>>>>>> UUID                 bytes   Used  Available  Use%  Mounted on
>>>>>>>> home-MDT0000_UUID   799.7G   3.2M     799.7G    0%  /home[MDT:0]
>>>>>>>> home-OST0000_UUID    39.9T  18.0M      39.9T    0%  /home[OST:0]
>>>>>>>> home-OST0001_UUID    39.9T  18.0M      39.9T    0%  /home[OST:1]
>>>>>>>> home-OST0002_UUID    39.9T  18.0M      39.9T    0%  /home[OST:2]
>>>>>>>> home-OST0003_UUID    39.9T  18.0M      39.9T    0%  /home[OST:3]
>>>>>>>>
>>>>>>>> filesystem_summary: 159.6T  72.0M     159.6T    0%  /home
>>>>>>>>
>>>>>>>> So out of a total of 54.5T x 4 = 218TB, I am getting only 159TB
>>>>>>>> usable. Can anyone give me information regarding this?
>>>>>>>>
>>>>>>>> Also, from a performance perspective, what are the ZFS and Lustre
>>>>>>>> parameters to be tuned?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> ANS.
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Johnson
>>>>>>> Co-Founder
>>>>>>> Aeon Computing
>>>>>>>
>>>>>>> [email protected]
>>>>>>> www.aeoncomputing.com
>>>>>>> t: 858-412-3810 x1001  f: 858-412-3845
>>>>>>> m: 619-204-9061
>>>>>>>
>>>>>>> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>>>>>>>
>>>>>>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> ANS.
>>>>
>>>> --
>>>> Thanks,
>>>> ANS.
>>>
>>> --
>>> Thanks,
>>> ANS.
>>
>> --
>> Jongwoo Han
>> +82-505-227-6108
>
> --
> Thanks,
> ANS.

--
Jongwoo Han
+82-505-227-6108
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
