Hmm, I was assuming the question was about total space, as I struggled for some time to understand why I have 99 TB of total available space per OSS after installing ZFS-backed Lustre, while the ldiskfs OSTs have 120 TB on the same hardware. The 20% difference was partially (10%) accounted for by the different RAID-6 / raidz2 configurations, but I was not able to explain the other 10%.
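For what it is worth, a large part of such a gap can appear from nothing more than decimal-TB versus binary-TiB reporting combined with the parity fraction. A back-of-the-envelope sketch in Python, assuming 12 x 4 TB (decimal) drives in a raidz2 10+2 layout as in the pools shown further down (the ldiskfs geometry may differ, and this ignores metadata, reserved blocks and raidz allocation padding):

# Back-of-the-envelope only: 12 x 4 TB drives, 10 data + 2 parity.
TB, TiB = 1000**4, 1024**4            # decimal vs binary terabyte

drives, data_drives, drive_tb = 12, 10, 4

raw  = drives * drive_tb * TB         # what the drive labels add up to
data = data_drives * drive_tb * TB    # upper bound after parity

print(f"raw : {raw / TB:5.1f} TB = {raw / TiB:5.2f} TiB")    # 48.0 TB = 43.66 TiB
print(f"data: {data / TB:5.1f} TB = {data / TiB:5.2f} TiB")  # 40.0 TB = 36.38 TiB

A TB-vs-TiB mixup by itself shifts numbers by roughly 10% (1024**4 / 1000**4 ≈ 1.0995), suspiciously close to the remainder I could not explain, although raidz allocation overhead could account for it just as well (see the estimate further down).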
Regarding the question in the original post: I can not get 24 TB out of the "available" field of the df output. It shows roughly 207,693,094,400 KiB (about 193 TiB) "available" on his ZFS Lustre vs. 198,082,192,080 KiB (about 184 TiB) on the ldiskfs one, a difference of only about 9 TiB. At the same time the difference of the total space is 233548424256 - 207693153280 = 25855270976 KiB ≈ 24.08 TiB. Götz, could you please tell us what you meant by "available"?

Also, in my case the output of Linux df on the OSS looks strange for the zfs pool: the pool itself is reported as 25T (why?), while the formatted OST taking all the space in this pool shows 33T:

[root@lfs1 ~]# df -h /zpla-0000 /mnt/OST0000
Filesystem         Size  Used Avail Use% Mounted on
zpla-0000           25T  256K   25T   1% /zpla-0000
zpla-0000/OST0000   33T  8.3T   25T  26% /mnt/OST0000
[root@lfs1 ~]#

In bytes:

[root@lfs1 ~]# df --block-size=1 /zpla-0000 /mnt/OST0000
Filesystem              1B-blocks           Used       Available Use% Mounted on
zpla-0000          26769344561152         262144  26769344299008   1% /zpla-0000
zpla-0000/OST0000  35582552834048  9093386076160  26489164660736  26% /mnt/OST0000

The same OST as reported by Lustre:

[root@lfsa scripts]# lfs df
UUID                  1K-blocks        Used    Available Use% Mounted on
lfs-MDT0000_UUID      974961920      275328    974684544   0% /mnt/lfsa[MDT:0]
lfs-OST0000_UUID    34748586752  8880259840  25868324736  26% /mnt/lfsa[OST:0]
...

Compare:

[root@lfs1 ~]# zpool list
NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zpla-0000  43.5T  10.9T  32.6T         -    16%    24%  1.00x  ONLINE  -
zpla-0001  43.5T  11.0T  32.5T         -    17%    25%  1.00x  ONLINE  -
zpla-0002  43.5T  10.8T  32.7T         -    17%    24%  1.00x  ONLINE  -

I realize that zpool reports raw disk space including parity blocks (48 TB ≈ 43.7 TiB) and everything else (metadata, space for xattr inodes, and so on). What I can not explain is the difference between the 40 TB (decimal) of data space (10 * 4 TB drives) and the 35,582,552,834,048 bytes shown by df for the OST.
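That said, a rough estimate of the raidz2 allocation overhead gets fairly close to the df figure. The following is only a sketch of mine, not the real ZFS accounting code: it assumes the default 128 KiB recordsize and 4 KiB sectors (ashift=12, per the pool properties quoted further down), and it ignores slop space, metadata and the exact deflate-ratio bookkeeping inside ZFS:

# Rough raidz2 overhead estimate, not the real ZFS accounting code.
import math

SECTOR = 4096                  # ashift=12
RECORD = 128 * 1024            # assumed default recordsize
DATA, PARITY = 10, 2           # raidz2 10+2

raw = 43.5 * 1024**4           # zpool list SIZE (raw, parity included)

# Layout of one full 128 KiB record on this vdev:
data_sectors = RECORD // SECTOR                            # 32 data sectors
parity_sectors = PARITY * math.ceil(data_sectors / DATA)   # 8 parity sectors
alloc = data_sectors + parity_sectors                      # 40 sectors
alloc = math.ceil(alloc / (PARITY + 1)) * (PARITY + 1)     # padded up to 42

print(f"naive 10/12 of raw       : {raw * DATA / (DATA + PARITY) / 1e12:5.1f} TB")  # ~39.9
print(f"128K-record raidz2 data  : {raw * data_sectors / alloc / 1e12:5.1f} TB")    # ~36.4
print(f"df 1B-blocks for the OST : {35582552834048 / 1e12:5.1f} TB")                # 35.6

If that estimate is roughly right, only 32 of every 42 allocated sectors carry data for full 128 KiB records (about 76% instead of the naive 10/12 = 83%), which would explain most of the 40 TB vs 35.6 TB gap; slop space and metadata would presumably cover the rest. That is a guess, not a confirmed explanation.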
Best regards,
Alex.

On Aug 24, 2015, at 7:52 PM, Christopher J. Morrone <morro...@llnl.gov> wrote:

> I could be wrong, but I don't think that the original poster was asking
> why the SIZE field of zpool list was wrong, but rather why the AVAIL
> space in zfs list was lower than he expected.
>
> I would find it easier to answer the question if I knew his drive count
> and drive size.
>
> Chris
>
> On 08/24/2015 02:12 PM, Alexander I Kulyavtsev wrote:
>> Same question here.
>>
>> 6TB/65TB is 11%. In our case about the same fraction was "missing".
>>
>> My speculation was: it may happen if at some point between zpool and Linux
>> a value reported in TB is interpreted as TiB and then converted back to TB,
>> or an unneeded MB-to-MiB conversion is done twice, etc.
>>
>> Here are my numbers:
>> We have 12 * 4TB drives per pool, i.e. 48 TB (decimal).
>> The zpool is created as raidz2 10+2.
>> zpool reports 43.5T.
>> The pool size should be 48T = 4T*12, or 40T = 4T*10 (depending on whether
>> zpool shows space before or after raiding).
>> From the Oracle ZFS documentation, "zpool list" returns the total space
>> without overheads, so 48 TB should be reported by zpool instead of 43.5TB.
>>
>> In my case, it looked like a conversion/interpretation error between TB
>> and TiB:
>>
>> 48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992
>>
>> At disk level:
>>
>> ~/sas2ircu 0 display
>>
>> Device is a Hard disk
>>   Enclosure #               : 2
>>   Slot #                    : 12
>>   SAS Address               : 5003048-0-015a-a918
>>   State                     : Ready (RDY)
>>   Size (in MB)/(in sectors) : 3815447/7814037167
>>   Manufacturer              : ATA
>>   Model Number              : HGST HUS724040AL
>>   Firmware Revision         : AA70
>>   Serial No                 : PN2334PBJPW14T
>>   GUID                      : 5000cca23de6204b
>>   Protocol                  : SATA
>>   Drive Type                : SATA_HDD
>>
>> One disk size is about 4 TB (decimal):
>>
>> 3815447*1024*1024 = 4000786153472
>> 7814037167*512   = 4000787029504
>>
>> The vdev presents the whole disk to zpool. There is some overhead; some
>> space is left on sdq9.
>>
>> [root@lfs1 scripts]# head -4 /etc/zfs/vdev_id.conf
>> alias s0 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90c-lun-0
>> alias s1 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90d-lun-0
>> alias s2 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90e-lun-0
>> alias s3 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90f-lun-0
>> ...
>> alias s12 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa918-lun-0
>> ...
>>
>> [root@lfs1 scripts]# ls -l /dev/disk/by-path/
>> ...
>> lrwxrwxrwx 1 root root  9 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0 -> ../../sdq
>> lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0-part1 -> ../../sdq1
>> lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0-part9 -> ../../sdq9
>>
>> Pool report:
>>
>> [root@lfs1 scripts]# zpool list
>> NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> zpla-0000  43.5T  10.9T  32.6T         -    16%    24%  1.00x  ONLINE  -
>> zpla-0001  43.5T  11.0T  32.5T         -    17%    25%  1.00x  ONLINE  -
>> zpla-0002  43.5T  10.8T  32.7T         -    17%    24%  1.00x  ONLINE  -
>> [root@lfs1 scripts]#
>>
>> [root@lfs1 ~]# zpool list -v zpla-0001
>> NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> zpla-0001  43.5T  11.0T  32.5T         -    17%    25%  1.00x  ONLINE  -
>>   raidz2   43.5T  11.0T  32.5T         -    17%    25%
>>     s12        -      -      -         -      -      -
>>     s13        -      -      -         -      -      -
>>     s14        -      -      -         -      -      -
>>     s15        -      -      -         -      -      -
>>     s16        -      -      -         -      -      -
>>     s17        -      -      -         -      -      -
>>     s18        -      -      -         -      -      -
>>     s19        -      -      -         -      -      -
>>     s20        -      -      -         -      -      -
>>     s21        -      -      -         -      -      -
>>     s22        -      -      -         -      -      -
>>     s23        -      -      -         -      -      -
>> [root@lfs1 ~]#
>>
>> [root@lfs1 ~]# zpool get all zpla-0001
>> NAME       PROPERTY                    VALUE                  SOURCE
>> zpla-0001  size                        43.5T                  -
>> zpla-0001  capacity                    25%                    -
>> zpla-0001  altroot                     -                      default
>> zpla-0001  health                      ONLINE                 -
>> zpla-0001  guid                        5472902975201420000    default
>> zpla-0001  version                     -                      default
>> zpla-0001  bootfs                      -                      default
>> zpla-0001  delegation                  on                     default
>> zpla-0001  autoreplace                 off                    default
>> zpla-0001  cachefile                   -                      default
>> zpla-0001  failmode                    wait                   default
>> zpla-0001  listsnapshots               off                    default
>> zpla-0001  autoexpand                  off                    default
>> zpla-0001  dedupditto                  0                      default
>> zpla-0001  dedupratio                  1.00x                  -
>> zpla-0001  free                        32.5T                  -
>> zpla-0001  allocated                   11.0T                  -
>> zpla-0001  readonly                    off                    -
>> zpla-0001  ashift                      12                     local
>> zpla-0001  comment                     -                      default
>> zpla-0001  expandsize                  -                      -
>> zpla-0001  freeing                     0                      default
>> zpla-0001  fragmentation               17%                    -
>> zpla-0001  leaked                      0                      default
>> zpla-0001  feature@async_destroy       enabled                local
>> zpla-0001  feature@empty_bpobj         active                 local
>> zpla-0001  feature@lz4_compress        active                 local
>> zpla-0001  feature@spacemap_histogram  active                 local
>> zpla-0001  feature@enabled_txg         active                 local
>> zpla-0001  feature@hole_birth          active                 local
>> zpla-0001  feature@extensible_dataset  enabled                local
>> zpla-0001  feature@embedded_data       active                 local
>> zpla-0001  feature@bookmarks           enabled                local
>>
>> Alex.
>>
>> On Aug 19, 2015, at 8:18 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>
>>> Dear Lustre experts,
>>>
>>> I have configured two different Lustre instances, both using Lustre
>>> 2.5.3, one with ldiskfs on hardware RAID-6 and one using ZFS and
>>> RAID-Z2, on the same type of hardware. I was wondering why I have 24 TB
>>> less space available, when I should have the same amount of parity
>>> used:
>>>
>>> # lfs df
>>> UUID                  1K-blocks         Used     Available Use% Mounted on
>>> fs19-MDT0000_UUID      50322916       472696      46494784   1% /testlustre/fs19[MDT:0]
>>> fs19-OST0000_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:0]
>>> fs19-OST0001_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:1]
>>> fs19-OST0002_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:2]
>>> fs19-OST0003_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:3]
>>> filesystem summary: 207693153280       50688  207693094400   0% /testlustre/fs19
>>>
>>> UUID                  1K-blocks         Used     Available Use% Mounted on
>>> fs18-MDT0000_UUID      47177700       482152      43550028   1% /lustre/fs18[MDT:0]
>>> fs18-OST0000_UUID   58387106064   6014088200   49452733560  11% /lustre/fs18[OST:0]
>>> fs18-OST0001_UUID   58387106064   5919753028   49547068928  11% /lustre/fs18[OST:1]
>>> fs18-OST0002_UUID   58387106064   5944542316   49522279640  11% /lustre/fs18[OST:2]
>>> fs18-OST0003_UUID   58387106064   5906712004   49560109952  11% /lustre/fs18[OST:3]
>>> filesystem summary: 233548424256  23785095548  198082192080  11% /lustre/fs18
>>>
>>> fs18 is using ldiskfs, while fs19 is ZFS:
>>>
>>> # zpool list
>>> NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>>> lustre-ost1   65T   18,1M  65,0T   0%  1.00x  ONLINE  -
>>> # zfs list
>>> NAME               USED  AVAIL  REFER  MOUNTPOINT
>>> lustre-ost1       13,6M  48,7T   311K  /lustre-ost1
>>> lustre-ost1/ost1  12,4M  48,7T  12,4M  /lustre-ost1/ost1
>>>
>>> Any idea where my 6 TB per OST went?
>>>
>>> Regards, Götz Waschk

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org