Hmm, I was assuming the question was about total space, as I struggled for some time to understand why I have 99 TB of total available space per OSS after installing ZFS-backed Lustre, while the ldiskfs OSTs have 120 TB on the same hardware. The 20% difference was partially (10%) accounted for by the different RAID-6 / raidz2 configurations, but I was not able to explain the other 10%.
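For what it is worth, a large part of such a gap can appear from nothing more than decimal-TB versus binary-TiB reporting combined with the parity fraction. A back-of-the-envelope sketch in Python, assuming 12 x 4 TB (decimal) drives in a raidz2 10+2 layout as in the pools shown further down (the ldiskfs geometry may differ, and this ignores metadata, reserved blocks and raidz allocation padding):

# Back-of-the-envelope only: 12 x 4 TB drives, 10 data + 2 parity.
TB, TiB = 1000**4, 1024**4            # decimal vs binary terabyte

drives, data_drives, drive_tb = 12, 10, 4

raw  = drives * drive_tb * TB         # what the drive labels add up to
data = data_drives * drive_tb * TB    # upper bound after parity

print(f"raw : {raw / TB:5.1f} TB = {raw / TiB:5.2f} TiB")    # 48.0 TB = 43.66 TiB
print(f"data: {data / TB:5.1f} TB = {data / TiB:5.2f} TiB")  # 40.0 TB = 36.38 TiB

A TB-vs-TiB mixup by itself shifts numbers by roughly 10% (1024**4 / 1000**4 ≈ 1.0995), suspiciously close to the remainder I could not explain, although raidz allocation overhead could account for it just as well (see the estimate further down).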
Regarding the question in the original post: I can not get 24 TB out of the "available" field of the df output. It shows roughly 207,693,094,400 KiB (about 193 TiB) "available" on his ZFS Lustre vs. 198,082,192,080 KiB (about 184 TiB) on the ldiskfs one, a difference of only about 9 TiB. At the same time the difference of the total space is 233548424256 - 207693153280 = 25855270976 KiB ≈ 24.08 TiB. Götz, could you please tell us what you meant by "available"?

Also, in my case the output of Linux df on the OSS looks strange for the zfs pool: the pool itself is reported as 25T (why?), while the formatted OST taking all the space in this pool shows 33T:

[root@lfs1 ~]# df -h /zpla-0000 /mnt/OST0000
Filesystem         Size  Used Avail Use% Mounted on
zpla-0000           25T  256K   25T   1% /zpla-0000
zpla-0000/OST0000   33T  8.3T   25T  26% /mnt/OST0000
[root@lfs1 ~]#

In bytes:

[root@lfs1 ~]# df --block-size=1 /zpla-0000 /mnt/OST0000
Filesystem              1B-blocks           Used       Available Use% Mounted on
zpla-0000          26769344561152         262144  26769344299008   1% /zpla-0000
zpla-0000/OST0000  35582552834048  9093386076160  26489164660736  26% /mnt/OST0000

The same OST as reported by Lustre:

[root@lfsa scripts]# lfs df
UUID                  1K-blocks        Used    Available Use% Mounted on
lfs-MDT0000_UUID      974961920      275328    974684544   0% /mnt/lfsa[MDT:0]
lfs-OST0000_UUID    34748586752  8880259840  25868324736  26% /mnt/lfsa[OST:0]
...

Compare:

[root@lfs1 ~]# zpool list
NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zpla-0000  43.5T  10.9T  32.6T         -    16%    24%  1.00x  ONLINE  -
zpla-0001  43.5T  11.0T  32.5T         -    17%    25%  1.00x  ONLINE  -
zpla-0002  43.5T  10.8T  32.7T         -    17%    24%  1.00x  ONLINE  -

I realize that zpool reports raw disk space including parity blocks (48 TB ≈ 43.7 TiB) and everything else (metadata, space for xattr inodes, and so on). What I can not explain is the difference between the 40 TB (decimal) of data space (10 * 4 TB drives) and the 35,582,552,834,048 bytes shown by df for the OST.
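That said, a rough estimate of the raidz2 allocation overhead gets fairly close to the df figure. The following is only a sketch of mine, not the real ZFS accounting code: it assumes the default 128 KiB recordsize and 4 KiB sectors (ashift=12, per the pool properties quoted further down), and it ignores slop space, metadata and the exact deflate-ratio bookkeeping inside ZFS:

# Rough raidz2 overhead estimate, not the real ZFS accounting code.
import math

SECTOR = 4096                  # ashift=12
RECORD = 128 * 1024            # assumed default recordsize
DATA, PARITY = 10, 2           # raidz2 10+2

raw = 43.5 * 1024**4           # zpool list SIZE (raw, parity included)

# Layout of one full 128 KiB record on this vdev:
data_sectors = RECORD // SECTOR                            # 32 data sectors
parity_sectors = PARITY * math.ceil(data_sectors / DATA)   # 8 parity sectors
alloc = data_sectors + parity_sectors                      # 40 sectors
alloc = math.ceil(alloc / (PARITY + 1)) * (PARITY + 1)     # padded up to 42

print(f"naive 10/12 of raw       : {raw * DATA / (DATA + PARITY) / 1e12:5.1f} TB")  # ~39.9
print(f"128K-record raidz2 data  : {raw * data_sectors / alloc / 1e12:5.1f} TB")    # ~36.4
print(f"df 1B-blocks for the OST : {35582552834048 / 1e12:5.1f} TB")                # 35.6

If that estimate is roughly right, only 32 of every 42 allocated sectors carry data for full 128 KiB records (about 76% instead of the naive 10/12 = 83%), which would explain most of the 40 TB vs 35.6 TB gap; slop space and metadata would presumably cover the rest. That is a guess, not a confirmed explanation.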
Best regards,
Alex.

On Aug 24, 2015, at 7:52 PM, Christopher J. Morrone <morro...@llnl.gov> wrote:

> I could be wrong, but I don't think that the original poster was asking
> why the SIZE field of zpool list was wrong, but rather why the AVAIL
> space in zfs list was lower than he expected.
>
> I would find it easier to answer the question if I knew his drive count
> and drive size.
>
> Chris
>
> On 08/24/2015 02:12 PM, Alexander I Kulyavtsev wrote:
>> Same question here.
>>
>> 6TB/65TB is 11%. In our case about the same fraction was "missing".
>>
>> My speculation was: it may happen if at some point between zpool and Linux
>> a value reported in TB is interpreted as TiB and then converted back to TB,
>> or an unneeded MB-to-MiB conversion is done twice, etc.
>>
>> Here are my numbers:
>> We have 12 * 4TB drives per pool, i.e. 48 TB (decimal).
>> The zpool is created as raidz2 10+2.
>> zpool reports 43.5T.
>> The pool size should be 48T = 4T*12, or 40T = 4T*10 (depending on whether
>> zpool shows space before or after raiding).
>> From the Oracle ZFS documentation, "zpool list" returns the total space
>> without overheads, so 48 TB should be reported by zpool instead of 43.5TB.
>>
>> In my case, it looked like a conversion/interpretation error between TB
>> and TiB:
>>
>> 48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992
>>
>> At disk level:
>>
>> ~/sas2ircu 0 display
>>
>> Device is a Hard disk
>>   Enclosure #               : 2
>>   Slot #                    : 12
>>   SAS Address               : 5003048-0-015a-a918
>>   State                     : Ready (RDY)
>>   Size (in MB)/(in sectors) : 3815447/7814037167
>>   Manufacturer              : ATA
>>   Model Number              : HGST HUS724040AL
>>   Firmware Revision         : AA70
>>   Serial No                 : PN2334PBJPW14T
>>   GUID                      : 5000cca23de6204b
>>   Protocol                  : SATA
>>   Drive Type                : SATA_HDD
>>
>> One disk size is about 4 TB (decimal):
>>
>> 3815447*1024*1024 = 4000786153472
>> 7814037167*512   = 4000787029504
>>
>> The vdev presents the whole disk to zpool. There is some overhead; some
>> space is left on sdq9.
>>
>> [root@lfs1 scripts]# head -4 /etc/zfs/vdev_id.conf
>> alias s0 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90c-lun-0
>> alias s1 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90d-lun-0
>> alias s2 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90e-lun-0
>> alias s3 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90f-lun-0
>> ...
>> alias s12 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa918-lun-0
>> ...
>>
>> [root@lfs1 scripts]# ls -l /dev/disk/by-path/
>> ...
>> lrwxrwxrwx 1 root root  9 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0 -> ../../sdq
>> lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0-part1 -> ../../sdq1
>> lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0-part9 -> ../../sdq9
>>
>> Pool report:
>>
>> [root@lfs1 scripts]# zpool list
>> NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> zpla-0000  43.5T  10.9T  32.6T         -    16%    24%  1.00x  ONLINE  -
>> zpla-0001  43.5T  11.0T  32.5T         -    17%    25%  1.00x  ONLINE  -
>> zpla-0002  43.5T  10.8T  32.7T         -    17%    24%  1.00x  ONLINE  -
>> [root@lfs1 scripts]#
>>
>> [root@lfs1 ~]# zpool list -v zpla-0001
>> NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> zpla-0001  43.5T  11.0T  32.5T         -    17%    25%  1.00x  ONLINE  -
>>   raidz2   43.5T  11.0T  32.5T         -    17%    25%
>>     s12        -      -      -         -      -      -
>>     s13        -      -      -         -      -      -
>>     s14        -      -      -         -      -      -
>>     s15        -      -      -         -      -      -
>>     s16        -      -      -         -      -      -
>>     s17        -      -      -         -      -      -
>>     s18        -      -      -         -      -      -
>>     s19        -      -      -         -      -      -
>>     s20        -      -      -         -      -      -
>>     s21        -      -      -         -      -      -
>>     s22        -      -      -         -      -      -
>>     s23        -      -      -         -      -      -
>> [root@lfs1 ~]#
>>
>> [root@lfs1 ~]# zpool get all zpla-0001
>> NAME       PROPERTY                    VALUE                  SOURCE
>> zpla-0001  size                        43.5T                  -
>> zpla-0001  capacity                    25%                    -
>> zpla-0001  altroot                     -                      default
>> zpla-0001  health                      ONLINE                 -
>> zpla-0001  guid                        5472902975201420000    default
>> zpla-0001  version                     -                      default
>> zpla-0001  bootfs                      -                      default
>> zpla-0001  delegation                  on                     default
>> zpla-0001  autoreplace                 off                    default
>> zpla-0001  cachefile                   -                      default
>> zpla-0001  failmode                    wait                   default
>> zpla-0001  listsnapshots               off                    default
>> zpla-0001  autoexpand                  off                    default
>> zpla-0001  dedupditto                  0                      default
>> zpla-0001  dedupratio                  1.00x                  -
>> zpla-0001  free                        32.5T                  -
>> zpla-0001  allocated                   11.0T                  -
>> zpla-0001  readonly                    off                    -
>> zpla-0001  ashift                      12                     local
>> zpla-0001  comment                     -                      default
>> zpla-0001  expandsize                  -                      -
>> zpla-0001  freeing                     0                      default
>> zpla-0001  fragmentation               17%                    -
>> zpla-0001  leaked                      0                      default
>> zpla-0001  feature@async_destroy       enabled                local
>> zpla-0001  feature@empty_bpobj         active                 local
>> zpla-0001  feature@lz4_compress        active                 local
>> zpla-0001  feature@spacemap_histogram  active                 local
>> zpla-0001  feature@enabled_txg         active                 local
>> zpla-0001  feature@hole_birth          active                 local
>> zpla-0001  feature@extensible_dataset  enabled                local
>> zpla-0001  feature@embedded_data       active                 local
>> zpla-0001  feature@bookmarks           enabled                local
>>
>> Alex.
>>
>> On Aug 19, 2015, at 8:18 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>
>>> Dear Lustre experts,
>>>
>>> I have configured two different Lustre instances, both using Lustre
>>> 2.5.3, one with ldiskfs on hardware RAID-6 and one using ZFS and
>>> RAID-Z2, on the same type of hardware. I was wondering why I have 24 TB
>>> less space available, when I should have the same amount of parity
>>> used:
>>>
>>> # lfs df
>>> UUID                  1K-blocks         Used     Available Use% Mounted on
>>> fs19-MDT0000_UUID      50322916       472696      46494784   1% /testlustre/fs19[MDT:0]
>>> fs19-OST0000_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:0]
>>> fs19-OST0001_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:1]
>>> fs19-OST0002_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:2]
>>> fs19-OST0003_UUID   51923288320        12672   51923273600   0% /testlustre/fs19[OST:3]
>>> filesystem summary: 207693153280       50688  207693094400   0% /testlustre/fs19
>>>
>>> UUID                  1K-blocks         Used     Available Use% Mounted on
>>> fs18-MDT0000_UUID      47177700       482152      43550028   1% /lustre/fs18[MDT:0]
>>> fs18-OST0000_UUID   58387106064   6014088200   49452733560  11% /lustre/fs18[OST:0]
>>> fs18-OST0001_UUID   58387106064   5919753028   49547068928  11% /lustre/fs18[OST:1]
>>> fs18-OST0002_UUID   58387106064   5944542316   49522279640  11% /lustre/fs18[OST:2]
>>> fs18-OST0003_UUID   58387106064   5906712004   49560109952  11% /lustre/fs18[OST:3]
>>> filesystem summary: 233548424256  23785095548  198082192080  11% /lustre/fs18
>>>
>>> fs18 is using ldiskfs, while fs19 is ZFS:
>>>
>>> # zpool list
>>> NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>>> lustre-ost1   65T   18,1M  65,0T   0%  1.00x  ONLINE  -
>>> # zfs list
>>> NAME               USED  AVAIL  REFER  MOUNTPOINT
>>> lustre-ost1       13,6M  48,7T   311K  /lustre-ost1
>>> lustre-ost1/ost1  12,4M  48,7T  12,4M  /lustre-ost1/ost1
>>>
>>> Any idea where my 6 TB per OST went?
>>>
>>> Regards, Götz Waschk

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org