On 2014-03-22 21:32, Ask Bjørn Hansen wrote:
> On Mar 22, 2014, at 10:43 AM, Gabriele Bulfon <[email protected]> wrote:
>
>> NAME    SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
>> data2  1.62T   998G   658G         -  60%  1.00x  ONLINE  -
>>
>> zfs list shows
>>
>> NAME             USED  AVAIL  REFER  MOUNTPOINT
>> data2            793G   293G  40.0K  /data2
>> data2/myvolume   792G   421G   664G  -
>
> 664G (or 792G) is more than 658G. According to Jim’s explanation ZFS wants to
> make sure you can replace every byte on the zvol and you don’t have room for
> that while keeping the snapshot.
Note that 658G is the zpool free space, which counts raw blocks not
yet allocated for either data or parity (in the raidz case; mirrors are
accounted more intuitively). So in the case of Gabriele's 3-disk raidz,
421G * 1.5 = 631.5G of unallocated space roughly matches the free pool
space (with some overheads, the 1/64 pool reservation, etc. in play).
Likewise, the 664G of userdata written to the pool (in its single
dataset) translates to 996G with parity (x1.5), which also matches
the zpool ALLOC figure of 998G well enough.
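(If you want to repeat such arithmetic on your own pool, something like
the following should do; the 3/2 factor applies to a 3-disk raidz1
specifically, and metadata plus allocation overheads will skew the
result a bit:

  # zpool list -o name,size,allocated,free data2
  # zfs list -o name,used,available,referenced -r data2

Multiply the zfs figures by 1.5 and they should land near the zpool
ones, as above.)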
Another thing to notice is that the pool's available space is 293G,
while the zvol's available space is 421G. This 128G difference is the
space reserved to guarantee that the dataset can be filled up to its
792G size (792 - 664 = 128). These 128G are not yet allocated (they
show up in neither the zfs REFER nor the zpool ALLOC columns), but
they are already counted as "used" and are unavailable to any dataset
other than this volume.
I think these 128G might remain available when snapshotting this
volume, since that space is unused at the time of the snapshot, so the
reservation needed for a complete rewrite of the data would be around
664G rather than the full 792G.
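For the curious, the per-dataset space accounting properties show
exactly where that gap is charged (dataset name taken from the listing
above):

  # zfs get volsize,refreservation,usedbydataset,usedbyrefreservation data2/myvolume

Here usedbyrefreservation should come out as roughly the refreservation
minus the data already written (about 792G - 664G = 128G): charged as
"used", but not yet allocated on disk.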
> very strange that snapshotting zvol would require usage space:
> isn't one of the best zfs options zero usage space snapshot at
> creation time?
It is not "usage": unlike some other systems, snapshots and clones
do not actually allocate a separate copy of the data for this.
Here is a matter of reservations done by default to guarantee that
you can (re-)use the live branch of the dataset completely, logical
setting that can be easily changed (at a risk).
You can tweak the (ref)reservation and/or some other attributes of the
zvol dataset to disable these guarantees, e.g. if you only need the
snapshot in order to send out a zfs-send stream. Then, indeed, you
would not be guaranteed the ability to completely rewrite the dataset
contents: an end-user might be surprised by an out-of-space or other
IO error while writing into a VMDK container that seemingly has lots
of free space (according to its format headers), until you destroy the
snapshot and thus release the blocks referenced only by it (those
overwritten in the newer branches) back into the available space.
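A purely illustrative sketch of such a workflow (the remote host and
target dataset names are made up):

  # zfs set refreservation=none data2/myvolume
  # zfs snapshot data2/myvolume@backup
  # zfs send data2/myvolume@backup | ssh otherhost zfs receive -F backup/myvolume
  # zfs destroy data2/myvolume@backup

Afterwards you can set the refreservation back to the volume size to
restore the guarantee.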
See also the "man zfs" page, i.e.
volsize=size
For volumes, specifies the logical size of the volume.
By default, creating a volume establishes a reservation
of equal size. For storage pools with a version number
of 9 or higher, a refreservation is set instead...
The reservation is kept equal to the volume's logical
size to prevent unexpected behavior for consumers.
Without the reservation, the volume could run out of
space, resulting in undefined behavior or data corrup-
tion, depending on how the volume is used....
Though not recommended, a "sparse volume" (also known as
"thin provisioning") can be created by specifying the -s
option to the zfs create -V command, or by changing the
reservation after the volume has been created. A "sparse
volume" is a volume where the reservation is less then
the volume size. Consequently, writes to a sparse
volume can fail with ENOSPC when the pool is low on
space. For a sparse volume, changes to volsize are not
reflected in the reservation.
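For completeness, such a sparse volume would be made like this (names
invented for the example):

  # zfs create -s -V 5g rpool/sparsevol
  # zfs get volsize,reservation,refreservation rpool/sparsevol

Both reservations come out as "none", so the volume costs next to
nothing until it is written to, and writes into it may fail with
ENOSPC once the pool fills up.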
And here is a simple example, though a bit long. A picture is worth
more than words, though a few words are offered as commentary as well :)
# zfs list rpool
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  44.1G  14.5G    50K  /rpool
# zfs create -V5g rpool/testvol
# zfs list rpool rpool/testvol
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool           49.3G  9.31G    50K  /rpool
rpool/testvol   5.16G  14.5G    16K  -
# zfs snapshot rpool/testvol@0
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  49.3G  9.31G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   5.16G  14.5G    16K  -
rpool/testvol@0     0      -    16K  -
So (on an oi_151a8 at least) the zvol snapshot does not use space
initially.
Now I fill up some of the volume:
# dd if=/dev/zero bs=65536 count=8192 of=/dev/zvol/rdsk/rpool/testvol
8192+0 records in
8192+0 records out
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  49.3G  9.31G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   5.16G  14.0G   513M  -
rpool/testvol@0   15K      -    16K  -
# zfs snapshot rpool/testvol@1
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  49.8G  8.81G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   5.66G  14.0G   513M  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1     0      -   513M  -
Both the pool's available space and the space available to the live
branch of the volume dataset have decreased.
Now I add data to some other range:
# dd if=/dev/zero bs=65536 count=8192 seek=10240 of=/dev/zvol/rdsk/rpool/testvol
8192+0 records in
8192+0 records out
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  49.8G  8.81G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   5.66G  13.5G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
# zfs snapshot rpool/testvol@2
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  50.3G  8.31G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   6.16G  13.5G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
rpool/testvol@2     0      -  1.00G  -
Another half a gig is chomped off the available space of both the pool
and the zvol, and added to the "refer" of the volume and, later, of its
snapshot.
Now I overwrite the same locations:
# dd if=/dev/zero bs=65536 count=4096 of=/dev/zvol/rdsk/rpool/testvol
4096+0 records in
4096+0 records out
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  50.3G  8.31G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   6.16G  13.2G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
rpool/testvol@2   17K      -  1.00G  -
Available space for the zvol has gone down (though this stayed within
the reservation and so did not further affect the pool's available
space), while the used space (including reservations and snapshots)
remains in place...
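(As an aside: if your zfs version has the written and written@<snapshot>
properties, they show how much new space has been charged since a
particular snapshot, e.g.

  # zfs get written,written@2 rpool/testvol

which here should reflect the blocks overwritten since @2.)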
# zfs snapshot rpool/testvol@3
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  50.5G  8.06G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   6.41G  13.2G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
rpool/testvol@2   17K      -  1.00G  -
rpool/testvol@3     0      -  1.00G  -
Seemingly all is the same: the live dataset and each snapshot
reference 1G. However, the "used" space has gone up, and the pool's
available space has gone down, being reserved so that the data now held
by the new snapshot can still be fully rewritten later on.
Here are some size-stats:
# zfs get all rpool/testvol | grep G
rpool/testvol  used                  6.41G  -
rpool/testvol  available             13.2G  -
rpool/testvol  referenced            1.00G  -
rpool/testvol  volsize               5G     local
rpool/testvol  refreservation        5.16G  local
rpool/testvol  usedbydataset         1.00G  -
rpool/testvol  usedbyrefreservation  5.16G  -
rpool/testvol  logicalused           1.25G  -
rpool/testvol  logicalreferenced     1.00G  -
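A more concise way to get a similar breakdown is the space-oriented
listing:

  # zfs list -o space -r rpool/testvol

which splits "used" into the usedbysnapshots, usedbydataset,
usedbyrefreservation and usedbychildren columns in one go.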
Here is an example of overriding that reservation safety net (you can
of course use a value other than zero; e.g. if you expect to write no
more than a certain amount of data while your zfs-send is running, you
can cater for that with a smaller, limited reservation, as sketched a
bit further below):
# zfs set refreservation=0g rpool/testvol
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  45.4G  13.2G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   1.25G  13.2G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
rpool/testvol@2   17K      -  1.00G  -
rpool/testvol@3     0      -  1.00G  -
So, the pool "used" space has decreased, the space available for
other datasets has increased. Space available in the zvol remains,
though the "used" value now matches the sum of actual allocations
in the snapshots (and the live branch, which has zero new bytes).
# zfs get all rpool/testvol | grep G
rpool/testvol  used                  1.25G  -
rpool/testvol  available             13.2G  -
rpool/testvol  referenced            1.00G  -
rpool/testvol  volsize               5G     local
rpool/testvol  usedbydataset         1.00G  -
rpool/testvol  logicalused           1.25G  -
rpool/testvol  logicalreferenced     1.00G  -
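(For comparison, the limited-reservation variant mentioned earlier
would be just something like:

  # zfs set refreservation=1g rpool/testvol

with 1G standing in for however much you expect to write while the
snapshot is held; the rest of this example continues with the
reservation at zero.)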
A new snapshot does not override your explicit request for no
reservation:
# zfs snapshot rpool/testvol@4
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  45.4G  13.2G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   1.25G  13.2G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
rpool/testvol@2   17K      -  1.00G  -
rpool/testvol@3     0      -  1.00G  -
rpool/testvol@4     0      -  1.00G  -
This stays true as you add more data and snapshots:
# dd if=/dev/zero bs=65536 count=4096 of=/dev/zvol/rdsk/rpool/testvol
4096+0 records in
4096+0 records out
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  45.6G  13.0G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   1.50G  13.0G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
rpool/testvol@2   17K      -  1.00G  -
rpool/testvol@3     0      -  1.00G  -
rpool/testvol@4     0      -  1.00G  -
# zfs snapshot rpool/testvol@5
# zfs list rpool ; zfs list -tall -r rpool/testvol
NAME    USED  AVAIL  REFER  MOUNTPOINT
rpool  45.6G  13.0G    50K  /rpool
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/testvol   1.50G  13.0G  1.00G  -
rpool/testvol@0   15K      -    16K  -
rpool/testvol@1   17K      -   513M  -
rpool/testvol@2   17K      -  1.00G  -
rpool/testvol@3     0      -  1.00G  -
rpool/testvol@4     0      -  1.00G  -
rpool/testvol@5     0      -  1.00G  -
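And if you repeat this at home, cleaning up afterwards is a one-liner
that also releases everything held by the snapshots:

  # zfs destroy -r rpool/testvol
  # zfs list rpool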
Hope this explains things,
//Jim Klimov