On Wed, May 23, 2018 at 01:08:22AM +1000, Craig Sanders wrote:
> BTW, this is a generally useful thing to know how to do: use half a raid
> (i.e. a "degraded" raid) to store your data while you're setting up its
> replacement.
>
> e.g. if your 2 TB array is currently mdadm RAID-1 and you want to convert it
> to ZFS, you could tell mdadm to fail one of the drives so it's a degraded
> RAID-1. Then use 'zpool create data2 /dev/disk/by-id/XXXXXXX' to create a
> ZFS pool using the artificially "failed" drive.  Then rsync /data to /data2
> with something like:

At minimum, that zpool create command should have used -o ashift=12.  e.g.

    zpool create -o ashift=12 -O compression=lz4 -O atime=off -O relatime=on \
        data2 /dev/disk/by-id/XXXXXXX

(Note that pool properties take -o, while filesystem properties like
compression and atime take -O.)

The only one that really has to be specified at creation time is ashift=12;
any other attribute can be changed later with 'zfs set'.
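For example, a sketch of changing properties on a live pool (assuming the pool is named data2 as above):

```shell
# Adjust filesystem properties after pool creation with 'zfs set'.
# Assumes the pool created above is named "data2".
zfs set compression=lz4 data2    # enable lz4 on the top-level dataset
zfs set atime=off data2          # stop recording access times

# Verify the current values and where each was set (local vs inherited):
zfs get compression,atime data2
```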

>     rsync -avxHAXS -h -h --progress --stats --delete /data /data2

I forgot to say that it would be better to create any datasets you need
**before** you run the rsync. Datasets are separate filesystems, so moving
data from one dataset to another (or converting a subdirectory into a
dataset) forces mv to do a cp + rm operation....same as would be the case
when moving files between separate drives or partitions or lvm logical
volumes etc.

So, if you want separate datasets for, e.g., /data/videos, /data/photos, or
whatever, then create them before the rsync. Don't worry about setting quotas
or any other attributes at this point (except maybe disabling compression for
datasets that will mostly contain already-compressed files like videos or
photos), you can use 'zfs set' to set/change them later if needed.

e.g.

   zfs create -o compression=off data2/videos
   zfs create -o compression=off data2/photos




BTW, right at the end of section 3 in that step-by-step guide I posted a link
to, it says "An alternative to using debootstrap is to copy the entirety of a
working system into the new ZFS root."  That would be the time to rsync your
existing root fs over to the zfs root pool if you were doing that.
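If you went that route, the copy would look something like this sketch (the target mountpoint /mnt is an assumption; adjust for how the guide has you mount the new root pool):

```shell
# Hypothetical sketch: copy a running root filesystem into a new ZFS
# root pool mounted at /mnt, instead of running debootstrap.
# -x stays on one filesystem (skips /proc, /sys, other mounts);
# -HAXS preserves hardlinks, ACLs, xattrs and sparse files.
rsync -avxHAXS --progress --stats / /mnt/
```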


Also btw, that guide very briefly mentions deduplication. Just pretend that it
doesn't exist or that you never heard of it. De-duping is a nice idea (saves
space if you have multiple copies of the same data) but in practice it uses
far too much RAM to be worth doing.  It's a great way to minimise use of cheap
disks ($60 per TB or less) by using lots of very expensive RAM ($15 per GB or
more).

A very rough rule of thumb is that de-duplication uses around 1GB of RAM per
TB of storage.  Definitely not worth it.  About the only good use case I've
seen for de-duping is a server with hundreds of GBs of RAM providing storage
for lots of mostly-duplicate clone VMs, like at an ISP or other hosting
provider.  It's only worthwhile there because of the performance improvement
that comes from NOT having multiple copies of the same data-blocks (taking
more space in the ARC & L2ARC caches, and causing more seek time delays if
using spinning rust rather than SSDs).  Even then, it's debatable whether just
adding more disk would be better.
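If you're tempted anyway, ZFS can estimate what dedup would buy you before you turn it on. A sketch (pool name is an assumption):

```shell
# Simulate deduplication on an existing pool WITHOUT enabling it.
# 'zdb -S' walks the pool and prints a simulated dedup table histogram,
# ending with the estimated dedup ratio. It can take a long time and a
# lot of RAM on large pools -- but far less than actually enabling dedup.
zdb -S data2
```

If the simulated ratio comes back close to 1.00x, dedup would cost you all that RAM for essentially nothing.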


Compression's worth doing on most filesystems, though. lz4 is a very fast,
very low cpu usage algorithm, and (depending on what kind of data) on average
you'll probably get about 1/3rd to 1/2 reduction of space used by compressible
files.  e.g. some of the datasets on the machine I just built (called "hex"):

# zfs get compressratio hex hex/home hex/var/log hex/var/cache
NAME           PROPERTY       VALUE  SOURCE
hex            compressratio  1.88x  -
hex/home       compressratio  2.00x  -
hex/var/cache  compressratio  1.09x  -
hex/var/log    compressratio  4.44x  -

The first entry is the overall compression ratio for the entire pool: 1.88:1,
which means compression is currently saving me nearly half of my disk usage
(1 - 1/1.88, i.e. about 47%).  It's a new machine, so there's not much on it
at the moment.

hex/var/cache includes /var/cache/apt/archives, which is where apt-get
downloads .deb packages to. They're already compressed, so that lowers the
overall compression ratio for that dataset.


I'd probably get even better compression on the logs (at least 6x, probably
more) if I set it to use gzip for that dataset with:

    zfs set compression=gzip hex/var/log

(Note that this won't re-compress existing data; only new data will be
compressed with the new algorithm.)
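If you really wanted the existing logs recompressed, you'd have to rewrite the files yourself. A rough sketch (paths are assumptions, and this is only safe for files that are no longer being written to):

```shell
# Hypothetical: rewrite rotated log files in place so they get stored
# with the dataset's new compression algorithm. NOT safe for logs that
# are currently open -- stick to rotated files, or just let logrotate
# churn through them over time.
for f in /var/log/*.1; do
    cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"
done
```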

gzip takes much more CPU power than lz4, and generally isn't worth doing
unless you know that the data will be very compressible. Like plain text log
files.  There's a patch being tested for zfs to add a compression mode that
first tries lz4 and then, if it gets a good compression ratio, recompresses
it with gzip (on the grounds that anything that compresses well with lz4 will
compress even better with gzip).


> Finally, umount /data, use 'mdadm stop' to disable that raid device and
> 'zpool attach' to attach the ex-raid drive to the zpool.

The command to attach the mirror drive will be similar to (note that zpool
attach needs the pool name first, then the existing device, then the new one):

    zpool attach data2 /dev/.../current-drive-in-pool /dev/.../new-drive
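Putting the whole cutover together, something like this sketch (the md device and the second drive's by-id name are assumptions for illustration):

```shell
# Hypothetical end-to-end cutover, assuming the old array is /dev/md0,
# its remaining member is /dev/sdb1, and the pool is "data2".
umount /data
mdadm --stop /dev/md0
# Wipe the old md superblock so the freed drive is clean for ZFS:
mdadm --zero-superblock /dev/sdb1
# Attach it as a mirror of the drive already in the pool:
zpool attach data2 /dev/disk/by-id/XXXXXXX /dev/disk/by-id/YYYYYYY
# Watch the resilver progress:
zpool status data2
```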

craig

--
craig sanders <[email protected]>
_______________________________________________
luv-main mailing list
[email protected]
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main
