[gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Mick
I'm starting a new thread so as to not hijack the one about alternative 
kernels, but continue with something Volker raised.

On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:

 as others have written already: ssd.
 
 With a caveat: if an ssd dies, it will die suddenly. Without a warning.
 Usually 5 minutes before the start of your weekly or monthly backup run.
 And that is first hand experience.

I haven't yet started using SSD and have wondered what sort of a system should 
I set up to guard against such instantaneous catastrophic failures.  I am 
interested to hear what strategies people deploy to avoid data loss with SSDs, 
especially on laptops that don't have the luxury of raid redundancy.

With spinning drives I use tar and rsync at regular intervals.  There have 
been a few rare cases where a drive failed without prior notice - the last one 
after a reboot.  In such cases I am prepared to live with the risk of some 
data loss, on machines where raid is not an option.

-- 
Regards,
Mick




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Alec Ten Harmsel

On 10/27/2014 05:24 AM, Mick wrote:
 I'm starting a new thread so as to not hijack the one about alternative 
 kernels, but continue with something Volker raised.

 On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:

 as others have written already: ssd.

 With a caveat: if an ssd dies, it will die suddenly. Without a warning.
 Usually 5 minutes before the start of your weekly or monthly backup run.
 And that is first hand experience.

 I haven't yet started using SSD and have wondered what sort of a system
 should I set up to guard against such instantaneous catastrophic failures.
 I am interested to hear what strategies people deploy to avoid data loss
 with SSDs, especially on laptops that don't have the luxury of raid
 redundancy.

All the data I have on my laptop is either:

* Version Controlled
* Rsync'd from a server
* Not important

My laptop doesn't have an SSD, but it's old and probably about ready to
die in general. All of my documents are version controlled - git - and
therefore automatically backed up. I rsync other files around, like my
music and some software, so that's all backed up as well.


 With spinning drives I use tar and rsync at regular intervals.  There have
 been a few rare cases where a drive failed without prior notice - the last
 one after a reboot.  In such cases I am prepared to live with the risk of
 some data loss, on machines where raid is not an option.

afaik tar and rsync should continue to work for SSDs. The more places
the data is in, the better. If you regularly rsync text (say /etc), I
would consider version control.

Alec




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Alan McKinnon
On 27/10/2014 11:24, Mick wrote:
 I'm starting a new thread so as to not hijack the one about alternative 
 kernels, but continue with something Volker raised.
 
 On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:
 
 as others have written already: ssd.

 With a caveat: if an ssd dies, it will die suddenly. Without a warning.
 Usually 5 minutes before the start of your weekly or monthly backup run.
 And that is first hand experience.
 
 I haven't yet started using SSD and have wondered what sort of a system
 should I set up to guard against such instantaneous catastrophic failures.
 I am interested to hear what strategies people deploy to avoid data loss
 with SSDs, especially on laptops that don't have the luxury of raid
 redundancy.
 
 With spinning drives I use tar and rsync at regular intervals.  There have
 been a few rare cases where a drive failed without prior notice - the last
 one after a reboot.  In such cases I am prepared to live with the risk of
 some data loss, on machines where raid is not an option.
 


Without some form of redundancy that would be your best strategy -
decent and frequent backups

-- 
Alan McKinnon
alan.mckin...@gmail.com




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Philip Webb
141027 Mick wrote:
 On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:
 With a caveat: if an ssd dies, it will die suddenly. Without a warning.
 I haven't yet started using SSD and wonder what sort of a system
 should I set up to guard against such instantaneous catastrophic failures.

My desktop machine is now 2 yrs old & has an SSD for immediate stuff ;
it also has an HHD with 500 GB available.
Files I'm working on are backed daily to HHD & to USB stick ;
once/week I use a script in Krusader to copy everything in ~ or /etc
which has changed during that week onto HHD & USB
& also to an off-site system I have I/net access to ;
I also run a simple Bash script to rsync system files to an HHD copy.
Every 9 mth I do a full back-up of everything in ~ or /etc ,
of which I keep at least 1 copy off-site & another on CD (in future DVD)
for when the Earth's poles change & everything magnetic is wiped out (grin).

 2  belts +  3  braces : HTH.
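
A rough Python sketch of the "copy whatever changed this week" step described above. It is not the Krusader script from the message; the function name `changed_since` and its parameters are made up for illustration. It only lists candidate files, leaving the actual copy to rsync or similar.

```python
import os
import time
from pathlib import Path


def changed_since(root, days, now=None):
    """Return files under `root` modified within the last `days` days, sorted."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    recent = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat() so a dangling symlink does not raise.
                if path.lstat().st_mtime >= cutoff:
                    recent.append(path)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                continue
    return sorted(recent)
```

Calling `changed_since("/etc", 7)` would give the files touched in the past week. Modification time is only a heuristic: a restored or touched file will show up even if its content is unchanged.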

-- 
,,
SUPPORT ___//___,   Philip Webb
ELECTRIC   /] [] [] [] [] []|   Cities Centre, University of Toronto
TRANSIT`-O--O---'   purslowatchassdotutorontodotca




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Rich Freeman
On Mon, Oct 27, 2014 at 7:11 AM, Alan McKinnon alan.mckin...@gmail.com wrote:
 On 27/10/2014 11:24, Mick wrote:
 I'm starting a new thread so as to not hijack the one about alternative
 kernels, but continue with something Volker raised.

 On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:

 as others have written already: ssd.

 With a caveat: if an ssd dies, it will die suddenly. Without a warning.
 Usually 5 minutes before the start of your weekly or monthly backup run.
 And that is first hand experience.

 I haven't yet started using SSD and have wondered what sort of a system
 should I set up to guard against such instantaneous catastrophic failures.
 I am interested to hear what strategies people deploy to avoid data loss
 with SSDs, especially on laptops that don't have the luxury of raid
 redundancy.

 With spinning drives I use tar and rsync at regular intervals.  There have
 been a few rare cases where a drive failed without prior notice - the last
 one after a reboot.  In such cases I am prepared to live with the risk of
 some data loss, on machines where raid is not an option.


 Without some form of redundancy that would be your best strategy -
 decent and frequent backups


It isn't the most mature solution, but btrfs send has a lot of
potential here.  One of the main costs of backups is the need to walk
all the data that you intend to backup to find changes.  Rsync can do
wonders with minimizing bandwidth, and something like duplicity which
uses librsync can do wonders to minimize the size of serializing that
in files, but both require reading the entire filesystem.

Btrfs send can serialize a set of changes in the filesystem by reading
only the btree nodes and extents that have changed.  It is fairly
close to a git pull in that sense, though git doesn't use balanced
trees.  That would greatly reduce the IO cost of frequent backups.
You would just periodically create a new snapshot, do a send between
the last snapshot and the new one, and once you've confirmed
successful completion of that you'd delete the old snapshot.
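
The snapshot cycle just described can be sketched in Python as a list of shell steps. This is an illustration only: the function name, the paths and the snapshot names are hypothetical, and the sketch only builds the command strings without running them.

```python
import shlex


def incremental_send_plan(subvol, snap_dir, prev_name, new_name, dest):
    """Return the shell steps for one incremental btrfs backup cycle.

    All paths and snapshot names are hypothetical placeholders.
    """
    prev_snap = f"{snap_dir}/{prev_name}"
    new_snap = f"{snap_dir}/{new_name}"
    q = shlex.quote
    return [
        # 1. Read-only snapshot; btrfs send only accepts read-only snapshots.
        f"btrfs subvolume snapshot -r {q(subvol)} {q(new_snap)}",
        # 2. Send only the difference relative to the parent (-p) snapshot.
        f"btrfs send -p {q(prev_snap)} {q(new_snap)} | btrfs receive {q(dest)}",
        # 3. Run this only after step 2 has succeeded: drop the old snapshot.
        f"btrfs subvolume delete {q(prev_snap)}",
    ]
```

Note that the receiving side has to keep its own copy of the parent snapshot, otherwise the next incremental stream has nothing to apply against.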

Of course, IO seeks aren't nearly as expensive on an SSD as they are
on a hard drive.  I haven't really done a lot of rsync on ssds while
using them so I can't really vouch for how much the IO impacts
operations.

But yes, backup and RAID are really your only options for SSD failure
as far as I can see it.  That and limiting the amount of data that
can't be re-generated.  If you just save the world file and all of
/etc you could probably rebuild a Gentoo install fairly quickly on a
new drive, and then you're just left with /home and whatever else you
happen to have installed that sticks stuff in /var that you care
about.
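
As a sketch of the "save the world file and all of /etc" idea, something like the following Python would do. The helper name `archive_paths` and the backup destination are assumptions; the world file normally lives at /var/lib/portage/world.

```python
import tarfile
from pathlib import Path


def archive_paths(paths, archive):
    """Write the given files/directories into a gzip-compressed tar archive.

    Paths that do not exist are skipped. Returns the paths actually archived.
    """
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Store paths relative to / so extraction is relocatable.
                tar.add(p, arcname=str(p).lstrip("/"))
                added.append(p)
    return added


# Example call (destination path is hypothetical, needs root to read /etc):
# archive_paths(["/var/lib/portage/world", "/etc"],
#               "/mnt/backup/gentoo-config.tar.gz")
```

This does not cover /home or anything under /var you care about; those still need their own backup.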

--
Rich



Re: [gentoo-user] Safeguarding strategies against SSD data loss : PS

2014-10-27 Thread Philip Webb
141027 Philip Webb wrote:
 ... HHD ... (4 times)

HDD (red face)

-- 
,,
SUPPORT ___//___,   Philip Webb
ELECTRIC   /] [] [] [] [] []|   Cities Centre, University of Toronto
TRANSIT`-O--O---'   purslowatchassdotutorontodotca




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Mick
On Monday 27 Oct 2014 12:27:58 Philip Webb wrote:
 141027 Mick wrote:
  On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:
  With a caveat: if an ssd dies, it will die suddenly. Without a warning.
  
  I haven't yet started using SSD and wonder what sort of a system
  should I set up to guard against such instantaneous catastrophic
  failures.
 
 My desktop machine is now 2 yrs old & has an SSD for immediate stuff ;
 it also has an HHD with 500 GB available.
 Files I'm working on are backed daily to HHD & to USB stick ;
 once/week I use a script in Krusader to copy everything in ~ or /etc
 which has changed during that week onto HHD & USB
 & also to an off-site system I have I/net access to ;
 I also run a simple Bash script to rsync system files to an HHD copy.
 Every 9 mth I do a full back-up of everything in ~ or /etc ,
 of which I keep at least 1 copy off-site & another on CD (in future DVD)
 for when the Earth's poles change & everything magnetic is wiped out
 (grin).
 
  2  belts +  3  braces : HTH.

I have never been that diligent with my back ups, unless we're talking about 
critical data which I will save in two separate systems/media from the start.  
Otherwise, I run a back up of the whole system every 3 months or so.

Perhaps with an SSD I will do this once a month.

-- 
Regards,
Mick




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Mick
On Monday 27 Oct 2014 13:13:00 Rich Freeman wrote:
 On Mon, Oct 27, 2014 at 7:11 AM, Alan McKinnon alan.mckin...@gmail.com wrote:
  On 27/10/2014 11:24, Mick wrote:
  I'm starting a new thread so as to not hijack the one about alternative
  kernels, but continue with something Volker raised.
  
  On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:
  as others have written already: ssd.
  
  With a caveat: if an ssd dies, it will die suddenly. Without a warning.
  Usually 5 minutes before the start of your weekly or monthly backup
  run. And that is first hand experience.
  
  I haven't yet started using SSD and have wondered what sort of a system
  should I set up to guard against such instantaneous catastrophic
  failures.  I am interested to hear what strategies people deploy to
  avoid data loss with SSDs, especially on laptops that don't have the
  luxury of raid redundancy.
  
  With spinning drives I use tar and rsync at regular intervals.  There
  have been a few rare cases where a drive failed without prior notice -
  the last one after a reboot.  In such cases I am prepared to live with
  the risk of some data loss, on machines where raid is not an option.
  
  Without some form of redundancy that would be your best strategy -
  decent and frequent backups
 
 It isn't the most mature solution, but btrfs send has a lot of
 potential here.  One of the main costs of backups is the need to walk
 all the data that you intend to backup to find changes.  Rsync can do
 wonders with minimizing bandwidth, and something like duplicity which
 uses librsync can do wonders to minimize the size of serializing that
 in files, but both require reading the entire filesystem.
 
 Btrfs send can serialize a set of changes in the filesystem by reading
 only the btree nodes and extents that have changed.  It is fairly
 close to a git pull in that sense, though git doesn't use balanced
 trees.  That would greatly reduce the IO cost of frequent backups.
 You would just periodically create a new snapshot, do a send between
 the last snapshot and the new one, and once you've confirmed
 successful completion of that you'd delete the old snapshot.
 
 Of course, IO seeks aren't nearly as expensive on an SSD as they are
 on a hard drive.  I haven't really done a lot of rsync on ssds while
 using them so I can't really vouch for how much the IO impacts
 operations.
 
 But yes, backup and RAID are really your only options for SSD failure
 as far as I can see it.  That and limiting the amount of data that
 can't be re-generated.  If you just save the world file and all of
 /etc you could probably rebuild a Gentoo install fairly quickly on a
 new drive, and then you're just left with /home and whatever else you
 happen to have installed that sticks stuff in /var that you care
 about.


Thanks Rich, I have been reading your posts about btrfs with interest, but 
have not yet used it on my systems.  Is btrfs agreeable with SSDs, or should I 
be using f2fs:

 http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=1

-- 
Regards,
Mick




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Rich Freeman
On Mon, Oct 27, 2014 at 11:22 AM, Mick michaelkintz...@gmail.com wrote:

 Thanks Rich, I have been reading your posts about btrfs with interest, but
 have not yet used it on my systems.  Is btrfs agreeable with SSDs, or should I
 be using f2fs:


Btrfs will auto-detect SSDs and optimize itself differently, and is
generally considered to be fine on SSDs.  Of course, btrfs itself is
experimental and may eat your data, especially if you get it too full,
but you'll be no worse off for running it on an SSD.

I doubt you'll find any general-purpose filesystem that works as well
overall on an SSD as something like f2fs as this is log-based and
designed with SSDs in mind.  However, f2fs is also very immature and
also carries risks, and the last time I checked it was missing some
features like xattrs as well.  It also doesn't have anything like
btrfs send to serialize your data.

zfs on linux might be another option.  I don't know how well it
handles SSDs in general, and you have to fuss with FUSE and a boot
partition as I don't think grub supports it - it could be a bit of a
PITA for a single-drive system.  However, it is probably more mature
than btrfs overall, and it certainly supports send.

I just had a btrfs near-miss which caused me to rethink how I'm
managing my own storage.  I was half-tempted to blog on it - it is a
bit frustrating as I believe we're right in the middle of the shift
between the traditional filesystems and the next-generation ones.
Sticking with the old means giving up a lot of potential benefits, but
there are a lot of issues with jumping ship as well as the new systems
all lack maturity or are not feature-complete yet.  I was looking at
f2fs, btrfs, and zfs again this weekend and the issues I struggle with
are the immaturity of btrfs and f2fs, the lack of working parity raid
on btrfs, the lack of many features on f2fs, and the inability to
resize vdevs on zfs which means on a system with few drives you get
locked in.  I suspect all of those will change in time, but not yet!

--
Rich



Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Pandu Poluan
On Oct 27, 2014 10:40 PM, Rich Freeman ri...@gentoo.org wrote:

 On Mon, Oct 27, 2014 at 11:22 AM, Mick michaelkintz...@gmail.com wrote:
 
  Thanks Rich, I have been reading your posts about btrfs with interest, but
  have not yet used it on my systems.  Is btrfs agreeable with SSDs, or should I
  be using f2fs:
 

 Btrfs will auto-detect SSDs and optimize itself differently, and is
 generally considered to be fine on SSDs.  Of course, btrfs itself is
 experimental and may eat your data, especially if you get it too full,
 but you'll be no worse off for running it on an SSD.

 I doubt you'll find any general-purpose filesystem that works as well
 overall on an SSD as something like f2fs as this is log-based and
 designed with SSDs in mind.  However, f2fs is also very immature and
 also carries risks, and the last time I checked it was missing some
 features like xattrs as well.  It also doesn't have anything like
 btrfs send to serialize your data.

 zfs on linux might be another option.  I don't know how well it
 handles SSDs in general, and you have to fuss with FUSE and a boot
 partition as I don't think grub supports it - it could be a bit of a
 PITA for a single-drive system.  However, it is probably more mature
 than btrfs overall, and it certainly supports send.

 I just had a btrfs near-miss which caused me to rethink how I'm
 managing my own storage.  I was half-tempted to blog on it - it is a
 bit frustrating as I believe we're right in the middle of the shift
 between the traditional filesystems and the next-generation ones.
 Sticking with the old means giving up a lot of potential benefits, but
 there are a lot of issues with jumping ship as well as the new systems
 all lack maturity or are not feature-complete yet.  I was looking at
 f2fs, btrfs, and zfs again this weekend and the issues I struggle with
 are the immaturity of btrfs and f2fs, the lack of working parity raid
 on btrfs, the lack of many features on f2fs, and the inability to
 resize vdevs on zfs which means on a system with few drives you get
 locked in.  I suspect all of those will change in time, but not yet!

 --
 Rich


ZoL (ZFS on Linux) nowadays is implemented using DKMS instead of FUSE, thus
running in kernelspace, and (relatively) easier to put into an initramfs.

Updating is a beeyotch on binary-based distros as it requires a recompile.
Not a big deal for us Gentooers :-)

vdevs can grow, but they can't (yet) shrink. And putting ZFS on SSDs... not
recommended. Rather, ZFS can employ SSDs to act as a 'write cache' for the
spinning HDDs.

In my personal opinion, the 'killer' feature of ZFS is that it's built from
the ground up to provide maximum data integrity. The second feature is its
high performance COW snapshot ability. You can do an obscene amount of
snapshots if you want (but don't actually do it; managing more than a
hundred snapshots is a Royal PITA). And it's also able to serialize the
snapshots, allowing perfect delta  replication to another system. This
saves a lot of time doing bit-perfect backup because only changed blocks
will be transferred. And you can ship a snapshot instead of the whole
filesystem, allowing online backup.

(And yes, I actually deployed ZoL on my previous employer's email system,
with the aforementioned snapshot-shipping backup strategy.)

Other features include: Much easier mounting (no need to mess with fstab),
built-in NFS support for higher throughput, and ability to easily rebuild a
pool merely by installing the drives (in any order) into a new box and let
ZFS scan for all the metadata.

The most serious drawback in my opinion is ZoL's nearly insatiable appetite
for RAM. Unless you purposefully limit its RAM usage, ZoL's cache will
consume nearly all available memory, causing memory fragmentation and
ending with OOM.
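
One common way to limit the cache on ZoL is the `zfs_arc_max` module parameter, given in bytes, usually set through a modprobe options file such as /etc/modprobe.d/zfs.conf. A small Python sketch that produces the line; the helper name and the 4 GiB figure are only examples, not a recommendation:

```python
def arc_limit_option(gib):
    """Return a modprobe options line capping the ZFS ARC at `gib` GiB."""
    if gib <= 0:
        raise ValueError("ARC limit must be positive")
    # zfs_arc_max is expressed in bytes.
    return f"options zfs zfs_arc_max={int(gib * 1024**3)}"


# Example: print(arc_limit_option(4))
```

The right value depends on the workload and on how much RAM the box has; the module has to be reloaded (or the machine rebooted) for the options file to take effect.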

Rgds,
--


Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Volker Armin Hemmann
Am 27.10.2014 um 14:13 schrieb Rich Freeman:
 On Mon, Oct 27, 2014 at 7:11 AM, Alan McKinnon alan.mckin...@gmail.com wrote:
 On 27/10/2014 11:24, Mick wrote:
 I'm starting a new thread so as to not hijack the one about alternative
 kernels, but continue with something Volker raised.

 On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:

 as others have written already: ssd.

 With a caveat: if an ssd dies, it will die suddenly. Without a warning.
 Usually 5 minutes before the start of your weekly or monthly backup run.
 And that is first hand experience.

 I haven't yet started using SSD and have wondered what sort of a system
 should I set up to guard against such instantaneous catastrophic failures.
 I am interested to hear what strategies people deploy to avoid data loss
 with SSDs, especially on laptops that don't have the luxury of raid
 redundancy.

 With spinning drives I use tar and rsync at regular intervals.  There have
 been a few rare cases where a drive failed without prior notice - the last
 one after a reboot.  In such cases I am prepared to live with the risk of
 some data loss, on machines where raid is not an option.

 Without some form of redundancy that would be your best strategy -
 decent and frequent backups

 It isn't the most mature solution, but btrfs send has a lot of
 potential here.  One of the main costs of backups is the need to walk
 all the data that you intend to backup to find changes.  Rsync can do
 wonders with minimizing bandwidth, and something like duplicity which
 uses librsync can do wonders to minimize the size of serializing that
 in files, but both require reading the entire filesystem.

 Btrfs send can serialize a set of changes in the filesystem by reading
 only the btree nodes and extents that have changed.  It is fairly
 close to a git pull in that sense, though git doesn't use balanced
 trees.  That would greatly reduce the IO cost of frequent backups.
 You would just periodically create a new snapshot, do a send between
 the last snapshot and the new one, and once you've confirmed
 successful completion of that you'd delete the old snapshot.

 Of course, IO seeks aren't nearly as expensive on an SSD as they are
 on a hard drive.  I haven't really done a lot of rsync on ssds while
 using them so I can't really vouch for how much the IO impacts
 operations.

 But yes, backup and RAID are really your only options for SSD failure
 as far as I can see it.  That and limiting the amount of data that
 can't be re-generated.  If you just save the world file and all of
 /etc you could probably rebuild a Gentoo install fairly quickly on a
 new drive, and then you're just left with /home and whatever else you
 happen to have installed that sticks stuff in /var that you care
 about.

 --
 Rich

 .


what happens if that send stream becomes corrupted?



Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Volker Armin Hemmann
Am 27.10.2014 um 16:22 schrieb Mick:
 On Monday 27 Oct 2014 13:13:00 Rich Freeman wrote:
 On Mon, Oct 27, 2014 at 7:11 AM, Alan McKinnon alan.mckin...@gmail.com wrote:
 On 27/10/2014 11:24, Mick wrote:
 I'm starting a new thread so as to not hijack the one about alternative
 kernels, but continue with something Volker raised.

 On Sunday 26 Oct 2014 23:25:50 Volker Armin Hemmann wrote:
 as others have written already: ssd.

 With a caveat: if an ssd dies, it will die suddenly. Without a warning.
 Usually 5 minutes before the start of your weekly or monthly backup
 run. And that is first hand experience.
 I haven't yet started using SSD and have wondered what sort of a system
 should I set up to guard against such instantaneous catastrophic
 failures.  I am interested to hear what strategies people deploy to
 avoid data loss with SSDs, especially on laptops that don't have the
 luxury of raid redundancy.

 With spinning drives I use tar and rsync at regular intervals.  There
 have been a few rare cases where a drive failed without prior notice -
 the last one after a reboot.  In such cases I am prepared to live with
 the risk of some data loss, on machines where raid is not an option.
 Without some form of redundancy that would be your best strategy -
 decent and frequent backups
 It isn't the most mature solution, but btrfs send has a lot of
 potential here.  One of the main costs of backups is the need to walk
 all the data that you intend to backup to find changes.  Rsync can do
 wonders with minimizing bandwidth, and something like duplicity which
 uses librsync can do wonders to minimize the size of serializing that
 in files, but both require reading the entire filesystem.

 Btrfs send can serialize a set of changes in the filesystem by reading
 only the btree nodes and extents that have changed.  It is fairly
 close to a git pull in that sense, though git doesn't use balanced
 trees.  That would greatly reduce the IO cost of frequent backups.
 You would just periodically create a new snapshot, do a send between
 the last snapshot and the new one, and once you've confirmed
 successful completion of that you'd delete the old snapshot.

 Of course, IO seeks aren't nearly as expensive on an SSD as they are
 on a hard drive.  I haven't really done a lot of rsync on ssds while
 using them so I can't really vouch for how much the IO impacts
 operations.

 But yes, backup and RAID are really your only options for SSD failure
 as far as I can see it.  That and limiting the amount of data that
 can't be re-generated.  If you just save the world file and all of
 /etc you could probably rebuild a Gentoo install fairly quickly on a
 new drive, and then you're just left with /home and whatever else you
 happen to have installed that sticks stuff in /var that you care
 about.

 Thanks Rich, I have been reading your posts about btrfs with interest, but
 have not yet used it on my systems.  Is btrfs agreeable with SSDs, or should
 I be using f2fs:

  http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=1


neither. Just use ext4.

You don't want to combine the sensitive nature of a ssd with the
youthful playfulness of a young filesystem.

Also, I am a little bit concerned about that 'freshly formatted' at the
start of the article.



Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Volker Armin Hemmann
Am 27.10.2014 um 16:36 schrieb Rich Freeman:
 On Mon, Oct 27, 2014 at 11:22 AM, Mick michaelkintz...@gmail.com wrote:
 Thanks Rich, I have been reading your posts about btrfs with interest, but
 have not yet used it on my systems.  Is btrfs agreeable with SSDs, or should
 I be using f2fs:

 Btrfs will auto-detect SSDs and optimize itself differently, and is
 generally considered to be fine on SSDs.  Of course, btrfs itself is
 experimental and may eat your data, especially if you get it too full,
 but you'll be no worse off for running it on an SSD.

 I doubt you'll find any general-purpose filesystem that works as well
 overall on an SSD as something like f2fs as this is log-based and
 designed with SSDs in mind.  However, f2fs is also very immature and
 also carries risks, and the last time I checked it was missing some
 features like xattrs as well.  It also doesn't have anything like
 btrfs send to serialize your data.

 zfs on linux might be another option.  I don't know how well it
 handles SSDs in general, and you have to fuss with FUSE

no, you don't.
  and a boot
 partition as I don't think grub supports it - it could be a bit of a
 PITA for a single-drive system. 

nope. But I don't see any reason to use zfs with a single drive either.

  However, it is probably more mature
 than btrfs overall, and it certainly supports send.

and if your send stream is corrupted, your data is gone. That is why I
prefer cp & tar to back up my zfs data tank.
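
If a send stream is kept as a file, one way to at least notice corruption is to store a checksum next to it and verify before relying on it. A minimal Python sketch, with hypothetical function names; it detects damage but cannot repair it, so it does not remove the concern that a damaged stream is unusable:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stream_is_intact(path, expected_hex):
    """True if the stored stream still matches the digest recorded at backup time."""
    return sha256_of(path) == expected_hex
```

Receiving the stream into a real filesystem on the backup side, rather than storing the raw stream, avoids depending on a single opaque file in the first place.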




Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Volker Armin Hemmann
Am 27.10.2014 um 17:52 schrieb Pandu Poluan:


 On Oct 27, 2014 10:40 PM, Rich Freeman ri...@gentoo.org wrote:
 
  On Mon, Oct 27, 2014 at 11:22 AM, Mick michaelkintz...@gmail.com wrote:
  
   Thanks Rich, I have been reading your posts about btrfs with interest, but
   have not yet used it on my systems.  Is btrfs agreeable with SSDs, or
   should I be using f2fs:
  
 
  Btrfs will auto-detect SSDs and optimize itself differently, and is
  generally considered to be fine on SSDs.  Of course, btrfs itself is
  experimental and may eat your data, especially if you get it too full,
  but you'll be no worse off for running it on an SSD.
 
  I doubt you'll find any general-purpose filesystem that works as well
  overall on an SSD as something like f2fs as this is log-based and
  designed with SSDs in mind.  However, f2fs is also very immature and
  also carries risks, and the last time I checked it was missing some
  features like xattrs as well.  It also doesn't have anything like
  btrfs send to serialize your data.
 
  zfs on linux might be another option.  I don't know how well it
  handles SSDs in general, and you have to fuss with FUSE and a boot
  partition as I don't think grub supports it - it could be a bit of a
  PITA for a single-drive system.  However, it is probably more mature
  than btrfs overall, and it certainly supports send.
 
  I just had a btrfs near-miss which caused me to rethink how I'm
  managing my own storage.  I was half-tempted to blog on it - it is a
  bit frustrating as I believe we're right in the middle of the shift
  between the traditional filesystems and the next-generation ones.
  Sticking with the old means giving up a lot of potential benefits, but
  there are a lot of issues with jumping ship as well as the new systems
  all lack maturity or are not feature-complete yet.  I was looking at
  f2fs, btrfs, and zfs again this weekend and the issues I struggle with
  are the immaturity of btrfs and f2fs, the lack of working parity raid
  on btrfs, the lack of many features on f2fs, and the inability to
  resize vdevs on zfs which means on a system with few drives you get
  locked in.  I suspect all of those will change in time, but not yet!
 
  --
  Rich
 

 ZoL (ZFS on Linux) nowadays is implemented using DKMS instead of FUSE,
 thus running in kernelspace, and (relatively) easier to put into an
 initramfs.

 Updating is a beeyotch on binary-based distros as it requires a
 recompile. Not a big deal for us Gentooers :-)

 vdevs can grow, but they can't (yet) shrink. And putting ZFS on
 SSDs... not recommended. Rather, ZFS can employ SSDs to act as a
 'write cache' for the spinning HDDs.

 In my personal opinion, the 'killer' feature of ZFS is that it's built
 from the ground up to provide maximum data integrity. The second
 feature is its high performance COW snapshot ability. You can do an
 obscene amount of snapshots if you want (but don't actually do it;
 managing more than a hundred snapshots is a Royal PITA). And it's also
 able to serialize the snapshots, allowing perfect delta  replication
 to another system. This saves a lot of time doing bit-perfect backup
 because only changed blocks will be transferred. And you can ship a
 snapshot instead of the whole filesystem, allowing online backup.

 (And yes, actually deployed ZoL on my previous employer's email
 system, with the aforementioned snapshot-shipping backup strategy).

 Other features include: Much easier mounting (no need to mess with
 fstab), built-in NFS support for higher throughput, and ability to
 easily rebuild a pool merely by installing the drives (in any order)
 into a new box and let ZFS scan for all the metadata.

 The most serious drawback in my opinion is ZoL's nearly insatiable
 appetite for RAM. Unless you purposefully limit its RAM usage, ZoL's
 cache will consume nearly all available memory, causing memory
 fragmentation and ending with OOM.

 Rgds,
 --


I haven't run into oom situations caused by zfs.

Unlike oom's caused by konqueror, chromium or gcc...


Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Rich Freeman
On Mon, Oct 27, 2014 at 12:52 PM, Pandu Poluan pa...@poluan.info wrote:

 ZoL (ZFS on Linux) nowadays is implemented using DKMS instead of FUSE, thus
 running in kernelspace, and (relatively) easier to put into an initramfs.

Sorry about that.  I should have known that, but for some reason I got
that memory crossed in my brain...  :)

 vdevs can grow, but they can't (yet) shrink.

Can you point to any docs on that, including any limitations/etc?  The
inability to expand raid-z the way you can do so with mdadm was one of
the big things that has been keeping me away from zfs.  I understand
that it isn't so important when you're dealing with large numbers of
disks (backblaze's storage pods come to mind), but when you have only
a few disks, being able to manipulate them one at a time is very
useful.  Growing is a more likely use case than shrinking.  Then
again, at some point if you want to replace smaller drives with larger
ones you might want a way to remove drives from a vdev.

The one thing that btrfs does that is helpful here is that it works
with data in chunks and not at the whole drive level.  That is,
block 1 on drive A is not hard-mapped to block 1 on drive B in the way
that it is with a traditional RAID.  That makes it easy to have a
non-redundant set of disks and then switch it to raid1 mode while
leaving the existing data unmirrored - new chunks get mirrored, and
old ones don't, and you can run a command telling the system to copy
all the old data into new mirrored chunks.
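
A rough sketch of that conversion, assuming a single-device btrfs that is already mounted and one new, empty device. to_raid1 is a made-up helper name and the paths in the usage line are placeholders:

```shell
# Sketch: add a second device to a mounted btrfs and convert it to raid1.
# to_raid1 is a hypothetical helper: $1 = new device, $2 = mount point.
to_raid1() {
    newdev=$1
    mnt=$2
    # New chunks can be mirrored once a second device is present.
    btrfs device add "$newdev" "$mnt" || return 1
    # Rewrite the existing data (-d) and metadata (-m) chunks as raid1.
    btrfs balance start -dconvert=raid1 -mconvert=raid1 "$mnt"
}

# Usage (as root):
#   to_raid1 /dev/sdb /mnt/data
```

The balance rewrites every existing chunk, so it can take a long time on a full filesystem.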

 And putting ZFS on SSDs... not recommended. Rather, ZFS can employ
 SSDs to act as a 'write cache' for the spinning HDDs.

It can operate as a read-cache as well, right?  I believe you'd need
separate drives/partitions for that.
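
As far as I can tell, both roles are added per pool with zpool add: a "cache" vdev is the read cache (L2ARC) and a "log" vdev holds the ZIL. A sketch with a hypothetical helper name and placeholder devices:

```shell
# Sketch: attach SSD partitions to an existing pool as read cache and log.
# add_cache_and_log is a hypothetical helper: $1 = pool, $2 = cache device,
# $3 = log device.
add_cache_and_log() {
    pool=$1
    cache_dev=$2
    log_dev=$3
    # Read cache (L2ARC).
    zpool add "$pool" cache "$cache_dev" || return 1
    # Separate intent log (ZIL), the 'write cache' role mentioned above.
    zpool add "$pool" log "$log_dev"
}

# Usage (as root):
#   add_cache_and_log tank /dev/sdc1 /dev/sdc2
```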


 In my personal opinion, the 'killer' feature of ZFS is that it's built from
 the ground up to provide maximum data integrity.

That and the snapshots are actually common to both btrfs and ZFS.  The
main advantages of ZFS over btrfs are that the codebase is much more
stable (though ZoL is a newer port of it), and that it has more
enterprise-oriented features like ZIL/RAID-Z already implemented.
Btrfs has license advantages as far as linux is concerned (it can
actually go in the main kernel without a rewrite), and it is a bit
more flexible in design and is intended as a general-purpose
filesystem.

Both are definitely the future of file storage compared to ext4, but
they both have a lot of caveats today.

What I would love to see though is something more optimized for flash
like f2fs, but with the feature-completeness and integrity/snapshot
capabilities of btrfs/zfs.  A log-based filesystem is COW by its
nature, so you just need to add that stuff in.

--
Rich



Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Rich Freeman
On Mon, Oct 27, 2014 at 1:23 PM, Volker Armin Hemmann
volkerar...@googlemail.com wrote:
 Am 27.10.2014 um 16:36 schrieb Rich Freeman:
  and a boot
 partition as I don't think grub supports it - it could be a bit of a
 PITA for a single-drive system.

 nope. But I don't see any reason to use zfs with a single drive either.

True, not needing to use FUSE does simplify things, but I don't
believe that grub supports zfs, so you would need a boot partition.
Granted, a newer laptop would need that for EFI anyway.


  However, it is probably more mature
 than btrfs overall, and it certainly supports send.

 and if your send stream is corrupted, your data is gone. That is why I
 prefer cptar to backup my zfs data tank.


If you ONLY save the send stream without checking it, then you're
right that you're depending on its integrity.  I'd certainly be
nervous about doing that with btrfs, probably less so with zfs but I
can't really vouch for it.  I don't know what ability either
filesystem gives you to verify a send stream in isolation.

Now, what you could do is receive the send stream into a replica
filesystem on the far end, and not consider the backup successful
until this is done.  That would look like a btrfs-to-btrfs rsync
operation, but it would be much more efficient in terms of IO.  It
would require a daemon on the far end to run the receive operation and
report back status, vs just dumping the files via scp, etc.
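
Something along these lines is what I have in mind, shown for zfs and using plain ssh instead of a dedicated daemon. This is only a sketch: replicate is a made-up helper name, and the host and dataset names in the usage line are placeholders.

```shell
# Sketch: ship an incremental snapshot to a replica and only call the
# backup good once the snapshot is visible on the far end.
# replicate is a hypothetical helper:
#   $1 = previous snapshot, $2 = new snapshot, $3 = backup host,
#   $4 = destination filesystem on that host.
replicate() {
    prev=$1
    cur=$2
    host=$3
    dest=$4
    # Only the blocks that changed between the two snapshots are sent.
    # The pipeline's status is that of the receiving side.
    zfs send -i "$prev" "$cur" | ssh "$host" zfs receive -F "$dest" || return 1
    # Succeeds only if the new snapshot now exists on the replica.
    ssh "$host" zfs list -t snapshot "${dest}@${cur#*@}"
}

# Usage:
#   replicate tank/home@mon tank/home@tue backuphost backup/home && echo "backup ok"
```

The point of the second step is that the stream has actually been received into a real filesystem, rather than merely copied somewhere as an opaque file.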

Does anybody know if either btrfs or zfs send includes checksums?  I
know the data is checksummed on disk, but I have no idea if it is
protected in this way while serialized.

--
Rich



Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Pandu Poluan
On Oct 28, 2014 12:31 AM, Rich Freeman ri...@gentoo.org wrote:

 On Mon, Oct 27, 2014 at 12:52 PM, Pandu Poluan pa...@poluan.info wrote:
 
  ZoL (ZFS on Linux) nowadays is implemented using DKMS instead of FUSE, thus
  running in kernelspace, and (relatively) easier to put into an initramfs.

 Sorry about that.  I should have known that, but for some reason I got
 that memory crossed in my brain...  :)

  vdevs can grow, but they can't (yet) shrink.

 Can you point to any docs on that, including any limitations/etc?  The
 inability to expand raid-z the way you can do so with mdadm was one of
 the big things that has been keeping me away from zfs.  I understand
 that it isn't so important when you're dealing with large numbers of
 disks (backblaze's storage pods come to mind), but when you have only
 a few disks, being able to manipulate them one at a time is very
 useful.  Growing is a more likely use case than shrinking.  Then
 again, at some point if you want to replace smaller drives with larger
 ones you might want a way to remove drives from a vdev.


First, you need to set your pool to autoexpand=on.

Then, one by one, you offline a disk within the vdev and replace it with a
larger one. After all disks have been replaced, do a scrub, and ZFS will
automagically enlarge the vdev.

If you're not using whole disks for ZFS, then s/replace with larger/enlarge
the partition/.
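
A minimal sketch of those steps, assuming the old and new disks can be attached at the same time so that zpool replace can be given both. grow_vdev is a made-up helper name, the device names are placeholders, and the string matched in the zpool status output is an assumption about its wording:

```shell
# Sketch: replace every disk in a vdev with a larger one, one at a time.
# grow_vdev is a hypothetical helper: $1 = pool, followed by pairs of
# old-device new-device.
grow_vdev() {
    pool=$1
    shift
    zpool set autoexpand=on "$pool" || return 1
    while [ "$#" -ge 2 ]; do
        zpool replace "$pool" "$1" "$2" || return 1
        # Wait for the resilver to finish before touching the next disk.
        while zpool status "$pool" | grep -q 'resilver in progress'; do
            sleep 60
        done
        shift 2
    done
    # Final scrub once all disks have been swapped.
    zpool scrub "$pool"
}

# Usage (as root):
#   grow_vdev tank sda sdc sdb sdd
```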

Rgds,
--


Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Pandu Poluan
On Oct 28, 2014 12:38 AM, Rich Freeman ri...@gentoo.org wrote:

 On Mon, Oct 27, 2014 at 1:23 PM, Volker Armin Hemmann
 volkerar...@googlemail.com wrote:
  Am 27.10.2014 um 16:36 schrieb Rich Freeman:
   and a boot
  partition as I don't think grub supports it - it could be a bit of a
  PITA for a single-drive system.
 
  nope. But I don't see any reason to use zfs with a single drive either.

 True, not needing to use FUSE does simplify things, but I don't
 believe that grub supports zfs, so you would need a boot partition.
 Granted, a newer laptop would need that for EFI anyway.

 
   However, it is probably more mature
  than btrfs overall, and it certainly supports send.
 
  and if your send stream is corrupted, your data is gone. That is why I
  prefer cptar to backup my zfs data tank.
 

 If you ONLY save the send stream without checking it, then you're
 right that you're depending on its integrity.  I'd certainly be
 nervous about doing that with btrfs, probably less so with zfs but I
 can't really vouch for it.  I don't know what ability either
 filesystem gives you to verify a send stream in isolation.

 Now, what you could do is receive the send stream into a replica
 filesystem on the far end, and not consider the backup successful
 until this is done.  That would look like a btrfs-to-btrfs rsync
 operation, but it would be much more efficient in terms of IO.  It
 would require a daemon on the far end to run the receive operation and
 report back status, vs just dumping the files via scp, etc.

 Does anybody know if either btrfs or zfs send includes checksums?  I
 know the data is checksummed on disk, but I have no idea if it is
 protected in this way while serialized.


zfs checksums the send stream. That's why, if you send the stream to a
file and anything in that file later changes, receiving it will fail.

So, always do a filesystem replication. Don't just save the send stream.
Have the replica make the snapshots visible in poolroot/.zfs, and back up
the whole filesystem using a deduping backup system.
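
A small sketch of the "make the snapshots visible" part on the replica, using the snapdir property. expose_snapshots is a made-up helper name and the dataset in the usage line is a placeholder:

```shell
# Sketch: make .zfs show up in directory listings on the replica and
# print the path where the snapshots can be browsed.
# expose_snapshots is a hypothetical helper: $1 = filesystem on the replica.
expose_snapshots() {
    fs=$1
    zfs set snapdir=visible "$fs" || return 1
    mnt=$(zfs get -H -o value mountpoint "$fs") || return 1
    echo "$mnt/.zfs/snapshot"
}

# Usage (as root):
#   expose_snapshots backup/home
```

Each snapshot then appears as a read-only directory under that path, which the backup tool can read like any other tree.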

Rgds,
--


Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Rich Freeman
On Mon, Oct 27, 2014 at 8:41 PM, Pandu Poluan pa...@poluan.info wrote:
 First, you need to set your pool to autoexpand=on.

 Then, one by one, you offline a disk within the vdev and replace it with a
 larger one. After all disks have been replaced, do a scrub, and ZFS will
 automagically enlarge the vdev.

 If you're not using whole disks for ZFS, then s/replace with larger/enlarge
 the partition/.

How about adding an additional disk?  Will that work?

--
Rich



Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Tom H
On Mon, Oct 27, 2014 at 12:52 PM, Pandu Poluan pa...@poluan.info wrote:

 ZoL (ZFS on Linux) nowadays is implemented using DKMS instead of FUSE, thus
 running in kernelspace, and (relatively) easier to put into an initramfs.

 Updating is a beeyotch on binary-based distros as it requires a recompile.
 Not a big deal for us Gentooers :-)

dkms works perfectly well on binary distros; it's a Dell/RH creation.
It runs make ... to build a kernel module and then installs it.



Re: [gentoo-user] Safeguarding strategies against SSD data loss

2014-10-27 Thread Tom H
On Mon, Oct 27, 2014 at 1:37 PM, Rich Freeman ri...@gentoo.org wrote:
 On Mon, Oct 27, 2014 at 1:23 PM, Volker Armin Hemmann
 volkerar...@googlemail.com wrote:
 Am 27.10.2014 um 16:36 schrieb Rich Freeman:

 and a boot
 partition as I don't think grub supports it - it could be a bit of a
 PITA for a single-drive system.

 nope. But I don't see any reason to use zfs with a single drive either.

 True, not needing to use FUSE does simplify things, but I don't
 believe that grub supports zfs, so you would need a boot partition.
 Granted, a newer laptop would need that for EFI anyway.

# ls /boot/grub/x86_64-efi/z*
/boot/grub/x86_64-efi/zfs.mod  /boot/grub/x86_64-efi/zfscrypt.mod
/boot/grub/x86_64-efi/zfsinfo.mod

so you only need /boot/efi to be a fat partition.