Hi Adam,

On Fri, Sep 27, 2013 at 08:58:10AM +0100, Dr A. J. Trickett wrote:

> I've pretty much decided to get a flash drive as the root file system, my
> preferred "bidder" are currently building with Intel 335 drives. I'm not
> sure exactly what combination and mix to go for.
>
> I don't think the 180 GB drive is large enough on it's own, so I could get
> a pair of them and then LVM them together and put a single ext4 over the
> two.
I know you say later in the thread that you back up all your important
stuff, but for me the lost productivity involved in a disk failure is worth
a lot more than the cost of the disk itself.

The problem with using LVM to concatenate two drives together for more
space is that you've doubled the chance of a failure. SSDs aren't
particularly more reliable than conventional HDDs, and the HDD is usually
the first thing to break.

http://www.zdnet.com/ssd-infant-mortality-ii-7000003945/
http://www.tomshardware.com/reviews/ssd-reliability-failure-rate,2923-9.html

That's all a few years old, and I've heard plenty of anecdotes from people
who've "never seen an SSD failure", but personally I am not yet prepared to
believe they are any more reliable. Nor any less.

When a physical volume in LVM disappears, the physical extents that were on
it are obviously no longer available. Depending on the LVM allocation
policy in use, some logical volumes would then probably have parts missing,
or be missing in their entirety. The system won't even let you activate a
volume group that has physical volumes missing, although you can override
this if you tell it that you really know what you are doing. That would
allow you to still use the logical volumes that didn't have bits missing.

You're in a bit of an awkward situation here because you want:

- Tons of storage
- Performance
- Reliability

and you haven't got the cash for all three. There isn't going to be a
single correct answer, and there are a variety of trade-offs you can make
depending on what your priorities are. I will try to think up what is bound
to be an incomplete list. My own preferred answer though would be something
like this:

- Two SSDs in the desktop in a RAID-1, mass storage in a separate device
  with some sort of RAID configuration, and a decent backup regime

I don't consider that over the top in the home. HP Microservers are back on
cash-back offer and make great, fairly low energy consumption networked
storage devices.
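For the curious, that override is LVM's "partial" activation mode. A
hedged sketch (the volume group name "vg0" is made up):

```shell
# Normal activation refuses while a PV is missing:
#   vgchange -ay vg0

# Tell LVM you really know what you are doing; LVs with all
# their extents present activate, incomplete ones do not:
vgchange -ay --partial vg0

# See which LVs are affected ('p' for partial in the attr
# column) and which devices each one lives on:
lvs -o name,attr,devices vg0
```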
There's a bunch of other cheap dedicated NAS devices that are suitable for
home and small office use as well.

The advantage of having the mass storage in a separate device is that it
makes it a lot easier to manage. If disks die you can replace them without
downtime. If it's not performing well enough you can add disks - even SSDs,
when that starts to make sense. You'll probably upgrade your desktop
machines a lot more often than the file server, because the file server
doesn't need much grunt. No need to keep redesigning how the storage will
work with each desktop upgrade.

But it's not for everyone; it's inevitably more complicated and expensive.
So let's say the file server is a no-go. It's got to all be in the desktop.

- Two SSDs and two HDDs in two separate RAID-1s

Each SSD should be big enough for your OS and whatever other performance
storage you feel you need. Potentially that could be quite small - your OS
should easily fit in about 2G without you trying hard. I fit Debian wheezy
on a 512M CF card in one of my devices without doing anything special, but
admittedly it has only vi for an editor and I even removed the less
command. ;-)

Mirror the HDDs as well for redundancy. If using Linux software RAID,
consider RAID-10 for the HDDs - it works with only two devices and performs
better than RAID-1. Stick to RAID-1 for the SSDs though, because that RAID
level supports TRIM/discard.

If you can afford two small SSDs then I think you could try stretching to
two SSDs plus two bigger HDDs, because HDDs are really cheap.

Can't afford two SSDs?

- One SSD, two HDDs

There's really no excuse not to have two HDDs. Again put your OS and
performance stuff on the SSD, and put the "bulk" stuff on the HDD mirror.
It's even more important than normal to have good backups.
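A minimal sketch of that two-mirror layout with mdadm (the device names
sda-sdd are assumptions; substitute your own):

```shell
# SSD mirror for the OS and "performance" data - plain
# RAID-1, which supports TRIM/discard:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# HDD mirror for bulk data - Linux RAID-10 works with just
# two devices and reads faster than RAID-1:
mdadm --create /dev/md1 --level=10 --raid-devices=2 /dev/sdc /dev/sdd

# Record the layout so the arrays assemble at boot:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
```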
If you have decided to have a desktop with some SSD storage and some HDD
storage in it, regardless of whether they're mirrored or not, it's now a
case of working out where to put the data and how to make the smaller SSD
storage speed up the larger HDD storage. Linux has a bunch of interesting
options for caching slow storage with faster storage.

- ZFS on Linux

ZFS is going places in the Linux world. Ubuntu and Debian have packages for
it now (though don't expect to be able to call up Mark Shuttleworth at 3am
and ask him to assign some minions to fix it or anything, like). But at
least there's no more downloading kernel source from a strange web site and
having to build it yourself.

ZFS supports the concept of tiered storage. You build your storage pool
with a tier of fast stuff like SSDs and a tier of slow stuff like HDDs, and
it does the right thing. I am not a ZFS expert, but tiered storage works
something like this:

ZFS has the concept of "cache" devices and "log" devices. It calls cache
devices L2ARC and it calls log devices the ZIL (ZFS Intent Log).

If you tell it that a device is an L2ARC device then it will copy hot
storage extents onto that device and consult it first in future when
wanting to access them. So this is a fast read cache. It only speeds up
read operations. Since it's only used for reads you don't need to mirror
it. In fact you cannot mirror it; if you add more than one then ZFS stripes
across them. If one dies then ZFS reads the data from disk again and puts
it on the remaining L2ARC devices. If all L2ARC devices are dead then it
gets the data from disk every time, which works but is slower.

If you tell it that a device is a ZIL device then synchronous writes go to
the ZIL first and are later written out to the slower tier. The point being
that the ZIL device is going to be really quick, so it can tell the OS that
this write has definitely hit persistent storage and you can be on your way
now. You really should mirror ZIL devices.
Otherwise, if one goes pop then you lose all the data that was written to
it but hadn't yet made it onto the slower media. That might only be a few
megabytes, but it's enough to ruin your file system and your day. Unlike
L2ARC devices, ZIL devices can be mirrored, so do that.

If you have multiple SSDs then you can partition each one into two bits,
the larger partition being for L2ARC and the smaller one being for ZIL. You
then end up with two L2ARC devices and one mirrored ZIL device.

If you were willing to investigate this, you would actually end up with
just a single logical device to manage, and it would be really simple.

ZFS itself has many years of testing behind it, but it's a comparative
newcomer on Linux and integration with your favourite distribution may not
be completely there. That means, for example, that you might encounter
difficulties installing, doing upgrades, and in other areas where the
operating system config needs to know about the partition layout. Probably
nothing that can't be worked around with some time on the command line, but
maybe you are not comfortable with that. Also there's unlikely to be much
third party support in times of trouble, just community support (from the
ZFS on Linux community).

- bcache
  http://bcache.evilpiepirate.org/
- flashcache
  https://github.com/facebook/flashcache/
- enhanceio
  https://github.com/stec-inc/EnhanceIO
- dm-cache
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/device-mapper/cache.txt

These are Linux kernel answers to L2ARC and ZIL. They cache one set of
devices with another set of devices. They are all pretty new. bcache is a
Google project, Flashcache is a Facebook project, EnhanceIO is a fork of
Flashcache that is trying to get more community support, and dm-cache is
more closely tied to the existing device-mapper and LVM projects.

They have different levels of readiness. bcache was included in the
upstream 3.10 kernel, and dm-cache has been in since 3.9. The others maybe
not yet.
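To make the ZFS tiered-storage idea from earlier concrete, a hedged
sketch (pool name "tank" and all device names are made up; assumes the
two SSDs have each been split into a big L2ARC partition and a small ZIL
partition as described above):

```shell
# Slow tier: a mirrored pair of HDDs forms the main pool.
zpool create tank mirror /dev/sdc /dev/sdd

# Read cache: L2ARC devices cannot be mirrored - ZFS stripes
# across however many you add.
zpool add tank cache /dev/sda1 /dev/sdb1

# Intent log: the ZIL should be mirrored, since losing it
# loses recently-written data.
zpool add tank log mirror /dev/sda2 /dev/sdb2

# Check the resulting layout:
zpool status tank
```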
In any case, I personally would regard these as highly experimental and
wouldn't use them unless I could stand to see them break unexpectedly and
lose all data at any time. Since it's a layer in front of *all your
storage*, the stakes are kind of high. You can take a slightly safer (and
slower) path with all of them by using them as a read cache only (like
L2ARC), but software bugs could still mean corruption being written back.

- mdadm write-mostly

http://linux.die.net/man/4/md - search for "write-mostly"

"write-mostly" is a feature of Linux software RAID-1 where you tell it that
some devices are unsuitable for reads. Writes still go to both.

Let's say that you could not afford two SSDs. You have just one SSD and a
pair of HDDs. Example:

/dev/sda - 180G SSD
/dev/sdb - 2T HDD
/dev/sdc - 2T HDD

Partition sdb and sdc into two bits, one of 180G and the other the rest.
Make a RAID-10 of sdb1 and sdc1; call it md0. Make a RAID-1 on top of
*that* out of sda and md0; call it md1, and tell it that its component md0
is write-mostly. Make a RAID-10 of sdb2 and sdc2; call it md2. You end up
with this:

/dev/md1 - 180G RAID-1
/dev/md2 - ~1,820G RAID-10

In theory you have all the speedy read advantages of an SSD, but you've
mirrored it onto an HDD so you can continue working even if your SSD goes
pop. It is of course a lot more complicated. People report some success
with this strategy:

http://marc.info/?l=linux-raid&m=126496930530289&w=2

Of course it's only going to be caching reads. You may want to reserve some
SSD space outside of the md arrays to use as guaranteed fast write space,
accepting that it won't be redundant in the face of failure.

- LVM

LVM's great when you're not entirely sure what your needs will be, or if
they change often. I earlier cautioned against using LVM to concatenate a
bunch of drives without redundancy, but you can use it in other ways.
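The stacked layout above could be built something like this (a sketch
only; partition sizes and device names follow the example):

```shell
# Inner mirror of the two 180G HDD partitions:
mdadm --create /dev/md0 --level=10 --raid-devices=2 /dev/sdb1 /dev/sdc1

# Outer RAID-1 of the SSD and md0. Devices listed after
# --write-mostly get the flag, so reads prefer the SSD:
mdadm --create /dev/md1 --level=1 --raid-devices=2 \
    /dev/sda --write-mostly /dev/md0

# Bulk storage from the rest of the HDDs:
mdadm --create /dev/md2 --level=10 --raid-devices=2 /dev/sdb2 /dev/sdc2

# The flag can also be toggled later through sysfs:
echo writemostly > /sys/block/md1/md/dev-md0/state
```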
Once you've ended up with a block device that is fast and a block device
that is slow, you can use them both as LVM physical volumes and put them
both into the same volume group. You then create a logical volume for each
kind of data you store, e.g. one for VM images, one for photos, and so on.
You can specify the allocation policy of each LV in order to tell it where
to place the physical extents - you make sure that the extents you need to
perform well are placed on the fast PV. Best of all, if you get it wrong,
or change your mind, or your needs change, you can move the extents on a
live system without unmounting anything.

If you only had the one SSD, you could combine this with the "write-mostly"
setup above. In that example md1 was the fast device, so md1 and md2 would
become the two PVs in your volume group.

So in summary, given the constraint of having to fit all the storage into
the desktop, this is how I personally would approach it:

I'd buy two SSDs and two HDDs. If I couldn't afford that, I'd buy one SSD
and two HDDs. If I felt brave enough for ZFS then I'd do that, because it
makes tiered storage fairly simple. Otherwise I'd go the route of LVM in
order to balance the differing needs of the data across the different types
of storage, whilst still retaining redundancy. I'd use md "write-mostly"
underneath the LVM if I only had one SSD.

Every Linux distribution should have an installer that supports software
RAID (you might have to use the "alternate" ISO on Ubuntu), so just do that
and set the write-mostly flag afterwards, once it's booted, if necessary.

No matter what scheme I ended up with, I'd probably make some effort to
keep /boot outside of all the complicated stuff, because:

- it really simplifies booting
- /boot is quite small anyway (512M should be ample)
- aside from kernel upgrades, /boot isn't read or written after boot, so it
  doesn't matter if it's on slow media - a RAID-1 of the HDDs will do
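The LVM part of that might look like this (a hedged sketch; the VG and LV
names are made up, and /dev/md1 and /dev/md2 follow the write-mostly
example, md1 being the fast device):

```shell
# Both the fast and the slow device become PVs in one VG:
pvcreate /dev/md1 /dev/md2
vgcreate vg0 /dev/md1 /dev/md2

# Place each LV explicitly by naming the PV to allocate from:
lvcreate -L 50G  -n vms    vg0 /dev/md1   # needs to be fast
lvcreate -L 500G -n photos vg0 /dev/md2   # bulk, fine on HDD

# Changed your mind? Move an LV's extents to the other PV on
# a live system, nothing unmounted:
pvmove -n vms /dev/md1 /dev/md2
```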
Cheers,
Andy

--
http://bitfolk.com/ -- No-nonsense VPS hosting