> On Oct 30, 2014, at 7:35 PM, Dave Warren <[email protected]> wrote:
>
> On 2014-10-30 17:15, Jim Thompson wrote:
>>> On Oct 30, 2014, at 3:39 PM, Dave Warren <[email protected]> wrote:
>>>
>>> Buy quality instead of junk?
>
> <...>
>
>>> Even a cheapo 30GB/60GB/whatever SSD is more than enough for pfSense and
>>> makes a far more reliable solution than external flash.
>>
>> I strongly disagree. SSDs have to be part of a system, especially in an
>> embedded environment. The debacle with the “cheap 30GB” m-SATA drive from
>> PC Engines earlier in the year (they had to take them all back) should amply
>> demonstrate why thinking such as what you express here is deeply flawed.
>
> Sorry if I wasn't clear, I meant a cheapo SSD because it's small -- I'm
> suggesting you don't need to invest in a large or fast SSD for pfSense, but
> rather, cheap out on size, while getting a quality device built for lifespan
> and reliability.
Understood, but even here your suggestion is out of date with respect to the current state of the art. Assuming a decent wear-leveling implementation, larger drives will last longer for a given amount of data written. In the same way that, when flying an airplane, you can trade altitude for glide, with modern SSDs you can trade capacity for endurance. (It also matters *how* you write the data.)

In the below, I’m quoting JEDEC JESD219-compliant numbers/stats. Here’s an equation you might want to think about:

Total writes to the device <= (Maximum endurance cycles) * (total partition capacity) / (WAF)

where:

Maximum Endurance Cycles = the total number of program-erase cycles each block in the NAND flash can withstand. For the current generation of MLC flash this is 3,000 program-erase cycles.

Write Amplification Factor (WAF) = a result of wear-leveling activity to some degree, and of the nature of the writes to the flash. The NAND flash itself is written in units of pages; for the current generation of flash, this page size is typically 16K bytes. If the writes are sequential within the 16K page, the WAF should be low. However, if the write data is not contiguous, or is interrupted by another write stream, then the partial page will be programmed to the NAND flash. In general, random writes will contribute to a higher WAF.

Ideally we would want WAF to be 1. However, this is the real world, and we have seen it go as high as 20 in some applications with non-ideal write behavior (very poorly behaved, always non-contiguous or interrupted write streams, e.g. logging or SQL databases).

Example: an application that writes 100 MB of data to the device per day.
100 MB/day * 365 days/year = 36.5 GB/year

Let’s assume a standard mode 4GB CF card/USB/… with perfect wear-leveling (LOL!):

Best case: for WAF = 1, standard mode 4GB part:
Total writes = (3,000) * (4GB) / 1 = 12 TB
With the above data this yields: 12,000 GB / 36.5 GB/year = 329 years

Worst case: for WAF = 20, standard mode 4GB part:
Total writes to reach endurance = (3,000) * (4GB) / 20 = 600 GB of data written will exceed endurance
With the above data this yields: 600 / 36.5 = 16.4 years

This is how a “commodity” flash/SSD vendor (or a shill^W “technology journalist”) will talk to you: “It will take more than 16.4 years to wear out the disk!”

The reality is that with the 3,000 program-erase cycle rating of today's underlying MLC cells, the 4GB part can support a "worst case" 600GB of data writes, assuming very poorly behaved, always non-contiguous or interrupted write streams. "Best case", assuming purely contiguous writes, would be 12TB.

Actual worst case without effective wear-leveling (as was the case with CF cards and a lot of the early SSDs) would be 3,000 writes to a single 16K page. (Thus the “don’t swap to an SSD!” advice so often heard.) Do this, and “Boom!” the sector is dead (or will be quite soon). If this was in a file that you needed (or worse, a filesystem metadata block), *poof* goes your data. Bummer, dude.

This is *also* why SLC flash is often recommended for applications that require high write endurance. SLC flash can endure approximately 10X the program-erase cycles of MLC flash in a given lithography.

The direct result is that today you see a lot of people attempting to quote “TBW” (terabytes written) when talking about SSD/flash endurance, but even then they don’t talk about WAF very often. Once you start thinking about it, it’s not difficult to see that it doesn’t take long to write 600GB on a very busy system that does a lot of short writes due to logging, etc.
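The arithmetic above is simple enough to check in a few lines of Python (a quick sketch; the function names are mine, the figures are the ones from the example):

```python
# Endurance math from the example above: 3,000 P/E cycles, a 4GB part,
# and 100 MB/day of writes.

def endurance_tb(pe_cycles, capacity_gb, waf):
    """Total data (in TB) that can be written before exceeding rated endurance."""
    return pe_cycles * capacity_gb / waf / 1000.0

def lifetime_years(endurance_in_tb, gb_written_per_year):
    """Years until the endurance budget is exhausted at a given write rate."""
    return endurance_in_tb * 1000.0 / gb_written_per_year

gb_per_year = 0.1 * 365  # 100 MB/day == 36.5 GB/year

best = endurance_tb(3000, 4, waf=1)    # 12.0 TB
worst = endurance_tb(3000, 4, waf=20)  # 0.6 TB

print(f"best case:  {best} TB, {lifetime_years(best, gb_per_year):.0f} years")
print(f"worst case: {worst} TB, {lifetime_years(worst, gb_per_year):.1f} years")
```

Run it and you get the same 329-year / 16.4-year spread quoted above.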
Now go run the numbers yourself for a larger SSD (and you can assume wear-leveling). Double the size of the device, and you’ll double the TBW figure, assuming everything else stays the same. Larger density devices will yield correspondingly higher total write endurance, since (QED!) they have more blocks of NAND in them.

Here is the kicker: the eMMCs we’re using on the coming Netgate hardware (which, yes, will be available at the pfSense store) have an “enhanced” or “pseudo-SLC” mode. This mode reduces device capacity by 50% (a 4GB device becomes a 2GB device), but increases the maximum endurance cycles 10X, to 30,000 instead of 3,000, so the 0.6TB worst-case figure becomes 3TB *worst case* (10X the cycles on half the capacity, a net 5X gain). With a bit of work to the system (such as some of what happens to reduce writes in the nano images), we can probably get the WAF into the 5-10 range, yielding an additional 2X-4X increase in TBW (so 6TB - 12TB) on a 4GB part that is in a “2GB” mode.

Note that this gain is really “software at work”: better/more sophisticated wear-leveling, combined with on-die RAM in the flash controller, and a whole suite of other enhancements. That eMMC in your tablet or phone is way, way better than sticking an SD card in the side of it. Never mind that the steeper volume curve diminishes the cost of a fully-deployed solution more quickly. (Want to take a guess why nearly nobody offers tablets or phones with SD sockets in them now?)

Want to guess what the reliability difference is compared to loading the internal “bootable” USB2 socket on the Soekris net6801 (http://soekris.com/products/net6801.html), or the SD card socket on the APU, or a CF card on the Lanner boxes / ALIX? Yeah, it’s bad. Go ahead and run the nano/embedded image on these if you have one, because it’s the best solution available today.
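Plugging the pseudo-SLC figures into the same formula (a sketch using the numbers above: 30,000 cycles on the 2GB of usable capacity):

```python
# Pseudo-SLC mode on a 4GB eMMC: half the capacity, 10X the P/E cycles.
def endurance_tb(pe_cycles, capacity_gb, waf):
    return pe_cycles * capacity_gb / waf / 1000.0

print(endurance_tb(30_000, 2, waf=20))  # 3.0 TB, worst case (WAF = 20)
print(endurance_tb(30_000, 2, waf=10))  # 6.0 TB with WAF around 10
print(endurance_tb(30_000, 2, waf=5))   # 12.0 TB with WAF around 5
```

Note the net gain over MLC mode is 5X, not 10X, because the capacity term halves while the cycle term grows 10X.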
But if you’re still reading, now run all the math again for a 32GB part:

32GB eMMC endurance in MLC mode: 4.8TB worst case, 96TB best case
32GB eMMC endurance in pseudo-SLC mode (capacity reduced to 16GB): 24TB worst case, 480TB best case <- 480TB / 36.5GB/year = 13,150 years. Oh wow.

The “pseudo-SLC” mode isn’t the kind of thing you’re going to find on “consumer” parts. It takes a system vendor to design this kind of thing in and enable it.

Now go back and look at the ZFS results that I posted over the summer, where I made the filesystem ZFS, set “copies=2”, and turned on lz4 compression. Performance (especially read performance) goes up, you have better redundancy, and software updates actually update both copies (something that *does not happen* with the nano-based images today). You have a full filesystem underneath you, and the whole system behaves a whole lot better.

Set “copies=1” (or just never set it to anything), and you still have compression reducing the amount of data actually written to the device by ~2X, ZFS being a far more resilient filesystem than UFS (because of ZFS’s copy-on-write semantics), and a filesystem that treats the SSD/flash/… far better than UFS ever would.

I could go on for 2-3 more pages about this, explaining that as the feature size of the utilized lithography goes down, the endurance also goes down (because the voltages held in adjacent cells are more likely to influence the read), and that as temperatures go up, read reliability goes down (thermal ‘noise’). But all you really need to know, frankly, is that the world has changed since 2004; that while “nanoBSD” is a good solution, it is no longer a great solution; and that your “buy cheap small SSDs” advice “sounds right”, but is, quite likely, wrong.
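For reference, the ZFS tuning described above amounts to something like the following (a sketch; `zroot/pfsense` is a placeholder dataset name, and `lz4` is the assumed compression algorithm):

```shell
# Keep two copies of every block, and compress data before it hits the flash.
# Halving the data actually written roughly doubles the endurance budget.
zfs set copies=2 zroot/pfsense
zfs set compression=lz4 zroot/pfsense

# Verify the settings and see how much compression is actually saving:
zfs get copies,compression,compressratio zroot/pfsense
```

Both properties apply to newly written data only, so set them before loading the filesystem up.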
Again (because people love to take me out of context), as before, you’re more than welcome to go buy whatever on eBay or Amazon (…); the project is open source and freely available. But understand that “buying cheap” has potentially disastrous consequences, especially as things scale when people attempt to become vendors.

The above should also somewhat explain “why” we ship the C2758 with an 80GB Intel SSD (yes, one that has a power-fail cap) and the full install of our enhanced version of pfSense.

… and we’ve never had one back for disk failure.

— The Lizard Has Spoken

_______________________________________________
List mailing list
[email protected]
https://lists.pfsense.org/mailman/listinfo/list
