On 25 May 2014 at 23:34:26 CEST, Richard Elling via illumos-discuss <discuss@lists.illumos.org> wrote:
>On May 25, 2014, at 8:47 AM, Schweiss, Chip <c...@innovates.com> wrote:
>> Reservations don't keep ZFS from eventually writing to every assignable block on the SSD.
>
>Disagree. SPA allocations are first fit and biased towards lower-numbered LBAs. This means that for the vast majority of cases, there is large unallocated space of significant magnitude.
>
>> This would work if Illumos did trim. Until then, slicing and using less than the entire capacity will keep the performance of the SSD high.
>
>You are assuming that LBAs are exposed by SSDs. They are not. SSDs use their own COW file systems, and the garbage collection will readjust the layout such that you can never be sure what pages are recycled or when. This is especially important for those cases that overwrite the same blocks many times, notably FAT and NTFS.
>
>> Also, using an ashift of 12 or 13 helps by causing less write amplification.
>
>Agree. Some SSDs, even expensive enterprise-grade SSDs, do a poor job of sub-4KB sector allocations.
> -- richard
>
>> -Chip
>>
>> On May 25, 2014 8:21 AM, "Richard Elling via illumos-discuss" <discuss@lists.illumos.org> wrote:
>>
>> On May 24, 2014, at 3:32 PM, Günther Alka via illumos-discuss <discuss@lists.illumos.org> wrote:
>>
>>> SSDs are the future of high performance storage. With most consumer SSDs, overprovisioning is a common way to keep write performance high.
>>>
>>> On Linux you can use hdparm to create a host protected area, with the main advantage that you do not need to struggle with partitions or slices - just use the whole disk as usual.
>>>
>>> Read: http://www.thomas-krenn.com/de/wiki/SSD_Over-Provisioning_mit_hdparm
>>>
>>> Has anyone compiled hdparm for Illumos, or does anyone know of a Solaris tool to create a host protected area?
>>
>> But why bother overprovisioning with such a low-level tool? We use ZFS; if you want to reserve some space, make a reservation :-)
>>
>>> sdparm? https://www.illumos.org/issues/2899
>>
>> It should compile fine, OOB.
>> -- richard

Thanks, the hint about allocation biasing is a useful one. Some of the rest kinda 'does not compute', so I'll explain how I see this and kindly ask you to explain what I miss and why we differ (namely, on the matter of whether reservations are guaranteed to help).
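For concreteness, the "just make a reservation" advice amounts to something like the following (the pool and dataset names here are made up, and the size is only an example):

    # Reserve ~20 GiB of pool space that no other dataset can allocate.
    # This constrains ZFS space accounting only; it says nothing about
    # which LBAs on the SSD remain unwritten over time - which is
    # exactly the question below.
    zfs create -o reservation=20G tank/op-reserve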
First of all, taking the biasing thesis: say there is a reserved dataset that is never written into and only exists to block writes into some amount of blocks (each located randomly on disk, if ever written into at all). Writes into other datasets land in the unreferenced low LBA offsets, and after a while all the reserved space for this one dataset ends up as unreferenced sectors with higher LBA numbers - similar to what partitioning would do for "cheap overprovisioning", reserving those sectors from writes right from the start of disk usage.

However, is there any guarantee with the ZFS reservation approach that those higher LBAs would indeed never be written into and would remain all-zeroes - always - regardless of whether any current ZFS block tree references those sectors?

I guess one take at this would be to store uncompressed /dev/zero'es into this dataset. Hopefully the SSD firmware would then know not to allocate actual hardware storage for them (at least, the compressing firmwares are likely to do this) and would thus keep this amount of bytes on the real chips available for housekeeping, as with the other ways of overprovisioning. But maybe this would achieve the opposite - and keep the sectors allocated and unavailable for garbage collection. Generally we have no means to know which, right? That's what TRIM/UNMAP are for...
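In sketch form, that experiment would look something like this (hypothetical names again; whether the firmware really discards all-zero sectors is exactly the unknown part):

    # Pin the reserved space down with real, uncompressed zero-filled
    # blocks. With compression off, ZFS writes the zeros out rather
    # than turning them into holes. If the firmware treats all-zero
    # sectors as "empty", this frees flash pages for housekeeping;
    # if not, it may do the opposite and pin those pages as live data.
    zfs create -o compression=off -o reservation=20G tank/op-zeroes
    dd if=/dev/zero of=/tank/op-zeroes/padding bs=1024k count=20480

Without TRIM there is no way to ask the device which of the two actually happened.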
Going to the second part: as you say, we as users of a black box have no control over which LBAs are colocated on which pages (hardware sets of SSD cells, circa 512 KB) at any instant of time. Indeed, on any sufficiently sophisticated SSD this mapping is dynamic and unrelated to the logical offsets of the data on 'disk'; rather, such colocations are influenced by trying to stuff non-null sectors as tightly as possible into each single page. Whether these are referenced by any filesystem, the firmware has no way of knowing. Non-null is key.

Now, as we also know, SSD writes are done by 'programming' a page of cells with some data, which is a single large operation to store some (non-null) logical sectors mapped from random logical offsets. Re-programming of pages requires that they be erased first. This can either be a slow process or, in some devices, a faster one with write-current (or voltage?) amplification, which more quickly wears out the very limited life resources of that SSD page's cells. Having an abundance of empty pages in advance is one way to improve SSD write speeds as well as longevity, and higher vendor overprovisioning is one way to achieve this. Other ways include user overprovisioning that keeps some ranges of LBAs from ever being written.

Anyhow, for writes to be at all possible, empty pages are needed. This requires garbage collection to colocate the still-needed sectors from mostly-empty pages onto a single page and thus free up some pages completely for re-programming; this may be helped by having sectors logically filled with null bytes or marked as unused with TRIM/UNMAP commands. Finally, this GC can be a background process, which is less aggressive to the SSD chips, or a more aggressive on-demand procedure done because there is a need to write and no stashed empty pages. The latter case requires the device to solve the problem fast - even if sub-optimally, i.e. at the expense of wear-leveling considerations - and still at a hit to performance, because it must complete before the new userdata write can actually be done.

Again, since this is ruled by statistics (how large a portion of cells is on average filled with referenced sectors), overprovisioning and explicit TRIM support are the ways to tip the balance and allow the system to reallocate the 'swiss cheese' of allocations sooner and in the background (when the device is idling, and in a more sparing fashion). While we can't and needn't influence where the data is really allocated, we can influence how efficiently the preparation of empty pages happens - either by never using some data ranges at all, or by trying to inform the device of their availability for housekeeping - see trimming or hopeful zeroing.

//Jim
--
Typos courtesy of K-9 Mail on my Samsung Android