On 25 May 2014 at 23:34:26 CEST, Richard Elling via illumos-discuss <discuss@lists.illumos.org> wrote:
>On May 25, 2014, at 8:47 AM, Schweiss, Chip <c...@innovates.com> wrote:
>> Reservations don't keep ZFS from eventually writing to every assignable block on the SSD.
>
>Disagree. SPA allocations are first fit and biased towards lower-numbered LBAs. This means that for the vast majority of cases, there is large unallocated space of significant magnitude.
>
>> This would work if Illumos did trim. Until then, slicing and using less than the entire capacity will keep the performance of the SSD high.
>
>You are assuming that LBAs are exposed by SSDs. They are not. SSDs use their own COW file systems, and the garbage collection will readjust the layout such that you can never be sure what pages are recycled or when. This is especially important for those cases that overwrite the same blocks many times, notably FAT and NTFS.
>
>> Also, using an ashift of 12 or 13 helps by causing less write amplification.
>
>Agree. Some SSDs, even expensive enterprise-grade SSDs, do a poor job of sub-4KB sector allocations.
> -- richard
>
>> -Chip
>>
>> On May 25, 2014 8:21 AM, "Richard Elling via illumos-discuss" <discuss@lists.illumos.org> wrote:
>>
>> On May 24, 2014, at 3:32 PM, Günther Alka via illumos-discuss <discuss@lists.illumos.org> wrote:
>>
>>> SSDs are the future of high performance storage. With most consumer SSDs, overprovisioning is a common way to keep write performance high.
>>>
>>> On Linux you can use hdparm to create a host protected area, with the main advantage that you do not need to struggle with partitions or slices - just use the whole disk as usual.
>>>
>>> Read: http://www.thomas-krenn.com/de/wiki/SSD_Over-Provisioning_mit_hdparm
>>>
>>> Has anyone compiled hdparm for Illumos, or does anyone know of a Solaris tool to create a host protected area?
>>
>> But why bother overprovisioning with such a low-level tool? We use ZFS; if you want to reserve some space, make a reservation :-)
>>
>>> sdparm? https://www.illumos.org/issues/2899
>>
>> It should compile fine, OOB.
>> -- richard

Thanks, the hint about allocation biasing is a useful one. Some of the rest kinda 'does not compute', so I'll explain how I see this and kindly ask you to explain what I miss and why we differ (namely, on the matter of whether reservations are guaranteed to help).
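For concreteness, the "just make a reservation" advice amounts to something like the following (the pool and dataset names here are made up, and the size is only an example):

    # Reserve ~20 GiB of pool space that no other dataset can allocate.
    # This constrains ZFS space accounting only; it says nothing about
    # which LBAs on the SSD remain unwritten over time - which is
    # exactly the question below.
    zfs create -o reservation=20G tank/op-reserve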
First of all, taking the biasing thesis: say there is a reserved dataset that is never written into and only exists to block writes into some amount of blocks (each located randomly on disk, if ever written into at all). Writes into other datasets land in the unreferenced low LBA offsets, and after a while all the reserved space for this one dataset ends up as unreferenced sectors with higher LBA numbers - similar to what partitioning would do for "cheap overprovisioning", reserving those sectors from writes right from the start of disk usage.

However, is there any guarantee with the ZFS reservation approach that those higher LBAs would indeed never be written into and would remain all-zeroes - always - regardless of whether any current ZFS block tree references those sectors?

I guess one take at this would be to store uncompressed /dev/zero'es into this dataset. Hopefully the SSD firmware would then know not to allocate actual hardware storage for them (at least, the compressing firmwares are likely to do this) and would thus keep this amount of bytes on the real chips available for housekeeping, as with the other ways of overprovisioning. But maybe this would achieve the opposite - and keep the sectors allocated and unavailable for garbage collection. Generally we have no means to know which, right? That's what TRIM/UNMAP are for...
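In sketch form, that experiment would look something like this (hypothetical names again; whether the firmware really discards all-zero sectors is exactly the unknown part):

    # Pin the reserved space down with real, uncompressed zero-filled
    # blocks. With compression off, ZFS writes the zeros out rather
    # than turning them into holes. If the firmware treats all-zero
    # sectors as "empty", this frees flash pages for housekeeping;
    # if not, it may do the opposite and pin those pages as live data.
    zfs create -o compression=off -o reservation=20G tank/op-zeroes
    dd if=/dev/zero of=/tank/op-zeroes/padding bs=1024k count=20480

Without TRIM there is no way to ask the device which of the two actually happened.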
Going to the second part: as you say, we as users of a black box have no control over which LBAs are colocated on which pages (hardware sets of SSD cells, circa 512 KB) at any instant of time. Indeed, on any sufficiently sophisticated SSD this mapping is dynamic and unrelated to the logical offsets of the data on 'disk'; rather, such colocations are influenced by trying to stuff non-null sectors as tightly as possible into each single page. Whether these are referenced by any filesystem, the firmware has no way of knowing. Non-null is key.

Now, as we also know, SSD writes are done by 'programming' a page of cells with some data, which is a single large operation to store some (non-null) logical sectors mapped from random logical offsets. Re-programming of pages requires that they be erased first. This can either be a slow process or, in some devices, a faster one with write-current (or voltage?) amplification, which more quickly wears out the very limited life resources of that SSD page's cells. Having an abundance of empty pages in advance is one way to improve SSD write speeds as well as longevity, and higher vendor overprovisioning is one way to achieve this. Other ways include user overprovisioning that keeps some ranges of LBAs from ever being written.

Anyhow, for writes to be at all possible, empty pages are needed. This requires garbage collection to colocate the still-needed sectors from mostly-empty pages onto a single page and thus free up some pages completely for re-programming; this may be helped by having sectors logically filled with null bytes or marked as unused with TRIM/UNMAP commands. Finally, this GC can be a background process, which is less aggressive to the SSD chips, or a more aggressive on-demand procedure done because there is a need to write and no stashed empty pages. The latter case requires the device to solve the problem fast - even if sub-optimally, i.e. at the expense of wear-leveling considerations - and still at a hit to performance, because it must complete before the new userdata write can actually be done.

Again, since this is ruled by statistics (how large a portion of cells is on average filled with referenced sectors), overprovisioning and explicit TRIM support are the ways to tip the balance and allow the system to reallocate the 'swiss cheese' of allocations sooner and in the background (when the device is idling, and in a more sparing fashion). While we can't and needn't influence where the data is really allocated, we can influence how efficiently the preparation of empty pages happens - either by never using some data ranges at all, or by trying to inform the device of their availability for housekeeping - see trimming or hopeful zeroing.

//Jim
--
Typos courtesy of K-9 Mail on my Samsung Android