Re: [OmniOS-discuss] Fragmentation

2017-06-25 Thread Jim Klimov
On June 23, 2017 9:01:20 PM GMT+02:00, Richard Elling wrote:
>ZIL pre-allocates at the block level, so think along the lines of 12k
>or 132k.
> — richard

@Gea, IIRC one can set the sync mode on a dataset, effectively forcing all
writes to go through the (dedicated) ZIL, while the data stays in memory until
it is flushed to persistent bulk storage the way normal pool writes are. This
way more consolidated writes can be sent to the pool's disks, instead of
forcing many small (sync) allocations and deallocations when (sync) writes are
small and intensive enough, e.g. appending to log files.

For SSD pools this is also thought to ease wear, because whole pages can be
reprogrammed at once; it also compensates for small, intensive random writes,
since random LBAs can live in the same page.
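
For reference, the sync mode mentioned above is the per-dataset `sync` property (values `standard`, `always`, `disabled`). Below is a minimal dry-run sketch in Python that only builds and prints the commands; the dataset name `dr_tank/logs` and the helper name are hypothetical, chosen purely for illustration.

```python
import shlex


def sync_always_commands(dataset):
    """Return argv lists that would force sync semantics on one dataset."""
    return [
        # Treat every write to this dataset as a sync write (logged via the ZIL).
        ["zfs", "set", "sync=always", dataset],
        # Read the property back to confirm the setting.
        ["zfs", "get", "sync", dataset],
    ]


if __name__ == "__main__":
    # "dr_tank/logs" is a hypothetical dataset name, not one from the thread.
    for argv in sync_always_commands("dr_tank/logs"):
        print(shlex.join(argv))  # dry run: print only, nothing is executed
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed lines can be reviewed and pasted into a shell on the storage host.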

Jim

Hope Richard would correct me if I got something wrong ;)
--
Typos courtesy of K-9 Mail on my Redmi Android
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Fragmentation

2017-06-23 Thread Richard Elling
ZIL pre-allocates at the block level, so think along the lines of 12k or 132k.
 — richard

> On Jun 23, 2017, at 11:30 AM, Günther Alka wrote:
> 
> hello Richard
> 
> I can follow that the Zil does not add more fragmentation to the free space 
> but is this effect relevant?
> If a ZIL pre-allocates say 4G and the remaining fragmented poolsize for 
> regular writes is 12T
> 
> Gea


Re: [OmniOS-discuss] Fragmentation

2017-06-23 Thread Günther Alka

Hello Richard,

I can follow that the ZIL does not add more fragmentation to the free space,
but is this effect relevant if a ZIL pre-allocates, say, 4G and the remaining
fragmented pool space for regular writes is 12T?


Gea

On 23.06.2017 at 19:30, Richard Elling wrote:

> A slog helps fragmentation because the space for ZIL is pre-allocated based
> on a prediction of how big the write will be. The pre-allocated space
> includes a physical-block-sized chain block for the ZIL. An 8k write can
> allocate 12k for the ZIL entry that is freed when the txg commits. Thus, a
> slog can help decrease free space fragmentation in the pool.
> -- richard





Re: [OmniOS-discuss] Fragmentation

2017-06-23 Thread Richard Elling
A slog helps fragmentation because the space for ZIL is pre-allocated based on
a prediction of how big the write will be. The pre-allocated space includes a
physical-block-sized chain block for the ZIL. An 8k write can allocate 12k for
the ZIL entry that is freed when the txg commits. Thus, a slog can help
decrease free space fragmentation in the pool.
 — richard
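
As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes a 4k physical block size (the thread does not state it, but it is consistent with the 8k -> 12k example) and models the allocation simply as payload plus one chain block. `zil_allocation` is a made-up helper, not a ZFS function, and real ZIL block sizing is more involved than this.

```python
K = 1024


def zil_allocation(write_bytes, chain_block_bytes=4 * K):
    """Simplified model: payload plus one physical-block-sized chain block.

    The 4k default is an assumption made for this sketch.
    """
    return write_bytes + chain_block_bytes


for write in (8 * K, 128 * K):
    print(f"{write // K}k write -> {zil_allocation(write) // K}k allocated")
```

Under the same assumption, a 128k write comes out at 132k, which matches the "12k or 132k" figures quoted elsewhere in this thread.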


> On Jun 23, 2017, at 8:56 AM, Guenther Alka wrote:
> 
> A ZIL, or better a dedicated slog device, will not help, as it is not a write
> cache but a log device. It is only there to commit every written data block
> and put it onto stable storage. It is read only after a crash, to redo a
> committed write that is missing.
> 
> All writes, sync or not, go through the RAM-based write cache (by default up
> to 4GB). This is flushed from time to time as a large sequential write. How
> fragmented those writes end up depends on the fragmentation of the free
> space.
> 
> Gea


Re: [OmniOS-discuss] Fragmentation

2017-06-23 Thread Jim Klimov
On June 23, 2017 4:13:52 PM GMT+02:00, Artyom Zhandarovsky wrote:
>disk errors: none
>
>--
>CAP Alert
>--
>
>Is there any way to decrease fragmentation of dr_tank?
>
>--
>zpool list (Sum of RAW disk capacity without redundancy counted)
>--
>NAME      SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
>dr_slow   9.06T  77.6M  9.06T  -         0%    0%   1.00x  ONLINE  -
>dr_tank   48.9T  35.1T  13.9T  -         23%   71%  1.00x  ONLINE  -
>rpool     272G   42.1G  230G   -         10%   15%  1.00x  ONLINE  -
>
>Real Pool capacity from zfs list
>--
>NAME      USED   AVAIL  MOUNTPOINT  %
>dr_slow   7.69T  1.26T  /dr_slow    14%!
>dr_tank   41.6T  6.33T  /dr_tank    13%!
>rpool     45.6G  218G   /rpool      83%
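
The report does not say what its "%" column means. A quick check, sketched below in Python, suggests it is the free share, AVAIL / (USED + AVAIL). That reading is an assumption inferred from the numbers, not something the report states; the figures are copied from the quoted `zfs list` summary in the units shown there.

```python
# (name, used, avail) from the quoted summary; units are as printed (T or G),
# which is fine here because only the ratio within each row matters.
rows = [
    ("dr_slow", 7.69, 1.26),
    ("dr_tank", 41.6, 6.33),
    ("rpool", 45.6, 218.0),
]


def percent_free(used, avail):
    """Free share of the pool's usable space, in percent."""
    return 100.0 * avail / (used + avail)


for name, used, avail in rows:
    print(f"{name}: {percent_free(used, avail):.0f}% free")
```

Rounded to whole percent, the three results agree with the 14%, 13% and 83% in the report, so the "!" marks appear to flag pools that are running low on free space.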

The issue with ZFS fragmentation is that at some point it becomes hard to find
free spots to write into, and to do large writes contiguously, so performance
drops suddenly and noticeably. This can impact reads as well, especially if
atime=on is left at its default.

To recover from existing fragmentation you must free up space: perhaps zfs send
the datasets to another pool, empty as much as you can on this one, and send
the data back, so that it lands in large contiguous writes.
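
The send / empty / send-back cycle described above could look roughly like the following dry-run sketch in Python. It only prints shell pipelines and executes nothing. The dataset `dr_tank/data`, the scratch pool `scratch` and the snapshot name `defrag` are hypothetical; the destroy step is destructive and belongs only after the copy has been verified.

```python
import shlex


def migration_plan(dataset, scratch_dataset, snapname="defrag"):
    """Build (not run) the commands for a send / empty / send-back cycle."""
    q = shlex.quote
    src_snap = f"{dataset}@{snapname}"
    tmp_snap = f"{scratch_dataset}@{snapname}"
    return [
        # 1. Recursive snapshot of the dataset and its children.
        f"zfs snapshot -r {q(src_snap)}",
        # 2. Replicate it to the other pool without mounting the copy.
        f"zfs send -R {q(src_snap)} | zfs receive -u {q(scratch_dataset)}",
        # 3. DESTRUCTIVE: free the space on the fragmented pool.
        f"zfs destroy -r {q(dataset)}",
        # 4. Send the data back so it is rewritten into the freed space.
        f"zfs send -R {q(tmp_snap)} | zfs receive -u {q(dataset)}",
    ]


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    for step in migration_plan("dr_tank/data", "scratch/data"):
        print(step)
```

Working one dataset at a time keeps the scratch pool small and limits how much data is at risk during step 3.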

To prevent it, a ZIL caching all writes (including sync ones, e.g. NFS) can
help. Perhaps a DDR drive (or a mirror of these) with battery and flash
protection against power-offs, so that it does not wear out like flash would.
In this case, however randomly the writes arrive, ZFS does not have to put them
on media ASAP, so it can do larger writes later. This can also protect SSD
arrays from excessive small writes and wear-out, though there a bad(ly sized)
ZIL can become a bottleneck.
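
Attaching a mirrored log device as suggested above is done with `zpool add <pool> log mirror <dev> <dev>`. A minimal dry-run sketch in Python follows; the pool name is taken from the report, while the device names and the helper are hypothetical placeholders.

```python
import shlex


def add_mirrored_slog(pool, dev_a, dev_b):
    """Return the argv that would attach a mirrored log vdev to `pool`."""
    return ["zpool", "add", pool, "log", "mirror", dev_a, dev_b]


if __name__ == "__main__":
    # "c9t0d0" and "c9t1d0" are made-up device names for illustration.
    argv = add_mirrored_slog("dr_tank", "c9t0d0", "c9t1d0")
    print(shlex.join(argv))  # dry run: print only, nothing is executed
```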

Hope this helps,
Jim
--
Typos courtesy of K-9 Mail on my Redmi Android