On Apr 27, 2011, at 9:26 PM, Edward Ned Harvey 
<opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Neil Perrin
>> 
>> No, that's not true. The DDT is just like any other ZFS metadata and can
> be
>> split over the ARC,
>> cache device (L2ARC) and the main pool devices. An infrequently referenced
>> DDT block will get
>> evicted from the ARC to the L2ARC then evicted from the L2ARC.
> 
> When somebody has their "baseline" system, and they're thinking about adding
> dedup and/or cache, I'd like to understand the effect of not having enough
> ram.  Obviously the impact will be performance, but precisely...

Precision is only possible if you know what the data looks like...

> At bootup, I presume the arc & l2arc are all empty.  So all the DDT entries
> reside in pool.  As the system reads things (anything, files etc) from pool,
> it will populate arc, and follow fill rate policies to populate the l2arc
> over time.  Every entry in l2arc requires 200 bytes of arc, regardless of
> what type of entry it is.  (A DDT entry in l2arc consumes just as much arc
> memory as any other type of l2arc entry.)  (Ummm...  What's the point of
> that?  Aren't DDT entries 270 bytes and ARC references 200 bytes?

No. The DDT entries vary in size.

>  Seems
> like a very questionable benefit to allow DDT entries to get evicted into
> L2ARC.)  So the ram consumption caused by the presence of l2arc will
> initially be zero after bootup, and it will grow over time as the l2arc
> populates, up to a maximum which is determined linearly as 200 bytes * the
> number of entries that can fit in the l2arc.  Of course that number varies
> based on the size of each entry and size of l2arc, but at least you can
> estimate and establish upper and lower bounds.

The upper and lower bounds vary by 256x (ZFS block sizes range from 512 bytes
to 128 KB), unless you know what the data looks like more precisely.
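
As a back-of-envelope illustration only (using the ~200 bytes/header figure
quoted above -- the real per-buffer overhead differs between releases), the
bounds look roughly like this:

def l2arc_header_overhead(l2arc_bytes, record_size, header_bytes=200):
    """ARC bytes consumed by headers when the L2ARC is full of such blocks."""
    n_entries = l2arc_bytes // record_size
    return n_entries * header_bytes

l2arc = 100 * 2**30                    # 100 GiB cache device (assumption)
for recsize in (512, 128 * 2**10):     # bounding ZFS block sizes
    gib = l2arc_header_overhead(l2arc, recsize) / 2**30
    print("recordsize %6d B -> %.2f GiB of ARC headers" % (recsize, gib))

# recordsize    512 B -> ~39 GiB   (worst case, all tiny blocks)
# recordsize 131072 B -> ~0.15 GiB (best case, all 128 KiB blocks)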

> So that's how the l2arc consumes system memory in arc.  The penalty of
> insufficient ram, in conjunction with enabled L2ARC, is insufficient arc
> availability for other purposes - Maybe the whole arc is consumed by l2arc
> entries, and so the arc doesn't have any room for other stuff like commonly
> used files.  

I've never seen this.

> Worse yet, your arc consumption could be so large, that
> PROCESSES don't fit in ram anymore.  In this case, your processes get pushed
> out to swap space, which is really bad.

[for Solaris, illumos, and NexentaOS]
This will not happen unless the ARC size is at arc_min. At that point you are
already close to severe memory shortfall.

> Correct me if I'm wrong, but the dedup sha256 checksum happens in addition
> to (not instead of) the fletcher2 integrity checksum.  

You are mistaken. Enabling dedup sets the block checksum to sha256 (or another
dedup-capable checksum); that single checksum serves both integrity
verification and the DDT key. A separate fletcher checksum is not computed on
top of it.

> So after bootup,
> while the system is reading a bunch of data from the pool, all those reads
> are not populating the arc/l2arc with DDT entries.  Reads are just
> populating the arc and l2arc with other stuff.

L2ARC is populated by a separate thread that watches the to-be-evicted list.
The L2ARC fill rate is also throttled, so that under severe shortfall, blocks
will be evicted without being placed in the L2ARC.
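
A rough sketch of that feed/throttle behaviour (illustrative only -- the names
mirror the real l2arc_write_max / l2arc_feed_secs tunables, but the actual
feed thread is more involved):

import time

L2ARC_WRITE_MAX = 8 * 2**20   # max bytes written to the cache device per pass
L2ARC_FEED_SECS = 1           # seconds between feed passes

def l2arc_feed_thread(evict_candidates, cache_dev):
    # Runs independently of eviction: scan buffers nearest to eviction and
    # copy at most L2ARC_WRITE_MAX bytes of them to the cache device.
    while True:
        written = 0
        for buf in evict_candidates():
            if written + buf.size > L2ARC_WRITE_MAX:
                break              # throttle: leave the rest for the next pass
            cache_dev.write(buf)
            written += buf.size
        # Buffers evicted before the next pass simply never reach the L2ARC.
        time.sleep(L2ARC_FEED_SECS)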

> DDT entries don't get into the arc/l2arc until something tries to do a
> write.  

No, the DDT entry contains the references to the actual data.

> When performing a write, dedup calculates the checksum of the block
> to be written, and then it needs to figure out if that's a duplicate of
> another block that's already on disk somewhere.  So (I guess this part)
> there's probably a tree-structure (I'll use the subdirectories and files
> analogy even though I'm certain that's not technically correct) on disk.

Implemented as an AVL tree.

> You need to find the DDT entry, if it exists, for the block whose checksum
> is 1234ABCD.  So you start by looking under the 1 directory, and from there
> look for the 2 subdirectory, and then the 3 subdirectory, [...etc...] If you
> encounter "not found" at any step, then the DDT entry doesn't already exist
> and you decide to create a new one.  But if you get all the way down to the
> C subdirectory and it contains a file named "D,"  then you have found a
> possible dedup hit - the checksum matched another block that's already on
> disk.  Now the DDT entry is stored in ARC just like anything else you read
> from disk.

DDT is metadata, not data, so it is more constrained than data entries in the
ARC.
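
For illustration only, here is a conceptual sketch of that write path. A
Python dict keyed by the SHA-256 digest stands in for the real DDT (an AVL
tree in memory, ZAP objects on disk); the addresses and refcounts are
hypothetical:

import hashlib

ddt = {}          # sha256 digest -> [refcount, block address]  (stand-in)
next_addr = 0

def dedup_write(data):
    """Write a block with dedup: reuse an existing copy if the checksum matches."""
    global next_addr
    key = hashlib.sha256(data).digest()
    entry = ddt.get(key)          # in a real pool this lookup may require
    if entry is not None:         # several small reads if the entry is uncached
        entry[0] += 1             # duplicate: just bump the reference count
        return entry[1]
    addr = next_addr              # new data: allocate a block and record it
    next_addr += len(data)
    ddt[key] = [1, addr]
    return addr

assert dedup_write(b"x" * 4096) == dedup_write(b"x" * 4096)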

> So the point is - Whenever you do a write, and the calculated DDT is not
> already in ARC/L2ARC, the system will actually perform several small reads
> looking for the DDT entry before it finally knows that the DDT entry
> actually exists.  So the penalty of performing a write, with dedup enabled,
> and the relevant DDT entry not already in ARC/L2ARC is a very large penalty.
> What originated as a single write quickly became several small reads plus a
> write, due to the fact the necessary DDT entry was not already available.
> 
> The penalty of insufficient ram, in conjunction with dedup, is terrible
> write performance.
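
To put rough numbers on that penalty (illustrative assumptions only: one extra
random read per missed DDT entry, ~100 random IOPS per spindle, and most of
the DDT uncached):

disk_iops      = 100     # random IOPS per spindle (assumption)
spindles       = 6
ddt_miss_ratio = 0.9     # fraction of writes whose DDT entry is not cached

pool_iops = disk_iops * spindles
ios_per_write = 1 + ddt_miss_ratio      # the write itself plus the extra read
print("~%.0f dedup'd writes/s vs ~%d with the DDT fully cached"
      % (pool_iops / ios_per_write, pool_iops))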
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
