Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Richard Elling [mailto:richard.ell...@gmail.com]
> Sent: Friday, April 29, 2011 12:49 AM
>
> The lower bound of ARC size is c_min
> # kstat -p zfs::arcstats:c_min

I see there is another character in the plot: c_max

c_max seems to be 80% of system RAM (at least on my systems). I assume this means the ARC will never grow larger than 80%, so if you're trying to calculate the RAM needed for your system to hold the DDT and L2ARC references in ARC, this had better be factored into the equation.
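For anyone wanting to check their own box, a quick sketch (assumes Solaris-style prtconf/kstat output and a shell with 64-bit arithmetic such as ksh93 or bash; the awk field positions are an assumption and may vary by release):

  #!/bin/sh
  # Report the ARC bounds as a percentage of physical RAM.
  ram_b=$(( $(prtconf | awk '/^Memory size/ {print $3}') * 1024 * 1024 ))
  c_min=$(kstat -p zfs::arcstats:c_min | awk '{print $2}')
  c_max=$(kstat -p zfs::arcstats:c_max | awk '{print $2}')
  echo "c_min: $c_min bytes ($(( c_min * 100 / ram_b ))% of RAM)"
  echo "c_max: $c_max bytes ($(( c_max * 100 / ram_b ))% of RAM)"

On systems matching the description above, this would print roughly 10% and 80%.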
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Thu, Apr 28, 2011 at 6:48 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
> What does it mean / what should you do, if you run that command, and it
> starts spewing messages like this?
> leaked space: vdev 0, offset 0x3bd8096e00, size 7168

I'm not sure there's much you can do about it short of deleting datasets and/or snapshots.

-B
--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> : xvm-4200m2-02 ; I can do the "echo | mdb -k". But what is that
> ": xvm-4200" command?

My guess is that it is a very odd shell prompt ;-)
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 04/30/11 01:41, Sean Sprague wrote:
>> : xvm-4200m2-02 ; I can do the "echo | mdb -k". But what is that
>> ": xvm-4200" command?
>
> My guess is that it is a very odd shell prompt ;-)

Indeed:

  ':' means what follows is a comment (at least to /bin/ksh)
  'xvm-4200m2-02' is the comment - actually the system name (not very inventive)
  ';' ends the comment.

I use this because I can cut and paste entire lines back to the shell. Sorry for the confusion.

Neil.
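(To make the trick concrete: ':' is the shell's null command, so a prompt built this way renders every displayed line as a harmless no-op followed by the real command. A hypothetical example, pasteable as-is into ksh:

  PS1=': myhost ; '
  : myhost ; echo ::sizeof ddt_entry_t | mdb -k

The ': myhost ;' prefix executes and does nothing, then the command after it runs normally.)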
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> And one of these:
>
> Assertion failed: space_map_load(msp->ms_map, &zdb_space_map_ops, 0x0,
> &msp->ms_smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 1439,
> function zdb_leak_init
> Abort (core dumped)
>
> I saved the core and ran again. This time it spewed "leaked space"
> messages for an hour, and completed. But the final result was physically
> impossible (it counted up 744k total blocks, which means something like
> 3 megs per block in my 2.39T used pool. I checked: compressratio is 1.00x
> and I have no compression.) I ran again. Still spewing messages. This
> can't be a good sign. Anyone know what it means, or what to do about it?

IIRC it runs out of memory, not space.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
>> Worse yet, your arc consumption could be so large, that PROCESSES don't
>> fit in ram anymore. In this case, your processes get pushed out to swap
>> space, which is really bad.
>
> This will not happen. The ARC will be asked to shrink when other memory
> consumers demand memory. The lower bound of ARC size is c_min

Makes sense. Is c_min a constant? Suppose processes are consuming a lot of memory. Will c_min protect L2ARC entries in the ARC? At least on my systems, it seems that c_min is fixed at 10% of total system memory.

If c_min is sufficiently small relative to the amount of ARC that would be necessary to index the L2ARC... Since every entry in L2ARC requires an entry in ARC, this seems to imply that if process memory consumption is high, then both the ARC and L2ARC are effectively useless. Things sometimes get evicted from ARC completely, and sometimes they get evicted into L2ARC with only a reference still remaining in ARC. But if processes consume enough memory to shrink the ARC to effectively nonexistent, then the L2ARC must also be effectively nonexistent.

> L2ARC is populated by a thread that watches the soon-to-be-evicted list.

This seems to imply that if processes start consuming a lot of memory, the first thing to disappear is the ARC, and the second thing to disappear is the L2ARC (because the L2ARC references stored in ARC get evicted from ARC after other things in ARC).

> AVL trees

Good to know. Thanks.

>> So the point is - Whenever you do a write, and the calculated DDT is
>> not already in ARC/L2ARC, the system will actually perform several
>> small reads looking for the DDT entry before it finally knows that the
>> DDT entry actually exists. So the penalty of performing a write, with
>> dedup enabled, and the relevant DDT entry not already in ARC/L2ARC is a
>> very large penalty.
>
> "very" is a relative term

Agreed. Here is what I was implying: Suppose you don't have enough RAM to hold the complete DDT, and you perform a bunch of random writes (whether sync or async). Then you will suffer a lot of cache misses searching for DDT entries, and the consequence is that every little write that could have had the disk penalty of just one little write instead has the disk penalty of several reads plus a write. So your random write performance is effectively several times slower than it could have been, if only you had more RAM. Reads are unaffected, except insofar as random-write congestion hogs disk time.
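(To put a rough number on "several times slower" -- pure back-of-envelope arithmetic, and the three extra reads per miss is an assumption, not a measurement:

  # a 7200rpm disk does on the order of 100 random IOPS
  # if each DDT miss costs 3 extra reads, each logical write costs 4 IOs
  # effective random-write rate: 100 / (1 + 3) = 25 writes/s, i.e. ~4x slower

The real miss cost depends on how deep the on-disk lookup goes and on what else is already cached.)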
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> Controls whether deduplication is in effect for a dataset. The default
> value is off. The default checksum used for deduplication is sha256
> (subject to change).
> <snip/>
> This is from b159.

This was fletcher4 earlier, and still is in opensolaris/openindiana. Given a combination with verify (which I would use anyway, since there is always a tiny chance of collisions), why would sha256 be a better choice?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Fri, Apr 29, 2011 at 7:10 AM, Roy Sigurd Karlsbakk <r...@karlsbakk.net> wrote:
> This was fletcher4 earlier, and still is in opensolaris/openindiana.
> Given a combination with verify (which I would use anyway, since there
> is always a tiny chance of collisions), why would sha256 be a better
> choice?

fletcher4 was only an option for snv_128, which was quickly pulled and replaced with snv_128b, which removed fletcher4 as an option. The official post is here:
http://www.opensolaris.org/jive/thread.jspa?threadID=118519&tstart=0#437431

It looks like fletcher4 is still an option in snv_151a for non-dedup datasets, and is in fact the default.

As an aside: Erik, any idea when the 159 bits will make it to the public?

-B
--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 4/29/2011 9:44 AM, Brandon High wrote:
> On Fri, Apr 29, 2011 at 7:10 AM, Roy Sigurd Karlsbakk <r...@karlsbakk.net> wrote:
>> This was fletcher4 earlier, and still is in opensolaris/openindiana.
>> Given a combination with verify (which I would use anyway, since there
>> is always a tiny chance of collisions), why would sha256 be a better
>> choice?
>
> fletcher4 was only an option for snv_128, which was quickly pulled and
> replaced with snv_128b, which removed fletcher4 as an option.
>
> It looks like fletcher4 is still an option in snv_151a for non-dedup
> datasets, and is in fact the default.
>
> As an aside: Erik, any idea when the 159 bits will make it to the public?

Yup, fletcher4 is still the default for any fileset not using dedup. It's good enough, and I can't see any reason to change it for those purposes (since its collision problems aren't much of an issue when just doing data integrity checks).

Sorry, no idea on release date stuff. I'm completely out of the loop on release info. I'm lucky if I can get a heads-up before it actually gets published internally. :-( I'm just a lowly Java Platform Group dude. Solaris ain't my silo.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Edward Ned Harvey
>
> I saved the core and ran again. This time it spewed "leaked space"
> messages for an hour, and completed. But the final result was physically
> impossible (it counted up 744k total blocks, which means something like
> 3 megs per block in my 2.39T used pool. I checked: compressratio is
> 1.00x and I have no compression.) I ran again. Still spewing messages.
> This can't be a good sign. Anyone know what it means, or what to do
> about it?

After running again, I get an even more impossible number ... 45.4K total blocks, which would mean something like 50 megs per block.

This pool scrubs regularly (every other week). In fact, it's scheduled to scrub this weekend.
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> What does it mean / what should you do, if you run that command, and it
> starts spewing messages like this?
> leaked space: vdev 0, offset 0x3bd8096e00, size 7168

And one of these:

  Assertion failed: space_map_load(msp->ms_map, &zdb_space_map_ops, 0x0,
  &msp->ms_smo, spa->spa_meta_objset) == 0, file ../zdb.c, line 1439,
  function zdb_leak_init
  Abort (core dumped)

I saved the core and ran again. This time it spewed "leaked space" messages for an hour, and completed. But the final result was physically impossible (it counted up 744k total blocks, which means something like 3 megs per block in my 2.39T used pool. I checked: compressratio is 1.00x and I have no compression.) I ran again. Still spewing messages. This can't be a good sign. Anyone know what it means, or what to do about it?
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Neil Perrin [mailto:neil.per...@oracle.com]
>
> The size of these structures will vary according to the release you're
> running. You can always find out the size for a particular system using
> ::sizeof within mdb. For example, as super user:
>
> : xvm-4200m2-02 ; echo ::sizeof ddt_entry_t | mdb -k
> sizeof (ddt_entry_t) = 0x178
> : xvm-4200m2-02 ; echo ::sizeof arc_buf_hdr_t | mdb -k
> sizeof (arc_buf_hdr_t) = 0x100
> : xvm-4200m2-02 ;

I can do the "echo | mdb -k". But what is that ": xvm-4200" command?
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
OK, I just re-looked at a couple of things, and here's what I /think/ are the correct numbers.

A single entry in the DDT is defined in the struct ddt_entry:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/ddt.h#108

I just checked, and the current size of this structure is 0x178, or 376 bytes.

Each ARC entry, which points to either an L2ARC item (of any kind: cached data, metadata, or a DDT line) or actual data/metadata/etc., is defined in the struct arc_buf_hdr:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#431

Its current size is 0xb0, or 176 bytes.

These are fixed-size structures. PLEASE - someone correct me if these two structures AREN'T what we should be looking at.

So, our estimate calculations have to be based on these new numbers. Back to the original scenario - 1TB (after dedup) of 4k blocks: how much space is needed for the DDT, and how much ARC space is needed if the DDT is kept on an L2ARC cache device?

Step 1) 1TB (2^40 bytes) stored in blocks of 4k (2^12) = 2^28 blocks total, which is about 268 million.

Step 2) 2^28 blocks of information in the DDT requires 376 bytes/block * 2^28 blocks = 94 * 2^30 = 94GB of space.

Step 3) Storing a reference to 268 million (2^28) DDT entries in the L2ARC will consume the following amount of ARC space: 176 bytes/entry * 2^28 entries = 44GB of RAM.

That's pretty ugly.

So, to summarize, for 1TB of data, broken into the following block sizes:

  Block size   DDT size          ARC consumption
  512b         752GB   (73%)     352GB   (34%)
  4k            94GB   (9%)       44GB   (4.3%)
  8k            47GB   (4.5%)     22GB   (2.1%)
  32k          11.75GB (2.2%)     5.5GB  (0.5%)
  64k           5.9GB  (1.1%)     2.75GB (0.3%)
  128k          2.9GB  (0.6%)     1.4GB  (0.1%)

ARC consumption presumes the whole DDT is stored in the L2ARC. Percentage size is relative to the original 1TB total data size.

Of course, the trickier proposition here is that we DON'T KNOW what our dedup value is ahead of time on a given data set. That is, given a data set of X size, we don't know how big the deduped data size will be. The above calculations are for DDT/ARC size for a data set that has already been deduped down to 1TB in size.

Perhaps it would be nice to have some sort of userland utility that builds its own DDT as a test and does all the above calculations, to see how dedup would work on a given dataset. 'zdb -S' sorta, kinda does that, but...

--
Erik Trimble
Java System Support
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
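The whole table reduces to two multiplications, so it's easy to redo for any block size. A throwaway sketch (assumes a shell with 64-bit arithmetic such as ksh93 or bash; the 376/176 struct sizes are the values quoted above and will vary by release):

  #!/bin/sh
  # usage: ddt_est <data_bytes> <block_size_bytes>
  data=$1
  bs=$2
  blocks=$(( data / bs ))
  echo "blocks:                 $blocks"
  echo "DDT size:               $(( blocks * 376 / 1073741824 )) GB"
  echo "ARC headers for L2ARC:  $(( blocks * 176 / 1073741824 )) GB"

Running it as 'ddt_est 1099511627776 4096' reproduces the 94GB / 44GB line of the table.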
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Erik Trimble [mailto:erik.trim...@oracle.com]
>
> OK, I just re-looked at a couple of things, and here's what I /think/
> are the correct numbers.
>
> I just checked, and the current size of this structure is 0x178, or 376
> bytes.
>
> Each ARC entry, which points to either an L2ARC item (of any kind:
> cached data, metadata, or a DDT line) or actual data/metadata/etc., is
> defined in the struct arc_buf_hdr:
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#431
>
> Its current size is 0xb0, or 176 bytes. These are fixed-size structures.

heheheh... See what I mean about all the conflicting sources of information? Is it 376 and 176? Or is it 270 and 200? Erik says it's fixed-size. Richard says "The DDT entries vary in size." So far, what Erik says is at least based on reading the source code, with a disclaimer of possibly misunderstanding the source code. What Richard says is just a statement of supposed absolute fact without any backing.

In any event, thank you both for your input. Can anyone answer these authoritatively? (Neil?) I'll send you a pizza. ;-)

> For 1TB of data, broken into the following block sizes:
>
>   Block size   DDT size          ARC consumption
>   512b         752GB   (73%)     352GB   (34%)
>   4k            94GB   (9%)       44GB   (4.3%)
>   8k            47GB   (4.5%)     22GB   (2.1%)
>   32k          11.75GB (2.2%)     5.5GB  (0.5%)
>   64k           5.9GB  (1.1%)     2.75GB (0.3%)
>   128k          2.9GB  (0.6%)     1.4GB  (0.1%)

At least the methodology to calculate all this seems reasonable to me. If the new numbers (376 and 176) are correct, I would just state it like this:

DDT size = 376 bytes * (number of unique blocks). You can find the number of blocks in an existing filesystem using "zdb -bb poolname".

ARC consumption = 176 bytes * (number of blocks in the L2ARC). You can estimate the number of blocks in L2ARC if you divide total pool disk usage by the number of blocks in the pool obtained above, to find the average block size in the pool. Divide the total L2ARC capacity by the average block size, and you get the number of average-sized blocks stored in your L2ARC. (Or take L2ARC capacity / total pool usage * number of blocks in the whole pool, to estimate the number of blocks in L2ARC.)

> ARC consumption presumes the whole DDT is stored in the L2ARC.
> Percentage size is relative to the original 1TB total data size.
>
> Of course, the trickier proposition here is that we DON'T KNOW what our
> dedup value is ahead of time on a given data set. That is, given a data
> set of X size, we don't know how big the deduped data size will be. The
> above calculations are for DDT/ARC size for a data set that has already
> been deduped down to 1TB in size.
>
> Perhaps it would be nice to have some sort of userland utility that
> builds its own DDT as a test and does all the above calculations, to see
> how dedup would work on a given dataset. 'zdb -S' sorta, kinda does
> that, but...
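To make that recipe concrete, a worked example with made-up pool numbers (every figure below is hypothetical, just to show the arithmetic):

  # zdb -bb tank          -> suppose it reports 20 million blocks
  # zpool list tank       -> suppose 1.2TB allocated
  # average block size    = 1.2TB / 20M blocks  ~= 64KB
  # blocks in 160GB L2ARC = 160GB / 64KB        ~= 2.6M
  # ARC for L2ARC headers = 2.6M * 176 bytes    ~= 440MB
  # DDT (if all unique)   = 20M * 376 bytes     ~= 7GB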
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 4/28/11 12:45 PM, Edward Ned Harvey wrote:
> heheheh... See what I mean about all the conflicting sources of
> information? Is it 376 and 176? Or is it 270 and 200? Erik says it's
> fixed-size. Richard says "The DDT entries vary in size."
>
> In any event, thank you both for your input. Can anyone answer these
> authoritatively? (Neil?) I'll send you a pizza. ;-)

I wouldn't consider myself an authority on the dedup code.

The size of these structures will vary according to the release you're running. You can always find out the size for a particular system using ::sizeof within mdb. For example, as super user:

  : xvm-4200m2-02 ; echo ::sizeof ddt_entry_t | mdb -k
  sizeof (ddt_entry_t) = 0x178
  : xvm-4200m2-02 ; echo ::sizeof arc_buf_hdr_t | mdb -k
  sizeof (arc_buf_hdr_t) = 0x100
  : xvm-4200m2-02 ;

This shows yet another size. Also, there are more changes planned within the arc. Sorry, I can't talk about those changes, nor when you'll see them.

However, that's not the whole story. It looks like the arc_buf_hdr_t use their own kmem cache, so there should be little wastage, but the ddt_entry_t are allocated from the generic kmem caches and so will probably have some roundup and unused space. Caches for small buffers are aligned to 64 bytes. See kmem_alloc_sizes[] and comment:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/kmem.c#920

Pizza: Mushroom and anchovy - er, just kidding.

Neil.
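(The roundup Neil describes is easy to bound: if the generic caches step in 64-byte increments, a 376-byte ddt_entry_t would be served from a 384-byte cache -- 8 wasted bytes, about 2% overhead per entry. Small per entry, but it compounds across the hundreds of millions of entries in the tables above. The 384-byte figure is an inference from the alignment comment, not something verified against kmem_alloc_sizes[].)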
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Thu, 2011-04-28 at 13:59 -0600, Neil Perrin wrote:
> The size of these structures will vary according to the release you're
> running. You can always find out the size for a particular system using
> ::sizeof within mdb.

Yup, that's how I got them. Just to add to the confusion, there are typedefs in the code which can make names slightly different:

  typedef struct arc_buf_hdr arc_buf_hdr_t;
  typedef struct ddt_entry ddt_entry_t;

I got my values from an x86 box running b159, and a SPARC box running S10u9. The values were the same from both. E.g.:

  root@invisible:~# uname -a
  SunOS invisible 5.11 snv_159 i86pc i386 i86pc Solaris
  root@invisible:~# isainfo
  amd64 i386
  root@invisible:~# mdb -k
  Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc
  pcplusmp scsi_vhci zfs ip hook neti arp usba uhci fctl stmf kssl
  stmf_sbd sockfs lofs random sata sd fcip cpc crypto nfs logindmux ptm
  ufs sppp ipc ]
  > ::sizeof struct arc_buf_hdr
  sizeof (struct arc_buf_hdr) = 0xb0
  > ::sizeof struct ddt_entry
  sizeof (struct ddt_entry) = 0x178

> This shows yet another size. Also, there are more changes planned
> within the arc. Sorry, I can't talk about those changes, nor when
> you'll see them.
>
> However, that's not the whole story. It looks like the arc_buf_hdr_t
> use their own kmem cache, so there should be little wastage, but the
> ddt_entry_t are allocated from the generic kmem caches and so will
> probably have some roundup and unused space. Caches for small buffers
> are aligned to 64 bytes. See kmem_alloc_sizes[] and comment:
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/kmem.c#920

Ugg. I hadn't even thought of memory alignment/allocation issues.

> Pizza: Mushroom and anchovy - er, just kidding.
>
> Neil.

And, let me say: Yuck! What is that, an ISO-standard pizza? Disgusting. ANSI-standard pizza, all the way! (pepperoni & mushrooms)

--
Erik Trimble
Java System Support
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Wed, Apr 27, 2011 at 9:26 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
> Correct me if I'm wrong, but the dedup sha256 checksum happens in
> addition to (not instead of) the fletcher2 integrity checksum. So after
> bootup,

My understanding is that enabling dedup forces sha256. "The default checksum used for deduplication is sha256 (subject to change). When dedup is enabled, the dedup checksum algorithm overrides the checksum property."

-B
--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Thu, 2011-04-28 at 14:33 -0700, Brandon High wrote:
> My understanding is that enabling dedup forces sha256. "The default
> checksum used for deduplication is sha256 (subject to change). When
> dedup is enabled, the dedup checksum algorithm overrides the checksum
> property."
>
> -B

From the man page for zfs(1):

  dedup=on | off | verify | sha256[,verify]

    Controls whether deduplication is in effect for a dataset. The
    default value is off. The default checksum used for deduplication
    is sha256 (subject to change). When dedup is enabled, the dedup
    checksum algorithm overrides the checksum property. Setting the
    value to verify is equivalent to specifying sha256,verify.

    If the property is set to verify, then, whenever two blocks have
    the same signature, ZFS will do a byte-for-byte comparison with
    the existing block to ensure that the contents are identical.

This is from b159.

A careful reading of the man page seems to imply that there's no way to change the dedup checksum algorithm from sha256, as the dedup property ignores the checksum property, and there's no provided way to explicitly set a checksum algorithm specific to dedup (i.e. there's no way to override the default for dedup).

--
Erik Trimble
Java System Support
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
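In other words, the only knob is the dedup property itself. For example (hypothetical pool/dataset name):

  # zfs set dedup=sha256,verify tank/data
  # zfs get dedup,checksum tank/data

The checksum property can still be set independently for ordinary integrity checking, but per the man page text above, on a deduped dataset the dedup checksum algorithm is what gets used.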
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Thu, Apr 28, 2011 at 3:05 PM, Erik Trimble <erik.trim...@oracle.com> wrote:
> A careful reading of the man page seems to imply that there's no way to
> change the dedup checksum algorithm from sha256, as the dedup property
> ignores the checksum property, and there's no provided way to explicitly
> set a checksum algorithm specific to dedup (i.e. there's no way to
> override the default for dedup).

That's my understanding as well. The initial release used fletcher4 or sha256, but there was either a bug in the fletcher4 code or a hash collision that required removing it as an option.

-B
--
Brandon High : bh...@freaks.com
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Brandon High [mailto:bh...@freaks.com]
> Sent: Thursday, April 28, 2011 5:33 PM
>
> My understanding is that enabling dedup forces sha256. "The default
> checksum used for deduplication is sha256 (subject to change). When
> dedup is enabled, the dedup checksum algorithm overrides the checksum
> property."

Interesting. So it would seem that the DDT probably does get populated into ARC simply by having read something from disk. That was one important consequence in discussion ... (The DDT does not only get populated into ARC during writes.)

PS. I'm only drawing conclusions here, so please tell me I'm wrong if I'm wrong somehow.

The other important consequence, not yet answered: When a block is scheduled to be written, the system computes its checksum and looks for a matching entry in the DDT in ARC/L2ARC. In the event of an ARC/L2ARC cache miss for a DDT entry which actually exists, the system will need to perform a number of small disk reads in order to fetch the DDT entry from disk. Correct? I figure at least one, probably more than one, read to locate the entry on disk, and then another read to actually read the entry. After this, the system knows there is a checksum match between the block waiting to be written and another block that's already on disk, and it could possibly have to do yet another read for verification, before it is able to finally do the write. Right?
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: Tomas Ögren [mailto:st...@acc.umu.se]
>
> zdb -bb pool

Oy - this is scary. Thank you, by the way, for that command - I've been gathering statistics across a handful of systems now ...

What does it mean / what should you do, if you run that command, and it starts spewing messages like this?

leaked space: vdev 0, offset 0x3bd8096e00, size 7168
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
[the dog jumped on the keyboard and wiped out my first reply, second attempt...]

On Apr 27, 2011, at 9:26 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Neil Perrin
>>
>> No, that's not true. The DDT is just like any other ZFS metadata and
>> can be split over the ARC, cache device (L2ARC) and the main pool
>> devices. An infrequently referenced DDT block will get evicted from the
>> ARC to the L2ARC then evicted from the L2ARC.
>
> When somebody has their baseline system, and they're thinking about
> adding dedup and/or cache, I'd like to understand the effect of not
> having enough ram. Obviously the impact will be performance, but
> precisely...

"precisely" only works when you know precisely what your data looks like. For most folks, that is unknown in advance.

slow disks + small RAM = bad recipe for dedup

> At bootup, I presume the arc & l2arc are all empty. So all the DDT
> entries reside in pool.

Yes.

> As the system reads things (anything, files etc) from pool, it will
> populate arc, and follow fill rate policies to populate the l2arc over
> time. Every entry in l2arc requires 200 bytes of arc, regardless of what
> type of entry it is. (A DDT entry in l2arc consumes just as much arc
> memory as any other type of l2arc entry.)

Approximately 200 bytes, and this is subject to change.

> (Ummm... What's the point of that? Aren't DDT entries 270 bytes and ARC
> references 200 bytes?

DDT entries vary in size. More references means more bytes needed.

> Seems like a very questionable benefit to allow DDT entries to get
> evicted into L2ARC.)
>
> So the ram consumption caused by the presence of l2arc will initially be
> zero after bootup, and it will grow over time as the l2arc populates, up
> to a maximum which is determined linearly as 200 bytes * the number of
> entries that can fit in the l2arc. Of course that number varies based on
> the size of each entry and size of l2arc, but at least you can estimate
> and establish upper and lower bounds.

Yes, this is simple enough to toss into a spreadsheet.

> So that's how the l2arc consumes system memory in arc. The penalty of
> insufficient ram, in conjunction with enabled L2ARC, is insufficient arc
> availability for other purposes - maybe the whole arc is consumed by
> l2arc entries, and so the arc doesn't have any room for other stuff like
> commonly used files.

I've never witnessed such a condition and doubt that it would happen.

> Worse yet, your arc consumption could be so large that PROCESSES don't
> fit in ram anymore. In this case, your processes get pushed out to swap
> space, which is really bad.

This will not happen. The ARC will be asked to shrink when other memory consumers demand memory. The lower bound of ARC size is c_min:

  # kstat -p zfs::arcstats:c_min

> Correct me if I'm wrong, but the dedup sha256 checksum happens in
> addition to (not instead of) the fletcher2 integrity checksum.

You are wrong, as others have pointed out. Documented in the man page.

> So after bootup, while the system is reading a bunch of data from the
> pool, all those reads are not populating the arc/l2arc with DDT entries.
> Reads are just populating the arc and l2arc with other stuff.

L2ARC is populated by a thread that watches the soon-to-be-evicted list. If the flow through the ARC is much greater than the throttle of the L2ARC filling thread, then the data just won't make it into the L2ARC. The throttle changes after the ARC fills, so it can warm the L2ARC faster, but then gets out of the way when needed.

> DDT entries don't get into the arc/l2arc until something tries to do a
> write. When performing a write, dedup calculates the checksum of the
> block to be written, and then it needs to figure out if that's a
> duplicate of another block that's already on disk somewhere. So (I guess
> this part) there's probably a tree-structure

AVL trees.

> (I'll use the "subdirectories and files" analogy even though I'm certain
> that's not technically correct) on disk. You need to find the DDT entry,
> if it exists, for the block whose checksum is 1234ABCD. So you start by
> looking under the "1" directory, and from there look for the "2"
> subdirectory, and then the "3" subdirectory, [...etc...] If you
> encounter "not found" at any step, then the DDT entry doesn't already
> exist and you decide to create a new one. But if you get all the way
> down to the "C" subdirectory and it contains a file named "D", then you
> have found a possible dedup hit - the checksum matched another block
> that's already on disk. Now the DDT entry is stored in ARC just like
> anything else you read from disk.

http://en.wikipedia.org/wiki/AVL_tree

> So the point is - Whenever you do a write, and the calculated DDT is not
> already in ARC/L2ARC, the system will actually perform several small
> reads looking for the DDT entry before it finally knows that the DDT
> entry actually exists. So the penalty of performing a write, with dedup
> enabled, and the relevant DDT entry not already in ARC/L2ARC is a very
> large penalty.

"very" is a relative term
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Erik Trimble
>
>> (BTW, is there any way to get a measurement of number of blocks
>> consumed per zpool? Per vdev? Per zfs filesystem?)
>
> *snip* you need to use zdb to see what the current block usage is for a
> filesystem. I'd have to look up the particular CLI usage for that, as I
> don't know what it is off the top of my head.

Anybody know the answer to that one?
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 27 April, 2011 - Edward Ned Harvey sent me these 0,6K bytes:
>> *snip* you need to use zdb to see what the current block usage is for
>> a filesystem. I'd have to look up the particular CLI usage for that,
>> as I don't know what it is off the top of my head.
>
> Anybody know the answer to that one?

zdb -bb pool

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
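For example (hypothetical pool name; the exact layout of the output varies by release, and as noted elsewhere in this thread the traversal itself can be memory-hungry on large pools):

  # zdb -bb tank

The block statistics near the end of the output include the total block count, which is the "number of blocks" input to the DDT and ARC sizing arithmetic elsewhere in this thread.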
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Neil Perrin
>
> No, that's not true. The DDT is just like any other ZFS metadata and can
> be split over the ARC, cache device (L2ARC) and the main pool devices.
> An infrequently referenced DDT block will get evicted from the ARC to
> the L2ARC then evicted from the L2ARC.

When somebody has their baseline system, and they're thinking about adding dedup and/or cache, I'd like to understand the effect of not having enough ram. Obviously the impact will be performance, but precisely...

At bootup, I presume the arc & l2arc are all empty. So all the DDT entries reside in pool. As the system reads things (anything, files etc) from pool, it will populate arc, and follow fill rate policies to populate the l2arc over time. Every entry in l2arc requires 200 bytes of arc, regardless of what type of entry it is. (A DDT entry in l2arc consumes just as much arc memory as any other type of l2arc entry.) (Ummm... What's the point of that? Aren't DDT entries 270 bytes and ARC references 200 bytes? Seems like a very questionable benefit to allow DDT entries to get evicted into L2ARC.)

So the ram consumption caused by the presence of l2arc will initially be zero after bootup, and it will grow over time as the l2arc populates, up to a maximum which is determined linearly as 200 bytes * the number of entries that can fit in the l2arc. Of course that number varies based on the size of each entry and size of l2arc, but at least you can estimate and establish upper and lower bounds.

So that's how the l2arc consumes system memory in arc. The penalty of insufficient ram, in conjunction with enabled L2ARC, is insufficient arc availability for other purposes - maybe the whole arc is consumed by l2arc entries, and so the arc doesn't have any room for other stuff like commonly used files. Worse yet, your arc consumption could be so large that PROCESSES don't fit in ram anymore. In this case, your processes get pushed out to swap space, which is really bad.

Correct me if I'm wrong, but the dedup sha256 checksum happens in addition to (not instead of) the fletcher2 integrity checksum. So after bootup, while the system is reading a bunch of data from the pool, all those reads are not populating the arc/l2arc with DDT entries. Reads are just populating the arc and l2arc with other stuff.

DDT entries don't get into the arc/l2arc until something tries to do a write. When performing a write, dedup calculates the checksum of the block to be written, and then it needs to figure out if that's a duplicate of another block that's already on disk somewhere. So (I guess this part) there's probably a tree-structure (I'll use the "subdirectories and files" analogy even though I'm certain that's not technically correct) on disk. You need to find the DDT entry, if it exists, for the block whose checksum is 1234ABCD. So you start by looking under the "1" directory, and from there look for the "2" subdirectory, and then the "3" subdirectory, [...etc...] If you encounter "not found" at any step, then the DDT entry doesn't already exist and you decide to create a new one. But if you get all the way down to the "C" subdirectory and it contains a file named "D", then you have found a possible dedup hit - the checksum matched another block that's already on disk. Now the DDT entry is stored in ARC just like anything else you read from disk.

So the point is - whenever you do a write, and the calculated DDT is not already in ARC/L2ARC, the system will actually perform several small reads looking for the DDT entry before it finally knows that the DDT entry actually exists. So the penalty of performing a write, with dedup enabled, and the relevant DDT entry not already in ARC/L2ARC is a very large penalty. What originated as a single write quickly became several small reads plus a write, due to the fact that the necessary DDT entry was not already available. The penalty of insufficient ram, in conjunction with dedup, is terrible write performance.
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Apr 27, 2011, at 9:26 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
> When somebody has their baseline system, and they're thinking about
> adding dedup and/or cache, I'd like to understand the effect of not
> having enough ram. Obviously the impact will be performance, but
> precisely...

Precision is only possible if you know what the data looks like...

> At bootup, I presume the arc & l2arc are all empty. So all the DDT
> entries reside in pool. As the system reads things (anything, files etc)
> from pool, it will populate arc, and follow fill rate policies to
> populate the l2arc over time. Every entry in l2arc requires 200 bytes of
> arc, regardless of what type of entry it is. (A DDT entry in l2arc
> consumes just as much arc memory as any other type of l2arc entry.)
> (Ummm... What's the point of that? Aren't DDT entries 270 bytes and ARC
> references 200 bytes?

No. The DDT entries vary in size.

> Seems like a very questionable benefit to allow DDT entries to get
> evicted into L2ARC.)
>
> So the ram consumption caused by the presence of l2arc will initially be
> zero after bootup, and it will grow over time as the l2arc populates, up
> to a maximum which is determined linearly as 200 bytes * the number of
> entries that can fit in the l2arc. Of course that number varies based on
> the size of each entry and size of l2arc, but at least you can estimate
> and establish upper and lower bounds.

The upper and lower bounds vary by 256x, unless you know what the data looks like more precisely.

> So that's how the l2arc consumes system memory in arc. The penalty of
> insufficient ram, in conjunction with enabled L2ARC, is insufficient arc
> availability for other purposes - maybe the whole arc is consumed by
> l2arc entries, and so the arc doesn't have any room for other stuff like
> commonly used files.

I've never seen this.

> Worse yet, your arc consumption could be so large that PROCESSES don't
> fit in ram anymore. In this case, your processes get pushed out to swap
> space, which is really bad.

[for Solaris, illumos, and NexentaOS] This will not happen unless the ARC size is at arc_min. At that point you are already close to severe memory shortfall.

> Correct me if I'm wrong, but the dedup sha256 checksum happens in
> addition to (not instead of) the fletcher2 integrity checksum.

You are mistaken.

> So after bootup, while the system is reading a bunch of data from the
> pool, all those reads are not populating the arc/l2arc with DDT entries.
> Reads are just populating the arc and l2arc with other stuff.

L2ARC is populated by a separate thread that watches the to-be-evicted list. The L2ARC fill rate is also throttled, so that under severe shortfall, blocks will be evicted without being placed in the L2ARC.

> DDT entries don't get into the arc/l2arc until something tries to do a
> write.

No, the DDT entry contains the references to the actual data.

> When performing a write, dedup calculates the checksum of the block to
> be written, and then it needs to figure out if that's a duplicate of
> another block that's already on disk somewhere. So (I guess this part)
> there's probably a tree-structure (I'll use the "subdirectories and
> files" analogy even though I'm certain that's not technically correct)
> on disk.

Implemented as an AVL tree.

> You need to find the DDT entry, if it exists, for the block whose
> checksum is 1234ABCD. [...] Now the DDT entry is stored in ARC just like
> anything else you read from disk.

DDT is metadata, not data, so it is more constrained than data entries in the ARC.

> So the point is - Whenever you do a write, and the calculated DDT is not
> already in ARC/L2ARC, the system will actually perform several small
> reads looking for the DDT entry before it finally knows that the DDT
> entry actually exists. [...] The penalty of insufficient ram, in
> conjunction with dedup, is terrible write performance.
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
----- Original Message -----
> On 04/25/11 11:55, Erik Trimble wrote:
>> On 4/25/2011 8:20 AM, Edward Ned Harvey wrote:
>>> And one more comment: Based on what's below, it seems that the DDT
>>> gets stored on the cache device and also in RAM. Is that correct?
>>> What if you didn't have a cache device? Shouldn't it *always* be in
>>> ram? And doesn't the cache device get wiped every time you reboot?
>>> It seems to me like putting the DDT on the cache device would be
>>> harmful... Is that really how it is?
>>
>> Nope. The DDT is stored only in one place: cache device if present,
>> /or/ RAM otherwise (technically, ARC, but that's in RAM). If a cache
>> device is present, the DDT is stored there, BUT RAM also must store a
>> basic lookup table for the DDT (yea, I know, a lookup table for a
>> lookup table).
>
> No, that's not true. The DDT is just like any other ZFS metadata and
> can be split over the ARC, cache device (L2ARC) and the main pool
> devices. An infrequently referenced DDT block will get evicted from the
> ARC to the L2ARC then evicted from the L2ARC.

And with the default size of a ZFS configuration's metadata being (RAM size - 1GB) / 4, without tuning, and with 128kB blocks all over, you'll need some 5-6GB+ per terabyte stored.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
> After modifications that I hope are corrections, I think the post should
> look like this:
>
> The rule-of-thumb is 270 bytes/DDT entry, and 200 bytes of ARC for every
> L2ARC entry. The DDT doesn't count for this ARC space usage.
>
> E.g.: I have 1TB of 4k blocks that are to be deduped, and it turns out
> that I have about a 5:1 dedup ratio. I'd also like to see how much ARC
> usage I eat up with a 160GB L2ARC.
>
> (1) How many entries are there in the DDT? 1TB of 4k blocks means there
> are 268 million blocks. However, at a 5:1 dedup ratio, I'm only actually
> storing 20% of that, so I have about 54 million blocks. Thus, I need a
> DDT of about 270 bytes * 54 million =~ 14GB in size.
>
> (2) My L2ARC is 160GB in size, but I'm using 14GB for the DDT. Thus, I
> have 146GB free for use as a data cache. 146GB / 4k =~ 38 million blocks
> can be stored in the remaining L2ARC space. However, those 38 million
> blocks take up: 200 bytes * 38 million =~ 7GB of space in ARC.
>
> Thus, I'd better spec my system with (whatever base RAM for basic OS and
> cache and application requirements) + 14G because of dedup + 7G because
> of L2ARC.

Thanks, but one more thing: add some tuning parameters, such as

  set zfs:zfs_arc_meta_limit = somevalue

in /etc/system, to help ZFS use more memory for its metadata (like the DDT), as it won't use more than (RAM - 1GB) / 4 by default.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
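For instance (illustrative value only, not a recommendation -- zfs_arc_meta_limit is a real tunable but the right number depends on the box), a 48GB system whose DDT estimate came out around 14GB might carry:

  set zfs:zfs_arc_meta_limit = 17179869184

in /etc/system (16GB, expressed in bytes), followed by a reboot. Current metadata usage against the limit can be checked with:

  # kstat -p zfs::arcstats:arc_meta_used
  # kstat -p zfs::arcstats:arc_meta_limit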
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 4/25/2011 8:20 AM, Edward Ned Harvey wrote:
> There are a lot of conflicting references on the Internet, so I'd really
> like to solicit actual experts (ZFS developers or people who have
> physical evidence) to weigh in on this... After searching around, the
> reference I found to be the most seemingly useful was Erik's post here:
> http://opensolaris.org/jive/thread.jspa?threadID=131296
>
> Unfortunately it looks like there's an arithmetic error (1TB of 4k
> blocks means 268 million blocks, not 1 billion). Also, IMHO it seems
> important to make the distinction, #files != #blocks. Due to the
> existence of larger files, there will sometimes be more than one block
> per file; and if I'm not mistaken, thanks to write aggregation, there
> will sometimes be more than one file per block. YMMV. Average block size
> could be anywhere between 1 byte and 128k, assuming default recordsize.
> (BTW, recordsize seems to be a zfs property, not a zpool property. So
> how can you know or configure the blocksize for something like a zvol
> iscsi target?)

I said 2^30; the right figure is 2^28, which is roughly a quarter billion. I should have been more exact. And, the file != block difference is important to note.

zvols also take a Recordsize attribute. And, zvols tend to be sticklers about all blocks being /exactly/ the recordsize value, unlike filesystems, which use it as a *maximum* block size. Min block size is 512 bytes.

> (BTW, is there any way to get a measurement of number of blocks consumed
> per zpool? Per vdev? Per zfs filesystem?)
>
> The calculations below are based on the assumption of 4KB blocks adding
> up to a known total data consumption. The actual thing that matters is
> the number of blocks consumed, so the conclusions drawn will vary
> enormously when people actually have average block sizes != 4KB.

You need to use zdb to see what the current block usage is for a filesystem. I'd have to look up the particular CLI usage for that, as I don't know what it is off the top of my head.

> And one more comment: Based on what's below, it seems that the DDT gets
> stored on the cache device and also in RAM. Is that correct? What if you
> didn't have a cache device? Shouldn't it *always* be in ram? And doesn't
> the cache device get wiped every time you reboot? It seems to me like
> putting the DDT on the cache device would be harmful... Is that really
> how it is?

Nope. The DDT is stored only in one place: cache device if present, /or/ RAM otherwise (technically, ARC, but that's in RAM). If a cache device is present, the DDT is stored there, BUT RAM also must store a basic lookup table for the DDT (yea, I know, a lookup table for a lookup table).

My minor corrections here:

The rule-of-thumb is 270 bytes/DDT entry, and 200 bytes of ARC for every L2ARC entry. Since the DDT is stored on the cache device, the DDT itself doesn't consume any ARC space when stored in an L2ARC cache.

E.g.: I have 1TB of 4k blocks that are to be deduped, and it turns out that I have about a 5:1 dedup ratio. I'd also like to see how much ARC usage I eat up with using a 160GB L2ARC to store my DDT on.

(1) How many entries are there in the DDT? 1TB of 4k blocks means there are 268 million blocks. However, at a 5:1 dedup ratio, I'm only actually storing 20% of that, so I have about 54 million blocks. Thus, I need a DDT of about 270 bytes * 54 million =~ 14GB in size.

(2) How much ARC space does this DDT take up? The 54 million entries in my DDT take up about 200 bytes * 54 million =~ 10GB of ARC space, so I need to have 10GB of RAM dedicated just to storing the references to the DDT in the L2ARC.

(3) How much space do I have left on the L2ARC device, and how many blocks can that hold? Well, I have 160GB - 14GB (DDT) = 146GB of cache space left on the device, which, assuming I'm still using 4k blocks, means I can cache about 37 million 4k blocks, or about 66% of my (deduped) data. This extra cache of blocks in the L2ARC would eat up 200 bytes * 37 million =~ 7.5GB of ARC entries.

Thus, for the aforementioned dedup scenario, I'd better spec it with (whatever base RAM for basic OS and ordinary ZFS cache and application requirements) + at least a 14G L2ARC device for dedup + 10G more of RAM for the DDT L2ARC requirements + 1GB of RAM for every 20GB of additional space in the L2ARC cache beyond that used by the DDT.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
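(That last 1GB-per-20GB rule falls straight out of the constants: one ~200-byte ARC header indexes one 4k L2ARC block, and 4096 / 200 =~ 20, so every 20GB of 4k-block L2ARC data costs about 1GB of ARC. Larger average blocks cost proportionally less ARC per GB of L2ARC.)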
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On 04/25/11 11:55, Erik Trimble wrote:
> On 4/25/2011 8:20 AM, Edward Ned Harvey wrote:
>> And one more comment: Based on what's below, it seems that the DDT
>> gets stored on the cache device and also in RAM. Is that correct?
>> What if you didn't have a cache device? Shouldn't it *always* be in
>> ram? And doesn't the cache device get wiped every time you reboot?
>> It seems to me like putting the DDT on the cache device would be
>> harmful... Is that really how it is?
>
> Nope. The DDT is stored only in one place: cache device if present,
> /or/ RAM otherwise (technically, ARC, but that's in RAM). If a cache
> device is present, the DDT is stored there, BUT RAM also must store a
> basic lookup table for the DDT (yea, I know, a lookup table for a
> lookup table).

No, that's not true. The DDT is just like any other ZFS metadata and can be split over the ARC, cache device (L2ARC) and the main pool devices. An infrequently referenced DDT block will get evicted from the ARC to the L2ARC then evicted from the L2ARC.

Neil.
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Mon, Apr 25, 2011 at 10:55 AM, Erik Trimble <erik.trim...@oracle.com> wrote:
> Min block size is 512 bytes.

Technically, isn't the minimum block size 2^(ashift value)? Thus, on 4 KB disks where the vdevs have ashift=12, the minimum block size will be 4 KB.

--
Freddie Cash
fjwc...@gmail.com
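(A pool's ashift is visible in the cached configuration, e.g. via "zdb | grep ashift", which prints one ashift line per top-level vdev -- this grep recipe is the common folk method rather than a documented interface. ashift=9 means 2^9 = 512-byte minimum blocks; ashift=12 means 2^12 = 4096.)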
Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)
On Mon, Apr 25, 2011 at 8:20 AM, Edward Ned Harvey <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
> and 128k assuming default recordsize. (BTW, recordsize seems to be a zfs
> property, not a zpool property. So how can you know or configure the
> blocksize for something like a zvol iscsi target?)

zvols use the 'volblocksize' property, which defaults to 8k. A 1TB zvol is therefore 2^27 blocks and would require ~34 GB for the DDT (assuming that a DDT entry is 270 bytes).

The zfs man page entry for the property reads:

  volblocksize=blocksize

    For volumes, specifies the block size of the volume. The blocksize
    cannot be changed once the volume has been written, so it should be
    set at volume creation time. The default blocksize for volumes is
    8 Kbytes. Any power of 2 from 512 bytes to 128 Kbytes is valid.

    This property can also be referred to by its shortened column name,
    volblock.

-B
--
Brandon High : bh...@freaks.com
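So for zvols the choice has to be made at creation time, e.g. (hypothetical names and size):

  # zfs create -V 100G -o volblocksize=4k tank/iscsi-vol

Note that halving volblocksize from the 8k default to 4k doubles the block count of the same zvol, and with it the DDT and ARC-header figures from the sizing arithmetic earlier in the thread.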