OK, I just re-looked at a couple of things, and here's what I /think/ are the correct numbers.
A single entry in the DDT is defined by the struct "ddt_entry": http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/ddt.h#108  I just checked, and the current size of this structure is 0x178, or 376 bytes.

Each ARC entry, which points to either an L2ARC item (of any kind: cached data, metadata, or a DDT line) or to actual data/metadata/etc., is defined by the struct "arc_buf_hdr": http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#431  Its current size is 0xb0, or 176 bytes. Both are fixed-size structures.

-------- PLEASE - someone correct me if these two structures AREN'T what we should be looking at. --------

So, our estimates have to be based on these new numbers. Back to the original scenario - 1TB (after dedup) of 4k blocks: how much space is needed for the DDT, and how much ARC space is needed if the DDT is kept on an L2ARC cache device?

Step 1) 1TB (2^40 bytes) stored in blocks of 4k (2^12 bytes) = 2^28 blocks total, which is about 268 million.

Step 2) Tracking 2^28 blocks in the DDT requires 376 bytes/block * 2^28 blocks = 94 * 2^30 bytes = 94GB of space.

Step 3) Storing references to those 268 million (2^28) DDT entries in the L2ARC consumes 176 bytes/entry * 2^28 entries = 44GB of ARC space (i.e. RAM).

That's pretty ugly. So, to summarize, for 1TB of (deduped) data, broken into the following block sizes:

Block size   DDT size           ARC consumption
512b         752GB    (73%)     352GB   (34%)
4k           94GB     (9.2%)    44GB    (4.3%)
8k           47GB     (4.6%)    22GB    (2.1%)
32k          11.75GB  (1.1%)    5.5GB   (0.5%)
64k          5.9GB    (0.6%)    2.75GB  (0.3%)
128k         2.9GB    (0.3%)    1.4GB   (0.1%)

ARC consumption presumes the whole DDT is stored in the L2ARC. Percentages are relative to the original 1TB total data size.

Of course, the trickier proposition here is that we DON'T KNOW ahead of time what the dedup ratio on a given data set will be. That is, given a data set of size X, we don't know how big the deduped data will be.
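The table above can be reproduced with a few lines of Python. This is just a back-of-the-envelope sketch under the assumptions stated earlier (sizeof(ddt_entry) == 376 bytes, sizeof(arc_buf_hdr) == 176 bytes, and a 1TB already-deduped data set); it is not reading anything out of a live pool.

```python
# Assumed fixed structure sizes, as measured from the OpenSolaris source:
DDT_ENTRY_SIZE = 376      # bytes per ddt_entry
ARC_HDR_SIZE   = 176      # bytes per arc_buf_hdr
DATA_SIZE      = 2 ** 40  # 1TB of data, *after* dedup

for block_size in (512, 4096, 8192, 32768, 65536, 131072):
    nblocks = DATA_SIZE // block_size          # unique blocks to track
    ddt_bytes = nblocks * DDT_ENTRY_SIZE       # on-disk/L2ARC DDT footprint
    arc_bytes = nblocks * ARC_HDR_SIZE         # ARC headers if DDT lives in L2ARC
    print("%7d  DDT %8.2f GB (%4.1f%%)  ARC %7.2f GB (%4.1f%%)" % (
        block_size,
        ddt_bytes / 2 ** 30, 100.0 * ddt_bytes / DATA_SIZE,
        arc_bytes / 2 ** 30, 100.0 * arc_bytes / DATA_SIZE))
```

For the 4k row this prints 94 GB of DDT and 44 GB of ARC, matching the step-by-step calculation above.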
The above calculations give the DDT/ARC sizes for a data set that has already been deduped down to 1TB. Perhaps it would be nice to have some sort of userland utility that builds its own DDT as a test and runs all the above calculations, to see how dedup would work on a given dataset. 'zdb -S' sorta, kinda does that, but...

--
Erik Trimble
Java System Support
Mailstop: usca22-317
Phone: x67195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss