I am suspicious of dedup ratios in general. What I have found is that I can divide my data by 4 and be fairly accurate as to how much storage the DD will need. This formula has worked for 2 TSM (12-14:1) and 2 BE (20-25:1) sites, so I would not call it proven, except in my little world. BRMS seems to be different.
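
For illustration only, the divide-by-4 rule of thumb can be written as a tiny Python sketch. The function name, the default divisor argument, and the 100 TB figure are hypothetical, and "my data" is assumed to mean the amount of backup data held by the backup application.

```python
def estimate_dd_storage_tb(backup_data_tb, divisor=4.0):
    """Rule-of-thumb estimate of Data Domain capacity needed.

    backup_data_tb: backup data held by the backup application, in TB
                    (assumed meaning of "my data" in the message above).
    divisor:        the rule-of-thumb factor; 4 is the value quoted above.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return backup_data_tb / divisor


# Hypothetical site holding 100 TB of backup data.
print(estimate_dd_storage_tb(100))  # 25.0
```

The point of keeping the divisor as a parameter is that a different backup product may need a different factor.
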
Andy Huebner

Perhaps this conversation should be at:
The Data Domain Admins List
http://lists.ufl.edu/cgi-bin/wa?A0=DD-ADMINS-L

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[email protected]] On Behalf Of Shawn Drew
Sent: Thursday, April 19, 2012 11:12 AM
To: [email protected]
Subject: Re: [ADSM-L] DataDomain and dedup per node

I was told the only reason EMC recommends turning off collocation is that collocation generally shoots up the individual volume count, and they also recommend a relatively high reclamation threshold. I think these 2 factors together might end up in a lot of wasted unreclaimed space. I think it would be OK if you were more aggressive with your reclamation. Something to keep an eye on at the least.

On another note, I've always been suspicious of whether or not granular analysis like this is accurate. The deduplication of a single file would vary depending on the other data that is on the system, which is constantly changing. If you delete all the other files that share data with this one, will the deduplication factor of this file shoot up? If so, then the deduplication ratio means nothing for a single file the way a compression ratio would. I think it really only applies to the storage pool as a whole.

Using collocation to identify "bad dedupe citizens" sounds reasonable, but only if the values being returned by the "filesys show compression" command are accurate. Is that data dynamically updated? Are the individual file deduplication ratios updated automatically as data is written or cleaned? I remember Falconstor only recorded the deduplication ratio of a virtual tape at the time the data was written, and it was not updated afterwards. I find it hard to believe this is dynamically maintained by the Data Domain, but I'd definitely want to know before switching to collocation for this purpose.

Deduplication adds an abstraction layer between the file metadata and the actual storage.
I don't see how you could really get an accurate picture of the true storage an individual file is occupying since it is sharing space. Say there are 10x 100MB files sharing 50 percent of their data with each other. How much space is one of those files occupying?

Regards,
Shawn
________________________________________________
Shawn Drew

Internet [email protected]
Sent by: [email protected]
04/19/2012 09:27 AM
Please respond to [email protected]
To: ADSM-L
cc:
Subject: [ADSM-L] DataDomain and dedup per node

Hi Everyone,

As we have been implementing our two new DD boxes we have been setting them up like our existing two DD boxes - file devices with the pool NOT collocated. This is what DD recommends and it seems to work very well this way. But, I've been thinking about collocating anyway!

I was poking around the DD command line and found that you can get the dedup/compression information for any individual directory or file. For example, below is the dedup/comp factors for a file volume in a pool with one node I'm testing with:

    rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q nodedata WVLOGS01P" | grep isdd2260
    WVLOGS01p  /isdd2260/tsm2/test/0002267E.BFS  TEST-PRI-ISDD2260  30,551.83
    WVLOGS01P  /isdd2260/tsm2/test/0002267F.BFS  TEST-PRI-ISDD2260  30,621.15
    WVLOGS01P  /isdd2260/tsm2/test/00022680.BFS  TEST-PRI-ISDD2260  30,601.55
    WVLOGS01P  /isdd2260/tsm2/test/00022682.BFS  TEST-PRI-ISDD2260  30,604.08
    WVLOGS01P  /isdd2260/tsm2/test/00022683.BFS  TEST-PRI-ISDD2260  30,620.86
    WVLOGS01P  /isdd2260/tsm2/test/00022684.BFS  TEST-PRI-ISDD2260   4,731.24

    rsbkup:/tsmdata/tsm_scripts==>./run_cmd.ksh tsm2 "q vol /isdd2260/tsm2/test/0002267E.BFS"
    /isdd2260/tsm2/test/0002267E.BFS  TEST-PRI-ISDD2260  TEST  30.6 G  100.0  Full

    sysadmin@isdd2260# filesys show compression /data/col1/tsm2/test/0002267e.bfs
    Total files: 1;  bytes/storage_used: 4.6
           Original Bytes:   32,332,636,620
      Globally Compressed:   30,695,597,675
       Locally Compressed:    6,930,888,022
                Meta-data:       98,615,480

In this case, this vol is getting a 4.6x
overall dedup/comp factor. So, if I collocate the pool in TSM I should be able to use "q nodedata <node>" to get a list of vols used by a node, then I can query the DD to get the dedup/comp stats for that node. A little scripting and I can generate a report of dedup/comp ratios by TSM node. This would help us decide which nodes make sense to put/keep on the DD.

Just curious if anyone is using collocation for a DD file pool? To do so would use more volumes and more filling volumes, but I can't think of any real reason to not collocate.

Rick
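
The per-node report idea could be sketched roughly as follows. This is a minimal Python sketch, not a tested tool: it parses the "filesys show compression" output exactly as pasted above, and it assumes that bytes/storage_used equals Original Bytes divided by (Locally Compressed + Meta-data). That assumption reproduces the 4.6 in the sample, but it is an inference from these numbers, not something confirmed by Data Domain documentation. The function names and the idea of passing one parsed result per volume are hypothetical.

```python
import re

# Output of "filesys show compression" for one volume, as pasted in the thread.
SAMPLE = """\
Total files: 1;  bytes/storage_used: 4.6
       Original Bytes:   32,332,636,620
  Globally Compressed:   30,695,597,675
   Locally Compressed:    6,930,888,022
            Meta-data:       98,615,480
"""

FIELDS = ("Original Bytes", "Globally Compressed",
          "Locally Compressed", "Meta-data")


def parse_compression(text):
    """Return {field name: byte count} for one command output."""
    stats = {}
    for name in FIELDS:
        match = re.search(re.escape(name) + r":\s*([\d,]+)", text)
        if match is None:
            raise ValueError(f"missing field: {name}")
        stats[name] = int(match.group(1).replace(",", ""))
    return stats


def node_ratio(per_volume_stats):
    """Combined ratio for a node: total original bytes / total stored bytes.

    Assumption: stored bytes = Locally Compressed + Meta-data.
    """
    original = sum(s["Original Bytes"] for s in per_volume_stats)
    stored = sum(s["Locally Compressed"] + s["Meta-data"]
                 for s in per_volume_stats)
    return original / stored


stats = parse_compression(SAMPLE)
ratio = node_ratio([stats])  # one volume here; pass all of a node's volumes
print(f"WVLOGS01P: {ratio:.1f}x")  # WVLOGS01P: 4.6x
```

Summing bytes across a node's volumes before dividing weights each volume by its size, which a plain average of per-volume ratios would not do. Whether the per-file figures stay accurate as other data is written or cleaned is the open question raised earlier in the thread, and this sketch does not answer it.
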
