Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-27 Thread Tim.Kreis
The problem is that the Windows server backup seems to choose dynamic VHD (which would make sense in most cases) and I don't know if there is a way to change that. Using iSCSI volumes won't help in my case since servers are running on physical hardware. On 27.04.2010 01:54, Brandon wrote:

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-27 Thread Roy Sigurd Karlsbakk
- Tim.Kreis tim.kr...@gmx.de wrote: The problem is that the Windows server backup seems to choose dynamic VHD (which would make sense in most cases) and I don't know if there is a way to change that. Using iSCSI volumes won't help in my case since servers are running on physical

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread Constantin Gonzalez
Hi Tim, thanks for sharing your dedup experience. Especially for Virtualization, having a good pool of experience will help a lot of people. So you see a dedup ratio of 1.29 for two installations of Windows Server 2008 on the same ZFS backing store, if I understand you correctly. What dedup

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread tim Kries
Hi, The setting was this: fresh installation of 2008 R2 -> server backup with the backup feature -> move VHD to ZFS -> install Active Directory role -> backup again -> move VHD to same share. I am kinda confused over the change of dedup ratio from changing the record size, since it should dedup

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread tim Kries
I found the VHD specification here: http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc I am not sure if I understand it right, but it seems like data on disk gets compressed into the VHD (no empty space), so even

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread Brandon High
On Mon, Apr 26, 2010 at 8:51 AM, tim Kries tim.kr...@gmx.de wrote: I am kinda confused over the change of dedup ratio from changing the record size, since it should dedup 256-bit blocks. Dedup works on blocks of either recordsize or volblocksize. The checksum is made per block written, and
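Brandon's point explains the experiment upthread: ZFS checksums and dedups whole records, so two nearly identical VHD images only dedup where the unchanged data falls on identical, aligned blocks. A toy model of that effect (plain Python, not ZFS itself; the image size, edit pattern, and block sizes are illustrative assumptions):

```python
import hashlib
import random

def block_hashes(data, block_size):
    """One checksum per fixed-size block, the way ZFS dedup matches whole records."""
    return [hashlib.sha256(bytes(data[i:i + block_size])).hexdigest()
            for i in range(0, len(data), block_size)]

def dedup_ratio(a, b, block_size):
    """Logical blocks referenced / unique blocks stored, for two images."""
    hashes = block_hashes(a, block_size) + block_hashes(b, block_size)
    return len(hashes) / len(set(hashes))

random.seed(0)
base = random.randbytes(1024 * 1024)      # a 1 MiB "VHD image"
modified = bytearray(base)                # same image after a few small writes
for k in range(16):                       # 16 scattered 100-byte changes
    off = k * 64 * 1024 + 100
    modified[off:off + 100] = b"\xff" * 100

# Large records: every 128K block contains some change -> no dedup (1.00x).
print(dedup_ratio(base, modified, 128 * 1024))
# Small records: only 16 of 256 4K blocks differ -> roughly 1.88x.
print(dedup_ratio(base, modified, 4096))
```

The same data at a 4K block size dedups far better than at 128K, which matches the 1.00x vs 1.29x recordsize observation in this thread.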

[zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread tim Kries
Hi, I have been playing with OpenSolaris for a while now. Today I tried to deduplicate the backup VHD files Windows Server 2008 generates. I made a backup before and after installing the AD role and copied the files to the share on OpenSolaris (build 134). First I got a straight 1.00x, then I set recordsize

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread Richard Jahnel
You might note, dedup only dedupes data that is written after the flag is set. It does not retroactively dedupe already written data. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread tim Kries
It was active all the time. Made a new zfs with -o dedup=on, copied with default record size, got no dedup, deleted files, set recordsize 4k, dedup ratio 1.29x

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread Khyron
A few things come to mind... 1. A lot better than...what? Setting the recordsize to 4K got you some deduplication but maybe the pertinent question is what were you expecting? 2. Dedup is fairly new. I haven't seen any reports of experiments like yours so...CONGRATULATIONS!! You're probably

Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-23 Thread tim Kries
Dedup is a key element for my purpose, because I am planning a central repository for like 150 Windows Server 2008 (R2) servers, which would take a lot less storage if they dedup right.

Re: [zfs-discuss] ZFS Deduplication Replication

2009-11-24 Thread Peter Brouwer, Principal Storage Architect
Hi Darren, Could you post the -D part of the man pages? I have no access to a system (yet) with the latest man pages. http://docs.sun.com/app/docs/doc/819-2240/zfs-1m has not been updated yet. Regards Peter Darren J Moffat wrote: Steven Sim wrote: Hello; Dedup on ZFS is an

[zfs-discuss] ZFS Deduplication Replication

2009-11-16 Thread Steven Sim
Hello; Dedup on ZFS is an absolutely wonderful feature! Is there a way to conduct dedup replication across boxes from one dedup ZFS data set to another? Warmest Regards Steven Sim

Re: [zfs-discuss] ZFS Deduplication Replication

2009-11-16 Thread Darren J Moffat
Steven Sim wrote: Hello; Dedup on ZFS is an absolutely wonderful feature! Is there a way to conduct dedup replication across boxes from one dedup ZFS data set to another? Pass the '-D' argument to 'zfs send'. -- Darren J Moffat
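For reference, a sketch of what that looks like in practice. Pool, dataset, and host names are hypothetical; '-D' asks zfs send to emit each duplicate block only once in the stream, and requires a receiver that understands deduplicated streams:

```shell
# Hypothetical dataset and host names; adjust to your environment.
zfs snapshot tank/data@monday
zfs send -D tank/data@monday | ssh backuphost zfs receive backup/data

# Incremental deduplicated replication of a later snapshot:
zfs snapshot tank/data@tuesday
zfs send -D -i @monday tank/data@tuesday | ssh backuphost zfs receive backup/data
```

Note that '-D' deduplicates the stream itself; whether blocks are stored deduplicated on the receiving pool still depends on the dedup property of the target dataset.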

Re: [zfs-discuss] Zfs deduplication

2009-08-03 Thread Wes Felter
Dave McDorman wrote: I don't think Sun is at liberty to discuss ZFS Deduplication at this point in time: Did Jeff Bonwick and Bill Moore give a presentation at kernel.conf.au or not? If so, did anyone see the presentation? Did the conference attendees all sign NDAs or something? Wes Felter

Re: [zfs-discuss] Zfs deduplication

2009-08-03 Thread James C. McPherson
On Mon, 03 Aug 2009 18:26:44 -0500 Wes Felter wes...@felter.org wrote: Dave McDorman wrote: I don't think Sun is at liberty to discuss ZFS Deduplication at this point in time: Did Jeff Bonwick and Bill Moore give a presentation at kernel.conf.au or not? Yes they did - a keynote, and they

Re: [zfs-discuss] Zfs deduplication

2009-08-03 Thread Andre van Eyssen
On Tue, 4 Aug 2009, James C. McPherson wrote: If so, did anyone see the presentation? Yes. Everybody who attended. You know, I think we might even have some evidence of their attendance! http://mexico.purplecow.org/static/kca_spk/tn/IMG_2177.jpg.html

[zfs-discuss] Zfs deduplication

2009-07-31 Thread Ashley Avileli
Will the material ever be posted? It looks like there are some huge bugs with zfs deduplication, which the organizers do not want to post; also there is no indication on the Sun website whether there will be a deduplication feature. I think it's best they concentrate on improving zfs performance and speed with

Re: [zfs-discuss] Zfs deduplication

2009-07-31 Thread Dave McDorman
I don't think Sun is at liberty to discuss ZFS Deduplication at this point in time: http://www.itworld.com/storage/71307/sun-tussles-de-duplication-startup Hopefully, the matter is resolved and discussions can proceed openly. Send lawyers, guns and money. - Warren Zevon

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Jim Klimov
Ok, thank you Nils, Wade for the concise replies. After much reading I agree that the ZFS-development queued features do deserve a higher ranking on the priority list (pool-shrinking/disk-removal and user/group quotas would be my favourites), so probably the deduplication tool I'd need would,

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Richard Elling
Jim Klimov wrote: Ok, thank you Nils, Wade for the concise replies. After much reading I agree that the ZFS-development queued features do deserve a higher ranking on the priority list (pool-shrinking/disk-removal and user/group quotas would be my favourites), so probably the deduplication

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Wade . Stuart
Does some script-usable ZFS API (if any) provide for fetching block/file hashes (checksums) stored in the filesystem itself? In fact, am I wrong to expect file-checksums to be readily available? Yes. Files are not checksummed, blocks are checksummed. -- richard Further, even if

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Bob Friesenhahn
On Tue, 26 Aug 2008, Darren J Moffat wrote: zfs set checksum=sha256 Expect performance to really suck after setting this. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Darren J Moffat
Bob Friesenhahn wrote: On Tue, 26 Aug 2008, Darren J Moffat wrote: zfs set checksum=sha256 Expect performance to really suck after setting this. Do you have evidence of that? What kind of workload and how did you test it? I've recently been benchmarking using filebench filemicro and

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Keith Bierman
On Aug 26, 2008, at 9:58 AM, Darren J Moffat wrote: than a private copy. I wouldn't expect that to have too big an impact (I On a SPARC CMT (Niagara 1+) based system wouldn't that be likely to have a large impact? -- Keith H. Bierman [EMAIL PROTECTED] | AIM kbiermank 5430

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Mike Gerdts
On Tue, Aug 26, 2008 at 10:58 AM, Darren J Moffat [EMAIL PROTECTED] wrote: In the interest of full disclosure I have changed the sha256.c in the ZFS source to use the default kernel one via the crypto framework rather than a private copy. I wouldn't expect that to have too big an impact (I

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Bob Friesenhahn
On Tue, 26 Aug 2008, Darren J Moffat wrote: Bob Friesenhahn wrote: On Tue, 26 Aug 2008, Darren J Moffat wrote: zfs set checksum=sha256 Expect performance to really suck after setting this. Do you have evidence of that? What kind of workload and how did you test it? I did some random

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Darren J Moffat
Keith Bierman wrote: On Aug 26, 2008, at 9:58 AM, Darren J Moffat wrote: than a private copy. I wouldn't expect that to have too big an impact (I On a SPARC CMT (Niagara 1+) based system wouldn't that be likely to have a large impact? UltraSPARC T1 has no hardware SHA256 so I

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Darren J Moffat
Mike Gerdts wrote: On Tue, Aug 26, 2008 at 10:58 AM, Darren J Moffat [EMAIL PROTECTED] wrote: In the interest of full disclosure I have changed the sha256.c in the ZFS source to use the default kernel one via the crypto framework rather than a private copy. I wouldn't expect that to have too

Re: [zfs-discuss] ZFS deduplication

2008-08-26 Thread Keith Bierman
On Tue, Aug 26, 2008 at 10:11 AM, Darren J Moffat [EMAIL PROTECTED]wrote: Keith Bierman wrote: On a SPARC CMT (Niagara 1+) based system wouldn't that be likely to have a large impact? UltraSPARC T1 has no hardware SHA256 so I wouldn't expect any real change from running the private

Re: [zfs-discuss] ZFS deduplication

2008-08-25 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 08/22/2008 04:26:35 PM: Just my 2c: Is it possible to do an offline dedup, kind of like snapshotting? What I mean in practice, is: we make many Solaris full-root zones. They share a lot of data as complete files. This is kind of easy to save space - make one

Re: [zfs-discuss] ZFS deduplication

2008-08-22 Thread Jim Klimov
Just my 2c: Is it possible to do an offline dedup, kind of like snapshotting? What I mean in practice, is: we make many Solaris full-root zones. They share a lot of data as complete files. This is kind of easy to save space - make one zone as a template, snapshot/clone its dataset, make new

Re: [zfs-discuss] ZFS deduplication

2008-07-23 Thread Justin Stringfellow
with other Word files. You will thus end up seeking all over the disk to read _most_ Word files. Which really sucks. snip very limited, constrained usage. Disk is just so cheap, that you _really_ have to have an enormous amount of dup before the performance penalties of dedup are

Re: [zfs-discuss] ZFS deduplication

2008-07-23 Thread Mike Gerdts
On Tue, Jul 22, 2008 at 10:44 PM, Erik Trimble [EMAIL PROTECTED] wrote: More than anything, Bob's reply is my major feeling on this. Dedup may indeed turn out to be quite useful, but honestly, there's no broad data which says that it is a Big Win (tm) _right_now_, compared to finishing other

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Rob Clark
Hi All Is there any hope for deduplication on ZFS ? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems Email [EMAIL PROTECTED] There is always hope. Seriously though, looking at http://en.wikipedia.org/wiki/Comparison_of_revision_control_software there are a lot of choices

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM: Hi All Is there any hope for deduplication on ZFS ? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems Email [EMAIL PROTECTED] There is always hope. Seriously though, looking at http://en.wikipedia.

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Chris Cosby
To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go straight into the cache and get flushed out as quickly as possible, keeping everything as ACID as possible. Then, a dedup
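The delayed-dedup idea Chris describes, accept writes at full speed and let a background pass fold duplicates later, can be sketched as a toy model (plain Python; class and method names are illustrative, and ZFS's eventual dedup implementation is synchronous at write time, not delayed):

```python
import hashlib

class DelayedDedupStore:
    """Toy model of 'write fast now, dedup later'.

    write() appends blocks untouched (no hashing on the hot path);
    dedup_pass() later replaces duplicate blocks with a reference
    (stored as an int index) to the first physical copy.
    """
    def __init__(self):
        self.blocks = []  # bytes for a physical block, int for a reference

    def write(self, data: bytes) -> int:
        self.blocks.append(data)          # fast path: no checksum work
        return len(self.blocks) - 1

    def read(self, idx: int):
        b = self.blocks[idx]
        return self.blocks[b] if isinstance(b, int) else b

    def dedup_pass(self) -> int:
        """Background pass: fold duplicates, return bytes reclaimed."""
        first, saved = {}, 0
        for i, b in enumerate(self.blocks):
            if isinstance(b, int):        # already a reference
                continue
            h = hashlib.sha256(b).digest()
            if h in first and self.blocks[first[h]] == b:  # verify, don't trust hash alone
                self.blocks[i] = first[h]
                saved += len(b)
            else:
                first[h] = i
        return saved
```

This makes the trade-off in the message concrete: writes stay cheap, but the pool temporarily holds duplicate data until the background pass runs.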

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go straight into the cache and get flushed out as quickly as

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Chris Cosby
On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Erik Trimble
Chris Cosby wrote: On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it seems like there would have to be some overly complicated

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Richard Elling
FWIW, Sun's VTL products use ZFS and offer de-duplication services. http://www.sun.com/aboutsun/pr/2008-04/sunflash.20080407.2.xml -- richard

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/22/2008 11:48:30 AM: Chris Cosby wrote: On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Mike Gerdts
On Tue, Jul 22, 2008 at 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote: No, you are right to be concerned over block-level dedup seriously impacting seeks. The problem is that, given many common storage scenarios, you will have not just similar files, but multiple common sections of many

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Charles Soto
On 7/22/08 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote: I'm still not convinced that dedup is really worth it for anything but very limited, constrained usage. Disk is just so cheap, that you _really_ have to have an enormous amount of dup before the performance penalties of dedup are

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Bob Friesenhahn
On Tue, 22 Jul 2008, Erik Trimble wrote: Dedup Disadvantages: Obviously you do not work in the Sun marketing department, which is interested in this feature (due to some other companies marketing it). Note that the topic starter post came from someone in Sun's marketing department. I think

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Miles Nordin
et == Erik Trimble [EMAIL PROTECTED] writes: et Dedup Advantages: et (1) save space (2) coalesce data which is frequently used by many nodes in a large cluster into a small nugget of common data which can fit into RAM or L2 fast disk (3) back up non-ZFS filesystems that don't

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Bob Friesenhahn
On Tue, 22 Jul 2008, Miles Nordin wrote: scrubs making pools uselessly slow? Or should it be scrub-like so that already-written filesystems can be thrown into the dedup bag and slowly squeezed, or so that dedup can run slowly during the business day over data written quickly at night (fast

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Erik Trimble
Bob Friesenhahn wrote: On Tue, 22 Jul 2008, Erik Trimble wrote: Dedup Disadvantages: Obviously you do not work in the Sun marketing department which is intrested in this feature (due to some other companies marketing it). Note that the topic starter post came from someone in

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Rob Clark
On Tue, 22 Jul 2008, Miles Nordin wrote: scrubs making pools uselessly slow? Or should it be scrub-like so that already-written filesystems can be thrown into the dedup bag and slowly squeezed, or so that dedup can run slowly during the business day over data written quickly at night

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Justin Stringfellow
Raw storage space is cheap. Managing the data is what is expensive. Not for my customer. Internal accounting means that the storage team gets paid for each allocated GB on a monthly basis. They have stacks of IO bandwidth and CPU cycles to spare outside of their daily busy period. I can't

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Justin Stringfellow
Does anyone know a tool that can look over a dataset and give duplication statistics? I'm not looking for something incredibly efficient but I'd like to know how much it would actually benefit our Check out the following blog..:

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Erik Trimble
Justin Stringfellow wrote: Raw storage space is cheap. Managing the data is what is expensive. Not for my customer. Internal accounting means that the storage team gets paid for each allocated GB on a monthly basis. They have stacks of IO bandwidth and CPU cycles to spare outside of

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Ross
Just going to make a quick comment here. It's a good point about wanting backup software to support this, we're a much smaller company but it's already more difficult to manage the storage needed for backups than our live storage. However, we're actively planning that over the next 12 months,

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Wade . Stuart
Even better would be using the ZFS block checksums (assuming we are only summing the data, not its position or time :)... Then we could have two files that have 90% the same blocks, and still get some dedup value... ;) Yes, but you will need to add some sort of highly collision resistant

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM: Does anyone know a tool that can look over a dataset and give duplication statistics? I'm not looking for something incredibly efficient but I'd like to know how much it would actually benefit our Check out the following blog..:

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Darren J Moffat
[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM: Does anyone know a tool that can look over a dataset and give duplication statistics? I'm not looking for something incredibly efficient but I'd like to know how much it would actually benefit our Check out the

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Richard Elling
Justin Stringfellow wrote: Raw storage space is cheap. Managing the data is what is expensive. Not for my customer. Internal accounting means that the storage team gets paid for each allocated GB on a monthly basis. They have stacks of IO bandwidth and CPU cycles to spare outside of

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Keith Bierman
On Jul 8, 2008, at 11:00 AM, Richard Elling wrote: much fun for people who want to hide costs. For example, some bright manager decided that they should charge $100/month/port for ethernet drops. So now, instead of having a centralized, managed network with well defined port mappings,

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Bob Friesenhahn
On Tue, 8 Jul 2008, Richard Elling wrote: [donning my managerial accounting hat] It is not a good idea to design systems based upon someone's managerial accounting whims. These are subject to change in illogical ways at unpredictable intervals. This is why managerial accounting can be so

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Bob Friesenhahn
Something else came to mind which is a negative regarding deduplication. When zfs writes new sequential files, it should try to allocate blocks in a way which minimizes fragmentation (disk seeks). Disk seeks are the bane of existing storage systems since they come out of the available IOPS

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread David Collier-Brown
Hmmn, you might want to look at Andrew Tridgell's thesis (yes, Andrew of Samba fame), as he had to solve this very question to be able to select an algorithm to use inside rsync. --dave Darren J Moffat wrote: [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM:

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/08/2008 01:26:15 PM: Something else came to mind which is a negative regarding deduplication. When zfs writes new sequential files, it should try to allocate blocks in a way which minimizes fragmentation (disk seeks). Disk seeks are the bane of existing storage

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Moore, Joe
Bob Friesenhahn wrote: Something else came to mind which is a negative regarding deduplication. When zfs writes new sequential files, it should try to allocate blocks in a way which minimizes fragmentation (disk seeks). It should, but because of its copy-on-write nature, fragmentation

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Bob Friesenhahn
On Tue, 8 Jul 2008, Moore, Joe wrote: On ZFS, sequential files are rarely sequential anyway. The SPA tries to keep blocks nearby, but when dealing with snapshotted sequential files being rewritten, there is no way to keep everything in order. I think that rewriting files (updating existing

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Mike Gerdts
On Tue, Jul 8, 2008 at 12:25 PM, Bob Friesenhahn [EMAIL PROTECTED] wrote: On Tue, 8 Jul 2008, Richard Elling wrote: [donning my managerial accounting hat] It is not a good idea to design systems based upon someone's managerial accounting whims. These are subject to change in illogical ways at

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Mike Gerdts
On Tue, Jul 8, 2008 at 1:26 PM, Bob Friesenhahn [EMAIL PROTECTED] wrote: Something else came to mind which is a negative regarding deduplication. When zfs writes new sequential files, it should try to allocate blocks in a way which minimizes fragmentation (disk seeks). Disk seeks are the bane

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Jonathan Loran
Tim Spriggs wrote: Does anyone know a tool that can look over a dataset and give duplication statistics? I'm not looking for something incredibly efficient but I'd like to know how much it would actually benefit our dataset: HiRISE has a large set of spacecraft data (images) that could

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Jonathan Loran
Justin Stringfellow wrote: Does anyone know a tool that can look over a dataset and give duplication statistics? I'm not looking for something incredibly efficient but I'd like to know how much it would actually benefit our Check out the following blog..:

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Jonathan Loran
Moore, Joe wrote: On ZFS, sequential files are rarely sequential anyway. The SPA tries to keep blocks nearby, but when dealing with snapshotted sequential files being rewritten, there is no way to keep everything in order. In some cases, a d11p system could actually speed up data reads

Re: [zfs-discuss] ZFS deduplication

2008-07-08 Thread Richard Elling
Mike Gerdts wrote: [I agree with the comments in this thread, but... I think we're still being old fashioned...] Imagine if university students were allowed to use as much space as they wanted but had to pay a per megabyte charge every two weeks or their account is terminated? This would

[zfs-discuss] ZFS deduplication

2008-07-07 Thread Mertol Ozyoney
Hi All; Is there any hope for deduplication on ZFS? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email [EMAIL

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Neil Perrin
Mertol, Yes, dedup is certainly on our list and has been actively discussed recently, so there's hope and some forward progress. It would be interesting to see where it fits into our customers priorities for ZFS. We have a long laundry list of projects. In addition there's bug fixes performance

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Charles Soto
A really smart nexus for dedup is right when archiving takes place. For systems like EMC Centera, dedup is basically a byproduct of checksumming. Two files with similar metadata that have the same hash? They're identical. Charles On 7/7/08 4:25 PM, Neil Perrin [EMAIL PROTECTED] wrote:

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Nathan Kroenert
Even better would be using the ZFS block checksums (assuming we are only summing the data, not its position or time :)... Then we could have two files that have 90% the same blocks, and still get some dedup value... ;) Nathan. Charles Soto wrote: A really smart nexus for dedup is right when

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Jonathan Loran
Neil Perrin wrote: Mertol, Yes, dedup is certainly on our list and has been actively discussed recently, so there's hope and some forward progress. It would be interesting to see where it fits into our customers priorities for ZFS. We have a long laundry list of projects. In addition

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Bob Friesenhahn
On Tue, 8 Jul 2008, Nathan Kroenert wrote: Even better would be using the ZFS block checksums (assuming we are only summing the data, not it's position or time :)... Then we could have two files that have 90% the same blocks, and still get some dedup value... ;) It seems that the hard

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Bob Friesenhahn
On Mon, 7 Jul 2008, Jonathan Loran wrote: use ZFS is as nearline storage for backup data. I have a 16TB server that provides a file store for an EMC Networker server. I'm seeing a compressratio of 1.73, which is mighty impressive, since we also use native EMC compression during the backups.

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Brian Hechinger
On Mon, Jul 07, 2008 at 07:56:26PM -0500, Bob Friesenhahn wrote: This deduplication technology seems similar to the Microsoft adds I see on TV which advertise how their new technology saves the customer Quantum's claim of 20:1 just doesn't jive in my head, either, for some reason. -brian

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Mike Gerdts
On Mon, Jul 7, 2008 at 7:40 PM, Bob Friesenhahn [EMAIL PROTECTED] wrote: The actual benefit of data deduplication to an enterprise seems negligible unless the backup system directly supports it. In the enterprise the cost of storage has more to do with backing up the data than the amount of

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Maurice Castro
I second this, provided we also check that the data is in fact identical as well. Checksum collisions are likely given the sizes of disks and the sizes of checksums; and some users actually deliberately generate data with colliding checksums (researchers and nefarious users). Dedup must be
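Maurice's "verify before merging" point can be sketched in a few lines (plain Python, not ZFS internals; ZFS later exposed exactly this trade-off via the dedup 'verify' option, e.g. dedup=sha256,verify):

```python
import hashlib

def checksum(data: bytes) -> bytes:
    """Stand-in for the per-block checksum ZFS already stores."""
    return hashlib.sha256(data).digest()

def same_block(a: bytes, b: bytes, verify: bool = True) -> bool:
    """Dedup match test: cheap checksum compare, then optional byte compare.

    With verify=False, a checksum collision (astronomically unlikely for
    SHA-256, far more plausible for weaker checksums or adversarial data)
    would silently merge two different blocks. verify=True buys certainty
    at the cost of reading and comparing the existing candidate block.
    """
    if checksum(a) != checksum(b):
        return False   # checksums differ: definitely not duplicates
    return a == b if verify else True
```

The verify path is exactly the extra read that makes dedup safe against colliding checksums, whether accidental or deliberately constructed.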

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Charles Soto
Good points. I see the archival process as a good candidate for adding dedup because it is essentially doing what a stage/release archiving system already does - faking the existence of data via metadata. Those blocks aren't actually there, but they're still accessible because they're

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Bob Friesenhahn
On Mon, 7 Jul 2008, Mike Gerdts wrote: As I have considered deduplication for application data I see several things happen in various areas. You have provided an excellent description of gross inefficiencies in the way systems and software are deployed today, resulting in massive

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Mike Gerdts
On Mon, Jul 7, 2008 at 9:24 PM, Bob Friesenhahn [EMAIL PROTECTED] wrote: On Mon, 7 Jul 2008, Mike Gerdts wrote: There tend to be organizational walls between those that manage storage and those that consume it. As storage is distributed across a network (NFS, iSCSI, FC) things like

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Charles Soto
Oh, I agree. Much of the duplication described is clearly the result of bad design in many of our systems. After all, most of an OS can be served off the network (diskless systems etc.). But much of the dupe I'm talking about is less about not using the most efficient system administration

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Mike Gerdts
On Mon, Jul 7, 2008 at 11:07 PM, Charles Soto [EMAIL PROTECTED] wrote: So, while much of the situation is caused by bad data management, there aren't always systems we can employ that prevent it. Done right, dedup can certainly be worth it for my operations. Yes, teaching the user the right

Re: [zfs-discuss] ZFS deduplication

2008-07-07 Thread Tim Spriggs
Does anyone know a tool that can look over a dataset and give duplication statistics? I'm not looking for something incredibly efficient but I'd like to know how much it would actually benefit our dataset: HiRISE has a large set of spacecraft data (images) that could potentially have large