Re: [zfs-discuss] ZFS space efficiency when copying files from

2008-12-01 Thread BJ Quinn
Oh. Yup, I had figured this out on my own but forgot to post back. --inplace accomplishes what we're talking about. --no-whole-file is also necessary if copying files locally (not over the network), because rsync does default to only copying changed blocks, but it overrides that default
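As a rough illustration of the flags discussed in this message, the sketch below builds an rsync command line for a local copy. `--inplace` and `--no-whole-file` are the options named in the thread; the helper name `build_rsync_argv`, the `-a` archive flag, and the example paths are assumptions added for illustration, not something the poster specified.

```python
import shlex


def build_rsync_argv(src, dest, local_copy=True):
    """Return an rsync argument list that updates destination files in place.

    Hypothetical helper: the name, the -a flag, and the paths used below
    are illustrative assumptions, not taken from the thread.
    """
    argv = ["rsync", "-a", "--inplace"]
    if local_copy:
        # For local-to-local copies rsync defaults to whole-file transfers,
        # so the delta algorithm has to be requested explicitly.
        argv.append("--no-whole-file")
    argv += [src, dest]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is copied here.
    print(shlex.join(build_rsync_argv("/data/source/", "/backup/target/")))
```

Building the argument list separately from running it makes it easy to inspect the command before pointing it at real data.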

Re: [zfs-discuss] ZFS space efficiency when copying files from

2008-12-01 Thread Darren J Moffat
BJ Quinn wrote: Oh. Yup, I had figured this out on my own but forgot to post back. --inplace accomplishes what we're talking about. --no-whole-file is also necessary if copying files locally (not over the network), because rsync does default to only copying changed blocks, but it

Re: [zfs-discuss] ZFS space efficiency when copying files from

2008-12-01 Thread BJ Quinn
Should I set that as rsync's block size? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread BJ Quinn
Here's an idea - I understand that I need rsync on both sides if I want to minimize network traffic. What if I don't care about that - the entire file can come over the network, but I specifically only want rsync to write the changed blocks to disk. Does rsync offer a mode like that?

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Bob Friesenhahn
On Mon, 24 Nov 2008, BJ Quinn wrote: Here's an idea - I understand that I need rsync on both sides if I want to minimize network traffic. What if I don't care about that - the entire file can come over the network, but I specifically only want rsync to write the changed blocks to disk.

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Erik Trimble
Bob Friesenhahn wrote: On Mon, 24 Nov 2008, BJ Quinn wrote: Here's an idea - I understand that I need rsync on both sides if I want to minimize network traffic. What if I don't care about that - the entire file can come over the network, but I specifically only want rsync to write

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Bob Friesenhahn
On Mon, 24 Nov 2008, Erik Trimble wrote: One note here for ZFS users: On ZFS (or any other COW filesystem), rsync unfortunately does NOT do the Right Thing when syncing an existing file. From ZFS's standpoint, the most efficient way would be merely to rewrite the changed blocks, thus

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-24 Thread Albert Chin
On Mon, Nov 24, 2008 at 08:43:18AM -0800, Erik Trimble wrote: I _really_ wish rsync had an option to copy in place or something like that, where the updates are made directly to the file, rather than a temp copy. Isn't this what --inplace does? -- albert chin ([EMAIL PROTECTED])

Re: [zfs-discuss] ZFS space efficiency when copying files from

2008-11-24 Thread Al Tobey
Rsync can update in-place. From rsync(1): --inplace update destination files in-place

Re: [zfs-discuss] ZFS space efficiency when copying files from

2008-11-24 Thread Erik Trimble
Al Tobey wrote: Rsync can update in-place. From rsync(1): --inplace update destination files in-place Whee! This is now newly working (for me). I've been using an older rsync, where this option didn't work properly on ZFS. It looks like this was fixed on newer

[zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread BJ Quinn
We're considering using an OpenSolaris server as a backup server. Some of the servers to be backed up would be Linux and Windows servers, and potentially Windows desktops as well. What I had imagined was that we could copy files over to the ZFS-based server nightly, take a snapshot, and only

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread BJ Quinn
Thank you both for your responses. Let me see if I understand correctly - 1. Dedup is what I really want, but it's not implemented yet. 2. The only other way to accomplish this sort of thing is rsync (in other words, don't overwrite the block in the first place if it's not different), and

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread Will Murnane
On Mon, Nov 17, 2008 at 20:54, BJ Quinn [EMAIL PROTECTED] wrote: 1. Dedup is what I really want, but it's not implemented yet. Yes, as I read it. greenBytes [1] claims to have dedup on their system; you might investigate them if you decide rsync won't work for your application. 2. The only

Re: [zfs-discuss] ZFS space efficiency when copying files from another source

2008-11-17 Thread Tim
On Mon, Nov 17, 2008 at 3:33 PM, Will Murnane [EMAIL PROTECTED] wrote: On Mon, Nov 17, 2008 at 20:54, BJ Quinn [EMAIL PROTECTED] wrote: 1. Dedup is what I really want, but it's not implemented yet. Yes, as I read it. greenBytes [1] claims to have dedup on their system; you might investigate

Re: [zfs-discuss] zfs space efficiency

2007-07-07 Thread -=dave
Agreed. While a bitwise check is the only assured way to determine whether two blocks are duplicates, if the check were done in a streaming method as you suggest, the performance impact, while huge compared to not checking, would be more than bearable if used within an environment with large known levels

Re: [zfs-discuss] zfs space efficiency

2007-07-07 Thread -=dave
One other thing... the checksums for all files to send *could* be checked first in batch and known unique blocks prioritized and sent first, then the possibly duplicative data sent afterwards to be verified as a dupe, thereby decreasing the possible data loss for the backup window to levels

Re: [zfs-discuss] zfs space efficiency

2007-07-02 Thread J. David Beutel
Mattias Pantzare wrote: For this application (deduplication data) the likelihood of matching hashes are very high. In fact it has to be, otherwise there would not be any data to deduplicate. In the cp example, all writes would have matching hashes and all need a verify. Would the read for

Re: [zfs-discuss] zfs space efficiency

2007-06-30 Thread Mattias Pantzare
2007/6/25, [EMAIL PROTECTED] [EMAIL PROTECTED]: I wouldn't de-duplicate without actually verifying that two blocks were actually bitwise identical. Absolutely not, indeed. But the nice property of hashes is that if the hashes don't match then the inputs do not either. I.e., the likelihood of

Re: [zfs-discuss] zfs space efficiency

2007-06-25 Thread Bill Sommerfeld
On Sun, Jun 24, 2007 at 03:39:40PM -0700, Erik Trimble wrote: Matthew Ahrens wrote: Will Murnane wrote: On 6/23/07, Erik Trimble [EMAIL PROTECTED] wrote: Now

Re: [zfs-discuss] zfs space efficiency

2007-06-25 Thread Casper . Dik
I wouldn't de-duplicate without actually verifying that two blocks were actually bitwise identical. Absolutely not, indeed. But the nice property of hashes is that if the hashes don't match then the inputs do not either. I.e., the likelihood of having to do a full bitwise compare is

Re: [zfs-discuss] zfs space efficiency

2007-06-25 Thread Bill Sommerfeld
[This is version 2. the first one escaped early by mistake] On Sun, 2007-06-24 at 16:58 -0700, dave johnson wrote: The most common non-proprietary hash calc for file-level deduplication seems to be the combination of the SHA1 and MD5 together. Collisions have been shown to exist in MD5 and

Re: [zfs-discuss] zfs space efficiency

2007-06-25 Thread Erik Trimble
Bill Sommerfeld wrote: [This is version 2. the first one escaped early by mistake] On Sun, 2007-06-24 at 16:58 -0700, dave johnson wrote: The most common non-proprietary hash calc for file-level deduplication seems to be the combination of the SHA1 and MD5 together. Collisions have been

Re: [zfs-discuss] zfs space efficiency

2007-06-25 Thread Frank Cusack
On June 25, 2007 1:02:38 PM -0700 Erik Trimble [EMAIL PROTECTED] wrote: algorithms. I think (as Casper said), that should you need to, use SHA to weed out the cases where the checksums are different (since, that definitively indicates they are different), then do a bitwise compare on any that
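The two-stage check described here (use the hash to rule out blocks that differ, then compare bitwise any that match) can be sketched roughly as follows. This is a toy illustration over in-memory byte strings, not how ZFS or any product does it; SHA-256 is an assumed choice of "SHA", and the function name `find_duplicate_blocks` is hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicate_blocks(blocks):
    """Map each duplicate block's index to the index of an earlier identical block.

    Differing digests prove two blocks differ, so only blocks whose
    SHA-256 digests match are compared byte for byte.
    """
    by_digest = defaultdict(list)   # digest -> indices of unique blocks seen
    duplicates = {}                 # duplicate index -> index of the original
    for i, block in enumerate(blocks):
        digest = hashlib.sha256(block).digest()
        for j in by_digest[digest]:
            # A matching digest is only a hint; confirm with a full compare.
            if blocks[j] == block:
                duplicates[i] = j
                break
        else:
            # No verified match: treat this block as unique.
            by_digest[digest].append(i)
    return duplicates


if __name__ == "__main__":
    sample = [b"aaaa", b"bbbb", b"aaaa", b"cccc", b"bbbb"]
    print(find_duplicate_blocks(sample))
```

Keeping a list per digest means a genuine hash collision would simply yield two unique blocks under the same digest rather than a false duplicate.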

Re: [zfs-discuss] zfs space efficiency

2007-06-24 Thread Matthew Ahrens
Will Murnane wrote: On 6/23/07, Erik Trimble [EMAIL PROTECTED] wrote: Now, wouldn't it be nice to have syscalls which would implement cp and mv, thus abstracting it away from the userland app? Not really. Different apps want different behavior in their copying, so you'd have to expose a

Re: [zfs-discuss] zfs space efficiency

2007-06-24 Thread Erik Trimble
Matthew Ahrens wrote: Will Murnane wrote: On 6/23/07, Erik Trimble [EMAIL PROTECTED] wrote: Now, wouldn't it be nice to have syscalls which would implement cp and mv, thus abstracting it away from the userland app? Not really. Different apps want different behavior in their copying, so

Re: [zfs-discuss] zfs space efficiency

2007-06-24 Thread Gary Mills
On Sun, Jun 24, 2007 at 03:39:40PM -0700, Erik Trimble wrote: Matthew Ahrens wrote: Will Murnane wrote: On 6/23/07, Erik Trimble [EMAIL PROTECTED] wrote: Now, wouldn't it be nice to have syscalls which would implement cp and mv, thus abstracting it away from the userland app? A copyfile

Re: [zfs-discuss] zfs space efficiency

2007-06-24 Thread dave johnson
[EMAIL PROTECTED] Cc: Matthew Ahrens [EMAIL PROTECTED]; roland [EMAIL PROTECTED]; zfs-discuss@opensolaris.org Sent: Sunday, June 24, 2007 3:58 PM Subject: Re: [zfs-discuss] zfs space efficiency On Sun, Jun 24, 2007 at 03:39:40PM -0700, Erik Trimble wrote: Matthew Ahrens wrote: Will Murnane wrote

Re: [zfs-discuss] zfs space efficiency

2007-06-24 Thread Torrey McMahon
On Sun, Jun 24, 2007 at 03:39:40PM -0700, Erik Trimble wrote: Matthew Ahrens wrote: Will Murnane wrote: On 6/23/07, Erik Trimble [EMAIL PROTECTED] wrote: Now, wouldn't it be nice to have syscalls which would implement cp and mv, thus

[zfs-discuss] zfs space efficiency

2007-06-23 Thread roland
Hello! I'm thinking of using ZFS to back up large binary data files (e.g. VMware VMs, Oracle databases) and want to rsync them at regular intervals from other systems to one central ZFS system with compression on. I'd like to have historical versions and thus want to make a snapshot

Re: [zfs-discuss] zfs space efficiency

2007-06-23 Thread Matthew Ahrens
Erik Trimble wrote: roland wrote: Hello! I'm thinking of using ZFS to back up large binary data files (e.g. VMware VMs, Oracle databases) and want to rsync them at regular intervals from other systems to one central ZFS system with compression on. I'd like to have historical

Re: [zfs-discuss] zfs space efficiency

2007-06-23 Thread Erik Trimble
roland wrote: Hello! I'm thinking of using ZFS to back up large binary data files (e.g. VMware VMs, Oracle databases) and want to rsync them at regular intervals from other systems to one central ZFS system with compression on. I'd like to have historical versions and thus want to make

Re: [zfs-discuss] zfs space efficiency

2007-06-23 Thread Darren Dunham
If I have one large datafile on ZFS, make a snapshot of the ZFS filesystem holding it, and then overwrite that file with a newer version with slight differences inside - what about the real disk consumption on the ZFS side? If all the blocks are rewritten, then they're all new blocks as far as ZFS
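To make the space question concrete, here is a toy model (not ZFS internals) that counts how many blocks must be newly allocated after a snapshot, depending on whether the whole file is rewritten or only the changed blocks are. It assumes the snapshot pins every old block; the function name and the single-byte "blocks" are hypothetical.

```python
def new_blocks_after_snapshot(old_blocks, new_blocks, rewrite_all):
    """Count blocks that must be newly allocated after a snapshot.

    Toy model, not ZFS internals: a snapshot pins every old block, so each
    block written afterwards lands in fresh space.
    """
    if rewrite_all:
        # Whole-file rewrite: every block is written again, changed or not.
        return len(new_blocks)
    # In-place update: only blocks whose contents differ are written, plus
    # any blocks appended beyond the old file length.
    changed = sum(1 for old, new in zip(old_blocks, new_blocks) if old != new)
    appended = max(0, len(new_blocks) - len(old_blocks))
    return changed + appended


if __name__ == "__main__":
    old = [b"A", b"B", b"C", b"D"]
    new = [b"A", b"X", b"C", b"D"]
    print(new_blocks_after_snapshot(old, new, rewrite_all=True))
    print(new_blocks_after_snapshot(old, new, rewrite_all=False))
```

In this model the snapshot plus a whole-file rewrite costs as much as a second full copy, while writing only the changed blocks costs just the difference.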

Re: [zfs-discuss] zfs space efficiency

2007-06-23 Thread Erik Trimble
Matthew Ahrens wrote: Erik Trimble wrote: Under ZFS, any equivalent to 'cp A B' takes up no extra space. The metadata is updated so that B points to the blocks in A. Should anyone begin writing to B, only the updated blocks are added on disk, with the metadata for B now containing the proper

Re: [zfs-discuss] zfs space efficiency

2007-06-23 Thread Will Murnane
On 6/23/07, Erik Trimble [EMAIL PROTECTED] wrote: Matthew Ahrens wrote: Basically, the descriptions of Copy on Write. Or does this apply only to Snapshots? My original understanding was that CoW applied whenever you were making a duplicate of an existing file. CoW happens all the time. If