[zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Peter Taps
Folks,

Let's say I have a volume being shared over iSCSI. Dedup has been turned on.

Let's say I copy the same file twice under different names at the initiator 
end. Let's say each file ends up taking 5 blocks.

For dedupe to work, each block for a file must match the corresponding block 
from the other file. Essentially, each pair of blocks being compared must have 
the same start location into the actual data.

For a shared filesystem, ZFS may internally ensure that the block starts match. 
However, over iSCSI, the initiator does not even know about the whole block 
mechanism that ZFS has. It is just sending raw bytes to the target. This makes 
me wonder if dedup actually works over iSCSI.

Can someone please enlighten me on what I am missing?

Thank you in advance for your help.

Regards,
Peter


Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Neil Perrin

On 10/22/10 15:34, Peter Taps wrote:

Folks,

Let's say I have a volume being shared over iSCSI. Dedup has been turned on.

Let's say I copy the same file twice under different names at the initiator 
end. Let's say each file ends up taking 5 blocks.

For dedupe to work, each block for a file must match the corresponding block 
from the other file. Essentially, each pair of blocks being compared must have 
the same start location into the actual data.


No, ZFS doesn't care about the file offset, just that the checksums of 
the blocks match.
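
To illustrate what that means, here is a minimal sketch (plain Python with 
made-up names, purely illustrative; it is not how the ZFS dedup table is 
actually implemented). Blocks are keyed only by a hash of their contents, 
so the file or offset they came from is irrelevant:

import hashlib

BLOCK_SIZE = 8 * 1024        # stand-in for the volume's block size

dedup_table = {}             # checksum -> stored block

def write_block(data: bytes) -> str:
    """Store a block, sharing it if an identical block is already stored."""
    key = hashlib.sha256(data).hexdigest()
    if key not in dedup_table:
        dedup_table[key] = data      # first copy: actually store the data
    return key                       # later copies: just reference it

# Two "files" made of the same blocks, in a different order and hence at
# different offsets, still collapse to one stored copy per unique block.
file_a = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE]
file_b = [b"B" * BLOCK_SIZE, b"A" * BLOCK_SIZE]
for blk in file_a + file_b:
    write_block(blk)

print(len(dedup_table))      # prints 2, not 4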


For a shared filesystem, ZFS may internally ensure that the block starts match. 
However, over iSCSI, the initiator does not even know about the whole block 
mechanism that ZFS has. It is just sending raw bytes to the target. This makes 
me wonder if dedup actually works over iSCSI.


Can someone please enlighten me on what I am missing?

Thank you in advance for your help.

Regards,
Peter
  




Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Peter Taps
Hi Neil,

If the file offset does not match, the chance that the checksum would match, 
especially with SHA-256, is almost zero.

Maybe I am missing something. Let's say I have a file that contains 11 letters 
- ABCDEFGHIJK. Let's say the block size is 5.

For the first file, the block contents are ABCDE, FGHIJ, and K.

For the second file, let's say the blocks are ABCD, EFGHI, and JK.

The chance that any checksum would match is very small. The chance that any 
checksum+verify would match is even smaller.
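
To make that concrete, here is a small sketch of the example above (plain 
Python, just to illustrate the point; the names are made up):

import hashlib

data = b"ABCDEFGHIJK"

# First file: cut every 5 bytes -> ABCDE, FGHIJ, K
blocks_1 = [data[i:i+5] for i in range(0, len(data), 5)]

# Second file: suppose the first block comes out one byte short -> ABCD, EFGHI, JK
blocks_2 = [data[:4]] + [data[i:i+5] for i in range(4, len(data), 5)]

hashes_1 = {hashlib.sha256(b).hexdigest() for b in blocks_1}
hashes_2 = {hashlib.sha256(b).hexdigest() for b in blocks_2}

print(hashes_1 & hashes_2)   # empty set: none of the blocks would dedupe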

Regards,
Peter


Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Neil Perrin

On 10/22/10 17:28, Peter Taps wrote:

Hi Neil,

If the file offset does not match, the chance that the checksum would match, 
especially with SHA-256, is almost zero.

Maybe I am missing something. Let's say I have a file that contains 11 letters 
- ABCDEFGHIJK. Let's say the block size is 5.

For the first file, the block contents are ABCDE, FGHIJ, and K.

For the second file, let's say the blocks are ABCD, EFGHI, and JK.

The chance that any checksum would match is very small. The chance that any 
checksum+verify would match is even smaller.

Regards,
Peter


The block size and contents have to match for ZFS dedup.
See http://blogs.sun.com/bonwick/entry/zfs_dedup
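
On the checksum+verify point: with verification enabled, a checksum match is 
only trusted after a byte-for-byte comparison of the candidate block. Roughly 
(an illustrative sketch with made-up names, not the actual implementation):

import hashlib

dedup_table = {}   # checksum -> stored block

def write_block(data: bytes, verify: bool = True) -> bytes:
    """Share an existing identical block; otherwise store a new one."""
    key = hashlib.sha256(data).hexdigest()
    existing = dedup_table.get(key)
    if existing is not None and (not verify or existing == data):
        return existing          # checksum (and, if verifying, contents) match
    dedup_table[key] = data      # new block; hash collisions are glossed over
    return data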

Neil.


Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Haudy Kazemi

Neil Perrin wrote:

On 10/22/10 15:34, Peter Taps wrote:

Folks,

Let's say I have a volume being shared over iSCSI. Dedup has been 
turned on.

Let's say I copy the same file twice under different names at the 
initiator end. Let's say each file ends up taking 5 blocks.

For dedupe to work, each block for a file must match the 
corresponding block from the other file. Essentially, each pair of 
blocks being compared must have the same start location into the 
actual data.


No, ZFS doesn't care about the file offset, just that the checksums of 
the blocks match.




One conclusion is that one should be careful not to mess up file 
alignment when working with large files (like you might have in 
virtualization scenarios). That is, if you have a bunch of virtual machine 
image clones, they'll dedupe quite well initially. However, if you then 
make seemingly minor changes inside some of those clones (like changing 
their partition offsets to do 1 MB alignment), you'll lose most or all of 
the dedupe benefits.
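
A toy illustration of that alignment effect (plain Python with made-up sizes, 
just to show the arithmetic): shifting identical data by a whole number of 
blocks leaves the block contents, and therefore the checksums, unchanged, 
while a sub-block shift changes every block.

import hashlib, os

BLOCK = 4096
payload = os.urandom(64 * BLOCK)        # stand-in for a VM image's data

def block_hashes(image: bytes):
    return {hashlib.sha256(image[i:i+BLOCK]).hexdigest()
            for i in range(0, len(image), BLOCK)}

base       = block_hashes(payload)
aligned    = block_hashes(b"\0" * (256 * BLOCK) + payload)   # 1 MB shift
misaligned = block_hashes(b"\0" * 512 + payload)             # half-block shift

print(len(base & aligned))      # 64: every payload block still dedupes
print(len(base & misaligned))   # essentially 0: nothing dedupes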


General-purpose compression tends to be less susceptible to changes in 
data offsets, but it also has its limits based on algorithm and dictionary 
size. I think dedupe can be viewed as a special case of compression 
that happens to work quite well for certain workloads when given ample 
hardware resources (compared to what would be needed to run without dedupe).

