The problem is that the Windows Server Backup seems to choose dynamic
VHD (which would make sense in most cases) and I don't know if there is a
way to change that. Using iSCSI volumes won't help in my case since the
servers are running on physical hardware.
On 27.04.2010 01:54, Brandon wrote:
- Tim.Kreis tim.kr...@gmx.de wrote:
The problem is that the Windows Server Backup seems to choose dynamic
VHD (which would make sense in most cases) and I don't know if there is a
way to change that. Using iSCSI volumes won't help in my case since the
servers are running on physical hardware.
Hi Tim,
thanks for sharing your dedup experience. Especially for Virtualization, having
a good pool of experience will help a lot of people.
So you see a dedup ratio of 1.29 for two installations of Windows Server 2008 on
the same ZFS backing store, if I understand you correctly.
What dedup
Hi,
The setup was this:
Fresh installation of 2008 R2 - server backup with the backup feature - move the
VHD to ZFS - install the Active Directory role - back up again - move the VHD to the
same share.
I am kinda confused over the change of dedup ratio from changing the record
size, since it should dedup
I found the VHD specification here:
http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc
I am not sure if I understand it right, but it seems like data on disk gets
compressed into the VHD (no empty space), so even
On Mon, Apr 26, 2010 at 8:51 AM, tim Kries tim.kr...@gmx.de wrote:
I am kinda confused over the change of dedup ratio from changing the record
size, since it should dedup 256-bit blocks.
Dedup works on blocks of either recordsize or volblocksize. The
checksum is made per block written, and
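For what it's worth, the point about the dedup unit being the dataset block can be shown with the standard properties; a minimal sketch with made-up pool and dataset names:

  # For a filesystem the dedup unit is the recordsize block...
  zfs create -o dedup=on -o recordsize=4k tank/vhd-backups
  # ...and for a zvol it is the volblocksize block.
  zfs create -V 10g -o dedup=on -o volblocksize=4k tank/vhd-vol

Blocks only dedup against each other when they are the same size and carry the same checksum, which is why shrinking the recordsize to line up with the VHD's internal layout can change the ratio so much.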
Hi,
I have been playing with OpenSolaris for a while now. Today I tried to deduplicate the
backup VHD files Windows Server 2008 generates. I made a backup before and
after installing the AD role and copied the files to the share on OpenSolaris
(build 134). First I got a straight 1.00x, then I set recordsize
You might note, dedup only dedupes data that is written after the flag is set.
It does not retroactively dedupe already written data.
It was active all the time.
Made a new ZFS filesystem with -o dedup=on, copied with the default recordsize, got no dedup,
deleted the files, set recordsize to 4k, got a dedup ratio of 1.29x.
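For reference, the ratio being discussed is the pool-wide dedupratio property; a minimal sketch of checking it, assuming a pool named tank:

  # Pool-wide ratio after the copies, e.g. 1.29x
  zpool get dedupratio tank
  # zpool list also shows it in the DEDUP column on dedup-capable builds
  zpool list tank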
A few things come to mind...
1. A lot better than...what? Setting the recordsize to 4K got you some
deduplication but maybe the pertinent question is what were you
expecting?
2. Dedup is fairly new. I haven't seen any reports of experiments like
yours so...CONGRATULATIONS!! You're probably
Dedup is a key element for my purpose, because I am planning a central
repository for around 150 Windows Server 2008 (R2) servers, which would take a lot
less storage if they dedup well.
Hi Darren,
Could you post the -D part of the man pages? I have no access to a
system (yet) with the latest man pages.
http://docs.sun.com/app/docs/doc/819-2240/zfs-1m
has not been updated yet.
Regards
Peter
Darren J Moffat wrote:
Steven Sim wrote:
Hello;
Dedup on ZFS is an
Hello;
Dedup on ZFS is an absolutely wonderful feature!
Is there a way to conduct dedup replication across boxes from one dedup
ZFS data set to another?
Warmest Regards
Steven Sim
Steven Sim wrote:
Hello;
Dedup on ZFS is an absolutely wonderful feature!
Is there a way to conduct dedup replication across boxes from one dedup
ZFS data set to another?
Pass the '-D' argument to 'zfs send'.
--
Darren J Moffat
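A minimal sketch of what Darren describes, with made-up dataset and host names; -D asks zfs send to emit a deduplicated stream, so a block that repeats within the stream is transmitted only once:

  zfs snapshot tank/data@today
  zfs send -D tank/data@today | ssh otherbox zfs receive backup/data

The receiving dataset still needs dedup=on if you want the blocks stored deduplicated on disk as well.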
Dave McDorman wrote:
I don't think is at liberty to discuss ZFS Deduplication at this point in time:
Did Jeff Bonwick and Bill Moore give a presentation at kernel.conf.au or
not? If so, did anyone see the presentation? Did the conference
attendees all sign NDAs or something?
Wes Felter
On Mon, 03 Aug 2009 18:26:44 -0500
Wes Felter wes...@felter.org wrote:
Dave McDorman wrote:
I don't think is at liberty to discuss ZFS Deduplication at this point in
time:
Did Jeff Bonwick and Bill Moore give a presentation at kernel.conf.au or
not?
Yes they did - a keynote, and they
On Tue, 4 Aug 2009, James C. McPherson wrote:
If so, did anyone see the presentation?
Yes. Everybody who attended.
You know, I think we might even have some evidence of their attendance!
http://mexico.purplecow.org/static/kca_spk/tn/IMG_2177.jpg.html
Will the material ever be posted? It looks like there are some huge bugs with ZFS
deduplication and that is why the organizers do not want to post it. Also, there is no
indication on the Sun website whether there will be a deduplication feature. I think
it's best they concentrate on improving ZFS performance and speed with
I don't think is at liberty to discuss ZFS Deduplication at this point in time:
http://www.itworld.com/storage/71307/sun-tussles-de-duplication-startup
Hopefully, the matter is resolved and discussions can proceed openly.
Send lawyers, guns and money. - Warren Zevon
Ok, thank you Nils, Wade for the concise replies.
After much reading I agree that the ZFS-development queued features do deserve
a higher ranking on the priority list (pool-shrinking/disk-removal and
user/group quotas would be my favourites), so probably the deduplication tool
I'd need would,
Jim Klimov wrote:
Ok, thank you Nils, Wade for the concise replies.
After much reading I agree that the ZFS-development queued features do
deserve a higher ranking on the priority list (pool-shrinking/disk-removal
and user/group quotas would be my favourites), so probably the deduplication
Does some script-usable ZFS API (if any) provide for fetching
block/file hashes (checksums) stored in the filesystem itself? In
fact, am I wrong to expect file-checksums to be readily available?
Yes. Files are not checksummed, blocks are checksummed.
-- richard
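There is no stable scripting API for the per-block checksums, but they live in the block pointers and zdb can dump them for inspection; a sketch with a made-up dataset name and object number:

  # Dump object 12345 of tank/fs at full verbosity; the block pointer
  # lines include the stored checksum for each block.
  zdb -ddddd tank/fs 12345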
Further, even if
On Tue, 26 Aug 2008, Darren J Moffat wrote:
zfs set checksum=sha256
Expect performance to really suck after setting this.
Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,
Bob Friesenhahn wrote:
On Tue, 26 Aug 2008, Darren J Moffat wrote:
zfs set checksum=sha256
Expect performance to really suck after setting this.
Do you have evidence of that? What kind of workload and how did you
test it?
I've recently been benchmarking using filebench filemicro and
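For anyone who wants a quick sanity check before a proper filebench run, a crude streaming-write comparison (the test dataset name is made up; remember the checksum property only affects newly written data):

  zfs create tank/cktest
  zfs set checksum=fletcher4 tank/cktest
  ptime dd if=/dev/zero of=/tank/cktest/f1 bs=128k count=8192
  zfs set checksum=sha256 tank/cktest
  ptime dd if=/dev/zero of=/tank/cktest/f2 bs=128k count=8192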
On Aug 26, 2008, at 9:58 AM, Darren J Moffat wrote:
than a private copy. I wouldn't expect that to have too big an impact (I
On a SPARC CMT (Niagara 1+) based system wouldn't that be likely to
have a large impact?
--
Keith H. Bierman [EMAIL PROTECTED] | AIM kbiermank
5430
On Tue, Aug 26, 2008 at 10:58 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
In the interest of full disclosure I have changed the sha256.c in the
ZFS source to use the default kernel one via the crypto framework rather
than a private copy. I wouldn't expect that to have too big an impact (I
On Tue, 26 Aug 2008, Darren J Moffat wrote:
Bob Friesenhahn wrote:
On Tue, 26 Aug 2008, Darren J Moffat wrote:
zfs set checksum=sha256
Expect performance to really suck after setting this.
Do you have evidence of that? What kind of workload and how did you test it
I did some random
Keith Bierman wrote:
On Aug 26, 2008, at 9:58 AM, Darren J Moffat wrote:
than a private copy. I wouldn't expect that to have too big an impact (I
On a SPARC CMT (Niagara 1+) based system wouldn't that be likely to have
a large impact?
UltraSPARC T1 has no hardware SHA256 so I
Mike Gerdts wrote:
On Tue, Aug 26, 2008 at 10:58 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
In the interest of full disclosure I have changed the sha256.c in the
ZFS source to use the default kernel one via the crypto framework rather
than a private copy. I wouldn't expect that to have too
On Tue, Aug 26, 2008 at 10:11 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
Keith Bierman wrote:
On a SPARC CMT (Niagara 1+) based system wouldn't that be likely to have a
large impact?
UltraSPARC T1 has no hardware SHA256 so I wouldn't expect any real change
from running the private
[EMAIL PROTECTED] wrote on 08/22/2008 04:26:35 PM:
Just my 2c: Is it possible to do an offline dedup, kind of like
snapshotting?
What I mean in practice is: we make many Solaris full-root zones.
They share a lot of data as complete files. This makes it kind of easy to
save space - make one
Just my 2c: Is it possible to do an offline dedup, kind of like snapshotting?
What I mean in practice is: we make many Solaris full-root zones. They share a
lot of data as complete files. This makes it kind of easy to save space - make one
zone as a template, snapshot/clone its dataset, make new
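The template-and-clone part is standard ZFS; a minimal sketch with made-up dataset names:

  # Build one zone as a template, then clone it for each new zone.
  zfs snapshot tank/zones/template@gold
  zfs clone tank/zones/template@gold tank/zones/zone01
  # zone01 shares all of its blocks with the template at first; only
  # later changes consume new space.

The offline dedup being asked about would extend that saving to files that diverge after cloning, or that never shared a clone ancestor at all.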
with other Word files. You will thus end up seeking all over the disk
to read _most_ Word files. Which really sucks.
snip
very limited, constrained usage. Disk is just so cheap, that you
_really_ have to have an enormous amount of dup before the performance
penalties of dedup are
On Tue, Jul 22, 2008 at 10:44 PM, Erik Trimble [EMAIL PROTECTED] wrote:
More than anything, Bob's reply is my major feeling on this. Dedup may
indeed turn out to be quite useful, but honestly, there's no broad data
which says that it is a Big Win (tm) _right_now_, compared to finishing
other
Hi All
Is there any hope for deduplication on ZFS?
Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems
Email [EMAIL PROTECTED]
There is always hope.
Seriously though, looking at
http://en.wikipedia.org/wiki/Comparison_of_revision_control_software there are
a lot of choices
[EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:
Hi All
Is there any hope for deduplication on ZFS?
Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems
Email [EMAIL PROTECTED]
There is always hope.
Seriously though, looking at http://en.wikipedia.
To do dedup properly, it seems like there would have to be some overly
complicated methodology for a sort of delayed dedup of the data. For speed,
you'd want your writes to go straight into the cache and get flushed out as
quickly as possible, keeping everything as ACID as possible. Then, a dedup
[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:
To do dedup properly, it seems like there would have to be some
overly complicated methodology for a sort of delayed dedup of the
data. For speed, you'd want your writes to go straight into the
cache and get flushed out as quickly as
On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:
To do dedup properly, it seems like there would have to be some
overly complicated methodology for a sort of delayed dedup of the
data. For speed, you'd want your writes to go
Chris Cosby wrote:
On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:
To do dedup properly, it seems like there would have to be some
overly complicated
FWIW,
Sun's VTL products use ZFS and offer de-duplication services.
http://www.sun.com/aboutsun/pr/2008-04/sunflash.20080407.2.xml
-- richard
[EMAIL PROTECTED] wrote on 07/22/2008 11:48:30 AM:
Chris Cosby wrote:
On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:
To do dedup properly, it
On Tue, Jul 22, 2008 at 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote:
No, you are right to be concerned over block-level dedup seriously
impacting seeks. The problem is that, given many common storage
scenarios, you will have not just similar files, but multiple common
sections of many
On 7/22/08 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote:
I'm still not convinced that dedup is really worth it for anything but
very limited, constrained usage. Disk is just so cheap, that you
_really_ have to have an enormous amount of dup before the performance
penalties of dedup are
On Tue, 22 Jul 2008, Erik Trimble wrote:
Dedup Disadvantages:
Obviously you do not work in the Sun marketing department, which is
interested in this feature (due to some other companies marketing it).
Note that the topic starter post came from someone in Sun's marketing
department.
I think
et == Erik Trimble [EMAIL PROTECTED] writes:
et Dedup Advantages:
et (1) save space
(2) coalesce data which is frequently used by many nodes in a large
cluster into a small nugget of common data which can fit into RAM
or L2 fast disk
(3) back up non-ZFS filesystems that don't
On Tue, 22 Jul 2008, Miles Nordin wrote:
scrubs making pools uselessly slow? Or should it be scrub-like so
that already-written filesystems can be thrown into the dedup bag and
slowly squeezed, or so that dedup can run slowly during the business
day over data written quickly at night (fast
Bob Friesenhahn wrote:
On Tue, 22 Jul 2008, Erik Trimble wrote:
Dedup Disadvantages:
Obviously you do not work in the Sun marketing department, which is
interested in this feature (due to some other companies marketing it).
Note that the topic starter post came from someone in
On Tue, 22 Jul 2008, Miles Nordin wrote:
scrubs making pools uselessly slow? Or should it be scrub-like so
that already-written filesystems can be thrown into the dedup bag and
slowly squeezed, or so that dedup can run slowly during the business
day over data written quickly at night
Raw storage space is cheap. Managing the data is what is expensive.
Not for my customer. Internal accounting means that the storage team gets paid
for each allocated GB on a monthly basis. They have
stacks of IO bandwidth and CPU cycles to spare outside of their daily busy
period. I can't
Does anyone know a tool that can look over a dataset and give
duplication statistics? I'm not looking for something incredibly
efficient but I'd like to know how much it would actually benefit our
Check out the following blog..:
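Short of a purpose-built tool, a rough file-level estimate can be had with standard utilities; a sketch with a made-up path (it only catches whole-file duplicates, not shared blocks):

  # Hash every file, then count how many share their content with
  # at least one other file.
  find /tank/data -type f -exec digest -a sha1 {} + | sort | uniq -c |
    awk '$1 > 1 { dup += $1 - 1 } END { print dup, "files duplicate another file" }'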
Justin Stringfellow wrote:
Raw storage space is cheap. Managing the data is what is expensive.
Not for my customer. Internal accounting means that the storage team gets
paid for each allocated GB on a monthly basis. They have
stacks of IO bandwidth and CPU cycles to spare outside of
Just going to make a quick comment here. It's a good point about wanting
backup software to support this; we're a much smaller company, but it's already
more difficult to manage the storage needed for backups than our live storage.
However, we're actively planning that over the next 12 months,
Even better would be using the ZFS block checksums (assuming we are only
summing the data, not its position or time :)...
Then we could have two files that have 90% the same blocks, and still
get some dedup value... ;)
Yes, but you will need to add some sort of highly collision resistant
[EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM:
Does anyone know a tool that can look over a dataset and give
duplication statistics? I'm not looking for something incredibly
efficient but I'd like to know how much it would actually benefit our
Check out the following blog..:
[EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM:
Does anyone know a tool that can look over a dataset and give
duplication statistics? I'm not looking for something incredibly
efficient but I'd like to know how much it would actually benefit our
Check out the
Justin Stringfellow wrote:
Raw storage space is cheap. Managing the data is what is expensive.
Not for my customer. Internal accounting means that the storage team gets
paid for each allocated GB on a monthly basis. They have
stacks of IO bandwidth and CPU cycles to spare outside of
On Jul 8, 2008, at 11:00 AM, Richard Elling wrote:
much fun for people who want to hide costs. For example, some bright
manager decided that they should charge $100/month/port for ethernet
drops. So now, instead of having a centralized, managed network with
well defined port mappings,
On Tue, 8 Jul 2008, Richard Elling wrote:
[donning my managerial accounting hat]
It is not a good idea to design systems based upon someone's managerial
accounting whims. These are subject to change in illogical ways at
unpredictable intervals. This is why managerial accounting can be so
Something else came to mind which is a negative regarding
deduplication. When zfs writes new sequential files, it should try to
allocate blocks in a way which minimizes fragmentation (disk seeks).
Disk seeks are the bane of existing storage systems since they come
out of the available IOPS
Hmmn, you might want to look at Andrew Tridgell's thesis (yes,
Andrew of Samba fame), as he had to solve this very question
to be able to select an algorithm to use inside rsync.
--dave
Darren J Moffat wrote:
[EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote on 07/08/2008 03:08:26 AM:
[EMAIL PROTECTED] wrote on 07/08/2008 01:26:15 PM:
Something else came to mind which is a negative regarding
deduplication. When zfs writes new sequential files, it should try to
allocate blocks in a way which minimizes fragmentation (disk seeks).
Disk seeks are the bane of existing storage
Bob Friesenhahn wrote:
Something else came to mind which is a negative regarding
deduplication. When zfs writes new sequential files, it should try to
allocate blocks in a way which minimizes fragmentation (disk seeks).
It should, but because of its copy-on-write nature, fragmentation
On Tue, 8 Jul 2008, Moore, Joe wrote:
On ZFS, sequential files are rarely sequential anyway. The SPA tries to
keep blocks nearby, but when dealing with snapshotted sequential files
being rewritten, there is no way to keep everything in order.
I think that rewriting files (updating existing
On Tue, Jul 8, 2008 at 12:25 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
On Tue, 8 Jul 2008, Richard Elling wrote:
[donning my managerial accounting hat]
It is not a good idea to design systems based upon someone's managerial
accounting whims. These are subject to change in illogical ways at
On Tue, Jul 8, 2008 at 1:26 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
Something else came to mind which is a negative regarding
deduplication. When zfs writes new sequential files, it should try to
allocate blocks in a way which minimizes fragmentation (disk seeks).
Disk seeks are the bane
Tim Spriggs wrote:
Does anyone know a tool that can look over a dataset and give
duplication statistics? I'm not looking for something incredibly
efficient but I'd like to know how much it would actually benefit our
dataset: HiRISE has a large set of spacecraft data (images) that could
Justin Stringfellow wrote:
Does anyone know a tool that can look over a dataset and give
duplication statistics? I'm not looking for something incredibly
efficient but I'd like to know how much it would actually benefit our
Check out the following blog..:
Moore, Joe wrote:
On ZFS, sequential files are rarely sequential anyway. The SPA tries to
keep blocks nearby, but when dealing with snapshotted sequential files
being rewritten, there is no way to keep everything in order.
In some cases, a d11p system could actually speed up data reads
Mike Gerdts wrote:
[I agree with the comments in this thread, but... I think we're still being
old fashioned...]
Imagine if university students were allowed to use as much space as
they wanted but had to pay a per-megabyte charge every two weeks or
their account would be terminated? This would
Hi All;
Is there any hope for deduplication on ZFS?
Mertol
Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL
Mertol,
Yes, dedup is certainly on our list and has been actively
discussed recently, so there's hope and some forward progress.
It would be interesting to see where it fits into our customers'
priorities for ZFS. We have a long laundry list of projects.
In addition there are bug fixes, performance
A really smart nexus for dedup is right when archiving takes place. For
systems like EMC Centera, dedup is basically a byproduct of checksumming.
Two files with similar metadata that have the same hash? They're identical.
Charles
On 7/7/08 4:25 PM, Neil Perrin [EMAIL PROTECTED] wrote:
Even better would be using the ZFS block checksums (assuming we are only
summing the data, not its position or time :)...
Then we could have two files that have 90% the same blocks, and still
get some dedup value... ;)
Nathan.
Charles Soto wrote:
A really smart nexus for dedup is right when
Neil Perrin wrote:
Mertol,
Yes, dedup is certainly on our list and has been actively
discussed recently, so there's hope and some forward progress.
It would be interesting to see where it fits into our customers'
priorities for ZFS. We have a long laundry list of projects.
In addition
On Tue, 8 Jul 2008, Nathan Kroenert wrote:
Even better would be using the ZFS block checksums (assuming we are only
summing the data, not its position or time :)...
Then we could have two files that have 90% the same blocks, and still
get some dedup value... ;)
It seems that the hard
On Mon, 7 Jul 2008, Jonathan Loran wrote:
use ZFS is as nearline storage for backup data. I have a 16TB server
that provides a file store for an EMC Networker server. I'm seeing a
compressratio of 1.73, which is mighty impressive, since we also use
native EMC compression during the backups.
On Mon, Jul 07, 2008 at 07:56:26PM -0500, Bob Friesenhahn wrote:
This deduplication technology seems similar to the Microsoft ads I
see on TV which advertise how their new technology saves the customer
Quantum's claim of 20:1 just doesn't jive in my head, either, for some
reason.
-brian
On Mon, Jul 7, 2008 at 7:40 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
The actual benefit of data deduplication to an enterprise seems
negligible unless the backup system directly supports it. ?In the
enterprise the cost of storage has more to do with backing up the data
than the amount of
I second this, provided we also check that the data is in fact
identical. Checksum collisions are likely given the sizes of
disks and the sizes of checksums; and some users actually deliberately
generate data with colliding checksums (researchers and nefarious
users). Dedup must be
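For what it's worth, the dedup implementation that later shipped in ZFS addresses exactly this concern with a verify option; a sketch, assuming a dedup-capable build and a made-up dataset name:

  # Request a byte-for-byte comparison before two blocks with matching
  # checksums are treated as duplicates.
  zfs set dedup=verify tank/data
  # Or use a strong hash and still verify on a match.
  zfs set dedup=sha256,verify tank/data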
Good points. I see the archival process as a good candidate for adding
dedup because it is essentially doing what a stage/release archiving system
already does - faking the existence of data via metadata. Those blocks
aren't actually there, but they're still accessible because they're
On Mon, 7 Jul 2008, Mike Gerdts wrote:
As I have considered deduplication for application data I see several
things happen in various areas.
You have provided an excellent description of gross inefficiencies in
the way systems and software are deployed today, resulting in massive
On Mon, Jul 7, 2008 at 9:24 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
On Mon, 7 Jul 2008, Mike Gerdts wrote:
There tend to be organizational walls between those that manage
storage and those that consume it. As storage is distributed across
a network (NFS, iSCSI, FC) things like
Oh, I agree. Much of the duplication described is clearly the result of
bad design in many of our systems. After all, most of an OS can be served
off the network (diskless systems etc.). But much of the dupe I'm talking
about is less about not using the most efficient system administration
On Mon, Jul 7, 2008 at 11:07 PM, Charles Soto [EMAIL PROTECTED] wrote:
So, while much of the situation is caused by bad data management, there
aren't always systems we can employ that prevent it. Done right, dedup can
certainly be worth it for my operations. Yes, teaching the user the
right
Does anyone know a tool that can look over a dataset and give
duplication statistics? I'm not looking for something incredibly
efficient but I'd like to know how much it would actually benefit our
dataset: HiRISE has a large set of spacecraft data (images) that could
potentially have large