Re: [zfs-discuss] 'zfs recv' is very slow
On Mon, Feb 2, 2009 at 6:55 AM, Robert Milkowski mi...@task.gda.pl wrote:

> It definitely does. I made some tests today comparing b101 with b105 while doing 'zfs send -R -I A B > /dev/null' with several dozen snapshots between A and B. Well, b105 is almost 5x faster in my case - that's pretty good.
> -- Robert Milkowski http://milek.blogspot.com
> -- This message posted from opensolaris.org

Sad to report that I am seeing the slow zfs recv issue cropping up again while running b105 :( Not sure what has triggered the change, but I am seeing the same behavior again: massive amounts of reads on the receiving side, while receiving only tiny bursts of data amounting to a mere megabyte a second.

It doesn't seem to happen every single time, which is odd, but I can provoke it by destroying a snapshot from the pool I am sending, then taking another snapshot and re-sending it. That seems to cause the receiving side to go into this read storm before any data is transferred.

I'm going to open a case in the morning, and see if I can't get an engineer to look at this.

-- Brent Jones br...@servuhome.net

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
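A quick way to narrow down whether the slowdown is on the sending or the receiving side is to time the send stream by itself, then the full pipeline. The pool and snapshot names below are placeholders, not from the thread:

```shell
# Hypothetical pool/snapshot names; substitute your own.
# 1. Time the recursive incremental send alone, discarding the stream.
#    This isolates send throughput from any recv behavior.
ptime zfs send -R -I tank@snapA tank@snapB > /dev/null

# 2. Time the full send|recv pipeline; a large difference from step 1
#    points at the receiving side (e.g. the read storm described above).
ptime sh -c 'zfs send -R -I tank@snapA tank@snapB | zfs recv -d backup'
```

Running `iostat -xn 1` on the receiving pool during step 2 should make the read storm visible if it occurs.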
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
hi,

I have an AOC-USAS-L8i working in both a Gigabyte GA-P35-DS3P and a Gigabyte GA-EG45M-DS2H under OpenSolaris build 104+ (Nexenta Core 2.0 beta). The controller looks like this in lspci:

01:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
        Subsystem: Super Micro Computer Inc Unknown device a380
        Flags: bus master, fast devsel, latency 0, IRQ 15
        I/O ports at a000
        Memory at f101 (64-bit, non-prefetchable)
        Memory at f100 (64-bit, non-prefetchable)
        Capabilities: [50] Power Management version 2
        Capabilities: [68] Express Endpoint IRQ 0
        Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1

With 8x 300GB disks (older 7200rpm disks from 2004) as raid-z1 on a 3.2GHz Core Duo with 4GB RAM, this delivers the following figures:

1. Write performance (linear), ~98MBytes/s:

r...@marvin:/tank/storage# dd_rescue -b4M /dev/zero test
Summary for /dev/zero - test:
dd_rescue: (info): ipos: 10485760.0k, opos: 10485760.0k, xferd: 10485760.0k
                   errs: 0, errxfer: 0.0k, succxfer: 10485760.0k
                   +curr.rate: 106039kB/s, avg.rate: 104173kB/s, avg.load: 16.4%

r...@marvin:/tank/storage# dd_rescue -b4M /dev/zero test2
Summary for /dev/zero - test2:
dd_rescue: (info): ipos: 4194304.0k, opos: 4194304.0k, xferd: 4194304.0k
                   errs: 0, errxfer: 0.0k, succxfer: 4194304.0k
                   +curr.rate: 88486kB/s, avg.rate: 96142kB/s, avg.load: 14.1%

2. Read performance (linear), ~290MBytes/s:

Summary for test - /dev/null:
dd_rescue: (info): ipos: 10485760.0k, opos: 10485760.0k, xferd: 10485760.0k
                   errs: 0, errxfer: 0.0k, succxfer: 10485760.0k
                   +curr.rate: 0kB/s, avg.rate: 285824kB/s, avg.load: 40.4%

Summary for test2 - /dev/null:
dd_rescue: (info): ipos: 4194304.0k, opos: 4194304.0k, xferd: 4194304.0k
                   errs: 0, errxfer: 0.0k, succxfer: 4194304.0k
                   +curr.rate: 0kB/s, avg.rate: 308484kB/s, avg.load: 39.8%

regards,
nicola
[zfs-discuss] Strange performance loss
I'm moving some data off an old machine to something reasonably new. Normally the new machine performs better, but I have one case just now where the new system is terribly slow.

Old machine - V880 (Solaris 8) with SVM raid-5:

# ptime du -kds foo
15043722        foo
real      6.955
user      0.964
sys       5.492

And now the new machine - T5140 (latest Solaris 10) with ZFS striped atop a bunch of 2530 arrays:

# ptime du -kds foo
15343120        foo
real   2:55.210
user      2.559
sys    2:05.788

It's not just du; a find on that directory is similarly bad. I have other filesystems of similar size and number of files (there are only about 200K files) that perform well, so there must be something about this filesystem that is throwing zfs into a spin. Anybody else seen anything like this?

I'm suspicious of ACL handling, so for a quick test I took one directory with approx 5000 files in it and timed du (I'm running all this as root, btw):

1. Just the files, no ACLs:
real      0.238
user      0.050
sys       0.187

2. Files with ACLs:
real      0.467
user      0.055
sys       0.411

3. Files with ACLs, and an ACL on the directory:
real      0.610
user      0.058
sys       0.551

I don't know whether that explains all of the problem, but it's clear that having ACLs on files and directories has a definite cost.

-- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
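A repeatable version of this ACL experiment might look like the sketch below. The directory names are hypothetical, and the ACL entry uses the Solaris NFSv4 ACL syntax of chmod; `seq` is assumed available (substitute a shell counter loop otherwise):

```shell
# Build two directories with the same number of files.
mkdir /tank/acltest-plain /tank/acltest-acl
for i in $(seq 1 5000); do
  touch /tank/acltest-plain/f$i /tank/acltest-acl/f$i
done

# Add an ACL entry to every file in one directory only
# (user name here is arbitrary for the test).
chmod -R A+user:webservd:read_data:allow /tank/acltest-acl

# Compare traversal cost: same file count, ACLs the only variable.
ptime du -kds /tank/acltest-plain    # baseline, no ACLs
ptime du -kds /tank/acltest-acl      # with per-file ACLs
```

Adding an ACL to the directories themselves as a third case would reproduce all three timings above.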
Re: [zfs-discuss] ZFS: unreliable for professional usage?
I am wondering: if the usb storage device is not reliable for ZFS usage, can the situation be improved if I put the intent log on an internal sata disk, to avoid corruption while still getting the convenience of usb storage?
Re: [zfs-discuss] ZFS: unreliable for professional usage?
huh? but that loses the convenience of USB. I've used USB drives without problems at all; just remember to zpool export them before you unplug.
Re: [zfs-discuss] Two zvol devices one volume?
I have seen this 'phantom dataset' with a pool on nv93. I created a zpool, created a dataset, then destroyed the zpool. When creating a new zpool on the same partitions/disks as the destroyed zpool, upon export I receive the same message as you describe above, even though I never created the dataset in the new pool. Creating a dataset of the same name and then destroying it doesn't seem to get rid of it, either. The solution for your case may be in this post; if not, file a bug: http://www.opensolaris.org/jive/thread.jspa?messageID=311573#311573
Re: [zfs-discuss] ZFS: unreliable for professional usage?
While mobility would be lost, usb storage still has the advantage of being cheap and easy to install compared to installing internal disks in a pc. So if I just want to use it to provide zfs storage space for a home file server, can a small intent log located on an internal sata disk prevent the pool corruption caused by a power cut?
[zfs-discuss] Re: unformatted partition
Hello,

thanks for the answer. The partition table shows that Windows and OpenSolaris run on:

1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0

Partition  Status   Type       Start   End    Length  %
=========  ======   ====       =====   ===    ======  ===
1                   IFS: NTFS      0   5098     5099  26
2          Active   Solaris2    5099  12931     7833  40

The disk 0. c7t0d0 doesn't contain any disk type:

AVAILABLE DISK SELECTIONS:
0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
Specify disk (enter its number): 0
Error occurred with device in use checking: Bad file number
Error: can't open disk '/dev/rdsk/c7t0d0p0'.
AVAILABLE DRIVE TYPES:
0. Auto configure
1. other
Specify disk type (enter its number): 0
Auto configuration via format.dat[no]? y
Auto configure failed
No Solaris fdisk partition found.

If I create some file system using GParted, my partition table will look like this (cylinders):

Partition  Status   Type       Start   End    Length  %
=========  ======   ====       =====   ===    ======  ===
1                   IFS: NTFS      0   5098     5099  26
2          Active   Solaris2    5099  12931     7833  40
3                   Solaris      xyz    xyz            34

but I still don't know how to import this partition (num. 3). If I run 'zpool create c9d0' I'll lose all my data, right?

Regards, Jan Hlodan

Will Murnane wrote:

On Thu, Feb 12, 2009 at 21:59, Jan Hlodan jh231...@mail-emea.sun.com wrote:

> I would like to import the 3rd partition as another pool but I can't see this partition.
> sh-3.2# format -e
> Searching for disks...done
> AVAILABLE DISK SELECTIONS:
> 0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
> 1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
> I guess that 0. is the windows partition and 1. is opensolaris

What you see there are whole disks, not partitions. Try zpool status, which will show you that rpool is on something like c9d0s0. Then go into format again, pick 1 (in my example), type fdisk to look at the DOS-style partition table, and verify that the partitioning of the disk matches what you thought it was. Then you can create a new zpool with something like 'zpool create data c9d0p3'.

Will
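Put together, the procedure Will describes might look like this. The device name c9d0p3 follows from the disk listing above, but verify it with format/fdisk first, since creating a pool on the wrong partition destroys its contents:

```shell
# Confirm where the root pool lives (expect something like c9d0s0).
zpool status rpool

# Inspect the DOS-style (fdisk) partition table on the internal disk:
# in format, select the c9d0 entry, then type: fdisk
format -e

# Create a pool directly on the third fdisk partition.
# This wipes partition 3 only; partitions 1 (NTFS) and 2 (Solaris2)
# are untouched.
zpool create data c9d0p3
zpool status data
```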
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On 2/13/2009 5:58 AM, Ross wrote:

> huh? but that loses the convenience of USB. I've used USB drives without problems at all, just remember to zpool export them before you unplug.

I think there is a subcommand of cfgadm you should run to notify Solaris that you intend to unplug the device. I don't use USB, and my familiarity with cfgadm (for FC and SCSI) is limited.

-Kyle
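For the archives, the safe-removal sequence would be roughly as follows. The pool name and the USB attachment-point Id are hypothetical; cfgadm -l shows the actual Ap_Ids on a given system:

```shell
# Quiesce ZFS first so all pending writes reach the device.
zpool export mypool

# List attachment points to find the USB device's Ap_Id.
cfgadm -l

# Unconfigure the device before pulling it (Ap_Id shown is an example).
cfgadm -c unconfigure usb0/4
```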
[zfs-discuss] ZFS vdev_cache
Hi All,

How would I obtain the current setting for the vdev_cache from a production system? We are looking at trying to tune ZFS for better performance with respect to Oracle databases; however, before we start changing settings via the /etc/system file we would like to confirm the setting on the running OS.

Thanks, Tony
Re: [zfs-discuss] ZFS vdev_cache
On Fri, 13 Feb 2009, Tony Marshall wrote:

> How would i obtain the current setting for the vdev_cache from a production system? We are looking at trying to tune ZFS for better performance with respect to oracle databases, however before we start changing settings via the /etc/system file we would like to confirm the setting from the running OS.

The kernel variable zfs_vdev_cache_size indicates the size of the cache per leaf vdev. By default it's set to 0xA00000, i.e. 10MB.

Regards, markm
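To read the live value from a running system, the kernel debugger can print the variable directly (needs root). A sketch:

```shell
# Print zfs_vdev_cache_size from the running kernel in decimal.
echo 'zfs_vdev_cache_size/D' | mdb -k

# To change it persistently, add a line like this to /etc/system
# and reboot (value shown is the 10MB default):
#   set zfs:zfs_vdev_cache_size = 0xA00000
```

This lets you confirm the current setting before touching /etc/system, as the original question asks.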
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Having a separate intent log on good hardware will not prevent corruption of a pool on bad hardware. By "good" I mean hardware that correctly flushes its write caches when requested. Note, a pool is always consistent (again, when using good hardware). The function of the intent log is not to provide consistency (like a journal), but to speed up synchronous requests like fsync and O_DSYNC.

Neil.

On 02/13/09 06:29, Jiawei Zhao wrote:

> While mobility could be lost, usb storage still has the advantage of being cheap and easy to install comparing to install internal disks on pc, so if I just want to use it to provide zfs storage space for home file server, can a small intent log located on internal sata disk prevent the pool corruption caused by a power cut?
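For reference, attaching a separate intent log (slog) is a one-liner; it just won't save a pool whose main devices lie about cache flushes. Pool and slice names below are hypothetical:

```shell
# Add a dedicated log device on an internal SATA slice to pool "tank".
# This accelerates synchronous writes (fsync, O_DSYNC); it does not
# protect against devices that ignore cache-flush commands.
zpool add tank log c1t0d0s0

# The device appears under a separate "logs" section.
zpool status tank
```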
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13 at 9:14, Neil Perrin wrote:

> Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By good I mean hardware that correctly flush their write caches when requested.

Can someone please name a specific piece of bad hardware?

--eric

-- Eric D. Mudama edmud...@mail.bounceswoosh.org
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12 at 19:43, Toby Thain wrote:

> ^^ Spec compliance is what we're testing for... We wouldn't know if this special variant is working correctly either. :)

Time the difference between NCQ reads with and without FUA in the presence of overlapped cached write data. That should have a significant performance penalty, compared to a device servicing the reads from a volatile buffer cache.

FYI, there are semi-commonly-available power control units that take serial port or USB as an input, and have a whole bunch of SATA power connectors on them. These are the sorts of things that drive vendors use to bounce power unexpectedly in their testing; if you need to perform that same validation, it makes sense to invest in that bit of infrastructure. Something like this: http://www.ulinktech.com/products/hw_power_hub.html or just roll your own in a few days like this guy did for his printer: http://chezphil.org/slugpower/

It should be pretty trivial to perform a few thousand cached writes, issue a flush cache ext, and turn off power immediately after that command completes. Then go back and figure out how many of those writes were successfully written as the device claimed.

-- Eric D. Mudama edmud...@mail.bounceswoosh.org
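The flush-then-cut-power test described above could be driven by a script along these lines. `power_relay` stands in for whatever hypothetical command controls the serial/USB power hub; everything here is a sketch, not a validated harness:

```shell
# Write a few thousand small files to the device under test.
i=0
while [ $i -lt 1000 ]; do
  dd if=/dev/urandom of=/mnt/testdisk/f$i bs=8k count=1 2>/dev/null
  i=$((i+1))
done

# Request that cached data reach stable storage, then cut power the
# instant the flush returns (power_relay is a placeholder command).
sync
power_relay off port1

# After restoring power and remounting: count surviving files and
# compare against the 1000 the device claimed to have written.
```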
Re: [zfs-discuss] ZFS: unreliable for professional usage?
gm == Gary Mills mi...@cc.umanitoba.ca writes:

gm That implies that ZFS will have to detect removable devices
gm and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on.

As we've said many times, if the devices are working properly, then they can be unplugged uncleanly without corrupting the pool, and without corrupting any other non-Microsoft filesystem. This is an old, SOLVED, problem. It's ridiculous hypocrisy to make whole filesystems DSYNC, to even _invent the possibility for the filesystem to be DSYNC_, just because it is possible to remove something. Will you do the same thing because it is possible for your laptop's battery to run out? just, STOP! If the devices are broken, the problem is that they're broken, not that they're removeable.

personally, I think everything with a broken write cache should be black-listed in the kernel and attach read-only by default, whether it's a USB bridge or a SATA disk. This will not be perfect because USB bridges, RAID layers and iSCSI targets will often hide the identity of the SATA drive behind them, and of course people will demand a way to disable it. but if you want to be ``safe'', then for the sake of making the point, THIS is the right way to do it, not mucking around with these overloaded notions of ``removeable''.

Also, the so-far unacknowledged ``iSCSI/FC Write Hole'' should be fixed so that a copy of all written data is held in the initiator's buffer cache until it's verified as *on the physical platter/NVRAM* so that it can be replayed if necessary, and SYNC CACHE commands are allowed to fail far enough that even *things which USE the initiator, like ZFS* will understand what it means when SYNC CACHE fails, and bounced connections are handled correctly---otherwise, when connections bounce or SYNC CACHE returns failure, correctness requires that the initiator pretend like its plug was pulled and panic. Short of that, the initiator system must forcibly unmount all filesystems using that device and kill all processes that had files open on those filesystems. And sysadmins should have, and know how to cleverly use, a tool that tests for both functioning barriers and working SYNC CACHE, end-to-end.

NO more ``removeable'' attributes, please! You are just pretending to solve a much bigger problem, and making things clumsy and disgusting in the process.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
fc == Frank Cusack fcus...@fcusack.com writes:

> Dropping a flush-cache command is just as bad as dropping a write.

fc Not that it matters, but it seems obvious that this is wrong
fc or anyway an exaggeration. Dropping a flush-cache just means
fc that you have to wait until the device is quiesced before the
fc data is consistent.
fc Dropping a write is much much worse.

backwards i think. Dropping a flush-cache is WORSE than dropping the flush-cache plus all writes after the flush-cache. The problem that causes loss of whole pools rather than loss of recently-written data isn't that you're writing too little. It's that you're dropping the barrier and misordering the writes. consequently you lose *everything you've ever written,* which is much worse than losing some recent writes, even a lot of them.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
t == Tim t...@tcsac.net writes:

t I would like to believe it has more to do with Solaris's
t support of USB than ZFS, but the fact remains it's a pretty
t glaring deficiency in 2009, no matter which part of the stack
t is at fault.

maybe, but for this job I don't much mind glaring deficiencies, as long as it's possible to assemble a working system without resorting to trial-and-error, and possible to know it's working before loading data on it. Right now, by following the ``best practices'', you don't know what to buy, and after you receive the hardware you don't know if it works until you lose a pool, at which time someone will tell you ``i guess it wasn't ever working.'' Even if you order sun4v or an expensive FC disk shelf, you still don't know if it works. (though, I'm starting to suspect, in the case of FC or iSCSI the answer is always ``it does not work'') The only thing you know for sure is, if you lose a pool, someone will blame it on hardware bugs surrounding cache flushes, or else try to conflate the issue with a bunch of inapplicable garbage about checksums and wire corruption. This is unworkable.

I'm not saying glaring 2009 deficiencies are irrelevant---on my laptop I do mind, because I got out of a multi-year abusive relationship with NetBSD/hpcmips, and now want all parts of my laptop to have drivers. And I guess it applies to that neat timeslider / home-base--USB-disk case we were talking about a month ago. but for what I'm doing I will actually accept the advice ``do not ever put ZFS on USB because ZFS is a canary in the mine of USB bugs''---it's just, that advice is not really good enough to settle the whole issue.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Miles Nordin wrote:

> gm That implies that ZFS will have to detect removable devices
> gm and treat them differently than fixed devices.
>
> please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on.

Since this discussion is taking place in the context of someone removing a USB stick, I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it, which is how USB devices are treated.

I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. This is common practice for any sort of professional engineering effort. As an example, you aren't going to go out there and find yourself a chainsaw being sold new without a guard. It might be removable, but the default is to include it. Why? Because there is a considerable chance of damage to the user without it.

Likewise with a file system on a device which might cache a data write for as long as thirty seconds while being easily removable. In this case, the user may write the file and seconds later remove the device. Many folks out there behave in this manner. It really doesn't matter to them that they have a copy of the last save they did two hours ago; what they want and expect is that the most recent data they saved actually be on the USB stick for them to retrieve.

What you are suggesting is that it is better to lose that data when it could have been avoided. I would personally suggest that it is better to have default behavior which is not surprising, along with more advanced behavior for those who have bothered to read the manual. In Windows' case, the write cache can be turned on; it is not unchangeable, and those who have educated themselves use it. I seldom turn it on unless I'm doing heavy I/O to a USB hard drive; otherwise the performance difference is just not that great.

Regards, Greg
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:20:21 PM -0500 Miles Nordin car...@ivy.net wrote:

> fc == Frank Cusack fcus...@fcusack.com writes:
> Dropping a flush-cache command is just as bad as dropping a write.
> fc Not that it matters, but it seems obvious that this is wrong
> fc or anyway an exaggeration. Dropping a flush-cache just means
> fc that you have to wait until the device is quiesced before the
> fc data is consistent.
> fc Dropping a write is much much worse.
>
> backwards i think. Dropping a flush-cache is WORSE than dropping the flush-cache plus all writes after the flush-cache. The problem that causes loss of whole pools rather than loss of recently-written data isn't that you're writing too little. It's that you're dropping the barrier and misordering the writes. consequently you lose *everything you've ever written,* which is much worse than losing some recent writes, even a lot of them.

Who said dropping a flush-cache means dropping any subsequent writes, or misordering writes? If you're misordering writes, isn't that a completely different problem?

Even then, I don't see how it's worse than DROPPING a write. The data eventually gets to disk, and at that point in time, the disk is consistent. When dropping a write, the data never makes it to disk, ever. In the face of a power loss, of course these result in the same problem, but even without a power loss the drop of a write is catastrophic.

-frank
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:10:08 PM -0500 Miles Nordin car...@ivy.net wrote:

> please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous.

thank you.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 12:41:12 PM -0500 Miles Nordin car...@ivy.net wrote:

> fc == Frank Cusack fcus...@fcusack.com writes:
> fc if you have 100TB of data, wouldn't you have a completely
> fc redundant storage network
>
> If you work for a ponderous leaf-eating brontosaurus maybe. If your company is modern I think having such an oddly large amount of data in one pool means you'd more likely have 70 whitebox peecees using motherboard ethernet/sata only, connected to a mesh of unmanaged L2 switches (of some peculiar brand that happens to work well.) There will always be one or two peecees switched off, and constantly something will be resilvering. The home user case is not really just for home users. I think a lot of people are tired of paying quadruple for stuff that still breaks, even serious people.

oh i dunno. i recently worked for a company that practically defines modern and we had multiples of 100TB of data. Like you said, not all in one place, but any given piece was fully redundant (well, if you count RAID-5 as fully ... but I'm really referring to the infrastructure). I can't imagine it any other way ... the cost of not having redundancy in the face of a failure is so much higher compared to the cost of building in that redundancy.

Also I'm not sure how you get 1 pool with more than 1 peecee, as zfs is not a cluster fs. So what you are talking about is multiple pools, and in that case if you do lose one (not redundant for whatever reason) you only have to restore a fraction of the 100TB from backup.

> fc Isn't this easily worked around by having UPS power in
> fc addition to whatever the data center supplies?
>
> In NYC over the last five years the power has been more reliable going into my UPS than coming out of it. The main reason for having a UPS is wiring maintenance. And the most important part of the UPS is the externally-mounted bypass switch, because the UPS also needs maintenance. UPS has never _solved_ anything, it always just helps. so in the end we have to count on the software's graceful behavior, not on absolutes.

I can't say I agree about the UPS; however, I've already been pretty forthright that UPS, etc. isn't the answer to the problem, just a mitigating factor to the root problem.

-frank
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009 17:53:00 +0100, Eric D. Mudama edmud...@bounceswoosh.org wrote:

> On Fri, Feb 13 at 9:14, Neil Perrin wrote:
>> Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By good I mean hardware that correctly flush their write caches when requested.
>
> Can someone please name a specific piece of bad hardware?

Or better still, name a few -GOOD- ones.

-- Dick Hoogendijk -- PGP/GnuPG key: 01D2433D + http://nagual.nl/ | SunOS sxce snv107++ + All that's really worth doing is what we do for others (Lewis Carroll)
Re: [zfs-discuss] ZFS: unreliable for professional usage?
fc == Frank Cusack fcus...@fcusack.com writes:

fc If you're misordering writes
fc isn't that a completely different problem?

no. ignoring the flush cache command causes writes to be misordered.

fc Even then, I don't see how it's worse than DROPPING a write.
fc The data eventually gets to disk, and at that point in time,
fc the disk is consistent. When dropping a write, the data never
fc makes it to disk, ever.

If you drop the flush cache command and every write after the flush cache command, yeah yeah it's bad, but in THAT case, the disk is still always consistent because no writes have been misordered.

fc In the face of a power loss, of course these result in the
fc same problem,

no, it's completely different in a power loss, which is exactly the point. If you pull the cord while the disk is inconsistent, you may lose the entire pool. If the disk is never inconsistent because you've never misordered writes, you will only lose recent write activity. Losing everything you've ever written is usually much worse than losing what you've written recently.

yeah yeah, some devil's advocate will toss in, ``i *need* some consistency promises or else it's better that the pool throw up its hand and say `broken, restore backup please', even if the hand-raising comes in the form of losing the entire pool,'' well in that case neither one is acceptable. But if your requirements are looser, then dropping a flush cache command plus every write after the flush cache command is much better than just ignoring the flush cache command. of course, that is a weird kind of failure that never happens. I described it just to make a point, to argue against this overly-simple idea ``every write is precious. let's do them as soon as possible because there could be Valuable Business Data inside the writes! we don't want to lose anything Valuable!''

The part of SYNC CACHE that's causing people to lose entire pools isn't the ``hurry up! write faster!'' part of the command, such that without it you still get your precious writes, just a little slower. NO. It's the ``control the order of writes'' part that's important for integrity on a single-device vdev.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 1:10:55 PM -0500 Miles Nordin car...@ivy.net wrote: fc == Frank Cusack fcus...@fcusack.com writes: fc If you're misordering writes fc isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) my understanding (weak, admittedly) is that drives will reorder writes on their own, and this is generally considered normal behavior. so to guarantee consistency *in the face of some kind of failure like a power loss*, we have write barriers. flush-cache is a stronger kind of write barrier. now that i think more, i suppose yes if you ignore the flush cache, then writes before and after the flush cache could be misordered, however it's the same as if there were no flush cache at all, and again as long as the drive has power and you can quiesce it then the data makes it to disk, and all is consistent and well. yes? whereas if you drop a write, well it's gone off into a black hole. fc Even then, I don't see how it's worse than DROPPING a write. fc The data eventually gets to disk, and at that point in time, fc the disk is consistent. When dropping a write, the data never fc makes it to disk, ever. If you drop the flush cache command and every write after the flush cache command, yeah yeah it's bad, but in THAT case, the disk is still always consistent because no writes have been misordered. why would dropping a flush cache imply dropping every write after the flush cache? fc In the face of a power loss, of course these result in the fc same problem, no, it's completely different in a power loss, which is exactly the point. If you pull the cord while the disk is inconsistent, you may lose the entire pool. If the disk is never inconsistent because you've never misordered writes, you will only lose recent write activity. 
Losing everything you've ever written is usually much worse than losing what you've written recently. yeah, as soon as i wrote that i realized my error, so thank you and i agree on that point. *in the event of a power loss* being inconsistent is a worse problem. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
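The failure modes being debated here can be made concrete with a toy simulation (my own sketch, not ZFS code; the `Disk` model and the block names are invented): a drive that honors cache flushes never loses writes completed before the barrier, while a drive that ignores flushes stays consistent only until the power goes out.

```python
import random

class Disk:
    """Toy model: a volatile write cache in front of stable storage."""
    def __init__(self, honors_flush=True):
        self.honors_flush = honors_flush
        self.cache = []    # writes acknowledged but still volatile
        self.stable = []   # writes that actually reached the platter

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        if self.honors_flush:
            self.stable.extend(self.cache)  # drain the cache, in order
            self.cache.clear()
        # a lying drive acks the flush and does nothing

    def power_loss(self):
        # the drive destages its cache in some arbitrary order and dies
        # partway through: a random subset of cached writes survives
        random.shuffle(self.cache)
        self.stable.extend(self.cache[:random.randint(0, len(self.cache))])
        self.cache.clear()

def crashed_disk(honors_flush, seed):
    """Write two tree blocks, issue a barrier, write the uberblock, crash."""
    random.seed(seed)
    d = Disk(honors_flush)
    d.write("data-A")
    d.write("data-B")
    d.flush()              # barrier: tree blocks must precede the uberblock
    d.write("uberblock")
    d.power_loss()
    return d.stable
```

With `honors_flush=True` the tree blocks are always on stable storage no matter when power dies; with `honors_flush=False` some crash timings leave the new uberblock on disk while a tree block it points at is gone, which is exactly the "misordering only matters at power loss" point above.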
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 10:29:05 AM -0800 Frank Cusack fcus...@fcusack.com wrote: On February 13, 2009 1:10:55 PM -0500 Miles Nordin car...@ivy.net wrote: fc == Frank Cusack fcus...@fcusack.com writes: fc If you're misordering writes fc isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) uhh ... that question can be ignored as i answered it myself below. sorry if i'm just being noisy now. my understanding (weak, admittedly) is that drives will reorder writes on their own, and this is generally considered normal behavior. so to guarantee consistency *in the face of some kind of failure like a power loss*, we have write barriers. flush-cache is a stronger kind of write barrier. now that i think more, i suppose yes if you ignore the flush cache, then writes before and after the flush cache could be misordered, however it's the same as if there were no flush cache at all, and again as long as the drive has power and you can quiesce it then the data makes it to disk, and all is consistent and well. yes? -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
fc == Frank Cusack fcus...@fcusack.com writes: fc why would dropping a flush cache imply dropping every write fc after the flush cache? it wouldn't and probably never does. It was an imaginary scenario invented to argue with you and to agree with the guy in the USB bug who said ``dropping a cache flush command is as bad as dropping a write.'' fc oh. can you supply a reference or if you have the time, some fc more explanation? (or can someone else confirm this.) I posted something long a few days ago that I need to revisit. The problem is, I don't actually understand how the disk commands work, so I was talking out my ass. Although I kept saying, ``I'm not sure it actually works this way,'' my saying so doesn't help anyone who spends the time to read it and then gets a bunch of mistaken garbage stuck in his head, which the people who actually recognize it as garbage are too busy to correct. It'd be better for everyone if I didn't do that. On the other hand, I think there's some worth to dreaming up several possibilities of what I fantasize the various commands might mean or do, rather than simply reading one of the specs to get the one right answer, because from what people in here say it sounds as though implementors of actual systems based on the SCSI command set live in this same imaginary world of fantastic and multiple realities without any meaningful review or accountability that I do. (disks, bridges, iSCSI targets and initiators, VMWare/VBox storage, ...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Superb news, thanks Jeff. Having that will really raise ZFS up a notch, and align it much better with people's expectations. I assume it'll work via zpool import, and let the user know what's gone wrong? If you think back to this case, imagine how different the user's response would have been if instead of being unable to mount the pool, ZFS had turned around and said: "This pool was not unmounted cleanly, and data has been lost. Do you want to restore your pool to the last viable state: (timestamp goes here)?" Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a "common causes of this are..." message, or a link to an online help article if you wanted people to be really impressed. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross wrote: Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a common causes of this are... message, or a link to an online help article if you wanted people to be really impressed. I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. There are already people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
On Fri, Feb 13, 2009 at 04:51, Nicola Fankhauser nicola.fankhau...@variant.ch wrote: hi I have a AOC-USAS-L8i working in both a Gigabyte GA-P35-DS3P and Gigabyte GA-EG45M-DS2H under OpenSolaris build 104+ (Nexenta Core 2.0 beta). Very cool! It's good to see people having success with this card. How does mounting the card work? Can one reverse the slot cover and screw it in like that, or is the card hanging free? Can you provide pictures of the card mounted in the case? Thanks! Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 10:29:05AM -0800, Frank Cusack wrote: On February 13, 2009 1:10:55 PM -0500 Miles Nordin car...@ivy.net wrote: fc == Frank Cusack fcus...@fcusack.com writes: fc If you're misordering writes fc isn't that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) Ordering matters for atomic operations, and filesystems are full of those. Now, if ordering is broken but the writes all eventually hit the disk then no one will notice. But if a power failure and/or a partition intervenes (cables get pulled, network partitions cut off an iSCSI connection, ...) then bad things happen. For ZFS the easiest way to ameliorate this is the txg fallback fix that Jeff Bonwick has said is now a priority. And if ZFS guarantees no block re-use until N txgs pass after a block is freed, then the fallback can be of up to N txgs, which gives you a decent chance that you'll recover your pool in the face of buggy devices, but for each discarded txg you lose that transaction's writes; you lose data incrementally. (The larger N is, the better your chance that the oldest of the last N txg's writes will all hit the disk in spite of the disk's lousy cache behaviors.) The next question is how to do the fallback, UI-wise. Should it ever be automatic? A pool option for that would be nice (I'd use it on all-USB pools). If/when not automatic, how should the user/admin be informed of the failure to open the pool and the option to fall back on an older txg (with data loss)? (For non-removable pools imported at boot time the answer is that the service will fail, causing sulogin to be invoked so you can fix the problem on console. For removable pools there should be a GUI.) Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
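The fallback policy described above can be sketched in a few lines (an illustration only, nothing like the real SPA code; `tree_ok` is a hypothetical callback standing in for re-walking the block tree from a candidate root and verifying every checksum):

```python
def pick_fallback_txg(uberblocks, tree_ok, max_fallback):
    """Pick the newest of the last `max_fallback` txgs whose entire
    block tree still verifies; return None if no txg is usable.

    uberblocks: iterable of (txg, root_bp) pairs, any order.
    tree_ok:    callback that walks root_bp and checks every checksum.
    """
    # newest first, but never look back further than max_fallback txgs,
    # because blocks freed before that may have been reallocated
    candidates = sorted(uberblocks, reverse=True)[:max_fallback]
    for txg, root_bp in candidates:
        if tree_ok(root_bp):
            return txg
    return None
```

For each txg the policy skips, the writes of that transaction are lost, which is the "you lose data incrementally" trade-off: a larger `max_fallback` raises the chance of finding a fully intact tree but caps how much history can be discarded.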
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 7:41 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross wrote: Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a common causes of this are... message, or a link to an online help article if you wanted people to be really impressed. I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. Yes, the root cause is improperly working hardware (or an OS bug like 6424510), but with ZFS being a copy on write system, when errors occur with a recent write, for the vast majority of the pools out there you still have huge amounts of data that is still perfectly valid and should be accessible. Unless I'm misunderstanding something, reverting to a previous checkpoint gets you back to a state where ZFS knows it's good (or at least where ZFS can verify whether it's good or not). You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. Yes, if you have databases or files there that were mid-write, they will almost certainly be corrupted. But at least your filesystem is back, and it's in as good a state as it's going to be given that in order for your pool to be in this position, your hardware went wrong mid-write. And as an added bonus, if you're using ZFS snapshots, now your pool is accessible, you have a bunch of backups available so you can probably roll corrupted files back to working versions. For me, that is about as good as you can get in terms of handling a sudden hardware failure. 
Everything that is known to be saved to disk is there, you can verify (with absolute certainty) whether data is ok or not, and you have backup copies of damaged files. In the old days you'd need to be reverting to tape backups for both of these, with potentially hours of downtime before you even know where you are. Achieving that in a few seconds (or minutes) is a massive step forwards. There are already people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. Yes there are, but the majority of these are praising the ability of ZFS checksums to detect bad data, and to repair it when you have redundancy in your pool. I've not seen that many cases of people praising ZFS' recovery ability - uberblock problems seem to have a nasty habit of leaving you with tons of good, checksummed data on a pool that you can't get to, and while many hardware problems are dealt with, others can hang your entire pool. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Bob Friesenhahn wrote: On Fri, 13 Feb 2009, Ross wrote: Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a common causes of this are... message, or a link to an online help article if you wanted people to be really impressed. I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. Actually that's a lot like FMA replies when it sees a problem, telling the person what happened and pointing them to a web page which can be updated with the newest information on the problem. That's a good spot for This pool was not unmounted cleanly due to a hardware fault and data has been lost. The name of timestamp line contains the date which can be recovered to. Use the command # zfs reframbulocate this that -t timestamp to revert to timestamp --dave -- David Collier-Brown| Always do right. This will gratify Sun Microsystems, Toronto | some people and astonish the rest dav...@sun.com | -- Mark Twain cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191# ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. You only know this if the data has previously been read. Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. You only know this if the data has previously been read. Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then? Well in that case you're screwed, but if ZFS is known to handle even corrupted pools automatically, when that happens the immediate response on the forums is going to be "something really bad has happened to your hardware", followed by troubleshooting to find out what. Instead of the response now, where we all know there's every chance the data is ok, and just can't be gotten to without zdb. Also, that's a pretty extreme situation since you'd need a device that is being written to but not read from to fail in this exact way. It also needs to have no scrubbing being run, so the problem has remained undetected. However, even in that situation, if we assume that it happened and that these recovery tools are available, ZFS will either report that your pool is seriously corrupted, indicating a major hardware problem (and ZFS can now state this with some confidence), or ZFS will be able to open a previous uberblock, mount your pool and begin a scrub, at which point all your missing writes will be found too and reported. And then you can go back to your snapshots. :-D Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Greg Palmer wrote: Miles Nordin wrote: gm That implies that ZFS will have to detect removable devices gm and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn't do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. You only know this if the data has previously been read. Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then? Hey Bob, Thinking about this a bit more, you've given me an idea: Would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working ok? I wonder if you could do this after a few uberblocks have been written. It would seem to be a good way of catching devices that aren't writing correctly early on, as well as a way of guaranteeing that previous uberblocks are available to roll back to should a write go wrong. I wonder what the upper limit for this kind of write failure is going to be. I've seen 30 second delays mentioned in this thread. How often are uberblocks written? Is there any guarantee that we'll always have more than 30 seconds' worth of uberblocks on a drive? Should ZFS be set so that it keeps either a given number of uberblocks, or 5 minutes' worth of uberblocks, whichever is the larger? Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
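The read-back idea above, sketched as a toy model (my own code, not ZFS internals; if I remember the on-disk layout right, ZFS keeps an array of 128 uberblocks in each vdev label, but the slot and checksum handling here are simplified):

```python
import hashlib

def checksum(payload):
    return hashlib.sha256(payload).hexdigest()

class UberblockRing:
    SLOTS = 128  # mirrors the per-label uberblock array size

    def __init__(self):
        self.slots = [None] * self.SLOTS

    def write(self, txg, payload):
        # each txg overwrites one slot, round-robin
        self.slots[txg % self.SLOTS] = (txg, payload, checksum(payload))

    def verify_recent(self, current_txg, n):
        """Read back the n most recent uberblocks and recompute their
        checksums; return the txgs that are missing or corrupt."""
        bad = []
        for txg in range(max(0, current_txg - n + 1), current_txg + 1):
            entry = self.slots[txg % self.SLOTS]
            if entry is None or entry[0] != txg or checksum(entry[1]) != entry[2]:
                bad.append(txg)
        return bad
```

A `verify_recent` pass after every few txgs would catch a drive that silently stopped committing writes much earlier than a full scrub would, though (per Bob's objection in the reply below this message in the original thread) a read served from the drive's own volatile cache proves nothing.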
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: Also, that's a pretty extreme situation since you'd need a device that is being written to but not read from to fail in this exact way. It also needs to have no scrubbing being run, so the problem has remained undetected. On systems with a lot of RAM, 100% write is a pretty common situation since reads are often against data which are already cached in RAM. This is common when doing bulk data copies from one device to another (e.g. a backup from an internal pool to a USB-based pool) since the necessary filesystem information for the destination filesystem can be cached in memory for quick access rather than going to disk. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool create from spare partition
Hello, I formatted an unallocated partition using GParted and now my table looks like this:

sh-3.2# format -e
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63
          /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
Specify disk (enter its number): 0
selecting c9d0
NO Alt slice
No defect list found
Total disk size is 19457 cylinders
Cylinder size is 16065 (512 byte) blocks

                                     Cylinders
Partition   Status    Type          Start    End    Length    %
=========   ======    ============  =====   =====   ======   ===
    1                 IFS: NTFS         0    5098     5099    26
    2       Active    Solaris2       5099   12931     7833    40
    3                 Linux native  12932   19456     6525    34

Can you give me advice on how to choose the 3rd (Linux native) partition? If I know where this partition is, then I can run: zpool create trunk c9d0XYZ, right? Thanks for the answer. Regards, Jan Hlodan

Jan Hlodan wrote: Hello, thanks for the answer. The partition table shows that Windows and OpenSolaris run on:

       1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63
          /p...@0,0/pci-...@1f,2/i...@0/c...@0,0

Partition   Status    Type          Start    End    Length    %
=========   ======    ============  =====   =====   ======   ===
    1                 IFS: NTFS         0    5098     5099    26
    2       Active    Solaris2       5099   12931     7833    40

The disk 0. c7t0d0 doesn't contain any disk type:

AVAILABLE DISK SELECTIONS:
       0. c7t0d0 drive type unknown
          /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0
       1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63
          /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
Specify disk (enter its number): 0
Error occurred with device in use checking: Bad file number
Error: can't open disk '/dev/rdsk/c7t0d0p0'.
AVAILABLE DRIVE TYPES:
       0. Auto configure
       1. other
Specify disk type (enter its number): 0
Auto configuration via format.dat[no]? y
Auto configure failed
No Solaris fdisk partition found.

If I create some file system using GParted, my partition table will look like this:

Partition   Status    Type          Start    End    Length    %
=========   ======    ============  =====   =====   ======   ===
    1                 IFS: NTFS         0    5098     5099    26
    2       Active    Solaris2       5099   12931     7833    40
    3                 Solaris         xyz     xyz              34

but I still don't know how to import this partition (num. 3). If I run: zpool create c9d0, I'll lose all my data, right?
Regards, Jan Hlodan Will Murnane wrote: On Thu, Feb 12, 2009 at 21:59, Jan Hlodan jh231...@mail-emea.sun.com wrote: I would like to import the 3rd partition as another pool but I can't see this partition. sh-3.2# format -e Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c7t0d0 drive type unknown /p...@0,0/pci104d,8...@1d,7/stor...@4/d...@0,0 1. c9d0 DEFAULT cyl 7830 alt 2 hd 255 sec 63 /p...@0,0/pci-...@1f,2/i...@0/c...@0,0 I guess that 0. is the Windows partition and 1. is OpenSolaris. What you see there are whole disks, not partitions. Try zpool status, which will show you that rpool is on something like c9d0s0. Then go into format again, pick 1 (in my example), type fdisk to look at the DOS-style partition table and verify that the partitioning of the disk matches what you thought it was. Then you can create a new zpool with something like zpool create data c9t0p3. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote: Thinking about this a bit more, you've given me an idea: Would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working ok? That sounds like a good idea. However, how do you know for sure that the data returned is not returned from a volatile cache? If the hardware is ignoring cache flush requests, then any data returned may be from a volatile cache. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs destroy hanging
This shouldn't be taking anywhere *near* half an hour. The snapshots differ trivially, by one or two files and less than 10k of data (they're test results from working on my backup script). But so far, it's still sitting there after more than half an hour.

local...@fsfs:~/src/bup2# zfs destroy ruin/export
cannot destroy 'ruin/export': filesystem has children
use '-r' to destroy the following datasets:
ruin/export/h...@bup-20090210-202557utc
ruin/export/h...@20090210-213902utc
ruin/export/home/local...@first
ruin/export/home/local...@second
ruin/export/home/local...@bup-20090210-202557utc
ruin/export/home/local...@20090210-213902utc
ruin/export/home/localddb
ruin/export/home
local...@fsfs:~/src/bup2# zfs destroy -r ruin/export

It's still hung. Ah, here's zfs list output from shortly before I started the destroy:

ruin                         474G   440G   431G  /backups/ruin
ruin/export                 35.0M   440G    18K  /backups/ruin/export
ruin/export/home            35.0M   440G    19K  /export/home
ruin/export/home/localddb     35M   440G  27.8M  /export/home/localddb

As you can see, the ruin/export/home filesystem (and subs) is NOT large. iostat shows no activity on pool ruin over a minute.

local...@fsfs:~$ pfexec zpool iostat ruin 10
          capacity     operations    bandwidth
pool    used  avail   read  write   read  write
----   -----  -----  -----  -----  -----  -----
ruin    474G   454G     10      0  1.13M    840
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0
ruin    474G   454G      0      0      0      0

The pool still thinks it is healthy.

local...@fsfs:~$ zpool status -v ruin
pool: ruin
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions.
scrub: scrub completed after 4h42m with 0 errors on Mon Feb 9 19:10:49 2009
config:

        NAME      STATE   READ WRITE CKSUM
        ruin      ONLINE     0     0     0
          c7t0d0  ONLINE     0     0     0

errors: No known data errors

There is still a process out there trying to run that destroy. It doesn't appear to be using much cpu time.

local...@fsfs:~$ ps -ef | grep zfs
localddb  7291  7228  0 15:10:56 pts/4  0:00 grep zfs
root      7223  7101  0 14:18:27 pts/3  0:00 zfs destroy -r ruin/export

Running 2008.11.

local...@fsfs:~$ uname -a
SunOS fsfs 5.11 snv_101b i86pc i386 i86pc Solaris

Any suggestions? Eventually I'll kill the process by the gentlest way that works, I suppose (if it doesn't complete). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 02:00:28PM -0600, Nicolas Williams wrote: Ordering matters for atomic operations, and filesystems are full of those. Also, note that ignoring barriers is effectively as bad as dropping writes if there's any chance that some writes will never hit the disk because of, say, power failures. Imagine 100 txgs, but some writes from the first txg never hitting the disk because the drive keeps them in the cache without flushing them for too long, then you pull out the disk, or power fails -- in that case not even fallback to older txgs will help you, there'd be nothing that ZFS could do to help you. Of course, presumably even with most lousy drives you'd still have to be quite unlucky to lose writes written more than N txgs ago, for some value of N. But the point stands; what you lose will be a matter of chance (and it could well be whole datasets) given the kinds of devices we've been discussing. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
How does mounting the card work? Can one reverse the slot cover and screw it in like that, or is the card hanging free? unfortunately, the cover does not fit in the case, so I fixed it with a dab of hot glue; the same as I used to fix the Intel gig-e PCIe card (which is a low-profile version). not optimal, I know, but it works. nicola -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] set mountpoint but don't mount?
On January 30, 2009 1:09:49 PM -0500 Mark J Musante mmusante at east.sun.com wrote: On Fri, 30 Jan 2009, Frank Cusack wrote: so, is there a way to tell zfs not to perform the mounts for data2? or another way i can replicate the pool on the same host, without exporting the original pool? There is not a way to do that currently, but I know it's coming down the road. just for closure, a likely solution (seems correct, but i am unable to test just now) was presented in another thread. i note the answer here so that a search which finds this thread has both the question and answer in the same place. On January 31, 2009 10:57:11 AM +0100 Kees Nuyt k.nuyt at zonnet.nl wrote: That property is called canmount. man zfs /canmount i didn't test, but it seems that setting canmount to noauto, replicating, then changing canmount back to on, would do the trick. It turns out this doesn't work for datasets that are mounted in the global zone that you can't unmount. Setting the canmount property to 'noauto' has the side effect (why?) of immediately unmounting, and failing if it can't do so. For datasets which are zoned, if you are running the 'zfs set' in the global zone, the dataset remains mounted in the zone. But for datasets mounted in the global zone, e.g. being served via NFS, the 'zfs set' fails. Funny though, I wrote the above and tested a few more times, and now I do have one of my home directories' canmount property set to 'noauto', and I can no longer change it back to 'on'. How it got set to 'noauto' is a mystery as it was never unmounted during the brief time I have been composing this email, and I was consistently getting an error message from zfs about it being in use. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Richard Elling wrote: Greg Palmer wrote: Miles Nordin wrote: gm That implies that ZFS will have to detect removable devices gm and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so on. Since this discussion is taking place in the context of someone removing a USB stick I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it which is how USB devices are treated. I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn't do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? I'd follow that up by saying that those of us who do use something other than FAT with USB devices have a reasonable understanding of the limitations of those devices. Using ZFS is non-trivial from a typical user's perspective. The device has to be identified and the pool created.
When a USB device is connected, the pool has to be manually imported before it can be used. Import/export could be fully integrated with GNOME; once that is in place, using a ZFS-formatted USB stick should be just as safe as a FAT-formatted one.

-- Ian.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
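For reference, the manual dance Ian describes looks something like this from the shell. This is a sketch only: the pool name "usbpool" and the idea that the stick holds a single pool are assumptions for illustration.

```shell
# Cleanly detach a ZFS-formatted USB stick before unplugging it.
# "usbpool" is a hypothetical pool name.
zpool export usbpool

# After reinserting the stick, list pools available for import...
zpool import

# ...then import the pool by name so its datasets mount and it can be used.
zpool import usbpool
```

Pulling the stick without the export step is exactly the scenario that started this thread, which is why desktop integration that runs these steps automatically would matter.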
Re: [zfs-discuss] ZFS: unreliable for professional usage?
You don't, but that's why I was wondering about time limits. You have to have a cut-off somewhere, but if you're checking the last few minutes of uberblocks, that really should cope with a lot. It seems like a simple enough thing to implement, and if a pool still gets corrupted with these checks in place, you can absolutely, positively blame it on the hardware. :D

However, I've just had another idea. Since the uberblocks are pretty vital to recovering a pool, and I believe it's a fair bit of work to search the disk to find them, might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes? This could be as simple as a USB stick plugged into the server, a separate drive, or a network server. I guess even the ZIL device would work if it's separate hardware. Knowing the locations of the uberblocks would save yet more time should recovery be needed.

On Fri, Feb 13, 2009 at 8:59 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: Thinking about this a bit more, you've given me an idea: would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working OK?

That sounds like a good idea. However, how do you know for sure that the data returned is not coming from a volatile cache? If the hardware is ignoring cache flush requests, then any data returned may be from a volatile cache.

Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Tim wrote: On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Ross Smith wrote: However, I've just had another idea. Since the uberblocks are pretty vital to recovering a pool, and I believe it's a fair bit of work to search the disk to find them, might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes?

Perhaps it is best to leave decisions on these issues to the ZFS designers who know how things work.

Previous descriptions from people who do know how things work didn't make it sound very difficult to find the last 20 uberblocks. It sounded like they were at known points for any given pool.

Those folks have surely tired of this discussion by now and are working on actual code rather than reading idle discussion between several people who don't know the details of how things work.

People who don't know how things work often aren't tied down by the baggage of knowing how things work, which leads to creative solutions that those who are weighed down didn't think of. I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend five minutes reading a response and weighing whether the idea is valid or not.

OTOH, anyone who followed this discussion the last few times, has looked at the on-disk format documents, or has reviewed the source code would know that the uberblocks are kept in a 128-entry circular queue which is 4x redundant, with two copies each at the beginning and end of the vdev. Other metadata is, by default, 2x redundant and spatially diverse. Clearly, the failure mode being hashed out here has resulted in the defeat of those protections. The only real question is how fast Jeff can roll out the feature to allow reverting to previous uberblocks.
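The layout Richard describes can be inspected with zdb. A minimal sketch, assuming a pool named "tank" and a device path that are purely illustrative; exact flag behavior varies between builds of zdb:

```shell
# Print the currently active uberblock of the pool.
zdb -u tank

# Dump a vdev's four labels -- two copies at the front of the device
# and two at the back -- each of which carries a copy of the
# 128-entry uberblock ring described above.
zdb -l /dev/dsk/c0t0d0s0
```

So the uberblocks are already at well-known offsets relative to the vdev; the open problem in this thread is not finding them, but getting the pool to fall back to an older one automatically.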
The procedure for doing this by hand has long been known, and was posted on this forum -- though it is tedious. -- richard
[zfs-discuss] ZFS on SAN?
Hi,

When I read the ZFS manual, it usually recommends configuring redundancy at the ZFS layer, mainly because some features (like correction of corrupted data) only work with a redundant configuration; it also implies that overall robustness will improve. My question is simple: what is the recommended configuration on a SAN (on high-end EMC, like the Symmetrix DMX series, for example) where redundancy is usually configured at the array level, so most likely we would use a simple ZFS layout without redundancy? Is it worth moving the redundancy from the SAN array layer to the ZFS layer? (Configuring redundancy on both layers sounds like a waste to me.) There are certain advantages to having redundancy configured on the array (beyond the protection against simple disk failure). Can we compare the advantages of having (for example) RAID5 configured on a high-end SAN with no redundancy at the ZFS layer, versus no redundant RAID configuration on the high-end SAN but raidz or raidz2 at the ZFS layer? Any tests, experience, or best practices regarding this topic? How does ZFS perform on high-end SANs, from a performance and robustness (or availability, if you like) point of view, compared to VxFS for example? If you could share your experience with me, I would really appreciate it.

Regards, sendai

-- This message posted from opensolaris.org
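As a concrete illustration of the trade-off being asked about, the two layouts might be sketched as follows. The pool names and LUN device names are placeholders, and this is a sketch of the alternatives, not a recommendation:

```shell
# Option A: redundancy in the array only (single RAID5/RAID1 LUN).
# ZFS checksums still detect corruption, but only metadata (which is
# duplicated via ditto blocks) can self-heal.
zpool create tank c2t0d0

# Optionally keep two copies of user data as well, so ZFS has
# something to repair from, at the cost of half the usable space:
zfs set copies=2 tank

# Option B: redundancy at the ZFS layer across several array LUNs,
# so corrupted user data can be reconstructed from parity:
zpool create tank2 raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
```

Option B pays the capacity cost twice if the LUNs themselves are already RAID-protected, which is the "waste" sendai mentions; concat or RAID 0 LUNs underneath raidz avoid that double cost.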
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Tim wrote: I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing if the idea is valid or not.

Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since, after five hours of reading, there were no ideas expressed that had not been expressed before. With this level of overhead, I am surprised that there is any remaining development motion on ZFS at all.

Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS: unreliable for professional usage?
On February 13, 2009 7:58:51 PM -0600, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: With this level of overhead, I am surprised that there is any remaining development motion on ZFS at all.

come on now. with all due respect, you are attempting to stifle relevant discussion and that is, well, bordering on ridiculous. i sure have learned a lot from this thread. now of course that is meaningless because i don't and almost certainly never will contribute to zfs, but i assume there are others who have learned from this thread. that's definitely a good thing. this thread also appears to be the impetus to change priorities on zfs development.

Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since after five hours of reading, there were no ideas expressed that had not been expressed before.

lastly, WOW! if this thread is worthless to you, learn to use the delete button. especially if you read that slowly. i know i certainly couldn't keep up with all my incoming mail if i read everything. i'm sorry to berate you, as you do make very valuable contributions to the discussion here, but i take offense at your attempts to limit discussion simply because you know everything there is to know about the subject. great, now i am guilty of being overhead. -frank
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Hi Bob,

On Fri, 13 Feb 2009 19:58:51 -0600 (CST), Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Fri, 13 Feb 2009, Tim wrote: I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing if the idea is valid or not.

Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since after five hours of reading, there were no ideas expressed that had not been expressed before.

I've found this thread to be like watching a car accident, and also really frustrating due to the inability of many posters to use search engines.

With this level of overhead, I am surprised that there is any remaining development motion on ZFS at all.

Good thing the ZFS developers have mail filters :-)

cheers, James -- Senior Kernel Software Engineer, Solaris, Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] ZFS on SAN?
Damon,

Yes, we can provide a simple concat inside the array (even though today we provide RAID5 or RAID1 as our standard, using Veritas with concat); the question is more whether it's worth switching the redundancy from the array to the ZFS layer. The RAID5/1 features of the high-end EMC arrays also provide performance improvements, which is why I wonder what the pros and cons of such a switch would be (I mean the switch of the redundancy from the array to the ZFS layer). So, you're telling me that even if the SAN provides redundancy (HW RAID5 or RAID1), people still configure ZFS with either raidz or mirror?

Regards, sendai

On Sat, Feb 14, 2009 at 6:06 AM, Damon Atkins damon.atk...@_no_spam_yahoo.com.au wrote: Andras,

If you can get concat or RAID 0 disks inside the array, then use raidz (if the I/O volume is not large, or it is mostly sequential); if the I/O is very high, use a ZFS mirror. You cannot spread a zpool over multiple EMC arrays using SRDF if you are not using EMC PowerPath.

HDS, for example, does not support anything other than mirror or RAID5 configurations, so raidz or a ZFS mirror results in a lot of wasted disk space. However, people still use raidz on HDS RAID5, as the top-of-the-line HDS arrays are very fast and they want the features offered by ZFS. Cheers
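A sketch of the mirrored layout Damon suggests, with each mirror leg drawn from a different array so ZFS can self-heal corrupted blocks and the pool survives the loss of an entire array. The controller/LUN names are invented placeholders:

```shell
# c2* LUNs come from array 1, c3* LUNs from array 2 (placeholders).
# Each top-level vdev is a mirror spanning both arrays.
zpool create tank \
  mirror c2t0d0 c3t0d0 \
  mirror c2t1d0 c3t1d0
```

With concat or RAID 0 LUNs underneath, this avoids paying for redundancy twice while keeping ZFS's ability to repair bad blocks from the surviving mirror side.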