Re: [zfs-discuss] Yager on ZFS

2007-11-16 Thread Adam Leventhal
On Thu, Nov 08, 2007 at 07:28:47PM -0800, can you guess? wrote:
  How so? In my opinion, it seems like a cure for the brain damage of RAID-5.
 
 Nope.
 
 A decent RAID-5 hardware implementation has no 'write hole' to worry about, 
 and one can make a software implementation similarly robust with some effort 
 (e.g., by using a transaction log to protect the data-plus-parity 
 double-update or by using COW mechanisms like ZFS's in a more intelligent 
 manner).

Can you reference a software RAID implementation which implements a solution
to the write hole and performs well? My understanding (and this is based on
what I've been told by people more knowledgeable in this domain than I am) is
that software RAID has suffered from being unable to provide both
correctness and acceptable performance.

 The part of RAID-Z that's brain-damaged is its 
 concurrent-small-to-medium-sized-access performance (at least up to request 
 sizes equal to the largest block size that ZFS supports, and arguably 
 somewhat beyond that):  while conventional RAID-5 can satisfy N+1 
 small-to-medium read accesses or (N+1)/2 small-to-medium write accesses in 
 parallel (though the latter also take an extra rev to complete), RAID-Z can 
 satisfy only one small-to-medium access request at a time (well, plus a 
 smidge for read accesses if it doesn't verify the parity) - effectively 
 providing RAID-3-style performance.

Brain damage seems a bit of an alarmist label. While you're certainly right
that for a given block we do need to access all disks in the given stripe,
it seems like a rather quaint argument: aren't most environments that matter
trying to avoid waiting for the disk at all? Intelligent prefetch and large
caches -- I'd argue -- are far more important for performance these days.

 The easiest way to fix ZFS's deficiency in this area would probably be to map 
 each group of N blocks in a file as a stripe with its own parity - which 
 would have the added benefit of removing any need to handle parity groups at 
 the disk level (this would, incidentally, not be a bad idea to use for 
 mirroring as well, if my impression is correct that there's a remnant of 
 LVM-style internal management there).  While this wouldn't allow use of 
 parity RAID for very small files, in most installations they really don't 
 occupy much space compared to that used by large files so this should not 
 constitute a significant drawback.

I don't really think this would be feasible given how ZFS is stratified
today, but go ahead and prove me wrong: here are the instructions for
bringing over a copy of the source code:

  http://www.opensolaris.org/os/community/tools/scm

- ahl

-- 
Adam Leventhal, FishWorks                 http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS mirror and sun STK 2540 FC array

2007-11-16 Thread Ben
Hi all,

we have just bought a Sun X2200 M2 (4 GB RAM / 2 Opteron 2214 / 2 x 250 GB 
SATA2 disks, Solaris 10 update 4)
and a Sun STK 2540 FC array (8 x 146 GB SAS disks, 1 RAID controller).
The server is attached to the array with a single 4 Gb Fibre Channel link.

I want to make a mirror using ZFS with this array. 

I have created 2 volumes on the array
in RAID0 (128 KB stripe), presented to the host as lun0 and lun1.

So, on the host  :
bash-3.00# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c1d0 DEFAULT cyl 30397 alt 2 hd 255 sec 63
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
   1. c2d0 DEFAULT cyl 30397 alt 2 hd 255 sec 63
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
   2. c6t600A0B800038AFBC02F7472155C0d0 DEFAULT cyl 35505 alt 2 
hd 255 sec 126
  /scsi_vhci/[EMAIL PROTECTED]
   3. c6t600A0B800038AFBC02F347215518d0 DEFAULT cyl 35505 alt 2 
hd 255 sec 126
  /scsi_vhci/[EMAIL PROTECTED]
Specify disk (enter its number):

bash-3.00# zpool create tank mirror 
c6t600A0B800038AFBC02F347215518d0 c6t600A0B800038AFBC02F7472155C0d0

bash-3.00# df -h /tank
Filesystem             size   used  avail capacity  Mounted on
tank                   532G    24K   532G     1%    /tank


I have tested the performance with a simple dd command:
[
time dd if=/dev/zero of=/tank/testfile bs=1024k count=1
time dd if=/tank/testfile of=/dev/null bs=1024k count=1
]
and it gives:
# local throughput
stk2540
   mirror zfs /tank
read   232 MB/s
write  175 MB/s

# just to test the max perf I did:
zpool destroy -f tank
zpool create -f pool c6t600A0B800038AFBC02F347215518d0

And the same basic dd gives me :
  single zfs /pool
read   320 MB/s
write  263 MB/s

Just to give an idea, the SVM mirror using the two local SATA2 disks
gives:
read  58 MB/s
write 52 MB/s

So, in production the ZFS /tank mirror will be used to hold
our home directories (10 users using 10 GB each),
our project files (200 GB, mostly text files and a CVS database),
and some vendor tools (100 GB).
People will access the data (/tank) over NFSv4 from their
workstations (Sun Ultra 20 M2 with CentOS 4 update 5).

On the Ultra 20 M2, the basic test via NFSv4 gives:
read  104 MB/s
write  63 MB/s

At this point, I have the following questions:
-- Does anyone have similar figures for the STK 2540 using ZFS?

-- Instead of creating only 2 volumes in the array,
   what do you think about creating 8 volumes (one per disk)
   and building 4 two-way mirrors (a fuller sketch follows after these questions):
   zpool create tank mirror  c6t6001.. c6t6002.. mirror c6t6003.. 
c6t6004.. {...} mirror c6t6007.. c6t6008..

-- I will add 4 disks to the array next summer.
   Do you think I should create 2 new LUNs in the array
   and do a:
zpool add tank mirror c6t6001..(lun3) c6t6001..(lun4)
  
   or rebuild the 2 LUNs (6-disk RAID0) and the pool tank from scratch
(i.e.: back up /tank - zpool destroy - add disks - reconfigure the array 
- zpool create tank ... - restore the backed-up data)?

-- I am thinking about doing a disk scrub once a month (crontab sketch below).
   Is that sufficient?

-- Have you got any comments on the performance from the NFSv4 client?

If you have any advice or suggestions, feel free to share.
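
For reference, here is a minimal sketch of the layout and the scrub I have
in mind (the device names and the crontab schedule are placeholders only):

# one LUN per disk, 4 two-way mirrors (hypothetical device names):
zpool create tank \
  mirror c6t...01d0 c6t...02d0 \
  mirror c6t...03d0 c6t...04d0 \
  mirror c6t...05d0 c6t...06d0 \
  mirror c6t...07d0 c6t...08d0

# monthly scrub from root's crontab (03:00 on the 1st of each month):
0 3 1 * * /usr/sbin/zpool scrub tank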

Thanks,  
 
 Benjamin
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on a raid box

2007-11-16 Thread Paul Boven
Hi Dan,

Dan Pritts wrote:
 On Tue, Nov 13, 2007 at 12:25:24PM +0100, Paul Boven wrote:

 We're building a storage system that should have about 2TB of storage
 and good sequential write speed. The server side is a Sun X4200 running
 Solaris 10u4 (plus yesterday's recommended patch cluster), the array we
 bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and
 it's connected to the Sun through U320-scsi.
 
 We are doing basically the same thing with similar Western Scientific
 (wsm.com) RAIDs, based on Infortrend controllers.  ZFS notices when we
 pull a disk and goes on and does the right thing.
 
 I wonder if you've got a scsi card/driver problem.  We tried using
 an Adaptec card with solaris with poor results; switched to LSI,
 it just works.

Thanks for your reply. The SCSI-card in the X4200 is a Sun Single
Channel U320 card that came with the system, but the PCB artwork does
sport a nice 'LSI LOGIC' imprint.

So, just to make sure we're talking about the same thing here - your
drives are SATA, you're exporting each drive through the Western
Scientific raidbox as a separate volume, and zfs actually brings in a
hot spare when you pull a drive?

Over here, I've still not been able to accomplish that - even after
installing Nevada b76 on the machine, removing a disk will not cause a
hot-spare to become active, nor does resilvering start. Our Transtec
raidbox seems to be based on a chipset by Promise, by the way.
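
For reference, a minimal sketch of the kind of spare setup and manual
fail-over I am testing with (pool and device names here are hypothetical):

# zpool add tank spare c3t11d0        # register a hot spare with the pool
# zpool status tank                   # the spare should be listed as AVAIL
# zpool replace tank c3t4d0 c3t11d0   # pull the spare in by hand if it never activates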

Regards, Paul Boven.
-- 
Paul Boven [EMAIL PROTECTED] +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on a raid box

2007-11-16 Thread Dan Pritts
On Fri, Nov 16, 2007 at 11:31:00AM +0100, Paul Boven wrote:
 Thanks for your reply. The SCSI-card in the X4200 is a Sun Single
 Channel U320 card that came with the system, but the PCB artwork does
 sport a nice 'LSI LOGIC' imprint.

That is probably the same card I'm using; it's actually a Sun card
but, as you say, is OEM'd by LSI.

 So, just to make sure we're talking about the same thing here - your
 drives are SATA, 

yes

 you're exporting each drive through the Western
 Scientific raidbox as a seperate volume, 

yes

 and zfs actually brings in a
 hot spare when you pull a drive?

yes

OS is Sol10U4, system is an X4200, original hardware rev.

 Over here, I've still not been able to accomplish that - even after
 installing Nevada b76 on the machine, removing a disk will not cause a
 hot-spare to become active, nor does resilvering start. Our Transtec
 raidbox seems to be based on a chipset by Promise, by the way.

I have heard some bad things about the Promise RAID boxes but I haven't
had any direct experience.  

I do own one Promise box that accepts 4 PATA drives and exports them to a
host as scsi disks.  Shockingly, it uses a master/slave IDE configuration
rather than 4 separate IDE controllers.  It wasn't super expensive but
it wasn't dirt cheap, either, and it seems it would have cost another
$5 to manufacture it the right way.

I've had fine luck with Promise $25 ATA PCI cards :)

The infortrend units, on the other hand, I have had generally quite good
luck with.  When I worked at UUNet in the late '90s we had hundreds of
their SCSI RAIDs deployed.  

I do have an Infortrend FC-attached RAID with SATA disks, which basically
works fine.  It has an external JBOD, also with SATA disks, connecting to
the main RAID with FC.  Unfortunately, the RAID unit boots faster than
the JBOD.  So, if you turn them on at the same time, it thinks the JBOD
is gone and doesn't notice it's there until you reboot the controller.

That caused a little pucker for my colleagues when it happened while I
was on vacation.  The support guy at the reseller we were working with
(NOT Western Scientific) told them the RAID was hosed and they should
rebuild from scratch and hope they had a backup.

danno
--
Dan Pritts, System Administrator
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-16 Thread roland egle
We are having the same problem.

First with 125025-05 and then also with 125205-07.
Solaris 10 update 4 - now with all patches.


We opened a Case and got

T-PATCH 127871-02

We installed the Marvell driver binary 3 days ago.

T127871-02/SUNWckr/reloc/kernel/misc/sata
T127871-02/SUNWmv88sx/reloc/kernel/drv/marvell88sx
T127871-02/SUNWmv88sx/reloc/kernel/drv/amd64/marvell88sx
T127871-02/SUNWsi3124/reloc/kernel/drv/si3124
T127871-02/SUNWsi3124/reloc/kernel/drv/amd64/si3124 

It seems that this resolves the device reset problem and the nfsd crash on
an x4500 with one raidz2 pool and a lot of ZFS filesystems.
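
A quick sketch of how we check that the new binaries are the ones actually
loaded (module names as in the T-patch file list above):

# modinfo | grep -i marvell88sx    # new marvell88sx driver revision
# modinfo | grep -w sata           # generic sata framework module
# modinfo | grep si3124            # only relevant if you have si3124 controllers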
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + DB + fragments

2007-11-16 Thread can you guess?
...

 I personally believe that since most people will have hardware LUN's
 (with underlying RAID) and cache, it will be difficult to notice
 anything. Given that those hardware LUN's might be busy with their own
 wizardry ;) You will also have to minimize the effect of the database
 cache ...

By definition, once you've got the entire database in cache, none of this 
matters (though filling up the cache itself takes some added time if the table 
is fragmented).

Most real-world databases don't manage to fit entirely or even mostly in cache, 
because people aren't willing to dedicate that much RAM to running them.  
Instead, they either use a lot less RAM than the database size or share the 
system with other activity that competes for that RAM.

In other words, they use a cost-effective rather than a money-is-no-object 
configuration, but then would still like to get the best performance they can 
from it.

 
 It will be a tough assignment ... maybe someone has already done this?
 
 Thinking about this (very abstract) ... does it really matter?
 
 [8KB-a][8KB-b][8KB-c]
 
 So what if 8KB-b gets updated and moved somewhere else? If the DB gets
 a request to read 8KB-a, it needs to do an I/O (eliminate all caching).
 If it gets a request to read 8KB-b, it needs to do an I/O.
 
 Does it matter that b is somewhere else ...

Yes, with any competently-designed database.

 it still needs to go get it ... only in a very abstract world with
 read-ahead (both hardware or db) would 8KB-b be in cache after 8KB-a
 was read.

1.  If there's no other activity on the disk, then the disk's track cache will 
acquire the following data when the first block is read, because it has nothing 
better to do.  But if all the disks are just sitting around waiting for 
this table scan to get to them, then if ZFS has a sufficiently intelligent 
read-ahead mechanism it could help out a lot here as well:  the differences 
become greater when the system is busier.

2.  Even a moderately smart disk will detect a sequential access pattern if one 
exists and may read ahead at least modestly after having detected that pattern 
even if it *does* have other requests pending.

3.  But in any event any competent database will explicitly issue prefetches 
when it knows (and it *does* know) that it is scanning a table sequentially - 
and will also have taken pains to try to ensure that the table data is laid out 
such that it can be scanned efficiently.  If it's using disks that support 
tagged command queuing it may just issue a bunch of single-database-block 
requests at once, and the disk will organize them such that they can all be 
satisfied by a single streaming access; with disks that don't support queuing, 
the database can elect to issue a single large I/O request covering many 
database blocks, accomplishing the same thing as long as the table is in fact 
laid out contiguously on the medium (the database knows this if it's handling 
the layout directly, but when it's using a file system as an intermediary it 
usually can only hope that the file system has minimized file fragmentation).

 
 Hmmm... the only way is to get some data :) *hehe*

Data is good, as long as you successfully analyze what it actually means:  it 
either tends to confirm one's understanding or to refine it.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool question

2007-11-16 Thread Mark J Musante
On Thu, 15 Nov 2007, Brian Lionberger wrote:

 The question is, should I create one zpool or two to hold /export/home
 and /export/backup?
 Currently I have one pool for /export/home and one pool for /export/backup.

 Should it be one pool for both? Would this be better, and why?

One thing to consider is that pools are the granularity of 'export' 
operations, so if you ever want to, for example, move the /export/backup 
disks to a new computer but keep /export/home on the current computer, 
you couldn't do that easily if you put both mirrors into a single pool 
(i.e. a stripe of two 2-way mirrors).
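
For example, with two separate pools the move is just (pool names hypothetical):

# on the old machine
zpool export backup
# physically move the disks, then on the new machine
zpool import backup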


Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Need a 2-port PCI-X SATA-II controller for x86

2007-11-16 Thread Brian Hechinger
I'll be setting up a small server and need two SATA-II ports for an x86
box.  The cheaper the better.

Thanks!!

-brian
-- 
Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly.  -- Jonathan 
Patschke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to destory a faulted pool

2007-11-16 Thread Manoj Nayak
How can I destroy the following pools?

pool: mstor0
id: 5853485601755236913
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

mstor0  UNAVAIL   insufficient replicas
  raidz1UNAVAIL   insufficient replicas
c5t0d0  FAULTED   corrupted data
c4t0d0  FAULTED   corrupted data
c1t0d0  ONLINE
c0t0d0  ONLINE


pool: zpool1
id: 14693037944182338678
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

zpool1  UNAVAIL   insufficient replicas
  raidz1UNAVAIL   insufficient replicas
c0t1d0  UNAVAIL   cannot open
c1t1d0  UNAVAIL   cannot open
c4t1d0  UNAVAIL   cannot open
c6t1d0  UNAVAIL   cannot open
c7t1d0  UNAVAIL   cannot open
  raidz1UNAVAIL   insufficient replicas
c0t2d0  UNAVAIL   cannot open
c1t2d0  UNAVAIL   cannot open
c4t2d0  UNAVAIL   cannot open
c6t2d0  UNAVAIL   cannot open
c7t2d0  UNAVAIL   cannot open
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-16 Thread Peter Schuller
 Brain damage seems a bit of an alarmist label. While you're certainly right
 that for a given block we do need to access all disks in the given stripe,
 it seems like a rather quaint argument: aren't most environments that
 matter trying to avoid waiting for the disk at all? Intelligent prefetch
 and large caches -- I'd argue -- are far more important for performance
 these days.

The concurrent small-i/o problem is fundamental though. If you have an 
application where you care only about random concurrent reads for example, 
you would not want to use raidz/raidz2 currently. No amount of smartness in 
the application gets around this. It *is* a relevant shortcoming of 
raidz/raidz2 compared to raid5/raid6, even if in many cases it is not 
significant.

If disk space is not an issue, striping across mirrors will be okay for random 
seeks. But if you also care about disk space, it's a show-stopper unless you 
can throw money at the problem.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org



signature.asc
Description: This is a digitally signed message part.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool question

2007-11-16 Thread Brian Lionberger
I have a zpool issue that I need to discuss.

My application is going to run on a 3120 with 4 disks.  Two (mirrored) 
disks will represent /export/home and the other two (mirrored) will be 
/export/backup.

The question is, should I create one zpool or two to hold /export/home 
and /export/backup?
Currently I have one pool for /export/home and one pool for /export/backup.

Should it be one pool for both? Would this be better, and why?

Thanks for any help and advice.

Brian.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS for consumers WAS:Yager on ZFS

2007-11-16 Thread Paul Kraus
Splitting this thread and changing the subject to reflect that...

On 11/14/07, can you guess? [EMAIL PROTECTED] wrote:

 Another prominent debate in this thread revolves around the question of
 just how significant ZFS's unusual strengths are for *consumer* use.
 WAFL clearly plays no part in that debate, because it's available only
 on closed, server systems.

I am both a large systems administrator and a 'home user' (I
prefer that term to 'consumer'). I am also very slow to adopt new
technologies in either environment. We have started using ZFS at work
due to performance improvements (for our workload) over UFS (or any
other FS we tested). At home the biggest reason I went with ZFS for my
data is ease of management. I split my data up based on what it is ...
media (photos, movies, etc.), vendor stuff (software, datasheets,
etc.), home directories, and other misc. data. This gives me a good
way to control backups based on the data type. I know, this is all
more sophisticated than the typical home user. The biggest win for me
is that I don't have to partition my storage in advance. I build one
zpool and multiple datasets. I don't set quotas or reservations
(although I could).
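
As a rough sketch of what that looks like (device and dataset names here
are made up, not my actual layout):

zpool create data mirror c1t0d0 c1t1d0   # one pool for everything
zfs create data/media                    # ...and a dataset per category
zfs create data/vendor
zfs create data/home
zfs set quota=200g data/media            # optional per-dataset policy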

So I suppose my argument for ZFS in home use is not data
integrity, but much simpler management, both short and long term.

-- 
Paul Kraus
Albacon 2008 Facilities
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-16 Thread can you guess?
 can you guess? billtodd at metrocast.net writes:
  
  You really ought to read a post before responding to it:  the CERN study
  did encounter bad RAM (and my post mentioned that) - but ZFS usually can't
  do a damn thing about bad RAM, because errors tend to arise either
  before ZFS ever gets the data or after it has already returned and checked
  it (and in both cases, ZFS will think that everything's just fine).
 
 According to the memtest86 author, corruption most often occurs at the moment
 memory cells are written to, by causing bitflips in adjacent cells. So when a
 disk DMAs data to RAM, and corruption occurs when the DMA operation writes to
 the memory cells, and then ZFS verifies the checksum, then it will detect the
 corruption.
 
 Therefore ZFS is perfectly capable (and even likely) to detect memory
 corruption during simple read operations from a ZFS pool.
 
 Of course there are other cases where neither ZFS nor any other checksumming
 filesystem is capable of detecting anything (e.g. the sequence of events: data
 is corrupted, checksummed, written to disk).

Indeed - the latter was the first of the two scenarios that I sketched out.  
But at least on the read end of things ZFS should have a good chance of 
catching errors due to marginal RAM.
That must mean that most of the worrisome alpha-particle problems of yore have 
finally been put to rest (since they'd be similarly likely to trash data on the 
read side after ZFS had verified it).  I think I remember reading that 
somewhere at some point, but I'd never gotten around to reading that far in the 
admirably-detailed documentation that accompanies memtest:  thanks for 
enlightening me.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] read/write NFS block size and ZFS

2007-11-16 Thread Richard Elling
msl wrote:
 Hello all...
  I'm migrating an NFS server from Linux to Solaris, and all clients (Linux) are 
 using read/write block sizes of 8192. That was the best performance that I 
 got, and it's working pretty well (NFSv3). I want to use all of ZFS's 
 advantages, and I know I can have a performance loss, so I want to know if 
 there is a recommendation for block size on NFS/ZFS, or what you think about it.
   

That is the network block transfer size.  The default is normally 32 kBytes.
I don't see any reason to change ZFS's block size to match.
You should follow the best practices as described at
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

If you notice a performance issue with metadata updates, be sure to 
check out
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
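
If you just want to double-check rather than tune, a quick sketch (dataset
name hypothetical):

# on a Linux client: show the rsize/wsize actually negotiated for the mount
nfsstat -m

# on the Solaris server: confirm the dataset is still at the 128K default
zfs get recordsize tank/export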
 
 -- richard
 I must test, or there is no need to make such configurations with zfs?
 Thanks very much for your time!
 Leal.
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS snapshot send/receive via intermediate device

2007-11-16 Thread Ross
Hey folks,

I have no knowledge at all about how streams work in Solaris, so this might 
have a simple answer, or be completely impossible.  Unfortunately I'm a windows 
admin so haven't a clue which :)

We're looking at rolling out a couple of ZFS servers on our network, and 
instead of tapes we're considering using off-site NAS boxes for backups.  We 
think there's likely to be too much data each day to send the incremental 
snapshots to the remote systems over the wire, so we're wondering if we can use 
removable disks instead to transport just the incremental changes.

The idea is that we can do the initial zfs send on-site with the NAS plugged 
on the network, and from then on we just need a 500GB removable disk to take 
the changes off site each night.

Let me be clear on that:  We're not thinking of storing the whole zfs pool on 
the removable disk, there's just too much data.  Instead, we want to use zfs 
send -i to store just the incremental changes on a removable disk, so we can 
then take that disk home and plug it into another device and use zfs receive to 
upload the changes.  Does anybody know if that's possible?
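
Something like this is what we have in mind (a sketch only - the pool,
dataset and snapshot names are made up, and I don't know whether the
redirection part is actually supported):

# on-site: seed the NAS once with a full stream
zfs snapshot tank/data@base
zfs send tank/data@base | ssh nas zfs receive backup/data

# each night: write only the increment to the removable disk
zfs snapshot tank/data@nightly1
zfs send -i tank/data@base tank/data@nightly1 > /removable/data.base-nightly1

# off-site: apply the increment from the file
zfs receive backup/data < /removable/data.base-nightly1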

If it works it's a nice and simple off-site backup, with the added benefit that 
we have a very rapid disaster recovery response.  No need to waste time 
restoring from tape:  the off-site backup can be brought onto the network and 
data is accessible immediately.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot mount 'mypool': Input/output error

2007-11-16 Thread Eric Ham
On Nov 15, 2007 9:42 AM, Nabeel Saad [EMAIL PROTECTED] wrote:
 I am sure I will not use ZFS to its fullest potential at all.. right now I'm 
 trying to recover the dead disk, so if it works to mount a single disk/boot 
 disk, that's all I need, I don't need it to be very functional.  As I 
 suggested, I will only be using this to change permissions and then return 
 the disk into the appropriate Server once I am able to log back into that 
 server.

(Sorry, forgot to CC the list.)

Ok, so assuming that all you want to do is mount your old Solaris disk
and change some permissions, there is probably an easier solution:
put the hard drive back in the original machine and boot
from an (Open)Solaris CD or DVD.  This avoids the whole set of Linux/FUSE
issues you're getting into.  Your easiest option might be to try the
new OpenSolaris Developer Preview distribution, since it's actually a
Live CD which would give you a full GUI and networking to play with.

http://www.opensolaris.org/os/downloads/

Once the Live CD boots, you should be able to mount your drive to an
alternate path like /a and then change permissions.  If you boot from
a regular Solaris CD or DVD it will start the install process, but
then you should be able to simply cancel the install and get to a
command line and work from there.
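
If the data lives in the ZFS pool from your subject line ('mypool'), a rough
sketch from the Live CD would be (the chmod path is just an example):

# import the pool under an alternate root so it doesn't collide with the Live CD
zpool import -f -R /a mypool

# fix whatever permissions need fixing, then release the pool again
chmod -R g+rw /a/mypool/export/home
zpool export mypool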

Good luck!

Regards,
-Eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool io to 6140 is really slow

2007-11-16 Thread Asif Iqbal
I have the following layout:

A 490 with 8 x 1.8 GHz CPUs and 16 GB of memory, and 6 6140s with 2 FC controllers,
using the A1 and B1 controller ports at 4 Gbps.
Each controller has 2 GB of NVRAM.

On the 6140s I set up one RAID0 LUN per SAS disk with a 16K segment size.

On the 490 I created a zpool with 8 4+1 raidz1s.

I am getting zpool I/O of only 125 MB/s with zfs:zfs_nocacheflush = 1 in
/etc/system.

Is there a way I can improve the performance? I would like to get 1 GB/sec of I/O.

Currently each LUN is set up as primary A1 and secondary B1, or vice versa.

I also have write cache enabled according to CAM.
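
To see where the time is going, a quick sketch (pool name hypothetical):

zpool iostat -v mypool 5    # per-vdev and per-LUN throughput every 5 seconds
iostat -xnz 5               # the same traffic from the sd driver's point of view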

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to destory a faulted pool

2007-11-16 Thread Marco Lopes
Manoj,


# zpool destroy -f mstor0


Regards,
Marco Lopes.


Manoj Nayak wrote:

How can I destroy the following pools?

pool: mstor0
id: 5853485601755236913
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

mstor0  UNAVAIL   insufficient replicas
  raidz1UNAVAIL   insufficient replicas
c5t0d0  FAULTED   corrupted data
c4t0d0  FAULTED   corrupted data
c1t0d0  ONLINE
c0t0d0  ONLINE


pool: zpool1
id: 14693037944182338678
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

zpool1  UNAVAIL   insufficient replicas
  raidz1UNAVAIL   insufficient replicas
c0t1d0  UNAVAIL   cannot open
c1t1d0  UNAVAIL   cannot open
c4t1d0  UNAVAIL   cannot open
c6t1d0  UNAVAIL   cannot open
c7t1d0  UNAVAIL   cannot open
  raidz1UNAVAIL   insufficient replicas
c0t2d0  UNAVAIL   cannot open
c1t2d0  UNAVAIL   cannot open
c4t2d0  UNAVAIL   cannot open
c6t2d0  UNAVAIL   cannot open
c7t2d0  UNAVAIL   cannot open
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  



-- 

Marco S. Lopes
Senior Technical Specialist
US Systems Practice
Professional Services Delivery
Sun Microsystems
925 984 6611

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
I have historically noticed that in ZFS, whenever there is a heavy
writer to a pool via NFS, the reads can be held back (basically paused).
An example is a RAID10 pool of 6 disks, whereby a directory of files,
including some large ones 100+MB in size, being written can cause other
clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
I've gotten used to this behavior over NFS, but didn't see it perform
as such when on the server itself doing similar actions.

To improve upon the situation, I thought perhaps I could dedicate a
log device outside the pool, in the hope that while heavy writes went
to the log device, reads would merrily be allowed to coexist from the
pool itself. My test case isn't ideal per se, but I added a local 9GB
SCSI (80) drive for a log, and added two LUNs for the pool itself.
You'll see from the below that while the log device is pegged at
15MB/sec (sd5), my directory list request on devices sd15 and sd16
is never answered. I tried this with both no-cache-flush enabled and
off, with negligible difference. Is there any way to force a better
balance of reads/writes during heavy writes?
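
For reference, the kind of commands involved in the test setup (pool and
device names here are placeholders, not the real ones):

zpool add tank log c1t9d0    # dedicate the 9GB SCSI drive as a separate intent log
zpool iostat -v tank 5       # watch per-vdev traffic while an NFS client writes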

 extended device statistics
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
fd0   0.00.00.00.0  0.0  0.00.0   0   0
sd0   0.00.00.00.0  0.0  0.00.0   0   0
sd1   0.00.00.00.0  0.0  0.00.0   0   0
sd2   0.00.00.00.0  0.0  0.00.0   0   0
sd3   0.00.00.00.0  0.0  0.00.0   0   0
sd4   0.00.00.00.0  0.0  0.00.0   0   0
sd5   0.0  118.00.0 15099.9  0.0 35.0  296.7   0 100
sd6   0.00.00.00.0  0.0  0.00.0   0   0
sd7   0.00.00.00.0  0.0  0.00.0   0   0
sd8   0.00.00.00.0  0.0  0.00.0   0   0
sd9   0.00.00.00.0  0.0  0.00.0   0   0
sd10  0.00.00.00.0  0.0  0.00.0   0   0
sd11  0.00.00.00.0  0.0  0.00.0   0   0
sd12  0.00.00.00.0  0.0  0.00.0   0   0
sd13  0.00.00.00.0  0.0  0.00.0   0   0
sd14  0.00.00.00.0  0.0  0.00.0   0   0
sd15  0.00.00.00.0  0.0  0.00.0   0   0
sd16  0.00.00.00.0  0.0  0.00.0   0   0
 extended device statistics
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
fd0   0.00.00.00.0  0.0  0.00.0   0   0
sd0   0.00.00.00.0  0.0  0.00.0   0   0
sd1   0.00.00.00.0  0.0  0.00.0   0   0
sd2   0.00.00.00.0  0.0  0.00.0   0   0
sd3   0.00.00.00.0  0.0  0.00.0   0   0
sd4   0.00.00.00.0  0.0  0.00.0   0   0
sd5   0.0  117.00.0 14970.1  0.0 35.0  299.2   0 100
sd6   0.00.00.00.0  0.0  0.00.0   0   0
sd7   0.00.00.00.0  0.0  0.00.0   0   0
sd8   0.00.00.00.0  0.0  0.00.0   0   0
sd9   0.00.00.00.0  0.0  0.00.0   0   0
sd10  0.00.00.00.0  0.0  0.00.0   0   0
sd11  0.00.00.00.0  0.0  0.00.0   0   0
sd12  0.00.00.00.0  0.0  0.00.0   0   0
sd13  0.00.00.00.0  0.0  0.00.0   0   0
sd14  0.00.00.00.0  0.0  0.00.0   0   0
sd15  0.00.00.00.0  0.0  0.00.0   0   0
sd16  0.00.00.00.0  0.0  0.00.0   0   0
 extended device statistics
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
fd0   0.00.00.00.0  0.0  0.00.0   0   0
sd0   0.00.00.00.0  0.0  0.00.0   0   0
sd1   0.00.00.00.0  0.0  0.00.0   0   0
sd2   0.00.00.00.0  0.0  0.00.0   0   0
sd3   0.00.00.00.0  0.0  0.00.0   0   0
sd4   0.00.00.00.0  0.0  0.00.0   0   0
sd5   0.0  118.10.0 15111.9  0.0 35.0  296.4   0 100
sd6   0.00.00.00.0  0.0  0.00.0   0   0
sd7   0.00.00.00.0  0.0  0.00.0   0   0
sd8   0.00.00.00.0  0.0  0.00.0   0   0
sd9   0.00.00.00.0  0.0  0.00.0   0   0
sd10  0.00.00.00.0  0.0  0.00.0   0   0
sd11  0.00.00.00.0  0.0  0.00.0   0   0
sd12  0.00.00.00.0  0.0  0.00.0   0   0
sd13  0.00.00.00.0  0.0  0.00.0   0   0
sd14  0.00.00.00.0  0.0  0.00.0   0   0
sd15  0.00.00.00.0  0.0  0.00.0   0   0
sd16  0.00.00.00.0  0.0  0.00.0   0   0
 extended device statistics
devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
fd0   0.00.00.00.0  0.0  0.00.0   0   0
sd0   0.00.00.00.0  0.0  0.00.0   0   0
sd1   0.00.00.00.0  0.0  0.00.0   0   0
sd2   0.00.00.0 

[zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS + DB + fragments

2007-11-16 Thread Al Hopper

I've been observing two threads on zfs-discuss with the following 
Subject lines:

Yager on ZFS
ZFS + DB + fragments

and have reached the rather obvious conclusion that the author can 
you guess? is a professional spinmeister, who gave up a promising 
career in political speech writing, to hassle the technical list 
membership on zfs-discuss.  To illustrate my viewpoint, I offer the 
following excerpts (reformatted from an obvious WinDoze Luser Mail 
client):

Excerpt 1:  Is this premium technical BullShit (BS) or what?

- BS 301 'grad level technical BS' ---

Still, it does drive up snapshot overhead, and if you start trying to 
use snapshots to simulate 'continuous data protection' rather than 
more sparingly the problem becomes more significant (because each 
snapshot will catch any background defragmentation activity at a 
different point, such that common parent blocks may appear in more 
than one snapshot even if no child data has actually been updated). 
Once you introduce CDP into the process (and it's tempting to, since 
the file system is in a better position to handle it efficiently than 
some add-on product), rethinking how one approaches snapshots (and COW 
in general) starts to make more sense.

- end of BS 301 'grad level technical BS' ---

Comment: Amazing: so many words, so little meaningful technical 
content!

Excerpt 2: Even better than Excerpt 1 - truly exceptional BullShit:

- BS 401 'PhD level technical BS' --

No, but I described how to use a transaction log to do so and later on 
in the post how ZFS could implement a different solution more 
consistent with its current behavior.  In the case of the transaction 
log, the key is to use the log not only to protect the RAID update but 
to protect the associated higher-level file operation as well, such 
that a single log force satisfies both (otherwise, logging the RAID 
update separately would indeed slow things down - unless you had NVRAM 
to use for it, in which case you've effectively just reimplemented a 
low-end RAID controller - which is probably why no one has implemented 
that kind of solution in a stand-alone software RAID product).

...
- end of BS 401 'PhD level technical BS' --

Go ahead and look up the full context of these exceptional BS excerpts 
and see if the full context brings any further enlightenment.  I think 
you'll quickly realize that, after reading the full context, this is 
nothing more than a complete waste of time and that there is nothing 
of technical value to learned from this text.  In fact, there is very, 
very little to be learned from any posts on this list where the 
Subject line is either:

Yager on ZFS
ZFS + DB + fragments

and the author is: can you guess? [EMAIL PROTECTED]

I'm not, for a moment, suggesting that one can't learn *something* 
from the posts of the author can you guess? 
[EMAIL PROTECTED]... indeed there are significant 
spinmeistering skills to be learned from these posts; including how to 
combine portions of cited published technical studies (Google Study, 
CERN study) with a line of total semi-technical bullshit worthy of any 
political spinmeister working within the DC Beltway Bandit area. 
In fact, if I'm trying to conn^H^H^H^H talk someone out of several 
million dollars to fund a totally BS research project, I'll pay any 
reasonable fees that can you guess? would demand.  Because I'm 
convinced that, with his premium spinmeistering/BS skills, nothing is 
impossible: pigs can fly, NetApp == ZFS, the world is flat, and 
ZFS is a totally deficient technical design because they didn't 
solicit his totally invaluable technical input.

And... one note of caution for Jeff Bonwick and Team ZFS - look out ... 
for this guy - because his new ZFS competitor filesystem, called, 
appropriately, GOMFS (Guess-O-Matic-File-System) is about to be 
released and it'll basically, if I understand can you guess?'s email 
fully, solve all the current ZFS design deficiencies, and totally 
dominate all *nix based filesystems for the next 400 years.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Graduate from sugar-coating school?  Sorry - I never attended! :)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Neil Perrin
Joe,

I don't think adding a slog helped in this case. In fact I
believe it made performance worse. Previously the ZIL would be
spread out over all devices, but now all synchronous traffic
is directed at one device (and everything is synchronous in NFS).
Mind you, 15MB/s seems a bit on the slow side - especially if
cache flushing is disabled.

It would be interesting to see what all the threads are waiting
on. I think the problem may be that everything is backed
up waiting to start a transaction, because the txg train is
slow due to NFS requiring the ZIL to push everything synchronously.
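
A quick sketch of one way to get that picture, run as root on the NFS server
(the grep patterns are just the functions I would expect to show up):

echo "::threadlist -v" | mdb -k > /var/tmp/threads.txt   # kernel threads with stacks
grep -c zil_commit /var/tmp/threads.txt                  # threads stuck in the ZIL path
grep -c txg_wait /var/tmp/threads.txt                    # threads waiting on the next txg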

Neil.

Joe Little wrote:
 I have historically noticed that in ZFS, when ever there is a heavy
 writer to a pool via NFS, the reads can held back (basically paused).
 An example is a RAID10 pool of 6 disks, whereby a directory of files
 including some large 100+MB in size being written can cause other
 clients over NFS to pause for seconds (5-30 or so). This on B70 bits.
 I've gotten used to this behavior over NFS, but didn't see it perform
 as such when on the server itself doing similar actions.
 
 To improve upon the situation, I thought perhaps I could dedicate a
 log device outside the pool, in the hopes that while heavy writes went
 to the log device, reads would merrily be allowed to coexist from the
 pool itself. My test case isn't ideal per se, but I added a local 9GB
 SCSI (80) drive for a log, and added to LUNs for the pool itself.
 You'll see from the below that while the log device is pegged at
 15MB/sec (sd5),  my directory list request on devices sd15 and sd16
 never are answered. I tried this with both no-cache-flush enabled and
 off, with negligible difference. Is there anyway to force a better
 balance of reads/writes during heavy writes?
 
  extended device statistics
 devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
 fd0   0.00.00.00.0  0.0  0.00.0   0   0
 sd0   0.00.00.00.0  0.0  0.00.0   0   0
 sd1   0.00.00.00.0  0.0  0.00.0   0   0
 sd2   0.00.00.00.0  0.0  0.00.0   0   0
 sd3   0.00.00.00.0  0.0  0.00.0   0   0
 sd4   0.00.00.00.0  0.0  0.00.0   0   0
 sd5   0.0  118.00.0 15099.9  0.0 35.0  296.7   0 100
 sd6   0.00.00.00.0  0.0  0.00.0   0   0
 sd7   0.00.00.00.0  0.0  0.00.0   0   0
 sd8   0.00.00.00.0  0.0  0.00.0   0   0
 sd9   0.00.00.00.0  0.0  0.00.0   0   0
 sd10  0.00.00.00.0  0.0  0.00.0   0   0
 sd11  0.00.00.00.0  0.0  0.00.0   0   0
 sd12  0.00.00.00.0  0.0  0.00.0   0   0
 sd13  0.00.00.00.0  0.0  0.00.0   0   0
 sd14  0.00.00.00.0  0.0  0.00.0   0   0
 sd15  0.00.00.00.0  0.0  0.00.0   0   0
 sd16  0.00.00.00.0  0.0  0.00.0   0   0
...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
 Joe,

 I don't think adding a slog helped in this case. In fact I
 believe it made performance worse. Previously the ZIL would be
 spread out over all devices but now all synchronous traffic
 is directed at one device (and everything is synchronous in NFS).
 Mind you 15MB/s seems a bit on the slow side - especially is
 cache flushing is disabled.

 It would be interesting to see what all the threads are waiting
 on. I think the problem maybe that everything is backed
 up waiting to start a transaction because the txg train is
 slow due to NFS requiring the ZIL to push everything synchronously.


I agree completely. The log (even though slow) was an attempt to
isolate writes away from the pool. I guess the question is how to
provide for async access for NFS. We may have 16, 32 or whatever
threads, but if a single writer keeps the ZIL pegged and prohibits
reads, it's all for nought. Is there any way to tune/configure the
ZFS/NFS combination to balance reads/writes so as not to starve one for the
other? It's either feast or famine, or so tests have shown.

 Neil.


 Joe Little wrote:
  I have historically noticed that in ZFS, when ever there is a heavy
  writer to a pool via NFS, the reads can held back (basically paused).
  An example is a RAID10 pool of 6 disks, whereby a directory of files
  including some large 100+MB in size being written can cause other
  clients over NFS to pause for seconds (5-30 or so). This on B70 bits.
  I've gotten used to this behavior over NFS, but didn't see it perform
  as such when on the server itself doing similar actions.
 
  To improve upon the situation, I thought perhaps I could dedicate a
  log device outside the pool, in the hopes that while heavy writes went
  to the log device, reads would merrily be allowed to coexist from the
  pool itself. My test case isn't ideal per se, but I added a local 9GB
  SCSI (80) drive for a log, and added to LUNs for the pool itself.
  You'll see from the below that while the log device is pegged at
  15MB/sec (sd5),  my directory list request on devices sd15 and sd16
  never are answered. I tried this with both no-cache-flush enabled and
  off, with negligible difference. Is there anyway to force a better
  balance of reads/writes during heavy writes?
 
   extended device statistics
  devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
  fd0   0.00.00.00.0  0.0  0.00.0   0   0
  sd0   0.00.00.00.0  0.0  0.00.0   0   0
  sd1   0.00.00.00.0  0.0  0.00.0   0   0
  sd2   0.00.00.00.0  0.0  0.00.0   0   0
  sd3   0.00.00.00.0  0.0  0.00.0   0   0
  sd4   0.00.00.00.0  0.0  0.00.0   0   0
  sd5   0.0  118.00.0 15099.9  0.0 35.0  296.7   0 100
  sd6   0.00.00.00.0  0.0  0.00.0   0   0
  sd7   0.00.00.00.0  0.0  0.00.0   0   0
  sd8   0.00.00.00.0  0.0  0.00.0   0   0
  sd9   0.00.00.00.0  0.0  0.00.0   0   0
  sd10  0.00.00.00.0  0.0  0.00.0   0   0
  sd11  0.00.00.00.0  0.0  0.00.0   0   0
  sd12  0.00.00.00.0  0.0  0.00.0   0   0
  sd13  0.00.00.00.0  0.0  0.00.0   0   0
  sd14  0.00.00.00.0  0.0  0.00.0   0   0
  sd15  0.00.00.00.0  0.0  0.00.0   0   0
  sd16  0.00.00.00.0  0.0  0.00.0   0   0
 ...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
On Nov 16, 2007 9:17 PM, Joe Little [EMAIL PROTECTED] wrote:
 On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
  Joe,
 
  I don't think adding a slog helped in this case. In fact I
  believe it made performance worse. Previously the ZIL would be
  spread out over all devices but now all synchronous traffic
  is directed at one device (and everything is synchronous in NFS).
  Mind you 15MB/s seems a bit on the slow side - especially is
  cache flushing is disabled.
 
  It would be interesting to see what all the threads are waiting
  on. I think the problem maybe that everything is backed
  up waiting to start a transaction because the txg train is
  slow due to NFS requiring the ZIL to push everything synchronously.
 

Roch wrote this before (thus my interest in the log or NVRAM like solution):


There are 2 independent things at play here.

a) NFS sync semantics conspire against single-thread performance with
any backend filesystem.
 However, NVRAM normally offers some relief from the issue.

b) ZFS sync semantics, along with the storage software + imprecise
protocol in between, conspire against ZFS performance
of some workloads on NVRAM-backed storage. NFS being one of the
affected workloads.

The conjunction of the 2 causes worse than expected NFS performance
over a ZFS backend running __on NVRAM-backed storage__.
If you are not considering NVRAM storage, then I know of no ZFS/NFS
specific problems.

Issue b) is being dealt with, by both Solaris and storage vendors (we
need a refined protocol);

Issue a) is not related to ZFS and is rather a fundamental NFS issue.
Maybe a future NFS protocol will help.


Net net: if one finds a way to 'disable cache flushing' on the
storage side, then one reaches the state
we'll be in, out of the box, when b) is implemented by Solaris _and_
the storage vendor. At that point, ZFS becomes a fine NFS
server not only on JBOD as it is today, but also on NVRAM-backed
storage.

It's complex enough, I thought it was worth repeating.




 I agree completely. The log (even though slow) was an attempt to
 isolate writes away from the pool. I guess the question is how to
 provide for async access for NFS. We may have 16, 32 or whatever
 threads, but if a single writer keeps the ZIL pegged and prohibiting
 reads, its all for nought. Is there anyway to tune/configure the
 ZFS/NFS combination to balance reads/writes to not starve one for the
 other. Its either feast or famine or so tests have shown.


  Neil.
 
 
  Joe Little wrote:
   I have historically noticed that in ZFS, when ever there is a heavy
   writer to a pool via NFS, the reads can held back (basically paused).
   An example is a RAID10 pool of 6 disks, whereby a directory of files
   including some large 100+MB in size being written can cause other
   clients over NFS to pause for seconds (5-30 or so). This on B70 bits.
   I've gotten used to this behavior over NFS, but didn't see it perform
   as such when on the server itself doing similar actions.
  
   To improve upon the situation, I thought perhaps I could dedicate a
   log device outside the pool, in the hopes that while heavy writes went
   to the log device, reads would merrily be allowed to coexist from the
   pool itself. My test case isn't ideal per se, but I added a local 9GB
   SCSI (80) drive for a log, and added to LUNs for the pool itself.
   You'll see from the below that while the log device is pegged at
   15MB/sec (sd5),  my directory list request on devices sd15 and sd16
   never are answered. I tried this with both no-cache-flush enabled and
   off, with negligible difference. Is there anyway to force a better
   balance of reads/writes during heavy writes?
  
extended device statistics
   devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
   fd0   0.00.00.00.0  0.0  0.00.0   0   0
   sd0   0.00.00.00.0  0.0  0.00.0   0   0
   sd1   0.00.00.00.0  0.0  0.00.0   0   0
   sd2   0.00.00.00.0  0.0  0.00.0   0   0
   sd3   0.00.00.00.0  0.0  0.00.0   0   0
   sd4   0.00.00.00.0  0.0  0.00.0   0   0
   sd5   0.0  118.00.0 15099.9  0.0 35.0  296.7   0 100
   sd6   0.00.00.00.0  0.0  0.00.0   0   0
   sd7   0.00.00.00.0  0.0  0.00.0   0   0
   sd8   0.00.00.00.0  0.0  0.00.0   0   0
   sd9   0.00.00.00.0  0.0  0.00.0   0   0
   sd10  0.00.00.00.0  0.0  0.00.0   0   0
   sd11  0.00.00.00.0  0.0  0.00.0   0   0
   sd12  0.00.00.00.0  0.0  0.00.0   0   0
   sd13  0.00.00.00.0  0.0  0.00.0   0   0
   sd14  0.00.00.00.0  0.0  0.00.0   0   0
   sd15  0.00.00.00.0  0.0  0.00.0   0   0
   sd16  0.00.00.00.0  0.0  0.00.0   0   0
  ...
 

___
zfs-discuss