[zfs-discuss] How to do DIRECT IO on ZFS ?

2006-12-12 Thread dudekula mastan
Hi All,
   
  We have the directio() system call to do direct I/O on a UFS file system. Does 
anyone know how to do direct I/O on a ZFS file system?
   
  Regards
  Masthan

 


Re: [zfs-discuss] Doubt on solaris 10 installation ..

2006-12-12 Thread Zoram Thanga
[EMAIL PROTECTED] looks like the more appropriate list to 
post questions like yours.


dudekula mastan wrote:

Hi Everybody,
   
  I have some problems with the Solaris 10 installation. 
   
  After installing from the first CD, I removed the CD from the CD-ROM drive; after that the machine keeps rebooting again and again. It does not ask for the second CD to continue the installation.
   
  If you have any ideas, please tell me.
   
  Thanks & Regards

  Masthan


--
Zoram Thanga::Sun Cluster Development::http://blogs.sun.com/zoram


[zfs-discuss] Need Clarification on ZFS quota property.

2006-12-12 Thread dudekula mastan

Hi All,
   
  Assume the device c0t0d0  size is 10 KB.
   
  I created ZFS file system on this
   
  $ zpool create -f mypool c0t0d0s2
   
  and to limit the size of ZFS file system I used quota property.
   
  $ zfs set quota = 5000K mypool
   
  Which 5000 KB belong to (or are reserved for) mypool: the first 5000 KB, the last 
5000 KB, or a random portion?
   
  UFS and VxFS file systems have options to limit the size of the file system on 
the device (e.g., we can limit it to span from block 1 to some nth block). Like 
this, is there any subcommand to limit the size of a ZFS file system from block 1 
to some nth block?
   
  Your help is appreciated.
   
  Thanks & Regards
  Masthan
   
   



Re: [zfs-discuss] Need Clarification on ZFS quota property.

2006-12-12 Thread Tomas Ögren
On 12 December, 2006 - dudekula mastan sent me these 2,7K bytes:

 
 Hi All,

   Assume the device c0t0d0  size is 10 KB.

   I created ZFS file system on this

   $ zpool create -f mypool c0t0d0s2

   and to limit the size of ZFS file system I used quota property.

   $ zfs set quota = 5000K mypool

   Which 5000 K bytes are belongs (or reserved) to mypool first 5000KB or last 
 5000KB or random ?

Random. Once you've stored 5000K, you can't store any more there.

   UFS and VxFS file systems have options to limit the size of file
   system on the device (E.g. We can limit the size offrom 1 block to
   some nth block . Like this is there any sub command to limit the
   size of ZFS file system from 1 block to  some n th block ?

Just the amount, not specific positions on (or portions of) the FS/devices.
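For reference, the property syntax takes no spaces around '=', and a quota only
caps how much the file system may consume; it does not tie the file system to
any particular region of the device. A minimal sketch against the pool named
above (reservation is the complementary property, guaranteeing rather than
capping space):

  $ zfs set quota=5000K mypool
  $ zfs set reservation=5000K mypool
  $ zfs get quota,reservation mypool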

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re[2]: [zfs-discuss] Uber block corruption?

2006-12-12 Thread Robert Milkowski
Hello Casper,

Tuesday, December 12, 2006, 10:54:27 AM, you wrote:

So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
become corrupt through something that doesn't also make all the data
also corrupt or inaccessible.


CDSC So how does this work for data which is freed and overwritten; does
CDSC the system make sure that none of the data referenced by any of the
CDSC old ueberblocks is ever overwritten?

Why should it? If blocks are not in use according to the current UB, I guess you
can safely assume they are free.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] zfs exported a live filesystem

2006-12-12 Thread Darren J Moffat

Boyd Adamson wrote:


On 12/12/2006, at 8:48 AM, Richard Elling wrote:


Jim Hranicky wrote:

By mistake, I just exported my test filesystem while it was up
and being served via NFS, causing my tar over NFS to start
throwing stale file handle errors. Should I file this as a bug, or 
should I just not do that :-


Don't do that.  The same should happen if you umount a shared UFS
file system (or any other file system types).
 -- richard


Except that it doesn't:

# mount /dev/dsk/c1t1d0s0 /mnt
# share /mnt
# umount /mnt
umount: /mnt busy
# unshare /mnt
# umount /mnt


If you umount -f it will though!

I don't quite agree that unmounting a UFS filesystem that is exported 
over NFS is the same as running zpool export on the pool.  The 
equivalent to running umount on the UFS file system is running zfs 
umount on the ZFS file system in the pool.


Running zpool export on the pool is closer to removing (cleanly) the 
disks or metadevices that the ufs file system is stored on.


The system is working as designed, the NFS client did what it was 
supposed to do.  If you brought the pool back in again with zpool import 
things should have picked up where they left off.
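As a rough sketch of that distinction (pool and file system names hypothetical):

  # zfs umount mypool/data      (file-system level, the analogue of umount on UFS)
  # zpool export mypool         (pool level, closer to cleanly pulling the disks)
  # zpool import mypool         (brings the pool, its file systems and shares, back)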


What's more, you were probably running as root when you did that, so you got 
what you asked for - there is only so much protection we can give 
without being annoying!  If you look at the RBAC profiles we currently 
ship for ZFS you will see that there are two distinct profiles, one for 
ZFS File System Management and one for ZFS Storage Management.  The 
reason they are separate is because they work at quite different layers 
in the system with different protections.


Now having said that, I personally wouldn't have expected that zpool 
export should have worked as easily as that while there were shared 
filesystems.  I would have expected that exporting the pool should have 
attempted to unmount all the ZFS filesystems first - which would have 
failed without a -f flag because they were shared.


So IMO it is a bug or at least an RFE.

--
Darren J Moffat


[zfs-discuss] Re: zfs exported a live filesystem

2006-12-12 Thread Jim Hranicky
For the record, this happened with a new filesystem. I didn't
muck about with an old filesystem while it was still mounted, 
I created a new one, mounted it and then accidentally exported
it.

  Except that it doesn't:
  
  # mount /dev/dsk/c1t1d0s0 /mnt
  # share /mnt
  # umount /mnt
  umount: /mnt busy
  # unshare /mnt
  # umount /mnt
 
 If you umount -f it will though!

Well, sure, but I was still surprised that it happened anyway.

 The system is working as designed, the NFS client did
 what it was  supposed to do.  If you brought the pool back in
 again with zpool import  things should have picked up where they left off.

Yep -- an import/shareall made the FS available again.
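For reference, that recovery amounts to something like the following (pool name
hypothetical, since the original post doesn't give it):

  # zpool import testpool
  # shareall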

 Whats more you we probably running as root when you
 did that so you got  what you asked for - there is only so much protection
 we can give  without being annoying!  

Sure, but there are still safeguards in place even when running things
as root, such as requiring umount -f as above, or warning you
when running format on a disk with mounted partitions.

Since this appeared to be an operation that may warrant such a
safeguard I thought I'd check and see if this was to be expected or
if a safeguard should be put in.

Annoying isn't always bad :-)

 Now having said that I personally wouldn't have
 expected that zpool  export should have worked as easily as that while
 there where shared  filesystems.  I would have expected that exporting
 the pool should have attempted to unmount all the ZFS filesystems first -
 which would have  failed without a -f flag because they were shared.
 
 So IMO it is a bug or at least an RFE.

Ok, where should I file an RFE?

Jim
 
 


[zfs-discuss] ZFS Corruption

2006-12-12 Thread Bill Casale

Please reply directly to me. I'm seeing the message below.

Is it possible to determine exactly which file is corrupted?
I was thinking the OBJECT/RANGE info may be pointing to it
but I don't know how to equate that to a file.


# zpool status -v
  pool: u01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        u01         ONLINE       0     0     6
          c1t102d0  ONLINE       0     0     6

errors: The following persistent errors have been detected:

  DATASET  OBJECT   RANGE
  u01  4741362  600178688-600309760



Thanks,
Bill


--
Bill Casale - TSE, OS Team
1 Network Drive, Burlington, MA 01802
Sun Microsystems




Re: [zfs-discuss] How to do DIRECT IO on ZFS ?

2006-12-12 Thread Robert Milkowski




Hello dudekula,

Tuesday, December 12, 2006, 9:36:24 AM, you wrote:

 Hi All,

 We have directio() system to do DIRECT IO on UFS file system. Can any one know how to do DIRECT IO on ZFS file system.

Right now you can't.
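For context, UFS offers direct I/O either per file via the directio(3C) call or
per mount via the forcedirectio option; ZFS has no equivalent switch at this
point. A minimal sketch of the UFS side only (device and mount point
hypothetical):

  # mount -F ufs -o forcedirectio /dev/dsk/c0t0d0s6 /data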



--
Best regards,
Robert  mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com





Re: [zfs-discuss] ZFS Corruption

2006-12-12 Thread George Wilson

Bill,

If you want to find the file associated with the corruption you could do 
a 'find /u01 -inum 4741362' or use the output of 'zdb -d u01' to 
find the object associated with that id.
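For example, following that suggestion (this assumes the object number reported
by zpool status corresponds to the file's inode number, which is why find -inum
works):

  # find /u01 -inum 4741362 -print
  # zdb -d u01

The find run should print the pathname directly; the zdb output can be used to
cross-check the object against the dataset.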


Thanks,
George

Bill Casale wrote:

Please reply directly to me. Seeing the message below.

Is it possible to determine exactly which file is corrupted?
I was thinking the OBJECT/RANGE info may be pointing to it
but I don't know how to equate that to a file.


# zpool status -v
  pool: u01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        u01         ONLINE       0     0     6
          c1t102d0  ONLINE       0     0     6

errors: The following persistent errors have been detected:

  DATASET  OBJECT   RANGE
  u01  4741362  600178688-600309760



Thanks,
Bill





Re: [zfs-discuss] Re: ZFS Usage in Warehousing (no more lengthy intro)

2006-12-12 Thread Robert Milkowski
Hello Jochen,

Sunday, December 10, 2006, 10:51:57 AM, you wrote:

JMK James,

 Just a thought.
 
 have you thought about giving thumper x4500's a trial
 for this work
 load? Oracle would seem to be IO limited in the end
 so  4 cores may be
 enough to keep oracle happy when linked with upto
 2GB/s disk IO speed.
JMK ===

JMK Actually yes, however I've doubts in regard to scalability
JMK of cpu power.  I'd imagine that a RaidZ setup will increase
JMK cpu usage of zfs, so Mirroring will be the way to go.
JMK I've also browsed some info on greenplum and other appliance
JMK vendors. However none are listed as strategic products for our
JMK company (forcing a lengthy assessment process), support/consulting
JMK in Germany is usually non-existent and a port of our current setup
JMK is difficult at best.
JMK I've asked Robert Milkowski (milek.blogspot.com) if he can provide
JMK me with some cpu figures from his throughput benchmarks.

It's not that bad with CPU usage.
For example, with RAID-Z2 while doing a scrub I get something like
800MB/s read from the disks (550-600MB/s from the zpool iostat perspective)
and all four cores are mostly consumed - I see something like 10% idle
on each CPU.


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] How to do DIRECT IO on ZFS ?

2006-12-12 Thread Roch - PAE

Maybe this will help:
http://blogs.sun.com/roch/entry/zfs_and_directio

-r

dudekula mastan writes:
  Hi All,
 
We have directio() system to do DIRECT IO on UFS file system. Can
  any one know how to do DIRECT IO on ZFS file system. 
 
Regards
Masthan
  
   



Re: [zfs-discuss] Uber block corruption?

2006-12-12 Thread Mark Maybee

[EMAIL PROTECTED] wrote:

Hello Casper,

Tuesday, December 12, 2006, 10:54:27 AM, you wrote:



So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
become corrupt through something that doesn't also make all the data
also corrupt or inaccessible.



CDSC So how does this work for data which is freed and overwritten; does
CDSC the system make sure that none of the data referenced by any of the
CDSC old ueberblocks is ever overwritten?

Why it should? If blocks are not used due to current UB I guess you
can safely assume they are free.




What if a newer UB is corrupted and you fall back to an older one?

Casper


A block freed in transaction group N cannot be reused until transaction
group N+3; so there is no possibility of referencing an overwritten
block unless you have to back off more than two uberblocks.  At this
point, blocks that have been overwritten will show up as corrupted (bad
checksums).

-Mark


Re: [zfs-discuss] Uber block corruption?

2006-12-12 Thread Toby Thain


On 12-Dec-06, at 9:46 AM, George Wilson wrote:

Also note that the UB is written to every vdev (4 per disk) so the  
chances of all UBs being corrupted is rather low.


Furthermore the time window where UBs are mutually inconsistent would  
be very short, since they'd be updated together?


--Toby



Thanks,
George

Darren Dunham wrote:
DD To reduce the chance of it affecting the integrity of the filesystem,
DD there are multiple copies of the UB written, each with a checksum and a
DD generation number.  When starting up a pool, the oldest generation copy
DD that checks properly will be used.  If the import can't find any valid
DD UB, then it's not going to have access to any data.  Think of a UFS
DD filesystem where all copies of the superblock are corrupt.

Actually the latest UB, not the oldest.

My *other* oldest...  yeah.



[zfs-discuss] Re: Netapp to Solaris/ZFS issues

2006-12-12 Thread Anton B. Rang
NetApp can actually grow their RAID groups, but they recommend adding an entire 
RAID group at once instead. If you add a disk to a RAID group on NetApp, I 
believe you need to manually start a reallocate process to balance data across 
the disks.
 
 


[zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread Jim Hranicky
For my latest test I set up a stripe of two mirrors with one hot spare
like so:

zpool create -f -m /export/zmir zmir mirror c0t0d0 c3t2d0 mirror c3t3d0 c3t4d0 
spare c3t1d0

I spun down c3t2d0 and c3t4d0 simultaneously, and while the system kept 
running (my tar over NFS barely hiccuped), the zpool command hung again.

I rebooted the machine with -dnq, and although the system didn't come up
the first time, it did after a fsck and a second reboot. 

However, once again the hot spare isn't getting used:

# zpool status -v
  pool: zmir
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
  the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Dec 12 09:15:49 2006
config:

        NAME        STATE     READ WRITE CKSUM
        zmir        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c0t0d0  ONLINE       0     0     0
            c3t2d0  UNAVAIL      0     0     0  cannot open
          mirror    DEGRADED     0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  UNAVAIL      0     0     0  cannot open
        spares
          c3t1d0    AVAIL

A few questions:

- I know I can attach it via the zpool commands, but is there a way to
kickstart the attachment process if it fails to attach automatically upon
disk failure?

- In this instance the spare is twice as big as the other
drives -- does that make a difference? 

- Is there something inherent to an old SCSI bus that causes spun-
down drives to hang the system in some way, even if it's just hanging
the zpool/zfs system calls? Would a thumper be more resilient to this?

Jim
 
 


Re: [zfs-discuss] ZFS Usage in Warehousing (lengthy intro)

2006-12-12 Thread Al Hopper
On Fri, 8 Dec 2006, Jochen M. Kaiser wrote:

 Dear all,

 we're currently looking forward to restructure our hardware environment for
 our datawarehousing product/suite/solution/whatever.

 We're currently running the database side on various SF V440's attached via
 dual FC to our SAN backend (EMC DMX3) with UFS. The storage system is
 (obviously in  a SAN) shared between many systems. Performance is mediocre
 in  terms of raw throughput at 70-150MB/sec. (lengthy, sequential reads due to
 full table scan  operations on the db side) and excellent is terms of I/O and
 service times (averaging at 1,7ms according to sar).
 From our applications perspective sequential read is the most important 
 factor.
 Read-to-Write ratio is almost 20:1.

 We now want to consolidate our database servers (Oracle, btw.) to a pair of
 x4600 systems running Solaris 10 (which we've already tested in a benchmark
 setup). The whole system was still I/O-bound, even though the backend (3510,
 12x146GB, QFS, RAID10) delivered a sustained data rate of 250-300MB/sec.

 I'd like to target a sequential read performance of 500++MB/sec while reading
 from the db on multiple tablespaces. We're experiencing massive data volume
 growth of about 100% per year and are therefore looking both for an 
 expandable,
 yet cheap solution. We'd like to use a DAS solution, because we had negative
 experiences with SAN in the past in terms of tuning and throughput.

 Being a friend of simplicity I was thinking about using a pair (or more) of 
 3320
 SCSI JBODs with multiple RAIDZ and/or RAID10 zfs disk pools on which we'd

Have you not heard that SCSI is dead?  :)

But seriously, the big issue with SCSI, is that the SCSI commands are sent
over the SCSI bus at the original (legacy) rate of 5 Mbits/Sec in 8-bit
mode.  And since it takes an average of 5 SCSI commands to do something
useful, you can't send enough commands over the bus to busy out a modern
SCSI drive.  Even a single drive on a single SCSI bus.  Also, it takes a
lot of time to send those commands - so you have latency.  And everyone
understands how latency affects throughput on a LAN (or WAN) .. same issue
with SCSI.  This is the main reason why SCSI is EOL and could not be
extended without breaking the existing standards.

While I understand you don't want to build a SAN, an alternative would be
a Fibre Channel (FC) box that presents SATA drives.  This would be a DAS
solution with one or two connections to (Qlogic) FC controllers in the
host - IOW not a SAN and there is no FC switch required.  Many such boxes
are designed to provide expansion to a FC based hardware RAID box.  For
example, the DS4000 EXP100 Storage Expansion Unit from IBM.  In your
application you'd need to find something that supports FC rates of
4Gb/Sec, if possible.

Another possibility, which is on my todo list to check out, is:

http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220

Now if I could find a Marvell based equivalent to the:
http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm with
external SATA ports, life would be great.  Another card with external SATA
ports that works with Solaris (via the si3124 driver) is:
http://www.newegg.com/product/product.asp?item=N82E16816124003 which only
has a 32-bit PCI connection. :(

 place the database. If we need more space we'll simply connect yet another
 JBOD. I'd calculate 1-2 PCIe U320 controllers (w/o raid) per jbod, starting 
 with a
 minimum of 4 controllers per server.

 Regarding ZFS I'd be very interested to know, whether someone else is running
 a similar setup and can provide me with some hints or point me at some 
 caveats.

 I'd be also very interested in the cpu usage of such a setup for the zfs raidz
 pools. After searching this forum I found the rule of thumb that 200MB/sec
 throughput roughly consume one 2GHz Opteron cpu, but am hoping that someone
 can provide me with some in depth data. (Frankly I can hardly imagine that 
 this
 holds true for reads).

 I'd be also be interested in you opinion on my targeted setup, so if you have
 any comments - go ahead.

 Any help is appreciated,

 Jochen

 P.S. Fallback scenarios would be Oracle with ASM or a (zfs/ufs) SAN setup.


Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006


Re: [zfs-discuss] ZFS Corruption

2006-12-12 Thread eric kustarz

Bill Casale wrote:

Please reply directly to me. Seeing the message below.

Is it possible to determine exactly which file is corrupted?
I was thinking the OBJECT/RANGE info may be pointing to it
but I don't know how to equate that to a file.


This is bug:
6410433 'zpool status -v' would be more useful with filenames

and I'm actually working on it right now!

eric




# zpool status -v
  pool: u01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        u01         ONLINE       0     0     6
          c1t102d0  ONLINE       0     0     6

errors: The following persistent errors have been detected:

  DATASET  OBJECT   RANGE
  u01  4741362  600178688-600309760



Thanks,
Bill






Re: [zfs-discuss] Re: zfs exported a live filesystem

2006-12-12 Thread Darren J Moffat

Jim Hranicky wrote:

Now having said that I personally wouldn't have
expected that zpool  export should have worked as easily as that while
there where shared  filesystems.  I would have expected that exporting
the pool should have attempted to unmount all the ZFS filesystems first -
which would have  failed without a -f flag because they were shared.

So IMO it is a bug or at least an RFE.


Ok, where should I file an RFE?


http://bugs.opensolaris.org/


--
Darren J Moffat


Re: [zfs-discuss] ZFS Usage in Warehousing (lengthy intro)

2006-12-12 Thread Stuart Glenn


On Dec 12, 2006, at 10:02, Al Hopper wrote:



Another possiblity, which is on my todo list to checkout, is:

http://www.norcotek.com/item_detail.php?categoryid=8&modelno=DS-1220


I would not go with this device. I picked up one along with 12 500GB  
SATA drives with the hopes of making a dumping ground on the network  
for my servers to rsync to.


Now I might have it all kinds of not configured or tuned correctly in  
terms of Solaris & ZFS (and if I do, I can't figure it out), but  
performance is terrible compared to my existing dumping ground based  
on a cheap-o RAID-5 card & FreeBSD



[zfs-discuss] Sol10u3 -- is du bug fixed?

2006-12-12 Thread Jeb Campbell
I updated to Sol10u3 last night, and I'm still seeing differences 
between du -h and ls -lh.

du seems to take into account raidz and compression -- if this is correct, 
please let me know.

It makes sense that du reports actual disk usage, but this makes some scripts 
I wrote very broken (need real sizes of files in a directory to be able to put 
them on dvd isos).

Sol10u3 on 3 disk RaidZ:
[EMAIL PROTECTED]:~/burnout/2006-11-30]$ ls -lh JMS-data-1-2006-11-30.iso
-rw-r--r-- 1 splus splus 3.5G Dec  1 10:15 JMS-data-1-2006-11-30.iso
[EMAIL PROTECTED]:~/burnout/2006-11-30]$ du -hs JMS-data-1-2006-11-30.iso
5.2GJMS-data-1-2006-11-30.iso

Thanks,
Jeb
 
 


Re[2]: [zfs-discuss] Uber block corruption?

2006-12-12 Thread Robert Milkowski
Hello Toby,

Tuesday, December 12, 2006, 4:18:54 PM, you wrote:

TT On 12-Dec-06, at 9:46 AM, George Wilson wrote:

 Also note that the UB is written to every vdev (4 per disk) so the  
 chances of all UBs being corrupted is rather low.

It depends, actually - if all your vdevs are on the same array with
the write-back cache turned on, you can end up with all UBs
corrupted - at least in theory.


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



[zfs-discuss] Re: Sol10u3 -- is du bug fixed?

2006-12-12 Thread Jeb Campbell
 After upgrade you did actually re-create your raid-z
 pool, right?

No, but I did zpool upgrade -a.

Hmm, I guess I'll try re-writing the data first.  I know you have to do that if 
you change compression options.

Ok -- rewriting the data doesn't work ...

I'll create a new temp pool and see what that does ... then I'll investigate 
options for recreating my big pool ...

Thanks for the info,

Jeb
 
 


Re[2]: [zfs-discuss] Re: zpool import takes to long with large numbers of file systems

2006-12-12 Thread Robert Milkowski
Hello Jason,

Thursday, December 7, 2006, 11:18:17 PM, you wrote:

JJWW Hi Luke,

JJWW That's terrific!

JJWW You know you might be able to tell ZFS which disks to look at. I'm not
JJWW sure. It would be interesting, if anyone with a Thumper could comment
JJWW on whether or not they see the import time issue. What are your load
JJWW times now with MPXIO?

On x4500 importing a pool made of 44 disks takes about 13 seconds.



-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



[zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS

2006-12-12 Thread Matthew C Aycock
We are currently working on a plan to upgrade our HA-NFS cluster that uses 
HA-StoragePlus and VxVM 3.2 on Solaris 9 to Solaris 10 and ZFS. Is there a 
known procedure or best practice for this? I have enough free disk space to 
recreate all the filesystems and copy the data if necessary, but would like to 
avoid copying if possible.

Also, I am considering what type of zpools to create. I have a SAN with T3Bs 
and SE3511s. Since neither of these can work as a JBOD (at least that is what I 
remember) I guess I am going to have to add the LUNs in a mirrored zpool of 
the RAID-5 LUNs?

We are at the extreme start of this project and I was hoping for some guidance 
as to what direction to start.
 
 


[zfs-discuss] Performance problems during 'destroy' (and bizzare Zone problem as well)

2006-12-12 Thread Anantha N. Srirama
[b]Setting:[/b]
  We've operating in the following setup for well over 60 days.

 - E2900 (24 x 92)
 - 2 2Gbps FC to EMC SAN
 - Solaris 10 Update 2 (06/06)
 - ZFS with compression turned on
 - Global zone + 1 local zone (sparse)
 - Local zone is fed ZFS clones from the global Zone

[b]Daily Routine[/b]
 - Shutdown local Zone
 - Recreate ZFS clones
 - Restart local Zone
 - End to end timing for this refresh is anywhere between 5 to 30 minutes. Bulk 
of the time is spent in the ZFS 'destroy' phase.

[b]Problem[/b]
 - We had extensive read/write activity in the global and local Zones 
yesterday. I estimate that we wrote 1/4 of one large ZFS filesystem, ~ 160GB of 
write.
 - This morning we had a fair amount of activity on the system when the refresh 
started, zpool was reporting around 150MB/S of write.
 - Our 'zfs destroy' commands took what I consider 'normal'; the FS that was 
fielding the bulk of the I/O took 15 minutes. During this time everything was 
crawling or, more accurately, came to a dead stop. A simple 'rm' would hang. I've 
reported this problem to the forum in the past. I also believe the fix for the 
problem is in Update 3 for Solaris 10, right?
 -[b]Surprisingly, today the ZFS 'snapshot & clone' took an inordinate amount of 
time. I observed that each snapshot & clone activity together took 10+ minutes. In 
the past the same activity has taken no more than a few seconds, even during 
busy times. The total end-to-end timing for all snapshots/clones was a whopping 
1:44:00!!![/b]
 - Even more surprising was that the local Zone refused to start up (zoneadm -z 
bluenile boot) with no error messages.
 - I was able to start the Zone only an hour or so after the completion 
of the ZFS commands.

[b]Questions:[/b]
 - Why is the destroy phase taking so long?
 - What can explain the unduly long snapshot/clone times
 - Why didn't the Zone startup?
 - More surprisingly why did the Zone startup after an hour?

Thanks in advance.
 
 


Re: [zfs-discuss] Re: Sol10u3 -- is du bug fixed?

2006-12-12 Thread Robert Milkowski
Hello Jeb,

Tuesday, December 12, 2006, 7:11:30 PM, you wrote:

 After upgrade you did actually re-create your raid-z
 pool, right?

JC No, but I did zpool upgrade -a.

JC Hmm, I guess I'll try re-writing the data first.  I know you have
JC to do that if you change compression options.

IIRC you have to re-create the entire raid-z pool to get it fixed - just
rewriting the data or upgrading the pool won't do it.


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread Eric Schrock
On Tue, Dec 12, 2006 at 07:53:32AM -0800, Jim Hranicky wrote:
 
 - I know I can attach it via the zpool commands, but is there a way to
 kickstart the attachment process if it fails to attach automatically upon
 disk failure?

Yep.  Just do a 'zpool replace zmir target spare'.  This is what the
FMA agent does in response to failed drive faults.
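Using the device names from the earlier zpool status output in this thread, that
would look like:

  # zpool replace zmir c3t4d0 c3t1d0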

 - In this instance the spare is twice as big as the other
 drives -- does that make a difference? 

Nope.  The 'size' of a replacing vdev is the minimum size of its two
children, so it won't affect anything.

 - Is there something inherent to an old SCSI bus that causes spun-
 down drives to hang the system in some way, even if it's just hanging
 the zpool/zfs system calls? Would a thumper be more resilient to this?

There are a number of drive failure modes that result in arbitrarily
misbehaving drives, as opposed to drives which fail to open entirely.
We are working on a more complete FMA diagnosis engine which will be
able to diagnose this type of failure and proactively fault the device.

I'm not sure exactly what behavior you're seeing by 'spun-down drives',
so this may or may not address your issue.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] Re: Sol10u3 -- is du bug fixed?

2006-12-12 Thread Matthew Ahrens

Jeb Campbell wrote:

After upgrade you did actually re-create your raid-z
pool, right?


No, but I did zpool upgrade -a.

Hmm, I guess I'll try re-writing the data first.  I know you have to do that if 
you change compression options.

Ok -- rewriting the data doesn't work ...

I'll create a new temp pool and see what that does ... then I'll investigate 
options for recreating my big pool ...


Unfortunately, this bug is only fixed when you create the pool on the 
new bits.


--matt


Re: [zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread James F. Hranicky
Eric Schrock wrote:
 On Tue, Dec 12, 2006 at 07:53:32AM -0800, Jim Hranicky wrote:
 - I know I can attach it via the zpool commands, but is there a way to
 kickstart the attachment process if it fails to attach automatically upon
 disk failure?
 
 Yep.  Just do a 'zpool replace zmir target spare'.  This is what the
 FMA agent does in response to failed drive faults.

Sure, but that's what I want to avoid. The FMA agent should do this by
itself, but it's not, so I guess I'm just wondering why, or if there's
a good way to get it to do so. If this happens in the middle of the night I
don't want to have to run the commands by hand.

 - Is there something inherent to an old SCSI bus that causes spun-
 down drives to hang the system in some way, even if it's just hanging
 the zpool/zfs system calls? Would a thumper be more resilient to this?
 
 There are a number of drive failure modes that result in arbitrarily
 misbehaving drives, as opposed to drives which fail to open entirely.
 We are working on a more complete FMA diagnosis engine which will be
 able to diagnose this type of failure and proactively fault the device.
 
 I'm not sure exactly what behavior you're seeing by 'spun-down drives',
 so this may or may not address your issue.

For instance, the zpool command hanging or the system hanging trying to
reboot normally.

Jim


Re: [zfs-discuss] Netapp to Solaris/ZFS issues

2006-12-12 Thread James F. Hranicky
Jim Davis wrote:

 Have you tried using the automounter as suggested by the linux faq?:
 http://nfs.sourceforge.net/#section_b
 
 Yes.  On our undergrad timesharing system (~1300 logins) we actually hit
 that limit with a standard automounting scheme.  So now we make static
 mounts of the Netapp /home space and then use amd to make symlinks to
 the home directories.  Ugly, but it works.

This is how we've always done it, but we use amd (am-utils) to manage two
maps, a filesystem map and a homes map. The homes map is of all type:=link,
so amd handles the link creation for us, plus we only have a handful of
mounts on any system.

It looks like if each user has a ZFS quota-ed home directory which acts as
its own little filesystem, we won't be able to do this anymore, as we'll have
to export and mount each user directory separately. Is this the case, or is
there a way to export and mount a volume containing zfs quota-ed directories,
i.e., have the quota-ed subdirs not necessarily act like they're separate
filesystems?

Jim


Re: [zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread Eric Schrock
On Tue, Dec 12, 2006 at 02:08:57PM -0500, James F. Hranicky wrote:
 
 Sure, but that's what I want to avoid. The FMA agent should do this by
 itself, but it's not, so I guess I'm just wondering why, or if there's
 a good way to get to do so. If this happens in the middle of the night I
 don't want to have to run the commands by hand.

Yes, the FMA agent should do this.  Can you run 'fmdump -v' and see if
the DE correctly identified the faulted devices?

 For instance, the zpool command hanging or the system hanging trying to
 reboot normally.

If the SCSI commands hang forever, then there is nothing that ZFS can
do, as a single write will never return.  The more likely case is that
the commands are continually timing out with very long response times,
and ZFS will continue to talk to them forever.  The future FMA
integration I mentioned will solve this problem.  In the meantime, you
should be able to 'zpool offline' the affected devices by hand.
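With the pool and device names from earlier in this thread, that would be
something like:

  # zpool offline zmir c3t4d0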

There is also associated work going on to better handle asynchronous
response times across devices.  Currently, a single slow device will slow
the entire pool to a crawl.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread James F. Hranicky
Eric Schrock wrote:
 On Tue, Dec 12, 2006 at 02:08:57PM -0500, James F. Hranicky wrote:
 Sure, but that's what I want to avoid. The FMA agent should do this by
 itself, but it's not, so I guess I'm just wondering why, or if there's
 a good way to get to do so. If this happens in the middle of the night I
 don't want to have to run the commands by hand.
 
 Yes, the FMA agent should do this.  Can you run 'fmdump -v' and see if
 the DE correctly identified the faulted devices?

Here you go:

# fmdump -v
TIME UUID SUNW-MSG-ID
Nov 29 16:29:12.1947 e50198f2-2eb9-c58b-d7c5-87aaae5cb935 ZFS-8000-D3
  100%  fault.fs.zfs.device

Problem in: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
   Affects: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
   FRU: -

Nov 30 10:31:48.8844 1a44a780-05c0-cb6e-d44f-f1d8999f40e5 ZFS-8000-D3
  100%  fault.fs.zfs.device

Problem in: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54
   Affects: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54
   FRU: -

Dec 11 14:04:57.8803 c46d21e0-200d-43a1-e5db-ae9c9ebf3482 ZFS-8000-D3
  100%  fault.fs.zfs.device

Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15
   Affects: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15
   FRU: -

Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3
  100%  fault.fs.zfs.device

Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
   Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
   FRU: -

I'm not really sure what it means.

 For instance, the zpool command hanging or the system hanging trying to
 reboot normally.
 
 If the SCSI commands hang forever, then there is nothing that ZFS can
 do, as a single write will never return.  The more likely case is that
 the commands are continually timining out with very long response times,
 and ZFS will continue to talk to them forever.  The future FMA
 integration I mentioned will solve this problem.  In the meantime, you
 should be able to 'zpool offline' the affected devices by hand.

Well, as long as I know which device is affected :-)  If zpool status
doesn't return, it may be difficult to figure out.

Do you know if the SATA controllers in a Thumper can better handle this
problem?

 There is also associated work going on to better handle asynchrounous
 reponse times across devices.  Currently, a single slow device will slow
 the entire pool to a crawl.

Do you have an idea as to when this might be available?

Thanks for all your input,
Jim


Re: [zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS

2006-12-12 Thread Richard Elling

Matthew C Aycock wrote:
We are currently working on a plan to upgrade our HA-NFS cluster that 
uses HA-StoragePlus and VxVM 3.2 on Solaris 9 to Solaris 10 and ZFS. Is 
there a known procedure or best practice for this? I have enough free disk 
space to recreate all the filesystems and copy the data if necessary, but 
would like to avoid copying if possible.


You will need to copy the data from the old file system into ZFS.

Also, I am considering what type of zpools to create. I have a SAN with 
T3Bs and SE3511s. Since neither of these can work as a JBOD (at lesat that 
is what I remember) I guess I am going  to have to add in the LUNS in a 
mirrored zpool of the Raid-5 Luns?


Lacking other information, particularly performance requirements, what you
suggest is a good strategy: ZFS mirrors of RAID-5 LUNs.

We are at the extreme start of this project and I was hoping for some 
guidance as to what direction to start.


By all means, read the Sun Cluster Concepts Guide first.  It will answer
many questions that may arise as you go through the design.  Note version
3.2 which is required for ZFS has updates to the concepts guide regarding
the use of ZFS, available RSN.
 -- richard


Re: [zfs-discuss] ZFS Storage Pool advice

2006-12-12 Thread Richard Elling

Kory Wheatley wrote:
This question is concerning ZFS.  We have a Sun Fire V890 attached to an EMC disk array.  
Here's our plan to incorporate ZFS: 
On our EMC storage array we will create 3 LUNs.  Now how would ZFS be used for the 
best performance?


What I'm trying to ask is if you have 3 LUNS and you want to create a ZFS storage pool, 
would it be better to have a storage pool per LUN or combine the 3 LUNS as one big disks 
under ZFS and create 1 huge ZFS storage pool.


One huge zpool.  Remember, the pool can contain many file systems, but the
reverse is not true.
 -- richard
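A minimal sketch of that layout (device names hypothetical, with the three LUNs
going into a single pool and separate file systems carved out of it):

  # zpool create alldata c2t0d0 c2t1d0 c2t2d0
  # zfs create alldata/students
  # zfs create alldata/depts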


Re: [zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread Eric Schrock
On Tue, Dec 12, 2006 at 02:38:22PM -0500, James F. Hranicky wrote:
 
 Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3
   100%  fault.fs.zfs.device
 
 Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
FRU: -
 
 I'm not really sure what it means.

Hmmm, it means that we correctly noticed that the device had failed, but
for whatever reason the ZFS FMA agent didn't correctly replace the
drive.  I am cleaning up the hot spare behavior as we speak so I will
try to reproduce this.

 Well, as long as I know which device is affected :- If zpool status
 doesn't return it may be difficult to figure out.
 
 Do you know if the SATA controllers in a Thumper can better handle this
 problem?

I will be starting a variety of experiments in this vein in the near
future.  Others may be able to describe their experiences so far.  How
exactly did you 'spin down' the drives in question?  Is there a
particular failure mode you're interested in?

 Do you have an idea as to when this might be available?

It will be a while before the complete functionality is finished.  I
have begun the work, but there are several distinct phases.  First, I
am cleaning up the existing hot spare behavior.  Second, I'm adding
proper hotplug support to ZFS so that it detects device removal without
freaking out and correctly resilvers/replaces drives when they are
plugged back in.  Finally, I'll be adding a ZFS diagnosis engine to both
analyze ZFS faults as well as consume SMART data to predict disk failure
and proactively offline devices.  I would estimate that it will be a few
months before I get all of this into Nevada.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re: [zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread James F. Hranicky
Eric Schrock wrote:

 Hmmm, it means that we correctly noticed that the device had failed, but
 for whatever reason the ZFS FMA agent didn't correctly replace the
 drive.  I am cleaning up the hot spare behavior as we speak so I will
 try to reproduce this.

Ok, great.

 Well, as long as I know which device is affected :- If zpool status
 doesn't return it may be difficult to figure out.

 Do you know if the SATA controllers in a Thumper can better handle this
 problem?
 
 I will be starting a variety of experiments in this vein in the near
 future.  Others may be able to describe their experiences so far.  How
 exactly did you 'spin down' the drives in question?  Is there a
 particular failure mode you're interested in?

The Andataco cabinet has a button for each disk slot that, if you
hold it down, will spin the drive down so you can pull it out.

I'm interested in any failure mode that might happen to my server :-)
Basically, we're very interested in building a nice ZFS server box
that will house a good chunk of our data, be it homes, research or
whatever. I just have to know the server is as bulletproof as
possible, that's why I'm doing the stress tests.

 Do you have an idea as to when this might be available?
 
 It will be a while before the complete functionality is finished.  I
 have begun the work, but there are several distinct phases.  First, I
 am cleaning up the existing hot spare behavior.  Second, I'm adding
 proper hotplug support to ZFS so that it detects device removal without
 freaking out and correctly resilvers/replaces drives when they are
 plugged back in.  Finally, I'll be adding a ZFS diagnosis engine to both
 analyze ZFS faults as well as consume SMART data to predict disk failure
 and proactively offline devices.  I would estimate that it will be a few
 months before I get all of this into Nevada.

Ok, thanks.

Jim


[zfs-discuss] Re: ZFS Storage Pool advice

2006-12-12 Thread Anton B. Rang
Are you looking purely for performance, or for the added reliability that ZFS 
can give you?

If the latter, then you would want to configure across multiple LUNs in either 
a mirrored or RAID configuration. This does require sacrificing some storage in 
exchange for the peace of mind that any “silent data corruption” in the array 
or storage fabric will be not only detected but repaired by ZFS.

From a performance point of view, what will work best depends greatly on your 
application I/O pattern, how you would map the application’s data to the 
available ZFS pools if you had more than one, how many channels are used to 
attach the disk array, etc.  A single pool can be a good choice from an 
ease-of-use perspective, but multiple pools may perform better under certain 
types of load (for instance, there’s one intent log per pool, so if the intent 
log writes become a bottleneck then multiple pools can help). This also 
depends on how the LUNs are configured within the EMC array.

If you can put together a test system, and run your application as a benchmark, 
you can get an answer. Without that, I don’t think anyone can predict which 
will work best in your particular situation.
 
 


[zfs-discuss] Re: Re: Sol10u3 -- is du bug fixed?

2006-12-12 Thread Anton B. Rang
Is there an easy way to determine whether a pool has this fix applied or not?
 
 


Re: [zfs-discuss] Re: ZFS Storage Pool advice

2006-12-12 Thread Neil Perrin

Are you looking purely for performance, or for the added reliability that ZFS 
can give you?

If the latter, then you would want to configure across multiple LUNs in either 
a mirrored or RAID configuration. This does require sacrificing some storage in 
exchange for the peace of mind that any “silent data corruption” in the array 
or storage fabric will be not only detected but repaired by ZFS.


From a performance point of view, what will work best depends greatly on your 
application I/O pattern, how you would map the application’s data to the 
available ZFS pools if you had more than one, how many channels are used to 
attach the disk array, etc.  A single pool can be a good choice from an 
ease-of-use perspective, but multiple pools may perform better under certain 
types of load (for instance, there’s one intent log per pool, so if the intent 
log writes become a bottleneck then multiple pools can help).


Bad example, as there's actually one intent log per file system!


This also depends on how the LUNs are configured within the EMC array

If you can put together a test system, and run your application as a benchmark, 
you can get an answer. Without that, I don’t think anyone can predict which 
will work best in your particular situation.



Re: [zfs-discuss] Re: Netapp to Solaris/ZFS issues

2006-12-12 Thread Darren Dunham
 NetApp can actually grow their RAID groups, but they recommend adding
 an entire RAID group at once instead. If you add a disk to a RAID
 group on NetApp, I believe you need to manually start a reallocate
 process to balance data across the disks.

There's no reallocation process that I'm aware of.  Obviously adding a
single column to a pretty full volume prevents you from doing the most
optimal (full-stripe) writes.  But since the existing parity disk covers
the new column, you do have full availability of the new space.  That's
a different story with raidz.

Hopefully you don't wait until the raid group is full before adding
disks, and the blocks sort themselves out over time.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


[zfs-discuss] ZFS and write caching (SATA)

2006-12-12 Thread Peter Schuller
Hello,

my understanding is that ZFS is specifically designed to work with write 
caching, by instructing drives to flush their caches when a write barrier is 
needed. And in fact, even turns write caching on explicitly on managed 
devices.

My question is of a practical nature: will this *actually* be safe on the 
average consumer grade SATA drive? I have seen offhand references to PATA 
drives generally not being trustworthy when it comes to this (SCSI therefore 
being recommended), but I have not been able to find information on the 
status of typical SATA drives.

While I do intend to perform actual powerloss tests, it would be interesting 
to hear from anybody whether it is generally expected to be safe.

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org



Re: [zfs-discuss] Re: Re: Sol10u3 -- is du bug fixed?

2006-12-12 Thread Robert Milkowski
Hello Anton,

Tuesday, December 12, 2006, 9:36:41 PM, you wrote:

ABR Is there an easy way to determine whether a pool has this fix applied or 
not?

Yep.

Just do 'df -h' and see what the reported size of the pool is. It should
be something like N-1 times the disk size for each raid-z group. If it is
N times the disk size then the pool was created before the fix.
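As a worked illustration with made-up sizes: a raid-z group built from three
500GB disks should report roughly 1TB (N-1 x 500GB) in 'df -h' if the pool was
created with the fix, and closer to 1.5TB (N x 500GB) if it was created before
it.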

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] SunCluster HA-NFS from Sol9/VxVM to Sol10u3/ZFS

2006-12-12 Thread Torrey McMahon

Robert Milkowski wrote:

Hello Matthew,


MCA Also, I am considering what type of zpools to create. I have a
MCA SAN with T3Bs and SE3511s. Since neither of these can work as a
MCA JBOD (at lesat that is what I remember) I guess I am going  to
MCA have to add in the LUNS in a mirrored zpool of the Raid-5 Luns?

1. those boxes can work as JBODs but not in a clustered environment.



Actually, those boxes can't act as JBODs. They only present LUNs created 
from the drives in the enclosures.





Re: [zfs-discuss] ZFS Storage Pool advice

2006-12-12 Thread Jason J. W. Williams

Hi Kory,

It depends on the capabilities of your array in our experience... and
also the zpool type. If you're going to do RAID-Z in a write-intensive
environment you're going to have a lot more I/Os with three LUNs than
with a single large LUN. Your controller may go nutty.

Also, (Richard can address this better than I) you may want to disable
the ZIL or have your array ignore the write cache flushes that ZFS
issues.
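For completeness, the ZIL tunable commonly mentioned at the time is the
zil_disable variable in the zfs module; this is a hedged sketch only, and
disabling the ZIL gives up synchronous-write guarantees, so it is generally
discouraged:

  * /etc/system entry; applies system-wide and takes effect after a reboot
  set zfs:zil_disable = 1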

Best Regards,
Jason

On 12/12/06, Kory Wheatley [EMAIL PROTECTED] wrote:

This question is concerning ZFS.  We have a Sun Fire V890 attached to a EMC 
disk array.  Here's are plan to incorporate ZFS:
On our EMC storage array we will create 3 LUNS.  Now how would ZFS be used for 
the best performance?

What I'm trying to ask is if you have 3 LUNS and you want to create a ZFS 
storage pool, would it be better to have a storage pool per LUN or combine the 
3 LUNS as one big disks under ZFS and create 1 huge ZFS storage pool.

Example:
LUN1 200gb  ZFS Storage Pool pooldata1
LUN2 200gb  ZFS Storage Pool pooldata2
LUN3 200gb  ZFS Storage Pool pooldata3

or

LUN 600gb  ZFS Storage Pool alldata




Re: [zfs-discuss] ZFS and write caching (SATA)

2006-12-12 Thread Peter Schuller
 PS While I do intend to perform actual powerloss tests, it would be
 interesting PS to hear from anybody whether it is generally expected to be
 safe.

 Well, if disks honor cache flush commands then it should be reliable
 whether it's a SATA or SCSI disk.

Yes. Sorry, I could have stated my question more clearly. What I am specifically 
concerned about is exactly that - whether your typical SATA drive *will* 
honor cache flush commands, as I understand a lot of PATA drives did/do not.

Googling tends to give very little concrete information on this since very few 
people actually seem to care about this. Since I wanted to confirm my 
understanding of ZFS semantics w.r.t. write caching anyway I thought I might 
as well ask about the general tendency among drives since, if anywhere, 
people here might know.
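One way to at least inspect (or toggle) a drive's volatile write cache during
such a powerloss test is format's expert mode; the menu names below are from
memory, so treat this as a hedged sketch:

  # format -e
  (select the disk)
  format> cache
  cache> write_cache
  write_cache> display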

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org



Re: [zfs-discuss] Performance problems during 'destroy' (and bizzare Zone problem as well)

2006-12-12 Thread Matthew Ahrens

Anantha N. Srirama wrote:

 - Why is the destroy phase taking so long?


Destroying clones will be much faster with build 53 or later (or the 
unreleased s10u4 or later) -- see bug 6484044.



 - What can explain the unduly long snapshot/clone times
 - Why didn't the Zone startup?
 - More surprisingly why did the Zone startup after an hour?


Perhaps there was so much activity on the system that we couldn't push 
out transaction groups in the usual  5 seconds.  'zfs snapshot' and 
'zfs clone' take at least 1 transaction group to complete, so this could 
explain it.  We've seen this problem as well and are working on a fix...


--mat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS Storage Pool advice

2006-12-12 Thread Kory Wheatley
We're looking for pure performance.

What will be contained in the LUNs is Student User account files that they will 
access and Department Share files like MS Word documents, Excel files, and PDFs.  
There will be no applications on the ZFS storage pools.  Does this help on what 
strategy might be best?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS Storage Pool advice

2006-12-12 Thread Kory Wheatley
Also there will be no NFS services on this system.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Monitoring ZFS

2006-12-12 Thread Tom Duell
Group,

We are running a benchmark with 4000 users
simulating a hospital management system
running on Solaris 10 6/06 on USIV+ based
SunFire 6900 with 6540 storage array.

Are there any tools for measuring internal
ZFS activity to help us understand what is going
on during slowdowns?

We have 192GB of RAM and while ZFS runs
well most of the time, there are times where
the system time jumps up to 25-40%
as measured by vmstat and iostat.  These
times coincide with slowdowns in file access
as measured by a side program that simply
reads a random block in a file... these response
times can exceed 1 second or longer.

Any pointers greatly appreciated!

Tom


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Uber block corruption?

2006-12-12 Thread Darren Dunham
 Hello Toby,
 
 Tuesday, December 12, 2006, 4:18:54 PM, you wrote:
 TT On 12-Dec-06, at 9:46 AM, George Wilson wrote:
 
  Also note that the UB is written to every vdev (4 per disk) so the  
  chances of all UBs being corrupted is rather low.
 
 It depends actually - if all your vdevs are on the same array with
 write back cache set to on, you actually can end up with all UBs
 corrupted - at least in theory.

Do such caches respond to explicit flushes?  My understanding is that it
should try to flush between writing the front 2 and the back 2.

Not that even that would guarantee anything if there are real bugs in
the cache code, but it would improve the odds.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Monitoring ZFS

2006-12-12 Thread Neil Perrin



Tom Duell wrote On 12/12/06 17:11,:

Group,

We are running a benchmark with 4000 users
simulating a hospital management system
running on Solaris 10 6/06 on USIV+ based
SunFire 6900 with 6540 storage array.

Are there any tools for measuring internal
ZFS activity to help us understand what is going
on during slowdowns?


dtrace can be used in numerous ways to examine
every part of ZFS and Solaris. lockstat(1M) (which actually
uses dtrace underneath) can also be used to see the cpu activity
(try lockstat -kgIW -D 20 sleep 10).

You can also use iostat (eg iostat -xnpcz) to look at disk activity.
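
For example (a rough sketch, assuming the fbt provider can see the zfs module's
spa_sync() on your build), this will show how long each transaction group sync
takes, so you can see whether the slowdowns line up with txg commits:

# dtrace -n 'fbt::spa_sync:entry { self->ts = timestamp; }
    fbt::spa_sync:return /self->ts/ {
        @["spa_sync time (ms)"] = quantize((timestamp - self->ts) / 1000000);
        self->ts = 0; }'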



We have 192GB of RAM and while ZFS runs
well most of the time, there are times where
the system time jumps up to 25-40%
as measured by vmstat and iostat.  These
times coincide with slowdowns in file access
as measured by a side program that simply
reads a random block in a file... these response
times can exceed 1 second or longer.


ZFS commits transaction groups every 5 seconds.
I suspect this flurry of activity is due to that.
Committing can indeed take longer than a second.

You might be able to show this by changing it with:

# echo txg_time/W 10 | mdb -kw

then the activity should be longer but less frequent.
I don't however recommend you keep it at that value.
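
If you do try it, a sketch of checking the current value first and restoring it
afterwards (note that mdb prints and, by default, parses numbers in hex, so stick
to small values or use the 0t prefix for decimal):

# echo txg_time/D | mdb -k
# echo txg_time/W 5 | mdb -kw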




Any pointers greatly appreciated!

Tom





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS behavior under heavy load (I/O that is)

2006-12-12 Thread Anantha N. Srirama
I'm observing the following behavior on our E2900 (24 x 92 config), 2 FCs, and 
... I have a large filesystem (~758GB) with compression on. When this 
filesystem is under heavy load (150MB/s) I have problems saving files in 'vi'. I 
posted here about it and recall that the issue is addressed in Sol10U3. This 
morning I observed another variation of this problem as follows:

- Create a file in 'vi' and save it, session will hang as if it is waiting for 
the write to complete.
- In another session you'll observe the write from 'vi' is indeed complete as 
evidenced by the contents of the file.

Am I repeating myself here, or is it a different problem altogether?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Monitoring ZFS

2006-12-12 Thread Tom Duell
Thanks, Neil, for the assistance.

Tom
Neil Perrin wrote On 12/12/06 19:59,:

Tom Duell wrote On 12/12/06 17:11,:
  

Group,

We are running a benchmark with 4000 users
simulating a hospital management system
running on Solaris 10 6/06 on USIV+ based
SunFire 6900 with 6540 storage array.

Are there any tools for measuring internal
ZFS activity to help us understand what is going
on during slowdowns?



dtrace can be used in numerous ways to examine
every part of ZFS and Solaris. lockstat(1M) (which actually
uses dtrace underneath) can also be used to see the cpu activity
(try lockstat -kgIW -D 20 sleep 10).

You can also use iostat (eg iostat -xnpcz) to look at disk activity.
  

Yes, we are doing this and the disks are performing
extremely well.

  

We have 192GB of RAM and while ZFS runs
well most of the time, there are times where
the system time jumps up to 25-40%
as measured by vmstat and iostat.  These
times coincide with slowdowns in file access
as measured by a side program that simply
reads a random block in a file... these response
times can exceed 1 second or longer.



ZFS commits transaction groups every 5 seconds.
I suspect this flurry of activity is due to that.
Committing can indeed take longer than a second.

You might be able to show this by changing it with:

# echo txg_time/W 10 | mdb -kw

then the activity should be longer but less frequent.
I don't however recommend you keep it at that value.

  

Thanks, we may try that to see what effects it
might have.

  

Any pointers greatly appreciated!

Tom





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS behavior under heavy load (I/O that is)

2006-12-12 Thread Anton B. Rang
I think you may be observing that fsync() is slow.

The file will be written, and visible to other processes via the in-memory 
cache, before the data has been pushed to disk. vi forces the data out via 
fsync, and that can be quite slow when the file system is under load, 
especially before a fix which allows fsync to work on a per-file basis. (In the 
S10U2 aka 6/06 Solaris release, fsync on ZFS forced all changes to disk, not 
just those of the requested file.)
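
One way to confirm this (a sketch; on Solaris the fsync(3C) call maps to the
fdsync system call, so adjust the probe name if your build differs) is to time
it for the vi process while the load is running:

# dtrace -n 'syscall::fdsync:entry /execname == "vi"/ { self->ts = timestamp; }
    syscall::fdsync:return /self->ts/ {
        printf("fsync took %d ms", (timestamp - self->ts) / 1000000);
        self->ts = 0; }'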
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Uber block corruption?

2006-12-12 Thread Anton B. Rang
 Also note that the UB is written to every vdev (4 per disk) so the 
 chances of all UBs being corrupted is rather low.

The chances that they're corrupted by the storage system, yes.

However, they are all sourced from the same in-memory buffer, so an undetected 
in-memory error (e.g. kernel bug) will be replicated to all vdevs.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS Storage Pool advice

2006-12-12 Thread Anton B. Rang
 We're looking for pure performance.
 
 What will be contained in the LUNs is Student User
 account files that they will access and Department
 Share files like MS Word documents, Excel files,
 and PDFs.  There will be no applications on the ZFS
 storage pools.  Does this help on what
 strategy might be best?

I think so.

I would suggest striping a single pool across all available LUNs, then. (I'm 
presuming that you would be prepared to recover from ZFS-detected errors by 
reloading from backup.) There doesn't seem any compelling reason to split your 
storage into multiple pools, and by using a single pool, you don't have to 
worry about reallocating storage if one pool fills up while another has free 
space.
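
A minimal sketch of that layout, with invented device names and an optional quota
so one group of users can't fill the whole pool:

# zpool create alldata c3t0d0 c3t1d0 c3t2d0
# zfs create alldata/students
# zfs create alldata/departments
# zfs set quota=300g alldata/students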
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS and write caching (SATA)

2006-12-12 Thread Anton B. Rang
It took manufacturers of SCSI drives some years to get this right. Around 1997 
or so we were still seeing drives at my former employer that didn't properly 
flush their caches under all circumstances (and had other interesting 
behaviours WRT caching).

Lots of ATA disks never did bother to implement the write cache controls.

I haven't talked recently with any vendors who have been sourcing SATA disks, 
so I don't know what they're seeing. Generally the major players have their own 
disk qualification suites and often wind up with custom firmware because they 
want all of their detected bugs fixed before they'll accept a particular disk. 
If you buy a disk off-the-shelf, you get a drive that's gone through the disk 
manufacturer's testing (which is good, don't get me wrong) but hasn't been 
qualified with the particular commands or configuration that a particular 
operating system or file system might send.

If you can do your own tests, that would be best; but that involves executing a 
flush (with all the various combinations of commands outstanding, dirty vs. 
clean cache buffers, etc.) and immediately powering off the device, which 
generally can't be done without special hardware. My *hunch* is that 
enterprise-class SATA disks have probably gone through more of this sort of 
testing than consumer SATA, even at the drive manufacturers. (It's not at all 
the same firmware.)
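
For what it's worth, the expert mode of format(1M) exposes a cache menu on many
drives, which at least lets you see and toggle the volatile write cache before
running such tests -- availability depends on the drive and driver, so treat this
as a sketch:

# format -e
format> cache
cache> write_cache
write_cache> display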
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Kickstart hot spare attachment

2006-12-12 Thread Anton B. Rang
 If the SCSI commands hang forever, then there is nothing that ZFS can
 do, as a single write will never return.  The more likely case is that
 the commands are continually timing out with very long response times,
 and ZFS will continue to talk to them forever.

It looks like the sd driver defaults to a 60-second timeout, which is
quite long. It might be useful if FMA saw a potential fault for any I/O
longer than some much lower value.

(This gets tricky with power management, since if you have to wait for
the disk to spin up, it can take a long time compared to normal I/O.)

That said, it sounds to me like your enclosure is actually powering down
the drive. If so, it ought to stop responding to selection, and I/O should
fail in a hard way within 250 ms (or less, depending on whether you've
got a SCSI bus which supports QAS, as the newer, faster versions do).
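
If the 60-second default is the problem, it can be lowered via /etc/system -- a
sketch only, with the usual caveat about devices that legitimately take a long
time (spin-up, array failover); FC-attached disks use the ssd driver's equivalent
ssd_io_time:

set sd:sd_io_time = 20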
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on a damaged disk

2006-12-12 Thread Patrick P Korsnick
I have a machine with a disk that has some sort of defect, and I've found that 
if I partition only half of the disk the machine will still work.  I tried to 
use 'format' to scan the disk and find the bad blocks, but it didn't work.

Since I don't know where the bad blocks are but I'd still like to use some of 
the rest of the disk, I thought ZFS might be able to help.  I partitioned the 
disk so slices 4, 5, 6 and 7 are each 5GB.  I thought I'd make one or multiple 
zpools on those slices and then be able to narrow down where the bad sections 
are.

So my question is: can I declare a zpool that spans multiple c0d0sXX slices but 
isn't a mirror, and if I can, will ZFS be able to detect where the problem 
c0d0sXX is and not use it?  If not, I'll have to make 4 different zpools and 
experiment with storing stuff on each to find the approximate location of the 
bad blocks.
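
A sketch of the four-pool experiment (slice numbers from above; pool names are
invented, and add -f to zpool create if the slices held an old filesystem):

# zpool create badtest4 c0d0s4
# zpool create badtest5 c0d0s5
# zpool create badtest6 c0d0s6
# zpool create badtest7 c0d0s7

After copying some data into each, 'zpool scrub badtestN' followed by
'zpool status -v badtestN' will report checksum or I/O errors per device, which
should point at the slice sitting on the bad region. Note that a plain
non-mirrored pool can detect such errors but cannot repair them or route around
the bad slice.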
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss