[zfs-discuss] Why RAID 5 stops working in 2009

2008-07-03 Thread Jim
Has anyone here read the article "Why RAID 5 stops working in 2009" at 
http://blogs.zdnet.com/storage/?p=162

Does RAIDZ have the same chance of an unrecoverable read error as RAID 5 on Linux 
if the array has to be rebuilt because of a faulty disk?  I imagine so, because 
of the physical constraints that plague our hard drives.  Granted, the chance of 
failure in my case shouldn't be nearly as high, as I will most likely use 
three or four 750GB drives - not something on the order of 10TB.

With my OpenSolaris NAS, I will be scrubbing every week (consumer-grade 
drives; every month for enterprise-grade) as recommended in the ZFS Best 
Practices Guide.  If I run zpool status and see that scrubs are fixing more 
and more errors, would that mean the disk is in fact headed towards failure, 
or could the natural growth of disk usage be to blame?
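
For reference, a minimal sketch of how such a schedule can be driven (the pool 
name "tank" is a placeholder, not taken from this post):

# root crontab entry: kick off a scrub every Sunday at 02:00
0 2 * * 0 /usr/sbin/zpool scrub tank
# afterwards, check whether anything was repaired
/usr/sbin/zpool status -v tank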
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Cannot replace a replacing device

2010-03-28 Thread Jim
I had a drive fail and replaced it with a new drive. During the resilvering 
process the new drive had write faults and was taken offline. These faults were 
caused by a broken SATA cable (the drive checks out fine with the manufacturer's 
software). A new cable fixed the failure. However, the drive now shows as 
faulted.

I know the drive is healthy, so I want to force a rescrub. However, this won't 
happen while it is showing FAULTED. I tried to force a replace, but this gives 
the error "cannot replace a replacing device". So I seem to be stuck in a state 
where the replace won't complete. Please help - screen output below.


C3P0# zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
tank   DEGRADED 0 0 0
  raidz1   DEGRADED 0 0 0
ad4ONLINE   0 0 0
ad6ONLINE   0 0 0
replacing  UNAVAIL  0 1.06K 0  insufficient replicas
  1796873336336467178  UNAVAIL  0 1.23K 0  was /dev/ad7/old
  4407623704004485413  FAULTED  0 1.22K 0  was /dev/ad7

errors: No known data errors
C3P0# zpool replace -f tank 4407623704004485413 ad7
cannot replace 4407623704004485413 with ad7: cannot replace a replacing device
C3P0#
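
For context, the commands usually tried at this point (device names taken from 
the capture above; these lines are not from the original post, and the follow-ups 
in this thread show that at least the detach was refused) are clearing or 
onlining the faulted disk, or detaching the stale half of the replacement:

# zpool clear tank ad7
# zpool online tank ad7
# zpool detach tank 1796873336336467178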
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot replace a replacing device

2010-03-28 Thread Jim
Yes - but it does nothing. The drive remains FAULTED.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot replace a replacing device

2010-03-29 Thread Jim
Thanks for the suggestion, but I have tried detaching and it refuses, reporting 
"no valid replicas". Capture below.

C3P0# zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
tank   DEGRADED 0 0 0
  raidz1   DEGRADED 0 0 0
ad4ONLINE   0 0 0
ad6ONLINE   0 0 0
replacing  UNAVAIL  0 9.77K 0  insufficient replicas
  1796873336336467178  UNAVAIL  0 11.6K 0  was /dev/ad7/old
  4407623704004485413  FAULTED  0 10.4K 0  was /dev/ad7

errors: No known data errors
C3P0# zpool detach tank 1796873336336467178
cannot detach 1796873336336467178: no valid replicas
C3P0# zpool detach tank 4407623704004485413
cannot detach 4407623704004485413: no valid replicas
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot replace a replacing device

2010-03-30 Thread Jim
Thanks - I have run it and it returns pretty quickly. Given the output (attached 
below), what action can I take?

Thanks

James
-- 
This message posted from opensolaris.org

Dirty time logs:

tank
outage [300718,301073] length 356
outage [301138,301139] length 2
outage [301149,301149] length 1
outage [301151,301153] length 3
outage [301155,301155] length 1
outage [301157,301158] length 2
outage [301182,301182] length 1
outage [301262,301262] length 1
outage [301911,301916] length 6
outage [304063,304063] length 1
outage [304791,304796] length 6

raidz
outage [300718,301073] length 356
outage [301138,301139] length 2
outage [301149,301149] length 1
outage [301151,301153] length 3
outage [301155,301155] length 1
outage [301157,301158] length 2
outage [301182,301182] length 1
outage [301262,301262] length 1
outage [301911,301916] length 6
outage [304063,304063] length 1
outage [304791,304796] length 6

/dev/ad4

/dev/ad6

replacing
outage [300718,301073] length 356
outage [301138,301139] length 2
outage [301149,301149] length 1
outage [301151,301153] length 3
outage [301155,301155] length 1
outage [301157,301158] length 2
outage [301182,301182] length 1
outage [301262,301262] length 1
outage [301911,301916] length 6
outage [304063,304063] length 1
outage [304791,304796] length 6

/dev/ad7/old
outage [300718,301073] length 356
outage [301138,301139] length 2
outage [301149,301149] length 1
outage [301151,301153] length 3
outage [301155,301155] length 1
outage [301157,301158] length 2
outage [301182,301182] length 1
outage [301262,301262] length 1
outage [301911,301916] length 6
outage [304063,304063] length 1
outage [304791,304796] length 6

/dev/ad7
outage [300718,301073] length 356
outage [301138,301139] length 2
outage [301149,301149] length 1
outage [301151,301153] length 3
outage [301155,301155] length 1
outage [301157,301158] length 2
outage [301182,301182] length 1
outage [301262,301262] length 1
outage [301911,301916] length 6
outage [304063,304063] length 1
outage [304791,304796] length 6


Metaslabs:


vdev 0 0   26   20.0M

offset  spacemap   free
------  --------   ----

 4   52   166M
 8   56   2.66G
 c   65   12.4M
10   66   20.7M
14   69   29.1M
18   73   29.7M
1c   77   29.6M
20   81   79.2M
24   91   87.9M
28   92   63.2M
2c   94   94.2M
30   99   123M
34  103   523M
38  107   50.9M
3c  111   117M
40  116   54.3M
44  119   60.2M
48  123   97.4M
4c  126   1.20G
50  129   48.5M
54  132   106M
58  137   27.4M
5c  140   39.6M
60  146   45.3M
64  149   34.9M
68  151   544M
6c  154   36.6M
70  156   19.4M
74  160   35.7M
78  162   41.2M
7c  166   23.1M
9c   74   14.1M
a0   78   15.2M
a4   88   28.1M
a8  174   23.3M
ac  178   24.2M
b0  181   26.3M
b4  100   43.4M
b8  104   33.6M
bc  108   30.6M
c0  113   59.8M
c4  115   53.9M
c8  120   30.8M
cc  124   82.2M
d0  127   36.9M
d4  130   76.2M
d8  133   39.7M

Re: [zfs-discuss] howto: make a pool with ashift=X

2011-05-23 Thread Jim Klimov
Well, for the sake of completeness (and perhaps to enable users of snv_151a) 
there should also be links to alternative methods:
1) Using a zpool binary recompiled from patched sources, or an already 
precompiled one, i.e.
http://www.solarismen.de/archives/4-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-1.html
http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html
http://www.solarismen.de/archives/6-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-3.html
http://www.solarismen.de/archives/9-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-4.html
http://www.kuehnke.de/christian/solaris/zpool-s10u8
 
2) Making a pool in an alternate OS, such as a FreeBSD LiveCD with its tricks, 
and then importing/upgrading it in Solaris.
See www.zfsguru.org and numerous posts on the internet by its author sub_mesa 
(or sub.mesa).
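
For completeness, a commonly cited sketch of the FreeBSD route, using gnop to 
present a 4K sector size (device and pool names illustrative; pool version 
compatibility with the target Solaris release still has to be checked):

# gnop create -S 4096 ada0
# zpool create tank ada0.nop
# zpool export tank
# gnop destroy ada0.nop
# zpool import tank     # the vdev created on the 4K nop device keeps ashift=12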
 
I am not promoting either of these methods. I've used (1) successfully on my 
OI_148a by taking a precompiled binary, and I didn't get around to trying (2).
Just my 2c :)
//Jim
___
zfs-crypto-discuss mailing list
zfs-crypto-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-crypto-discuss


[zfs-discuss] Re: zfs panic when unpacking open solaris source

2006-05-12 Thread Jim Walker
Looks like CR 6411261 "busy intent log runs out of space on small pools".

I found this one. I just bumped up the priority.

Jim

 When unpacking the solaris source onto a local disk
 on a system running build 39 I got the following
 panic:
 
 
 panic[cpu0]/thread=d2c8ade0:
 really out of space
 
 
 d2c8a7b4 zfs:zio_write_allocate_gang_members+3e6
 (e4385ac0)
 d2c8a7d0 zfs:zio_dva_allocate+81 (e4385ac0)
 d2c8a7e8 zfs:zio_next_stage+66 (e4385ac0)
 d2c8a800 zfs:zio_checksum_generate+5e (e4385ac0)
 d2c8a81c zfs:zio_next_stage+66 (e4385ac0)
 d2c8a83c zfs:zio_wait_for_children+46 (e4385ac0, 1,
 e4385c)
 d2c8a850 zfs:zio_wait_children_ready+18 (e4385ac0)
 d2c8a864 zfs:zio_next_stage_async+ac (e4385ac0,
 f8def9d0,)
 d2c8a874 zfs:zio_nowait+e (e4385ac0)
 d2c8a8d4 zfs:zio_write_allocate_gang_members+341
 (e120e0c0)
 d2c8a8f0 zfs:zio_dva_allocate+81 (e120e0c0)
 d2c8a908 zfs:zio_next_stage+66 (e120e0c0)
 d2c8a920 zfs:zio_checksum_generate+5e (e120e0c0)
 d2c8a93c zfs:zio_next_stage+66 (e120e0c0)
 d2c8a95c zfs:zio_wait_for_children+46 (e120e0c0, 1,
 e120e2)
 d2c8a970 zfs:zio_wait_children_ready+18 (e120e0c0)
 d2c8a984 zfs:zio_next_stage_async+ac (e120e0c0,
 f8def9d0,)
 d2c8a994 zfs:zio_nowait+e (e120e0c0)
 d2c8a9f4 zfs:zio_write_allocate_gang_members+341
 (e3c0a580)
 d2c8aa10 zfs:zio_dva_allocate+81 (e3c0a580)
 d2c8aa28 zfs:zio_next_stage+66 (e3c0a580)
 d2c8aa40 zfs:zio_checksum_generate+5e (e3c0a580)
 d2c8aa54 zfs:zio_next_stage+66 (e3c0a580)
 d2c8aaa0 zfs:zio_write_compress+236 (e3c0a580)
 d2c8aabc zfs:zio_next_stage+66 (e3c0a580)
 d2c8aadc zfs:zio_wait_for_children+46 (e3c0a580, 1,
 e3c0a7)
 d2c8aaf0 zfs:zio_wait_children_ready+18 (e3c0a580)
 d2c8ab04 zfs:zio_next_stage_async+ac (e3c0a580, 0,
 f8dbfe)
 d2c8ab1c zfs:zio_nowait+e (e3c0a580)
 d2c8ab3c zfs:arc_write+7b (e44c9780, d895e8c0,)
 d2c8abec zfs:dbuf_sync+5f3 (dbd6ef00, e44c9780,)
 d2c8ac4c zfs:dnode_sync+33a (d34fbb30, 1, e44c97)
 d2c8ac80 zfs:dmu_objset_sync_dnodes+7e (d2380240,
 d23802fc,)
 d2c8acd0 zfs:dmu_objset_sync+5d (d2380240, e96f1e80)
 d2c8ad1c zfs:dsl_pool_sync+121 (d244a180, 15e234, 0)
 d2c8ad6c zfs:spa_sync+10a (d895e8c0, 15e234, 0)
 d2c8adc8 zfs:txg_sync_thread+1df (d244a180, 0)
 d2c8add8 unix:thread_start+8 ()
 
 I now have a chicken and egg problem, need to unpack
 the source to work out what is going on but can't as
 the system crashes unless I put it on my external USB
 drive but there are some issues with that!
 
 Is this a known issue?
 
 Some more data on the file systems:
 
 : sigma IA 4 $; zfs list -r home/cjg
 NAME   USED  AVAIL  REFER
  MOUNTPOINT
 home/cjg  7.81G   138M  1.99G
  /export/home/cjg
 home/[EMAIL PROTECTED] 1.91M  -  1.97G  -
 home/[EMAIL PROTECTED]:53:46  2.38M  -  1.97G  -
 home/[EMAIL PROTECTED] 433K  -  1.97G  -
 home/[EMAIL PROTECTED] 492K  -  1.97G  -
 home/[EMAIL PROTECTED] 409K  -  1.97G  -
 home/[EMAIL PROTECTED] 474K  -  1.97G  -
 home/[EMAIL PROTECTED] 314K  -  1.97G  -
 home/[EMAIL PROTECTED] 314K  -  1.97G  -
 home/[EMAIL PROTECTED]0  -  1.97G  -
 home/[EMAIL PROTECTED]0  -  1.97G  -
 home/[EMAIL PROTECTED]0  -  1.97G  -
 home/[EMAIL PROTECTED] 253K  -  1.97G  -
 home/[EMAIL PROTECTED] 342K  -  1.97G  -
 home/[EMAIL PROTECTED] 624K  -  1.98G  -
 home/[EMAIL PROTECTED] 429K  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED] 146K  -  1.98G  -
 home/[EMAIL PROTECTED] 282K  -  1.98G  -
 home/[EMAIL PROTECTED] 218K  -  1.98G  -
 home/[EMAIL PROTECTED] 300K  -  1.98G  -
 home/[EMAIL PROTECTED] 232K  -  1.98G  -
 home/[EMAIL PROTECTED] 458K  -  1.98G  -
 home/[EMAIL PROTECTED] 462K  -  1.98G  -
 home/[EMAIL PROTECTED] 576K  -  1.98G  -
 home/[EMAIL PROTECTED] 147K  -  1.98G  -
 home/[EMAIL PROTECTED] 147K  -  1.98G  -
 home/[EMAIL PROTECTED] 448K  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED] 354K  -  1.98G  -
 home/[EMAIL PROTECTED] 258K  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED]0  -  1.98G  -
 home/[EMAIL PROTECTED] 522K  -  1.98G  -
 home/[EMAIL PROTECTED] 615K  -  1.98G  -
 home/[EMAIL PROTECTED] 766K  -  1.98G  -
 home/[EMAIL PROTECTED] 625K  -  1.98G  -
 home/[EMAIL PROTECTED] 565K  -  1.98G  -
 home/[EMAIL PROTECTED] 470K  -  1.98G  -
 home/[EMAIL PROTECTED] 495K  -  1.98G  -
 home/[EMAIL PROTECTED] 305K  -  1.98G  -
 home/[EMAIL PROTECTED] 314K  -  1.98G  -
 home

[zfs-discuss] Re: RE: [Security-discuss] Proposal for new basic privileges related with

2006-06-21 Thread Jim Walker
 I am also interested in writing some test cases that
 will check the correct semantic of access checks on
 files with different permissions and with different
 privileges set/unset by the process. Are there
 already file access test cases at Sun I may expand?
 Should test suites for OpenSolaris be written in a
 special kind of programming language?

We do extensive file access testing as part of the zfs test suite.
The test suite is mostly written in ksh scripts with some C code. 
We should have the test suite available externally via 
OpenSolaris.org sometime in July or August. In the meantime
I would code up your unit tests in ksh so they can be more
easily integrated. We'll keep you posted as progress is made in
releasing the test suite.
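
As a rough illustration of the kind of ksh unit test meant here (not taken from 
the actual suite; the user name and path are made up):

#!/bin/ksh
# verify that a mode-600 file is unreadable by a non-owner (run as root)
FILE=/export/ztest/secret.$$
print "data" > $FILE
chmod 600 $FILE
if su testuser -c "cat $FILE" > /dev/null 2>&1; then
        print "FAIL: testuser could read $FILE"
else
        print "PASS: access correctly denied"
fi
rm -f $FILE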

Cheers,
Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Let's get cooking...

2006-06-21 Thread Jim Mauro


http://www.tech-recipes.com/solaris_system_administration_tips1446.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS components for a minimal Solaris 10 U2 install?

2006-06-28 Thread Jim Connors
For an embedded application, I'm looking at creating a minimal Solaris 
10 U2 image which would include ZFS functionality.  In quickly taking a 
look at the opensolaris.org site under pkgdefs, I see three packages 
that appear to be related to ZFS: SUNWzfskr, SUNWzfsr, and SUNWzfsu.  Is 
it naive to think that this would be all that is needed for ZFS?


Thanks,
-- Jim C
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Big JBOD: what would you do?

2006-07-17 Thread Jim Mauro
I agree with Greg - For ZFS, I'd recommend a larger number of raidz 
luns, with a smaller number of disks per LUN, up to 6 disks per raidz lun.

This will more closely align with performance best practices, so it 
would be cool to find common ground in terms of a sweet-spot for 
performance and RAS.
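
To make that concrete, a hedged sketch of the sort of layout being suggested 
(controller/target names are purely illustrative; only two 6-disk raidz2 LUNs 
plus spares are written out here rather than the full 46-disk set):

# zpool create tank \
      raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      spare  c2t0d0 c2t1d0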

/jim


Gregory Shaw wrote:
To maximize the throughput, I'd go with 8 5-disk raid-z{2} luns.  
 Using that configuration, a full-width stripe write should be a 
single operation for each controller.


In production, the application needs would probably dictate the 
resulting disk layout.  If the application doesn't need tons of i/o, 
you could bind more disks together for larger luns...


On Jul 17, 2006, at 3:30 PM, Richard Elling wrote:


ZFS fans,
I'm preparing some analyses on RAS for large JBOD systems such as
the Sun Fire X4500 (aka Thumper).  Since there are zillions of possible
permutations, I need to limit the analyses to some common or desirable
scenarios.  Naturally, I'd like your opinions.  I've already got a few
scenarios in analysis, and I don't want to spoil the brain storming, so
feel free to think outside of the box.

If you had 46 disks to deploy, what combinations would you use?  Why?

Examples,
46-way RAID-0  (I'll do this just to show why you shouldn't do this)
22x2-way RAID-1+0 + 2 hot spares
15x3-way RAID-Z2+0 + 1 hot spare
...

Because some people get all wrapped up with the controllers, assume 5
8-disk SATA controllers plus 1 6-disk controller.  Note: the 
reliability of

the controllers is much greater than the reliability of the disks, so
the data availability and MTTDL analysis will be dominated by the disks
themselves.  In part, this is due to using SATA/SAS (point-to-point disk
connections) rather than a parallel bus or FC-AL where we would also have
to worry about bus or loop common cause failures.

I will be concentrating on data availability and MTTDL as two views 
of RAS.

The intention is that the interesting combinations will also be analyzed
for performance and we can complete a full performability analysis on 
them.

Thanks
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-2773
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382  [EMAIL PROTECTED] 
mailto:[EMAIL PROTECTED] (work)
Louisville, CO 80028-4382[EMAIL PROTECTED] 
mailto:[EMAIL PROTECTED] (home)
When Microsoft writes an application for Linux, I've Won. - Linus 
Torvalds






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sucking down my memory!?

2006-07-21 Thread Jim Mauro


I need to read through this more thoroughly to get my head around it, but
on my first pass, what jumps out at me is that something significant
_changed_ in terms of application behavior with the introduction of ZFS.

I'm not saying that is a bad thing, or a good thing, but it is an 
important thing, and we should try to understand whether application 
behavior will, in general, change with the introduction of ZFS, so we 
can advise users accordingly.

Joe appears to have been a user of Sun systems for some time, with a lot 
of experience deploying Solaris 8 and Solaris 9. He has successfully 
deployed systems without physical swap, and I understand his reason for 
doing so. If the introduction of Solaris 10 and ZFS means we need to 
change a system parameter, such as configured swap, when transitioning 
from S8 or S9, we need to understand why, and make sure we understand 
the performance implications.


Why do you think your performance *improves* if you don't use
swap?  It is much more likely it *deteriorates* because your swap
accumulates stuff you do not use.
  

I'm not sure what this is saying, but I don't think it came out right.

As I said, I need to do another pass on the information in the messages 
to get a better handle on the observed behaviour, but this certainly 
seems like something we should explore further.

Watch this space.

/jim

  
At any rate, I don't think adding swap will fix the problem I am seeing 
in that ZFS is not releasing its unused cache when applications need it. 
Adding swap might allow the kernel to move it out of memory but when the 
system needs it again it will have to swap it back in, and only 
performance suffers, no?



Well, you have decided that all application data needs to be memory
resident all of the time; but executables don't need to be (they
are now tossed out on memory shortage) and that ZFS can use less cache
than it wants to.

  
FWIW, here's the current ::memstat and swap output for my system. The 
reserved number is only about 46M or about 2% of RAM. Considering the 
box has 3G, I'm willing to sacrifice 2% in the interest of performance.


Page Summary            Pages      MB   %Tot
------------         --------  ------  -----
Kernel                 249927    1952    64%
Anon                    34719     271     9%
Exec and libs            2415      18     1%
Page cache               1676      13     0%
Free (cachelist)        11796      92     3%
Free (freelist)         88288     689    23%

Total                  388821    3037
Physical               382802    2990

[EMAIL PROTECTED]: swap -s
total: 260008k bytes allocated + 47256k reserved = 307264k used, 381072k 
available



So there's 47MB of memory which is not used at all.  (Adding swap will
give you 47MB of additional free memory without anything being written
to disk).  Execs are also pushed out on shortfall.

There is 265 MB of anon memory and we have no clue how much of it
is used at all; a large percentage is likely unused.

But OTOH, you have sufficient memory on the freelist so there is not
much of an issue.

Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS components for a minimal Solaris 10 U2 install?

2006-07-25 Thread Jim Connors


Included below is a thread which dealt with trying to find the 
packages necessary for a minimal Solaris 10 U2 install with ZFS 
functionality.  In addition to SUNWzfskr, SUNWzfsr and SUNWzfsu, the 
SUNWsmapi package needs to be installed.  The libdiskmgt.so.1 library is 
required for the zpool(1M) command.  I found this out via trial and 
error; there is no dependency on SUNWsmapi mentioned in the SUNWzfsr 
depend file.
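
For anyone wanting to check this on their own install, the declared 
dependencies live in the per-package depend files (paths as on Solaris 10; 
per the above, SUNWsmapi is absent from SUNWzfsr's file):

# grep SUNWsmapi /var/sadm/pkg/SUNWzfsr/install/depend
# pkginfo SUNWsmapi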


Apologies if this is nitpicking, but is this missing dependency worthy 
of submitting a P5 CR?


-- Jim C


Jason Schroeder wrote:

Dale Ghent wrote:


On Jun 28, 2006, at 4:27 PM, Jim Connors wrote:

For an embedded application, I'm looking at creating a minimal  
Solaris 10 U2 image which would include ZFS functionality.  In  
quickly taking a look at the opensolaris.org site under pkgdefs, I  
see three packages that appear to be related to ZFS: SUNWzfskr,  
SUNWzfsr, and SUNWzfsu.  Is it naive to think that this would be  
all that is needed for ZFS?



Those packages, as well as what's listed in the depend files for  
those packages.


Ahh, don't you love climbing the dependency tree?

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Glenn Brunette wrote a nifty little tool ... you have to assume that all 
of the dependencies are appropriately doc'ed of course *cough*.


http://blogs.sun.com/roller/page/gbrunett?entry=solaris_package_companion

/jason


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS state between reboots for RAM resident OS?

2006-07-25 Thread Jim Connors

Guys,

Thanks for the help so far,  now comes the more interesting questions ...

Piggybacking off of some work being done to minimize Solaris for 
embedded use, I have a version of Solaris 10 U2 with ZFS functionality 
with a disk footprint of about 60MB.   Creating a miniroot based upon 
this image, it can be compressed to under 30MB.  Currently, I load this 
image onto a USB keyring and boot from the USB device running the 
Solaris miniroot out of RAM.  Note: The USB key ring is a hideously slow 
device, but for the sake of this proof of concept it works fine.  In 
addition, some more packages will need to be added later on (i.e. NFS, 
Samba?) which will increase the footprint.


My ultimate goal here would be to demonstrate a network storage 
appliance using ZFS, where the OS is effectively stateless, or as 
stateless as possible.  ZFS goes a long way in assisting here since, for 
example, mount and nfs share information can be managed by ZFS.  But I 
suppose it's not as stateless as I thought.  Upon booting from USB 
device into memory, I can do a `zpool create poo1 c1d0',  but a 
subsequent reboot does not remember this work.  Doing a `zpool list' 
yields 'no pools available'.  So the question is, what sort of state is 
required between reboots for ZFS?


Regards,
-- Jim C
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS state between reboots for RAM resident OS?

2006-07-25 Thread Jim Connors


I understand.  Thanks.

Just curious, ZFS manages NFS shares.  Have you given any thought to 
what might be involved for ZFS to manage SMB shares in the same manner.  
This all goes towards my stateless OS theme.
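
For reference, the NFS mechanism being referred to is just a dataset property, 
so an SMB equivalent would presumably look much the same (dataset name 
illustrative; the sharesmb property that eventually followed this pattern 
arrived with the in-kernel CIFS server in later OpenSolaris builds, not in 
Solaris 10 U2):

# zfs set sharenfs=on export/home
# zfs set sharenfs=rw,anon=0 export/home    # share_nfs options go in the property value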


-- Jim C


Eric Schrock wrote:

You need the following file:

/etc/zfs/zpool.cache

This file 'knows' about all the pools on the system.  These pools can
typically be discovered via 'zpool import', but we can't do this at boot
because:

a. It can be really, really expensive (tasting every disk on the system)
b. Pools can be comprised of files or devices not in /dev/dsk

So, we have the cache file, which must be editable if you want to
remember newly created pools.  Note this only affects configuration
changes to pools - everything else is stored within the pool itself.

- Eric

On Tue, Jul 25, 2006 at 12:18:07PM -0400, Jim Connors wrote:
  

Guys,

Thanks for the help so far,  now comes the more interesting questions ...

Piggybacking off of some work being done to minimize Solaris for 
embedded use, I have a version of Solaris 10 U2 with ZFS functionality 
with a disk footprint of about 60MB.   Creating a miniroot based upon 
this image, it can be compressed to under 30MB.  Currently, I load this 
image onto a USB keyring and boot from the USB device running the 
Solaris miniroot out of RAM.  Note: The USB key ring is a hideously slow 
device, but for the sake of this proof of concept it works fine.  In 
addition, some more packages will need to be added later on (i.e. NFS, 
Samba?) which will increase the footprint.


My ultimate goal here would be to demonstrate a network storage 
appliance using ZFS, where the OS is effectively stateless, or as 
stateless as possible.  ZFS goes a long way in assisting here since, for 
example, mount and nfs share information can be managed by ZFS.  But I 
suppose it's not as stateless as I thought.  Upon booting from USB 
device into memory, I can do a `zpool create poo1 c1d0',  but a 
subsequent reboot does not remember this work.  Doing a `zpool list' 
yields 'no pools available'.  So the question is, what sort of state is 
required between reboots for ZFS?


Regards,
-- Jim C



--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS state between reboots for RAM resident OS?

2006-07-25 Thread Jim Connors

Eric Schrock wrote:

You need the following file:

/etc/zfs/zpool.cache
  


So as a workaround (or more appropriately, a kludge) would it be 
possible to:


1. At boot time do a 'zpool import' of some pool guaranteed to exist.  
For the sake of this discussion call it 'system'


2. Have /etc/zfs/zpool.cache be symbolically linked to /system/ZPOOL.CACHE
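
A minimal sketch of that kludge, assuming the pool really is named 'system' and 
that the symlink is created when the miniroot image is built (these commands are 
illustrative, not tested):

# at miniroot build time:
ln -s /system/ZPOOL.CACHE /etc/zfs/zpool.cache
# in a boot-time script or SMF method, before any other pools are needed:
zpool import system

Whether new pools created later actually land in /system/ZPOOL.CACHE through the 
symlink, and survive the next boot, is exactly the part that would need testing.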

-- Jim C

This file 'knows' about all the pools on the system.  These pools can
typically be discovered via 'zpool import', but we can't do this at boot
because:

a. It can be really, really expensive (tasting every disk on the system)
b. Pools can be comprised of files or devices not in /dev/dsk

So, we have the cache file, which must be editable if you want to
remember newly created pools.  Note this only affects configuration
changes to pools - everything else is stored within the pool itself.

- Eric

On Tue, Jul 25, 2006 at 12:18:07PM -0400, Jim Connors wrote:
  

Guys,

Thanks for the help so far,  now comes the more interesting questions ...

Piggybacking off of some work being done to minimize Solaris for 
embedded use, I have a version of Solaris 10 U2 with ZFS functionality 
with a disk footprint of about 60MB.   Creating a miniroot based upon 
this image, it can be compressed to under 30MB.  Currently, I load this 
image onto a USB keyring and boot from the USB device running the 
Solaris miniroot out of RAM.  Note: The USB key ring is a hideously slow 
device, but for the sake of this proof of concept it works fine.  In 
addition, some more packages will need to be added later on (i.e. NFS, 
Samba?) which will increase the footprint.


My ultimate goal here would be to demonstrate a network storage 
appliance using ZFS, where the OS is effectively stateless, or as 
stateless as possible.  ZFS goes a long way in assisting here since, for 
example, mount and nfs share information can be managed by ZFS.  But I 
suppose it's not as stateless as I thought.  Upon booting from USB 
device into memory, I can do a `zpool create poo1 c1d0',  but a 
subsequent reboot does not remember this work.  Doing a `zpool list' 
yields 'no pools available'.  So the question is, what sort of state is 
required between reboots for ZFS?


Regards,
-- Jim C



--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Assertion raised during zfs share?

2006-08-04 Thread Jim Connors

Eric Schrock wrote:

This indicates that share(1M) didn't produce any output, but returned
a non-zero exit status.  I'm not sure why this would happen - can you
run the following by hand?

# share /export
# echo $?
  


bash-3.00# share
bash-3.00# share /export
bash-3.00# echo $?
0

Looks like the NFS server is not completely configured yet, and that it 
requires this zfs share stuff to work first.


bash-3.00# svcs -a | grep nfs/server
disabled6:24:31 svc:/network/nfs/server:default
bash-3.00# more /var/svc/log/network-nfs-server\:default.log
[ Aug  4 06:15:31 Executing start method (/lib/svc/method/nfs-server 
start) ]
Assertion failed: pclose(fp) == 0, file ../common/libzfs_mount.c, line 
399, function zfs_share

Abort - core dumped
[ Aug  4 06:15:32 Method start exited with status 0 ]
[ Aug  4 06:15:32 Stopping because process dumped core. ]
[ Aug  4 06:15:32 Executing stop method (/lib/svc/method/nfs-server 
stop 30) ][ Aug  4 06:15:32 Method stop exited with status 0 ]
[ Aug  4 06:15:32 Executing start method (/lib/svc/method/nfs-server 
start) ]
Assertion failed: pclose(fp) == 0, file ../common/libzfs_mount.c, line 
399, function zfs_share

Abort - core dumped

-- Jim C


Incidentally, the explicit 'zfs share' isn't needed, as we automatically
share the filesystem when the options are set (which did succeed).

- Eric

On Fri, Aug 04, 2006 at 12:42:02PM -0400, Jim Connors wrote:
  
Working to get ZFS to run on a minimal Solaris 10 U2 configuration.  In 
this scenario, ZFS is included in the miniroot, which is booted into RAM.  
When trying to share one of the filesystems, an assertion is raised - 
see below.   If the version of  source on OpenSolaris.org  matches 
Solaris 10 U2, then it looks like it's associated with a popen of 
/usr/sbin/share.  Can anyone shed any light on this?


Thanks,
-- Jim C


# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
SYS 83K   163M  30.5K  /SYS
export 110K  72.8G  25.5K  /export
export/home   24.5K  72.8G  24.5K  /export/home
# zpool list
NAME     SIZE   USED   AVAIL    CAP  HEALTH  ALTROOT
SYS      195M    90K    195M     0%  ONLINE  -
export    74G   114K   74.0G     0%  ONLINE  -
# zfs set sharenfs=on export
# zfs share export
Assertion failed: pclose(fp) == 0, file ../common/libzfs_mount.c, line 
399, function zfs_share

Abort - core dumped

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Assertion raised during zfs share?

2006-08-04 Thread Jim Connors

Richard Elling wrote:

Jim Connors wrote:


Working to get  ZFS to run on a minimal Solaris 10 U2 configuration. 


What does minimal mean?  Most likely, you are missing something.
  -- richard
Yeah.  Looking at package and SMF dependencies plus a whole lot of 
trial and error, I've currently got Solaris down to 47 packages.  The 
nfs/server service for Solaris 10 U2 will first try to do a zfs share.  
For the next step, I'll probably comment out that stuff and see if I can 
bring up the nfs server code and share a UFS filesystem using the 
traditional methods.  Once that's OK I'll move on to the ZFS portion and 
investigate.


Thanks,
-- Jim C
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320

2006-09-08 Thread Jim Sloey
 Roch - PAE wrote:
 The hard part is getting a set of simple requirements. As you go into 
 more complex data center environments you get hit with older Solaris 
 revs, other OSs, SOX compliance issues, etc. etc. etc. The world where 
 most of us seem to be playing with ZFS is on the lower end of the 
 complexity scale. 
I've been watching this thread and unfortunately fit this model. I'd hoped that 
ZFS might scale enough to solve my problem but you seem to be saying that it's 
mostly untested in large scale environments.
About 7 years ago we ran out of inodes on our UFS file systems. We used bFile 
as middleware for a while to distribute the files across multiple disks and 
then switched to VFS on SAN about 5 years ago. Distribution across file systems 
and inode depletion continued to be a problem, so we switched middleware to 
another vendor that essentially compresses about 200 files into a single 10MB 
archive and uses a DB to find the file within the archive on the correct disk. 
It was an expensive, complex and slow but effective solution, until the latest 
license renewal hit us with a huge bill. 
I'd love to go back to a pure file system model and have looked at Reiser4, JFS, 
NTFS and now ZFS for a way to support over 100 million small documents and 
16TB. We average 2 file reads and 1 file write per second 24/7, with expected 
growth to 24TB. I'd be willing to scrap everything we have to find a 
non-proprietary long-term solution.
ZFS looked like it might provide an answer. Are you saying it's not really 
suitable for this type of application?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: zfs hot spare not automatically getting used

2006-11-28 Thread Jim Hranicky
So is there a command to make the spare get used, or do I have to 
remove it as a spare and add it back as a regular device if it doesn't 
get used automatically?

Is this a bug to be fixed, or will this always be the case when
the disks aren't exactly the same size?
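
For what it's worth, the manual way to press a configured spare into service is 
an explicit replace of the failed device with the spare (pool and device names 
below are hypothetical, not from this thread):

# zpool replace tank c1t3d0 c1t9d0    # c1t3d0 = failed disk, c1t9d0 = the hot spare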
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: zfs hot spare not automatically getting used

2006-11-29 Thread Jim Hranicky
I know this isn't necessarily ZFS specific, but after I reboot I spin the 
drives back up, and nothing I do (devfsadm, disks, etc.) can get them seen 
again until the next reboot.

I've got some older scsi drives in an old Andataco Gigaraid enclosure which
I thought supported hot-swap, but I seem unable to hot swap them in. The PC
has an adaptec 39160 card in it and I'm running Nevada b51. Is this not a 
setup that can support hot swap? Or is there something I have to do other
than devfsadm to get the scsi bus rescanned?
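
For completeness, the steps usually tried for re-scanning a parallel SCSI bus 
(beyond a bare devfsadm); whether they help at all depends on the HBA, driver 
and enclosure actually supporting hot swap:

# cfgadm -al                  # list attachment points and their state
# cfgadm -c configure c3      # re-configure the controller (c3 is illustrative)
# devfsadm -Cv                # clean up stale /dev links and create new ones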
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Managed to corrupt my pool

2006-11-30 Thread Jim Hranicky
Platform:

  - old dell workstation with an Andataco gigaraid enclosure 
plugged into an Adaptec 39160
  - Nevada b51

Current zpool config:

   - one two-disk mirror with two hot spares

In my ferocious pounding of ZFS I've managed to corrupt my data
pool. This is what I've been doing to test it:

   - set zil_disable to 1 in /etc/system (the exact form of this setting
     is sketched just after this list)
   - continually untar a couple of files into the filesystem
   - manually spin down a drive in the mirror by holding down
     the button on the enclosure
   - for any system hangs, reboot with a nasty

      reboot -dnq

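For reference, the zil_disable setting used above normally takes one of these 
forms (as the tunable existed in Nevada builds of that era; illustrative, not a 
recommendation):

   - in /etc/system, followed by a reboot:
     set zfs:zil_disable = 1
   - or on a live system:
     echo "zil_disable/W 1" | mdb -kw
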
I've gotten different results after the spindown:

   - works properly: short or no hang, hot spare successfully 
  added to the mirror
   - system hangs, and after a reboot the spare is not added
   - tar hangs, but after running zpool status the hot
  spare is added properly and tar continues
   - tar continues, but hangs on zpool status

The last is what happened just prior to the corruption. Here's the output
of zpool status:

nextest-01# zpool status -v
  pool: zmir
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 1 errors on Thu Nov 30 11:37:21 2006
config:

NAMESTATE READ WRITE CKSUM
zmirDEGRADED 8 0 4
  mirrorDEGRADED 8 0 4
c3t3d0  ONLINE   0 024
c3t4d0  UNAVAIL  0 0 0  cannot open
spares
  c0t0d0AVAIL
  c3t1d0AVAIL

errors: The following persistent errors have been detected:

  DATASET  OBJECT  RANGE
  15   0   lvl=4294967295 blkid=0

So the questions are:

  - is this fixable? I don't see an inum I could run find on to remove, 
and I can't even do a zfs volinit anyway:

nextest-01# zfs volinit
cannot iterate filesystems: I/O error

   - would not enabling zil_disable have prevented this?

   - Should I have been doing a 3-way mirror?

   - Is there a more optimum configuration to help prevent this
  kind of corruption?

Ultimately, I want to build a ZFS server with performance and reliability
comparable to say, a Netapp, but the fact that I appear to have been
able to nuke my pool by simulating a hardware error gives me pause. 

I'd love to know if I'm off-base in my worries.

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Managed to corrupt my pool

2006-12-05 Thread Jim Hranicky
 So the questions are:
 
 - is this fixable? I don't see an inum I could run find on to remove,
   and I can't even do a zfs volinit anyway:

       nextest-01# zfs volinit
       cannot iterate filesystems: I/O error

 - would not enabling zil_disable have prevented this?

 - Should I have been doing a 3-way mirror?

 - Is there a more optimum configuration to help prevent this kind of
   corruption?

Anyone have any thoughts on this? I'd really like to be 
able to build a nice ZFS box for file service but if a 
hardware failure can corrupt a disk pool I'll have to 
try to find another solution, I'm afraid.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Managed to corrupt my pool

2006-12-05 Thread Jim Hranicky
 Anyone have any thoughts on this? I'd really like to
 be able to build a nice ZFS box for file service but if
 a  hardware failure can corrupt a disk pool I'll have to
  try to find another solution, I'm afraid.

Sorry, I worded this poorly -- if the loss of a disk in a mirror
can corrupt the pool it's going to give me pause in implementing
a ZFS solution. 

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Netapp to Solaris/ZFS issues

2006-12-06 Thread Jim Davis
We have two aging Netapp filers and can't afford to buy new Netapp gear, 
so we've been looking with a lot of interest at building NFS fileservers 
running ZFS as a possible future approach.  Two issues have come up in the 
discussion


- Adding new disks to a RAID-Z pool (Netapps handle adding new disks very 
nicely).  Mirroring is an alternative, but when you're on a tight budget 
losing N/2 disk capacity is painful.


- The default scheme of one filesystem per user runs into problems with 
linux NFS clients; on one linux system, with 1300 logins, we already have 
to do symlinks with amd because linux systems can't mount more than about 
255 filesystems at once.  We can of course just have one filesystem 
exported, and make /home/student a subdirectory of that, but then we run 
into problems with quotas -- and on an undergraduate fileserver, quotas 
aren't optional!
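
To spell out the quota side of that trade-off: with one filesystem per user, the 
quota is just a property on each dataset, whereas a single exported filesystem 
has (at the time of this thread) no per-user quota mechanism in ZFS. A hedged 
sketch, names illustrative:

# zfs create tank/home/student1
# zfs set quota=2G tank/home/student1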


Neither of these problems are necessarily showstoppers, but both make the 
transition more difficult.  Any progress that could be made with them 
would help sites like us make the switch sooner.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Jim Mauro


Hey Ben - I need more time to look at this and connect some dots,
but real quick

Some nfsstat data that we could use to potentially correlate to the local
server activity would be interesting. zfs_create() seems to be the
heavy hitter, but a periodic kernel profile (especially if we can catch
a 97% SYS period) would help:

#lockstat -i997 -Ik -s 10 sleep 60

Alternatively:

#dtrace -n 'profile-997hz / arg0 != 0 / { @s[stack()]=count(); }'

It would also be interesting to see what the zfs_create()'s are doing.
Perhaps a quick:

#dtrace -n 'zfs_create:entry { printf("ZFS Create: %s\n", 
stringof(args[0]->v_path)); }'


It would also be interesting to see the network stats. Grab Brendan's 
nicstat and collect some samples.

Your reference to low traffic is in terms of bandwidth, which, as you 
indicate, is really, really low. But the data, at least up to this point, 
suggests the workload is not data/bandwidth intensive, but more attribute 
intensive. Note again that zfs_create() is the heavy ZFS function, along 
with zfs_getattr(). Perhaps it's the attribute-intensive nature of the 
load that is at the root of this.

I can spend more time on this tomorrow (traveling today).

Thanks,
/jim


Ben Rockwood wrote:

I've got a Thumper doing nothing but serving NFS.  Its using B43 with 
zil_disabled.  The system is being consumed in waves, but by what I don't know. 
 Notice vmstat:

 3 0 0 25693580 2586268 0 0  0  0  0  0  0  0  0  0  0  926   91  703  0 25 75
 21 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 13 14 1720   21 1105  0 92  8
 20 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0 17 18 2538   70  834  0 100 0
 25 0 0 25693580 2586268 0 0 0  0  0  0  0  0  0  0  0  745   18  179  0 100 0
 37 0 0 25693552 2586240 0 0 0  0  0  0  0  0  0  7  7 1152   52  313  0 100 0
 16 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0 15 13 1543   52  767  0 100 0
 17 0 0 25693592 2586280 0 0 0  0  0  0  0  0  0  2  2  890   72  192  0 100 0
 27 0 0 25693572 2586260 0 0 0  0  0  0  0  0  0 15 15 3271   19 3103  0 98  2
 0 0 0 25693456 2586144 0 11 0  0  0  0  0  0  0 281 249 34335 242 37289 0 46 54
 0 0 0 25693448 2586136 0 2  0  0  0  0  0  0  0  0  0 2470  103 2900  0 27 73
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1062  105  822  0 26 74
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0 1076   91  857  0 25 75
 0 0 0 25693448 2586136 0 0  0  0  0  0  0  0  0  0  0  917  126  674  0 25 75

These spikes of sys load come in waves like this.  While there are close to a 
hundred systems mounting NFS shares on the Thumper, the amount of traffic is 
really low.  Nothing to justify this.  We're talking less than 10MB/s.

NFS is pathetically slow.  We're using NFSv3 TCP shared via ZFS sharenfs on a 
3Gbps aggregation (3*1Gbps).

I've been slamming my head against this problem for days and can't make 
headway.  I'll post some of my notes below.  Any thoughts or ideas are welcome!

benr.

===

Step 1 was to disable any ZFS features that might consume large amounts of CPU:

# zfs set compression=off joyous
# zfs set atime=off joyous
# zfs set checksum=off joyous

These changes had no effect.

Next was to consider that perhaps NFS was doing name lookups when it shouldn't. Indeed 
dns was specified in /etc/nsswitch.conf which won't work given that no DNS 
servers are accessible from the storage or private networks, but again, no improvement. 
In this process I removed dns from nsswitch.conf, deleted /etc/resolv.conf, and disabled 
the dns/client service in SMF.

Turning back to CPU usage, we can see the activity is all SYStem time and comes 
in waves:

[private:/tmp] root# sar 1 100

SunOS private.thumper1 5.11 snv_43 i86pc    12/07/2006

10:38:05    %usr    %sys    %wio   %idle
10:38:06   0  27   0  73
10:38:07   0  27   0  73
10:38:09   0  27   0  73
10:38:10   1  26   0  73
10:38:11   0  26   0  74
10:38:12   0  26   0  74
10:38:13   0  24   0  76
10:38:14   0   6   0  94
10:38:15   0   7   0  93
10:38:22   0  99   0   1  --
10:38:23   0  94   0   6  --
10:38:24   0  28   0  72
10:38:25   0  27   0  73
10:38:26   0  27   0  73
10:38:27   0  27   0  73
10:38:28   0  27   0  73
10:38:29   1  30   0  69
10:38:30   0  27   0  73

And so we consider whether or not there is a pattern to the frequency. The 
following is sar output from any lines in which sys is above 90%:

10:40:04    %usr    %sys    %wio   %idle    Delta
10:40:11   0  97   0   3
10:40:45   0  98   0   2   34 seconds
10:41:02   0  94   0   6   17 seconds
10:41:26   0 100   0   0   24 seconds
10:42:00   0 100   0   0   34 seconds
10:42:25   (end of sample) 25 seconds

Looking

Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Jim Mauro


Could be NFS synchronous semantics on file create (followed by 
repeated flushing of the write cache).  What kind of storage are you 
using (feel free to send privately if you need to) - is it a thumper? 
It's not clear why NFS-enforced synchronous semantics would induce 
different behavior than the same load to a local ZFS.

File creates are metadata intensive, right? And these operations need to 
be synchronous to guarantee file system consistency (yes, I am familiar 
with the ZFS COW model).

Anyway... I'm feeling rather naive here, but I've seen the "NFS-enforced 
synchronous semantics" phrase kicked around many times as the explanation 
for suboptimal performance of metadata-intensive operations when ZFS is 
the underlying file system, but I never really understood what's 
unsynchronous about doing the same thing to a local ZFS.

And yes, there is certainly a network latency component to the NFS 
configuration, so for any synchronous operation, I would expect things to 
be slower when done over NFS.


Awaiting enlightenment

:^)

/jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
Ok, so I'm planning on wiping my test pool that seems to have problems 
with non-spare disks being marked as spares, but I can't destroy it:

# zpool destroy -f zmir
cannot iterate filesystems: I/O error

Anyone know how I can nuke this for good?

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
BTW, I'm also unable to export the pool -- same error.

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
Nevermind:

# zfs destroy [EMAIL PROTECTED]:28
cannot open '[EMAIL PROTECTED]:28': I/O error

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
 You are likely hitting:
 
 6397052 unmounting datasets should process
 /etc/mnttab instead of traverse DSL
 
 Which was fixed in build 46 of Nevada.  In the
 meantime, you can remove
 /etc/zfs/zpool.cache manually and reboot, which will
 remove all your
 pools (which you can then re-import on an individual
 basis).

I'm running b51, but I'll try deleting the cache.

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
This worked. 

I've restarted my testing but I've been fdisking each drive before I 
add it to the pool, and so far the system is behaving as expected
when I spin a drive down, i.e., the hot spare gets automatically used. 
This makes me wonder if it's possible to ensure that the forced 
addition of a drive to a pool wipes the drive of any previous data, 
especially any ZFS metadata.
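
For reference, a blunt manual way to clear old ZFS metadata before reusing a 
disk is to overwrite the label areas; ZFS keeps two 256 KB labels at the front 
of the device and two more at the very end, so zeroing the first megabyte (and 
ideally the last one as well) is usually enough. A destructive sketch, device 
name illustrative:

# dd if=/dev/zero of=/dev/rdsk/c3t1d0p0 bs=1024k count=1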

I'll keep the list posted as I continue my tests.

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs exported a live filesystem

2006-12-11 Thread Jim Hranicky
By mistake, I just exported my test filesystem while it was up
and being served via NFS, causing my tar over NFS to start
throwing stale file handle errors. 

Should I file this as a bug, or should I just not do that? :-)

Ko,
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: zfs exported a live filesystem

2006-12-12 Thread Jim Hranicky
For the record, this happened with a new filesystem. I didn't 
muck about with an old filesystem while it was still mounted; I 
created a new one, mounted it, and then accidentally exported it.

  Except that it doesn't:
  
  # mount /dev/dsk/c1t1d0s0 /mnt
  # share /mnt
  # umount /mnt
  umount: /mnt busy
  # unshare /mnt
  # umount /mnt
 
 If you umount -f it will though!

Well, sure, but I was still surprised that it happened anyway.

 The system is working as designed, the NFS client did
 what it was  supposed to do.  If you brought the pool back in
 again with zpool import  things should have picked up where they left off.

Yep -- an import/shareall made the FS available again.

 What's more, you were probably running as root when you
 did that, so you got what you asked for - there is only so much protection
 we can give without being annoying!

Sure, but there are still safeguards in place even when running things
as root, such as requiring umount -f as above, or warning you
when running format on a disk with mounted partitions.

Since this appeared to be an operation that may warrant such a
safeguard I thought I'd check and see if this was to be expected or
if a safeguard should be put in.

Annoying isn't always bad :-)

 Now having said that I personally wouldn't have
 expected that zpool  export should have worked as easily as that while
 there where shared  filesystems.  I would have expected that exporting
 the pool should have attempted to unmount all the ZFS filesystems first -
 which would have  failed without a -f flag because they were shared.
 
 So IMO it is a bug or at least an RFE.

Ok, where should I file an RFE?

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread Jim Hranicky
For my latest test I set up a stripe of two mirrors with one hot spare
like so:

zpool create -f -m /export/zmir zmir mirror c0t0d0 c3t2d0 mirror c3t3d0 c3t4d0 
spare c3t1d0

I spun down c3t2d0 and c3t4d0 simultaneously, and while the system kept 
running (my tar over NFS barely hiccuped), the zpool command hung again.

I rebooted the machine with -dnq, and although the system didn't come up
the first time, it did after a fsck and a second reboot. 

However, once again the hot spare isn't getting used:

# zpool status -v
  pool: zmir
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
  the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Dec 12 09:15:49 2006
config:

  NAMESTATE READ WRITE CKSUM
  zmirDEGRADED 0 0 0
mirrorDEGRADED 0 0 0
  c0t0d0  ONLINE   0 0 0
  c3t2d0  UNAVAIL  0 0 0  cannot open
mirrorDEGRADED 0 0 0
  c3t3d0  ONLINE   0 0 0
  c3t4d0  UNAVAIL  0 0 0  cannot open
  spares
c3t1d0AVAIL

A few questions:

- I know I can attach it via the zpool commands, but is there a way to
kickstart the attachment process if it fails to attach automatically upon
disk failure?

- In this instance the spare is twice as big as the other
drives -- does that make a difference? 

- Is there something inherent to an old SCSI bus that causes spun-
down drives to hang the system in some way, even if it's just hanging
the zpool/zfs system calls? Would a thumper be more resilient to this?

Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-26 Thread Jim Dunham

Jason J. W. Williams wrote:
Could the replication engine eventually be integrated more tightly 
with ZFS?
Not in its present form. The architecture and implementation of 
Availability Suite is driven off block-based replication at the device 
level (/dev/rdsk/...), something that allows the product to replicate 
any Solaris file system, database, etc., without any knowledge of what 
it is actually replicating.


To pursue ZFS replication in the manner of Availability Suite, one needs 
to see what replication looks like from an abstract point of view. So 
simplistically, remote replication is like the letter 'h', where the 
left side of the letter is the complete I/O path on the primary node, 
the horizontal part of the letter is the remote replication network 
link, and the right side of the letter is only the bottom half of the 
complete I/O path on the secondary node.


Next ZFS would have to have its functional I/O path split into two 
halves, a top and bottom piece.  Next we configure replication, the 
letter 'h', between two given nodes, running both a top and bottom piece 
of ZFS on the source node, and just the bottom half of ZFS on the 
secondary node.


Today, the SNDR component of Availability Suite works like the letter 
'h', where we split the Solaris I/O stack into a top and bottom 
half. The top half is that software (file system, database or 
application I/O) that directs its I/Os to the bottom half (raw device, 
volume manager or block device).


So all that needs to be done is to design and build a new variant of the 
letter 'h', and find the place to separate ZFS into two pieces.


- Jim Dunham



That would be slick alternative to send/recv.

Best Regards,
Jason

On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote:

Project Overview:

I propose the creation of a project on opensolaris.org, to bring to 
the community two Solaris host-based data services; namely volume 
snapshot and volume replication. These two data services exist today 
as the Sun StorageTek Availability Suite, a Solaris 8, 9 & 10, 
unbundled product set, consisting of Instant Image (II) and Network 
Data Replicator (SNDR).


Project Description:

Although Availability Suite is typically known as just two data 
services (II & SNDR), there is an underlying Solaris I/O filter 
driver framework which supports these two data services. This 
framework provides the means to stack one or more block-based, pseudo 
device drivers on to any pre-provisioned cb_ops structure, [ 
http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs 
], thereby shunting all cb_ops I/O into the top of a developed filter 
driver, (for driver specific processing), then out the bottom of this 
filter driver, back into the original cb_ops entry points.


Availability Suite was developed to interpose itself on the I/O stack 
of a block device, providing a filter driver framework with the means 
to intercept any I/O originating from an upstream file system, 
database or application layer I/O. This framework provided the means 
for Availability Suite to support snapshot and remote replication 
data services for UFS, QFS, VxFS, and more recently the ZFS file 
system, plus various databases like Oracle, Sybase and PostgreSQL, 
and also application I/Os. By providing a filter driver at this point 
in the Solaris I/O stack, it allows for any number of data services 
to be implemented, without regard to the underlying block storage 
that they will be configured on. Today, as a snapshot and/or 
replication solution, the framework allows both the source and 
destination block storage device to not only differ in physical 
characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical 
characteristics such as in RAID type, volume managed storage (i.e., 
SVM, VxVM), lofi, zvols, even ram disks.


Community Involvement:

By providing this filter-driver framework, two working filter drivers 
(II & SNDR), and an extensive collection of supporting software and 
utilities, it is envisioned that those individuals and companies that 
adopt OpenSolaris as a viable storage platform will also utilize and 
enhance the existing II & SNDR data services, plus have offered to 
them the means with which to develop their own block-based filter 
driver(s), further enhancing the use and adoption of OpenSolaris.


A very timely example that is very applicable to Availability Suite 
and the OpenSolaris community is the recent announcement of the 
Project Proposal: lofi [ compression & encryption ] - 
http://www.opensolaris.org/jive/click.jspa?messageID=26841. By 
leveraging both the Availability Suite and the lofi OpenSolaris 
projects, it would be highly probable to not only offer compression & 
encryption to lofi devices (as already proposed), but by collectively 
leveraging these two projects, to create the means to support file 
systems, databases and applications across all block-based storage 
devices.


Since Availability

[zfs-discuss] Re: ZFS panics system during boot, after 11/06 upgrade

2007-01-29 Thread Jim Walker
 There are ZFS file systems.  There are no zones.
 
 Any help would be greatly appreciated, this is my
 everyday computer.
 
Take a look at page 167 of the admin guide:
http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

You need to delete /etc/zfs/zpool.cache. And, use 
zpool import to recover.
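
A sketch of that recovery sequence (the pool name is hypothetical; boot
single-user or from failsafe media first if the panic prevents a normal boot):

# rm /etc/zfs/zpool.cache
# reboot
# zpool import              (lists the pools that can be imported)
# zpool import tank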

Cheers,
Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-29 Thread Jim Dunham

Jason,

Thank you for the detailed explanation. It is very helpful to
understand the issue. Is anyone successfully using SNDR with ZFS yet?
Of the opportunities I've been involved with, the answer is yes, but so 
far I've not seen SNDR with ZFS in a production environment; that does 
not mean such deployments don't exist. It was not until late June '06 that 
AVS 4.0, Solaris 10 and ZFS were all generally available, and to date AVS 
has not been made available for Solaris Express, Community Release, 
but it will be real soon.


While I have your attention, there are two issues between ZFS and AVS 
that need mentioning.


1). When ZFS is given an entire LUN to place in a ZFS storage pool, ZFS 
detects this, enabling SCSI write caching on the LUN, and also opens the 
LUN with exclusive access, preventing other data services (like AVS) 
from accessing the device. The work-around is to manually format the 
LUN, typically placing all the blocks into a single partition, and then 
place just that partition into the ZFS storage pool. ZFS detects that it 
does not own the entire LUN and doesn't enable write caching, which means it 
also doesn't open the LUN with exclusive access, and therefore AVS and 
ZFS can share the same LUN.
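
A minimal sketch of that work-around (device names are hypothetical):

# format c1t0d0                   (label the disk, put all blocks in slice 0)
# zpool create tank c1t0d0s0      (pool on the slice, not the whole disk)

Because the pool sits on a slice rather than the whole disk, ZFS leaves the
write cache and exclusive open alone, so SNDR can be configured on the same LUN.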


I thought about submitting an RFE to have ZFS provide a means to 
override this restriction, but I am not 100% certain that a ZFS 
filesystem directly accessing a write-cache-enabled LUN is the same 
thing as a replicated ZFS filesystem accessing a write-cache-enabled 
LUN. Even though AVS is write-order consistent, there are disaster 
recovery scenarios which, when enacted, issue block-ordered rather than 
write-ordered I/Os.


2). One has to be very cautious in using zpool import -f (forced 
import), especially on a LUN or LUNs which SNDR is actively 
replicating into. If ZFS complains that the storage pool was not cleanly 
exported when issuing a zpool import, and one attempts a zpool 
import -f without checking the active replication state, they are 
sure to panic Solaris. Of course this failure scenario is no different 
than accessing a LUN or LUNs on dual-ported or SAN-based storage while 
another Solaris host is still accessing the ZFS filesystem, or than 
controller-based replication; they are all just different operational 
scenarios of the same issue: data blocks changing out from underneath 
the ZFS filesystem and its block checksum verification mechanisms.
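
A hedged illustration of the safer order of operations on the secondary
(pool name is hypothetical; flag usage is from memory, so check sndradm(1M)):

# sndradm -P                 (confirm the set is in logging mode, not replicating)
# zpool import               (list what ZFS can see)
# zpool import -f tank       (force only once replication is known to be quiesced)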


Jim



Best Regards,
Jason


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Read Only Zpool: ZFS and Replication

2007-02-05 Thread Jim Dunham

Ben,
I've been playing with replication of a ZFS Zpool using the recently released AVS.  I'm pleased with things, but just replicating the data is only part of the problem.  The big question is: can I have a zpool open in 2 places?  
  


No. The ability to have a zpool open in two places would require shared 
ZFS. The semantics of remote replication can be viewed as those of two 
Solaris hosts looking at the same SAN or dual-ported storage. Today, ZFS 
detects this with both SNDR and shared storage as part of zpool 
import, warning that the pool is active elsewhere.



What I really want is a Zpool on node1 open and writable (production storage) 
and a replicated to node2 where its open for read-only access (standby storage).
  


The best you can do for this is to use the II portion of Availability Suite 
to take a snapshot of the active SNDR replica on the remote node, 
getting a snapshot of the ZFS filesystem being replicated. Without that 
snapshot, ZFS on the remote node would see replicated disk blocks 
changing in the zpool it is reading from.
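
For reference, a rough sketch of what enabling an independent II shadow of the
replica might look like on the remote node (the volume paths are hypothetical,
and the exact syntax should be checked against iiadm(1M)):

# iiadm -e ind /dev/rdsk/c2t0d0s0 /dev/rdsk/c2t1d0s0 /dev/rdsk/c2t2d0s0

The shadow volume then holds a static, point-in-time copy of the replicated
pool that the remote node can read without the blocks changing underneath it.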



This is an old problem.  I'm not sure it's remotely possible.  It's bad enough 
with UFS, but ZFS maintains a hell of a lot more meta-data.  How is node2 
supposed to know that a snapshot has been created, for instance?  With UFS you 
can at least get by some of these problems using directio, but that's not an 
option with a zpool.

I know this is a fairly remedial issue to bring up... but if I think about what I want Thumper-to-Thumper replication to look like, I want 2 usable storage systems.  As I see it now, the secondary storage (node2) is useless until you break replication and import the pool, do your thing, and then re-sync storage to re-enable replication.


Am I missing something?  I'm hoping there is an option I'm not aware of.
  


No. Also, just to be clear: after you ... do your thing, and then 
re-sync storage ..., the re-sync either keeps all of the data on the SNDR 
primary OR keeps all the data on the SNDR secondary. There is no means to 
combine writes that occurred in two separate ZFS filesystems back into 
one filesystem. The remote ZFS filesystem is essentially a clone of the 
original filesystem, and once a write I/O occurs to either side, the two 
filesystems take on a life of their own.


Of course this is not unique to the ZFS filesystem, as the same is true 
for all others, and this underlying storage behavior is not unique to 
SNDR as it happens with other host-based replication and 
controller-based replication.


Jim


benr.
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-02-05 Thread Jim Dunham

Frank,

On Fri, 2 Feb 2007, Torrey McMahon wrote:


Jason J. W. Williams wrote:

Hi Jim,

Thank you very much for the heads up. Unfortunately, we need the
write-cache enabled for the application I was thinking of combining
this with. Sounds like SNDR and ZFS need some more soak time together
before you can use both to their full potential together?


Well... there is the fact that SNDR works with file systems other than 
ZFS. (Yes, I know this is the ZFS list.) Working around architectural 
issues for ZFS and ZFS alone might cause issues for others.


SNDR has some issues with logging UFS as well. If you start a SNDR 
live copy on an active logging UFS (not _writelocked_), the UFS log 
state may not be copied consistently.


Treading very carefully, UFS logging may have issues with being 
replicated, not the other way around. SNDR replication (after 
synchronizing) maintains a write-order consistent volume, thus if there 
is an issue with UFS logging being able to access an SNDR secondary, 
then UFS logging will also have issues with accessing a volume after 
Solaris crashes. The end result of Solaris crashing, or SNDR replication 
stopping, is a write-ordered, crash-consistent volume.


Given that both UFS logging and SNDR are (near) perfect (or there would 
be a flood of escalations), this issue in all cases I've seen to date 
is that the SNDR primary volume being replicated is mounted with UFS 
logging enabled, but the SNDR secondary is not mounted with UFS logging 
enabled. Once this condition happens, the problem can be resolved by 
fixing /etc/vfstab to correct the inconsistent mount options, and then 
performing an SNDR update sync.
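
As an illustration, the /etc/vfstab entry for the volume would need to carry
the same logging option on both nodes (device and mount point are hypothetical):

/dev/dsk/c1t0d0s6  /dev/rdsk/c1t0d0s6  /export/data  ufs  2  yes  logging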




If you want a live remote replication facility, it _NEEDS_ to talk to 
the filesystem somehow. There must be a callback mechanism that the 
filesystem could use to tell the replicator "and from exactly now on 
you start replicating". The only entity which can truly give this 
signal is the filesystem itself.


There is an RFE against SNDR for something called in-line PIT. I hope 
that this work will get done soon.




And no, that's _not_ when the filesystem does a flush-write-cache 
ioctl, or when the user has just issued a sync command or similar.
For ZFS, it'd be when a ZIL transaction is closed (as I understand 
it); for UFS, it'd be when the UFS log is fully rolled. There's no 
notification to external entities when these two events happen.


Because ZFS is always on-disk consistent, this is not an issue. So far 
in ALL my testing with replicating ZFS with SNDR, I have not seen ZFS fail!


Of course, be careful not to confuse my stated position with another 
closely related scenario: accessing ZFS on the remote node 
via a forced import (zpool import -f poolname) while SNDR 
replication is still active, as ZFS is sure to panic the system. ZFS, unlike other 
filesystems, has 0% tolerance for corrupted metadata.


Jim


SNDR tries its best to achieve this detection, but without actually 
_stopping_ all I/O (on UFS: writelocking), there's a window of 
vulnerability still open.
And SNDR/II don't stop filesystem I/O - by basic principle. That's how 
they're sold/advertised/intended to be used.


I'm all willing to see SNDR/II go open - we could finally work these 
issues !


FrankH.



I think the best-of-both-worlds approach would be to let SNDR plug in 
to ZFS along the same lines the crypto stuff will be able to plug in, 
different compression types, etc. There once was a slide that showed 
how that worked... or I'm hallucinating again.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Read Only Zpool: ZFS and Replication

2007-02-05 Thread Jim Dunham

Robert,

Hello Ben,

Monday, February 5, 2007, 9:17:01 AM, you wrote:

BR I've been playing with replication of a ZFS Zpool using the
BR recently released AVS.  I'm pleased with things, but just
BR replicating the data is only part of the problem.  The big
BR question is: can I have a zpool open in 2 places?  


BR What I really want is a Zpool on node1 open and writable
BR (production storage) and a replicated to node2 where its open for
BR read-only access (standby storage).

BR This is an old problem.  I'm not sure its remotely possible.  Its
BR bad enough with UFS, but ZFS maintains a hell of a lot more
BR meta-data.  How is node2 supposed to know that a snapshot has been
BR created for instance.  With UFS you can at least get by some of
BR these problems using directio, but thats not an option with a zpool.

BR I know this is a fairly remedial issue to bring up... but if I
BR think about what I want Thumper-to-Thumper replication to look
BR like, I want 2 usable storage systems.  As I see it now the
BR secondary storage (node2) is useless untill you break replication
BR and import the pool, do your thing, and then re-sync storage to re-enable 
replication.

BR Am I missing something?  I'm hoping there is an option I'm not aware of.


You can't mount rw on one node and ro on another (not to mention that
zfs doesn't offer you to import RO pools right now). You can mount the
same file system like UFS in RO on both nodes but not ZFS (no ro import).
  
One can not just mount a filesystem in RO mode if SNDR or any other 
host-based or controller-based replication is underneath. For all 
filesystems that I know of, except of course shared-reader QFS, this 
will fail given time.


Even if one has the means to mount a filesystem with DIRECTIO 
(no caching) and READ-ONLY (no writes), that does not prevent a filesystem 
from looking at the contents of block A and then acting on block B. 
The reason is that during replication at time T1, both blocks A & 
B could be written and be consistent with each other. Next the file 
system reads block A. Now replication at time T2 updates blocks A & 
B, also consistent with each other. Next the file system reads block 
B and panics due to an inconsistency only it sees between old A and 
new B. I know this for a fact, since a forced zpool import -f 
poolname is a common instance of this exact 
failure, most likely due to checksum failures between metadata blocks A & B.


Of course using an instantly accessible II snapshot of an SNDR secondary 
volume would work just fine, since the data being read is now 
point-in-time consistent, and static.


- Jim


I believe what you really need is a 'zfs send continuous' feature.
We are developing something like this right now.
I expect we can give more details really soon now.


  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Read Only Zpool: ZFS and Replication

2007-02-05 Thread Jim Dunham

Ben Rockwood wrote:

Jim Dunham wrote:

Robert,

Hello Ben,

Monday, February 5, 2007, 9:17:01 AM, you wrote:

BR I've been playing with replication of a ZFS Zpool using the
BR recently released AVS.  I'm pleased with things, but just
BR replicating the data is only part of the problem.  The big
BR question is: can I have a zpool open in 2 places? BR What I 
really want is a Zpool on node1 open and writable

BR (production storage) and a replicated to node2 where its open for
BR read-only access (standby storage).

BR This is an old problem.  I'm not sure its remotely possible.  Its
BR bad enough with UFS, but ZFS maintains a hell of a lot more
BR meta-data.  How is node2 supposed to know that a snapshot has been
BR created for instance.  With UFS you can at least get by some of
BR these problems using directio, but thats not an option with a 
zpool.


BR I know this is a fairly remedial issue to bring up... but if I
BR think about what I want Thumper-to-Thumper replication to look
BR like, I want 2 usable storage systems.  As I see it now the
BR secondary storage (node2) is useless untill you break replication
BR and import the pool, do your thing, and then re-sync storage to 
re-enable replication.


BR Am I missing something?  I'm hoping there is an option I'm not 
aware of.



You can't mount rw on one node and ro on another (not to mention that
zfs doesn't offer you to import RO pools right now). You can mount the
same file system like UFS in RO on both nodes but not ZFS (no ro 
import).
  
One can not just mount a filesystem in RO mode if SNDR or any other 
host-based or controller-based replication is underneath. For all 
filesystems that I know of, except of course shared-reader QFS, this 
will fail given time.


Even if one has the means to mount a filesystem with DIRECTIO 
(no-caching), READ-ONLY (no-writes), it does not prevent a filesystem 
from looking at the contents of block A and then acting on block 
B. The reason being is that during replication at time T1 both 
blocks A  B could be written and be consistent with each other. 
Next the file system reads block A. Now replication at time T2 
updates blocks A  B, also consistent with each other. Next the 
file system reads block B and panics due to an inconsistency only 
it sees between old A and new B. I know this for a fact, since a 
forced zpool import -f name, is a common instance of this exact 
failure, due most likely checksum failures between metadata blocks 
A  B.


Ya, that bit me last night.  'zpool import' shows the pool fine, but 
when you force the import you panic:


Feb  5 07:14:10 uma ^Mpanic[cpu0]/thread=fe8001072c80: Feb  5 
07:14:10 uma genunix: [ID 809409 kern.notice] ZFS: I/O failure (write 
on unknown off 0: zio fe80c54ed380 [L0 unallocated] 400L/200P 
DVA[0]=0:36000:200 DVA[1]=0:9c0003800:200 
DVA[2]=0:20004e00:200 fletcher4 lzjb LE contiguous birth=57416 
fill=0 cksum=de2e56ffd:5591b77b74b:1101a91d58dfc:252efdf22532d0): error 5
Feb  5 07:14:11 uma unix: [ID 10 kern.notice] Feb  5 07:14:11 uma 
genunix: [ID 655072 kern.notice] fe8001072a40 zfs:zio_done+140 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072a60 
zfs:zio_next_stage+68 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072ab0 
zfs:zio_wait_for_children+5d ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072ad0 
zfs:zio_wait_children_done+20 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072af0 
zfs:zio_next_stage+68 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072b40 
zfs:zio_vdev_io_assess+129 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072b60 
zfs:zio_next_stage+68 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072bb0 
zfs:vdev_mirror_io_done+2af ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072bd0 
zfs:zio_vdev_io_done+26 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072c60 
genunix:taskq_thread+1a7 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fe8001072c70 
unix:thread_start+8 ()

Feb  5 07:14:11 uma unix: [ID 10 kern.notice]

So without using II, what's the best method of bringing up the secondary 
storage?  Is just dropping the primary into logging acceptable?

Yes, placing SNDR in logging mode stops the replication of writes.

Also, performing a zpool export on the primary node, and waiting 
(sndradm -w) until all writes are replicated, means that on the SNDR 
secondary node a zpool import can be done without using -f; a 
forced import is not needed, since the zpool export operation got 
replicated.


Be sure to remember to zpool export on the remote node, before 
resuming replication on the primary node, or another panic will likely 
occur.
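
Pulled together, a rough sketch of that controlled switch-over (pool and
SNDR set names are placeholders; the sndradm flags are from memory, so
verify against sndradm(1M)):

primary#   zpool export tank
primary#   sndradm -n -w set        (wait for the replication queue to drain)
primary#   sndradm -n -l set        (drop the set into logging mode)
secondary# zpool import tank        (no -f needed; the export was replicated)
   ... use the pool on the secondary ...
secondary# zpool export tank
primary#   sndradm -n -u set        (update sync, then resume replication)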


Jim


benr.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] FROSUG February Meeting Announcement (2/22/2007)

2007-02-08 Thread Jim Walker
This month's FROSUG (Front Range OpenSolaris User Group) meeting is on
Thursday, February 22, 2007.  Our presentation is ZFS as a Root File
System by Lori Alt. In addition, Jon Bowman will be giving an OpenSolaris
Update, and we will also be doing an InstallFest. So, if you want help
installing an OpenSolaris distribution, backup your laptop and bring it
to the meeting!

About the presentation(s):
One of the next steps in the evolution of ZFS is to enable
its use as a root file system.  This presentation will focus
on how booting from ZFS will work, how installation
will be affected by ZFS's feature set, and the many advantages
that will result from being able to use ZFS as a root file system.

The presentation(s)s will be posted here prior to the meeting:
http://www.opensolaris.org/os/community/os_user_groups/frosug/

About our presenter(s):
Lori Alt is a Staff Engineer at Sun Microsystems, where
she has worked since 1991.  Lori worked on Solaris install
and upgrade and then on UFS, where she led the multi-terabyte
UFS project.  She has Bachelor's and Master's degrees in
computer science from Washington University in St. Louis, MO.

-

Meeting Details:

When: Thursday, February 22, 2007
Times: 6:00pm - 6:30pm Doors open and Pizza
   6:30pm - 6:45pm OpenSolaris Update (Jon Bowman)
   6:45pm - 8:30pm ZFS as a Root File System (Lori Alt)
Where: Sun Broomfield Campus
   Building 1 - Conference Center
   500 Eldorado Blvd.
   Broomfield, CO 80021

Note:  The location of this meeting may change. We will send out an
   additional email prior to the meeting if this happens.

Pizza and soft drinks will be served at the beginning of the meeting.
Please RSVP to frosug-rsvp(AT)opensolaris(DOT)org in order to help us
plan for food and setup access to the Sun campus.

We hope to see you there!
Thanks,
FROSUG

+++

Future Meeting Plans:
March 29, 2007: Doug McCallum presents sharemgr
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] UPDATE: FROSUG February Meeting (2/22/2007)

2007-02-15 Thread Jim Walker
***Meeting Update***
We will be having this month's meeting at the Omni Interlocken Resort
in Broomfield and a conference call number is being provided for those
who can not make the meeting in person, see Meeting Details below for
more information.

In addition, we will be discussing Solaris Express Developer Edition
during the OpenSolaris Update and providing free SXDE DVDs.

Hope to see you there. This month's meeting is getting a lot of interest!
***Meeting Update***

This month's FROSUG (Front Range OpenSolaris User Group) meeting is on
Thursday, February 22, 2007.  Our presentation is ZFS as a Root File
System by Lori Alt. In addition, Jon Bowman will be giving an OpenSolaris
Update, and we will also be doing an InstallFest. So, if you want help
installing Solaris Express Developer Edition, backup your laptop and bring
it to the meeting!

About the presentation:
One of the next steps in the evolution of ZFS is to enable
its use as a root file system.  This presentation will focus
on how booting from ZFS will work, how installation
will be affected by ZFS's feature set, and the many advantages
that will result from being able to use ZFS as a root file system.

The presentation will be posted here prior to the meeting:
http://www.opensolaris.org/os/community/os_user_groups/frosug/

About our presenter:
Lori Alt is a Staff Engineer at Sun Microsystems, where
she has worked since 1991.  Lori worked on Solaris install
and upgrade and then on UFS, where she led the multi-terabyte
UFS project.  She has Bachelor's and Master's degrees in
computer science from Washington University in St. Louis, MO.

-

Meeting Details

When: Thursday, February 22, 2007
Times: 6:00pm - 6:30pm Food and Drinks
   6:30pm - 6:45pm OpenSolaris Update (Jon Bowman)
   6:45pm - 8:30pm ZFS as a Root File System (Lori Alt)
Where: Omni Interlocken Resort (Fir Conference Room)
   500 Interlocken Blvd.
   Broomfield, CO 80021

Conference Call Information

US:   866-545-5198
INTL: 865-521-8904
Access Code: 5518835

-

The meeting is free and open to the public.

Snacks and soft drinks will be served at the beginning of the meeting.
Please RSVP to frosug-rsvp(AT)opensolaris(DOT)org in order to help us
plan for food.

We hope to see you there!
Thanks,
FROSUG

-

Future Meeting Plans:
March 29, 2007: Doug McCallum presents sharemgr

If you have ideas for meeting topics, send them to:
ug-frosug(AT)opensolaris(DOT)org
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-02-27 Thread Jim Mauro


You don't honestly, really, reasonably expect someone, anyone, to look 
at the stack trace of a few hundred threads, and post something along 
the lines of "This is what is wrong with your NFS server." Do you? 
Without any other information at all?


We're here to help, but please reset your expectations around our 
ability to

root-cause pathological behavior based on almost no information.

What size and type of server?
What size and type of storage?
What release of Solaris?
How many networks, and what type?
What is being used to generate the load for the testing?
What is the zpool configuration?
What do the system stats look like while under load (e.g. mpstat), and how
do they change when you see this behavior?
What does zpool iostat zpool_name 1 data look like while under load?
Are you collecting nfsstat data - what is the rate of incoming NFS ops?
Can you characterize the load - read/write data intensive, metadata 
intensive?


Are the client machines Solaris, or something else?

Does this last for seconds, minutes, tens-of-minutes? Does the system 
remain in this

state indefinitely until reboot, or does it normalize?

Can you consistently reproduce this problem?

/jim


Leon Koll wrote:

Hello, gurus
I need your help. During the benchmark test of NFS-shared ZFS file systems at 
some moment the number of NFS threads jumps to the maximal value, 1027 
(NFSD_SERVERS was set to 1024). The latency also grows and the number of IOPS 
is going down.
I've collected the output of
echo "::pgrep nfsd | ::walk thread | ::findstack -v" | mdb -k
that can be seen here:
http://tinyurl.com/yrvn4z

Could you please look at it and tell me what's wrong with my NFS server.
Appreciate,
-- Leon
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread Jim Mauro


FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:


 arc::print -tad
{
. . .
   c02e29e8 uint64_t size = 0t10527883264
   c02e29f0 uint64_t p = 0t16381819904
   c02e29f8 uint64_t c = 0t1070318720
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t1070318720
. . .

Perhaps c_max does not do what I think it does?

Thanks,
/jim


Jim Mauro wrote:

Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
(update 3). All file IO is mmap(file), read memory segment, unmap, close.

Tweaked the arc size down via mdb to 1GB. I used that value because
c_min was also 1GB, and I was not sure if c_max could be larger than
c_min... Anyway, I set c_max to 1GB.

After a workload run:
 arc::print -tad
{
. . .
  c02e29e8 uint64_t size = 0t3099832832
  c02e29f0 uint64_t p = 0t16540761088
  c02e29f8 uint64_t c = 0t1070318720
  c02e2a00 uint64_t c_min = 0t1070318720
  c02e2a08 uint64_t c_max = 0t1070318720
. . .

size is at 3GB, with c_max at 1GB.

What gives? I'm looking at the code now, but was under the impression
c_max would limit ARC growth. Granted, it's not a factor of 10, and
it's certainly much better than the out-of-the-box growth to 24GB
(this is a 32GB x4500), so clearly ARC growth is being limited, but it
still grew to 3X c_max.

Thanks,
/jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread Jim Mauro




How/when did you configure arc_c_max?  

Immediately following a reboot, I set arc.c_max using mdb,
then verified reading the arc structure again.

arc.p is supposed to be
initialized to half of arc.c.  Also, I assume that there's a reliable
test case for reproducing this problem?
  

Yep. I'm using a x4500 in-house to sort out performance of a customer test
case that uses mmap. We acquired the new DIMMs to bring the
x4500 to 32GB, since the workload has a 64GB working set size,
and we were clobbering a 16GB thumper. We wanted to see how doubling
memory may help.

I'm trying to clamp the ARC size because for mmap-intensive workloads,
it seems to hurt more than help (although, based on experiments up to this
point, it's not hurting a lot).

I'll do another reboot, and run it all down for you serially...

/jim


Thanks,

-j

On Thu, Mar 15, 2007 at 06:57:12PM -0400, Jim Mauro wrote:
  


ARC_mru::print -d size lsize
  

size = 0t10224433152
lsize = 0t10218960896


ARC_mfu::print -d size lsize
  

size = 0t303450112
lsize = 0t289998848


ARC_anon::print -d size
  

size = 0

So it looks like the MRU is running at 10GB...


What does this tell us?

Thanks,
/jim



[EMAIL PROTECTED] wrote:


This seems a bit strange.  What's the workload, and also, what's the
output for:

 
  

ARC_mru::print size lsize
ARC_mfu::print size lsize
   


and
 
  

ARC_anon::print size
   


For obvious reasons, the ARC can't evict buffers that are in use.
Buffers that are available to be evicted should be on the mru or mfu
list, so this output should be instructive.

-j

On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:
 
  

FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:


   


arc::print -tad
 
  

{
. . .
  c02e29e8 uint64_t size = 0t10527883264
  c02e29f0 uint64_t p = 0t16381819904
  c02e29f8 uint64_t c = 0t1070318720
  c02e2a00 uint64_t c_min = 0t1070318720
  c02e2a08 uint64_t c_max = 0t1070318720
. . .

Perhaps c_max does not do what I think it does?

Thanks,
/jim


Jim Mauro wrote:
   


Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
(update 3). All file IO is mmap(file), read memory segment, unmap, close.

Tweaked the arc size down via mdb to 1GB. I used that value because
c_min was also 1GB, and I was not sure if c_max could be larger than
c_minAnyway, I set c_max to 1GB.

After a workload run:
 
  

arc::print -tad
   


{
. . .
c02e29e8 uint64_t size = 0t3099832832
c02e29f0 uint64_t p = 0t16540761088
c02e29f8 uint64_t c = 0t1070318720
c02e2a00 uint64_t c_min = 0t1070318720
c02e2a08 uint64_t c_max = 0t1070318720
. . .

size is at 3GB, with c_max at 1GB.

What gives? I'm looking at the code now, but was under the impression
c_max would limit ARC growth. Granted, it's not a factor of 10, and
it's certainly much better than the out-of-the-box growth to 24GB
(this is a 32GB x4500), so clearly ARC growth is being limited, but it
still grew to 3X c_max.

Thanks,
/jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread Jim Mauro


Following a reboot:
 arc::print -tad
{
. . .
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t16588228608
   c02e29f8 uint64_t c = 0t33176457216
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t33176457216
. . .
}  
> c02e2a08 /Z 0x20000000                <--- set c_max to 512MB

arc+0x48:       0x7b9789000     =       0x20000000
 arc::print -tad
{
. . .
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t16588228608
   c02e29f8 uint64_t c = 0t33176457216
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t536870912        <-- c_max is 512MB
. . .
}  
 ARC_mru::print -d size lsize

size = 0t294912
lsize = 0t32768


Run the workload a couple times...

   c02e29e8 uint64_t size = 0t27121205248       <--- ARC size is 27GB
   c02e29f0 uint64_t p = 0t10551351442
   c02e29f8 uint64_t c = 0t27121332576
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t536870912        <-- c_max is 512MB

> ARC_mru::print -d size lsize
size = 0t223985664
lsize = 0t221839360
> ARC_mfu::print -d size lsize
size = 0t26897219584                            <-- MFU list is almost 27GB ...
lsize = 0t26869121024

Thanks,
/jim




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread Jim Mauro

Will try that now...

/jim


[EMAIL PROTECTED] wrote:

I suppose I should have been more forward about making my last point.
If the arc_c_max isn't set in /etc/system, I don't believe that the ARC
will initialize arc.p to the correct value.   I could be wrong about
this; however, next time you set c_max, set c to the same value as c_max
and set p to half of c.  Let me know if this addresses the problem or
not.

-j

  
How/when did you configure arc_c_max?  
  

Immediately following a reboot, I set arc.c_max using mdb,
then verified reading the arc structure again.


arc.p is supposed to be
initialized to half of arc.c.  Also, I assume that there's a reliable
test case for reproducing this problem?
 
  

Yep. I'm using a x4500 in-house to sort out performance of a customer test
case that uses mmap. We acquired the new DIMMs to bring the
x4500 to 32GB, since the workload has a 64GB working set size,
and we were clobbering a 16GB thumper. We wanted to see how doubling
memory may help.

I'm trying clamp the ARC size because for mmap-intensive workloads,
it seems to hurt more than help (although, based on experiments up to this
point, it's not hurting a lot).

I'll do another reboot, and run it all down for you serially...

/jim



Thanks,

-j

On Thu, Mar 15, 2007 at 06:57:12PM -0400, Jim Mauro wrote:
 
  
   


ARC_mru::print -d size lsize
 
  

size = 0t10224433152
lsize = 0t10218960896
   


ARC_mfu::print -d size lsize
 
  

size = 0t303450112
lsize = 0t289998848
   


ARC_anon::print -d size
 
  

size = 0
   
So it looks like the MRU is running at 10GB...


What does this tell us?

Thanks,
/jim



[EMAIL PROTECTED] wrote:
   


This seems a bit strange.  What's the workload, and also, what's the
output for:


 
  

ARC_mru::print size lsize
ARC_mfu::print size lsize
  
   


and

 
  

ARC_anon::print size
  
   


For obvious reasons, the ARC can't evict buffers that are in use.
Buffers that are available to be evicted should be on the mru or mfu
list, so this output should be instructive.

-j

On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:

 
  

FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:


  
   


arc::print -tad

 
  

{
. . .
 c02e29e8 uint64_t size = 0t10527883264
 c02e29f0 uint64_t p = 0t16381819904
 c02e29f8 uint64_t c = 0t1070318720
 c02e2a00 uint64_t c_min = 0t1070318720
 c02e2a08 uint64_t c_max = 0t1070318720
. . .

Perhaps c_max does not do what I think it does?

Thanks,
/jim


Jim Mauro wrote:
  
   


Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
(update 3). All file IO is mmap(file), read memory segment, unmap, 
close.


Tweaked the arc size down via mdb to 1GB. I used that value because
c_min was also 1GB, and I was not sure if c_max could be larger than
c_minAnyway, I set c_max to 1GB.

After a workload run:

 
  

arc::print -tad
  
   


{
. . .
c02e29e8 uint64_t size = 0t3099832832
c02e29f0 uint64_t p = 0t16540761088
c02e29f8 uint64_t c = 0t1070318720
c02e2a00 uint64_t c_min = 0t1070318720
c02e2a08 uint64_t c_max = 0t1070318720
. . .

size is at 3GB, with c_max at 1GB.

What gives? I'm looking at the code now, but was under the impression
c_max would limit ARC growth. Granted, it's not a factor of 10, and
it's certainly much better than the out-of-the-box growth to 24GB
(this is a 32GB x4500), so clearly ARC growth is being limited, but it
still grew to 3X c_max.

Thanks,
/jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-15 Thread Jim Mauro

All righty...I set c_max to 512MB, c to 512MB, and p to 256MB...

 arc::print -tad
{
...
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t16588228608
   c02e29f8 uint64_t c = 0t33176457216
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t33176457216
...
}
> c02e2a08 /Z 0x20000000
arc+0x48:       0x7b9789000     =       0x20000000
> c02e29f8 /Z 0x20000000
arc+0x38:       0x7b9789000     =       0x20000000
> c02e29f0 /Z 0x10000000
arc+0x30:       0x3dcbc4800     =       0x10000000
> arc::print -tad
{
...
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t268435456            <-- p is 256MB
   c02e29f8 uint64_t c = 0t536870912            <-- c is 512MB

   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t536870912        <--- c_max is 512MB
...
}

After a few runs of the workload ...

 arc::print -d size
size = 0t536788992



Ah - looks like we're out of the woods. The ARC remains clamped at 512MB.
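
(For a persistent clamp across reboots, on builds where the tunable is honored,
the equivalent /etc/system setting would look like this sketch; the 512MB value
is just the one used above:

set zfs:zfs_arc_max = 0x20000000
)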

Thanks!
/jim


[EMAIL PROTECTED] wrote:

I suppose I should have been more forward about making my last point.
If the arc_c_max isn't set in /etc/system, I don't believe that the ARC
will initialize arc.p to the correct value.   I could be wrong about
this; however, next time you set c_max, set c to the same value as c_max
and set p to half of c.  Let me know if this addresses the problem or
not.

-j

  
How/when did you configure arc_c_max?  
  

Immediately following a reboot, I set arc.c_max using mdb,
then verified reading the arc structure again.


arc.p is supposed to be
initialized to half of arc.c.  Also, I assume that there's a reliable
test case for reproducing this problem?
 
  

Yep. I'm using a x4500 in-house to sort out performance of a customer test
case that uses mmap. We acquired the new DIMMs to bring the
x4500 to 32GB, since the workload has a 64GB working set size,
and we were clobbering a 16GB thumper. We wanted to see how doubling
memory may help.

I'm trying clamp the ARC size because for mmap-intensive workloads,
it seems to hurt more than help (although, based on experiments up to this
point, it's not hurting a lot).

I'll do another reboot, and run it all down for you serially...

/jim



Thanks,

-j

On Thu, Mar 15, 2007 at 06:57:12PM -0400, Jim Mauro wrote:
 
  
   


ARC_mru::print -d size lsize
 
  

size = 0t10224433152
lsize = 0t10218960896
   


ARC_mfu::print -d size lsize
 
  

size = 0t303450112
lsize = 0t289998848
   


ARC_anon::print -d size
 
  

size = 0
   
So it looks like the MRU is running at 10GB...


What does this tell us?

Thanks,
/jim



[EMAIL PROTECTED] wrote:
   


This seems a bit strange.  What's the workload, and also, what's the
output for:


 
  

ARC_mru::print size lsize
ARC_mfu::print size lsize
  
   


and

 
  

ARC_anon::print size
  
   


For obvious reasons, the ARC can't evict buffers that are in use.
Buffers that are available to be evicted should be on the mru or mfu
list, so this output should be instructive.

-j

On Thu, Mar 15, 2007 at 02:08:37PM -0400, Jim Mauro wrote:

 
  

FYI - After a few more runs, ARC size hit 10GB, which is now 10X c_max:


  
   


arc::print -tad

 
  

{
. . .
 c02e29e8 uint64_t size = 0t10527883264
 c02e29f0 uint64_t p = 0t16381819904
 c02e29f8 uint64_t c = 0t1070318720
 c02e2a00 uint64_t c_min = 0t1070318720
 c02e2a08 uint64_t c_max = 0t1070318720
. . .

Perhaps c_max does not do what I think it does?

Thanks,
/jim


Jim Mauro wrote:
  
   


Running an mmap-intensive workload on ZFS on a X4500, Solaris 10 11/06
(update 3). All file IO is mmap(file), read memory segment, unmap, 
close.


Tweaked the arc size down via mdb to 1GB. I used that value because
c_min was also 1GB, and I was not sure if c_max could be larger than
c_minAnyway, I set c_max to 1GB.

After a workload run:

 
  

arc::print -tad
  
   


{
. . .
c02e29e8 uint64_t size = 0t3099832832
c02e29f0 uint64_t p = 0t16540761088
c02e29f8 uint64_t c = 0t1070318720
c02e2a00 uint64_t c_min = 0t1070318720
c02e2a08 uint64_t c_max = 0t1070318720
. . .

size is at 3GB, with c_max at 1GB.

What gives? I'm looking at the code now, but was under the impression
c_max would limit ARC growth. Granted, it's not a factor of 10, and
it's certainly much better than the out-of-the-box growth to 24GB
(this is a 32GB x4500), so clearly ARC growth is being limited, but it
still grew to 3X c_max.

Thanks,
/jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http

Re: [zfs-discuss] Re: ZFS with raidz

2007-03-20 Thread Jim Mauro
(I'm probably not the best person to answer this, but that has never 
stopped me
before, and I need to give Richard Elling a little more time to get the 
Goats, Cows

and Horses fed, sip his morning coffee, and offer a proper response...)
Would it benefit us to have the disk be setup as a raidz along with the hardware raid 5 that is already setup too?  
Way back when, we called such configurations plaiding, which described 
a host-based RAID configuration
that criss-crossed hardware RAID LUNs. In doing such things, we had 
potentially better data availability
with a configuration that could survive more failure modes. 
Alternatively, we used the hardware RAID
for the availability configuration (hardware RAID 5), and used 
host-based RAID to stripe across hardware

RAID5 LUNs for performance. Seemed to work pretty well.

In theory, a raidz pool spread across some number of underlying hardware 
raid 5 LUNs would
offer protection against more failure modes, such as the loss of an 
entire raid5 LUN. So from
a failure-protection/data-availability point of view, it offers some 
benefit. Now, whether or not
you experience a real, measurable benefit over time is hard to say. Each 
additional level of protection/redundancy
has a diminishing return, oftentimes at a dramatic incremental cost 
(e.g. getting from four nines to five nines).
Or with this double raid slow our performance with both a software and hardware raid setup?  
You will certainly pay a performance penalty - using raidz across the raid5 LUNs 
will reduce deliverable IOPS
from the raid5 LUNs. Whether or not the performance trade-off is worth 
the RAS gain varies based on

your RAS and data availability requirements.

Or would raidz setup be better than the hardware raid5 setup?
  
Assuming a robust raid5 implementation with battery-backed nvram 
(protecting against the write hole and
partial stripe writes), I think a raidz zpool covers more of the 
datapath than a hardware raid5 LUN, but

I'll wait for Richard to elaborate here (or tell me I'm wrong).


Also, if we do set the disks up as a raidz, would it benefit us more if we 
specified each disk in the raidz, or created them as LUNs and then specified the setup 
in raidz?

Isn't this the same question as the first question? I'm not sure what 
you're asking here...


The questions you're asking are good ones, and date back to the decades-old struggle

around configuration tradeoffs for performance / availability / cost.

My knee-jerk reaction is that one level of RAID, either hardware 
raid5 or ZFS raidz, is sufficient
for availability, and keeps things relatively simple (and simple also 
improves RAS). The advantage
host-based RAID has always had over hardware RAID is the ability to create 
software LUNs
(like a raidz1 or raidz2 zpool) across physical disk controllers, which 
may also cross SAN
switches, etc. So, if 'twas me, I'd go with non-hardware-RAID5 devices from 
the storage frame,

and create raidz1 or raidz2 zpools across controllers.
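
As a concrete sketch of that last suggestion (controller and target names are
hypothetical), a double-parity pool striped across five controllers might look like:

# zpool create tank raidz2 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0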

But, that's me...
:^)

/jim

 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] The value of validating your backups...

2007-03-20 Thread Jim Mauro



http://www.cnn.com/2007/US/03/20/lost.data.ap/index.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS with raidz

2007-03-20 Thread Jim Mauro

Hi Kory - Your problem came our way through other Sun folks a few days ago,
and I wish I had that magic setting to help, but the reality is that I'm 
not aware

of anything that will improve the time required to mount 12k file systems.

I would add (not that this helps) that I'm not convinced this problem is 
unique to

ZFS, but I do not have experience or empirical data on mount time for 12k
UFS, QFS, ext4, etc, file systems.

There is an RFE filed on this:
http://bugs.opensolaris.org/view_bug.do?bug_id=6478980

As I said, I wish I had a better answer.

Thanks,
/jim


Kory Wheatley wrote:
Currently we are trying to set up zfs file systems for all our user 
accounts under /homea /homec /homef /homei /homem /homep /homes and 
/homet. Right now on our Sun Fire v890 with 4 dual-core processors and 
16gb of memory we have 12,000 zfs file systems set up, which Sun has 
promised will work, but we didn't know that it would take over an hour 
to do a reboot on this machine to mount and umount all these file 
systems.  What we're trying to accomplish is the best performance along 
with the best data protection.  Sun says that ZFS supports millions of 
file systems, but what they left out is how long it takes to do a 
reboot when you have thousands of file systems.
Currently we have three LUNs on our EMC disk array from which we've created 
one zfs storage pool, and we've created these 12,000 zfs file systems 
in this zfs pool.


We really don't want to have to go ufs to create our student user 
accounts.  We like the flexibility of ZFS, but the slow boot 
process will kill us when we have to implement patches that require 
a reboot.  These ZFS file systems will contain all the student data, 
so reliability and performance are key to us.  Do you know a way, or 
a different setup for ZFS, to allow our system to boot up faster?
I know each mount takes up memory, so that's part of the slowness when 
mounting and umounting.  We know when the system is up that the kernel 
is using 3gb of memory out of the 16gb, and there's nothing else on 
this box right now but ZFS.  There's no data in those thousands of file 
systems yet.


Richard Elling wrote:

Jim Mauro wrote:
(I'm probably not the best person to answer this, but that has never 
stopped me
before, and I need to give Richard Elling a little more time to get 
the Goats, Cows

and Horses fed, sip his morning coffee, and offer a proper response...)


chores are done, wading through the morning e-mail...

Would it benefit us to have the disk be setup as a raidz along with 
the hardware raid 5 that is already setup too?  
Way back when, we called such configurations plaiding, which 
described a host-based RAID configuration
that criss-crossed hardware RAID LUNs. In doing such things, we had 
potentially better data availability
with a configuration that could survive more failure modes. 
Alternatively, we used the hardware RAID
for the availability configuration (hardware RAID 5), and used 
host-based RAID to stripe across hardware

RAID5 LUNs for performance. Seemed to work pretty well.


Yep, there are various ways to do this and, in general, the more copies
of the data you have, the better reliability you have.  Space is also
fairly easy to calculate.  Performance can be tricky, and you may 
need to
benchmark with your workload to see which is better, due to the 
difficulty

in modeling such systems.

In theory, a raidz pool spread across some number of underlying 
hardware raid 5 LUNs would
offer protection against more failure mode, such as the loss of an 
entire raid5 LUN. So from
a failure protection/data availability point of view, it offers some 
benefit. Now, as to whether or not
you experience a real, measurable benefit over time is hard to say. 
Each additional level of protection/redundancy
has a diminishing return, often times at a dramatic incremental cost 
(e.g. getting from four nines to five nines).


If money was no issue, I'm sure we could come up with an awesome 
solution :-)


Or with this double raid slow our performance with both a software 
and hardware raid setup?  
You will certainly pay a performance - using raidz across the raid5 
luns will reduce deliverable IOPS
from the raid 5 luns. Whether or not the performance trade-off is 
worth the RAS gain varies based on

your RAS and data availability requirements.


Fast, inexpensive, reliable: pick two.


Or would raidz setup be better than the hardware raid5 setup?
  
Assuming a robust raid5 implementation with battery-backed nvram 
(protect against the write hole and
partial stripe writes), I think a raidz zpool covers more of the 
datapath then a hardware raid 5 LUN, but

I'll wait for Richard to elaborate here (or tell me I'm wrong).


In general, you want the data protection in the application, or as 
close to
the application as you can get.  Since programmers tend to be lazy 
(Gosling
said it, not me! :-) most rely on the file system and underlying 
constructs
to ensure data protection.  So

[zfs-discuss] REMINDER: FROSUG March Meeting Announcement (3/29/2007)

2007-03-28 Thread Jim Walker
== Reminder: this meeting is tomorrow ==

Also, we will briefly talk about the Project Blackbox tour that is
coming to the Denver area April 12-13. More information is at:
http://www.sun.com/emrkt/blackbox

== Reminder: this meeting is tomorrow ==

This month's FROSUG (Front Range OpenSolaris User Group) meeting is on
Thursday, March 29, 2007.  Our presentation is on Sharemgr by
Doug McCallum. In addition, we will be giving an OpenSolaris Update,
and will be having an InstallFest. So, if you want help installing
an OpenSolaris distribution, backup your laptop and bring it to the
meeting!

!! We will be providing FREE Solaris Express Developer Edition DVDs. !!

About the presentation:
The sharemgr project is a framework for managing file sharing servers.
It provides a mechanism to manage groups of shares as a single object 
and integrates share and group configuration into the Solaris Management
Framework (SMF).

The presentation has been posted on the frosug web page:
http://www.opensolaris.org/os/community/os_user_groups/frosug/

About our presenter:
Doug McCallum has been an engineer at Sun for more than 15 years. He has
worked on a variety of Solaris projects including the original Solaris
x86 port, networking, device support and volume management. More recently
he has been working on improving the manageability of file sharing.

-

Meeting Details:

When:  Thursday, March 29, 2007
Times: 6:00pm - 6:30pm Doors open and Pizza
   6:30pm - 6:45pm OpenSolaris Update (Jim Walker)
   6:45pm - 8:30pm Sharemgr (Doug McCallum)
Where: Sun Broomfield Campus
   Building 1 - Conference Center
   500 Eldorado Blvd.
   Broomfield, CO 80021

The meeting is free and open to the public.

Pizza and soft drinks will be served at the beginning of the meeting.
Please RSVP to frosug-rsvp(AT)opensolaris(DOT)org in order to help us
plan for food and setup access to the Sun campus.

We hope to see you there!
Thanks,
FROSUG

-

Future Meeting Plans:
April 2007: Dave McLoughlin (OpenLogic) presents Open Source Management 
May 2007: SunStudio Compiler


If you have ideas for meeting topics, send them to:
ug-frosug(AT)opensolaris(DOT)org
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-04-01 Thread Jim Mauro


So you're not really sure it's the ARC growing, but only that the kernel 
is growing

to 6.8GB.

Print the arc values via mdb:
# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace uppc scsi_vhci ufs 
ip hook neti sctp arp usba nca lofs zfs random sppp crypto ptm ipc ]

> arc::print -t size c p c_max
uint64_t size = 0x2a8000
uint64_t c = 0x1cdfe800
uint64_t p = 0xe707400
uint64_t c_max = 0x1cdfe800


Is size = c_max?

Assuming it is, you need to look through kmastats and see where the 
kernel memory is

being used (again, inside mdb):

::kmastat

The above generates a LOT of output that's not completely painless to 
parse, but it's not

too bad either.

If you think it's DNLC related, you can monitor the number of entries with:

# kstat -p unix:0:dnlcstats:dir_entries_cached_current
unix:0:dnlcstats:dir_entries_cached_current 9374
#

You can also monitor kernel memory for the dnlc (just using grep with 
the kmastat in

mdb):

> ::kmastat ! grep dnlc
dnlc_space_cache            16    104    254      4096     104     0


The 5th column starting from the left is mem in use, in this example 4096.

I'm not sure if the dnlc_space_cache represents all of kernel memory 
used for

the dnlc. It might, but I need to look at the code to be sure...

Let's start with this...

/jim


Jason J. W. Williams wrote:

Hi Guys,

Rather than starting a new thread I thought I'd continue this thread.
I've been running Build 54 on a Thumper since mid January and wanted
to ask a question about the zfs_arc_max setting. We set it to 
0x100000000 (4GB); however, it's creeping over that until our Kernel
memory usage is nearly 7GB (::memstat inserted below).

This is a database server, so I was curious if the DNLC would have this
effect over time, as it does quite quickly when dealing with small
files? Would it be worth upgrading to Build 59?

Thank you in advance!




Best Regards,
Jason

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1750044              6836   42%
Anon                      1211203              4731   29%
Exec and libs                7648                29    0%
Page cache                 220434               861    5%
Free (cachelist)           318625              1244    8%
Free (freelist)            659607              2576   16%

Total                     4167561             16279
Physical                  4078747             15932


On 3/23/07, Roch - PAE [EMAIL PROTECTED] wrote:


With latest Nevada setting zfs_arc_max in /etc/system is
sufficient. Playing with mdb on a live system is more
tricky and is what caused the problem here.

-r

[EMAIL PROTECTED] writes:
  Jim Mauro wrote:
 
   All righty...I set c_max to 512MB, c to 512MB, and p to 256MB...
  
 arc::print -tad
   {
...
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t16588228608
   c02e29f8 uint64_t c = 0t33176457216
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t33176457216
   ...
   }
 c02e2a08 /Z 0x2000
   arc+0x48:   0x7b9789000 =   0x2000
 c02e29f8 /Z 0x2000
   arc+0x38:   0x7b9789000 =   0x2000
 c02e29f0 /Z 0x1000
   arc+0x30:   0x3dcbc4800 =   0x1000
 arc::print -tad
   {
   ...
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t268435456  
-- p

   is 256MB
   c02e29f8 uint64_t c = 0t536870912  
-- c

   is 512MB
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t536870912--- 
c_max is

   512MB
   ...
   }
  
   After a few runs of the workload ...
  
 arc::print -d size
   size = 0t536788992

  
  
   Ah - looks like we're out of the woods. The ARC remains clamped 
at 512MB.

 
 
  Is there a way to set these fields using /etc/system?
  Or does this require a new or modified init script to
  run and do the above with each boot?
 
  Darren
 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: zfs boot image conversion kit is posted

2007-04-19 Thread Jim Mauro


I'm not sure I understand the question.
Virtual machines are built by either running a virtualization technology
in a host operating system, such as running VMware Workstation in
Linux, running Parallels in Mac OS X, Linux or Windows, etc.
These are sometimes referred to as Type II VMMs, where the
VMM (Virtual Machine Monitor - the chunk of software responsible
for running the guest operating system) is hosted by a traditional
operating system.

In Type I VMMs, the VMM runs on the hardware. VMware ESX
Server is an example of this (although some argue it is not, since
technically there's an ESX kernel that runs on the hardware in
support of the VMM). So

Building a virtual machine on a zpool would require that the host
operating system supports ZFS. An example here would be our
forthcoming (no, I do not know when), Solaris/Xen integration,
assuming there is support for putting Xen domU's on a ZFS.

It may help to point out that when a virtual machine is created,
it includes defining a virtual hard drive, which is typically just a
file in the file system space of the hosting operating system.
Given that, a hosting operating system that supports ZFS can allow
for configuring virtual hard drives in the ZFS space.
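
As an aside, ZFS will also happily build a pool out of plain files, which
is a quick way to experiment with this layering without any virtualization
at all -- a throwaway example:

# mkfile 256m /var/tmp/vdisk0 /var/tmp/vdisk1
# zpool create testpool mirror /var/tmp/vdisk0 /var/tmp/vdisk1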

So I guess the answer to your question is theoretically yes, but I'm
not aware of an implementation that would allow for such a
configuration that exists today.

I think I just confused the issue...ah well...

/jim

PS - FWIW, I have a zpool configured in nv62 running in a Parallels
virtual machine on Mac OS X. The nv62 system disk is a virtual
hard disk that exists as a file in Mac OS X HFS+, thus this particular
zpool is a partition on that virtual hard drive.



Lori Alt wrote:

I was hoping that someone more well-versed in virtual machines
would respond to this so I wouldn't have to show my ignorance,
but no such luck, so here goes:

Is it even possible to build a virtual machine out of a
zfs storage pool?  Note that it isn't just zfs as a root file system
we're trying out.  It's the whole concept of booting from
a dataset within a storage pool.   I don't know enough about
how one sets up a virtual machine to know whether it's
possible or even meaningful to talk about generating a
b62-on-zfs virtual machine.

Lori

MC wrote:

If the goal is to test ZFS as a root file system, could I suggest 
making a virtual machine of b62-on-zfs available for download?  This 
would reduce duplicated effort and encourage new people to try it out.



This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Fwd: [zfs-discuss] Re: Mac OS X Leopard to use ZFS

2007-06-10 Thread Jim Mauro


Hello -

I think L4 still needs to evolve. BTW, i believe microkernels is the
_right_ way and L4 is a first step in that direction.


Perhaps you could elaborate on this? I thought the microkernel debate ended
in the 1990s, in terms of being a compelling technology direction for kernel
development targetting general purpose computing. Sure, there may be a niche
market for microkernels (which depends, in part, on your definition of 
what a
microkernel is), but in terms of broad applicability, I thought the jury 
was in.


CMU's Mach was the last run at this that had any momentum.

Thank you.
/jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: [storage-discuss] Performance expectations of iscsi targets?

2007-06-19 Thread Jim Dunham

Paul,

While testing iscsi targets exported from thumpers via 10GbE and  
imported 10GbE on T2000s I am not seeing the throughput I expect,  
and more importantly there is a tremendous amount of read IO  
happending on a purely sequential write workload. (Note all systems  
have Sun 10GbE cards and are running Nevada b65.)


The read IO activity you are seeing is a direct result of re-writes  
on the ZFS storage pool. If you were to recreate the test from  
scratch, you would notice that on the very first pass of write I/Os  
from 'dd', there would be no reads. This is an artifact of using  
zvols as backing store for iSCSI Targets.


The iSCSI Target software supports raw SCSI disks, Solaris raw  
devices (/dev/rdsk/), Solaris block devices (/dev/dsk/...),  
zvols, SVM volumes, files in file systems, including temps.





Simple write workload (from T2000):

# time dd if=/dev/zero of=/dev/rdsk/c6t01144F210ECC2A004675E957d0 \
  bs=64k count=100


A couple of things, maybe missing here, or the commands are not true  
cut-n-paste of what is being tested.


1). From the iSCSI initiator, there is no device at  
/dev/rdsk/c6t01144F210ECC2A004675E957d0; note the missing slice (s0,  
s1, s2, etc).


2). Even if one was to specify a slice, as in  
/dev/rdsk/c6t01144F210ECC2A004675E957d0s2, it is unlikely that the LUN  
has been formatted. When I run format the first time, I get the error  
message "Please run fdisk first".


Of course this does not have to be the case, because if the ZFS  
storage pool that backed up this LUN had previously been formatted  
with either a Solaris VTOC or Intel EFI label, then the disk would  
show up correctly.




Performance of iscsi target pool on new blocks:

bash-3.00# zpool iostat thumper1-vdev0 1
thumper1-vdev0  17.4G  2.70T  0526  0  63.6M
thumper1-vdev0  17.5G  2.70T  0564  0  60.5M
thumper1-vdev0  17.5G  2.70T  0  0  0  0
thumper1-vdev0  17.5G  2.70T  0  0  0  0
thumper1-vdev0  17.5G  2.70T  0  0  0  0

Configuration of zpool/iscsi target:

# zpool status thumper1-vdev0
  pool: thumper1-vdev0
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
thumper1-vdev0  ONLINE   0 0 0
  c0t7d0ONLINE   0 0 0
  c1t7d0ONLINE   0 0 0
  c5t7d0ONLINE   0 0 0
  c6t7d0ONLINE   0 0 0
  c7t7d0ONLINE   0 0 0
  c8t7d0ONLINE   0 0 0

errors: No known data errors

The first thing is that for this pool I was expecting 200-300MB/s  
throughput, since it is a simple stripe across 6, 500G disks.  In  
fact, a direct local workload (directly on thumper1) of the same  
type confirms what I expected:


bash-3.00# dd if=/dev/zero of=/dev/zvol/rdsk/thumper1-vdev0/iscsi  
bs=64k count=100 


bash-3.00# zpool iostat thumper1-vdev0 1
thumper1-vdev0  20.4G  2.70T  0  2.71K  0   335M
thumper1-vdev0  20.4G  2.70T  0  2.92K  0   374M
thumper1-vdev0  20.4G  2.70T  0  2.88K  0   368M
thumper1-vdev0  20.4G  2.70T  0  2.84K  0   363M
thumper1-vdev0  20.4G  2.70T  0  2.57K  0   327M

The second thing, is that when overwriting already written blocks  
via the iscsi target (from the T2000) I see a lot of read bandwidth  
for blocks that are being completely overwritten.  This does not  
seem to slow down the write performance, but 1) it is not seem in  
the direct case; and 2) it consumes channel bandwidth unnecessarily.


bash-3.00# zpool iostat thumper1-vdev0 1
thumper1-vdev0  8.90G  2.71T279783  31.7M  95.9M
thumper1-vdev0  8.90G  2.71T281318  31.7M  29.1M
thumper1-vdev0  8.90G  2.71T139  0  15.8M  0
thumper1-vdev0  8.90G  2.71T279  0  31.7M  0
thumper1-vdev0  8.90G  2.71T139  0  15.8M  0

Can anyone help to explain what I am seeing, or give me some  
guidance on diagnosing the cause of the following:

- The bottleneck in accessing the iscsi target from the T2000


From the iSCSI Initiator's point of view, there are various  
(Negotiated) Login Parameters, which may have a direct effect on  
performance. Take a look at iscsiadm list target --verbose, then  
consult the iSCSI man pages, or documentation online at docs.sun.com.


Remember to keep track of what you change on a per-target basis, and  
only change one parameter at a time, and measure your results.


- The cause of the extra read bandwidth when overwriting blocks on  
the iscsi target from the T2000.


ZFS as the backing store, and its COW (Copy-on-Write) behavior in  
maintaining the ZFS zvols within the storage pool.






Any help is much appreciated,
paul
___
storage-discuss mailing list
[EMAIL PROTECTED]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss




Jim Dunham
Solaris, Storage Software

[zfs-discuss] ZFS test suite released on OpenSolaris.org

2007-06-26 Thread Jim Walker
The ZFS test suite is being released today on OpenSolaris.org along with
the Solaris Test Framework (STF), Checkenv and Runwattr test tools.

The source tarball, binary package and baseline can be downloaded from the test
consolidation download center at http://dlc.sun.com/osol/test/downloads/current.
And, the source code can be viewed in the Solaris Test Collection (STC) 2.0
source tree at: 
http://cvs.opensolaris.org/source/xref/test/ontest-stc2/src/suites/zfs.

The STF, Checkenv and Runwattr packages must be installed prior to executing
a ZFS test run. More information is available in the ZFS README file and on the
ZFS test suite webpage at: http://opensolaris.org/os/community/zfs/zfstestsuite.

Any questions about the ZFS test suite can be sent to zfs discuss at:
http://www.opensolaris.org/os/community/zfs/discussions.
Any questions about STF, and the test tools can be sent to testing discuss at: 
http://www.opensolaris.org/os/community/testing/discussions.

Happy Hunting,
Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Sharemgr Test Suite Released on OpenSolaris.org

2007-07-23 Thread Jim Walker
The Sharemgr test suite is available on OpenSolaris.org.  
  
The source tarball, binary package and baseline can be downloaded from the test
consolidation download center at:
http://dlc.sun.com/osol/test/downloads/current

The source code can be viewed in the Solaris Test Collection (STC) 2.0 source
tree at:
http://cvs.opensolaris.org/source/xref/test/ontest-stc2/src/suites/share

The SUNWstc-tetlite package must be installed prior to executing a Sharemgr
test run. More information on the Sharemgr test suite is available in the
Sharemgr README file at:
http://src.opensolaris.org/source/xref/test/ontest-stc2/src/suites/share/README

Any questions about the Sharemgr test suite can be sent to testing discuss at:
http://www.opensolaris.org/os/community/testing/discussions 

Cheers,  
Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does iSCSI target support SCSI-3 PGR reservation ?

2007-07-27 Thread Jim Dunham

 A quick look through the source would seem to indicate that the  
 PERSISTENT RESERVE commands are not supported by the Solaris ISCSI  
 target at all.

Correct. There is an RFE outstanding for iSCSI Target to implement  
PGR for both raw SCSI-3 devices, and block devices.

http://bugs.opensolaris.org/view_bug.do?bug_id=6415440


 http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/ 
 iscsi/iscsitgtd/t10_spc.c


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Solaris, Storage Software Group

Sun Microsystems, Inc.
1617 Southwood Drive
Nashua, NH 03063
Email: [EMAIL PROTECTED]
http://blogs.sun.com/avs



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] New version of the ZFS test suite released

2007-08-03 Thread Jim Walker
Version 1.8 of the ZFS test suite was released today on opensolaris.org.

The ZFS test suite source tarballs, packages and baseline can be
downloaded at:
http://dlc.sun.com/osol/test/downloads/current/

The ZFS test suite source can be browsed at:
http://src.opensolaris.org/source/xref/test/ontest-stc2/src/suites/zfs/  

More information on the ZFS test suite is at:
http://opensolaris.org/os/community/zfs/zfstestsuite/

Questions about the ZFS test suite can be sent to zfs-discuss at:
http://www.opensolaris.org/jive/forum.jspa?forumID=80

Cheers,
Jim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] do zfs filesystems isolate corruption?

2007-08-11 Thread Jim Dunham
Chris,

 In the old days of UFS, on occasion one might create multiple file  
 systems (using multiple partitions) of a large LUN if filesystem  
 corruption was a concern.  It didn’t happen often  but filesystem  
 corruption has happened.  So, if filesystem X was corrupt  
 filesystem Y would be just fine.

 With ZFS, does the same logic hold true for two filesystems coming  
 from the same pool?

For the purposes of isolating corruption, the separation of two or  
more filesystems coming from the same ZFS storage pool does not help.  
An entire ZFS storage pool is the unit of I/O consistency, as all ZFS  
filesystems created within this single storage pool share the same  
physical storage.

When configuring a ZFS storage pool, the [poor] decision of choosing a  
non-redundant (single disk or concatenation of disks) versus a redundant  
(mirror, raidz, raidz2) storage pool offers no means for ZFS to  
automatically recover from some forms of corruption.

Even when using a redundant storage pool, there are scenarios in  
which this is not good enough. This is when filesystem needs turn  
into availability needs, such as when the loss or inaccessibility  
of two or more disks causes mirroring or raidz to be ineffective.

As of Solaris Express build 68, Availability Suite [http:// 
www.opensolaris.org/os/project/avs/] is part of base Solaris,  
offering both local snapshots and remote mirrors, both of which work  
with ZFS.

Locally on a single Solaris host, snapshots of the entire ZFS storage  
pool can be taken at intervals of one's choosing, and with multiple  
snapshots of a single master, collections of snapshots, say at  
intervals of one hour, can be retained. Options allow for 100%  
independent snapshots (much like your UFS analogy above), dependent  
snapshots where only the Copy-On-Write data is retained, or compact  
dependent snapshots where the snapshot's physical storage is some  
percentage of the master.
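
A rough sketch of enabling such a point-in-time set with iiadm (the volume  
paths are purely illustrative; "ind" selects an independent shadow):

iiadm -e ind /dev/rdsk/c1t0d0s0 /dev/rdsk/c2t0d0s0 /dev/rdsk/c3t0d0s0
iiadm -u s /dev/rdsk/c2t0d0s0      # refresh the shadow at the next interval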

Remotely, between two or more Solaris hosts, remote mirrors of the  
entire ZFS storage pool can be configured, where synchronous  
replication can offer zero data loss, or asynchronous replication can  
offer near-zero data loss, both offering write-order, on-disk  
consistency. A key aspect of remote replication with Availability  
Suite is that the replicated ZFS storage pool can be quiesced on the  
remote node and accessed, or in a disaster recovery scenario, take  
over instantly where the primary left off. When the primary site is  
restored, the MTTR (Mean Time To Recovery) is essentially zero, since  
Availability Suite supports on-demand pull, so yet-to-be-replicated  
blocks are retrieved synchronously, allowing the ZFS filesystem and  
applications to be resumed without waiting for a potentially lengthy  
resynchronization.
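
Again just a sketch -- enabling a remote mirror for one volume with sndradm  
looks roughly like this (hosts, volumes and bitmaps are placeholders; one  
such set is needed per device backing the pool, grouped for write-order  
consistency):

sndradm -e hostA /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
        hostB /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 ip async g zpoolgrp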



 Said slightly differently, I’m assuming that if the pool becomes  
 mangled some how then all filesystems will be toast … but is it  
 possible to have one filesystem be corrupted while the other  
 filesystems are fine?

 Hmmm, does the answer depend on if the filesystems are nested
 ex: 1  /my_fs_1  /my_fs_2
 ex: 2  /home_dirs/home_dirs/chris

 TIA!


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Solaris, Storage Software Group

Sun Microsystems, Inc.
1617 Southwood Drive
Nashua, NH 03063
Email: [EMAIL PROTECTED]
http://blogs.sun.com/avs



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Under the Hood Presentation Slides

2007-08-17 Thread Jim Mauro

Is the referenced Laminated Handout on slide 3 available anywhere in any 
form electronically?

If not, I'd be happy to create an electronic copy and make it publicly 
available.

Thanks,
/jim


Joy Marshall wrote:
 It's taken a while but at last we have been able to post the ZFS Under the 
 Hood presentation slides from the session back at May's LOSUG.

 You can view both the presentation slides and a layered overview here:

 Presentation: 
 http://www.opensolaris.org/os/community/os_user_groups/losug/ZFS-UTH_3_v1.1_LOSUG.pdf

 Overview: 
 http://www.opensolaris.org/os/community/os_user_groups/losug/ZFS-UTH_LayeredOverview_v2.3.pdf

 Joy
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Single SAN Lun presented to 4 Hosts

2007-08-26 Thread Jim Dunham
Rainer,

If you are looking for a means to safely READ any filesystem,  
please take a look at Availability Suite.

One can safely take Point-in-Time copies of any Solaris supported  
filesystem, including ZFS, at any snapshot interval of one's  
choosing, and then access the shadow volume on any system within the  
SAN, be it Fibre Channel or iSCSI. If the node wanting access to the  
data is distant, Availability Suite also offers Remote Replication.

http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/

Jim

 Ronald,

 thanks for your comments.

 I was thinking about this scenario:

 Host w continuously has a UFS mounted with read/write access.
 Host w writes to the file f/ff/fff.
 Host w ceases to touch anything under f.
 Three hours later, host r mounts the file system read-only,
 reads f/ff/fff, and unmounts the file system.

 My assumption was:

 a1) This scenario won't hurt w,
 a2) this scenario won't damage the data on the file system,
 a3) this scenario won't hurt r, and
 a4) the read operation will succeed,

 even if w continues with arbitrary I/O, except that it doesn't
 touch anything under f until after r has unmounted the file system.

 Of course everything that you and Tim and Casper said is true,
 but I'm still inclined to try that scenario.

 Rainer
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Solaris, Storage Software Group

Sun Microsystems, Inc.
1617 Southwood Drive
Nashua, NH 03063
Email: [EMAIL PROTECTED]
http://blogs.sun.com/avs



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, XFS, and EXT4 compared

2007-08-30 Thread Jim Mauro

I'll take a look at this. ZFS provides outstanding sequential IO performance
(both read and write). In my testing, I can essentially sustain 
hardware speeds
with ZFS on sequential loads. That is, assuming 30-60MB/sec per disk 
sequential
IO capability (depending on hitting inner or out cylinders), I get 
linear scale-up
on sequential loads as I add disks to a zpool, e.g. I can sustain 
250-300MB/sec
on a 6 disk zpool, and it's pretty consistent for raidz and raidz2.
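
(For reference, the kind of quick sequential test I mean is roughly the  
following -- pool and file names are just placeholders:)

# time dd if=/dev/zero of=/tank/bigfile bs=128k count=80000
# zpool iostat tank 1        # watch the sustained write bandwidth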

Your numbers are in the 50-90MB/second range, or roughly 1/2 to 1/4 what was
measured on the other 2 file systems for the same test. Very odd.

Still looking...

Thanks,
/jim

Jeffrey W. Baker wrote:
 I have a lot of people whispering zfs in my virtual ear these days,
 and at the same time I have an irrational attachment to xfs based
 entirely on its lack of the 32000 subdirectory limit.  I'm not afraid of
 ext4's newness, since really a lot of that stuff has been in Lustre for
 years.  So a-benchmarking I went.  Results at the bottom:

 http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html

 Short version: ext4 is awesome.  zfs has absurdly fast metadata
 operations but falls apart on sequential transfer.  xfs has great
 sequential transfer but really bad metadata ops, like 3 minutes to tar
 up the kernel.

 It would be nice if mke2fs would copy xfs's code for optimal layout on a
 software raid.  The mkfs defaults and the mdadm defaults interact badly.

 Postmark is somewhat bogus benchmark with some obvious quantization
 problems.

 Regards,
 jwb

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Jim Mauro
About 2 years ago I was able to get a little closer to the patent 
litigation process,
by way of giving a deposition in litigation that was filed against Sun 
and Apple
(and has been settled).

Apparently, there's an entire sub-economy built on patent litigation 
among the
technology players. Suits, counter-suits, counter-counter-suits, etc, 
are just
part of every day business. And the money that gets poured down the drain!

Here's an example. During my deposition, the lawyer questioning me opened
a large box, and removed 3 sets of a 500+ slide deck created by myself and
Richard McDougall for seminars and tutorials on Solaris. Each set was
color print on heavy, glossy paper. That represented color printing of about
1600 pages total. All so the attorney could question me about 2 of the 
slides.

I almost fell off my chair

/jim



Rob Windsor wrote:
 http://news.com.com/NetApp+files+patent+suit+against+Sun/2100-1014_3-6206194.html

 I'm curious how many of those patent filings cover technologies that 
 they carried over from Auspex.

 While it is legal for them to do so, it is a bit shady to inherit 
 technology (two paths; employees departing Auspex and the Auspex 
 bankruptcy asset buyout), file patents against that technology, and then 
 open suits against other companies based on (patents covering) that 
 technology.

 (No, I'm not defending Sun in it's apparent patent-growling, either, it 
 all sucks IMO.)

 Rob++
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] question about uberblock blkptr

2007-09-17 Thread Jim Mauro

Hey Max - Check out the on-disk specification document at
http://opensolaris.org/os/community/zfs/docs/.

Page 32 illustration shows the rootbp pointing to a dnode_phys_t
object (the first member of a objset_phys_t data structure).

The source code indicates ub_rootbp is a blkptr_t, which contains
a 3 member array of dva_t 's called blk_dva (blk_dva[3]).
Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]).

So it looks like each blkptr_t contains 3 128-bit DVAs (the blk_dva[3] array).
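
(As a cross-check, zdb can dump the active uberblock and its rootbp  
directly -- something like the following, using your pool name, should  
show the same DVAs you pulled out by hand:)

# zdb -uuu usbhard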

You probably figured all this out already...did you try using
an objset_phys_t to format the data?

Thanks,
/jim



[EMAIL PROTECTED] wrote:
 Hi All,
 I have modified mdb so that I can examine data structures on disk using 
 ::print.
 This works fine for disks containing ufs file systems.  It also works 
 for zfs file systems, but...
 I use the dva block number from the uberblock_t to print what is at the 
 block
 on disk.  The problem I am having is that I can not figure out what (if 
 any) structure to use.
 All of the xxx_phys_t types that I try do not look right.  So, the 
 question is, just what is
 the structure that the uberblock_t dva's refer to on the disk?

 Here is an example:

 First, I use zdb to get the dva for the rootbp (should match the value 
 in the uberblock_t(?)).

 # zdb - usbhard | grep -i dva
 Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 DMU 
 objset] 400L/200P DVA[0]=0:111f79000:200 DVA[1]=0:506bde00:200 
 DVA[2]=0:36a286e00:200 fletcher4 lzjb LE contiguous birth=621838 
 fill=167 cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb
 bp = [L0 DMU objset] 400L/200P DVA[0]=0:111f6ae00:200 
 DVA[1]=0:502efe00:200 DVA[2]=0:36a284e00:200 fletcher4 lzjb LE 
 contiguous birth=621838 fill=34026 
 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
 Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp [L0 
 DMU objset] 400L/200P DVA[0]=0:111f6ae00:200 DVA[1]=0:502efe00:200 
 DVA[2]=0:36a284e00:200 fletcher4 lzjb LE contiguous birth=621838 
 fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
 first block: [L0 ZIL intent log] 9000L/9000P 
 DVA[0]=0:36aef6000:9000 zilog uncompressed LE contiguous birth=263950 
 fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1
 ^C
 #

 Then I run my modified mdb on the vdev containing the usbhard pool
 # ./mdb /dev/rdsk/c4t0d0s0

 I am using the DVA[0} for the META data set above.  Note that I have 
 tried all of the xxx_phys_t structures
 that I can find in zfs source, but none of them look right.  Here is 
 example output dumping the data as a objset_phys_t.
 (The shift by 9 and adding 40 is from the zfs on-disk format paper, 
 I have tried without the addition, without the shift,
 in all combinations, but the output still does not make sense).

   (111f790009)+40::print zfs`objset_phys_t
 {
 os_meta_dnode = {
 dn_type = 0x4f
 dn_indblkshift = 0x75
 dn_nlevels = 0x82
 dn_nblkptr = 0x25
 dn_bonustype = 0x47
 dn_checksum = 0x52
 dn_compress = 0x1f
 dn_flags = 0x82
 dn_datablkszsec = 0x5e13
 dn_bonuslen = 0x63c1
 dn_pad2 = [ 0x2e, 0xb9, 0xaa, 0x22 ]
 dn_maxblkid = 0x20a34fa97f3ff2a6
 dn_used = 0xac2ea261cef045ff
 dn_pad3 = [ 0x9c2b4541ab9f78c0, 0xdb27e70dce903053, 
 0x315efac9cb693387, 0x2d56c54db5da75bf ]
 dn_blkptr = [
 {
 blk_dva = [
 {
 dva_word = [ 0x87c9ed7672454887, 
 0x760f569622246efe ]
 }
 {
 dva_word = [ 0xce26ac20a6a5315c, 
 0x38802e5d7cce495f ]
 }
 {
 dva_word = [ 0x9241150676798b95, 
 0x9c6985f95335742c ]
 }
 ]
 None of this looks believable.  So, just what is the rootbp in the 
 uberblock_t referring to?

 thanks,
 max


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] io:::start and zfs filenames?

2007-09-26 Thread Jim Mauro

Hi Neel - Thanks for pushing this out. I've been tripping over this for 
a while.

You can instrument zfs_read() and zfs_write() to reliably track filenames:

#!/usr/sbin/dtrace -s

#pragma D option quiet

zfs_read:entry,
zfs_write:entry
{
printf("%s of %s\n", probefunc, stringof(args[0]->v_path));
}



I'm not sure why the io:::start does not work for ZFS. I didn't spend 
any real time on this, but it appears none of the ZFS code calls 
bdev_strategy() directly, and instrumenting bdev_strategy:entry (which 
is where io:::start lives) to track filenames via 
stringof(args[0]->b_vp->v_path) does not work either.

Use the zfs r/w function entry points for now.

What sayeth the ZFS team regarding the use of a stable DTrace provider 
with their file system?

Thanks,
/jim


Neelakanth Nadgir wrote:
 io:::start probe does not seem to get zfs filenames in
args[2]->fi_pathname. Any ideas how to get this info?
 -neel

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] io:::start and zfs filenames?

2007-09-26 Thread Jim Mauro

 What sayeth the ZFS team regarding the use of a stable DTrace provider 
 with their file system?
   

For the record, the above has a tone to it that I really did not intend 
(antagonistic?), so

I had a good chat with Roch about this. The file pathname is derived via 
a translator
from the vnode v_path structure member, and thus requires an 
instantiated vnode
when the probe fires - this is why instrumenting bdev_strategy:entry and 
tracing
args[0]->b_vp->v_path has the same problem; no vnode.

An alternative approach to tracking filenames with IOs is using the 
fsinfo provider
(Solaris 10 Update 2) . This is a handy place to start:

#!/usr/sbin/dtrace -s

#pragma D option quiet

fsinfo:::
/ execname != "dtrace" /
{
        @[execname, args[0]->fi_pathname, args[0]->fi_fs, probename] = count();
}

END
{
        printf("%-16s %-24s %-8s %-16s %-8s\n",
            "EXEC", "PATH", "FS", "NAME", "COUNT");
        printa("%-16s %-24s %-8s %-16s %@8d\n", @);
}

Which yields...

EXEC PATH FS   NAME COUNT  
gnome-panel  /zp  ufs  lookup   1  
gnome-panel  /zp/home zfs  lookup   1  
gnome-panel  /zp/home/mauroj  zfs  lookup   1  
gnome-panel  /zp/home/mauroj/.recently-used.xbel.HKF3YT zfs  
getattr  1  
gnome-panel  /zp/home/mauroj/.recently-used.xbel.HKF3YT zfs  
lookup   1  
snip
metacity unknownsockfs   poll 1031   
vmware-user  unknownsockfs   poll 1212   
Xorg unknownsockfs   rwlock   1573   
Xorg unknownsockfs   rwunlock 1573   
gnome-terminal   unknownsockfs   poll 2084   
dbwriter /zp/spacezfs  realvp   4254   
dbwriter /zp/spacezfs  remove   4254   
dbwriter /zp/space/f33zfs  close4254   
dbwriter /zp/space/f33zfs  lookup   4254   
dbwriter /zp/space/f33zfs  read 4254   
dbwriter /zp/space/f33zfs  realvp   4254   
dbwriter /zp/space/f33zfs  seek 4254   
dbwriter /zp/space/f33zfs  write4254   
dbwriter /zp/spacezfs  getsecattr   4255   
dbwriter /zp/space/f33zfs  ioctl4255   
dbwriter /zp/space/f33zfs  open 4255   
dbwriter unknownzfs  create   4255   
dbwriter /zp/space/f33zfs  rwunlock 8508   
dbwriter /zp/spacezfs  lookup   8509   
dbwriter /zp/space/f33zfs  rwlock   8509   
dbwriter /zp  ufs  lookup   8515   

Thanks,
/jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] io:::start and zfs filenames?

2007-09-26 Thread Jim Mauro

Hey Neel - Try this:

nv70b cat zfs_page.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

zfs_putpage:entry
{
printf("zfs write to %s\n", stringof(args[0]->v_path));
}
zfs_getpage:entry
{
printf("zfs read from %s\n", stringof(args[0]->v_path));
}


I did some quick tests with mmap'd ZFS files, and it seems to work

/jim


Neelakanth Nadgir wrote:
 Jim I can't use zfs_read/write as the file is mmap()'d so no read/write!

 -neel

 On Sep 26, 2007, at 5:07 AM, Jim Mauro [EMAIL PROTECTED] wrote:

   
 Hi Neel - Thanks for pushing this out. I've been tripping over this  
 for a while.

 You can instrument zfs_read() and zfs_write() to reliably track  
 filenames:

 #!/usr/sbin/dtrace -s

 #pragma D option quiet

 zfs_read:entry,
 zfs_write:entry
 {
   printf("%s of %s\n", probefunc, stringof(args[0]->v_path));
 }



 I'm not sure why the io:::start does not work for ZFS. I didn't  
 spend any real time on this,
 but it appears none of the ZFS code calls bdev_strategy() directly,  
 and
 instrumenting bdev_strategy:enter (which is where io:::start lives)  
 to track
 filenames via stringof(args[0]->b_vp->v_path) does not work either.

 Use the zfs r/w function entry points for now.

 What sayeth the ZFS team regarding the use of a stable DTrace  
 provider with their file system?

 Thanks,
 /jim


 Neelakanth Nadgir wrote:
 
 io:::start probe does not seem to get zfs filenames in
 args[2]->fi_pathname. Any ideas how to get this info?
 -neel

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolar
   
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Jim Mauro

Hey Roch -
 We do not retain 2 copies of the same data.

 If the DB cache is made large enough to consume most of memory,
 the ZFS copy will quickly be evicted to stage other I/Os on
 their way to the DB cache.

 What problem does that pose ?

Can't answer that question empirically, because we can't measure this, but
I imagine there's some overhead to ZFS cache management in evicting and
replacing blocks, and that overhead could be eliminated if ZFS could be
told not to cache the blocks at all.

Now, obviously, whether this overhead would be in the noise level, or
something that actually hurts sustainable performance will depend on
several things, but I can envision scenarios where it's overhead I'd
rather avoid if I could.

Thanks,
/jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Jim Mauro

 Where does the win come from with directI/O?  Is it 1), 2), or some  
 combination?  If its a combination, what's the percentage of each  
 towards the win?
   
That will vary based on workload (I know, you already knew that ... :^).
Decomposing the performance win between what is gained as a result of 
single writer
lock breakup and no caching is something we can only guess at, because, 
at least
for UFS, you can't do just one - it's all or nothing.
 We need to tease 1) and 2) apart to have a full understanding.  

We can't. We can only guess (for UFS).

My opinion - it's a must-have for ZFS if we're going to get serious 
attention
in the database space. I'll bet dollars-to-donuts that, over the next 
several years,
we'll burn many tens-of-millions of dollars on customer support 
escalations that
come down to memory utilization issues and contention between database
specific buffering and the ARC. This is entirely my opinion (not that of 
Sun),
and I've been wrong before.

Thanks,
/jim



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS File system and Oracle raw files compatibility

2007-10-19 Thread Jim Mauro

If the question is can Oracle files (datafiles, log files, etc) exist 
on a ZFS, the
answer is absolutely yes. More simply put, can you configure your Oracle 
database
on ZFS - absolutely.

The question, as stated, is confusing, because the term compatible can 
have pretty
broad meaning. So, I answered the question I think you wanted to ask.

Thanks,
/jim


Dale Pannell wrote:

 I have a customer that would like to know if the ZFS file system is 
 compatible with Oracle raw files.

  

 Any help you can provide is greatly appreciated.  Please respond 
 directly to me since I am not part of the zfs-discuss email alias.

  

  //Dale Pannell//
 SR Systems Engineer
 Office: 972.546.4111
 Mobile: 214.284.6057
 Email:  [EMAIL PROTECTED]
 *Sun Storage Group*

  

 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS mirroring

2007-10-22 Thread Jim Dunham

Mertol,


Hi;

 Do any of you know when ZFS remote mirroring will be available ?


Host-based replication of ZFS, and all other Solaris filesystems is  
available using Sun StorageTek Available Suite. AVS has been part of  
OpenSolaris since build 68.


http://www.opensolaris.org/os/project/avs/

Jim Dunham
Storage Platform Software Group

Sun Microsystems, Inc.
1617 Southwood Drive
Nashua, NH 03063
http://blogs.sun.com/avs



regards




Mertol Ozyoney
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iSCSI target using ZFS filesystem as backing

2007-11-21 Thread Jim Dunham
John,

 I'm working on a Sun Ultra 80 M2 workstation. It has eight 750 GB  
 SATA disks installed. I've tried the following on both ON build 72,  
 Solaris 10 update 4, and Indiana with the same results.

 If I create a ZFS filesystem using 1-7 hard drives (I've tried 1  
 and 7), and then try to make an iSCSI target on that pool, when a  
 client machine tries to access the iSCSI volume, the memory usage  
 on the Ultra 80 goes to the same size as the ZFS filesystem. For  
 example:

 I'm creating a RaidZ ZFS pool:
 zpool create -f telephone raidz c9d0 c10d0 c11d0 c12d0 c13d0 c14d0  
 c15d0

 I then create a two terabyte filesystem on that zvol:
 zfs create -V 2000g telephone/jelley

 And make it into an iSCSI target:
 iscsitadm create target -b /dev/zvol/dsk/telephone/jelley jelley

Try changing from a cached ZVOL to a raw ZVOL

iscsitadm create target -b /dev/zvol/rdsk/telephone/jelley jelley   (note rdsk, the raw device, not dsk)

You can also try:

zfs set shareiscsi=on telephone/jelley

- Jim


 Now if I perform a 'iscsitadm list target', the iSCSI target  
 appears like it should:
 Target: jelley
 iSCSI Name: iqn.1986-03.com.sun:02:fcaa1650-f202-4fef-b44b- 
 b9452a237511.jelley
 Connections: 0

 Now when I try to connect to it with my Windows 2003 server running  
 the MS iSCSI initiator, I see the memory usage climb to the point  
 that the totally exhausts all available physical memory (prstat):

PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/ 
 NLWP
511 root 2000G  106M sleep   590   0:02:58 1.1%  
 iscsitgtd/15
   2139 root 8140K 4204K sleep   590   0:00:00 0.0% sshd/1
   2164 root 3276K 2740K cpu1490   0:00:00 0.0% prstat/1
   2144 root 2672K 1752K sleep   490   0:00:00 0.0% bash/1
574 noaccess  173M   92M sleep   590   0:03:18 0.0% java/25

 Do you see the iscsitgtd process trying to use 2000 gigabytes of  
 RAM?  I can sit there and hold down spacebar while the Windows  
 workstation is trying to access it, and the memory usage climbs at  
 an astronomical rate, until it exhausts all the available memory on  
 the box (several hundred megabytes per minute). The total ram it  
 tries to allocate depends totally on the size of the iSCSI volume.  
 If it's a 1000 megabyte volume, then it only allocates a gig... if  
 it's 600 gigs, it tries to allocate 600 gigs.

 Now here is the real kicker. I took this down to as simple of a  
 configuration as possible--one single drive with a ZFS filesystem  
 on it. The memory utilization was the same. I then tried creating  
 the iSCSI target on a UFS filesystem. Everything work beautifully,  
 and memory utilization was no longer directly proportional to the  
 size of the iSCSI volume.

 If I create something small, like a 100 gig iSCSI target, the  
 system does eventually get around to finishing and releases the  
 ram. When what's really strange is when I try to access the iSCSI  
 volume, the memory usage then climbs megabyte per megabyte until it  
 is exhausted, and then access to the iSCSI volume is terribly slow.

 I can copy a 300 meg file in just six seconds when the memory  
 utilization on the iscsitgtd process is low. But if I try a 2.5 gig  
 file, once it get's about 1500 megs into it, performance drops  
 about 99.9% and it's incredibly slow... again, until it's done and  
 the iscsitgtd releases the ram, then it's plenty zippy for small IO  
 operations.

 Has anybody else been making iSCSI targets on ZFS pools?

 I've had a case open with Sun since Oct 3, if any Sun folks want to  
 look at the details (case #65684887).

 I'm getting very desperate to get this fixed, as this massive  
 amount of storage was the only reason I got this M80...

 Any pointers would be greatly appreciated.

 Thanks-
 John Tracy


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group

Sun Microsystems, Inc.
1617 Southwood Drive
Nashua, NH 03063


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-12-13 Thread Jim Mauro
Would you two please SHUT THE F$%K UP.

Dear God, my kids don't go on like this.

Please - let it die already.

Thanks very much.

/jim


can you guess? wrote:
 Hello can,

 Thursday, December 13, 2007, 12:02:56 AM, you wrote:

 cyg On the other hand, there's always the
 possibility that someone
 cyg else learned something useful out of this.  And
 my question about

 To be honest - there's basically nothing useful in
 the thread,
 perhaps except one thing - doesn't make any sense to
 listen to you.
 

 I'm afraid you don't qualify to have an opinion on that, Robert - because you 
 so obviously *haven't* really listened.  Until it became obvious that you 
 never would, I was willing to continue to attempt to carry on a technical 
 discussion with you, while ignoring the morons here who had nothing 
 whatsoever in the way of technical comments to offer (but continued to babble 
 on anyway).

 - bill
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What does dataset is busy actually mean?

2007-12-13 Thread Jim Klimov
I've hit the problem myself recently, and mounting the filesystem cleared 
something in the brains of ZFS and allowed me to snapshot.

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg00812.html
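
In other words, roughly (with whatever your pool/filesystem names are):

# zfs mount tank/myfs
# zfs snapshot tank/myfs@now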

PS: I'll use Google before asking some questions,  a'la (C) Bart Simpson
That's how I found your question ;)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with array-level block replication (TrueCopy, SRDF, etc.)

2007-12-14 Thread Jim Dunham
With Point-in-Time  
Copy software, the software can be configured to automatically take a  
snapshot prior to re-synchronization, and automatically delete the  
snapshot if completed successfully. The use of I/O consistency groups  
assure that not only are the replicas write-order consistent during  
replication, but also that snapshots taken prior to re- 
synchronization are consistent too.


 Thanks

 Steve


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Auto backup and auto restore of ZFS via Firewire drive

2007-12-17 Thread Jim Klimov
It's good he didn't mail you, now we all know some under-the-hood details via 
Googling ;)

Thanks to both of you for this :)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backup/replication system

2008-01-10 Thread Jim Dunham
Łukasz K wrote:

 Hi
I'm using ZFS on few X4500 and I need to backup them.
 The data on source pool keeps changing so the online replication
 would be the best solution.

As I know AVS doesn't support ZFS - there is a problem with
 mounting backup pool.

This is not true, if replication is configured correctly.
Where are you getting information about the aforementioned problem?

Have you looked at the following?

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/


Other backup systems (disk-to-disk or block-to-block) have the
 same problem with mounting ZFS pool.
I hope I'm wrong ?

In case of any problem I want the backup pool to be operational
 within 1 hour.

 Do you know any solution ?

 --Lukas

 


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backup/replication system

2008-01-10 Thread Jim Dunham
Eric,


 On Jan 10, 2008, at 4:50 AM, Łukasz K wrote:

 Hi
I'm using ZFS on few X4500 and I need to backup them.
 The data on source pool keeps changing so the online replication
 would be the best solution.

As I know AVS doesn't support ZFS - there is a problem with
 mounting backup pool.
Other backup systems (disk-to-disk or block-to-block) have the
 same problem with mounting ZFS pool.
I hope I'm wrong ?

In case of any problem I want the backup pool to be operational
 within 1 hour.

 Do you know any solution ?

 If it doesn't need to be synchronous, then you can use 'zfs send -R'.

The prior statement could lead one to believe that 'zfs send -R' is  
asynchronous replication, which it is not.

The functionality ZFS provides via send/recv is known as time-fixed,  
or snapshot replication. Here, a non-changing data source, the  
snapshot, is synchronized from the source to destination node based on  
either a full or differential set of changes.

Unlike synchronous or asynchronous replication, where data is  
continuously replicated in a write-order consistent manner, time-fixed  
replication is discontinuous, often driven by taking periodic  
snapshots of the changing data, performing the differential  
synchronization of the non-changing source data to the remote host,  
then waiting until the next interval.

The most common problem with time-fixed replication is trying to  
determine, or calculate the periodic interval to use, since its  
optimal value is based on many variables, most of which are changing  
over time and usage patterns.
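
For completeness, the periodic flavor Eric refers to is roughly the  
following on recent builds -- pool, snapshot and host names here are  
purely hypothetical:

# zfs snapshot -r tank@monday
# zfs send -R -I tank@sunday tank@monday | ssh backuphost zfs recv -Fd backup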



 eric

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Break a ZFS mirror and concatenate the disks

2008-01-10 Thread Jim Dunham
Kory,

 Yes, I get it now. You want to detach one of the disks and then readd
 the same disk, but lose the redundancy of the mirror.

 Just as long as you realize you're losing the redundancy.

 I'm wondering if zpool add will complain. I don't have a system to
 try this at the moment.

The correct, just verified steps are as follows:

zpool detach moodle c2t3d0
zpool add moodle c2t3d0

I performed these steps  while the zpool was online, under heavy I/O,  
with an I/O tool that does data validation. When done, I then  
performed a final zpool scrub moodle, with no issues, and then  
revalidated all the data.

As stated earlier, sacrificing redundancy (RAID 1 mirroring) for  
double the storage (RAID 0 concatenation) is being penny wise, and  
pound foolish.

Jim


 Cindy

 Kory Wheatley wrote:
 Currently c2t2d0 c2t3d0 are setup in a mirror.  I want to break the  
 mirror and save the data on c2t2d0 (which both drives are 73g.   
 Then I want to concatenate c2t2do to c2t3d0 so I have a pool of  
 146GB no longer in a mirror just concatenated.  But since their  
 mirror right now I need the data save on one disk so I don't lose  
 everything.  I don't need to add new disks that not an option I  
 want to break the mirror so I can expand the disks together in a  
 pool but save the data.


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iscsi on zvol

2008-01-24 Thread Jim Dunham
Jan,

 I'm wondering if it's possible to import a zpool on an iscsi-device  
 LOCALLY.

 Following scenario:

 HostA (Sol10u4):
 - Pool-1 (a striped-raidz-pool)
 - iscsi-zvol on Pool-1

 HostB (Sol10u3):
 - Pool-2 is a Mirror of one local device and the iscsi-vol of HostA

 Is ist possible to mount the iscsi-vol (or import Pool-2) on HostA?


No, due to a common misconception in the iSCSI space about the  
relationship between an iSCSI Target's backing store and the resulting  
LUN as seen by iSCSI Initiators.

On HostA, the ZVOL called iscsi-zvol has a volume size, a size  
specified in the "zfs create -V <size> Pool-1/iscsi-zvol". When an  
iSCSI Target is created out of this ZVOL, the iSCSI Initiator  
discovers and enables this LUN on HostB, but this LUN is unformatted.  
In other words, this LUN does not contain a Solaris VTOC or an Intel  
EFI disk label, as it's just a bunch of blocks. When issuing "zpool  
create Pool-2 mirror <local-disk> <iscsi-vol>", an Intel EFI disk  
label is placed on the disk (consuming some of the blocks), then all  
the remaining space is placed in partition (or slice) 0, after which  
ZFS lays down its filesystem metadata in the space occupied by  
partition 0.

Now back on HostA, the ZVOL ( /dev/zvol/rdsk/Pool-1/iscsi-zvol ) looks  
like a bunch of blocks. Since this is a ZVOL, not a SCSI or iSCSI LUN,  
Solaris does not see the Intel EFI disk label, thus ZFS will not be  
able to see the ZFS filesystem metadata.

So even though the ZVOL contains all the right data, from the point of  
view of Solaris, this disk is not a LUN, and thus can not be accessed  
as such.

Jim



 I know, this is (also) iSCSI-related, but mostly a ZFS-question.

 Thanks for your answers,
 Jan Dreyer
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iscsi on zvol

2008-01-24 Thread Jim Dunham
After posting my reply to the initial note on this thread, and then  
reading it again, I have some followup comments:

The following statement should have said  ... this ZVOL is not a  
LUN, .

 So even though the ZVOL contains all the right data, from the point of
 view of Solaris, this disk is not a LUN, and thus can not be accessed
 as such.


But then could it be?

On HostA, where the ZVOL (iscsi-zvol) is served out as an iSCSI  
Target, there is nothing to prevent the iSCSI Initiator on HostA from  
discovering the iSCSI Target on its own node. Doing so will create an  
iSCSI LUN, which will be seen by Solaris. This is an example of iSCSI  
loopback, which works quite well.
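
Roughly, that loopback on HostA amounts to something like the following  
(the address is illustrative):

# iscsiadm add discovery-address 127.0.0.1:3260
# iscsiadm modify discovery --sendtargets enable
# devfsadm -i iscsi      # the LUN then shows up and can be zpool import'ed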

This raises a key point that you should be aware of: ZFS does not  
support shared access to the same ZFS filesystem.

If the ZFS storage pool Pool-2 is currently imported on HostB, an  
attempt to zpool import the iSCSI LUN on HostA will have ZFS report  
that this zpool is being accessed on another host, which it is, HostB.  
Do not try to force a zpool import of this iSCSI LUN, or a Solaris  
panic will soon follow. (See key point above.)

If the ZFS storage pool Pool-2 is currently exported on HostB, an  
attempt to zpool import the iSCSI LUN on HostA will work, except that  
now 1/2 of the mirrored zpool will be missing, since it's a  
local device on HostB, and therefore not accessible. Maybe the local  
device on HostB should also be an iSCSI Target too.

One more thing. ZFS and iSCSI start and stop at different times during  
Solaris boot and shutdown, so I would recommend using legacy mount  
points, or manual zpool import / exports when trying configurations at  
this level.

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [Fwd: Re: Presales support on ZFS]

2008-01-29 Thread Jim Dunham
Enrico,


 Hello
 I'm offering a solution based on our disks where replication and
 storage management should be made using only ZFS...
 The test change few bytes on one file ( 10bytes ) and check how many
 bytes the source sends to target.
 The customer tried the replication between 2 volume...They compared
 ZFS replica with true copy replica and they realized the following
 considerations:

  1. ZFS uses a block bigger than HDS true copy
  2. true copy sends 32Kbytes and ZFS 100K and more changing only 10
 file bytes

 Can we configure ZFS to improve replication efficiencies ?

 The solution should consider 5 remote site replicating on one  
 central
 data-center. Considering the zfs block overhead the customer is
 thinking to buy a solution based on traditional storage arrays like
 HDS entry level arrays ( our 2530/2540 ). If so ..with the ZFS the
 network traffic, storage space become big problems for the customer
 infrastructures.

 Are there any documentation explaining internal ZFS replication
 mechanism to face the customer doubts ? Thanks
 Do we need of AVS in our solution to solve the problem ?

AVS, not unlike HDS, does block-based replication based on actual  
write I/Os to configured devices.  Therefore if changing 10 bytes  
results in ZFS writing 100KB or more, AVS will be essentially no  
different than HDS in this specific area.

Of course this begs the question: is the measure of a 10-byte change to  
a given file a viable metric for choosing one form of replication  
over another? I think, or would hope, not.  What is needed is a  
characterization of the application(s) write-rate to one or more ZFS  
filesystems, against the customer's requirements for data replication.

A good place to start is:
http://www.sun.com/storagetek/white-papers/data_replication_strategies.pdf
http://www.sun.com/storagetek/white-papers/enterprise_continuity.pdf



 Thanks

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL controls in Solaris 10 U4?

2008-01-29 Thread Jim Mauro


http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29

The above link shows how to disable the ZIL for testing purposes (it's 
not generally recommended to keep it disabled in production).
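
(The tunable that page describes is a one-liner in /etc/system -- again,  
for testing only -- and the "move the ZIL to other storage" piece is the  
separate log device syntax in recent Nevada builds; the device name below  
is just an example:)

set zfs:zil_disable = 1

# Nevada-only, as of recent builds:
# zpool add tank log c4t0d0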

As to the putback schedule of recent ZFS features into Solaris 10, I'm 
afraid I
don't have the information. Hopefully, someone else will know...

Thanks,
/jim

Jonathan Loran wrote:
 Is it true that Solaris 10 u4 does not have any of the nice ZIL controls 
 that exist in the various recent Open Solaris flavors?  I would like to 
 move my ZIL to solid state storage, but I fear I can't do it until I 
 have another update.  Heck, I would be happy to just be able to turn the 
 ZIL off to see how my NFS on ZFS performance is affected before spending 
 the $'s.  Anyone know when we will see this in Solaris 10?

 Thanks,

 Jon

   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS replication strategies

2008-02-01 Thread Jim Dunham
Erast,

 Take a look at NexentaStor - it's a complete 2nd tier solution:

 http://www.nexenta.com/products

 and AVS is nicely integrated via a management RPC interface which
 connects multiple NexentaStor nodes together and greatly simplifies
 AVS usage with ZFS... See the demo here:

 http://www.nexenta.com/demos/auto-cdp.html

Very nice job. It's refreshing to see something I know all too well,
with an updated management interface, and a good portion of the
plumbing hidden away.

- Jim



 On Fri, 2008-02-01 at 10:15 -0800, Vincent Fox wrote:
 Does anyone have any particularly creative ZFS replication  
 strategies they could share?

 I have 5 high-performance Cyrus mail-servers, with about a terabyte
 of storage each, of which only 200-300 gigs is used, even
 including 14 days of snapshot space.

 I am thinking about setting up a single 3511 with 4 terabytes of  
 storage at a remote site as a backup device for the content.   
 Struggling with how to organize the idea of wedging 5 servers into  
 the one array though.

 The simplest way that occurs to me is one big RAID-5 storage pool with all
 disks.  Then slice out 5 LUNs, each as its own ZFS pool.  Then use
 zfs send / receive to replicate the pools.
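
 A minimal sketch of that send/receive step (pool, dataset and host names here are made up):

 # zfs snapshot mail1pool/spool@replica1
 # zfs send mail1pool/spool@replica1 | ssh backuphost zfs receive backup/spool1
   (later, ship only the changes since the previous snapshot)
 # zfs snapshot mail1pool/spool@replica2
 # zfs send -i mail1pool/spool@replica1 mail1pool/spool@replica2 | ssh backuphost zfs receive backup/spool1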

 Ideally I'd love it if ZFS directly supported the idea of rolling
 snapshots out onto slower secondary storage disks on the SAN, but
 in the meanwhile it looks like we have to roll our own solutions.


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mounting a copy of a zfs pool /file system while original is still active

2008-02-04 Thread Jim Dunham

Darren J Moffat wrote:

Dave Lowenstein wrote:

Nope, doesn't work.

Try presenting one of those lun snapshots to your host, run cfgadm - 
al,

then run zpool import.


#zpool import
no pools available to import


Does format(1M) see the luns ?  If format(1M) can't see them it is
unlikely that ZFS will either.


It would make my life so much simpler if you could do something like
this: zpool import --import-as yourpool.backup yourpool


 zpool import [-o mntopts] [ -o property=value] ... [-d dir |
 -c cachefile] [-D] [-f] [-R root] pool | id [newpool]

 Imports a specific pool. A pool can be identified by its
 name or the numeric identifier. If newpool is specified,
 the pool is imported using the name newpool.  Otherwise,
 it is imported with the same name as its exported name.


Given that the pool is a snapshot of one or more vdevs in an existing
ZFS storage pool, not only is the name identical, so is the
numeric identifier. It can be determined that when using zpool
import, duplicates are suppressed, even if those duplicates are
entirely separate vdevs containing block-based snapshots, physical
copies, remote mirrors or iSCSI Targets.


The steps to reproduce this behavior on a single node, using files and
standard Solaris utilities, are as follows:


# mkfile 500m /var/tmp/pool_file
# zpool create pool /var/tmp/pool_file
# zpool status pool
  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAME  STATEREAD WRITE CKSUM
pool  ONLINE  0 0 0
  /var/tmp/pool_file  ONLINE  0 0 0

errors: No known data errors

# zpool export pool
# dd if=/var/tmp/pool_file of=/var/tmp/pool_snapshot
  { wait, wait, wait, ... more on this later ...}
1024000+0 records in
1024000+0 records out
# zpool import -d /var/tmp
  pool: pool
id: 14424098069460077054
 state: ONLINE
action: The pool can be imported using its name or numeric identifier
config:

pool  ONLINE
  /var/tmp/pool_file  ONLINE

Question: What happened to the other copy of the ZFS storage pool,
/var/tmp/pool_snapshot?


Answer: Its presence is suppressed by zpool import. If one were to
move /var/tmp/pool_file to some other directory,
/var/tmp/pool_snapshot would then appear.


# mv /var/tmp/pool_file /var/pool_file
# zpool import -d /var/tmp
  pool: pool
id: 14424098069460077054
 state: ONLINE
action: The pool can be imported using its name or numeric identifier
config:

pool  ONLINE
  /var/tmp/pool_snapshot  ONLINE  0 0 0

At this point, if one were to go ahead with the import of pool (which
would work) and then move /var/pool_file back to /var/tmp/pool_file, its
presence would again be suppressed. Conversely, if the move were done
first and then a zpool import attempted, again only one storage pool
would exist at any given time.


Clearly there is some explicit suppressing of duplicate storage pools
going on here. Browsing the ZFS code looking for an answer, the logic
surrounding vdev_inuse() seems to cause this behavior, expected or not.


http://cvs.opensolaris.org/source/search?q=vdev_inuse&project=%2Fonnv

=

As mentioned earlier, the {wait, wait, wait, ...} can be eliminated
by using Availability Suite Point-in-Time Copy, by itself or in
combination with Availability Suite Remote Copy or iSCSI Target, all
of which are present in OpenSolaris today, and all are much faster than
the dd utility.


As one who supports both Availability Suite and the iSCSI Target, not
suppressing duplicate pool names and pool identifiers, in combination
with a rename on import (zpool import <pool> <newname>), would provide
a means to support various copies, or nearly identical copies, of a ZFS
storage pool on the same Solaris host.


While browsing the ZFS source code, I noticed that usr/src/cmd/ztest/ztest.c
includes ztest_spa_rename(), a ZFS test which renames a ZFS
storage pool to a different name, tests the pool under its new name,
and then renames it back. I wonder why this functionality was not
exposed as part of zpool support?


- Jim



# zpool import foopool barpool



--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
work: 781.442.4042
cell:   603-724-3972

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [Fwd: Re: Presales support on ZFS]

2008-02-11 Thread Jim Dunham

Enrico,


 Is there any forecast to improve the efficiency of the replication
 mechanisms of ZFS? Fishworks - the new NAS release?


I would take some time to talk with the customer and understand exactly
what their expectations are for replication. I would not base my
decision on the cost of replicating 10 bytes, regardless of how
inefficient it may be.


These two documents should help:

http://www.sun.com/storagetek/white-papers/data_replication_strategies.pdf
http://www.sun.com/storagetek/white-papers/enterprise_continuity.pdf

Two key metrics of replication are:

Recovery Point Objective (RPO) is the amount of data lost (at most),
measured as a unit of time. Once-a-day backups yield a 24 hour RPO,
once-an-hour snapshots yield a ~1 hour RPO, asynchronous replication
yields an RPO of zero seconds to a few minutes, and synchronous
replication means a zero second RPO.


Recovery Time Objective (RTO) is the amount of time after a failure
until normal operations are restored. Tape backups could take minutes
to hours; local snapshots could be nearly instantaneous, assuming the
local site survived the failure. Remote snapshots or replicas could take
minutes, hours or days, depending on the amount of data to
resynchronize, impacted by network bandwidth and latency.


Availability Suite has a unique feature in this last area, called
on-demand pull. Assuming that the primary site's volumes are lost, after
they have been re-provisioned a reverse update can be initiated.
Besides the background resilvering in the reverse direction being
active, eventually restoring all lost data, on-demand pull performs
synchronous replication of data blocks on demand, as needed by the
filesystem, database or application. Although the performance will be
less than synchronous replication, the RTO is quite low. This type of
recovery is analogous to losing one's entire email account and having
recovery initiated, but with selected email able to be opened as needed
before the entire volume is restored, using on-demand requests to
satisfy the data blocks for the relevant email.


Jim





Considering the solution we are offering to our customer (5 remote
sites replicating to one central data-center) with ZFS (the cheapest
solution), should I expect 3 times the network load of a solution
based on SNDR/AVS, and 3 times the storage space too... correct?

Is there any documentation on that?
Thanks

Richard Elling wrote:

Enrico Rampazzo wrote:

Hello
I'm offering a solution based on our disks where replication and
storage management should be made using only ZFS...
The test changes a few bytes in one file (10 bytes) and checks how
many bytes the source sends to the target.
The customer tried the replication between 2 volumes... They compared
ZFS replica with true copy replica and they realized the following
considerations:

 1. ZFS uses a block bigger than HDS true copy



ZFS uses dynamic block sizes.  Depending on the configuration and
workload, just a few disk blocks will change, or a bunch of redundant
metadata might change.  In either case, changing the ZFS recordsize
will make little, if any, difference.

 2. true copy sends 32Kbytes while ZFS sends 100K or more when changing only
10 file bytes

Can we configure ZFS to improve replication efficiencies ?



By default, ZFS writes two copies of metadata. I would not recommend
reducing this because it will increase your exposure to faults. What may
be happening here is that a 10 byte write may cause a metadata change
resulting in a minimum of three 512 byte physical blocks being
changed. The metadata copies are spatially diverse, so you may see
these three blocks starting at non-contiguous boundaries.  If Truecopy
sends only 32kByte blocks (speculation), then the remote transfer will
be 96kBytes for 3 local, physical block writes.
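
(As an illustrative aside, one can watch the physical write traffic that a
tiny logical change generates; the pool name and file path below are only
placeholders.)

# zpool iostat -v tank 5 &                            (per-vdev write operations and bandwidth)
# dd if=/dev/urandom of=/tank/fs/testfile bs=10 count=1
# sync                                                (the physical writes show up at the next transaction group commit)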

OTOH, ZFS will coalesce writes.  So you may be able to update a
number of files yet still only replicate 96kBytes through Truecopy.
YMMV.

Since the customer is performing replication, I'll assume they are very
interested in data protection, so keeping the redundant metadata is a
good idea. The customer should also be aware that replication at the
application level is *always* more efficient than replicating somewhere
down the software stack where you lose data context.
-- richard


The solution should consider 5 remote sites replicating to one
central data-center. Considering the ZFS block overhead the
customer is thinking of buying a solution based on traditional storage
arrays like HDS entry-level arrays (our 2530/2540). If so, with
ZFS the network traffic and storage space become big problems for
the customer's infrastructure.

Is there any documentation explaining the internal ZFS replication
mechanism to address the customer's doubts? Thanks

Do we need AVS in our solution to solve the problem?


Thanks



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http

Re: [zfs-discuss] iscsi core dumps when under IO

2008-02-22 Thread Jim Dunham
Stephen,

 I am getting a strange issue when using zfs/iscsi shares out of it.

 When I have attached a CentOS 5 initiator to the zfs target it
 works fine normally, until I start doing heavy 100MB/s+ copies to a
 separate cfs/nfs export on the same zfs pool.

 the error I am getting is:

 [ Feb 20 10:41:07 Stopping because process dumped core. ]
 [ Feb 20 10:41:07 Executing stop method (/lib/svc/method/svc- 
 iscsitgt stop 143) ]

 I was wondering if anyone had any ideas. I am running 10U4 with all  
 of the latest and greatest patches. Thank you.

There are a set of issues regarding the iSCSI Target under load that
have recently been resolved in Nevada. We are looking at backporting
these changes to S10.

The nature of the failure appears to be an iSCSI Initiator seeing long
service times (in seconds), triggering a LUN reset. The LUN reset
causes all I/O specific to that LUN to be cleaned up. Given the
multi-threaded nature of the iSCSI Target, the odds are pretty high that
cleanup from every possible I/O state could be hit, and some of those
states were not handled correctly.

The following commands are likely to show the reason the process
dumped core, namely an assert in the T10 state machine.

# mdb /core
::status
::quit


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Copying between pools

2008-03-14 Thread Jim Dunham
Vahid,

 We need to move about 1T of data from one zpool on an EMC DMX-3000 to
 another storage device (DMX-3). The DMX-3 can be made visible on the same
 host where the DMX-3000 is being used, or from another host.
 What is the best way to transfer the data from the DMX-3000 to the DMX-3?
 Is it possible to add the new DMX as a sub-mirror of the old DMX and,
 after the sync is finished, remove the old DMX from the mirror?

See: zpool replace [-f] pool old_device [new_device]
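
A minimal sketch of that, assuming the new DMX-3 LUN is already visible to the
host (the device names are placeholders):

# zpool replace tank c2t0d0 c3t0d0     (resilvers the old LUN's data onto the new one)
# zpool status tank                    (watch until the resilver completes; the old device is then detached)

The mirror approach you describe should also work: zpool attach pool old_device
new_device, wait for the resilver to finish, then zpool detach pool old_device.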

- Jim


 Thank you,


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iSCSI targets mapped to a VMWare ESX server

2008-04-07 Thread Jim Dunham

Mertol Ozyoney wrote:



Hi All ;


There are a set of issues being looked at that prevent the VMWare ESX  
server from working with the Solaris iSCSI Target.


http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6597310

At this time there is no target date for when these issues will be resolved.

Jim




We are running the latest Solaris 10 on an X4500 Thumper. We defined a test
iSCSI LUN. Output below:


Target: AkhanTemp/VM
iSCSI Name: iqn.1986-03.com.sun:02:72406bf8-2f5f-635a-f64c-cb664935f3d1

Alias: AkhanTemp/VM
Connections: 0
ACL list:
TPGT list:
LUN information:
LUN: 0
GUID: 01144fa709302a0047fa50e6
VID: SUN
PID: SOLARIS
Type: disk
Size:  100G
Backing store: /dev/zvol/rdsk/AkhanTemp/VM
Status: online

We tried to access the LUN from a Windows laptop, and it worked
without any problems. However, VMware ESX 3.2 Server is unable to
access the LUNs. We checked that the virtual interface can ping the
X4500.
Sometimes it sees the LUN, but 200+ LUNs with the same properties
are listed and we can't add them as storage. Then after a rescan they
vanish.


Any help appraciated


Mertol

Mertol Ozyoney
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving zfs pool to new machine?

2008-05-07 Thread Jim Dunham
Steve,

 Can someone tell me or point me to links that describe how to
 do the following.

 I had a machine that crashed and I want to move to a newer machine
 anyway.  The boot disk on the old machine is fried.  The two disks I  
 was
 using for a zfs pool on that machine need to be moved to a newer  
 machine
 now running 2008.05 OpenSolaris.

 What is the procedure for getting back the pool on the new machine and
 not losing any of the files I had in that pool?  I searched the docs,
 but did not find a clear answer to this and experimenting with various
  zfs and zpool commands did not show the two disks or their contents.

To see all available pools to import:

zpool import

This list should include your prior storage pool's name. Then, to import it:

zpool import pool-name
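
If the old machine never exported the pool before it died (likely, given the
crash), zpool import may complain that the pool appears to be in use by
another system; in that case force the import. A sketch, with the pool name
being whatever yours was called:

zpool import -f mypool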

- Jim



  The new disks are c6t0d0s0 and c6t1d0s0.  They are an identical set of disks
  that were set up as a mirrored pool on the old machine.

 Thanks,

 Steve Christensen
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Image with DD from ZFS partition

2008-05-08 Thread Jim Dunham
Hans,

 hello,
 Can I create an image from ZFS with the dd command?

Yes, with restrictions.

First, a ZFS storage pool must be in the exported state (zpool export) to be
copied, so that a write-order consistent set of data exists in the
copy. ZFS does an excellent job of detecting inconsistencies in the
volumes making up a single ZFS storage pool, so a copy of an imported
storage pool is sure to be inconsistent, and thus unusable by ZFS.

Although there are various means to copy ZFS (actually, to copy the
individual vdevs in a single ZFS storage pool), one cannot zpool
import this copy on the same node as the original ZFS storage
pool. Unlike other Solaris filesystems, ZFS maintains metadata on each
vdev that is used to reconstruct a ZFS storage pool at zpool import
time. The logic within zpool import processing will correctly find
all constituent volumes (vdevs) of a single ZFS storage pool, but
ultimately hides / excludes other volumes (the copies) from being
considered as part of the current or any other zpool import
operation. Only the original, not its copy, can be seen or utilized
by zpool import.

If possible, the ZFS copy can be moved or accessed (using dual-ported  
disks, FC SAN, iSCSI SAN, Availability Suite, etc.) from another host,  
and then only there can the ZFS copy undergo a successful zpool  
import.
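
For the simple case being asked about, the flow would look roughly like the
following sketch (device names, paths and the backup location are
placeholders; the pool must remain exported while the copy is taken):

# zpool export tank
# dd if=/dev/dsk/c0t0d0s0 of=/backup/tank-image bs=1024k     (repeat for every vdev in the pool)
# zpool import tank                                          (bring the original back into service)
  ... move /backup/tank-image to a different host, then on that host ...
# zpool import -d /backup tank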

As a slight segue, Availability Suite (AVS) can create an instantly
accessible copy of the constituent volumes (vdevs) of a ZFS storage
pool (in lieu of using dd, which can take minutes or hours). This is
the Point-in-Time Copy, or II (Instant Image), part of AVS. This copy
can also be replicated to a remote Solaris host where it can be
imported. This is the Remote Copy, or SNDR (Network Data Replicator),
part of AVS.  AVS also supports the ability to synchronously or
asynchronously replicate the actual ZFS storage pool to another
host (no local copy needed), and then zpool import the replica
remotely.

See: opensolaris.org/os/project/avs/, plus the demos.



 When I work with Linux I use partimage to create an image from one
 partition and store it on another, so I can restore it if there is an error.
 partimage does not work with zfs, so I must use the dd command.
 I think something like:
 dd if=/dev/sda1 of=/backup/image
 Can I create an image this way, and restore it the other way:
 dd if=/backup/image of=/dev/sda1
 When I have two partitions with zfs, can I boot from the live CD and
 mount one partition to use it as the backup target?
 Or is it possible to create an ext2 partition and use a Linux rescue
 CD to back up the zfs partition with dd?


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SMC Webconsole 3.1 and ZFS Administration 1.0 - stacktraces in snv_b89

2008-05-29 Thread Jim Klimov
I've installed SXDE (snv_89) and found that the web console only listens on 
https://localhost:6789/ now, and the module for ZFS admin doesn't work.

When I open the link, the left frame lists a stacktrace (below) and the right 
frame is plain empty. Any suggestions?

I tried substituting different SUNWzfsgr and SUNWzfsgu packages from older 
Solarises (x86/sparc, snv_77/84/89, sol10u3/u4), and directly substituting the 
zfs.jar file, but these actions resulted in either the same error or 
crash-and-restart of SMC Webserver.

I didn't yet try installing older SUNWmco* packages (a 10u4 system with SMC 
3.0.2 works OK); I'm not sure it's a good idea ;)

The system has JDK 1.6.0_06 by default, maybe that's the culprit? I tried 
setting it to JDK 1.5.0_15 and the zfs web-module refused to start and register 
itself...

===
Application Error
com.iplanet.jato.NavigationException: Exception encountered during forward
Root cause = [java.lang.IllegalArgumentException: No enum const class 
com.sun.zfs.common.model.AclInheritProperty$AclInherit.restricted]
Notes for application developers:

* To prevent users from seeing this error message, override the 
onUncaughtException() method in the module servlet and take action specific to 
the application
* To see a stack trace from this error, see the source for this page

Generated Thu May 29 17:39:50 MSD 2008
===

In fact, the traces in the logs are quite long (several screenfuls) and nearly 
the same; this one starts as:
===
com.iplanet.jato.NavigationException: Exception encountered during forward
Root cause = [java.lang.IllegalArgumentException: No enum const class 
com.sun.zfs.common.model.AclInheritProperty$AclInherit.restricted]
at com.iplanet.jato.view.ViewBeanBase.forward(ViewBeanBase.java:380)
at com.iplanet.jato.view.ViewBeanBase.forwardTo(ViewBeanBase.java:261)
at 
com.iplanet.jato.ApplicationServletBase.dispatchRequest(ApplicationServletBase.java:981)
at 
com.iplanet.jato.ApplicationServletBase.processRequest(ApplicationServletBase.java:615)
at 
com.iplanet.jato.ApplicationServletBase.doGet(ApplicationServletBase.java:459)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
...
===
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Liveupgrade snv_77 with a ZFS root to snv_89

2008-05-29 Thread Jim Klimov
We have a test machine installed with a ZFS root (snv_77/x86 and 
rootpool/rootfs with GRUB support).

We recently tried to update it to snv_89, which (per the Flag Days list) claims 
more support for ZFS boot roots, but the installer disk didn't find any previously 
installed operating system to upgrade.

Then we tried to install the SUNWlu* packages from the snv_89 disk onto the snv_77 
system. The package updates themselves worked, but lucreate fails:

# lucreate -n snv_89
ERROR: The system must be rebooted after applying required patches.
Please reboot and try again.

Apparently we rebooted a lot and it did not help...

How can we upgrade the system?

In particular, how does LU do it? :)

Now working on an idea to update all existing packages in the cloned root, 
using pkgrm/pkgadd -R. Updating only some packages didn't help much (kernel, 
zfs, libs).
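
Something along these lines, assuming the clone is mounted at /a and the snv_89 
media is mounted at /cdrom (paths and the package name are only illustrative):

# pkgrm -R /a SUNWzfsu
# pkgadd -R /a -d /cdrom/Solaris_11/Product SUNWzfsu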

A backup plan is to move the ZFS root back to UFS, update and move it back. 
Probably would work, but not an elegant job ;)

Suggestions welcome, maybe we'll try out some of them and report ;)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Liveupgrade snv_77 with a ZFS root to snv_89

2008-05-30 Thread Jim Klimov
You mean this:
https://www.opensolaris.org/jive/thread.jspa?threadID=46626tstart=120

Elegant script, I like it, thanks :)

Trying now...

Some patching follows:
-for fs in `zfs list -H | grep ^$ROOTPOOL/$ROOTFS | awk '{ print $1 };'`
+for fs in `zfs list -H | grep ^$ROOTPOOL/$ROOTFS | grep -w $ROOTFS | grep -v '@' | awk '{ print $1 };'`

In essence, skip snapshots (@) and non-rootpool/rootfs/subfs paths.

On my system I happen to have both problems (a clone rootpool/rootfs_snv77 
and some snapshots of both the clone and rootfs).

Alas, so far the upgrade didn't get going (ttinstall doesn't see the old 
system, neither ZFS root nor the older UFS SVM-mirror root), although 
rootpool/rootfs got mounted to /a. I'm now reboot-and-retrying - perhaps early 
tests and script rewrites/reruns messed something up.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Liveupgrade snv_77 with a ZFS root to snv_89

2008-05-30 Thread Jim Klimov
Alas, didn't work so far.

Could the problem be that the zfs-root disk is not the first on the controller 
(the system boots from GRUB on the older ufs-root slice), and/or that the zfs root 
is mirrored? Or that I have snapshots and a data pool too?

These are the boot disks (SVM mirror with ufs and grub):
[EMAIL PROTECTED] /]# metastat -c
d1   m  4.0GB d12 d10
d12  s  4.0GB c3t2d0s0
d10  s  4.0GB c3t0d0s0

This is the actual system:
[EMAIL PROTECTED] /]# zpool status
  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
pool  ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c3t0d0s3  ONLINE   0 0 0
c3t1d0s3  ONLINE   0 0 0
c3t2d0s3  ONLINE   0 0 0
c3t3d0s3  ONLINE   0 0 0

errors: No known data errors

  pool: rootpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rootpool  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c3t1d0s0  ONLINE   0 0 0
c3t3d0s0  ONLINE   0 0 0

errors: No known data errors
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

