Re: [zfs-discuss] /bin/cp vs /usr/gnu/bin/cp
Hello, I came across this blog post:

http://kevinclosson.wordpress.com/2007/03/15/copying-files-on-solaris-slow-or-fast-its-your-choice/

and would like to hear from you performance gurus how this 2007 article relates to the 2010 ZFS implementation. What should I use, and why?

[ WARNING : red herring non sequitur follows ]

My PATH looks like so :

$ echo $PATH
/opt/SUNWspro/bin:/usr/xpg6/bin:/usr/xpg4/bin:/usr/ccs/bin:/usr/bin:/usr/sbin:/bin:/sbin

Thus I have no such issues with the GNU vs. OpenGroup/POSIX compliance tools.

Dennis
Re: [zfs-discuss] zfs iostat - which unit bit vs. byte
Hi--

ZFS command operations involving disk space take input and display using numeric values specified as exact values, or in a human-readable form with a suffix of B, K, M, G, T, P, E, Z for bytes, kilobytes, megabytes, gigabytes, terabytes, petabytes, exabytes, or zettabytes.

Let's play a game here. :-)

Suppose you wanted a 1 PB zpool and you wanted dedup. How much memory would you need for that, and would you separate out ZIL cache etc.? I'm guessing ( total WAG ) that one would want at least 16 TB of memory. I know of no system out there that can take that many 8G ECC DIMMs ( 2048 of them ). But really .. what sort of theoretical machine would be needed to handle a single 1024 TB zpool ?

Dennis
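ps: as a rough back-of-the-envelope, assuming 128K average blocks and something on the order of 320 bytes of core per DDT entry ( both of those numbers are my guesses, not gospel ) :

$ echo ' 2^50 / ( 128 * 1024 ) * 320 / 2^30 ' | bc
2560

So call it roughly 2.5 TB of dedup table for 1 PB of unique 128K blocks, and several times that if the average block is small. Still not a machine I know how to buy, but a long way short of 16 TB.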
Re: [zfs-discuss] Dedup... still in beta status
I think, with current bits, it's not a simple matter of OK for enterprise, not OK for desktops. With an SSD for either main storage or L2ARC, and/or enough memory, and/or a not very demanding workload, it seems to be OK.

The main problem is not performance (for a home server that is not a problem)... but what really is a BIG PROBLEM is when you try to delete a snapshot that is a little big... (try it yourself... create a big random file with 90 GB of data... then snapshot... then delete the file and delete the snapshot... you will see)... and better... try removing the SSD disk.

Just out of curiosity... my test system (8 GB RAM)... takes over 30 hours to delete a dataset of 1.7 TB (still not finished...)... and the system does not respond (it is working but does not respond... not even to a simple ls command)

--

Hold on a sec. I have been lurking in this thread for a while for various reasons and only now does a thought cross my mind worth posting :

Are you saying that a reasonably fast computer with 8GB of memory is entirely non-responsive due to a ZFS related function? Does the machine respond to ping? If there is a GUI, does the mouse pointer move? Does the keyboard numlock key respond at all ?

I just find it very hard to believe that such a situation could exist, as I have done some *abusive* tests on a SunFire X4100 with Sun 6120 fibre arrays ( in HA config ) and I could not get it to become a warm brick like you describe.

How many processors does your machine have ?

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] Dedup performance hit
You are severely RAM limited. In order to do dedup, ZFS has to maintain a catalog of every single block it writes and the checksum for that block. This is called the Dedup Table (DDT for short). So, during the copy, ZFS has to (a) read a block from the old filesystem, (b) check the current DDT to see if that block exists and (c) either write the block to the new filesystem (and add an appropriate DDT entry for it), or write a metadata update with the dedup block reference.

Likely, you have two problems:

(1) I suspect your source filesystem has lots of blocks (that is, it's likely made up of smaller-sized files). Lots of blocks means lots of seeking back and forth to read all those blocks.

(2) Lots of blocks also means lots of entries in the DDT. It's trivial to overwhelm a 4GB system with a large DDT. If the DDT can't fit in RAM, then it has to get partially refreshed from disk.

Thus, here's what's likely going on:

(1) ZFS reads a block and its checksum from the old filesystem
(2) it checks the DDT to see if that checksum exists
(3) finding that the entire DDT isn't resident in RAM, it starts a cycle to read the rest of the (potential) entries from the new filesystem's metadata. That is, it tries to reconstruct the DDT from disk. Which involves a HUGE amount of random seek reads on the new filesystem.

In essence, since you likely can't fit the DDT in RAM, each block read from the old filesystem forces a flurry of reads from the new filesystem. Which eats up the IOPS that your single pool can provide. It thrashes the disks.

Your solution is to either buy more RAM, or find something you can use as an L2ARC cache device for your pool. Ideally, it would be an SSD. However, in this case, a plain hard drive would do OK (NOT one already in a pool). To add such a device, you would do: 'zpool add tank cache mycachedevice'

That was an awesome response! Thank you for that :-)

I tend to config my servers with 16G of RAM minimum these days and now I know why.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
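ps: for anyone who wants to try this at home, a minimal sketch ( the device name is a placeholder, and note the cache keyword .. without it you add a data vdev rather than an L2ARC ) :

# zdb -DD tank                   # DDT entry counts and in-core sizes
# zpool add tank cache c4t2d0    # attach the device as an L2ARC
# zpool iostat -v tank 5         # watch the cache device fill up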
Re: [zfs-discuss] swap - where is it coming from?
Re-read the section on Swap Space and Virtual Memory for particulars on how Solaris does virtual memory mapping, and the concept of Virtual Swap Space, which is what 'swap -s' is really reporting on.

The Solaris Internals book is awesome for this sort of thing. A bit over the top in detail, but awesome regardless.

--
Dennis
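ps: for the archives, the two views side by side .. the numbers will never agree because they measure different things :

# swap -l    # physical swap devices and their free 512-byte blocks
# swap -s    # virtual swap accounting, which includes physical memory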
Re: [zfs-discuss] one more time: pool size changes
            c4t2004CFA4D655d0s0  ONLINE       0     0     0
            c4t2004CF9B63D0d0s0  ONLINE       0     0     0

So the manner in which any given IO transaction gets to the ZFS filesystem just gets ever more complicated and convoluted, and it makes me wonder if I am tossing away performance to get higher levels of safety.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] zfs/lofi/share panic
FWIW (even on a freshly booted system after a panic)

# lofiadm zyzzy.iso
/dev/lofi/1
# mount -F hsfs /dev/lofi/1 /mnt
mount: /dev/lofi/1 is already mounted or /mnt is busy
# mount -O -F hsfs /dev/lofi/1 /mnt
# share /mnt
#

If you unshare /mnt and then do this again, it will panic. This has been a bug since before OpenSolaris came out.

I just tried this with a UFS based filesystem just for a lark.

r...@aequitas:/# mkdir /testfs
r...@aequitas:/# mount -F ufs -o noatime,nologging /dev/dsk/c0d1s0 /testfs
r...@aequitas:/# ls -l /testfs/sol\-nv\-b130\-x86\-dvd.iso
-rw-r--r--   1 root     root     3818782720 Feb  5 16:02 /testfs/sol-nv-b130-x86-dvd.iso
r...@aequitas:/# lofiadm -a /testfs/sol-nv-b130-x86-dvd.iso
May 27 21:08:58 aequitas pseudo: pseudo-device: lofi0
May 27 21:08:58 aequitas genunix: lofi0 is /pseudo/lofi@0
May 27 21:08:58 aequitas rootnex: xsvc0 at root: space 0 offset 0
May 27 21:08:58 aequitas genunix: xsvc0 is /xsvc@0,0
May 27 21:08:58 aequitas pseudo: pseudo-device: devinfo0
May 27 21:08:58 aequitas genunix: devinfo0 is /pseudo/devinfo@0
/dev/lofi/1
r...@aequitas:/# mount -F hsfs -o ro /dev/lofi/1 /mnt
r...@aequitas:/# share -F nfs -o nosub,nosuid,sec=sys,ro,anon=0 /mnt

Then at a Sol 10 server :

# uname -a
SunOS jupiter 5.10 Generic_142900-11 sun4u sparc SUNW,Sun-Fire-480R
# dfshares aequitas
RESOURCE           SERVER    ACCESS    TRANSPORT
aequitas:/mnt      aequitas  -         -
#
# mount -F nfs -o bg,intr,nosuid,ro,vers=4 aequitas:/mnt /mnt
# ls /mnt
Copyright                    autorun.inf
JDS-THIRDPARTYLICENSEREADME  autorun.sh
License                      boot
README.txt                   installer
Solaris_11                   sddtool
Sun_HPC_ClusterTools
# umount aequitas:/mnt
# dfshares aequitas
RESOURCE           SERVER    ACCESS    TRANSPORT
aequitas:/mnt      aequitas  -         -

Then back at the snv_138 box I unshare and re-share and ... nothing bad happens.

r...@aequitas:/# unshare /mnt
r...@aequitas:/# share -F nfs -o nosub,nosuid,sec=sys,ro,anon=0 /mnt
r...@aequitas:/# unshare /mnt
r...@aequitas:/#

Guess I must now try this with a ZFS fs under that iso file.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
On 05-17-10, Thomas Burgess wonsl...@gmail.com wrote:

psrinfo -pv shows:

The physical processor has 8 virtual processors (0-7)
  x86 (AuthenticAMD 100F91 family 16 model 9 step 1 clock 200 MHz)
        AMD Opteron(tm) Processor 6128    [ Socket: G34 ]

That's odd. Please try this :

# kstat -m cpu_info -c misc
module: cpu_info                        instance: 0
name:   cpu_info0                       class:    misc
        brand                           VIA Esther processor 1200MHz
        cache_id                        0
        chip_id                         0
        clock_MHz                       1200
        clog_id                         0
        core_id                         0
        cpu_type                        i386
        crtime                          3288.24125364
        current_clock_Hz                1199974847
        current_cstate                  0
        family                          6
        fpu_type                        i387 compatible
        implementation                  x86 (CentaurHauls 6A9 family 6 model 10 step 9 clock 1200 MHz)
        model                           10
        ncore_per_chip                  1
        ncpu_per_chip                   1
        pg_id                           -1
        pkg_core_id                     0
        snaptime                        1526742.97169617
        socket_type                     Unknown
        state                           on-line
        state_begin                     1272610247
        stepping                        9
        supported_frequencies_Hz        1199974847
        supported_max_cstates           0
        vendor_id                       CentaurHauls

You should get a LOT more data.

Dennis
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
- Original Message -
From: Thomas Burgess wonsl...@gmail.com
Date: Saturday, May 15, 2010 8:09 pm
Subject: Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
To: Orvar Korvar knatte_fnatte_tja...@yahoo.com
Cc: zfs-discuss@opensolaris.org

Well i just wanted to let everyone know that preliminary results are good. The livecd booted, all important things seem to be recognized. It sees all 16 gb of ram i installed and all 8 cores of my opteron 6128.

The only real shocker is how loud the norco RPC-4220 fans are (i have another machine with a norco 4020 case so i assumed the fans would be similar.....this was a BAD assumption). This thing sounds like a hair dryer.

Anyways, I'm running the install now so we'll see how that goes. It did take about 10 minutes to find a disk during the installer, but if i remember right, this happened on other machines as well.

Once you have the install done could you post ( somewhere ) what you see during a single user mode boot with options -srv ? I would like to see all the gory details.

Also, could you run cpustat -h ? At the bottom, according to usr/src/uts/intel/pcbe/opteron_pcbe.c you should see :

    See "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h
    Processors" (AMD publication 31116)

The following registers should be listed :

#define AMD_FAMILY_10h_generic_events \
        { "PAPI_tlb_dm", "DC_dtlb_L1_miss_L2_miss", 0x7 },      \
        { "PAPI_tlb_im", "IC_itlb_L1_miss_L2_miss", 0x3 },      \
        { "PAPI_l3_dcr", "L3_read_req", 0xf1 },                 \
        { "PAPI_l3_icr", "L3_read_req", 0xf2 },                 \
        { "PAPI_l3_tcr", "L3_read_req", 0xf7 },                 \
        { "PAPI_l3_stm", "L3_miss", 0xf4 },                     \
        { "PAPI_l3_ldm", "L3_miss", 0xf3 },                     \
        { "PAPI_l3_tcm", "L3_miss", 0xf7 }

You should NOT see anything like this :

r...@aequitas:/root# uname -a
SunOS aequitas 5.11 snv_139 i86pc i386 i86pc Solaris
r...@aequitas:/root# cpustat -h
cpustat: cannot access performance counters - Operation not applicable

... as well as psrinfo -pv please ?

When I get my HP Proliant with the 6174 procs I'll be sure to post whatever I see.

Dennis
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
Bit of a chicken and egg that, isn't it? You need to run the tool to see if the board's worth buying and you need to buy the board to run the tool!

*Somebody* has to be that first early adopter. After that, we all get to ride on their experience.

I am sure the Tier-1 stuff will work just fine. I have an HP unit on order, thus :

HP Proliant DL165G7 server, 1U Rack Server, 2 × AMD Opteron Processor Model 6172 ( 12 core, 2.1 GHz, 12MB Level 3 Cache, 80W ), dual socket configuration for 24 cores in total, 16GB (8 x 2GB) Advanced ECC PC3-10600R (RDIMM) memory, twenty-four DIMM slots, 2 PCI-E slots ( PCI Express expansion slot 1, low-profile, half-length, and PCI Express expansion slot 2, full height, full length, ×16 75W + EXT 75W, with optional PCI-X support ), 2x HP NC362i Integrated Dual Port Gigabit Server Adapter, Storage Controller (1) Smart Array P410i/256MB BBWC, single HP 500W CS HE Power Supply, no internal HDD, slim height 9.5mm DVD included, no OS - no Monitor, 3 year warranty.

So when it gets in I'll toss it into a rack, hook up a serial cable and then boot *whatever* as verbosely as possible. [1]

If you want you can ssh in to the blastwave server farm and jump on that also ... I'm always game to play with such things.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris

[1] ummm No, I won't be installing Microsoft Windows 7 64-bit Ultimate Edition. .. or maybe I will :-P
Re: [zfs-discuss] why both dedup and compression?
On 06/05/2010 21:07, Erik Trimble wrote:

VM images contain large quantities of executable files, most of which compress poorly, if at all.

What data are you basing that generalisation on ?

note : I can't believe someone said that.
warning : I just detected a fast rise time on my pedantic input line and I am in full geek herd mode :

http://www.blastwave.org/dclarke/blog/?q=node/160

The degree to which a file can be compressed is often related to the degree of randomness or entropy in the bit sequences in that file. We tend to look at files in chunks of bits called bytes or words or blocks of some given length, but the harsh reality is that it is just a sequence of one and zero values and nothing more. However, I can spot blocks or patterns in there and then create tokens that represent repeating blocks.

If you want a really random file that you are certain has nearly perfect high entropy, then just get a coin and flip it 1024 times while recording the heads and tails results. Then input that data into a file as a sequence of one and zero bits and you have a very neatly random chunk of data. Good luck trying to compress that thing.

Pardon me .. here it comes. I spent way too many years in labs doing work with RNG hardware and software to just look the other way. And I'm in a good mood.

Suppose that C is some discrete random variable. That means that C can have well defined values like HEAD or TAIL. You usually have a bunch ( n of them ) of possible values x1, x2, x3, ..., xn that C can be. Each of those shows up in the data set with specific probabilities p1, p2, p3, ..., pn where the sum of those adds up to exactly one. This means that x1 will appear in the dataset with an expected probability of p1. All of those probabilities are expressed as a value between 0 and 1. A value of 1 means certainty.

Okay, so in the case of a coin ( not the one in Batman The Dark Knight ) you have x1=TAIL and x2=HEAD with ( we hope ) p1=0.5=p2 such that p1+p2 = 1 exactly, unless the coin lands on its edge and the universe collapses due to entropy implosion. That is a joke. I used to teach this as a TA in university, so bear with me.

So go flip a coin a few thousand times and you will get fairly random data. That is a Random Number Generator that you have, and it's always kicking around your lab or in your pocket or on the street. Pretty cheap, but the baud rate is hellish low. If you get tired of flipping bits using a coin then you may have to just give up on that ( or buy a radioactive source where you can monitor particles emitted as it decays for input data ) OR be really cheap and look at /dev/urandom on a decent Solaris machine :

$ ls -lap /dev/urandom
lrwxrwxrwx 1 root root 34 Jul  3  2008 /dev/urandom -> ../devices/pseudo/random@0:urandom

That thing right there is a pseudo random number generator. It will make for really random data, but there is no promise that over a given number of bits the sum p1 + p2 will be precisely 1. It will be real real close, however, to a very random ( high entropy ) data source.

Need 1024 bits of random data ?

$ /usr/xpg4/bin/od -Ax -N 128 -t x1 /dev/urandom
000 ef c6 2b ba 29 eb dd ec 6d 73 36 06 58 33 c8 be
010 53 fa 90 a2 a2 70 25 5f 67 1b c3 72 4f 26 c6 54
020 e9 83 44 c6 b9 45 3f 88 25 0c 4d c7 bc d5 77 58
030 d3 94 8e 4e e1 dd 71 02 dc c2 d0 19 f6 f4 5c 44
040 ff 84 56 9f 29 2a e5 00 33 d2 10 a4 d2 8a 13 56
050 d1 ac 86 46 4d 1e 2f 10 d9 0b 33 d7 c2 d4 ef df
060 d9 a2 0b 7f 24 05 72 39 2d a6 75 25 01 bd 41 6c
070 eb d9 4f 23 d9 ee 05 67 61 7c 8a 3d 5f 3a 76 e3
080

There ya go.
That was faster than flipping a coin, eh? ( my Canadian bit just flipped )

So you were saying ( or someone somewhere had the crazy idea ) that ZFS with dedupe and compression enabled won't really be of great benefit because of all the binary files in the filesystem. Well, that's just nuts. Sorry, but it is. Those binary files are made up of ELF headers and opcodes from a specific set of opcodes for a given architecture, and that means the input set C consists of a discrete set of possible values and NOT pure random high entropy data.

Want a demo ? Here :

(1) take a nice big lib

$ uname -a
SunOS aequitas 5.11 snv_138 i86pc i386 i86pc
$ ls -lap /usr/lib | awk '{ print $5, $9 }' | sort -n | tail
4784548 libwx_gtk2u_core-2.8.so.0.6.0
4907156 libgtkmm-2.4.so.1.1.0
6403701 llib-lX11.ln
8939956 libicudata.so.2
9031420 libgs.so.8.64
9300228 libCg.so
9916268 libicudata.so.3
14046812 libicudata.so.40.1
21747700 libmlib.so.2
40736972 libwireshark.so.0.0.1

$ cp /usr/lib/libwireshark.so.0.0.1 /tmp
$ ls -l /tmp/libwireshark.so.0.0.1
-r-xr-xr-x 1 dclarke csw 40736972 May  7 14:20 /tmp/libwireshark.so.0.0.1

What is the SHA256 hash for that file ?

$ cd /tmp

Now compress it with gzip ( a good test case ) :

$ /opt/csw/bin/gzip -9v libwireshark.so.0.0.1
libwireshark.so.0.0.1:   76.1% -- replaced with libwireshark.so.0.0.1.gz

$ ls -l
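ps: and the flip side of the demo, with /dev/urandom as the source this time :

$ dd if=/dev/urandom of=/tmp/rand.bin bs=1024 count=1024
$ /opt/csw/bin/gzip -9v /tmp/rand.bin

The ELF library above gave up 76.1% of its size. The urandom file should give up essentially nothing, because there are no repeating blocks for the deflate dictionary to latch onto.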
Re: [zfs-discuss] ZFS kstat Stats
Do the following ZFS stats look ok?

::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     106619               832   28%
ZFS File Data               79817               623   21%
Anon                        28553               223    7%
Exec and libs                3055                23    1%
Page cache                  18024               140    5%
Free (cachelist)             2880                22    1%
Free (freelist)            146309              1143   38%
Total                      385257              3009
Physical                   367243              2869

Looks beautiful. Just for giggles try this :

r...@aequitas:/root# uname -a
SunOS aequitas 5.11 snv_136 i86pc i386 i86pc Solaris
r...@aequitas:/root#
r...@aequitas:/root# /bin/printf "::kmastat\n" | mdb -k
cache                        buf    buf    buf    memory     alloc alloc
name                        size in use  total    in use   succeed  fail
------------------------- ------ ------ ------ --------- --------- -----
kmem_magazine_1                8   8595   8736   212992B      8595     0
kmem_magazine_3               16   3697   3780   122880B      3697     0
kmem_magazine_7               32   7633   7686   499712B      7633     0
kmem_magazine_15              64  11642  11656  1540096B     11642     0
.
. etc etc
.
nfs4_access_cache             32      0      0        0B         0     0
client_handle4_cache          16      0      0        0B         0     0
nfs4_ace4vals_cache           36      0      0        0B         0     0
nfs4_ace4_list_cache         176      0      0        0B         0     0
NFS_idmap_cache               24      0      0        0B         0     0
pty_map                       48      0     64     4096B         1     0
------------------------- ------ ------ ------ --------- --------- -----
Total [hat_memload]                              974848B   1306984     0
Total [kmem_msb]                               56860672B    506215     0
Total [kmem_va]                                78249984B     12180     0
Total [kmem_default]                           76316672B   8546762     0
Total [kmem_io_1G]                             36712448B      8643     0
Total [bp_map]                                       0B       212     0
Total [segkp]                                   6356992B    186825     0
Total [umem_np]                                      0B       148     0
Total [ip_minor_arena_sa]                           64B       180     0
Total [spdsock]                                      0B         1     0
Total [namefs_inodes]                               64B        18     0
------------------------- ------ ------ ------ --------- --------- -----
.
. etc etc
.

Dennis
Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect
"ea" == erik ableson eable...@me.com writes:
"dc" == Dennis Clarke dcla...@blastwave.org writes:

    rw,ro...@100.198.100.0/24, it works fine, and the NFS client can do
    the write without error.

ea  I've found that the NFS host based settings required the
ea  FQDN, and that the reverse lookup must be available in your
ea  DNS.

I found, oddly, the @a.b.c.d/y syntax works only if the client's IP has reverse lookup. I had to add bogus hostnames to /etc/hosts for the whole /24 because if I didn't, for v3 it would reject mounts immediately, and for v4 mountd would core dump (and get restarted), which you see from the client as a mount that appears to hang. This is all using the @ip/mask syntax.

I have LDAP and DNS in place for name resolution and NFS v4 works fine with either format in the sharenfs parameter. Never seen a problem. The Solaris 8 and 9 NFS clients work fine also.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6901832

If you use hostnames instead, it makes sense that you would have to use FQDN's. If you want to rewrite mountd to allow using short hostnames, the access checking has to be done like this:

at export time:
  given hostname -> forward nss lookup -> list of IP's -> remember IP's
at mount time:
  client IP -> check against list of remembered IP's

but with FQDN's it can be:

at export time:
  given hostname -> remember it
at mount time:
  client IP -> reverse nss lookup -> check against remembered list
                                 \-> forward lookup -> verify client IP among results

The second way, all the lookups happen at mount time rather than export time. This way the data in the nameservice can change without forcing you to learn and then invoke some kind of ``rescan the exported filesystems'' command, or making mountd remember TTL's for its cached nss data, or any such complexity. Keep all the nameservice caching inside nscd so there is only one place to flush it!

However the forward lookup is mandatory for security, not optional OCDism. Without it, anyone from any IP can access your NFS server so long as he has control of his reverse lookup, which he probably does. I hope mountd is doing that forward lookup!

dc  Try to use a backslash to escape those special chars like so :
dc  zfs set
dc  sharenfs=nosub\,nosuid\,rw\=hostname1\:hostname2\,root\=hostname2
dc  zpoolname/zfsname/pathname

wth? Commas and colons are not special characters. This is silly.

Works real well.

--
Dennis
Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect
Hi All,

I had created a ZFS filesystem "test" and shared it with zfs set sharenfs=root=host1 test, and I checked the sharenfs option and it had already updated to root=host1:

Try to use a backslash to escape those special chars like so :

zfs set sharenfs=nosub\,nosuid\,rw\=hostname1\:hostname2\,root\=hostname2 zpoolname/zfsname/pathname

Dennis
Re: [zfs-discuss] getting drive serial number
On Sun, Mar 7, 2010 at 12:30 PM, Ethan notet...@gmail.com wrote:

I have a failing drive, and no way to correlate the device with errors in the zpool status with an actual physical drive. If I could get the device's serial number, I could use that, as it's printed on the drive.

I come from linux, so I tried dmesg, as that's what's familiar (I see that the man page for dmesg on opensolaris says that I should be using syslogd, but I haven't been able to figure out how to get the same output from syslogd). But, while I see at the top the serial numbers for some other drives, I don't see the one I want because it seems to have scrolled off the top.

Can anyone tell me how to get the serial number of my failing drive? Or some other way to correlate the device with the physical drive?
-Ethan

smartctl will do what you're looking for. I'm not sure if it's included by default or not with the latest builds. Here's the package if you need to build from source: http://smartmontools.sourceforge.net/

You can find it at http://blastwave.network.com/csw/unstable/

Just install it with pkgadd or use pkgtrans to extract it and then run the binary.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
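ps: if you would rather not install anything, a stock box can usually cough up the serials too ( YMMV with some SATA controllers ) :

# iostat -En

Each device block in that output includes Vendor, Product and Serial No. fields that you can match against the label on the drive.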
[zfs-discuss] false DEGRADED status based on cannot open device at boot.
I find that some servers display a DEGRADED zpool status at boot. More troubling is that this seems to be silent: no notice is given on the console or via an SNMP message or other notification process.

Let me demonstrate :

{0} ok boot -srv

Sun Blade 2500 (Silver), No Keyboard
Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.17.3, 4096 MB memory installed, Serial #64510477.
Ethernet address 0:3:ba:d8:5a:d, Host ID: 83d85a0d.

Rebooting with command: boot -srv
Boot device: /p...@1d,70/s...@4,1/d...@0,0:a  File and args: -srv
module /platform/sun4u/kernel/sparcv9/unix: text at [0x100, 0x10a3695] data at 0x180
module /platform/sun4u/kernel/sparcv9/genunix: text at [0x10a3698, 0x126bbf7] data at 0x1866840
module /platform/SUNW,Sun-Blade-2500/kernel/misc/sparcv9/platmod: text at [0x126bbf8, 0x126c1e7] data at 0x18bc0c8
.
. . . many lines of verbose messages . . .
.
dump on /dev/zvol/dsk/mercury_rpool/swap size 0 MB
Loading smf(5) service descriptions: 2/2
Requesting System Maintenance Mode
SINGLE USER MODE

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

# zpool list
NAME            SIZE   USED  AVAIL   CAP  HEALTH    ALTROOT
mercury_rpool    68G  27.4G  40.6G   40%  DEGRADED  -
# zpool status mercury_rpool
  pool: mercury_rpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        mercury_rpool  DEGRADED     0     0     0
          mirror       DEGRADED     0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c1t2d0s0   UNAVAIL      0     0     0  cannot open

errors: No known data errors

This is trivial to remedy :

# zpool online mercury_rpool c1t2d0s0
# zpool list
NAME            SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
mercury_rpool    68G  27.4G  40.6G   40%  ONLINE  -
# zpool status mercury_rpool
  pool: mercury_rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 0h0m with 0 errors on Wed Feb 17 21:26:11 2010
config:

        NAME           STATE     READ WRITE CKSUM
        mercury_rpool  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c3t0d0s0   ONLINE       0     0     0
            c1t2d0s0   ONLINE       0     0     0  14.5M resilvered

errors: No known data errors
#

I have many systems where I keep mirrors on multiple controllers, either fibre or SCSI. It seems that the SCSI devices don't get detected at boot on the Sparc systems. The x86/AMD64 systems do not seem to have this problem, but I may be wrong.

Is this a known bug or am I seeing something due to a missing line in /etc/system ?

Oh, also, I should point out that it does not matter if I boot with init S or 3 or 6.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
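ps: until there is a proper notification hook, a blunt bit of cron glue works .. adjust the schedule and mail address to taste :

0,30 * * * * [ "`/usr/sbin/zpool status -x`" != "all pools are healthy" ] && /usr/sbin/zpool status -x | mailx -s "zpool trouble" root

zpool status -x prints only pools with problems, so a healthy system stays silent.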
Re: [zfs-discuss] Detach ZFS Mirror
I have a 2-disk/2-way mirror and was wondering if I can remove 1/2 the mirror and plunk it in another system?

You can remove it fine. You can plunk it in another system fine. I think you will end up with the same zpool name and id number. Also, I do not know if that disk would be bootable. You probably have to go through the installboot procedure for that.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
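ps: the installboot step on SPARC looks like this, with a placeholder disk name ( on x86 it is installgrub instead ) :

# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0

Without that, the detached half has all the data but no bootblock, so the OBP will not boot from it.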
[zfs-discuss] possible to remove a mirror pair from a zpool?
Suppose the requirements for storage shrink ( it can happen ); is it possible to remove a mirror set from a zpool?

Given this :

# zpool status array03
  pool: array03
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 0h41m with 0 errors on Sat Jan  9 22:54:11 2010
config:

        NAME         STATE     READ WRITE CKSUM
        array03      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c5t5d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

Suppose I want to power down the disks c2t19d0 and c5t5d0 because they are not needed. One can easily picture a thumper with many disks unused and see reasons why one would want to power off disks.

--
Dennis
Re: [zfs-discuss] possible to remove a mirror pair from a zpool?
No, sorry Dennis, this functionality doesn't exist yet. It is being worked on, but will take a while; there are lots of corner cases to handle.

James Dickens
uadmin.blogspot.com

1 ) dammit
2 ) looks like I need to do a full offline backup and then restore to shrink a zpool.

As usual, thanks for always being there James.

Dennis
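ps: for the record, the offline shuffle I have in mind is roughly this, pool names being placeholders :

# zfs snapshot -r array03@migrate
# zfs send -R array03@migrate | zfs recv -F -d newpool
# zpool destroy array03        # only after verifying newpool !

zfs send -R carries the snapshots and properties along for the ride, which beats tar for this job.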
Re: [zfs-discuss] Clearing a directory with more than 60 million files
On Tue, January 5, 2010 10:12, casper@sun.com wrote:

How about creating a new data set, moving the directory into it, and then destroying it? Assuming the directory in question is /opt/MYapp/data:

1. zfs create rpool/junk
2. mv /opt/MYapp/data /rpool/junk/
3. zfs destroy rpool/junk

The move will create and remove the files; the remove by mv will be just as inefficient, removing them one by one. rm -rf would be at least as quick.

Normally when you do a move within a 'regular' file system, all that's usually done is the directory pointer is shuffled around. This is not the case with ZFS data sets, even though they're on the same pool?

You can also use star, which may speed things up, safely.

star -copy -p -acl -sparse -dump -xdir -xdot -fs=96m -fifostats -time \
    -C source_dir . destination_dir

That will buffer the transport of the data from source to dest via memory and work to keep that buffer full as data is written on the output side. It's probably at least as fast as mv and probably safer, because you never delete the original until after the copy is complete.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
[zfs-discuss] invalid mountpoint 'mountpoint=legacy' ?
Anyone seen this odd message ? It seems a tad counter-intuitive.

# uname -a
SunOS gamma 5.11 snv_126 sun4u sparc SUNW,Sun-Fire-480R
# cat /etc/release
                  Solaris Express Community Edition snv_126 SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 19 October 2009

# ptime zpool create -f -o autoreplace=on -o version=10 \
    -m mountpoint=legacy \
    fibre01 mirror c2t0d0 c3t16d0 \
            mirror c2t1d0 c3t17d0 \
            mirror c2t2d0 c3t18d0 \
            mirror c2t3d0 c3t19d0 \
            mirror c2t4d0 c3t20d0 \
            mirror c2t5d0 c3t21d0 \
            mirror c2t6d0 c3t22d0 \
            mirror c2t7d0 c3t23d0 \
            mirror c2t8d0 c3t24d0 \
            spare c2t10d0
invalid mountpoint 'mountpoint=legacy': must be an absolute path, 'legacy', or 'none'

real       14.884950400
user        0.998020300
sys         3.334027400

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] invalid mountpoint 'mountpoint=legacy' ?
I hate it when I do that .. 30 secs later I see that -m takes the mountpoint value directly; it is a property, but it is not specified in the -o name=value format. erk

# ptime zpool create -f -o autoreplace=on -o version=10 \
    -m legacy \
    fibre01 mirror c2t0d0 c3t16d0 \
            mirror c2t1d0 c3t17d0 \
            mirror c2t2d0 c3t18d0 \
            mirror c2t3d0 c3t19d0 \
            mirror c2t4d0 c3t20d0 \
            mirror c2t5d0 c3t21d0 \
            mirror c2t6d0 c3t22d0 \
            mirror c2t7d0 c3t23d0 \
            mirror c2t8d0 c3t24d0 \
            spare c2t10d0

real       12.367204400
user        0.712670500
sys         2.022335900
#
# zpool status fibre01
  pool: fibre01
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre01      ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c3t16d0  ONLINE       0     0     0
          mirror-1   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c3t17d0  ONLINE       0     0     0
          mirror-2   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c3t18d0  ONLINE       0     0     0
          mirror-3   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c3t19d0  ONLINE       0     0     0
          mirror-4   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c3t20d0  ONLINE       0     0     0
          mirror-5   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c3t21d0  ONLINE       0     0     0
          mirror-6   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c3t22d0  ONLINE       0     0     0
          mirror-7   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c3t23d0  ONLINE       0     0     0
          mirror-8   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c3t24d0  ONLINE       0     0     0
        spares
          c2t10d0    AVAIL

errors: No known data errors

Dennis
Re: [zfs-discuss] quotas on zfs at solaris 10 update 9 (10/09)
We have just updated a major file server to Solaris 10 update 9 so that we can control user and group disk usage on a single filesystem. We were using QFS, and one nice thing about samquota was that it told you your soft limit, your hard limit and your usage, both for disk space and for the number of files. Is there, on Solaris 10 U9, a command which will report

A lot of folks will want similar functionality on everything from Sol 10 up to snv_129. I have a few experimental systems running thus :

Sun Microsystems Inc.   SunOS 5.11      snv_129 Dec. 01, 2009
SunOS Internal Development: root 2009-Nov-30 [onnv_129-tonic]
bfu'ed from /build/archives-nightly-osol/sparc on 2009-12-04
Sun Microsystems Inc.   SunOS 5.11      snv_126 November 2008

$ zpool upgrade
This system is currently running ZFS pool version 22.
All pools are formatted using this version.

When we take into consideration the effects of compression and dedupe, it can get difficult to answer the very basic question "How much space do I have left?" Perhaps a better question is "How much space do I have left given a worst case scenario?" I have pushed many copies of the exact same data into a ZFS filesystem with both compression and dedupe and watched as the actual space used was trivial. With a classic filesystem ( UFS ) we can generally answer the question quickly.

One blunt-object method would be to allocate a filesystem per user such that zfs list reports a long list of names under /export/home or similar. Then you can easily see the used space per filesystem. Allocating user quotas and then asking the simple questions seems mysterious to me also. I am looking into this for my own reasons and will stay in touch.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
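ps: on those snv_129 boxes the new per-user accounting is already in, e.g. ( dataset names from my own boxes, adjust to taste ) :

# zfs userspace fibre01/home
# zfs groupspace fibre01/home
# zfs get userused@dclarke,userquota@dclarke fibre01/home

That covers usage and the single hard limit, but nothing like samquota's soft limits or file-count limits as far as I can tell.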
Re: [zfs-discuss] b128a available w/deduplication
FYI, OpenSolaris b128a is available for download or image-update from the dev repository. Enjoy.

I thought that dedupe had been out for weeks now ?

Dennis
Re: [zfs-discuss] b128a available w/deduplication
Dennis Clarke wrote:

FYI, OpenSolaris b128a is available for download or image-update from the dev repository. Enjoy.

I thought that dedupe had been out for weeks now ?

The source has, yes. But what Richard was referring to was the respun build now available via IPS.

Oh, sorry. Thought I had missed something. I hadn't :-)

I'm now on version 22 for ZFS and am not even entirely sure what that is :

# uname -a
SunOS europa 5.11 snv_129 sun4u sparc SUNW,UltraAX-i2
# zpool upgrade -v
This system is currently running ZFS pool version 22.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties

For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.

HOWEVER, that URL no longer works for N > 19, and in fact the entire URL has changed to :

http://hub.opensolaris.org/bin/view/Community+Group+zfs/22

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] dedupe question
On Sat, 7 Nov 2009, Dennis Clarke wrote:

Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2 directories named [a-z][a-z] where each file is 64K of random non-compressible data and then some english text.

What method did you use to produce this random data?

I'm using the tt800 method from Makoto Matsumoto described here :

see http://random.mat.sbg.ac.at/generators/

and then here :

    /*
     * Generate the random text before we need it and also
     * outside of the area that measures the IO time.
     * We could have just read bytes from /dev/urandom but
     * you would be *amazed* how slow that is.
     */
    random_buffer_start_hrt = gethrtime();
    if ( random_buffer_start_hrt == -1 ) {
        perror("Could not get random_buffer high res start time");
        exit(EXIT_FAILURE);
    }

    for ( char_count = 0; char_count < 65535; ++char_count ) {
        k_index = (int) ( genrand() * (double) 62 );
        buffer_64k_rand_text[char_count] = alph[k_index];
    }

    /* would be nice to break this into 0x40h char lines */
    for ( p = 0x03fu; p < 65535; p = p + 0x040u )
        buffer_64k_rand_text[p] = '\n';

    buffer_64k_rand_text[65535] = '\n';
    buffer_64k_rand_text[65536] = '\0';

    random_buffer_end_hrt = gethrtime();

That works well.

You know what ... I'm a schmuck. I didn't grab a time based seed first. All those files with random text .. have identical twins on the filesystem somewhere. :-P

damn

I'll go fix that.

The dedupe ratio has climbed to 1.95x with all those unique files that are less than %recordsize% bytes.

Perhaps there are other types of blocks besides user data blocks (e.g. metadata blocks) which become subject to deduplication? Presumably 'dedupratio' is based on a count of blocks rather than percentage of total data.

I have no idea .. yet. I figure I'll try a few more experiments to see what it does and maybe, dare I say it, look at the source :-)

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] dedupe question
You can get more dedup information by running 'zdb -DD zp_dd'. This should show you how we break things down. Add more 'D' options and get even more detail.

- George

Okay .. thank you. Looks like I have piles of numbers here :

# zdb -DDD zp_dd
DDT-sha256-zap-duplicate: 37317 entries, size 342 on disk, 210 in core

bucket             allocated                      referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2    18.4K    763M    355M    355M    37.9K   1.52G    727M    727M
     4    18.0K   1.16G   1.15G   1.15G    72.4K   4.67G   4.61G   4.61G
     8       70   1.47M    849K    849K      657   12.0M   6.78M   6.78M
    16       27   39.5K   31.5K   31.5K      535    747K    598K    598K
    32        6      4K      4K      4K      276    180K    180K    180K
    64        4   9.00K   6.50K   6.50K      340    680K    481K    481K
   128        1      2K   1.50K   1.50K      170    340K    255K    255K
   256        1      1K      1K      1K      313    313K    313K    313K
   512        1     512     512     512      522    261K    261K    261K
 Total    36.4K   1.91G   1.50G   1.50G     113K   6.21G   5.33G   5.33G

DDT-sha256-zap-unique: 154826 entries, size 335 on disk, 196 in core

bucket             allocated                      referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     151K   5.61G   2.52G   2.52G     151K   5.61G   2.52G   2.52G
 Total     151K   5.61G   2.52G   2.52G     151K   5.61G   2.52G   2.52G

DDT histogram (aggregated over all DDTs):

bucket             allocated                      referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     151K   5.61G   2.52G   2.52G     151K   5.61G   2.52G   2.52G
     2    18.4K    763M    355M    355M    37.9K   1.52G    727M    727M
     4    18.0K   1.16G   1.15G   1.15G    72.4K   4.67G   4.61G   4.61G
     8       70   1.47M    849K    849K      657   12.0M   6.78M   6.78M
    16       27   39.5K   31.5K   31.5K      535    747K    598K    598K
    32        6      4K      4K      4K      276    180K    180K    180K
    64        4   9.00K   6.50K   6.50K      340    680K    481K    481K
   128        1      2K   1.50K   1.50K      170    340K    255K    255K
   256        1      1K      1K      1K      313    313K    313K    313K
   512        1     512     512     512      522    261K    261K    261K
 Total     188K   7.52G   4.01G   4.01G     264K   11.8G   7.85G   7.85G

dedup = 1.96, compress = 1.51, copies = 1.00, dedup * compress / copies = 2.95
#

I have no idea what any of that means, yet :-)

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
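ps: if I read the Totals right, the bottom line is just the referenced size over the allocated size from the aggregated table :

$ echo 'scale=2; 7.85 / 4.01' | bc
1.95

.. which the tool rounds to the 1.96 it prints. So dedupratio is what the DDT thinks is referenced versus what actually landed on disk.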
Re: [zfs-discuss] Quick dedup question
18                    local
neptune_rpool  dedupratio  1.00x  -
neptune_rpool  free        12.5G  -
neptune_rpool  allocated   21.3G  -

I'm currently running tests with this :

http://www.blastwave.org/dclarke/crucible_source.txt

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
[zfs-discuss] dedupe question
Does the dedupe functionality happen at the file level or a lower block level?

I am writing a large number of files that have the following structure :

-- file begins
1024 lines of random ASCII chars 64 chars long
some tilde chars .. about 1000 of them
some text ( english ) for 2K
more text ( english ) for 700 bytes or so
--

Each file has the same tilde chars and then english text at the end of 64K of random character data.

Before writing the data I see :

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE    SOURCE
zp_dd  size        67.5G    -
zp_dd  capacity    6%       -
zp_dd  version     21       default
zp_dd  dedupratio  1.16x    -
zp_dd  free        63.3G    -
zp_dd  allocated   4.19G    -

After, I see this :

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE    SOURCE
zp_dd  size        67.5G    -
zp_dd  capacity    6%       -
zp_dd  version     21       default
zp_dd  dedupratio  1.11x    -
zp_dd  free        63.1G    -
zp_dd  allocated   4.36G    -

Note the drop in dedup ratio from 1.16x to 1.11x, which seems to indicate that dedupe does not detect that the english text is identical in every file.

--
Dennis
Re: [zfs-discuss] dedupe question
On Sat, 2009-11-07 at 17:41 -0500, Dennis Clarke wrote:

Does the dedupe functionality happen at the file level or a lower block level?

it occurs at the block allocation level.

I am writing a large number of files that have the following structure :

-- file begins
1024 lines of random ASCII chars 64 chars long
some tilde chars .. about 1000 of them
some text ( english ) for 2K
more text ( english ) for 700 bytes or so
--

ZFS's default block size is 128K and is controlled by the recordsize filesystem property. Unless you changed recordsize, each of the files above would be a single block, distinct from the others.

you may or may not get better dedup ratios with a smaller recordsize, depending on how the common parts of the file line up with block boundaries. the cost of additional indirect blocks might overwhelm the savings from deduping a small common piece of the file.

- Bill

Well, I was curious about these sorts of things and figured that a simple test would show me the behavior.

Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2 directories named [a-z][a-z] where each file is 64K of random non-compressible data and then some english text. I guess I was wrong about the 64K random text chunk also .. because I wrote out that data as chars from the set { [A-Z][a-z][0-9] } and thus .. compressible ASCII data as opposed to random binary data.

So ... after doing that a few times I now see something fascinating :

$ ls -lo /tester/foo/*/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:38 /tester/foo/1/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:45 /tester/foo/2/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:43 /tester/foo/3/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:43 /tester/foo/4/aa/aa.dat

$ ls -lo /tester/foo/*/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:39 /tester/foo/1/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:47 /tester/foo/2/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:45 /tester/foo/3/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:47 /tester/foo/4/zz/az.dat

$ find /tester/foo -type f | wc -l
   70304

Those files, all 70,000+ of them, are unique and smaller than the filesystem blocksize. However :

$ zfs get used,available,referenced,compressratio,recordsize,compression,dedup zp_dd/tester
NAME          PROPERTY       VALUE    SOURCE
zp_dd/tester  used           4.51G    -
zp_dd/tester  available      3.49G    -
zp_dd/tester  referenced     4.51G    -
zp_dd/tester  compressratio  1.00x    -
zp_dd/tester  recordsize     128K     default
zp_dd/tester  compression    off      local
zp_dd/tester  dedup          on       local

Compression factors don't interest me at the moment .. but see this :

$ zpool get all zp_dd
NAME   PROPERTY       VALUE                 SOURCE
zp_dd  size           67.5G                 -
zp_dd  capacity       6%                    -
zp_dd  altroot        -                     default
zp_dd  health         ONLINE                -
zp_dd  guid           14649016030066358451  default
zp_dd  version        21                    default
zp_dd  bootfs         -                     default
zp_dd  delegation     on                    default
zp_dd  autoreplace    off                   default
zp_dd  cachefile      -                     default
zp_dd  failmode       wait                  default
zp_dd  listsnapshots  off                   default
zp_dd  autoexpand     off                   default
zp_dd  dedupratio     1.95x                 -
zp_dd  free           63.3G                 -
zp_dd  allocated      4.22G                 -

The dedupe ratio has climbed to 1.95x with all those unique files that are less than %recordsize% bytes.

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
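ps: next experiment, to see how block alignment changes things ( the dataset name is arbitrary ) :

# zfs create -o recordsize=8k -o dedup=on zp_dd/tester8k

then re-run the same file generator into it and compare zpool get dedupratio zp_dd before and after. Smaller records should let the common english-text tail land in its own blocks more often, at the cost of a lot more DDT entries.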
[zfs-discuss] SunOS neptune 5.11 snv_127 sun4u sparc SUNW,Sun-Fire-880
I just went through a BFU update to snv_127 on a V880 :

neptune console login: root
Password:
Nov  3 08:19:12 neptune login: ROOT LOGIN /dev/console
Last login: Mon Nov  2 16:40:36 on console
Sun Microsystems Inc.   SunOS 5.11      snv_127 Nov. 02, 2009
SunOS Internal Development: root 2009-Nov-02 [onnv_127-tonic]
bfu'ed from /build/archives-nightly-osol/sparc on 2009-11-03

I have [ high ] hopes that there was a small tarball somewhere which contained the sources listed in :

http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html

Is there such a tarball anywhere at all, or shall I just wait for the putback to hit the mercurial repo ?

Yes .. this is sort of begging .. but I call it enthusiasm :-)

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] SunOS neptune 5.11 snv_127 sun4u sparc SUNW,Sun-Fire-880
Dennis Clarke wrote:

I just went through a BFU update to snv_127 on a V880 :

neptune console login: root
Password:
Nov  3 08:19:12 neptune login: ROOT LOGIN /dev/console
Last login: Mon Nov  2 16:40:36 on console
Sun Microsystems Inc.   SunOS 5.11      snv_127 Nov. 02, 2009
SunOS Internal Development: root 2009-Nov-02 [onnv_127-tonic]
bfu'ed from /build/archives-nightly-osol/sparc on 2009-11-03

I have [ high ] hopes that there was a small tarball somewhere which contained the sources listed in :

http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html

Is there such a tarball anywhere at all, or shall I just wait for the putback to hit the mercurial repo ?

Yes .. this is sort of begging .. but I call it enthusiasm :-)

Hi Dennis, we haven't done source tarballs or Mercurial bundles in quite some time, since it's more efficient for you to pull from the Mercurial repo and build it yourself :)

Well, funny you should mention it. I was this close ( --|.|-- ) to running a nightly build and then I had a minor brainwave .. why bother? Because the sparc archive bits were there already.

Also, the build 127 tonic bits that I generated today (and which you appear to be using) won't contain Jeff's push from yesterday, because that changeset is part of build 128 - and I haven't closed the build yet. The push is in the repo, btw:

changeset:   10922:e2081f502306
user:        Jeff Bonwick jeff.bonw...@sun.com
date:        Sun Nov 01 14:14:46 2009 -0800
comments:
        PSARC 2009/571 ZFS Deduplication Properties
        6677093 zfs should have dedup capability

Funny .. I didn't see it last night. :-\

I'll blame the coffee and go get a nightly happening right away :-)

Thanks for the reply!

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] dedupe is in
Terrific! Can't wait to read the man pages / blogs about how to use it...

Just posted one:

http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup

Enjoy, and let me know if you have any questions or suggestions for follow-on posts.

Looking at FIPS-180-3 in sections 4.1.2 and 4.1.3, I was thinking that the major leap from SHA256 to SHA512 is a 32-bit to 64-bit step. If the implementation of the SHA256 ( or possibly SHA512 at some point ) algorithm is well threaded, then one would be able to leverage those massively multi-core Niagara T2 servers.

The SHA256 hash is based on six 32-bit functions, whereas SHA512 is based on six 64-bit functions. The CMT Niagara T2 can easily process those 64-bit hash functions, and the multi-core CMT trend is well established. So long as context switch times are very low, one would think that IO with a SHA512 based de-dupe implementation would be possible and even realistic. That would solve the hash collision concern, I would think.

Merely thinking out loud here ...

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
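ps: from the blog post, the knobs as they exist today .. one would hope a SHA512 option could slot in the same way later :

# zfs set dedup=on tank                # sha256, trust the hash
# zfs set dedup=verify tank            # hash match, then byte compare
# zfs set dedup=sha256,verify tank     # the same thing spelled out

The verify variants are the belt-and-braces answer for anyone losing sleep over collisions.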
[zfs-discuss] root pool can not have multiple vdevs ?
This seems like a bit of a restriction ... is this intended ?

# cat /etc/release
                  Solaris Express Community Edition snv_125 SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 05 October 2009
# uname -a
SunOS neptune 5.11 snv_125 sun4u sparc SUNW,Sun-Fire-880
# zpool status
  pool: neptune_rpool
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        neptune_rpool    ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            c1t0d0s0     ONLINE       0     0     0
            c1t3d0s0     ONLINE       0     0     0

errors: No known data errors

Now I want to add two more mirrors to that pool because the V880 has more drives to offer that are not used at the moment. So I'd like to add in a mirror of c1t1d0 and c1t4d0 :

# zpool add -f neptune_rpool c1t1d0
cannot label 'c1t1d0': EFI labeled devices are not supported on root pools.

Okay .. I can live with that.

# prtvtoc -h /dev/rdsk/c1t0d0s0 | fmthard -s - /dev/rdsk/c1t1d0s0
fmthard: New volume table of contents now in place.
# prtvtoc -h /dev/rdsk/c1t0d0s0 | fmthard -s - /dev/rdsk/c1t4d0s0
fmthard: New volume table of contents now in place.
# zpool add -f neptune_rpool c1t1d0s0
cannot add to 'neptune_rpool': root pool can not have multiple vdevs or separate logs

So essentially there is no way to grow that zpool. Is this the case?

--
Dennis
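ps: the only growth path I can see for an rpool today is bigger disks, not more vdevs. Something like this, with placeholder device names and SMI labels on the new disks :

# zpool attach neptune_rpool c1t0d0s0 c1t5d0s0   # bigger disk joins the mirror
  ... wait for the resilver, repeat on the other side, detach the small disks ...
# zpool set autoexpand=on neptune_rpool          # on newer bits, to pick up the new size

I have not actually tested that last step on snv_125, so treat it as a guess.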
Re: [zfs-discuss] You really do need ECC RAM
You really do need ECC RAM, but for the naysayers:

http://www.cs.toronto.edu/%7Ebianca/papers/sigmetrics09.pdf

There are people that still question that? Really ?

From section 3.2 "Errors per DIMM" in that paper :

    "The mean number of correctable errors per DIMM are more comparable,
    ranging from 3351-4530 correctable errors per year."

B. Schroeder, E. Pinheiro, W.-D. Weber. "DRAM errors in the wild: A Large-Scale Field Study." Sigmetrics/Performance 2009

see http://www.cs.toronto.edu/~bianca/

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] True in U4? Tar and cpio...save and restore ZFS File attributes and ACLs
/libiberty/objalloc.o is sparse
gcc-4.3.4_SunOS_5.10-release/libiberty/cplus-dem.o is sparse
gcc-4.3.4_SunOS_5.10-release/libiberty/cp-demint.o is sparse
.
.
.
gcc-4.3.4_SunOS_5.10-release/prev-libiberty/vasprintf.o is sparse
/home/dclarke/bin/star_1.5a89: fifo had 57001 puts 55368 gets.
/home/dclarke/bin/star_1.5a89: fifo was 2 times empty and 33 times full.
/home/dclarke/bin/star_1.5a89: fifo held 100669440 bytes max, size was 100669440 bytes
/home/dclarke/bin/star_1.5a89: 0 blocks + 1593968128 bytes (total of 1593968128 bytes = 1556609.50k).
/home/dclarke/bin/star_1.5a89: Total time 1735.341sec (897 kBytes/sec)
$

I should mention really poor performance also.

Look in the output dir and see the ACL indicator ?

$ ls -la /home/dclarke/test/destination
total 11
drwxr-xr-x   3 dclarke  csw    3 Oct  1 08:17 .
drwxr-xr-x   3 dclarke  csw    5 Oct  1 08:16 ..
drwxr-xr-x+ 22 dclarke  csw   31 Sep 25 11:40 gcc-4.3.4_SunOS_5.10-release

That should not be there.

$ cd /home/dclarke/test/destination
$ ls -lVdE gcc-4.3.4_SunOS_5.10-release
drwxr-xr-x+ 22 dclarke  csw   31 2009-09-25 11:40:09.491951000 +0000 gcc-4.3.4_SunOS_5.10-release
            owner@:-DaA--cC-s:--:allow
            owner@:--:--:deny
            group@:--a---c--s:--:allow
            group@:-D-A---C--:--:deny
         everyone@:--a---c--s:--:allow
         everyone@:-D-A---C--:--:deny
            owner@:--:--:deny
            owner@:rwxp---A-W-Co-:--:allow
            group@:-w-p--:--:deny
            group@:r-x---:--:allow
         everyone@:-w-p---A-W-Co-:--:deny
         everyone@:r-x---a-R-c--s:--:allow

If I look down into that dir I see ACL's on all the dir entries :

$ cd gcc-4.3.4_SunOS_5.10-release
$ ls -l
total 1119
-rw-r--r--   1 dclarke  csw   577976 Sep 24 02:59 Makefile
drwxr-xr-x+  4 dclarke  csw        5 Sep 24 03:11 build-i386-pc-solaris2.10
-rw-r--r--   1 dclarke  csw       10 Sep 24 20:58 compare
-rw-r--r--   1 dclarke  csw    30323 Sep 24 02:59 config.log
-rwxr-xr-x   1 dclarke  csw    31724 Sep 24 02:59 config.status
-rwxr-xr-x   1 dclarke  csw   400174 Sep 24 02:59 configure.lineno
drwxr-xr-x+  2 dclarke  csw       20 Sep 24 20:58 fixincludes
drwxr-xr-x+ 15 dclarke  csw      535 Sep 25 03:55 gcc
drwxr-xr-x+ 10 dclarke  csw       10 Sep 24 21:23 i386-pc-solaris2.10
drwxr-xr-x+  2 dclarke  csw       32 Sep 25 10:39 intl
drwxr-xr-x+  3 dclarke  csw       29 Sep 24 20:44 libcpp
drwxr-xr-x+  2 dclarke  csw       15 Sep 24 20:44 libdecnumber
drwxr-xr-x+  4 dclarke  csw       72 Sep 24 20:43 libiberty
drwxr-xr-x+ 14 dclarke  csw      532 Sep 24 20:41 prev-gcc
drwxr-xr-x+  4 dclarke  csw        4 Sep 24 20:40 prev-i386-pc-solaris2.10
drwxr-xr-x+  2 dclarke  csw       32 Sep 24 19:56 prev-intl
drwxr-xr-x+  3 dclarke  csw       29 Sep 24 19:57 prev-libcpp
drwxr-xr-x+  2 dclarke  csw       15 Sep 24 19:58 prev-libdecnumber
drwxr-xr-x+  4 dclarke  csw       72 Sep 24 19:56 prev-libiberty
-rw-r--r--   1 dclarke  csw       13 Sep 24 02:59 serdep.tmp
drwxr-xr-x+ 14 dclarke  csw      520 Sep 24 19:53 stage1-gcc
drwxr-xr-x+  4 dclarke  csw        4 Sep 24 19:51 stage1-i386-pc-solaris2.10
drwxr-xr-x+  2 dclarke  csw       32 Sep 24 03:09 stage1-intl
drwxr-xr-x+  3 dclarke  csw       29 Sep 24 03:15 stage1-libcpp
drwxr-xr-x+  2 dclarke  csw       15 Sep 24 03:16 stage1-libdecnumber
drwxr-xr-x+  4 dclarke  csw       72 Sep 24 03:08 stage1-libiberty
-rw-r--r--   1 dclarke  csw        7 Sep 24 20:58 stage_current
-rw-r--r--   1 dclarke  csw        7 Sep 24 03:01 stage_final
-rw-r--r--   1 dclarke  csw        7 Sep 24 20:58 stage_last

I'll delete this now .. I can't use it with those strange ACL's there.

$ cd /home/dclarke/test
$ rm -rf destination

I'll do some more testing with star 1.5a89 and let you know what I see.
-- Dennis Clarke dcla...@opensolaris.ca - Email related to the open source Solaris dcla...@blastwave.org - Email related to open source for Solaris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] *Almost* empty ZFS filesystem - 14GB?
Chris Murray wrote:

Accidentally posted the below earlier against ZFS Code, rather than ZFS Discuss. My ESXi box now uses ZFS filesystems which have been shared over NFS. Spotted something odd this afternoon - a filesystem which I thought didn't have any files in it weighs in at 14GB. Before I start deleting the empty folders to see what happens, any ideas what's happened here?

# zfs list | grep temp
zp/nfs/esx_temp  14.0G  225G  14.0G  /zp/nfs/esx_temp
# ls -la /zp/nfs/esx_temp
total 20
drwxr-xr-x 5 root root 5 Aug 13 12:54 .
drwxr-xr-x 7 root root 7 Aug 13 12:40 ..
drwxr-xr-x 2 root root 2 Aug 13 12:53 iguana
drwxr-xr-x 2 root root 2 Aug 13 12:54 meerkat
drwxr-xr-x 2 root root 2 Aug 16 19:39 panda
# ls -la /zp/nfs/esx_temp/iguana/
total 8
drwxr-xr-x 2 root root 2 Aug 13 12:53 .
drwxr-xr-x 5 root root 5 Aug 13 12:54 ..
# ls -la /zp/nfs/esx_temp/meerkat/
total 8
drwxr-xr-x 2 root root 2 Aug 13 12:54 .
drwxr-xr-x 5 root root 5 Aug 13 12:54 ..
# ls -la /zp/nfs/esx_temp/panda/
total 8
drwxr-xr-x 2 root root 2 Aug 16 19:39 .
drwxr-xr-x 5 root root 5 Aug 13 12:54 ..
#

Could there be something super-hidden which I can't see here? There don't appear to be any snapshots relating to zp/nfs/esx_temp. On a suggestion, I have run the following:

# zfs list -r zp/nfs/esx_temp
NAME             USED   AVAIL  REFER  MOUNTPOINT
zp/nfs/esx_temp  14.0G  225G   14.0G  /zp/nfs/esx_temp
# du -sh /zp/nfs/esx_temp
8K /zp/nfs/esx_temp
#

Does "zfs list -t snapshot -r zp/nfs/esx_temp" show anything? What about "zfs get refquota,refreservation,quota,reservation zp/fs/esx_tmp"?

pardon me for butting in .. but I thought that was a spelling error. It wasn't :

# zfs get refquota,refreservation,quota,reservation fibre0
NAME    PROPERTY        VALUE  SOURCE
fibre0  refquota        none   default
fibre0  refreservation  none   default
fibre0  quota           none   default
fibre0  reservation     none   default

what the heck is refreservation ?? 8-)

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org  - Email related to open source for Solaris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
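A hedged debugging sketch for mystery space like this, assuming a build new enough to carry the usedby* breakdown properties (the dataset name is the one from the thread):

$ zfs list -t all -r zp/nfs/esx_temp
$ zfs get -r usedbydataset,usedbysnapshots,usedbyrefreservation,usedbychildren zp/nfs/esx_temp

That splits USED into the four possible hiding places in one shot, instead of guessing at snapshots, reservations and children one property at a time.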
Re: [zfs-discuss] *Almost* empty ZFS filesystem - 14GB?
what the heck is refreservation ?? 8-)

PSARC/2009/204 ZFS user/group quotas & space accounting [1]
Integrated in build 114

[1] http://arc.opensolaris.org/caselog/PSARC/2009/204/
[2] http://mountall.blogspot.com/2009/05/sxce-build-114-is-out.html

that was fast. Cyril, long time no hear. :-( How's life, the universe and RISC processors for you these days?

--
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org  - Email related to open source for Solaris

ps: I have been busy porting as per usual. New 64-bit ready Tk/Tcl released today.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool iostat reports seem odd. bug ?
- - - - - -
^C#

Non-verbose iostat data shows no (near zero) write bandwidth :

# zpool iostat phobos_rpool 5
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
phobos_rpool  16.2G  17.5G    233     36  6.98M  51.0K
phobos_rpool  16.2G  17.5G    202     10  18.6M  12.2K
phobos_rpool  16.2G  17.5G    212     15  15.5M  14.6K
phobos_rpool  16.2G  17.5G    274     43  15.5M  36.9K
phobos_rpool  16.2G  17.5G    250     24  21.1M  22.7K
phobos_rpool  16.2G  17.5G    189     15  16.8M  14.9K
phobos_rpool  16.2G  17.5G    205     21  16.8M  18.5K
^C#

I also note that the verbose output reports often show no units for read bandwidth on the new device :

# zpool iostat -v phobos_rpool 5
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
phobos_rpool  16.2G  17.5G    375     52  8.60M  74.7K
  mirror      16.2G  17.5G    375     52  8.60M  74.7K
    c1t0d0s0      -      -    112     29  6.21M  75.5K
    c1t1d0s0      -      -     59     32  3.10M  75.5K
  c0t2d0          -      -      0    343    104  13.3M
------------  -----  -----  -----  -----  -----  -----

See the 104 in the last row. That may be bytes, KB, or MB. That may be documented somewhere but I suspect it is not just bytes. Sorry if I am being nit-picky but I thought that this data would be in the kstat chain and the per-device data would be summed up for the non-verbose report. It looks like the write traffic to the new device is being ignored in the non-verbose output data.

--
Dennis Clarke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
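On the kstat point, a hedged sketch: the per-device counters are exposed raw (cumulative bytes and operations, no K/M rounding) through the sd disk IO kstats, so the summing can be checked by hand; mapping sd instances back to cXtYdZ names is left to iostat -xn:

$ kstat -p 'sd:::nread' 'sd:::nwritten'    # cumulative bytes read / written per device
$ kstat -p 'sd:::reads' 'sd:::writes'      # cumulative operation counts per device

Sampling those twice and differencing gives exact per-interval figures to compare against what zpool iostat rolls up.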
Re: [zfs-discuss] The zfs performance decrease when enable the MPxIO round-robin
To enable mpxio, you need to have mpxio-disable="no"; in your fp.conf file. You should run /usr/sbin/stmsboot -e to make this happen. If you *must* edit that file by hand, always run /usr/sbin/stmsboot -u afterwards to ensure that your system's MPxIO config is correctly updated.

I thought stmsboot was mildly broken on the latest releases of S10 and possibly confused in the ZFS world. I am going by memory here of course and this may have been fixed since I looked at it 6 months ago or so. I also feel that editing the fp.conf file manually and entering the paths chosen is perhaps the best way to go.

Also, since I'm up late and posting a comment anyways, I set up a V890 with mpxio and ZFS with every ZFS mirror being composed of an mpxio enabled device. The whole process took some time but the level of redundancy and throughput was worth it. Who knows, maybe someday mpxio will be default from install on FCAL enabled machines.

Dennis

ps: I'm going to go search for those bugids if they ever existed
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
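For the archives, a minimal sketch of the hand-edit route being debated, assuming Solaris 10 with FC HBAs attached through the fp(7D) driver:

# in /kernel/drv/fp.conf :
mpxio-disable="no";

# then let stmsboot rewrite vfstab and the dump config for the
# new /scsi_vhci device paths, and reboot:
# /usr/sbin/stmsboot -u
# init 6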
[zfs-discuss] ZFS Zpool lazy mirror ?
Pardon me but I had to change subject lines just to get out of that other thread. In that other thread .. you were saying :

dick hoogendijk uttered: true. Furthermore, much so-called consumer hardware is very good these days. My guess is ZFS should work quite reliably on that hardware. (i.e. non ECC memory should work fine!) / mirroring is a -must- !

Gavin correctly revealed: No, ECC memory is a must too. ZFS checksumming verifies and corrects data read back from a disk, but once it is read from disk it is stashed in memory for your application to use - without ECC you erode confidence that what you read from memory is correct.

Well here I run into a small issue. And timing is everything in life and this small issue is happening right in front of me as I write this.

I have a Sun Blade 2500 with 4GB of genuine Sun ECC memory ( 370-6203 [1] ) and internally there are dual Sun 72GB Ultra 320 disks ( 390-0106 ). I like to have mirrors everywhere and I also like safety. I had the brilliant idea of pulling the secondary disk in slot 1 out and installing some more ethernet and SCSI paths. So I popped in a 501-5727 ( Dual FastEthernet / Dual SCSI Ultra-2 PCI Adapter ) and then moved the internal disk out to an external disk pack. So now I still have a mirror but with dual SCSI controllers involved. When the machine boots I see this :

Rebooting with command: boot
Boot device: /p...@1d,70/s...@4/d...@0,0:a  File and args:
SunOS Release 5.10 Version Generic_141414-02 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: mercury
Loading smf(5) service descriptions: 1/1
Reading ZFS config: done.
Mounting ZFS filesystems: (5/5)

mercury console login: root
Password:
Jul 20 00:13:06 mercury login: ROOT LOGIN /dev/console
Last login: Sun Jul 19 23:41:22 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# zpool status
  pool: mercury_rpool
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        mercury_rpool  DEGRADED     0     0     0
          mirror       DEGRADED     0     0     0
            c0t0d0s0   ONLINE       0     0     0
            c1t2d0s0   UNAVAIL      0     0     0  cannot open

So I have to manually intervene and do this :

# zpool online mercury_rpool c1t2d0s0
# zpool status
  pool: mercury_rpool
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Jul 20 00:13:28 2009
config:

        NAME           STATE     READ WRITE CKSUM
        mercury_rpool  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c0t0d0s0   ONLINE       0     0     0
            c1t2d0s0   ONLINE       0     0     0

errors: No known data errors

This means that I do have a Zpool with mirrored ZFS boot and root and all that goodness but not unless I *know* to look at the state of the mirror after boot. The system seems to be lazy in that it does not report the DEGRADED state on the console or via syslogd.

Now I caught this, just now ( see date and kernel rev above ) and wonder .. is this not a bug ?

--
Dennis

[1] DDR266, PC2100, CL2, ECC Serial Presence Detect 1.0 1GB Registered DIMM
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
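A hedged mitigation sketch until the reporting question is answered: zpool status -x prints exactly one line when everything is fine, so it crons well (the mail wrapper below is hypothetical, plain sh):

# /usr/sbin/zpool status -x
all pools are healthy

# e.g. hourly from root's crontab:
status=`/usr/sbin/zpool status -x`
[ "$status" = "all pools are healthy" ] || echo "$status" | mailx -s "zpool degraded" root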
Re: [zfs-discuss] ZFS Zpool lazy mirror ?
self replies are so degrading ( pun intended )

I see this patch :

Document Audience: PUBLIC
Document ID: 139555-08
Title: SunOS 5.10: Kernel Patch
Copyright Notice: Copyright © 2009 Sun Microsystems, Inc. All Rights Reserved
Update Date: Fri Jul 10 04:29:40 MDT 2009

I have a sneaky feeling .. the issue was fixed in a kernel patch released *this* past week. We shall see ... I'll patch now.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Thank you.
I want to express my thanks. My gratitude. I am not easily impressed by technology anymore and ZFS impressed me this morning.

Sometime late last night a primary server of mine had a critical fault. One of the PCI cards in a V480 was the cause and for whatever reasons this destroyed the DC-DC power convertors that powered the primary internal disks. It also dropped the whole machine and 12 zones. I feared the worst and made the call for service at about midnight last night. A Sun service tech said he could be there in 2 hours or so but he asked me to check this and check that. The people at the datacenter were happy to tell me there was a wrench light on but other than that, they knew nothing.

This machine, like all critical systems I have, uses mirrored disks in ZPools with multiple links of fibre to arrays. I dreaded what would happen when we tried to boot this box after all the dust was blown out and hardware swapped. Early this morning ... I watched the detailed diags run and finally a nice clean ok prompt.

* Hardware Power On
@(#)OBP 4.22.34 2007/07/23 13:01 Sun Fire 4XX
System is initializing with diag-switch? overrides.
Online: CPU0 CPU1 CPU2 CPU3
* Validating JTAG integrity...Done
. . .
CPU0: System POST Completed
Pass/Fail Status = ...
ESB Overall Status = ...
* POST Reset
. . .
{3} ok show-post-results

System POST Results
Component:     Results
CPU/Memory:    Passed
IO-Bridge8:    Passed
IO-Bridge9:    Passed
GPTwo Slots:   Passed
Onboard FCAL:  Passed
Onboard Net1:  Passed
Onboard Net0:  Passed
Onboard IDE:   Passed
PCI Slots:     Passed
BBC0:          Passed
RIO:           Passed
USB:           Passed
RSC:           Passed
POST Message: POST PASS
{3} ok boot -s

Eventually I saw my login prompt. There were no warnings about data corruption. No data loss. No noise at all in fact. :-O

# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
fibre0   680G   654G  25.8G    96%  ONLINE  -
z0      40.2G   103K  40.2G     0%  ONLINE  -
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors
#

Not one error. No message about resilver this or inode that. Everything booted flawlessly and I was able to see all my zones :

# bin/lz
-----------------------------------------------------------------------
 NAME   ID  STATUS     PATH         HOSTNAME   BRAND     IP
-----------------------------------------------------------------------
z_001    4  running    /zone/z_001  pluto      solaris8  excl
z_002    -  installed  /zone/z_002  ldap01     native    shared
z_003    -  installed  /zone/z_003  openfor    solaris9  shared
z_004    6  running    /zone/z_004  gaspra     native    shared
z_005    5  running    /zone/z_005  ibisprd    native    shared
z_006    7  running    /zone/z_006  io         native    shared
z_007    1  running    /zone/z_007  nis        native    shared
z_008    3  running    /zone/z_008  callistoz  native    shared
z_009    2  running    /zone/z_009  loginz     native    shared
z_010    -  installed  /zone/z_010  venus      solaris8  shared
z_011    -  installed  /zone/z_011  adbs       solaris9  shared
z_012    -  installed  /zone/z_012  auroraux   native    shared
z_013    8  running    /zone/z_013  osiris     native    excl
z_014    -  installed  /zone/z_014  jira       native    shared

People love to complain. I see it all the time.
I downloaded this OS for free and I run it in production. I have support and I am fine with paying for support contracts. But someone somewhere needs to buy the ZFS guys some keg(s) of whatever beer they want. Or maybe new Porsche Cayman S toys. That would be gratitude as something more than just words. Thank you. -- Dennis Clarke ps: the one funny thing
Re: [zfs-discuss] first use send/receive... somewhat confused.
Richard Elling richard.ell...@gmail.com writes: You can only send/receive snapshots. However, on the receiving end, there will also be a dataset of the name you choose. Since you didn't share what commands you used, it is pretty impossible for us to speculate what you might have tried.

I thought I made it clear I had not used any commands but gave two detailed examples of different ways to attempt the move. I see now the main thing that confused me is that sending a z1/proje...@something to a new z2/proje...@something would also result in z2/projects being created. That part was not at all clear to me from the man page.

This will probably get me bombed with napalm but I often just use star from Jörg Schilling because it's dead easy :

star -copy -p -acl -sparse -dump -C old_dir . new_dir

and you're done.[1] So long as you have both the new and the old zfs/ufs/whatever[2] filesystems mounted. It doesn't matter if they are static or not. If anything changes on the filesystem then star will tell you about it.

--
Dennis

[1] -p means preserve meta-properties of the files/dirs etc.
    -acl means what it says. Grabs ACL data also.
    -sparse means what it says. Handles files with holes in them.
    -dump means be super careful about everything ( read the manpage )
[2] star doesn't care if it's zfs or ufs or a CDROM or a floppy.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] first use send/receive... somewhat confused.
Dennis Clarke dcla...@blastwave.org writes: This will probably get me bombed with napalm but I often just use star from Jörg Schilling because its dead easy : star -copy -p -acl -sparse -dump -C old_dir . new_dir and you're done.[1] So long as you have both the new and the old zfs/ufs/whatever[2] filesystems mounted. It doesn't matter if they are static or not. If anything changes on the filesystem then star will tell you about it. I'm not sure I see how that is easier. The command itself may be but it requires other moves not shown in your command. 1) zfs create z2/projects 2) star -copy -p -acl -sparse -dump -C old_dir . new_dir As a bare minimum would be required. whereas zfs send z1/proje...@snap |zfs receive z2/proje...@snap Is all that is necessary using zfs send receive, and the new filesystem z2/projects is created and populated with data from z1/projects, not to mention a snapshot at z2/projects/.zfs/snapshot sort of depends on what you want to get done and both work. dc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
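One more point in favour of send/receive, a hedged sketch assuming the same dataset names (spelling out the obfuscated snapshot names as z1/projects@snap): once the first full stream has landed, later syncs only move the delta, which the star approach cannot do:

$ zfs snapshot z1/projects@snap2
$ zfs send -i z1/projects@snap z1/projects@snap2 | zfs receive z2/projects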
Re: [zfs-discuss] zfs on 32 bit?
On Tue, 16 Jun 2009, roland wrote: so, we have a 128bit fs, but only support for 1tb on 32bit? i`d call that a bug, isn`t it ? is there a bugid for this? ;) I'd say the bug in this instance is using a 32-bit platform in 2009! :-) Rich, a lot of embedded industrial solutions are 32-bit and very up to date in terms of features. Thus : $ uname -a SunOS aequitas 5.11 snv_115 i86pc i386 i86pc $ isainfo -v 32-bit i386 applications ahf sse2 sse fxsr mmx cmov sep cx8 tsc fpu $ isalist -v i486 i386 i86 $ psrinfo -pv The physical processor has 1 virtual processor (0) x86 (CentaurHauls 6A9 family 6 model 10 step 9 clock 1200 MHz) VIA Esther processor 1200MHz Also, some of the very very small little PC units out there, those things called eePC ( or whatever ) are probably 32-bit only. Dennis ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] compression at zfs filesystem creation
On Mon, 15 Jun 2009, dick hoogendijk wrote: IF at all, it certainly should not be the DEFAULT. Compression is a choice, nothing more.

I respectfully disagree somewhat. Yes, compression should be a choice, but I think the default should be for it to be enabled.

I agree that Compression is a choice and would add : Compression is a choice and it is the default.

Just my feelings on the issue.

Dennis Clarke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
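Whichever default wins, the knob is per-dataset, so a short sketch (dataset names hypothetical) of how both camps get their way regardless of the shipped default:

$ zfs create -o compression=on tank/builds     # opt in explicitly
$ zfs set compression=off tank/media          # opt out where it cannot help
$ zfs get compression,compressratio tank/builds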
Re: [zfs-discuss] Quick adding devices question
zpool create dpool c1t0d0 c1t1d0 c1t2d0 yep And then later, when the other cable is installed: zpool attach dpool c1t0d0 c2t0d0 zpool attach dpool c1t1d0 c2t1d0 zpool attach dpool c1t2d0 c2t2d0 That is sort of the way I do things also : # zpool status pool: fibre0 state: ONLINE status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable. action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions. scrub: resilver completed after 1h35m with 0 errors on Tue Mar 24 18:23:20 2009 config: NAME STATE READ WRITE CKSUM fibre0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t16d0 ONLINE 0 0 0 c5t0d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c5t1d0 ONLINE 0 0 0 c2t17d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c5t2d0 ONLINE 0 0 0 c2t18d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t20d0 ONLINE 0 0 0 c5t4d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t21d0 ONLINE 0 0 0 c5t6d0 ONLINE 0 0 0 spares c2t22d0AVAIL errors: No known data errors You noticed that the man page is not too clear on that eh? zpool attach [-f] pool device new_device Attaches new_device to an existing zpool device. The existing device cannot be part of a raidz configuration. If device is not currently part of a mirrored configura- tion, device automatically transforms into a two-way mirror of device and new_device. If device is part of a two-way mirror, attaching new_device creates a three-way mirror, and so on. In either case, new_device begins to resilver immediately. so yeah, you have it. Want to go for bonus points? Try to read into that man page to figure out how to add a hot spare *after* you are all mirrored up. -- Dennis Clarke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
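For the archives, the answer to that bonus question is the same verb used for adding a vdev, with the spare keyword -- exactly the invocation that appears elsewhere in this digest:

$ zpool add fibre0 spare c2t22d0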
Re: [zfs-discuss] is zpool import unSIGKILLable ?
It may be because it is blocked in kernel. Can you do something like this:

echo "0t<pid of zpool import>::pid2proc | ::walk thread | ::findstack -v" | mdb -k

So we see that it cannot complete import here and is waiting for transaction group to sync. So probably spa_sync thread is stuck, and we need to find out why.

Well, the details are going to change, I had to reboot. :-( I'll start up the stuck thread bug again here by simply starting over. I'll bet you would be able to learn a few things if you were to ssh into this machine, no? Regardless, let's start over.

dcla...@neptune:~$ uname -a
SunOS neptune 5.11 snv_111 i86pc i386 i86pc
dcla...@neptune:~$ uptime
 2:04pm up 10:13, 1 user, load average: 0.17, 0.16, 0.15
dcla...@neptune:~$ su -
Password:
Sun Microsystems Inc.  SunOS 5.11  snv_111  November 2008
#
# zpool import
  pool: foo
    id: 15989070886807735056
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        foo       ONLINE
          c0d0p0  ONLINE
#

please see ALL the details at :
http://www.blastwave.org/dclarke/blog/files/kernel_thread_stuck.README
also see output from fmdump -eV
http://www.blastwave.org/dclarke/blog/files/fmdump_e.log

Please let me know what else you may need.

--
Dennis Clarke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] is zpool import unSIGKILLable ?
Dennis Clarke wrote:

It may be because it is blocked in kernel. Can you do something like this:

echo "0t<pid of zpool import>::pid2proc | ::walk thread | ::findstack -v" | mdb -k

So we see that it cannot complete import here and is waiting for transaction group to sync. So probably spa_sync thread is stuck, and we need to find out why.

Well, the details are going to change, I had to reboot. :-( I'll start up the stuck thread bug again here by simply starting over. I'll bet you would be able to learn a few things if you were to ssh into this machine, no? Regardless, let's start over.

dcla...@neptune:~$ uname -a
SunOS neptune 5.11 snv_111 i86pc i386 i86pc
dcla...@neptune:~$ uptime
 2:04pm up 10:13, 1 user, load average: 0.17, 0.16, 0.15
dcla...@neptune:~$ su -
Password:
Sun Microsystems Inc.  SunOS 5.11  snv_111  November 2008
#
# zpool import
  pool: foo
    id: 15989070886807735056
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        foo       ONLINE
          c0d0p0  ONLINE
#

please see ALL the details at :
http://www.blastwave.org/dclarke/blog/files/kernel_thread_stuck.README

There's a corrupted space map which is being updated as part of the txg sync; in order to update it (add a few free ops to the last block), we need to read in current content of the last block from disk first, and that fails because it is corrupted (as indicated by checksum errors in the fmdump output):

eb5c9dc0 fec1f3980 0 60 d38e1828
  PC: _resume_from_idle+0xb1  THREAD: txg_sync_thread()
  stack pointer for thread eb5c9dc0: eb5c9a28
    swtch+0x188()
    cv_wait+0x53()
    zio_wait+0x55()
    dbuf_read+0x201()
    dbuf_will_dirty+0x30()
    dmu_write+0xd7()
    space_map_sync+0x304()
    metaslab_sync+0x284()
    vdev_sync+0xc6()
    spa_sync+0x3d0()
    txg_sync_thread+0x308()
    thread_start+8()

Victor

I had to cc that back onto the ZFS list, it may be of value here.

I agree that there is something wrong, no doubt, however we should not see zpool import simply hang and become unresponsive nor should that pid be unresponsive to a SIGKILL. Good behaviour should be the norm and that is not what we see with a stuck kernel thread. Really, we should get some response to the effect that a device is corrupt or similar. Right now, what the user gets is very little information other than a non-responsive command. CTRL+C does nothing and kill -9 pid does nothing to this command.

feels like a bug to me

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] is zpool import unSIGKILLable ?
Dennis Clarke wrote:

Dennis Clarke wrote:

It may be because it is blocked in kernel. Can you do something like this:

echo "0t<pid of zpool import>::pid2proc | ::walk thread | ::findstack -v" | mdb -k

So we see that it cannot complete import here and is waiting for transaction group to sync. So probably spa_sync thread is stuck, and we need to find out why.

Well, the details are going to change, I had to reboot. :-( I'll start up the stuck thread bug again here by simply starting over. I'll bet you would be able to learn a few things if you were to ssh into this machine, no? Regardless, let's start over.

dcla...@neptune:~$ uname -a
SunOS neptune 5.11 snv_111 i86pc i386 i86pc
dcla...@neptune:~$ uptime
 2:04pm up 10:13, 1 user, load average: 0.17, 0.16, 0.15
dcla...@neptune:~$ su -
Password:
Sun Microsystems Inc.  SunOS 5.11  snv_111  November 2008
#
# zpool import
  pool: foo
    id: 15989070886807735056
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        foo       ONLINE
          c0d0p0  ONLINE
#

please see ALL the details at :
http://www.blastwave.org/dclarke/blog/files/kernel_thread_stuck.README

There's a corrupted space map which is being updated as part of the txg sync; in order to update it (add a few free ops to the last block), we need to read in current content of the last block from disk first, and that fails because it is corrupted (as indicated by checksum errors in the fmdump output):

eb5c9dc0 fec1f3980 0 60 d38e1828
  PC: _resume_from_idle+0xb1  THREAD: txg_sync_thread()
  stack pointer for thread eb5c9dc0: eb5c9a28
    swtch+0x188()
    cv_wait+0x53()
    zio_wait+0x55()
    dbuf_read+0x201()
    dbuf_will_dirty+0x30()
    dmu_write+0xd7()
    space_map_sync+0x304()
    metaslab_sync+0x284()
    vdev_sync+0xc6()
    spa_sync+0x3d0()
    txg_sync_thread+0x308()
    thread_start+8()

Victor

I had to cc that back onto the ZFS list, it may be of value here.

Sorry for that, I've just hit wrong button ;-)

I agree that there is something wrong, no doubt, however we should not see zpool import simply hang and become unresponsive nor should that pid be unresponsive to a SIGKILL. Good behaviour should be the norm and that is not what we see with a stuck kernel thread. Really, we should get some response to the effect that a device is corrupt or similar. Right now, what the user gets is very little information other than a non-responsive command. CTRL+C does nothing and kill -9 pid does nothing to this command.

feels like a bug to me

Yes, it is: http://bugs.opensolaris.org/view_bug.do?bug_id=6758902

oh drat, I thought I hit something new :-\ Not very likely with ZFS, it is pretty well fleshed out all the way into the dark corners I guess.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] is zpool import unSIGKILLable ?
CTRL+C does nothing and kill -9 pid does nothing to this command. feels like a bug to me

Yes, it is: http://bugs.opensolaris.org/view_bug.do?bug_id=6758902

Now I recall why I had to reboot. Seems as if a lot of commands hang now. Things like :

df -ak
zfs list
zpool list

they all just hang.

Dennis

ps: this machine is really just an embedded device based on the VIA chipset. Not too sure if that matters.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] using zdb -e -bbcsL to debug that hung thread issue
Original Message
Subject: Re: I see you're running zdb -e -bbcsL
From: Victor Latushkin victor.latush...@sun.com
Date: Sun, May 10, 2009 11:17
To: dcla...@blastwave.org

Dennis Clarke wrote:

# w
 3:14pm up 11:24, 3 users, load average: 0.46, 0.29, 0.23
User     tty      login@  idle  JCPU  PCPU  what
dclarke  console  1:22pm  1:52  2:02  1:31  /usr/lib/nwam-manager
dclarke  pts/4    1:44pm  1:10              zpool import -f -R /mnt/foo 1598
dclarke  pts/7    1:49pm     9              ssh -2 -4 -e^ -l dclarke loginz.
dclarke  pts/8    1:51pm     3              ssh -2 -4 -e^ -l dclarke mail.li
dclarke  pts/10   2:07pm    20              w
iktorn   pts/11   3:06pm     4              zpool import
iktorn   pts/12   3:13pm   1  1             zdb -e -bbcsL 159890708868077350

Now I need to go read the manual to see what zdb is :-)

thus far I see some output from that :

dcla...@neptune:~$ cat ../iktorn/zdb/zdb-ebbcsL.out

I know that will wrap all wrong for people to see. see :
http://www.blastwave.org/dclarke/blog/files/zdb-ebbcsL.README

Traversing all blocks to verify metadata checksums ...

zdb_blkptr_cb: Got error 50 reading 0, 34, 0, 0 [L0 SPA space map] 0x1000L/0x200P DVA[0]=0:0x6091e400:0x200 DVA[1]=0:0x3e091e400:0x200 DVA[2]=0:0x78091e400:0x200 fletcher4 lzjb LE contiguous birth=1461 fill=1 cksum=0x1f7bc0ee12:0x6fcfd90640d:0x10787c83addaf:0x1f3ef97a921b6f -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 35, 0, 0 [L0 SPA space map] 0x1000L/0x200P DVA[0]=0:0x6091e600:0x200 DVA[1]=0:0x3e091e600:0x200 DVA[2]=0:0x78091e600:0x200 fletcher4 lzjb LE contiguous birth=1461 fill=1 cksum=0x1f7bc0ee12:0x6fcfd90640d:0x10787c83addaf:0x1f3ef97a921b6f -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 36, 0, 0 [L0 SPA space map] 0x1000L/0x200P DVA[0]=0:0x6091e200:0x200 DVA[1]=0:0x3e091e200:0x200 DVA[2]=0:0x78091e200:0x200 fletcher4 lzjb LE contiguous birth=1461 fill=1 cksum=0x1f522c92e2:0x6ae50e6dbad:0xf0a944e70790:0x1b6468e6c6f56a -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 37, 0, 0 [L0 SPA space map] 0x1000L/0x200P DVA[0]=0:0x803d9800:0x200 DVA[1]=0:0x4003d9800:0x200 DVA[2]=0:0x7a03d9800:0x200 fletcher4 lzjb LE contiguous birth=1509 fill=1 cksum=0x1f2b0a539f:0x763a6e219f4:0x1200601439c63:0x22f1a766cefc7c -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 38, 0, 0 [L0 SPA space map] 0x1000L/0x200P DVA[0]=0:0x803d9a00:0x200 DVA[1]=0:0x4003d9a00:0x200 DVA[2]=0:0x7a03d9a00:0x200 fletcher4 lzjb LE contiguous birth=1509 fill=1 cksum=0x1f2b0a539f:0x763a6e219f4:0x1200601439c63:0x22f1a766cefc7c -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 39, 0, 0 [L0 SPA space map] 0x1000L/0x200P DVA[0]=0:0x803d9600:0x200 DVA[1]=0:0x4003d9600:0x200 DVA[2]=0:0x7a03d9600:0x200 fletcher4 lzjb LE contiguous birth=1509 fill=1 cksum=0x1f2b0a539f:0x763a6e219f4:0x1200601439c63:0x22f1a766cefc7c -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 48, 0, 0 [L0 SPA space map] 0x1000L/0x400P DVA[0]=0:0xc1263c00:0x400 DVA[1]=0:0x441263c00:0x400 DVA[2]=0:0x7c1263c00:0x400 fletcher4 lzjb LE contiguous birth=648 fill=1 cksum=0x22b93a8434:0x190afe8456c3:0x9632f68e6719b:0x2703bc59856dd31 -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 49, 0, 0 [L0 SPA space map] 0x1000L/0x400P DVA[0]=0:0xc1264000:0x400 DVA[1]=0:0x441264000:0x400 DVA[2]=0:0x7c1264000:0x400 fletcher4 lzjb LE contiguous birth=648 fill=1 cksum=0x24c7dc289e:0x1a54426b8513:0x9cb262a2e8e04:0x286474a6a40f0ea -- skipping
zdb_blkptr_cb: Got error 50 reading 0, 50, 0, 0 [L0 SPA space map] 0x1000L/0x400P DVA[0]=0:0xc1264400:0x400 DVA[1]=0:0x441264400:0x400 DVA[2]=0:0x7c1264400:0x400 fletcher4 lzjb LE contiguous birth=648 fill=1 cksum=0x24c7dc289e:0x1a54426b8513:0x9cb262a2e8e04:0x286474a6a40f0ea -- skipping

Error counts:
        errno  count
           50      9

block traversal size 1561281536 != alloc 20934112256 (unreachable 19372830720)

        bp count:                4121
        bp logical:         521589760   avg: 126568
        bp physical:        520441856   avg: 126290   compression: 1.00
        bp allocated:      1561281536   avg: 378859   compression: 0.33
        SPA allocated:    20934112256   used: 26.17%

Blocks  LSIZE  PSIZE  ASIZE    avg   comp  %Total  Type
     8  56.0K  10.0K  30.0K  3.75K   5.60    0.00  deferred free
     1    512    512  1.50K  1.50K   1.00    0.00  object directory
     2     1K     1K  3.00K  1.50K   1.00    0.00  object array
     1    16K  1.50K  4.50K  4.50K  10.67    0.00  packed nvlist
     -      -      -      -      -      -       -  packed nvlist size
     1    16K     1K  3.00K  3.00K  16.00    0.00  bplist
     -      -      -      -      -      -       -  bplist header
     -      -      -      -      -      -       -  SPA space map header
    48   192K  37.0K   111K  2.31K   5.19
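A hedged decoding of that repeated "error 50", assuming the OpenSolaris source of this era: ZFS reports checksum failures as ECKSUM, which the ZFS headers define as EBADE, and EBADE is errno 50 on Solaris. So these nine errors are checksum failures on SPA space map blocks, which lines up with the corrupted-space-map diagnosis earlier in this thread:

$ grep EBADE /usr/include/sys/errno.h
#define EBADE   50      /* invalid exchange */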
[zfs-discuss] is zpool import unSIGKILLable ?
I tried to import a zpool and the process just hung there, doing nothing. It has been ten minutes now so I tried to hit CTRL-C. That did nothing. So then I tried :

Sun Microsystems Inc.  SunOS 5.11  snv_110  November 2008
r...@opensolaris:~# ps -efl
 F S  UID  PID  PPID  C PRI NI     ADDR    SZ    WCHAN    STIME TTY     TIME CMD
 1 T root    0     0  0   0 SY fec1f318     0          10:02:47 ?       0:01 sched
 0 S root    1     0  0  40 20 d3a62448   683 d3291d32 10:02:50 ?       0:00 /sbin/init
 1 S root    2     0  0   0 SY d3a61bc0     0 fec776b0 10:02:50 ?       0:00 pageout
. . .
 0 S root 1185  1014  0  40 20 d74fd040  1943 d7f15c66 11:19:04 pts/2   0:00 zpool import -f -R /a/foo 159890708

r...@opensolaris:~# kill -9 1185
r...@opensolaris:~# ps -efl | grep root
. . .
 0 S root 1014  1008  0  50 20 d74ff260  1470 d74ff2cc 10:16:23 pts/2   0:00 -bash
 0 S root 1185  1014  0  40 20 d74fd040  1943 d7f15c66 11:19:04 pts/2   0:00 zpool import -f -R /a/foo 159890708
. . .

OKay, I'll kill the shell.

r...@opensolaris:~# kill -9 1014
r...@opensolaris:~# ps -efl | grep root
 0 S root 1185     1  0  50 20 d74fd040  1943 d7f15c66 11:19:04 pts/2   0:00 zpool import -f -R /a/foo 159890708
r...@opensolaris:~# kill -9 1185
r...@opensolaris:~# ps -efl | grep root | grep import
 0 S root 1185     1  0  50 20 d74fd040  1943 d7f15c66 11:19:04 pts/2   0:00 zpool import -f -R /a/foo 159890708
r...@opensolaris:~# kill -9 1185
r...@opensolaris:~# ps -efl | grep root | grep import
 0 S root 1185     1  0  50 20 d74fd040  1943 d7f15c66 11:19:04 pts/2   0:00 zpool import -f -R /a/foo 159890708
r...@opensolaris:~#
r...@opensolaris:~# date
Sat May  9 11:29:37 PDT 2009
r...@opensolaris:~# ps -efl | grep root | grep import
 0 S root 1185     1  0  50 20 d74fd040  1943 d7f15c66 11:19:04 pts/2   0:00 zpool import -f -R /a/foo 159890708
r...@opensolaris:~#

Seems to be permanently wedged in there.

r...@opensolaris:~# truss -faeild -p 1185
truss: unanticipated system error: 1185

So what is the trick to killing this ?

--
Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] is zpool import unSIGKILLable ?
Dennis Clarke wrote:

I tried to import a zpool and the process just hung there, doing nothing. It has been ten minutes now so I tried to hit CTRL-C. That did nothing.

This symptom is consistent with a process blocked waiting on disk I/O. Are the disks functional?

Totally. I'm running with the machine right now.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] is zpool import unSIGKILLable ?
Dennis Clarke wrote:

I tried to import a zpool and the process just hung there, doing nothing. It has been ten minutes now so I tried to hit CTRL-C. That did nothing.

This symptom is consistent with a process blocked waiting on disk I/O. Are the disks functional?

dcla...@neptune:~$ zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       0     0     0
          c0d0s0  ONLINE       0     0     0

errors: No known data errors
dcla...@neptune:~$ zpool get all rpool
NAME   PROPERTY       VALUE                SOURCE
rpool  size           74G                  -
rpool  used           11.3G                -
rpool  available      62.7G                -
rpool  capacity       15%                  -
rpool  altroot        -                    default
rpool  health         ONLINE               -
rpool  guid           3386894308818650832  default
rpool  version        14                   default
rpool  bootfs         rpool/ROOT/snv_111   local
rpool  delegation     on                   default
rpool  autoreplace    off                  default
rpool  cachefile      -                    default
rpool  failmode       continue             local
rpool  listsnapshots  off                  default
dcla...@neptune:~$ su -
Password:
Sun Microsystems Inc.  SunOS 5.11  snv_111  November 2008
# zpool import
  pool: foo
    id: 15989070886807735056
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        foo       ONLINE
          c0d0p0  ONLINE

If I try this again .. it may just hang again. But here goes.

# mkdir /mnt/foo
# zpool import -f -R /mnt/foo 15989070886807735056

and then ... nothing happens. Not too sure what is going on here. In another window I do this and see the same thing as before :

dcla...@neptune:~$ date;ps -efl | grep root | grep import
Sat May  9 20:42:11 GMT 2009
 0 S root 1096  1088  0  50 20 df81e378  1327 d8274526 20:40:38 pts/5   0:00 zpool import -f -R /mnt/foo 1598907

I have to look into this a bit and try to figure out why I am seeing this thing foo and why can I not import it.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] is zpool import unSIGKILLable ?
Dennis Clarke wrote:

I tried to import a zpool and the process just hung there, doing nothing. It has been ten minutes now so I tried to hit CTRL-C. That did nothing.

It may be because it is blocked in kernel. Can you do something like this:

echo "0t<pid of zpool import>::pid2proc | ::walk thread | ::findstack -v" | mdb -k

dcla...@neptune:~$ date;ps -efl | grep root | grep import
Sat May  9 22:54:45 GMT 2009
 0 S root 1096  1088  0  50 20 df81e378  1327 d8274526 20:40:38 pts/5   0:00 zpool import -f -R /mnt/foo 1598907
dcla...@neptune:~$ su -
Password:
Sun Microsystems Inc.  SunOS 5.11  snv_111  November 2008
# /bin/echo "0t1096::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread e0156100: d575fc54
  d575fc94 swtch+0x188()
  d575fca4 cv_wait+0x53(d8274526, d82744e8, , 0)
  d575fce4 txg_wait_synced+0x90(d8274380, 65a, 0, 0)
  d575fd34 spa_config_update_common+0x88(e600fd40, 0, 0, d575fd68)
  d575fd84 spa_import_common+0x3cf()
  d575fdb4 spa_import+0x18(ecee4000, dfa040b0, d75b04e0, febd9444)
  d575fde4 zfs_ioc_pool_import+0xcd(ecee4000, 0, 0)
  d575fe14 zfsdev_ioctl+0xe0()
  d575fe44 cdev_ioctl+0x31(2d8, 5a02, 80424a0, 13, daf0b0b0, d575ff00)
  d575fe74 spec_ioctl+0x6b(d83b1d80, 5a02, 80424a0, 13, daf0b0b0, d575ff00)
  d575fec4 fop_ioctl+0x49(d83b1d80, 5a02, 80424a0, 13, daf0b0b0, d575ff00)
  d575ff84 ioctl+0x171()
  d575ffac sys_sysenter+0x106()

# echo ::threadlist | mdb -k

# /bin/echo ::threadlist | mdb -k | wc -l
     542
# /bin/echo ::threadlist | mdb -k > kern.thread.list
# wc -l kern.thread.list
     541 kern.thread.list

Output of the second command may be rather big, so it would be better to post it somewhere.

see http://www.blastwave.org/dclarke/blog/files/kern.thread.list

I see this line :

e0156100 df81e378 ed00fe70   zpool/1

which seems consistent with what ps says :

 F S  UID  PID  PPID  C PRI NI     ADDR    SZ    WCHAN    STIME TTY     TIME CMD
 0 S root 1096  1088  0  50 20 df81e378  1327 d8274526 20:40:38 pts/5   0:00 zpool import -f -R /mnt/foo 1598907

which is not telling me much at the moment. I'm game to play, what is next here ?

By the way, brace yourself, this is a 32-bit system and even worse than that, take a look at this isalist :

dcla...@neptune:~$ isainfo -v
32-bit i386 applications
        ahf sse2 sse fxsr mmx cmov sep cx8 tsc fpu
dcla...@neptune:~$ isalist
i486 i386 i86

Dennis Clarke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [on-discuss] Reliability at power failure?
And after some 4 days without any CKSUM error, how can yanking the power cord mess boot-stuff? Maybe because on the fifth day some hardware failure occurred? ;-) ha ha ! sorry .. that was pretty funny. -- Dennis ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Honesty after a power failure
               0     0
            c5t1d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0

errors: No known data errors

# zpool attach fibre0 c5t1d0 c2t17d0
# zpool add fibre0 spare c2t22d0
# zpool status fibre0
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h2m, 2.86% done, 1h18m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors
#

I have also learned that you cannot trust that resilver progress report either. It will not take 1h18m to complete. If I wait 20 minutes I'll get *nearly* the same estimate. The process must not be deterministic in nature.

# zpool status
  pool: fibre0
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h39m, 34.24% done, 1h15m to go
config:

        NAME         STATE     READ WRITE CKSUM
        fibre0       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c5t0d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t1d0   ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t2d0   ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c5t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c5t6d0   ONLINE       0     0     0
        spares
          c2t22d0    AVAIL

errors: No known data errors

  pool: z0
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        z0            ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s7  ONLINE       0     0     0
            c1t1d0s7  ONLINE       0     0     0

errors: No known data errors
# fmadm faulty -afg
#

I do TOTALLY trust that last line that says No known data errors which makes me wonder if the Severe FAULTs are for unknown data errors :-)

--
Dennis Clarke

sig du jour : "An appeaser is one who feeds a crocodile, hoping it will eat him last.", Winston Churchill

[1] I really want to know where PowerChute for Solaris went to.
[2] I would create a ZPool of striped mirrors based on multiple USB keys and on disks on IDE/SATA with or without compression and with copies={1|2|3} and while running a ON compile I'd pull the USB keys out and yank the power on the IDE/SATA or fibre disks. ZFS would not throw a fatal error nor drop a bit of data. Performance suffered but data did not.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
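Since the estimate keeps moving, a hedged sketch for actually measuring it -- log the progress line on an interval and watch whether "to go" ever converges (plain sh, pool name from the message above):

$ while :; do date; zpool status fibre0 | grep "in progress"; sleep 300; done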
Re: [zfs-discuss] ZFS Honesty after a power failure
On Tue, 24 Mar 2009, Dennis Clarke wrote: However, I have repeatedly run into problems when I need to boot after a power failure. I see vdevs being marked as FAULTED regardless if there are actually any hard errors reported by the on disk SMART Firmware. I am able to remove these FAULTed devices temporarily and then re-insert the same disk again and then run fine for months. Until the next long power failure.

In spite of huge detail, you failed to describe to us the technology used to communicate with these disks. The interface adaptors, switches, and wiring topology could make a difference.

Nothing fancy. Dual QLogic ( Sun ) fibre cards directly connected to the back of A5200's. Simple really.

Is there *really* a severe fault in that disk ?

# luxadm -v display 2118625d599d

This sounds like some sort of fiber channel.

Transport protocol: IEEE 1394 (SBP-2)

Interesting that it mentions the protocol used by FireWire. I have no idea where that is coming from.

If you are using fiber channel, the device names in the pool specification suggest that Solaris multipathing is not being used (I would expect something long like c4t600A0B800039C9B50A9C47B4522Dd0). If multipathing is not used, then you either have simplex connectivity, or two competing simplex paths to each device. Multipathing is recommended if you have redundant paths available.

Yes, I have another machine that has mpxio in place. However a power failure also trips phantom faults.

If the disk itself is not aware of its severe faults then that suggests that there is a transient problem with communicating with the disk.

You would think so eh? But a transient problem that only occurs after a power failure?

The problem could be in a device driver, adaptor card, FC switch, or cable. If the disk drive also lost power, perhaps the disk is unusually slow at spinning up.

All disks were up at boot, you can see that when I ask for a zpool status at boot time in single user mode. No errors and no faults. The issue seems to be when fmadm starts up or perhaps some other service that can throw a fault. I'm not sure.

It is easy to blame ZFS for problems.

It is easy to blame a power failure for problems as well as a nice shiny new APC Smart-UPS XL 3000VA RM 3U unit with external extended run time battery that doesn't signal a power failure. I never blame ZFS for anything.

On my system I was experiencing system crashes overnight while running 'zfs scrub' via cron job. The fiber channel card was locking up. Eventually I learned that it was due to a bug in VirtualBox's device driver. If VirtualBox was not left running overnight, then the system would not crash.

VirtualBox ? This is a Solaris 10 machine. Nothing fancy. OKay, sorry, nothing way out in the field fancy like VirtualBox.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Honesty after a power failure
On Tue, 24 Mar 2009, Dennis Clarke wrote: You would think so eh? But a transient problem that only occurs after a power failure?

Transient problems are most common after a power failure or during initialization.

Well the issue here is that power was on for ten minutes before I tried to do a boot from the ok prompt. Regardless, the point is that the ZPool shows no faults at boot time and then shows phantom faults *after* I go to init 3. That does seem odd.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Honesty after a power failure
Hey, Dennis - I can't help but wonder if the failure is a result of zfs itself finding some problems post restart...

Yes, yes, this is what I am feeling also, but I need to find the data also and then I can sleep at night. I am certain that ZFS does not just toss out faults on a whim because there must be a deterministic, logical and code based reason for those faults that occur *after* I go to init 3.

Is there anything in your FMA logs?

Oh God yes, brace yourself :-)

http://www.blastwave.org/dclarke/zfs/fmstat.txt

[ I edit the whitespace here for clarity ]

# fmstat
module              ev_recv ev_acpt wait  svc_t     %w %b open solve memsz bufsz
cpumem-diagnosis          0       0  0.0       2.7   0  0    3     0  4.2K  1.1K
cpumem-retire             0       0  0.0       0.2   0  0    0     0     0     0
disk-transport            0       0  0.0      45.7   0  0    0     0   40b     0
eft                       0       0  0.0       0.7   0  0    0     0  1.2M     0
fabric-xlate              0       0  0.0       0.7   0  0    0     0     0     0
fmd-self-diagnosis        3       0  0.0       0.2   0  0    0     0     0     0
io-retire                 0       0  0.0       0.2   0  0    0     0     0     0
snmp-trapgen              2       0  0.0       1.7   0  0    0     0   32b     0
sysevent-transport        0       0  0.0      75.4   0  0    0     0     0     0
syslog-msgs               2       0  0.0       1.4   0  0    0     0     0     0
zfs-diagnosis           296     252  2.0  236719.7  98  0    1     2  176b  144b
zfs-retire                4       0  0.0      27.4   0  0    0     0     0     0

zfs-diagnosis svc_t=236719.7 ?

for a summary and fmdump for a summary of the related errors

http://www.blastwave.org/dclarke/zfs/fmdump.txt

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Dec 05 21:31:46.1069 aa3bfcfa-3261-cde4-d381-dae8abf296de ZFS-8000-D3
Mar 07 08:46:43.6238 4c8b199b-add1-c3fe-c8d6-9deeff91d9de ZFS-8000-FD
Mar 07 19:37:27.9819 b4824ce2-8f42-4392-c7bc-ab2e9d14b3b7 ZFS-8000-FD
Mar 07 19:37:29.8712 af726218-f1dc-6447-f581-cc6bb1411aa4 ZFS-8000-FD
Mar 07 19:37:30.2302 58c9e01f-8a80-61b0-ffea-ded63a9b076d ZFS-8000-FD
Mar 07 19:37:31.6410 3b0bfd9d-fc39-e7c2-c8bd-879cad9e5149 ZFS-8000-FD
Mar 10 19:37:08.8289 aa3bfcfa-3261-cde4-d381-dae8abf296de FMD-8000-4M Repaired
Mar 23 23:47:36.9701 2b1aa4ae-60e4-c8ef-8eec-d92a18193e7a ZFS-8000-FD
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD

# fmdump -vu 3780a2dd-7381-c053-e186-8112b463c2b7
TIME                 UUID                                 SUNW-MSG-ID
Mar 24 01:29:00.1981 3780a2dd-7381-c053-e186-8112b463c2b7 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=fibre0/vdev=444604062b426970
           Affects: zfs://pool=fibre0/vdev=444604062b426970
               FRU: -
          Location: -

# fmdump -vu 146dad1d-f195-c2d6-c630-c1adcd58b288
TIME                 UUID                                 SUNW-MSG-ID
Mar 24 01:29:02.1649 146dad1d-f195-c2d6-c630-c1adcd58b288 ZFS-8000-FD
  100%  fault.fs.zfs.vdev.io
        Problem in: zfs://pool=fibre0/vdev=23e4d7426f941f52
           Affects: zfs://pool=fibre0/vdev=23e4d7426f941f52
               FRU: -
          Location: -

will show more and more information about the error. Note that some of it might seem like rubbish. The important bits should be obvious though - things like what the SUNW error message is (like ZFS-8000-D3), which can be pumped into sun.com/msg like so :

http://www.sun.com/msg/ZFS-8000-FD

or see http://www.blastwave.org/dclarke/zfs/ZFS-8000-FD.txt

Article for Message ID: ZFS-8000-FD
Too many I/O errors on ZFS device
Type: Fault
Severity: Major
Description: The number of I/O errors associated with a ZFS device exceeded acceptable levels.
Automated Response: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available.
Impact: The fault tolerance of the pool may be affected.

Yep, I agree, that is what I saw.

Note also that there should also be something interesting in the /var/adm/messages log to match any 'faulted' devices.
You might also find an 'fmdump -e' spooky.

long list of events :

TIME                 CLASS
Mar 23 23:47:28.5586 ereport.fs.zfs.io
Mar 23 23:47:28.5594 ereport.fs.zfs.io
Mar 23 23:47:28.5588 ereport.fs.zfs.io
Mar 23 23:47:28.5592 ereport.fs.zfs.io
Mar 23 23:47:28.5593 ereport.fs.zfs.io
. . .
Mar 23 23:47:28.5622 ereport.fs.zfs.io
Mar 23 23:47:28.5560 ereport.fs.zfs.io
Mar 23 23:47:28.5658 ereport.fs.zfs.io
Mar 23 23:48:41.5957 ereport.fs.zfs.io

http://www.blastwave.org/dclarke/zfs/fmdump_e.txt

ouch, that is a nasty long list all in a few seconds.

and fmdump -eV a very detailed verbose long list with such entries as

Mar 23 2009 23:48:41.595757900 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena =
[zfs-discuss] Question about zpool create parameter version
version 10. The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices

For more information on a particular version, including supported releases, see: http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number. For example, version 4 :

# zpool destroy fibre00
# zpool create -o autoreplace=on -o version=4 -m legacy \
    fibre00 \
    mirror c8t2004CFAC0E97d0 c8t202037F859F1d0 \
    mirror c8t2004CFB53F97d0 c8t202037F84044d0 \
    mirror c8t2004CFA3C3F2d0 c8t2004CF2FCE99d0 \
    mirror c8t2004CF9645A8d0 c8t2004CFA3F328d0 \
    mirror c8t202037F812EAd0 c8t2004CF96FF00d0 \
    mirror c8t2004CFAC489Fd0 c8t2004CF961853d0

Does the keyword "current" work in some other fashion ?

--
Dennis Clarke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
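A minimal sanity check on the version actually negotiated, assuming the pool created above -- version is an ordinary pool property, and simply omitting -o version at create time should yield the newest version the build supports:

# zpool get version fibre00     # expect VALUE 4 here after the create above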
[zfs-discuss] zpool import minor bug in snv_64a
Not sure if this has been reported or not. This is fairly minor but slightly annoying.

After a fresh install of snv_64a I run zpool import to find this :

# zpool import
  pool: zfs0
    id: 13628474126490956011
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though some features will not be available without an explicit 'zpool upgrade'.
config:

        zfs0         ONLINE
          mirror     ONLINE
            c1t9d0   ONLINE
            c0t9d0   ONLINE
          mirror     ONLINE
            c1t10d0  ONLINE
            c0t10d0  ONLINE
          mirror     ONLINE
            c1t11d0  ONLINE
            c0t11d0  ONLINE
          mirror     ONLINE
            c1t12d0  ONLINE
            c0t12d0  ONLINE
          mirror     ONLINE
            c1t13d0  ONLINE
            c0t13d0  ONLINE
          mirror     ONLINE
            c1t14d0  ONLINE
            c0t14d0  ONLINE

So I then run a zpool import but I add in the -R option and specify root thus :

# zpool import -f -R / 13628474126490956011

One would think that the -R / would not result in any damage but this is the result :

# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
zfs0               191G  8.23G  24.5K  legacy
zfs0/SUNWspro      567M   201M   567M  //opt/SUNWspro
zfs0/backup        190G  8.23G   189G  //export/zfs/backup
zfs0/backup/qemu  1.09G   934M  1.09G  //export/zfs/qemu
zfs0/csw           124M  3.88G   124M  //opt/csw
zfs0/home          239M  7.77G   239M  //export/home
zfs0/titan        24.5K  8.23G  24.5K  //export/zfs/titan

Note the extra / there that should not be there. Not a simple thing to fix either :

# zfs set mountpoint=/opt/SUNWspro zfs0/SUNWspro
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
zfs0               191G  8.23G  24.5K  legacy
zfs0/SUNWspro      567M   201M   567M  //opt/SUNWspro
zfs0/backup        190G  8.23G   189G  //export/zfs/backup
zfs0/backup/qemu  1.09G   934M  1.09G  //export/zfs/qemu
zfs0/csw           124M  3.88G   124M  //opt/csw
zfs0/home          239M  7.77G   239M  //export/home
zfs0/titan        24.5K  8.23G  24.5K  //export/zfs/titan

relatively harmless. Looks like altroot should be assumed to be / unless otherwise specified and if it is specified to be / then the altroot can be ignored. I don't know if that is clear but I think you know what I mean : in /usr/src/cmd/zpool/zpool_main.c :

static int
do_import(nvlist_t *config, const char *newname, const char *mntopts,
    const char *altroot, int force, int argc, char **argv)

if that const char *altroot happens to be nothing more than a forward slash char ( nul terminated ) then I think it should be ignored. What say you ?

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import minor bug in snv_64a
in /usr/src/cmd/zpool/zpool_main.c : at line 680 forwards we can probably check for this scenario :

if ( ( altroot != NULL ) && ( altroot[0] != '/') ) {
        (void) fprintf(stderr, gettext("invalid alternate root '%s': "
            "must be an absolute path\n"), altroot);
        nvlist_free(nvroot);
        return (1);
}
/* some altroot has been specified
 *
 * thus altroot[0] and altroot[1] exist */
else if ( ( altroot[0] = '/') && ( altroot[1] = '\0') ) {
        (void) fprintf(stderr, "Do not specify / as alternate root.\n");
        nvlist_free(nvroot);
        return (1);
}

not perfect .. but something along those lines.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import minor bug in snv_64a
On Mon, Jun 25, 2007 at 02:34:21AM -0400, Dennis Clarke wrote:

note that it was well after 2 AM for me .. half blind asleep

that's my excuse .. I'm sticking to it. :-)

in /usr/src/cmd/zpool/zpool_main.c : at line 680 forwards we can probably check for this scenario :

if ( ( altroot != NULL ) && ( altroot[0] != '/') ) {
        (void) fprintf(stderr, gettext("invalid alternate root '%s': "
            "must be an absolute path\n"), altroot);
        nvlist_free(nvroot);
        return (1);
}
/* some altroot has been specified
 *
 * thus altroot[0] and altroot[1] exist */
else if ( ( altroot[0] = '/') && ( altroot[1] = '\0') ) {

s/=/==/

yep ... that's what I intended. The above would bork royally.

        (void) fprintf(stderr, "Do not specify / as alternate root.\n");

You need gettext() here.

why ?

        nvlist_free(nvroot);
        return (1);
}

not perfect .. but something along those lines.

even worse .. I was looking in the wrong section of the code or zpool_main.c

if I get coffee and wake up .. maybe I can take another kick at that eh?

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import minor bug in snv_64a
You've tripped over a variant of: 6335095 Double-slash on /. pool mount points - Eric oh well .. no points for originality then I guess :-) Thanks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: ZFS disables nfs/server on a host
On 4/27/07, Ben Miller [EMAIL PROTECTED] wrote: I just threw in a truss in the SMF script and rebooted the test system and it failed again. The truss output is at http://www.eecis.udel.edu/~bmiller/zfs.truss-Apr27-2007

324:    read(7, 0x000CA00C, 5120)                       = 0
324:    llseek(7, 0, SEEK_CUR)                          Err#29 ESPIPE
324:    close(7)                                        = 0
324:    waitid(P_PID, 331, 0xFFBFE740, WEXITED|WTRAPPED) = 0

llseek(7, 0, SEEK_CUR) returns Err#29 ESPIPE . so then .. what's that mean?

ERRORS
    The llseek() function will fail if:
    ESPIPE    The fildes argument is associated with a pipe or FIFO.

dunno if that helps

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] FYI: X4500 (aka thumper) sale
On 4/23/07, Richard Elling [EMAIL PROTECTED] wrote: FYI, Sun is having a big, 25th Anniversary sale. X4500s are half price -- 24 TBytes for $24k. ZFS runs really well on a X4500. http://www.sun.com/emrkt/25sale/index.jsp?intcmp=tfa5101 I apologize to those not in the US or UK who can't take advantage of the sale.

I really don't think that advertisements are the right thing to drop into these mailing lists.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] *** High Praise for ZFS and NFS services ***
Dear ZFS and OpenSolaris people :

I recently upgraded a large NFS server upwards from Solaris 8. This is a production manufacturing facility with football-field-sized factory floors and 25 tonne steel products. Many on-site engineers on AIX and CATIA as well as Solaris users and Windows and everything you can shake a stick at. Everything in this place must rest in central storage that is bulletproof and fast. The NFS server is like the company's vault for its valuables and its future.

After serious consideration looking at a number of products I can say that *nothing* comes close to the value of ZFS and Solaris. Nothing comes close to the speed either. With instant quota control, like turning a dial, we were able to deliver terabytes of storage to all users but only expose as much or as little as we want. On the fly. That impressed everyone involved.

I received this yesterday.

:: Verbatim eMail

Dennis: I have all the data transferred over and reconfigured an AIX machine for testing tomorrow. I was surprised that nobody came to me and noticed there was 200 Gig of disk space available. Or that performance was much faster. Our specs showed a 5x performance increase. I also did a seat of the pants test ... Much faster as well. Windows drive mappings were almost instant upon login. Large CAD models and assemblies came up in seconds compared to minutes ...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Preferred backup mechanism for ZFS?
On 4/18/07, Nicolas Williams [EMAIL PROTECTED] wrote: On Wed, Apr 18, 2007 at 03:47:55PM -0400, Dennis Clarke wrote: Maybe with a definition of what a backup is and then some way to achieve it. As far as I know the only real backup is one that can be tossed into a vault and locked away for seven years. Or any arbitrary amount of time within reason. Like a decade or a century. But perhaps a backup today will have as much meaning as paper tape over time. Can we discuss this with a few objectives ? Like define backup and then describe mechanisms that may achieve one? Or a really big question that I guess I have to ask, do we even care anymore? As far as ZFS is concerned any discussion of how you'll read today's media a decade into the future is completely OT :) probably. Media should have a shelf life of seven years minimum and probably a whole lot longer. The technology ( QIC, 4mm DAT, DLT etc etc ) should be available and around for a long long time. zfs send as backup is probably not generally acceptable: you can't expect to extract a single file out of it (at least not out of an incremental zfs send), but that's certainly done routinely with ufsdump, tar, cpio, ... Also, why not just punt to NDMP? .. let's look at it. http://www.ndmp.org/products/ That's a fair list of companies there. The SDK looks to be alpha stage or maybe beta : Q: What good is the NDMP SDK? The NDMP software developers kit was developed to prototype new NDMP functionality and provides a functional (although fairly basic) implementation of an NDMP client and NDMP server. The objective of the SDK is to facilitate rapid development of NDMP compliant clients and servers on a variety of platforms. Third parties are welcome to download and make use of the provided source code within your products (subject to copyright notices supplied) or as example/reference code. Okay .. that's a good candidate to look at. dc
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
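[ As a concrete example of the "zfs send as backup" approach being debated, here is a minimal sketch; the pool, snapshot, and tape device names are assumptions, and note the caveat above that single-file restore from such a stream is not possible : ]

# zfs snapshot tank/home@level0                       # point-in-time image to dump
# zfs send tank/home@level0 | dd obs=1048576 of=/dev/rmt/0mbn
# ... and later, to restore the whole dataset from that tape :
# dd if=/dev/rmt/0mbn ibs=1048576 | zfs receive tank/home_restored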
[zfs-discuss] modify zpool_main.c for raw iostat data
WARNING : long and verbose .. sorry. ALL : While doing various performance tests and measurements of IO rates with a zpool I found that the output from zpool iostat poolname was not really ready for plotting by gnuplot. The output from zpool iostat poolname looks like so :

# zpool iostat zpool0 15
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zpool0       292K  1.63T     48     91  2.96M  6.10M
zpool0       292K  1.63T      0      0      0      0
zpool0      10.0M  1.63T      0    373      0   878K
zpool0      27.4M  1.63T      0    567      0  1.36M
zpool0      43.9M  1.63T      0    612      0  1.44M
zpool0      72.1M  1.63T      0    566      0  1.32M
zpool0       101M  1.63T      0  1.16K      0  2.80M
zpool0       156M  1.63T      0  1.17K      0  2.81M
zpool0       183M  1.63T      0  1.13K      0  2.73M
zpool0       237M  1.63T      0  1.14K      0  2.74M
. . .

I note that the headers there are hard coded into zpool_main.c and there appears to be no option to disable them, thus :

static void
print_iostat_header(iostat_cbdata_t *cb)
{
	(void) printf("%*s     capacity     operations    bandwidth\n",
	    cb->cb_namewidth, "");
	(void) printf("%-*s   used  avail   read  write   read  write\n",
	    cb->cb_namewidth, "pool");
	print_iostat_separator(cb);
}

and here we see :

static void
print_iostat_separator(iostat_cbdata_t *cb)
{
	int i = 0;

	for (i = 0; i < cb->cb_namewidth; i++)
		(void) printf("-");
	(void) printf("  -----  -----  -----  -----  -----  -----\n");
}

However these headers are only ever printed once while the verbose option -v is NOT in effect. We do see them repeated over and over when the verbose flag is engaged :

# zpool iostat -v zpool0 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zpool0       148K  1.63T      0     21    635  2.64M
  mirror    14.5K   278G      0      3    112   449K
    c2t8d0      -      -      0      3     52   449K
    c3t8d0      -      -      0      3     95   449K
  mirror      47K   278G      0      3     77   449K
    c2t9d0      -      -      0      3     52   449K
    c3t9d0      -      -      0      3     63   449K
  mirror    47.5K   278G      0      3     69   449K
    c2t10d0     -      -      0      3     52   449K
    c3t10d0     -      -      0      3     42   449K
  mirror      14K   278G      0      3     97   451K
    c2t11d0     -      -      0      3     74   451K
    c3t11d0     -      -      0      3     52   451K
  mirror       6K   278G      0      3    125   451K
    c2t12d0     -      -      0      3     73   451K
    c3t12d0     -      -      0      3     94   451K
  mirror    19.5K   278G      0      3    154   451K
    c2t13d0     -      -      0      3    147   451K
    c3t13d0     -      -      0      3     31   451K
----------  -----  -----  -----  -----  -----  -----

It seems perfectly reasonable to have those headers repeated over and over when we expect verbose reports. The issue that I have is with the summary reports where we get the headers only once. I wanted to take iostat data from zpool iostat poolname and be able to directly pass it through awk and into a datafile for processing into nice graphs. I find that gnuplot does this nicely. However the units of the data may vary from Kilobytes ( K ) to Megabytes ( M ) and perhaps a simple digit zero with no suffix at all. I can only presume that we may also see Gigabytes ( G ) and perhaps some day even Terabytes ( T ). I was thinking that we could add in an option for the iostat subcommand for raw output as simple integers. Here we could have a -r option which would work only when the verbose option is NOT present. At least for now. So then we see this :

static const char *
get_usage(zpool_help_t idx)
{
	switch (idx) {
	. . .
	/*
	 * I may have the option syntax wrong here but the intent is that
	 * one may specify the -r OR the -v but not both at the same time.
	 */
	case HELP_IOSTAT:
		return (gettext("\tiostat {[-r]|[-v]} [pool] ... "
		    "[interval [count]]\n"));
	. . .
	}
	abort();
	/* NOTREACHED */
}

The iostat_cbdata struct would need a new int element also :

typedef struct iostat_cbdata {
	zpool_list_t *cb_list;
	/*
	 * The cb_raw int is added here by Dennis Clarke
	 */
	int cb_raw;
	int cb_verbose;
	int cb_iteration;
	int cb_namewidth;
} iostat_cbdata_t;

I don't think that any change to print_vdev_stats is required because the creation of the suffixes seems to occur with print_one_stat :

/*
 * Display a single statistic.
 */
void
print_one_stat(uint64_t value)
{
	char buf[64];

	zfs_nicenum(value, buf, sizeof (buf));
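[ Until a -r flag exists, the suffix problem can be worked around after the fact in awk; a rough sketch, assuming the plain "zpool iostat zpool0 15" output shown above, handling only the K/M/G/T suffixes, and multiplying by powers of 1024 to match what zfs_nicenum() does when it formats the numbers : ]

zpool iostat zpool0 15 | awk '
    function raw(s) {
        n = s + 0                          # awk stops converting at the suffix
        if (s ~ /K$/) n *= 1024
        if (s ~ /M$/) n *= 1048576
        if (s ~ /G$/) n *= 1073741824
        if (s ~ /T$/) n *= 1099511627776
        return n
    }
    $1 == "zpool0" {                       # skip the repeated header lines
        print raw($4), raw($5), raw($6), raw($7)   # ops and bandwidth columns
    }' > iostat.dat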
[zfs-discuss] zpool iostat : This command can be tricky ...
I really need to take a longer look here.

/*
 * zpool iostat [-v] [pool] ... [interval [count]]
 *
 *	-v	Display statistics for individual vdevs
 *
 * This command can be tricky because we want to be able to deal with pool
 . . .

I think I may need to deal with a raw option here ?

	/*
	 * Enter the main iostat loop.
	 */
	cb.cb_list = list;
	cb.cb_verbose = verbose;
	cb.cb_iteration = 0;
	cb.cb_namewidth = 0;

Hopefully you can see what I am trying to do ( see previous post ) is just get the raw data, and I may do a quick hack to look at it. So until I get a clean compile that dumps out the data .. it may be best to ignore me :-) Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: update on zfs boot support
Robert Milkowski wrote: Hello Ivan, Sunday, March 11, 2007, 12:01:28 PM, you wrote: IW Got it, thanks, and a more general question: in a single disk IW root pool scenario, what advantage will zfs provide over ufs w/ IW logging? And when zfs boot is integrated in Nevada, will live upgrade work with zfs root? Snapshots/clones + live upgrade or standard patching. Additionally no more hassle with separate /opt /var ... Potentially also compression turned on for /var - just to add to Robert's list, here are other advantages ZFS on root has over UFS, even on a single disk:

* knowing when your data starts getting corrupted (if your disk starts failing, and what data is being lost)
* ditto blocks to take care of filesystem metadata consistency
* performance improvements over UFS
* ability to add disks to mirror the root filesystem at any time, should they become available
* ability to use free space on the root pool, making it available for other uses (by setting a reservation on the root filesystem, you can ensure that / always has sufficient available space)

- am I missing any others ?

* ability to show off to your geeky friends who will all say neato!

Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
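[ To make the snapshots-plus-patching advantage concrete, a sketch of the sort of workflow a ZFS root would allow; the dataset name is invented here, since zfs boot was not yet integrated at the time, and rolling back a live root would in practice involve a reboot : ]

# zfs snapshot rpool/ROOT@prepatch     # cheap safety net before patching
# patchadd 118855-36                   # apply the patch as usual
# ... and if the patch misbehaves, roll the whole root back :
# zfs rollback rpool/ROOT@prepatch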
Re: [zfs-discuss] Why number of NFS threads jumps to the max value?
You don't honestly, really, reasonably, expect someone, anyone, to look at the stack well of course he does :-) and I looked at it .. all of it and I can tell exactly what the problem is but I'm not gonna say, because it's a trick question. so there. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: [osol-help] How to recover from rm *?
On Sun, 18 Feb 2007, Calvin Liu wrote: I wanted to run the command rm Dis* in a folder but mis-typed a space in it so it became rm Dis *. Unfortunately I had pressed the return key before I noticed the mistake. So you all know what happened... :( :( :( Ouch! How can I get the files back in this case? You restore them from your backups. I haven't backed them up. This is one ( of many ) reasons why ZFS just rocks. A snapshot would have saved you. I don't consider a snapshot to be an actual backup however. I define a backup as something that you can actually restore to bare metal when your entire datacenter has vanished into a black hole. That means a tape, generally. In the Lotus Notes/Domino world there is a very nice feature where you can have soft-deletions. Essentially you can delete a record from a database and then still do a recovery if needed within a given retention time period. Perhaps a soft-deletion feature in ZFS would be nice. It would allow a sysadmin or maybe even a user to delete something and then come back later, check a deletion log and possibly just unrm the file. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
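[ To spell out the "a snapshot would have saved you" point, a minimal sketch; the dataset name and snapshot schedule are assumptions, and the hidden .zfs directory is reachable even with snapdir=hidden : ]

# zfs snapshot tank/home@nightly                   # taken before the accident, e.g. from cron
$ rm Dis *                                         # the fatal typo
$ cp /tank/home/.zfs/snapshot/nightly/Dis* .       # copy the lost files straight back out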
Re: Re[2]: [zfs-discuss] 118855-36 ZFS
/* Warning : soapbox speech ahead */ Something here is broken. As a rule, don't trust smpatch. Don't trust the freeware pca either. Either one may or may not include patches that you don't need, or they may list patches you do need or seem to need, but once you apply them you find your system buggered up in some way. So, in my opinion, patches are like Russian roulette. So very carefully apply what you know you *need* based on actually looking in the patch readme files. The recommended pile of patches is 99.9% safe and then outside of that you have to pick and choose. Since I am on a soapbox here, I may as well be in for a pound as well as the penny. I like to install what I call a reference edition of Solaris. An update release like Solaris 10 Update 3 or Solaris 9 Update 8. These releases are generally very well tested and you can install them and run them in a very stable fashion long term. Once you add a single patch to that system you have wandered away from "this is shipped on media" to somewhere else. -- Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] impressive
boldly plowing forwards I request a few disks/vdevs to be mirrored all at the same time :

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Feb  1 04:17:58 2007
config:

        NAME         STATE     READ WRITE CKSUM
        zfs0         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
          c1t11d0    ONLINE       0     0     0
          c1t12d0    ONLINE       0     0     0
          c1t13d0    ONLINE       0     0     0
          c1t14d0    ONLINE       0     0     0

errors: No known data errors

bash-3.2# zpool attach -f zfs0 c1t11d0 c0t11d0
bash-3.2# zpool attach -f zfs0 c1t12d0 c0t12d0
bash-3.2# zpool attach -f zfs0 c1t13d0 c0t13d0
bash-3.2# zpool attach -f zfs0 c1t14d0 c0t14d0

needless to say there is some thrashing going on

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.00% done, 45h14m to go
config:

        NAME         STATE     READ WRITE CKSUM
        zfs0         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0
            c0t13d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t14d0  ONLINE       0     0     0
            c0t14d0  ONLINE       0     0     0

errors: No known data errors
bash-3.2#

moments later I see :

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.59% done, 2h19m to go
config:

        NAME         STATE     READ WRITE CKSUM
        zfs0         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t11d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t12d0  ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0
            c0t13d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t14d0  ONLINE       0     0     0
            c0t14d0  ONLINE       0     0     0

errors: No known data errors
bash-3.2#

bash-3.2# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci sd ip hook neti sctp arp usba nca zfs random audiosup sppp crypto ptm md logindmux cpc wrsmd fcip fctl fcp nfs ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      79986               624   71%
Anon                        16131               126   14%
Exec and libs                1830                14    2%
Page cache                    533                 4    0%
Free (cachelist)              934                 7    1%
Free (freelist)             13662               106   12%

Total                      113076               883
Physical                   111514               871
bash-3.2#

so in a few hours I will have decent redundancy, all on snv_55b ... looking very very fine -- Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Hello, We're setting up a new mailserver infrastructure and decided to run it on zfs. On an E220R with a D1000, I've set up a storage pool with four mirrors: Good morning Ihsan ... I see that you have everything mirrored here, that's excellent. When you pulled a disk, was it a disk that was containing a metadevice or was it a disk in the zpool ? In the case of a metadevice, as you know, the system should have kept running fine. We have probably both done this over and over at various sites to demonstrate SVM to people. If you pulled out a device in the zpool, well, now we are in a whole new world, and I had heard that there was some *feature* in Solaris now that will protect the ZFS file system integrity by simply causing a system to panic if the last device in some redundant component was compromised. I think you hit a major bug in ZFS, personally. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Hello Michael, Am 24.1.2007 14:36 Uhr, Michael Schuster schrieb: -- [EMAIL PROTECTED] # zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config: [...]

Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to lack of DiskSuite state
Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total were available,
Jan 23 18:51:38 newponit so panic to ensure data integrity.

this message shows (and the rest of the stack proves) that your panic happened in SVM. It has NOTHING to do with zfs. So either you pulled the wrong disk, or the disk you pulled also contained SVM volumes (next to ZFS). I noticed that the panic was in SVM and I'm wondering why the machine was hanging. SVM is only running on the internal disks (c0) and I pulled a disk from the D1000: so the device that was affected had nothing to do with SVM at all. fine ... I have the exact same config here. Internal SVM and then external ZFS on two disk arrays on two controllers.

Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:14 newponit 	SCSI transport failed: reason 'incomplete': retrying command
Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:14 newponit 	disk not responding to selection
Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:18 newponit 	disk not responding to selection

This is clearly the disk with ZFS on it: SVM has nothing to do with this disk. A minute later, the troubles started with the internal disks: Okay .. so are we back to looking at ZFS, or ZFS and the SVM components, or some interaction between these kernel modules? At this point I have to be careful not to fall into a pit of blind ignorance as I grope for the answer. Perhaps some data would help. Was there a core file in /var/crash/newponit ?

Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit 	Cmd (0x6a3ed10) dump for Target 0 Lun 0:
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit 	cdb=[ 0x28 0x0 0x0 0x78 0x6 0x30 0x0 0x0 0x10 0x0 ]
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit 	pkt_flags=0x4000 pkt_statistics=0x60 pkt_state=0x7
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit 	pkt_scbp=0x0 cmd_flags=0x860
Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit 	Disconnected tagged cmd(s) (1) timeout for Target 0.0

so a pile of scsi noise above there .. one would expect that from a suddenly missing scsi device.

Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 0.0

NCR scsi controllers .. what OS revision is this ? Solaris 10 u3 ? Solaris Nevada snv_55b ?
Jan 23 17:25:26 newponit glm: [ID 401478 kern.warning] WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponit 	got SCSI bus reset
Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available

SVM and ZFS disks are on separate SCSI buses, so theoretically there shouldn't be any impact on the SVM disks when I pull out a ZFS disk. I still feel that you hit a bug in ZFS somewhere. Under no circumstances should a Solaris server panic and crash simply because you pulled out a single disk that was totally mirrored. In fact .. I will reproduce those conditions here and then see what happens for me. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] panic with zfs
Am 24.1.2007 14:59 Uhr, Dennis Clarke schrieb: Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0: Disconnected tagged cmd(s) (1) timeout for Target 0.0 NCR scsi controllers .. what OS revision is this ? Solaris 10 u3 ? Solaris Nevada snv_55b ?

[EMAIL PROTECTED] # cat /etc/release
                       Solaris 10 11/06 s10s_u3wos_10 SPARC
           Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 14 November 2006
[EMAIL PROTECTED] # uname -a
SunOS newponit 5.10 Generic_118833-33 sun4u sparc SUNW,Ultra-60

oh dear. that's not Solaris Nevada at all. That is production Solaris 10. SVM and ZFS disks are on separate SCSI buses, so theoretically there shouldn't be any impact on the SVM disks when I pull out a ZFS disk. I still feel that you hit a bug in ZFS somewhere. Under no circumstances should a Solaris server panic and crash simply because you pulled out a single disk that was totally mirrored. In fact .. I will reproduce those conditions here and then see what happens for me. And Solaris should not hang at all. I agree. We both know this. You just recently patched a blastwave server that was running for over 700 days in production and *this* sort of behavior just does not happen in Solaris. Let me see if I can reproduce your config here :

bash-3.2# metastat -p
d0 -m /dev/md/rdsk/d10 /dev/md/rdsk/d20 1
d10 1 1 /dev/rdsk/c0t1d0s0
d20 1 1 /dev/rdsk/c0t0d0s0
d1 -m /dev/md/rdsk/d11 1
d11 1 1 /dev/rdsk/c0t1d0s1
d4 -m /dev/md/rdsk/d14 1
d14 1 1 /dev/rdsk/c0t1d0s7
d5 -m /dev/md/rdsk/d15 1
d15 1 1 /dev/rdsk/c0t1d0s5
d21 1 1 /dev/rdsk/c0t0d0s1
d24 1 1 /dev/rdsk/c0t0d0s7
d25 1 1 /dev/rdsk/c0t0d0s5
bash-3.2# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s4
       a  p  luo        8208            8192            /dev/dsk/c0t0d0s4
       a  p  luo        16              8192            /dev/dsk/c0t1d0s4
       a  p  luo        8208            8192            /dev/dsk/c0t1d0s4
bash-3.2# zpool status -v zfs0
  pool: zfs0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs0        ONLINE       0     0     0
          c1t9d0    ONLINE       0     0     0
          c1t10d0   ONLINE       0     0     0
          c1t11d0   ONLINE       0     0     0
          c1t12d0   ONLINE       0     0     0
          c1t13d0   ONLINE       0     0     0
          c1t14d0   ONLINE       0     0     0

errors: No known data errors
bash-3.2#

I will add in mirrors to that zpool from another array on another controller and then yank a disk. However this machine is on snv_52 at the moment. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Heavy writes freezing system
What do you mean by UFS wasn't an option due to the number of files? Exactly that. UFS has a 1 million file limit under Solaris. Each Oracle Financials environment well exceeds this limitation. what ?

$ uname -a
SunOS core 5.10 Generic_118833-17 sun4u sparc SUNW,UltraSPARC-IIi-cEngine
$ df -F ufs -t
/             (/dev/md/dsk/d0 ):   5367776 blocks    616328 files
     total:                       13145340 blocks    792064 files
/export/nfs   (/dev/md/dsk/d8 ):  83981368 blocks  96621651 files
     total:                      404209452 blocks 100534720 files
/export/home  (/dev/md/dsk/d7 ):    980894 blocks    260691 files
     total:                         986496 blocks    260736 files
$

I think that I am 95,621,651 files over your 1 million limit right there! Should I place a support call and file a bug report ? Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] NFS and ZFS, a fine combination
On Mon, Jan 08, 2007 at 03:47:31PM +0100, Peter Schuller wrote: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine So just to confirm; disabling the zil *ONLY* breaks the semantics of fsync() and synchronous writes from the application perspective; it will do *NOTHING* to lessen the correctness guarantee of ZFS itself, including in the case of a power outage? That is correct. ZFS, with or without the ZIL, will *always* maintain consistent on-disk state and will *always* preserve the ordering of events on-disk. That is, if an application makes two changes to the filesystem, first A, then B, ZFS will *never* show B on-disk without also showing A. So then, this begs the question: why do I want this ZIL animal at all? This makes it more reasonable to actually disable the zil. But still, personally I would like to be able to tell the NFS server to simply not be standards compliant, so that I can keep the correct semantics on the lower layer (ZFS), and disable the behavior at the level where I actually want it disabled (the NFS server). This would be nice, simply to make it easier to do apples-to-apples comparisons with other NFS server implementations that don't honor the correct semantics (Linux, I'm looking at you). is that a glare or a leer or a sneer ? :-) dc
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
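[ For completeness, the way people disabled the ZIL at the time was a global tunable, not a per-NFS-server switch, which is exactly the granularity complaint above; this is a sketch from memory, so treat the tunable name as an assumption and never use it casually on production data : ]

* in /etc/system, then reboot :
set zfs:zil_disable = 1

* or on a live system, taking effect for filesystems mounted afterwards :
# echo zil_disable/W0t1 | mdb -kw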
[zfs-discuss] HOWTO make a mirror after the fact
From the zpool attach section of the manpage: if device is not currently part of a mirrored configuration, device automatically transforms into a two-way mirror of device and new_device. If device is part of a two-way mirror, attaching new_device creates a three-way mirror, and so on. In either case, new_device begins to resilver immediately. -f Forces use of new_device, even if it appears to be in use. Not all devices can be overridden in this manner. Note that attach has no option for -n which would just show me the damage I am about to do :-( So I am making a best guess here that what I need is something like this :

# zpool attach zfs0 c1t9d0 c0t9d0

which would mean that the first disk in my zpool would be mirrored and nothing else. A weird config to be sure but .. is this what will happen? I ask all this in painful boring detail because I have no way to back up this zpool other than tar to a DLT. The last thing I want to do is destroy my data when I am trying to add redundancy. Any thoughts ? -- Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] HOWTO make a mirror after the fact
Note that attach has no option for -n which would just show me the damage I am about to do :-( In general, ZFS does a lot of checking before committing a change to the configuration. We make sure that you don't do things like use disks that are already in use, that partitions aren't overlapping, etc. All of the data integrity features in ZFS wouldn't be worth much if we allowed an administrator to unintentionally destroy data. which is why I am beginning to think of ZFS as the last filesystem I will need. But the head space transition is not easy for a guy that thrives on super stable technology. Like Solaris 8 :-) So I am making a best guess here that what I need is something like this : # zpool attach zfs0 c1t9d0 c0t9d0 which would mean that the first disk in my zpool would be mirrored and nothing else. A weird config to be sure but .. is this what will happen? Yep, that's exactly what will happen. Lather, rinse, repeat for the other disks in the pool, and you should be exactly where you want to be. Okay .. phasers on stun and in I go. I ask all this in painful boring detail because I have no way to back up this zpool other than tar to a DLT. The last thing I want to do is destroy my data when I am trying to add redundancy. Any thoughts ? What you figured out is exactly the right thing. If you decide you want to undo it, just use zpool detach. The only reason that I asked is that there is no explicit EXAMPLE in the manpage that says HOW TO UPGRADE FROM STRIPE TO MIRRORED STRIPES or maybe something that says RAID 0+1 or RAID 1+0. Just a bit more info in the ZFS manpages, please, because that is the first place any admin will look. Not an online PDF file somewhere. Often all I have to see what is going on in my server is a DEC VT220 terminal. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
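[ For the archives, the full "lather, rinse, repeat" stripe-to-mirror conversion for the pool discussed in this thread would look something like this; the device names follow the thread and each attach kicks off a resilver : ]

# for t in 9 10 11 12 13 14
> do
>   zpool attach zfs0 c1t${t}d0 c0t${t}d0   # mirror each top-level disk in turn
> done
# zpool status zfs0                         # watch the resilver progress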
Re: [zfs-discuss] ZFS over NFS extra slow?
Another thing to keep an eye out for is disk caching. With ZFS, whenever the NFS server tells us to make sure something is on disk, we actually make sure it's on disk by asking the drive to flush dirty data in its write cache out to the media. Needless to say, this takes a while. UFS isn't aware of the extra level of caching, and happily pretends it's in a world where once the drive ACKs a write, it's on stable storage. If you use format(1M) and take a look at whether or not the drive's write cache is enabled, that should shed some light on this. If it's on, try turning it off and re-run your NFS tests on ZFS vs. UFS. Either way, let us know what you find out. Slightly OT, but you just reminded me of why I like disks that have Sun firmware on them. They never have write cache on. At least I have never seen it. Read cache yes, but write cache never. At least in the Seagate and Fujitsu Ultra320 SCSI/FCAL disks that have a Sun logo on them. I have no idea what else that Sun firmware does on a SCSI disk but I'd love to know :-) Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
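[ The format(1M) dance being suggested goes roughly like this; the menu labels here are from memory, so verify them on your own build before flipping anything : ]

# format -e
( select the disk from the menu )
format> cache
cache> write_cache
write_cache> display       # show whether the drive's write cache is on
write_cache> disable       # then re-run the NFS tests on ZFS vs. UFS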
Re: [zfs-discuss] Re: Re[2]: ZFS in a SAN environment
no no .. it's a feature. :-P If it walks like a duck and quacks like a duck then it's a duck. a kernel panic that brings down a system is a bug. Plain and simple. I disagree (nit). A hardware fault can also cause a panic. Faults != bugs. ha ha .. yeah. If the sysadmin walks over to a machine and pours coffee in it then I guess it will fault all over the place. No appreciation for coffee I guess. however ... when it comes to storage I expect that a disk failure or hot swap will not cause a fault, if and only if there still remains some other storage device that holds the bits in a redundant fashion. so .. disks can fail. That should be okay. even memory and processors can fail. within reason. I do agree in principle, though. Panics should be avoided whenever possible. coffee spillage also .. Incidentally, we do track the panic rate and collect panic strings. The last detailed analysis I saw on the data showed that the vast majority were hardware induced. This was a bit of a bummer because we were hoping that the tracking data would lead to identifying software bugs. but it does imply that the software is way better than the hardware eh ? -- Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re[2]: ZFS in a SAN environment
Anton B. Rang wrote: INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data. OK, I'm puzzled. Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug? Nope. I'm with you. no no .. it's a feature. :-P If it walks like a duck and quacks like a duck then it's a duck. a kernel panic that brings down a system is a bug. Plain and simple. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: bare metal ZFS ? How To ?
Excuse me if I'm mistaken, but I think the question is along the lines of how to access and, more importantly, back up zfs pools/filesystems present on a system by just booting from a CD/DVD. I think the answer would be along the lines of (forced?) importing of the zfs pools present on the system and then using zfs send /foo | star The OP might be looking at something convenient along the lines of ufsdump. I think there is a need for a zfsdump tool (script?) or even better - zfs integration in star. Maybe Jörg should chip in :-) As a matter of fact you nailed down exactly what I was doing. Except star had a problem with locking an object, or some similar error message. I simply tried the following : power up the Sun machine, look at the ok prompt, type 'boot net -srv', wait for a while until I get a SINGLE USER MODE hash prompt, type zpool import, thus . . .

Requesting System Maintenance Mode
SINGLE USER MODE
# zpool import
  pool: zfs0
    id: 13628474126490156099
 state: ONLINE
action: The pool can be imported using its name or numeric identifier. The
        pool may be active on on another system, but can be imported using
        the '-f' flag.
config:

        zfs0        ONLINE
          c1t9d0    ONLINE
          c1t10d0   ONLINE
          c1t11d0   ONLINE
          c1t12d0   ONLINE
          c1t13d0   ONLINE
          c1t14d0   ONLINE

then I did this

# mkdir /tmp/root/foo
# zpool import -f -R /tmp/root/foo 13628474126490156099

Then I could cd to various places in /tmp/root/foo and attempt to run star to do a backup to tape. That didn't go so well, as I got an error about not being able to lock an object in memory. Also, you can't get star unless you ftp it in from somewhere or have it on floppy/CDROM etc etc. I reverted to good old tar like so :

# tar -cvfPE /dev/rmt/0mbn .

then that blew up ( after three hours or more ) because I hit the end of the tape and the process died. So the long and short of it is that you can't drop a ZFS filesystem to tape easily with any built-in tools in the SXCR these days. There is already an RFE filed on that but I think it's low priority. You can recover a zpool easily enough with zpool import, but if you ever lose a few disks or some disaster hits then you had better have Veritas NetBackup or similar in place. Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
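[ The send-stream variant of the same exercise would look like this from that single-user shell; the snapshot name is invented, it still will not span multiple tapes, and the earlier caveat applies that a send stream gives you no single-file restore : ]

# zpool import -f -R /tmp/root/foo 13628474126490156099
# zfs snapshot zfs0/backup@bare0                        # freeze a consistent image
# zfs send zfs0/backup@bare0 | dd obs=1048576 of=/dev/rmt/0mbn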
[zfs-discuss] bare metal ZFS ? How To ?
One of the things that I have taken for granted is that I can *always* boot a Sun server with a CDROM or DVD or jumpstart boot net -srv and get to a prompt. That allows me to fsck filesystems and ufsdump to tape if needed. In fact, I have generally done obscure things like fully install a server and then, with everything working fine, booted with CDROM and dumped the system to tape, and I call that my ground zero backup tape. Everything that follows after that is incremental for a while until the next level 0 dump. [1] Well I just did a boot net -sv on my server here and have snv-b52 at the prompt. I was able to fsck the basic UFS filesystems on this machine as it was running snv-b46. I attached a tape drive before booting and am therefore able to ufsdump and verify the contents of the snv-b46 UFS filesystems to tape. This is all a good thing. My problem is that the snv-46 machine also had a zpool and multiple ZFS filesystems. Simply running zfs list here at the prompt gets me nothing, of course. Can I create or otherwise recover some XML file from the snv-46 server filesystems in order to gain access to those ZFS filesystems from this net boot shell? Is there any way to back up those ZFS filesystems while booted from CDROM/DVD or boot net ? Essentially, if I had nothing but bare metal here and a tape drive, can I access the zpool that resides on six 36GB disks on controller 2 or am I dead in the water ? -- Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] bare metal ZFS ? How To ?
On 11/23/06, James Dickens [EMAIL PROTECTED] wrote: On 11/23/06, Dennis Clarke [EMAIL PROTECTED] wrote: assume worst case someone walks up to you and drops an array on you. They say its ZFS an' I need that der stuff 'k? all while chewing on a cig. what do you do ? besides run ? same thing.. plug it in... run zpool import and get a list of pools... and import, renaming the pool if necessary... well golly gee .. that works real slick . . .

Requesting System Maintenance Mode
SINGLE USER MODE
# zpool import
  pool: zfs0
    id: 13628474126490156099
 state: ONLINE
action: The pool can be imported using its name or numeric identifier. The
        pool may be active on on another system, but can be imported using
        the '-f' flag.
config:

        zfs0        ONLINE
          c1t9d0    ONLINE
          c1t10d0   ONLINE
          c1t11d0   ONLINE
          c1t12d0   ONLINE
          c1t13d0   ONLINE
          c1t14d0   ONLINE

#

besides the grammar error above it all looks perfect. I can search the source code to find the double "on on" error there and then someone else can file a bug report. Right now I think I'll see if I can import this puppy. and do zpool import -R /test foreignarray; zpool status foreignarray; zfs list foreignarray okay .. that comes next. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
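[ The "renaming the pool if necessary" part James mentions is just an extra argument to import; a quick sketch using the id from the output above, with the new pool name invented : ]

# zpool import -f 13628474126490156099 zfs0_foreign   # import the pool under a new name
# zpool status zfs0_foreign                           # confirm it came in cleanly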
Re: Re: [zfs-discuss] poor NFS/ZFS performance
Have a gander below : Agreed - it sucks - especially for small file use. Here's a 5,000 ft view of the performance while unzipping and extracting a tar archive. First the test is run on a SPARC 280R running Build 51a with dual 900MHz USIII CPUs and 4Gb of RAM:

$ cp emacs-21.4a.tar.gz /tmp
$ ptime gunzip -c /tmp/emacs-21.4a.tar.gz | tar xf -

real       13.092
user        2.083
sys         0.183

here is my machine ( Solaris 8, Ultra 2, 200MHz ) :

# cd /tmp
# ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz
/export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes = 74570.00k).
/export/home/dclarke/star: Total time 11.057sec (6744 kBytes/sec)

real       11.146
user        0.300
sys         1.762

and the same test on the same machine with a local UFS filesystem :

# cd /mnt/test
# ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz
/export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes = 74570.00k).
/export/home/dclarke/star: Total time 92.378sec (807 kBytes/sec)

real     1:32.463
user        0.351
sys         3.658

Pretty much what I expect for an old old Solaris 8 box. Then I try using a mounted NFS filesystem shared from ZFS on snv_46 :

# cat /etc/release
                       Solaris Nevada snv_46 SPARC
           Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 14 August 2006
# zfs set sharenfs=nosub,nosuid,rw=pluto,root=pluto zfs0/backup
# zfs get sharenfs zfs0/backup
NAME         PROPERTY  VALUE                             SOURCE
zfs0/backup  sharenfs  nosub,nosuid,rw=pluto,root=pluto  local
#
# tip hardwire
connected
pluto console login: root
Password:
Nov 22 18:41:50 pluto login: ROOT LOGIN /dev/console
Last login: Tue Nov 21 02:07:39 on console
Sun Microsystems Inc.   SunOS 5.8       Generic Patch   February 2004
# cat /etc/release
                  Solaris 8 2/04 s28s_hw4wos_05a SPARC
           Copyright 2004 Sun Microsystems, Inc. All Rights Reserved.
                         Assembled 08 January 2004
# dfshares mars
RESOURCE                   SERVER  ACCESS  TRANSPORT
mars:/export/zfs/backup    mars    -       -
mars:/export/zfs/qemu      mars    -       -
#
# mkdir /export/nfs
# mount -F nfs -o bg,intr,nosuid mars:/export/zfs/backup /export/nfs
#
# cd /export/nfs/titan
# ls -lap
total 142780
drwxr-xr-x   3 dclarke  other          8 Nov 22 19:08 ./
drwxr-xr-x   9 root     sys           12 Nov 15 20:14 ../
-rw-r--r--   1 phil     csw        13102 Jul 12 12:32 README.csw
-rw-r--r--   1 dclarke  csw       189389 Sep 14 19:33 ae-2.2.0.tar.gz
-rw-r--r--   1 dclarke  csw     91965440 Jul 25 12:56 dclarke.tar
-rw-r--r--   1 dclarke  csw     20403483 Nov 22 19:07 emacs-21.4a.tar.gz
-rw-r--r--   1 dclarke  csw      5468160 Jul 25 12:57 root.tar
drwxr-xr-x   5 dclarke  csw            5 May 24  2006 schily/
#

Now that my Solaris 8 box has a mounted ZFS/NFS filesystem, I test again :

# ptime /export/home/dclarke/star -x -time -z file=/tmp/emacs-21.4a.tar.gz
/export/home/dclarke/star: 7457 blocks + 0 bytes (total of 76359680 bytes = 74570.00k).
/export/home/dclarke/star: Total time 215.958sec (345 kBytes/sec)

real     3:36.048
user        0.397
sys         5.961
#

That is based on the ZFS/NFS mounted filesystem. What if I run the same test on my server locally? On ZFS ?

# ptime /root/bin/star -x -time -z file=/tmp/emacs-21.4a.tar.gz
/root/bin/star: 7457 blocks + 0 bytes (total of 76359680 bytes = 74570.00k).
/root/bin/star: Total time 32.238sec (2313 kBytes/sec)

real       32.680
user        6.973
sys         9.945
#

So gee ... that's all pretty slow, but really really slow with ZFS shared out via NFS. wow .. good to know. I *never* would have seen that coming. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RAID-10
On Sun, 22 Oct 2006, Stephen Le wrote: Is it possible to construct a RAID-10 array with ZFS? I've read through the ZFS documentation, and it appears that the only way to create a RAID-10 array would be to create two mirrored (RAID-1) emulated volumes in ZFS and combine those to create the outer RAID-0 volume. Am I approaching this in the wrong way? Should I be using SVM to create my RAID-1 volumes and then create a ZFS filesystem from those volumes? No - don't do that. Here is a ZFS version of a RAID-10 config with 4 disks - from 817.2271.pdf : Creating a Mirrored Storage Pool. To create a mirrored pool, use the mirror keyword, followed by any number of storage devices that will comprise the mirror. Multiple mirrors can be specified by repeating the mirror keyword on the command line. The following command creates a pool with two, two-way mirrors:

# zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0

The second mirror keyword indicates that a new top-level virtual device is being specified. Data is dynamically striped across both mirrors, with data being replicated between each disk appropriately. We need to keep in mind that the exact same result may be achieved with simple SVM :

d1 1 2 /dev/dsk/c1d0s0 /dev/dsk/c3d0s0 -i 512b
d2 1 2 /dev/dsk/c2d0s0 /dev/dsk/c4d0s0 -i 512b
d3 -m d1

metainit d1
metainit d2
metainit d3
metattach d3 d2

At this point, if and only if all stripe components come from exactly identical geometry disks or slices, you get a stripe of mirrors and not just a mirror of stripes. While ZFS may do a similar thing, *I don't know* if there is a published document yet that shows conclusively that ZFS will survive multiple disk failures. However ZFS brings a lot of other great features. Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS RAID-10
Dennis Clarke wrote: While ZFS may do a similar thing *I don't know* if there is a published document yet that shows conclusively that ZFS will survive multiple disk failures. ?? why not? Perhaps this is just too simple and therefore doesn't get explained well. That is not what I wrote. Once again, for the sake of clarity: I don't know if there is a published document, anywhere, that shows by way of a concise experiment that ZFS will actually perform RAID 1+0 and survive multiple disk failures gracefully. I do not see why it would not. But there is no conclusive proof that it will. Note that SVM (nee Solstice DiskSuite) did not always do RAID-1+0; for many years it would do RAID-0+1. However, the data availability for RAID-1+0 is better than for an equivalently sized RAID-0+1, so it is just as well that ZFS does stripes of mirrors. -- richard My understanding is that SVM will do stripes of mirrors if all of the disk or stripe components have the same geometry. This has been documented, well described and laid out bare for years. One may easily create two identical stripes and then mirror them. Then pull out multiple disks on both sides of the mirror and life goes on. So long as one does not remove identical mirror components on both sides at the same time. Common sense really. Anyways, the point is that SVM does do RAID 1+0 and has for years. ZFS probably does the same thing, but it adds in a boatload of new features that leaves SVM light-years behind. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ?: ZFS and POSIX
Steffen Weiberle wrote: Customer asks whether ZFS is fully POSIX compliant, such as flock? ZFS is not currently fully POSIX compliant. Making ZFS fully POSIX compliant is still planned and we are currently addressing bugs in this area. Interfaces such as flock() should work just fine now. The flock interface is implemented in both the VFS and in ZFS. As far as I know we have no known issues with flock. Most of the POSIX related issues are in edge conditions, such as removing directories when the file system is 100% full. Sometimes I think people roll out the old "is it POSIX compliant?" question for the sake of argument. I think that the standards manpage has a LOT to say on the matter *but* it does mention Solaris 10 specifically. No mention of Solaris Nevada or Solaris 11 or ZFS in there. http://www.blastwave.org/man/standards_5.html So, while it's nice that the manpage is there, it's not so nice that it makes no mention of the OS rev on which we found it. Just a thought. Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] no tool to get expected disk usage reports
- Original Message - Subject: no tool to get expected disk usage reports From: Dennis Clarke [EMAIL PROTECTED] Date: Fri, October 13, 2006 14:29 To: zfs-discuss@opensolaris.org given :

bash-3.1# uname -a
SunOS mars 5.11 snv_46 sun4u sparc SUNW,Ultra-2
bash-3.1# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
zfs0                        89.4G   110G  24.5K  legacy
zfs0/backup                 65.8G  6.19G  65.8G  /export/zfs/backup
zfs0/kayak                  23.3G  8.69G  23.3G  /export/zfs/kayak
zfs0/zoner                   279M  63.7G  24.5K  legacy
zfs0/zoner/common             53K  16.0G  24.5K  legacy
zfs0/zoner/common/postgres  28.5K  4.00G  28.5K  /export/zfs/postgres
zfs0/zoner/postgres          279M  7.73G   279M  /export/zfs/zone/postgres
bash-3.1#
bash-3.1# zfs get all zfs0/kayak
NAME        PROPERTY       VALUE                  SOURCE
zfs0/kayak  type           filesystem             -
zfs0/kayak  creation       Sun Oct  1 23:42 2006  -
zfs0/kayak  used           23.3G                  -
zfs0/kayak  available      8.69G                  -
zfs0/kayak  referenced     23.3G                  -
zfs0/kayak  compressratio  1.19x                  -
zfs0/kayak  mounted        yes                    -
zfs0/kayak  quota          32G                    local
zfs0/kayak  reservation    none                   default
zfs0/kayak  recordsize     128K                   default
zfs0/kayak  mountpoint     /export/zfs/kayak      local
zfs0/kayak  sharenfs       off                    default
zfs0/kayak  checksum       on                     default
zfs0/kayak  compression    on                     inherited from zfs0
zfs0/kayak  atime          on                     default
zfs0/kayak  devices        on                     default
zfs0/kayak  exec           on                     default
zfs0/kayak  setuid         on                     default
zfs0/kayak  readonly       off                    default
zfs0/kayak  zoned          off                    default
zfs0/kayak  snapdir        hidden                 default
zfs0/kayak  aclmode        groupmask              default
zfs0/kayak  aclinherit     secure                 default
bash-3.1# pwd
/export/zfs/kayak
bash-3.1# ls
c  d  e  f  g
bash-3.1# du -sk c
1246404 c
bash-3.1# find c -type f -ls | awk 'BEGIN{ ttl=0 }{ ttl+=$7 }END{ print "Total size " ttl }'
Total size 1752184261

Due to compression there is no easy way to get the expected total size of a tree of files and directories. Worse, there may be various ways to get a sum total of files in a tree, but the results may be wildly different from what du reports, thus :

bash-3.1# find f -type f -ls | awk 'BEGIN{ ttl=0 }{ ttl+=$7 }END{ print "Total size " ttl }'
Total size 3387278008853146
bash-3.1# du -sk f
22672288 f
bash-3.1#

Is there a way to modify du or perhaps create a new tool ? Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
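[ Short of a new tool, the logical (uncompressed) size can be approximated from properties ZFS already tracks, since used times compressratio is roughly the pre-compression byte count; a sketch reusing the numbers from the dataset above : ]

bash-3.1# zfs get used,compressratio zfs0/kayak
NAME        PROPERTY       VALUE   SOURCE
zfs0/kayak  used           23.3G   -
zfs0/kayak  compressratio  1.19x   -
# 23.3G * 1.19 = roughly 27.7G of logical data before compression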
Re: [zfs-discuss] Excuses; I did indeed overlook the obvious
Yes, before the flames come in, I finally realize where I went wrong last night. I mistook the discussion lists for a mere forum [i]and[/i] also assumed that by participating with a new discussion I could automatically participate in full. I'll keep that in mind for a possible next time but for now I think I'd better keep to the common forums. Sorry for causing any possible inconvenience for people only following this through e-mail. I had no problem with your email thread at all. No worries, and I don't see any cause for concern. my 0.02 $ -- Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] system unresponsive after issuing a zpool attach
Who hoo! It looks like the resilver completed sometime overnight. The system appears to be running normally ( after one final reboot ):

[EMAIL PROTECTED]: zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        storage       ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t2d0s4  ONLINE       0     0     0
            c1t1d0s4  ONLINE       0     0     0

errors: No known data errors

looks nice :-) I took a poke at the zfs bugs on SunSolve again, and found one that is the likely culprit: 6355416 zpool scrubbing consumes all memory, system hung Appears that a fix is in Nevada 36, hopefully it'll be back ported to a patch for 10. whoa whoa ... just one bloody second .. whoa .. That looks like a real nasty bug description there. What are the details on that? Is this particular to a given system or controller config or something like that, or are we talking global to Solaris 10 Update 2 everywhere ?? :-(

Bug ID: 6355416
Synopsis: zpool scrubbing consumes all memory, system hung
Category: kernel
Subcategory: zfs
State: 10-Fix Delivered -- in a patch somewhere ?

Description: On a 6800 domain with 8G of RAM I created a zpool using a single 18G drive and on that pool created a file system and a zvol. The zvol was filled with data.

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
pool                  11.0G  5.58G  9.00K  /pool
pool/fs                  8K  5.58G     8K  /pool/fs
pool/[EMAIL PROTECTED]     0      -     8K  -
pool/root             11.0G  5.58G  11.0G  -
pool/[EMAIL PROTECTED]  783K      -  11.0G  -
#

I then attached a second 18G drive to the pool and all seemed well. After a few minutes however the system ground to a halt. No response from the keyboard. Aborting the system, it failed to dump due to the dump device being too small. On rebooting it did not make it into multi user. Booting milestone=none and then bringing it up by hand I could see it hung doing zfs mount -a. Booting milestone=none again, I was able to export the pool and then the system would come up into multiuser. Any attempt to import the pool would hang the system, vmstat showing it consumed all available memory. With the pool exported I reinstalled the system with a larger dump device and then imported the pool. The same hang occurred, however this time I got the crash dump. Dumps can be found here: /net/enospc.uk/export/esc/pts-crashdumps/zfs_nomemory Dump 0 is from stock build 72a, dump 1 from my workspace and had KMF_AUDIT set. The only change in my workspace is to the isp driver.
::kmausers gives:

> ::kmausers
365010944 bytes for 44557 allocations with data size 8192:
	kmem_cache_alloc+0x148
	segkmem_xalloc+0x40
	segkmem_alloc+0x9c
	vmem_xalloc+0x554
	vmem_alloc+0x214
	kmem_slab_create+0x44
	kmem_slab_alloc+0x3c
	kmem_cache_alloc+0x148
	kmem_zalloc+0x28
	zio_create+0x3c
	zio_vdev_child_io+0xc4
	vdev_mirror_io_start+0x1ac
	spa_scrub_cb+0xe4
	traverse_segment+0x2e8
	traverse_more+0x7c
362520576 bytes for 44253 allocations with data size 8192:
	kmem_cache_alloc+0x148
	segkmem_xalloc+0x40
	segkmem_alloc+0x9c
	vmem_xalloc+0x554
	vmem_alloc+0x214
	kmem_slab_create+0x44
	kmem_slab_alloc+0x3c
	kmem_cache_alloc+0x148
	kmem_zalloc+0x28
	zio_create+0x3c
	zio_read+0x54
	spa_scrub_io_start+0x88
	spa_scrub_cb+0xe4
	traverse_segment+0x2e8
	traverse_more+0x7c
241177600 bytes for 376840 allocations with data size 640:
	kmem_cache_alloc+0x88
	kmem_zalloc+0x28
	zio_create+0x3c
	zio_vdev_child_io+0xc4
	vdev_mirror_io_done+0x254
	taskq_thread+0x1a0
209665920 bytes for 327603 allocations with data size 640:
	kmem_cache_alloc+0x88
	kmem_zalloc+0x28
	zio_create+0x3c
	zio_read+0x54
	spa_scrub_io_start+0x88
	spa_scrub_cb+0xe4
	traverse_segment+0x2e8
	traverse_more+0x7c

I have attached the full output. If I am quick I can detach the disk and then export the pool before the system grinds to a halt. Then reimporting the pool I can access the data. Attaching the disk again results in the system using all the memory again.

Date Modified: 2005-11-25 09:03:07 GMT+00:00
Work Around:
Suggested Fix:
Evaluation:
Fixed by patch:
Integrated in Build: snv_36
Duplicate of:
Related Change Request(s): 6352306 6384439 6385428
Date Modified: 2006-03-23 23:58:15 GMT+00:00
Public Summary:
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] system unresponsive after issuing a zpool attach
Today I attempted to upgrade to S10_U2 and migrate some mirrored UFS SVM partitions to ZFS. I used Live Upgrade to migrate from U1 to U2 and that went without a hitch on my SunBlade 2000. And the initial conversion of one side of the UFS mirrors to a ZFS pool and subsequent data migration went fine. However, when I attempted to attach the second side mirrors as a mirror of the ZFS pool, all hell broke loose. The system more or less became unresponsive after a few minutes. It appeared that ZFS had taken all available memory because I saw tons of errors on the console about failed memory allocations. Any thoughts/suggestions? The data I migrated consisted of about 80GB. Here's the general flow of what I did:

1. break the SVM mirrors: metadetach d5 d51; metadetach d6 d61; metadetach d7 d71
2. remove the SVM mirrors: metaclear d51; metaclear d61; metaclear d71
3. combine the partitions with format. They were contiguous partitions on s4, s5 and s6 of the disk; I just made a single partition on s4 and cleared s5 and s6.
4. create the pool: zpool create storage cXtXdXs4
5. create three filesystems: zfs create storage/app; zfs create storage/work; zfs create storage/extra
6. migrate the data: cd /app; find . -depth -print | cpio -pdmv /storage/app; cd /work; find . -depth -print | cpio -pdmv /storage/work; cd /extra; find . -depth -print | cpio -pdmv /storage/extra
7. remove the other SVM mirrors: umount /app; metaclear d5 d50; umount /work; metaclear d6 d60; umount /extra; metaclear d7 d70

before you went any further here, did you issue a metastat command, and also, did you have any metadbs on that other disk before you nuked those slices? just asking here. I am hoping that you did a metaclear d5 and then metaclear d50 in order to clear out both the one-sided mirror as well as its component. I'm just fishing around here ..

8. combine the partitions with format. They were contiguous partitions on s4, s5 and s6 of the disk; I just made a single partition on s4 and cleared s5 and s6.

okay .. I hope that SVM was not looking for them. I guess you would get a nasty stack of errors in that case.

9. attach the partition to the pool as a mirror: zpool attach storage cXtXdXs4 cYtYdYs4

So you wanted a mirror ? Like :

# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        storage       ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s4  ONLINE       0     0     0
            c0t1d0s4  ONLINE       0     0     0

errors: No known data errors

that sort of deal ? Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
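[ For the record, the checks being asked about would go roughly like this, run before clearing anything or repartitioning with format; output elided : ]

# metastat -p     # confirm d5/d6/d7 are down to a single submirror before metaclear
# metadb -i       # make sure no state database replica lives on a slice about to be repartitioned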
[zfs-discuss] ZFS needs a viable backup mechanism
             0     0     0
          mirror     ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c0t13d0  ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0

errors: No known data errors

... with no way to back it up to tape ? Someone please enlighten me. Dennis Clarke
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss