[zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Roch - PAE


Just posted:

  http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine 



Performance, Availability & Architecture Engineering

Roch Bourbonnais             Sun Microsystems, ICNC-Grenoble
Senior Performance Analyst   180, Avenue De L'Europe, 38330,
Montbonnot Saint Martin, France
http://icncweb.france/~rbourbon http://blogs.sun.com/roch
[EMAIL PROTECTED]   (+33).4.76.18.83.20


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Darren J Moffat

Roch - PAE wrote:


Just posted:

  	  http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine 


Nice article.  Now what about when we do this with more than one disk 
and compare UFS/SVM or VxFS/VxVM with ZFS as the back end - all with 
JBOD storage?


How then does ZFS compare as an NFS server?

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ2 vs. ZFS RAID-10

2007-01-08 Thread Peter Schuller
  Is this expected behavior? Assuming concurrent reads (not synchronous and
  sequential) I would naively expect an ndisk raidz2 pool to have a
  normalized performance of n for small reads.

 q.v. http://www.opensolaris.org/jive/thread.jspa?threadID=20942&tstart=0
 where such behavior in a hardware RAID array led to corruption which
 was detected by ZFS.  No free lunch today, either.
  -- richard

I appreciate the advantage of checksumming, believe me. Though I don't see why 
this is directly related to the small-read problem, other than as an artifact 
of the current implementation.

Is there some fundamental reason why one could not (though I understand one 
*would* not) keep a checksum on a per-disk basis, so that in the normal case 
one really could read from just one disk, for a small read? I realize it is 
not enough for a block to be self-consistent, but theoretically couldn't the 
block which points to the block in question contain multiple checksums for 
the various subsets on different disks, rather than just the one checksum for 
the entire block?

Not that I consider this a major issue; but since you pointed me to that 
article in response to my statement above...

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Peter Schuller
 http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine

So just to confirm; disabling the zil *ONLY* breaks the semantics of fsync() 
and synchronous writes from the application perspective; it will do *NOTHING* 
to lessen the correctness guarantee of ZFS itself, including in the case of a 
power outage?

This makes it more reasonable to actually disable the zil. But still, 
personally I would like to be able to tell the NFS server to simply not be 
standards compliant, so that I can keep the correct semantics on the lower 
layer (ZFS), and disable the behavior at the level where I actually want it 
disabled (the NFS server).

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Distributed FS

2007-01-08 Thread Ivan
Hi,

Is ZFS comparable to PVFS2?  Could it also be used as a distributed filesystem 
at the moment, or are there any plans for this in the future?

Thanks and best regards,
Ivan
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-08 Thread Jason J. W. Williams

Sanjeev,

Could you point me in the right direction as to how to convert the
following GCC compile flags to Studio 11 compile flags? Any help is
greatly appreciated. We're trying to recompile MySQL to give a
stacktrace and core file to track down exactly why it's
crashing... hopefully it will illuminate whether memory truly is the issue.
Thank you very much in advance!

-felide-constructors
-fno-exceptions -fno-rtti

Best Regards,
Jason

On 1/7/07, Sanjeev Bagewadi [EMAIL PROTECTED] wrote:

Jason,

There is no documented way of limiting the memory consumption.
The ARC section of ZFS tries to adapt to the memory pressure of the system.
However, in your case it is probably not adapting quickly enough.

One way of limiting the memory consumption would be to limit arc.c_max.
This (arc.c_max) is set to 3/4 of the memory available (or 1GB less than
the memory available); this is done when the ZFS module is loaded (arc_init()).

You should be able to change the value of arc.c_max through mdb and set
it to the value you want. Exercise caution while setting it, and make
sure you don't have active zpools during this operation.
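
Roughly, the sequence looks like this (a hedged sketch only -- the address
and old value below are made up, and member names can change between builds,
so verify with ::print on your own system before writing anything):

# mdb -kw
> arc::print -a c_max
ffffffffc00b3660 c_max = 0xc0000000
> ffffffffc00b3660/Z 0x20000000

The second command writes a 64-bit value (here 512MB) at the address printed
by the first one.  The change does not survive a reboot.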

Thanks and regards,
Sanjeev.

Jason J. W. Williams wrote:

 Hello,

 Is there a way to set a max memory utilization for ZFS? We're trying
 to debug an issue where ZFS is sucking all the RAM out of the box,
 and it's crashing MySQL as a result, we think. Will ZFS reduce its cache
 size if it feels memory pressure? Any help is greatly appreciated.

 Best Regards,
 Jason
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel:x27521 +91 80 669 27521



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Question: ZFS + Block level SHA256 ~= almost free CAS Squishing?

2007-01-08 Thread Wade . Stuart




I have been looking at zfs source trying to get up to speed on the
internals.  One thing that interests me about the fs is what appears to be
a low hanging fruit for block squishing CAS (Content Addressable Storage).
I think that in addition to lzjb compression, squishing blocks that contain
the same data would buy a lot of space for administrators working in many
common workflows.

I am writing to see if I can get some feedback from people that know the
code better than I -- are there any gotchas in my logic?

Assumptions:

SHA256 hash used (Fletcher2/4 have too many collisions; SHA256 collision
resistance is 2^128, if I remember correctly).
SHA256 hash is taken on the data portion of the block as it exists on disk;
the metadata structure is hashed separately.
In the current metadata structure, there is a reserved bit portion to be
used in the future.


Description of change:
Creates:
The filesystem goes through its normal process of writing a block, and
creating the checksum.
Before the step where the metadata tree is pushed, the checksum is checked
against a global checksum tree to see if there is any match.
If a match exists, insert a metadata placeholder for the block that
references the already existing block on disk, and increment a number_of_links
counter on the metadata blocks to keep track of the pointers pointing to
this block.
Free up the new block that was written and checksummed, to be used in the
future.
Else, if there is no match, update the checksum tree with the new checksum and
continue as normal.


Deletes:
Normal process, except verify that the number_of_links count is decremented,
and if it is non-zero then do not free the block.
Clean up the checksum tree as needed.

What this requires:
A new flag in metadata that can tag the block as a CAS block.
A checksum tree that allows easy, fast lookup of checksum keys.
A counter in the metadata or hash tree that tracks links back to blocks.
Some additions to the userland apps to push the config/enable modes.

Does this seem feasible?  Are there any blocking points that I am missing
or unaware of?   I am just posting this for discussion,  it seems very
interesting to me.

-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Bill Moore
On Mon, Jan 08, 2007 at 03:47:31PM +0100, Peter Schuller wrote:
http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
 
 So just to confirm; disabling the zil *ONLY* breaks the semantics of fsync() 
 and synchronous writes from the application perspective; it will do *NOTHING* 
 to lessen the correctness guarantee of ZFS itself, including in the case of a 
 power outage?

That is correct.  ZFS, with or without the ZIL, will *always* maintain
consistent on-disk state and will *always* preserve the ordering of
events on-disk.  That is, if an application makes two changes to the
filesystem, first A, then B, ZFS will *never* show B on-disk without
also showing A.

 This makes it more reasonable to actually disable the zil. But still, 
 personally I would like to be able to tell the NFS server to simply not be 
 standards compliant, so that I can keep the correct semantics on the lower 
 layer (ZFS), and disable the behavior at the level where I actually want it 
 disabled (the NFS server).

This would be nice, simply to make it easier to do apples-to-apples
comparisons with other NFS server implementations that don't honor the
correct semantics (Linux, I'm looking at you).


--Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ2 vs. ZFS RAID-10

2007-01-08 Thread Richard Elling

Peter Schuller wrote:

Is this expected behavior? Assuming concurrent reads (not synchronous and
sequential) I would naively expect an ndisk raidz2 pool to have a
normalized performance of n for small reads.
  

q.v. http://www.opensolaris.org/jive/thread.jspa?threadID=20942&tstart=0
where such behavior in a hardware RAID array led to corruption which
was detected by ZFS.  No free lunch today, either.
 -- richard



I appreciate the advantage of checksumming, believe me. Though I don't see why 
this is directly related to the small read problem, other than that the 
implementation is such.


Is there some fundamental reason why one could not (though I understand one 
*would* not) keep a checksum on a per-disk basis, so that in the normal case 
one really could read from just one disk, for a small read? I realize it is 
not enough for a block to be self-consistent, but theoretically couldn't the 
block which points to the block in question contain multiple checksums for 
the various subsets on different disks, rather than just the one checksum for 
the entire block?
  

Then you would need to keep checksums for each physical block, which
is not part of the on-disk spec.  It is not clear to me that this
would be a net win, because you would need that checksum to be physically
placed on another vdev, which implies that you still couldn't just read a
single block and be happy.  Note, there are lots of different possibilities
here; ZFS implements the end-to-end checksum, which would not be replaced
by a lower-level checksum anyway.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Dennis Clarke

 On Mon, Jan 08, 2007 at 03:47:31PM +0100, Peter Schuller wrote:
   http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine

 So just to confirm; disabling the zil *ONLY* breaks the semantics of fsync()
 and synchronous writes from the application perspective; it will do *NOTHING*
 to lessen the correctness guarantee of ZFS itself, including in the case of a
 power outage?

 That is correct.  ZFS, with or without the ZIL, will *always* maintain
 consistent on-disk state and will *always* preserve the ordering of
 events on-disk.  That is, if an application makes two changes to the
 filesystem, first A, then B, ZFS will *never* show B on-disk without
 also showing A.


  So then, this begs the question: "Why do I want this ZIL animal at all?"

 This makes it more reasonable to actually disable the zil. But still,
 personally I would like to be able to tell the NFS server to simply not be
 standards compliant, so that I can keep the correct semantics on the lower
 layer (ZFS), and disable the behavior at the level where I actually want
 it
 disabled (the NFS server).

 This would be nice, simply to make it easier to do apples-to-apples
 comparisons with other NFS server implementations that don't honor the
 correct semantics (Linux, I'm looking at you).

  is that a glare or a leer or a sneer ?

  :-)

dc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Eric Kustarz

Peter Schuller wrote:

  http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine



So just to confirm; disabling the zil *ONLY* breaks the semantics of fsync() 
and synchronous writes from the application perspective; it will do *NOTHING* 
to lessen the correctness guarantee of ZFS itself, including in the case of a 
power outage?


See this blog that Roch pointed to:
http://blogs.sun.com/erickustarz/entry/zil_disable

See the sentence:
Note: disabling the ZIL does NOT compromise filesystem integrity. 
Disabling the ZIL does NOT cause corruption in ZFS.
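
For reference, the knob that blog describes was a global tunable at the time;
a hedged sketch of how people set it (benchmarking only -- it applies to every
pool on the host and gives up the fsync()/O_DSYNC semantics discussed above):

* in /etc/system (comment lines there start with "*"); takes effect at boot:
set zfs:zil_disable = 1

# or on a live system (newly mounted filesystems pick it up):
echo 'zil_disable/W 1' | mdb -kw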




This makes it more reasonable to actually disable the zil. But still, 
personally I would like to be able to tell the NFS server to simply not be 
standards compliant, so that I can keep the correct semantics on the lower 
layer (ZFS), and disable the behavior at the level where I actually want it 
disabled (the NFS server).




This discussion belongs on the nfs-discuss alias.

eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Roch - PAE

Hans-Juergen Schnitzer writes:
  Roch - PAE wrote:
   
   Just posted:
   
http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine 
   
   
  
  What role does network latency play? If I understand you right,
  even a low-latency network, e.g. Infiniband, would not increase
  performance substantially since the main bottleneck is that
  the NFS server always has to write data to stable storage.
  Is that correct?
  
  Hans Schnitzer
  
  

For this load, network latency plays a role as long as it is of the
same order of magnitude as the I/O latency.  Once network latency gets
much smaller than the I/O latency, then network latency becomes pretty
much irrelevant.  At times both are of the same order of magnitude, and
both must be taken into account.

So if your storage is NVRAM based, or is far away, then
network latency may still be very much at play.

-r

  
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS and ZFS, a fine combination

2007-01-08 Thread Eric Kustarz

Hans-Juergen Schnitzer wrote:

Roch - PAE wrote:



Just posted:

http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine



What role does network latency play? If I understand you right,
even a low-latency network, e.g. Infiniband, would not increase
performance substantially since the main bottleneck is that
the NFS server always has to write data to stable storage.
Is that correct?


Correct.  You can essentially simulate the NFS semantics by doing a 
fsync after every file creation and before every close on a local tar 
extraction.


eric



Hans Schnitzer




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-08 Thread Toby Thain


On 8-Jan-07, at 11:54 AM, Jason J. W. Williams wrote:


...We're trying to recompile MySQL to give a
stacktrace and core file to track down exactly why its
crashing...hopefully it will illuminate if memory truly is the issue.


If you're using the Enterprise release, can't you get MySQL's  
assistance with this?


--Toby

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] hard-hang on snapshot rename

2007-01-08 Thread James Carlson
[Initial version of this message originally sent to zfs-interest by
mistake.  Sorry if this appears anywhere as a duplicate.]

I was noodling around with creating a backup script for my home
system, and I ran into a problem that I'm having a little trouble
diagnosing.  Has anyone seen anything like this or have any debug
advice?

I did a zfs create -r to set a snapshot on all of the members of a
given pool.  Later, for reasons that are probably obscure, I wanted to
rename that snapshot.  There's no zfs rename -r function, so I tried
to write a crude one on my own:

zfs list -rHo name -t filesystem pool |
while read name; do
zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
done

The results were disappointing.  The system was extremely busy for a
moment and then went completely catatonic.  Most network traffic
appeared to stop, though I _think_ network driver interrupts were
still working.  The keyboard and mouse (traditional PS/2 types; not
USB) went dead -- not even keyboard lights were working (nothing from
Caps Lock).  The disk light stopped flashing and went dark.  The CPU
temperature started to climb (as measured by an external sensor).  No
messages were written to /var/adm/messages or dmesg on reboot.

The system turned into an increasingly warm brick.  As all of my
inputs to the system were gone, I really had no good way immediately
available to debug the problem.  Thinking this was just a fluke or
perhaps something induced by hardware, I shut everything down, cooled
off, and tried again.  Three times.  The same thing happened each
time.

System details:

  - snv_55

  - Tyan 2885 motherboard with 4GB RAM (four 1GB modules) and one
Opteron 246 (model 5 step 8).

  - AMI BIOS version 080010, dated 06/14/2005.  No tweaks applied,
system is always on; no power management.

  - Silicon Image 3114 SATA controller configured for legacy (not
RAID) mode.

  - Three SATA disks in the system, no IDE as they've gone to the
great bit-bucket in the sky.  The SATA drives are one WDC
WD740GD-32F (not part of this ZFS pool), and a pair of
ST3250623NS.

  - The two Seagate drives are partitioned like this:

  0       root    wm       3 -   655        5.00GB    (653/0/0)    10490445
  1       swap    wm     656 -   916        2.00GB    (261/0/0)     4192965
  2     backup    wu       0 - 30397      232.86GB    (30398/0/0) 488343870
  3   reserved    wm     917 -   917        7.84MB    (1/0/0)         16065
  4 unassigned    wu       0                0         (0/0/0)             0
  5 unassigned    wu       0                0         (0/0/0)             0
  6 unassigned    wu       0                0         (0/0/0)             0
  7       home    wm     918 - 30397      225.83GB    (29480/0/0) 473596200
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 alternates    wm       1 -     2       15.69MB    (2/0/0)         32130

  - For both disks: slice 0 is for an SVM mirrored root, slice 1 has
swap, slice 3 has the SVM metadata, and slice 7 is in the ZFS pool
named pool as a mirror.  No, I'm not using whole-disk or EFI.

  - Zpool status:

  pool: pool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
poolONLINE   0 0 0
  mirrorONLINE   0 0 0
c4d0s7  ONLINE   0 0 0
c4d1s7  ONLINE   0 0 0

  - 'zfs list -rt filesystem pool | wc -l' says 37.

  - Iostat -E doesn't show any errors of any kind on the drives.

  - I read through CR 6421427, but that seems to be SPARC-only.

Next step will probably be to set the 'snooping' flag and maybe hack
the bge driver to do an abort_sequence_enter() call on a magic packet
so that I can wrest control back.  Before I do something that drastic,
does anyone else have ideas?

-- 
James Carlson, Solaris Networking  [EMAIL PROTECTED]
Sun Microsystems / 1 Network Drive 71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Puzzling ZFS behavior with COMPRESS option

2007-01-08 Thread Anantha N. Srirama
Our setup:

- E2900 (24 x 96); Solaris 10 Update 2 (aka 06/06)
- 2 2Gbps FC HBA
- EMC DMX storage
- 50 x 64GB LUNs configured in 1 ZFS pool
- Many filesystems created with COMPRESS enabled; specifically I've one that is 
768GB

I'm observing the following puzzling behavior:

- We are currently creating a large (1.4TB) and sparse dataset; most of the 
dataset contains repeating blanks (default/standard SAS dataset behavior.)
- ls -l reports the file size as 1.4+TB and du -sk reports the actual on disk 
usage at around 65GB.
- My I/O on the system is pegged at 150+MB/S as reported by zpool iostat and 
I've confirmed the same with iostat.

This is very confusing
 
- ZFS is doing very good compression, as shown by the ratio of on-disk size 
versus reported file size (1.4TB vs 65GB)
- Why on God's green earth am I observing such high I/O when indeed ZFS is 
compressing? I can't believe that the program is actually generating I/O at 
the rate of (150MB/S * compressratio).

Any thoughts?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option

2007-01-08 Thread Anantha N. Srirama
Quick update: since my original post I've confirmed via DTrace (the rwtop script 
in the DTraceToolkit) that the application is not generating 150MB/S * 
compressratio of I/O. What, then, is causing this much I/O in our system?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Adding disk to a RAID-Z?

2007-01-08 Thread Tom Buskey
I want to setup a ZFS server with RAID-Z.  Right now I have 3 disks.  In 6 
months, I want to add a 4th drive and still have everything under RAID-Z 
without a backup/wipe/restore scenario.  Is this possible?

I've used NetApps in the past (1996 even!) and they do it.  I think they're 
using RAID4.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Puzzling ZFS behavior with COMPRESS option

2007-01-08 Thread Neil Perrin



Anantha N. Srirama wrote On 01/08/07 13:04,:

Our setup:

- E2900 (24 x 96); Solaris 10 Update 2 (aka 06/06)
- 2 2Gbps FC HBA
- EMC DMX storage
- 50 x 64GB LUNs configured in 1 ZFS pool
- Many filesystems created with COMPRESS enabled; specifically I've one that is 
768GB

I'm observing the following puzzling behavior:

- We are currently creating a large (1.4TB) and sparse dataset; most of the 
dataset contains repeating blanks (default/standard SAS dataset behavior.)
- ls -l reports the file size as 1.4+TB and du -sk reports the actual on disk 
usage at around 65GB.
- My I/O on the system is pegged at 150+MB/S as reported by zpool iostat and 
I've confirmed the same with iostat.

This is very confusing
 
- ZFS is doing very good compression as reported by the ratio of on disk versus as reported size of the file (1.4TB vs 65GB)

- Why on God's green earth am I observing such high I/O when indeed ZFS is 
compressing? I can't believe that the program is actually generating I/O at 
the rate of (150MB/S * compressratio).

Any thoughts?




One possibility is that the data is written synchronously (uses O_DSYNC,
fsync, etc), and so the ZFS Intent Log (ZIL) will write that uncompressed
data to stable storage in case of a crash/power fail before the txg
is committed.
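
A quick way to check whether that is what is happening (a rough one-liner;
run as root while the job is writing) is to count fsync-style system calls,
which show up as the fdsync syscall on Solaris:

  dtrace -n 'syscall::fdsync:entry { @[execname] = count(); }'

If the writer shows up there, or opens its files with O_DSYNC, the extra
bandwidth would be consistent with the ZIL explanation above.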

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hard-hang on snapshot rename

2007-01-08 Thread Wade . Stuart





 I was noodling around with creating a backup script for my home
 system, and I ran into a problem that I'm having a little trouble
 diagnosing.  Has anyone seen anything like this or have any debug
 advice?

 I did a zfs create -r to set a snapshot on all of the members of a
 given pool.  Later, for reasons that are probably obscure, I wanted to
 rename that snapshot.  There's no zfs rename -r function, so I tried
 to write a crude one on my own:

do you mean zfs snapshot -r fsname@foo instead of the create?


 zfs list -rHo name -t filesystem pool |
 while read name; do
zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
 done


hmm,  just to verify sanity,  can you show the output of:

zfs list -rHo name -t filesystem pool

and

 zfs list -rHo name -t filesystem pool |
 while read name; do
echo zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
 done

(note the echo inserted above)


 The results were disappointing.  The system was extremely busy for a
 moment and then went completely catatonic.  Most network traffic
 appeared to stop, though I _think_ network driver interrupts were
 still working.  The keyboard and mouse (traditional PS/2 types; not
 USB) went dead -- not even keyboard lights were working (nothing from
 Caps Lock).  The disk light stopped flashing and went dark.  The CPU
 temperature started to climb (as measured by an external sensor).  No
 messages were written to /var/adm/messages or dmesg on reboot.

 The system turned into an increasingly warm brick.  As all of my
 inputs to the system were gone, I really had no good way immediately
 available to debug the problem.  Thinking this was just a fluke or
 perhaps something induced by hardware, I shut everything down, cooled
 off, and tried again.  Three times.  The same thing happened each
 time.

 System details:

   - snv_55

   - Tyan 2885 motherboard with 4GB RAM (four 1GB modules) and one
 Opteron 246 (model 5 step 8).

   - AMI BIOS version 080010, dated 06/14/2005.  No tweaks applied,
 system is always on; no power management.

   - Silicon Image 3114 SATA controller configured for legacy (not
 RAID) mode.

   - Three SATA disks in the system, no IDE as they've gone to the
 great bit-bucket in the sky.  The SATA drives are one WDC
 WD740GD-32F (not part of this ZFS pool), and a pair of
 ST3250623NS.

   - The two Seagate drives are partitioned like this:

    0       root    wm       3 -   655        5.00GB    (653/0/0)    10490445
    1       swap    wm     656 -   916        2.00GB    (261/0/0)     4192965
    2     backup    wu       0 - 30397      232.86GB    (30398/0/0) 488343870
    3   reserved    wm     917 -   917        7.84MB    (1/0/0)         16065
    4 unassigned    wu       0                0         (0/0/0)             0
    5 unassigned    wu       0                0         (0/0/0)             0
    6 unassigned    wu       0                0         (0/0/0)             0
    7       home    wm     918 - 30397      225.83GB    (29480/0/0) 473596200
    8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
    9 alternates    wm       1 -     2       15.69MB    (2/0/0)         32130

   - For both disks: slice 0 is for an SVM mirrored root, slice 1 has
 swap, slice 3 has the SVM metadata, and slice 7 is in the ZFS pool
 named pool as a mirror.  No, I'm not using whole-disk or EFI.

   - Zpool status:

   pool: pool
  state: ONLINE
  scrub: none requested
 config:

 NAMESTATE READ WRITE CKSUM
 poolONLINE   0 0 0
   mirrorONLINE   0 0 0
 c4d0s7  ONLINE   0 0 0
 c4d1s7  ONLINE   0 0 0

   - 'zfs list -rt filesystem pool | wc -l' says 37.

   - Iostat -E doesn't show any errors of any kind on the drives.

   - I read through CR 6421427, but that seems to be SPARC-only.

 Next step will probably be to set the 'snooping' flag and maybe hack
 the bge driver to do an abort_sequence_enter() call on a magic packet
 so that I can wrest control back.  Before I do something that drastic,
 does anyone else have ideas?

 --
 James Carlson, Solaris Networking  [EMAIL PROTECTED]
 Sun Microsystems / 1 Network Drive 71.232W   Vox +1 781 442 2084
 MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question: ZFS + Block level SHA256 ~= almost free CAS Squishing?

2007-01-08 Thread Bart Smaalders

[EMAIL PROTECTED] wrote:




I have been looking at zfs source trying to get up to speed on the
internals.  One thing that interests me about the fs is what appears to be
a low hanging fruit for block squishing CAS (Content Addressable Storage).
I think that in addition to lzjb compression, squishing blocks that contain
the same data would buy a lot of space for administrators working in many
common workflows.

I am writing to see if I can get some feedback from people that know the
code better than I -- are there any gotchas in my logic?

Assumptions:

SHA256 hash used (Fletcher2/4 have too many collisions,  SHA256 is 2^128 if
I remember correctly)
SHA256 hash is taken on the data portion of the block as it exists on disk.
the metadata structure is hashed separately.
In the current metadata structure, there is a reserved bit portion to be
used in the future.


Description of change:
Creates:
The filesystem goes through its normal process of writing a block, and
creating the checksum.
Before the step where the metadata tree is pushed, the checksum is checked
against a global checksum tree to see if there is any match.
If match exists, insert a metadata placeholder for the block, that
references the already existing block on disk, increment a number_of_links
pointer on the metadata blocks to keep track of the pointers pointing to
this block.
free up the new block that was written and check-summed to be used in the
future.
else if no match, update the checksum tree with the new checksum and
continue as normal.


Deletes:
normal process, except verifying that the number_of_links count is lowered
and if it is non zero then do not free the block.
clean up checksum tree as needed.

What this requires:
A new flag in metadata that can tag the block as a CAS block.
A checksum tree that allows easy fast lookup of checksum keys.
a counter in the metadata or hash tree that tracks links back to blocks.
Some additions to the userland apps to push the config/enable modes.

Does this seem feasible?  Are there any blocking points that I am missing
or unaware of?   I am just posting this for discussion,  it seems very
interesting to me.



Note that you'd actually have to verify that the blocks were the same;
you cannot count on the hash function.  If you didn't do this, anyone
discovering a collision could destroy the colliding blocks/files.
Val Henson wrote a paper on this topic; there's a copy here:

http://infohost.nmt.edu/~val/review/hash.pdf

- Bart

Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-08 Thread Jason J. W. Williams

We're not using the Enterprise release, but we are working with them.
It looks like MySQL is crashing due to lack of memory.

-J

On 1/8/07, Toby Thain [EMAIL PROTECTED] wrote:


On 8-Jan-07, at 11:54 AM, Jason J. W. Williams wrote:

 ...We're trying to recompile MySQL to give a
 stacktrace and core file to track down exactly why its
 crashing...hopefully it will illuminate if memory truly is the issue.

If you're using the Enterprise release, can't you get MySQL's
assistance with this?

--Toby



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hard-hang on snapshot rename

2007-01-08 Thread James Carlson
[EMAIL PROTECTED] writes:
  I was noodling around with creating a backup script for my home
  system, and I ran into a problem that I'm having a little trouble
  diagnosing.  Has anyone seen anything like this or have any debug
  advice?
 
  I did a zfs create -r to set a snapshot on all of the members of a
  given pool.  Later, for reasons that are probably obscure, I wanted to
  rename that snapshot.  There's no zfs rename -r function, so I tried
  to write a crude one on my own:
 
 do you mean zfs snapshot -r fsname@foo instead of the create?

Yes; sorry.  A bit of a typo there.

 hmm,  just to verify sanity,  can you show the output of:
 
 zfs list -rHo name -t filesystem pool
 
 and
 
  zfs list -rHo name -t filesystem pool |
  while read name; do
 echo zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
  done
 
 (note the echo inserted above)

Sure, but it's not a shell problem.  I should have mentioned that when
I brought the system back up, *most* of the renames had actually taken
place, but not *all* of them.  I ended up with mostly [EMAIL PROTECTED], but
with a handful of stragglers near the end of the list [EMAIL PROTECTED]

The output looks a bit like this (not _all_ file systems shown, but
representative ones):

pool
pool/HTSData
pool/apache
pool/client
pool/csw
pool/home
pool/home/benjamin
pool/home/beth
pool/home/carlsonj
pool/home/ftp
pool/laptop
pool/local
pool/music
pool/photo
pool/sys
pool/sys/core
pool/sys/dhcp
pool/sys/mail
pool/sys/named

And then:

zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]
zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]
zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]
zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]

It's not a matter of the shell script not working; it's a matter of
something inside the kernel (perhaps not even ZFS but instead a driver
related to SATA?) experiencing vapor-lock.

Other heavy load on the system, though, doesn't cause this to happen.
This one operation does cause the lock-up.

-- 
James Carlson, Solaris Networking  [EMAIL PROTECTED]
Sun Microsystems / 1 Network Drive 71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question: ZFS + Block level SHA256 ~= almost free CAS Squishing?

2007-01-08 Thread Bill Sommerfeld
 Note that you'd actually have to verify that the blocks were the same;
 you cannot count on the hash function.  If you didn't do this, anyone
 discovering a collision could destroy the colliding blocks/files.

Given that nobody knows how to find sha256 collisions, you'd of course
need to test this code with a weaker hash algorithm.

(It would almost be worth it to have the code panic in the event that a
real sha256 collision was found)

- Bill






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question: ZFS + Block level SHA256 ~= almost free CAS Squishing?

2007-01-08 Thread Wade . Stuart




 
  Does this seem feasible?  Are there any blocking points that I am missing
  or unaware of?   I am just posting this for discussion,  it seems very
  interesting to me.
 

 Note that you'd actually have to verify that the blocks were the same;
 you cannot count on the hash function.  If you didn't do this, anyone
 discovering a collision could destroy the colliding blocks/files.
 Val Henson wrote a paper on this topic; there's a copy here:

Sure,  that makes sense.  I do not see why that would be much of a problem:
if the sha256 hashes match, then do yet one more crypto hash of your
choice to verify they are indeed the same blocks (fool me once, shame on
me...); the hash key should still be able to be based on only the sha256
marker then.  If we do find a natural collision, then a special code path
(and an email to the NSA =) could be in order.




 http://infohost.nmt.edu/~val/review/hash.pdf

 - Bart

 Bart Smaalders Solaris Kernel Performance
 [EMAIL PROTECTED]  http://blogs.sun.com/barts

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question: ZFS + Block level SHA256 ~= almost free CAS Squishing?

2007-01-08 Thread Torrey McMahon

[EMAIL PROTECTED] wrote:



  

Does this seem feasible?  Are there any blocking points that I am missing
or unaware of?   I am just posting this for discussion,  it seems very
interesting to me.


Note that you'd actually have to verify that the blocks were the same;
you cannot count on the hash function.  If you didn't do this, anyone
discovering a collision could destroy the colliding blocks/files.
Val Henson wrote a paper on this topic; there's a copy here:


Sure,  that makes sense.  I do not see why that would be much of a problem:
if the sha256 hashes match, then do yet one more crypto hash of your
choice to verify they are indeed the same blocks (fool me once, shame on
me...); the hash key should still be able to be based on only the sha256
marker then.  If we do find a natural collision, then a special code path
(and an email to the NSA =) could be in order.


Is Honeycomb doing anything in this space?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question: ZFS + Block level SHA256 ~= almost free CAS Squishing?

2007-01-08 Thread Wade . Stuart





Bill Sommerfeld [EMAIL PROTECTED] wrote on 01/08/2007 03:41:53 PM:

  Note that you'd actually have to verify that the blocks were the same;
  you cannot count on the hash function.  If you didn't do this, anyone
  discovering a collision could destroy the colliding blocks/files.

 Given that nobody knows how to find sha256 collisions, you'd of course
 need to test this code with a weaker hash algorithm.

 (It would almost be worth it to have the code panic in the event that a
 real sha256 collision was found)

- Bill


That reminds me,  I had a few more questions about this.

1. If a fs was started with a fletcher checksum and later switched to sha256,
is there a way to resilver to sha256 the hashes of blocks that existed before
the property was set?

2. Also, is there any way to get zdb to dump a list of blocks and their
associated hashes? (zdb seems to be lightly documented, and the source files
for it require a little more familiarity with ZFS internals than I have
grokked yet.)
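
(For what it's worth, one experiment to try for #2 -- zdb is undocumented and
its options shift from build to build, so treat this as a guess -- is to raise
the -d verbosity on a dataset until it dumps the block pointers, which include
each block's checksum:

  zdb -dddddd pool/fs | grep cksum

Expect a lot of output on a filesystem of any size.)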



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hard-hang on snapshot rename

2007-01-08 Thread Wade . Stuart






James Carlson [EMAIL PROTECTED] wrote on 01/08/2007 03:26:14 PM:

 [EMAIL PROTECTED] writes:
   I was noodling around with creating a backup script for my home
   system, and I ran into a problem that I'm having a little trouble
   diagnosing.  Has anyone seen anything like this or have any debug
   advice?
  
   I did a zfs create -r to set a snapshot on all of the members of a
   given pool.  Later, for reasons that are probably obscure, I wanted
to
   rename that snapshot.  There's no zfs rename -r function, so I
tried
   to write a crude one on my own:
 
  do you mean zfs snapshot -r fsname@foo instead of the create?

 Yes; sorry.  A bit of a typo there.

  hmm,  just to verify sanity,  can you show the output of:
 
  zfs list -rHo name -t filesystem pool
 
  and
 
   zfs list -rHo name -t filesystem pool |
   while read name; do
  echo zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
   done
 
  (note the echo inserted above)

 Sure, but it's not a shell problem.  I should have mentioned that when
 I brought the system back up, *most* of the renames had actually taken
 place, but not *all* of them.  I ended up with mostly [EMAIL PROTECTED], but
 with a handful of stragglers near the end of the list [EMAIL PROTECTED]

 The output looks a bit like this (not _all_ file systems shown, but
 representative ones):

 pool
 pool/HTSData
 pool/apache
 pool/client
 pool/csw
 pool/home
 pool/home/benjamin
 pool/home/beth
 pool/home/carlsonj
 pool/home/ftp
 pool/laptop
 pool/local
 pool/music
 pool/photo
 pool/sys
 pool/sys/core
 pool/sys/dhcp
 pool/sys/mail
 pool/sys/named

 And then:

 zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
 zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
 zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
 zfs rename pool/home/[EMAIL PROTECTED] pool/home/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/[EMAIL PROTECTED] pool/[EMAIL PROTECTED]
 zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]
 zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]
 zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]
 zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]

 It's not a matter of the shell script not working; it's a matter of
 something inside the kernel (perhaps not even ZFS but instead a driver
 related to SATA?) experiencing vapor-lock.

 Other heavy load on the system, though, doesn't cause this to happen.
 This one operation does cause the lock-up.


Understood. Two things: does the rename loop hit any of the filesystems in
question, and does putting a "sort -r |" before the while make any
difference?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hard-hang on snapshot rename

2007-01-08 Thread Wade . Stuart





[EMAIL PROTECTED] wrote on 01/08/2007 04:06:46 PM:







 James Carlson [EMAIL PROTECTED] wrote on 01/08/2007 03:26:14 PM:

  [EMAIL PROTECTED] writes:
I was noodling around with creating a backup script for my home
system, and I ran into a problem that I'm having a little trouble
diagnosing.  Has anyone seen anything like this or have any debug
advice?
   
    I did a zfs create -r to set a snapshot on all of the members of a
    given pool.  Later, for reasons that are probably obscure, I wanted to
    rename that snapshot.  There's no zfs rename -r function, so I tried
    to write a crude one on my own:
  
   do you mean zfs snapshot -r fsname@foo instead of the create?
 
  Yes; sorry.  A bit of a typo there.
 
   hmm,  just to verify sanity,  can you show the output of:
  
   zfs list -rHo name -t filesystem pool
  
   and
  
zfs list -rHo name -t filesystem pool |
while read name; do
   echo zfs rename [EMAIL PROTECTED] [EMAIL PROTECTED]
done
  
   (note the echo inserted above)
 
  Sure, but it's not a shell problem.  I should have mentioned that when
  I brought the system back up, *most* of the renames had actually taken
  place, but not *all* of them.  I ended up with mostly [EMAIL PROTECTED], but
  with a handful of stragglers near the end of the list [EMAIL PROTECTED]
 

Sorry, I missed this; ignore my first question

  The output looks a bit like this (not _all_ file systems shown, but
  representative ones):
 
  pool
  pool/HTSData
...

  zfs rename pool/sys/[EMAIL PROTECTED] pool/sys/[EMAIL PROTECTED]
 
  It's not a matter of the shell script not working; it's a matter of
  something inside the kernel (perhaps not even ZFS but instead a driver
  related to SATA?) experiencing vapor-lock.
 
  Other heavy load on the system, though, doesn't cause this to happen.
  This one operation does cause the lock-up.
 

 Understood. Two things,  does the rename loop hit any of the fs in
 question, and does putting a  sort -r |  before the while make any
 difference?


The reason I ask is that I had a similar issue running through batch
renames (from epoch-based to human-readable names) of my snapshots.  It seemed
to cause a system lock unless I did the batch depth-first (sort -r).
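
In other words, something along these lines (a sketch only; oldsnap and
newsnap are placeholder snapshot names):

zfs list -rHo name -t filesystem pool | sort -r |
while read name; do
        zfs rename "$name@oldsnap" "$name@newsnap"
done

The sort -r just reverses the lexical order, so child filesystems get
renamed before their parents.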


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hard-hang on snapshot rename

2007-01-08 Thread James Carlson
[EMAIL PROTECTED] writes:
  Other heavy load on the system, though, doesn't cause this to happen.
  This one operation does cause the lock-up.
 
 
 Understood. Two things,  does the rename loop hit any of the fs in
 question,

No; the loop you saw is essentially what I ran.  (Other than that it
was level0new and level0 instead of foo and bar.)

Thinking it was some locking issue, I did try saving off the list in a
file (on tmpfs), and then running it through the while loop -- that
produced the same result.

 and does putting a  sort -r |  before the while make any
 difference?

I'll give it a try tonight and see.  It's a production system, so I
have to wait until all of the users are asleep or otherwise occupied
by Two And A Half Men reruns to try something hazardous like that.

-- 
James Carlson, Solaris Networking  [EMAIL PROTECTED]
Sun Microsystems / 1 Network Drive 71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Adding disk to a RAID-Z?

2007-01-08 Thread Peter Schuller
 I want to setup a ZFS server with RAID-Z.  Right now I have 3 disks.  In 6
 months, I want to add a 4th drive and still have everything under RAID-Z
 without a backup/wipe/restore scenario.  Is this possible?

You can add additional storage to the same pool effortlessly, such that the 
pool will be striped across two raidz vdevs. You cannot (AFAIK) expand the 
raidz itself. The end result is 9 disks, with 7 disks' worth of effective 
storage capacity. The ZFS administration guide contains examples of doing 
exactly this, except I believe the examples use mirrors.

ZFS administration guide:

http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf
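
For example, something like this (a sketch with hypothetical pool and device
names; zpool add may warn, and want -f, if the new raidz is a different width
than the existing one):

# existing pool: one 3-disk raidz
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0

# later: stripe a second raidz top-level vdev into the same pool
zpool add tank raidz c2t0d0 c2t1d0 c2t2d0

# zpool status tank now shows two raidz vdevs under "tank"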

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Distributed FS

2007-01-08 Thread Ed Gould

Ivan wrote:

Hi,

Is ZFS comparable to PVFS2?  Could it also be used as a distributed filesystem 
at the moment, or are there any plans for this in the future?


I don't know anything at all about PVFS2, so I can't comment on that point.

As far as ZFS being used as a distributed file system, it cannot be used 
as such today, but it is something we would like to develop.  Do you 
have a specific use case in mind for a distributed file system?

--
--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Puzzling ZFS behavior with COMPRESS option

2007-01-08 Thread Bart Smaalders

Anantha N. Srirama wrote:

Quick update, since my original post I've confirmed via DTrace (rwtop script in 
toolkit) that the application is not generating 150MB/S * compressratio of I/O. 
What then is causing this much I/O in our system?
 
 
This message posted from opensolaris.org



Are you doing random IO?  Appending or overwriting?

- Bart

--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: What SATA controllers are people using for ZFS?

2007-01-08 Thread Naveen Nalam
For future reference for someone looking to build a ZFS storage server, the 
server config I am now using is Solaris 10 U3, has two Supermicro AOC-SAT2-MV8 
controllers, 12 Seagate 750GB drives, 2 Seagate 160GB drives, and an Asus P5M2 
motherboard (I don't think these boards are for general sale yet; my vendor got 
them from Asus). The P5M2 has one PCIe x16 slot, two PCI-X 133/100 64-bit slots, 
and one PCI 33MHz 32-bit slot.

The vendor's IT staff claimed that even though Solaris loaded on the Supermicro 
PDSME+ motherboard, there were frequent keyboard detection issues. Asus claimed 
they had tested the P5M2 with Solaris.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Distributed FS

2007-01-08 Thread Ivan
Hi Ed,

pNFS (Parallel NFS) could benefit from a 'distributed filesystem version' 
of ZFS.  With pNFS, files could be striped across different NFS servers.  
Lisa Week ([EMAIL PROTECTED]) told me that they would like to use ZFS in future 
pNFS servers in Solaris.

Thanks and best regards,
Ivan
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss