Re: [zfs-discuss] Supporting recordsizes larger than 128K?

2007-09-05 Thread Roch - PAE
Matty writes:
  Are there any plans to support record sizes larger than 128k? We use
  ZFS file systems for disk staging on our backup servers (compression
  is a nice feature here), and we typically configure the disk staging
  process to read and write large blocks (typically 1MB or so). This
  reduces the number of I/Os that take place to our storage arrays, and
  our testing has shown that we can push considerably more I/O with 1MB+
  block sizes.
  

So other filesystems and raw devices clearly benefit from larger
block sizes, but given the way ZFS schedules such I/Os, I don't
expect any more throughput from bigger blocks.

Maybe you're hitting something else that limits throughput?

-r


  Thanks for any insight,
  - Ryan
  -- 
  UNIX Administrator
  http://prefetch.net



Re: [zfs-discuss] New zfs pr0n server :)))

2007-09-05 Thread Diego Righi
The case is a Sharkoon Rebel9 (Economy edition, with no integrated fans). I 
bought it from an Italian online store, and I think it is also sold in 
Germany. It has nine 5.25" front bays.
 
 


Re: [zfs-discuss] inherit vs clone and property values.

2007-09-05 Thread Darren J Moffat
Eric Schrock wrote:
 Yes, this would be useful.  See:
 
 6364688 method to preserve properties when making a clone

Thanks for that pointer.  I'd say it should be the default - but then 
that was basically the topic of this thread :-)

 The infrastructure is all there (zfs_clone() takes an nvlist of
 properties), it just hasn't been implemented yet.

Yep I spotted that.

 Note that 'volblocksize' is special because it is a create-time property
 and cannot be changed once it is set.  It doesn't make sense to
 'inherit' volblocksize from the new parent, as the resulting volume
 would be unusable.  I believe that 'volsize', while changeable, is also
 preserved with a snapshot for a similar purpose.

It is exactly because it is create-time that I used that for my comparison.

 Can the encryption properties be changed once a filesystem is created?

There are currently three separate properties for a dataset that deal 
with encryption. None of them can be changed by the end user after 
creation, and they must all be the same in the clone.

For context they are:

encryption:
Should be create-time only, since changing this changes policy and could 
result in a mix of clear text and cipher text.  Even changes between 
"on" values aren't allowed, since it is too hard to reason about the 
overall security of the dataset.

keytype:

This is currently create-time only. In theory some future project could 
allow it to be changed (i.e. the same clear-text key made available via a 
different key-management system), but for the first delivery it will be 
create-time only.  It need not be identical in the clone but, realistically, 
it should always be identical in the clone [I haven't found a real-world 
case yet where it would be useful to have different key management but the 
same key value in the clone].

wrappedkey:

This is a hidden property that is an implementation artefact rather than 
a property exposed via zfs(1) [we may not even allow it to be exposed 
over the ioctl, but that depends on the future of zfs send/recv with 
properties and is a much more complex issue].  This is the encrypted 
per-dataset key.  The clear text of this must be identical in the clone, 
and currently I'm planning for the actual property value to be identical 
too.

-- 
Darren J Moffat


Re: [zfs-discuss] New zfs pr0n server :)))

2007-09-05 Thread Jason
Thanks. Did it come with the hardware to mount HDDs in the 5.25" slots?
 
 


[zfs-discuss] find on ZFS much slower than on xfs

2007-09-05 Thread Joerg Moellenkamp
Hello,

in a benchmark of a find for a filename (find . -name foobar), ZFS is 
approximately 7 times slower than an XFS filesystem (14 minutes for ZFS, 
2 minutes for XFS). The filesystem consists of a huge number of files.

I assume that ZFS has no function comparable to the directory indexes of 
XFS or the dir_index option of ext3. Is this correct? And what can I do 
to speed up such a find operation? I know it's a pathological benchmark, 
but is there a solution for this performance gap?

Regards
Joerg

-- 
Joerg Moellenkamp   Tel: (+49 40) 25 15 23 - 460
IT-ArchitectFax: (+49 40) 25 15 23 - 425
Sun Microsystems GmbH   Mobile: (+49 172) 83 18 433
Nagelsweg 55mailto:[EMAIL PROTECTED]
D-20097 Hamburg Website: http://www.sun.de
Blog: http://www.c0t0d0s0.org

Sitz der Gesellschaft   Sun Microsystems GmbH
Sonnenallee 1
D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen - HRB 161028
Geschaeftsfuehrer   Wolfgang Engels
Dr. Roland Boemer
Vors. des AufsichtsratesMartin Haering




Re: [zfs-discuss] find on ZFS much slower than on xfs

2007-09-05 Thread Casper . Dik

Hello,

in a benchmark of a find for a filename (find . -name foobar), ZFS is 
approximately 7 times slower than an XFS filesystem (14 minutes for ZFS, 
2 minutes for XFS). The filesystem consists of a huge number of files.

I assume that ZFS has no function comparable to the directory indexes of 
XFS or the dir_index option of ext3. Is this correct? And what can I do 
to speed up such a find operation? I know it's a pathological benchmark, 
but is there a solution for this performance gap?


Are you using the same find program?  (GNU find contains a certain
hack which allows it to go fast in certain circumstances.)

Casper



Re: [zfs-discuss] find on ZFS much slower than on xfs

2007-09-05 Thread Joerg Moellenkamp
Hello,

in a different benchmark run on the same system, the gfind took 15 
minutes whereas the standard find took 18 minutes. With find and 
atime=off the benchmark took 14 minutes. But even this is slow 
compared to the 2-3 minutes of the XFS system.

Regards
 Joerg




 [EMAIL PROTECTED] wrote:
 Hello,

 in a benchmark of a find for a filename (find . -name foobar), ZFS is 
 approximately 7 times slower than an XFS filesystem (14 minutes for ZFS, 
 2 minutes for XFS). The filesystem consists of a huge number of files.

 I assume that ZFS has no function comparable to the directory indexes of 
 XFS or the dir_index option of ext3. Is this correct? And what can I do 
 to speed up such a find operation? I know it's a pathological benchmark, 
 but is there a solution for this performance gap?
 


 Are you using the same find program?  (GNU find contains a certain
 hack which allows it to go fast in certain circumstances.)

 Casper

   



Re: [zfs-discuss] find on ZFS much slower than on xfs

2007-09-05 Thread Michael Schuster
Joerg Moellenkamp wrote:
 Hello,
 
 in a different benchmark run on the same system, the gfind took 15 
 minutes whereas the standard find took 18 minutes. With find and 
 atime=off the benchmark took 14 minutes. But even this is slow 
 compared to the 2-3 minutes of the XFS system.

just asking the obvious:
- is this the same HW?
- are zfs/zpool and xfs set up similarly?

Michael
-- 
Michael Schuster
Recursion, n.: see 'Recursion'


Re: [zfs-discuss] find on ZFS much slower than on xfs

2007-09-05 Thread Joerg Schilling
Joerg Moellenkamp [EMAIL PROTECTED] wrote:

 Hello,

 in a different benchmark run on the same system, the gfind took 15 
 minutes whereas the standard find took 18 minutes. With find and 
 atime=off the benchmark took 14 minutes. But even this is slow 
 compared to the 2-3 minutes of the XFS system.

GNU find is buggy by default, as it assumes that the link count of a 
directory is 2 + number-of-subdirectories, which is not guaranteed by POSIX.

GNU find will definitely fail on pcfs and on hsfs filesystems that have not 
been created by a recent mkisofs version (still missing on Solaris).

For some UNIX filesystems you may assume the behavior expected by GNU tar.
For NFS, you cannot make any assumption.

GNU tar does not test for this, and that is buggy.
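
To make the optimisation concrete, here is a minimal sketch of the link-count
trick GNU find relies on (an illustration of the idea only, not GNU find's
actual code; the walk() helper is invented for the example).  The walker
trusts st_nlink == 2 + number-of-subdirectories and stops calling lstat()
once it has seen that many subdirectories; on a filesystem that does not keep
this convention, it will silently fail to descend into some directories.

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static void
walk(const char *dir)
{
	struct stat sb;
	struct dirent *de;
	DIR *dp;
	long subdirs_left;

	if (lstat(dir, &sb) != 0)
		return;
	subdirs_left = (long)sb.st_nlink - 2;	/* the risky assumption */

	if ((dp = opendir(dir)) == NULL)
		return;

	while ((de = readdir(dp)) != NULL) {
		char path[4096];

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		(void) snprintf(path, sizeof (path), "%s/%s", dir, de->d_name);
		(void) printf("%s\n", path);

		if (subdirs_left <= 0)
			continue;	/* "cannot" be a directory: skip lstat() */

		if (lstat(path, &sb) == 0 && S_ISDIR(sb.st_mode)) {
			subdirs_left--;
			walk(path);
		}
	}
	(void) closedir(dp);
}

int
main(int argc, char **argv)
{
	walk(argc > 1 ? argv[1] : ".");
	return (0);
}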

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-05 Thread Paul Kraus
On 9/4/07, Gino [EMAIL PROTECTED] wrote:

 yesterday we had a drive failure on a fc-al jbod with 14 drives.
 Suddenly the zpool using that jbod stopped responding to I/O requests
 and we got tons of the following messages in /var/adm/messages:

snip

 cfgadm -al or devfsadm -C didn't solve the problem.
 After a reboot  ZFS recognized the drive as failed and all worked well.

 Do we need to restart Solaris after a drive failure??

I would hope not, but ... prior to putting some ZFS volumes
into production we did some failure testing. The hardware I was
testing with was a couple of SF-V245s with 4 x 72 GB disks each. Two disks
were set up with SVM/UFS as a mirrored OS, the other two were handed to
ZFS as a mirrored zpool. I did some large file copies to generate I/O.
While a large copy was going on (lots of disk I/O) I pulled one of the
drives.

If the I/O was to the zpool, the system would hang (just as if it
were hung waiting on an I/O operation). I let it sit this way for
over an hour with no recovery. After rebooting, it found the existing
half of the ZFS mirror just fine. Just to be clear: once I pulled the
disk, over about a 5-minute period *all* activity on the box hung,
even a shell just running prstat.

If the I/O was to one of the SVM/UFS disks there would be a
60-90 second pause in all activity (just like the ZFS case), but then
operation would resume. This is what I am used to seeing for a disk
failure.

In the ZFS case I could replace the disk and the zpool would
resilver automatically. I could also take the removed disk and put it
into the second system and have it recognize the zpool (and that it
was missing half of a mirror) and the data was all there.

In no case did I see any data loss or corruption. I had
attributed the system hanging to an interaction between the SAS and
ZFS layers, but the previous post makes me question that assumption.

As another data point, I have an old Intel box at home I am
running Solaris x86 on with ZFS. I have a pair of 120 GB PATA disks. The OS
is on SVM/UFS mirrored partitions and /export/home is on a pair of
partitions in a zpool (mirror). I had a bad power connector and
sometime after booting lost one of the drives. The server kept running
fine. Once I got the drive powered back up (while the server was shut
down), the SVM mirrors resynced and the zpool resilvered. The zpool
finished substantially before the SVM.

In all cases the OS was Solaris 10 U 3 (11/06) with no
additional patches.

-- 
Paul Kraus


Re: [zfs-discuss] New zfs pr0n server :)))

2007-09-05 Thread Christopher Gibbs
I'm now using the CM Stacker 810 for my file server and I love it.

http://www.newegg.com/Product/Product.aspx?Item=N82E1689093

It comes with one 4-in-3 drive cage and has room for 2 more (3 if you
remove the front I/O panel). The drive cages are excellent - mounted
with rubber washers to absorb vibration, and each comes with a 120mm fan
mounted on the front.

On 9/5/07, Jason [EMAIL PROTECTED] wrote:
 Thanks. Did it come with the hardware to mount HDDs in the 5.25" slots?





-- 
Christopher Gibbs
Email / LDAP Administrator
Web Integration & Programming
Abilene Christian University


[zfs-discuss] Consequences of adding a root vdev later?

2007-09-05 Thread Solaris
Is it possible to force ZFS to nicely re-organize data inside a zpool
after a new root-level vdev has been introduced?

e.g. Take a pool with 1 vdev consisting of a 2-disk mirror.  Populate some
arbitrary files using about 50% of the capacity.  Then add another 2
mirrored disks to the pool.

It seems (judging from zpool iostat) that going forward new data will
be striped as expected, but existing data is not striped.  This raises the
question of what happens when the original mirror set runs out of space.
Does the striping stop automagically?  How does this impact resilvering and
recovering from failures?


[zfs-discuss] zfs via sata controller

2007-09-05 Thread Peter Bridge
Hi All, 
I'm a total newbie to Solaris, so apologies if the answer is obvious, but Google 
is not my friend today.

I'm installing OpenSolaris on an Intel-based machine with an IDE drive for 
booting and 2 SATA disks that I plan to use for a ZFS-based NAS.  I managed to 
get OpenSolaris installed, but can't get ZFS to see my two disks.

2x SATA ii 150GB disks
1x Promise sata ii 150 TX4
motherboard 855 GME MGF
Pentium M760

Any help appreciated!

Cheers,
Peter
 
 


Re: [zfs-discuss] zfs via sata controller

2007-09-05 Thread Will Murnane
On 9/5/07, Peter Bridge [EMAIL PROTECTED] wrote:
 1x Promise sata ii 150 TX4
That controller doesn't work with Solaris.  Marvell 88SX6081s (like
the Supermicro AOC-SAT2-MV8), LSI Logic controllers, and SiI3124s
(from a variety of manufacturers) do.  AHCI controllers like the
Intel ICH series should work as well (or at least, support for them
is in the works).  Try the Device Detection Tool:
http://www.sun.com/bigadmin/hcl/hcts/device_detect.html or the list of
supported disk controllers:
http://www.sun.com/bigadmin/hcl/data/sol/components/views/disk_controller_all_results.page1.html
for more information.

Will


[zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Rob Windsor
http://news.com.com/NetApp+files+patent+suit+against+Sun/2100-1014_3-6206194.html

I'm curious how many of those patent filings cover technologies that 
they carried over from Auspex.

While it is legal for them to do so, it is a bit shady to inherit 
technology (two paths; employees departing Auspex and the Auspex 
bankruptcy asset buyout), file patents against that technology, and then 
open suits against other companies based on (patents covering) that 
technology.

(No, I'm not defending Sun in its apparent patent-growling, either; it 
all sucks IMO.)

Rob++
-- 
Internet: [EMAIL PROTECTED] __o
Life: [EMAIL PROTECTED]_`\,_
(_)/ (_)
They couldn't hit an elephant at this distance.
   -- Major General John Sedgwick


[zfs-discuss] patent fight is on its way..

2007-09-05 Thread Selim Daoud
http://www.netapp.com/go/ipsuit/spider-complaint.pdf


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Jim Mauro
About 2 years ago I was able to get a little closer to the patent litigation 
process, by way of giving a deposition in litigation that was filed against 
Sun and Apple (and has since been settled).

Apparently, there's an entire sub-economy built on patent litigation among the 
technology players. Suits, counter-suits, counter-counter-suits, etc., are 
just part of everyday business. And the money that gets poured down the drain!

Here's an example. During my deposition, the lawyer questioning me opened
a large box and removed 3 sets of a 500+ slide deck created by Richard
McDougall and myself for seminars and tutorials on Solaris. Each set was a
color print on heavy, glossy paper. That represented color printing of about
1600 pages in total. All so the attorney could question me about 2 of the 
slides.

I almost fell off my chair...

/jim



Rob Windsor wrote:
 http://news.com.com/NetApp+files+patent+suit+against+Sun/2100-1014_3-6206194.html

 I'm curious how many of those patent filings cover technologies that 
 they carried over from Auspex.

 While it is legal for them to do so, it is a bit shady to inherit 
 technology (two paths; employees departing Auspex and the Auspex 
 bankruptcy asset buyout), file patents against that technology, and then 
 open suits against other companies based on (patents covering) that 
 technology.

 (No, I'm not defending Sun in its apparent patent-growling, either; it 
 all sucks IMO.)

 Rob++
   


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Bill Moore
On Wed, Sep 05, 2007 at 03:43:38PM -0500, Rob Windsor wrote:
 (No, I'm not defending Sun in its apparent patent-growling, either; it 
 all sucks IMO.)

In contrast to the positioning by NetApp, Sun didn't start the patent
fight.  It was started by StorageTek, well prior to Sun's acquisition of
them.  We inherited the in-flight fight, and I don't think we wound up
doing much with it.  I agree Sun should have just formally dropped the
suit, but nobody asked me.  :)

And before anyone asks, I don't know any more about all of this than
what's been reported online.


--Bill


Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-05 Thread Richard Elling
Paul Kraus wrote:
 On 9/4/07, Gino [EMAIL PROTECTED] wrote:
 
  yesterday we had a drive failure on a fc-al jbod with 14 drives.
  Suddenly the zpool using that jbod stopped responding to I/O requests
  and we got tons of the following messages in /var/adm/messages:
 
 snip
 
 cfgadm -al or devfsadm -C didn't solve the problem.
 After a reboot  ZFS recognized the drive as failed and all worked well.

 Do we need to restart Solaris after a drive failure??

It depends...

 I would hope not but ... prior to putting some ZFS volumes
 into production we did some failure testing. The hardware I was
 testing with was a couple SF-V245 with 4 x 72 GB disks each. Two disks
 were setup with SVM/UFS as mirrored OS, the other two were handed to
 ZFS as a mirrored zpool. I did some large file copies to generate I/O.
 While a large copy was going on (lots of disk I/O) I pulled one of the
 drives.

... on which version of Solaris you are running.  ZFS FMA phase 2 was
integrated into SXCE build 68.  Prior to that release, ZFS had a limited
view of the (many) disk failure modes -- it would say a disk was failed
if it could not be opened.  In phase 2, the ZFS diagnosis engine was
enhanced to look for per-vdev soft error rate discriminator (SERD) engines.

More details can be found in the ARC case materials:
http://www.opensolaris.org/os/community/arc/caselog/2007/283/materials/portfolio-txt/
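
For those unfamiliar with the term, a SERD engine is essentially an "N errors
within time T" detector: a fault is only diagnosed once enough soft errors
cluster inside a time window.  A minimal sketch of the idea (conceptual only;
this is not fmd's SERD implementation or its API, and SERD_N/SERD_T below are
made-up values):

#include <stdio.h>
#include <time.h>

#define	SERD_N	10		/* assumed threshold: N events ... */
#define	SERD_T	600		/* ... within a T-second window */

typedef struct serd {
	time_t	ring[SERD_N];	/* timestamps of the most recent N events */
	int	n;		/* events recorded so far (saturates at N) */
	int	next;		/* next slot in the ring buffer */
} serd_t;

/* Record one soft error; return 1 if the engine fires (diagnose a fault). */
static int
serd_record(serd_t *s, time_t now)
{
	s->ring[s->next] = now;
	s->next = (s->next + 1) % SERD_N;
	if (s->n < SERD_N)
		s->n++;
	if (s->n < SERD_N)
		return (0);		/* fewer than N events seen in total */
	/* The oldest of the most recent N events sits at s->next. */
	return (now - s->ring[s->next] <= SERD_T);
}

int
main(void)
{
	serd_t disk = { { 0 }, 0, 0 };
	time_t start = time(NULL);
	int i;

	/* Simulate a burst of retryable I/O errors on one vdev. */
	for (i = 0; i < SERD_N; i++) {
		if (serd_record(&disk, start + i))
			(void) printf("SERD fired after %d errors: "
			    "diagnose the vdev as faulty\n", i + 1);
	}
	return (0);
}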

In SXCE build 72 we gain a new FMA I/O retire agent.  This is more general
purpose and allows a process to set a contract against a device in use.
http://www.opensolaris.org/os/community/on/flag-days/pages/2007080901/
http://www.opensolaris.org/os/community/arc/caselog/2007/290/

 If the I/O was to the zpool the system would hang (just like
 it was hung waiting on an I/O operation). I let it sit this way for
 over an hour with no recovery. After rebooting it found the existing
 half of the ZFS mirror just fine. Just to be clear, once I pulled the
 disk, over about a 5 minute period *all* activity on the box hung.
 Even a shell just running prstat.

It may depend on what shell you are using.  Some shells, such as ksh,
write to $HISTFILE before exec'ing the command.  If your $HISTFILE
was located in an affected file system, then the shell would appear hung.

 If the I/O was to one of the SVM/UFS disks there would be a
 60-90 second pause in all activity (just like the ZFS case), but then
 operation would resume. This is what I am used to seeing for a disk
 failure.

Default retry timeouts for most disks are 60 seconds (last time I checked).
There are several layers involved here, so you can expect something to
happen at 60-second intervals, even if it is just another retry.

 In the ZFS case I could replace the disk and the zpool would
 resilver automatically. I could also take the removed disk and put it
 into the second system and have it recognize the zpool (and that it
 was missing half of a mirror) and the data was all there.
 
 In no case did I see any data loss or corruption. I had
 attributed the system hanging to an interaction between the SAS and
 ZFS layers, but the previous post makes me question that assumption.
 
 As another data point, I have an old Intel box at home I am
 running x86 on with ZFS. I have a pair of 120 GB PATA disks. OS is on
 SVM/UFS mirrored partitions and /export home is on a pair of
 partitions in a zpool (mirror). I had a bad power connector and
 sometime after booting lost one of the drives. The server kept running
 fine. Once I got the drive powered back up (while the server was shut
 down), the SVM mirrors resync'd and the zpool resilvered. The zpool
 finished substantially before the SVM.
 
 In all cases the OS was Solaris 10 U 3 (11/06) with no
 additional patches.

The behaviour you describe is what I would expect for that release of
Solaris + ZFS.
  -- richard


Re: [zfs-discuss] Consequences of adding a root vdev later?

2007-09-05 Thread Richard Elling
Solaris wrote:
 Is it possible to force ZFS to nicely re-organize data inside a zpool 
 after a new root level vdev has been introduced?

Currently, ZFS will not reorganize the existing data for such cases.
You can force this to occur by copying the data and removing the old,
but that seems like a lot of extra work for most cases.

 e.g. Take a pool with 1 vdev consisting of a 2 disk mirror.  Populate 
 some arbitrary files using about 50% of the capacity.  Then add another 
 2 mirrored disks to the pool.
 
 It seems (judging from zpool iostat) that going forward new data 
 will be striped as expected, but existing data is not striped.  This 
 raises the question of what happens when the original mirror set runs 
 out of space.  Does the striping stop automagically?  How does this 
 impact resilvering and recovering from failures?

Yes.

AFAIK, nobody has characterized resilvering, though this is about the 4th
time this week someone has brought the topic up.  Has anyone done work here
that we don't know about?  If so, please speak up :-)
  -- richard


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Nicolas Williams
On Wed, Sep 05, 2007 at 03:43:38PM -0500, Rob Windsor wrote:
 http://news.com.com/NetApp+files+patent+suit+against+Sun/2100-1014_3-6206194.html
 
 I'm curious how many of those patent filings cover technologies that 
 they carried over from Auspex.
 
 While it is legal for them to do so, it is a bit shady to inherit 
 technology (two paths; employees departing Auspex and the Auspex 
 bankruptcy asset buyout), file patents against that technology, and then 
 open suits against other companies based on (patents covering) that 
 technology.
 
 (No, I'm not defending Sun in its apparent patent-growling, either; it 
 all sucks IMO.)

DISCLAIMER:  I've not read any of those patents, nor do I intend to, nor
 did I have anything to do with the design or implementation
 of ZFS.  Also, IANAL.

To me ZFS is very, very similar to 4.4BSD's Log-Structured Filesystem.
Both have strong similarities to transactional databases.

My token effort to blog about ZFS when it came out was, in fact, a
comparison to 4.4BSD LFS.

I don't know about any patents in this area nor about their timelines,
but I imagine that there's a *lot* of prior art in the 4.4BSD LFS
history and in the transactional database literature going back several
decades.  Art on 4.4BSD LFS first appeared no later than June 1990, and
perhaps much earlier.  Transactional DB literature goes back at least to
the early 80s.  I.e., my uneducated guess is that there's enough prior
art to blow most any patent claims about ZFS out of the water.  (You
might take this to mean that ZFS is not all that original.  I think that
in a way that is quite so, but there's plenty of originality in ZFS:
RAID-Z [which depends on checksum trees], awesomely simple and
user-friendly CLIs, etc...)

Another disclaimer: I've no idea whether ZFS's designers had 4.4BSD's
LFS in mind or knew about it when they came up with
ZFS.

Nico
-- 


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Joerg Schilling
Nicolas Williams [EMAIL PROTECTED] wrote:

 On Wed, Sep 05, 2007 at 03:43:38PM -0500, Rob Windsor wrote:
  http://news.com.com/NetApp+files+patent+suit+against+Sun/2100-1014_3-6206194.html
  
  I'm curious how many of those patent filings cover technologies that 
  they carried over from Auspex.
  
  While it is legal for them to do so, it is a bit shady to inherit 
  technology (two paths; employees departing Auspex and the Auspex 
  bankruptcy asset buyout), file patents against that technology, and then 
  open suits against other companies based on (patents covering) that 
  technology.
  
  (No, I'm not defending Sun in its apparent patent-growling, either; it 
  all sucks IMO.)

 DISCLAIMER:  I've not read any of those patents, nor do I intend to, nor
did I have anything to do with the design or implementation
of ZFS.  Also, IANAL.

 To me ZFS is very, very similar to 4.4BSD's Log Structured Filesystem.
 Both are have strong similarities to transactional databases.


There is a difference between a log structured filesystem and a copy on write 
based filesystem.

As I wrote before, my wofs (designed and implemented 1989-1990 for SunOS 4.0, 
published May 23rd 1991) is copy-on-write based, does not need fsck, and always 
offers a stable view of the media because it is COW.


http://cdrecord.berlios.de/new/private/wofs.ps.gz

If you believe this is sufficient to invalidate the NetApp patents on the 
grounds of prior art, feel free to contact me.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread mike
On 9/5/07, Joerg Schilling [EMAIL PROTECTED] wrote:
 As I wrote before, my wofs (designed and implemented 1989-1990 for SunOS 4.0,
 published May 23rd 1991) is copy-on-write based, does not need fsck, and always
 offers a stable view of the media because it is COW.

Side question:

If COW is such an old concept, why haven't there been many filesystems
that have become popular that use it? ZFS, BTRFS (I think) and maybe
WAFL? At least that I know of. It seems like an excellent guarantee of
disk commitment, yet we're all still fussing with journalled
filesystems, filesystems that fragment, buffer lags (or whatever you
might call it) etc.

Just stirring the pot, seems like a reasonable question (perhaps one
to take somewhere else or start a new thread...)


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Joerg Schilling
mike [EMAIL PROTECTED] wrote:

 On 9/5/07, Joerg Schilling [EMAIL PROTECTED] wrote:
  As I wrote before, my wofs (designed and implemented 1989-1990 for SunOS 4.0,
  published May 23rd 1991) is copy-on-write based, does not need fsck, and
  always offers a stable view of the media because it is COW.

 Side question:

 If COW is such an old concept, why haven't there been many filesystems
 that have become popular that use it? ZFS, BTRFS (I think) and maybe
 WAFL? At least that I know of. It seems like an excellent guarantee of
 disk commitment, yet we're all still fussing with journalled
 filesystems, filesystems that fragment, buffer lags (or whatever you
 might call it) etc.

Maybe people did not see that wofs uses two different concepts to make it
optimal for WORM media.

The best documented one is the inverted metadata tree, which allows wofs to
write only one new generation node for a modified file, while ZFS also needs
to write new nodes for all directories above the file, including the root
directory of the filesystem.

The other one is the fact that COW is the only way to implement a filesystem
on WORM media.

COW allows wofs to live without fsck and always guarantees a consistent
filesystem view on the medium.

The inverted tree allows wofs to write very little data for modified files and
to automatically move orphaned files to /lost+found during the mount process
in the kernel.


Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread James C. McPherson
mike wrote:
 On 9/5/07, Joerg Schilling [EMAIL PROTECTED] wrote:
  As I wrote before, my wofs (designed and implemented 1989-1990 for SunOS 4.0,
  published May 23rd 1991) is copy-on-write based, does not need fsck, and
  always offers a stable view of the media because it is COW.
 
 Side question:
 
 If COW is such an old concept, why haven't there been many filesystems
 that have become popular that use it? ZFS, BTRFS (I think) and maybe
 WAFL? At least that I know of. It seems like an excellent guarantee of
 disk commitment, yet we're all still fussing with journalled
 filesystems, filesystems that fragment, buffer lags (or whatever you
 might call it) etc.
 
 Just stirring the pot, seems like a reasonable question (perhaps one
 to take somewhere else or start a new thread...)


I think it was due to cpu cycles and memory not being quite
as cheap then as they are now.

Oh, and that it's sufficiently different from existing ideas
on how to write filesystems that there wasn't really any
incentive to actually do it.


$X ain't broke (sufficiently) so let's not rock the boat



James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
   http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson



[zfs-discuss] ZFS/WAFL lawsuit

2007-09-05 Thread David Magda
Hello,

Not sure if anyone at Sun can comment on this, but I thought it might  
be of interest to the list:

 This morning, NetApp filed an IP (intellectual property) lawsuit  
 against Sun. It has two parts. The first is a “declaratory  
 judgment”, asking the court to decide whether we infringe a set of  
 patents that Sun claims we do. The second says that Sun infringes  
 several of our patents with its ZFS technology.

http://blogs.netapp.com/dave/2007/09/netapp-sues-sun.html

He goes on to explain some of the logic behind NetApp's reaction.

Regards,
David


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-05 Thread Joerg Schilling
James C. McPherson [EMAIL PROTECTED] wrote:

  If COW is such an old concept, why haven't there been many filesystems
  that have become popular that use it? ZFS, BTRFS (I think) and maybe
  WAFL? At least that I know of. It seems like an excellent guarantee of
  disk commitment, yet we're all still fussing with journalled
  filesystems, filesystems that fragment, buffer lags (or whatever you
  might call it) etc.
  
  Just stirring the pot, seems like a reasonable question (perhaps one
  to take somewhere else or start a new thread...)


 I think it was due to cpu cycles and memory not being quite
 as cheap then as they are now.

CPU cycles have not been a problem. Memory was a problem and for this reason,
I did implement virtual kernel memory for my wofs implementation.


 Oh, and that it's sufficiently different from existing ideas
 on how to write filesystems that there wasn't really any
 incentive to actually do it.

It has been implemented for SunOS 4.0.

If you do not believe it, just ask Carsten Bormann ([EMAIL PROTECTED]),
who discussed the ideas with me and supervised my master's thesis on wofs.

As a side note: I heard around 1996 that Lufthansa used WORM media to archive
important passenger data. They did not have something like wofs, and wrote tar
archives only a few times per day, to avoid having thousands of new small files
cause a rewrite of all directory inode data up to the root directory. This
verifies that other COW implementations existed before the NetApp patent was
filed.



Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


[zfs-discuss] cascading metadata modifications

2007-09-05 Thread Matthew Ahrens
Joerg Schilling wrote:
 The best documented one is the inverted metadata tree, which allows wofs to
 write only one new generation node for a modified file, while ZFS also needs
 to write new nodes for all directories above the file, including the root
 directory of the filesystem.

I believe you are thinking of indirect blocks, which are unrelated to the 
directory tree.  In ZFS and most other filesystems, ancestor directories need 
not be modified when a file in a directory is modified.
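
To illustrate the distinction: what a copy-on-write filesystem does rewrite on
a data-block change is the chain of indirect blocks from that block up to the
root of the block tree, not the directory entries.  A conceptual sketch (not
ZFS code; blk_t, FANOUT and the tiny tree in main() are invented for the
example):

#include <stdio.h>
#include <stdlib.h>

#define	FANOUT	4

typedef struct blk {
	int		data;		/* stand-in for file data */
	struct blk	*child[FANOUT];	/* NULL in leaf (data) blocks */
} blk_t;

static blk_t *
newblk(int v)
{
	blk_t *b = calloc(1, sizeof (blk_t));
	b->data = v;
	return (b);
}

/*
 * Copy-on-write update: follow path[0..depth-1] to a leaf and write newval.
 * Every block on that path is copied; untouched siblings stay shared, and
 * the old root still describes the old, consistent tree.
 */
static blk_t *
cow_write(const blk_t *node, const int *path, int depth, int newval)
{
	blk_t *copy = malloc(sizeof (blk_t));

	*copy = *node;			/* new copy of this block */
	if (depth == 0)
		copy->data = newval;	/* leaf: the actual change */
	else
		copy->child[path[0]] =
		    cow_write(node->child[path[0]], path + 1, depth - 1, newval);
	return (copy);			/* caller ends up with a new root */
}

int
main(void)
{
	int i, path[2] = { 0, 2 };
	blk_t *indirect = newblk(0);
	blk_t *root = newblk(0);
	blk_t *newroot;

	/* Tiny tree: root -> one indirect block -> four data blocks. */
	for (i = 0; i < FANOUT; i++)
		indirect->child[i] = newblk(i * 100);
	root->child[0] = indirect;

	/* Rewriting data block [0][2] copies it, the indirect block, the root. */
	newroot = cow_write(root, path, 2, 999);

	(void) printf("old tree still sees %d, new tree sees %d\n",
	    root->child[0]->child[2]->data, newroot->child[0]->child[2]->data);
	(void) printf("unchanged data blocks are shared: %s\n",
	    root->child[0]->child[0] == newroot->child[0]->child[0] ?
	    "yes" : "no");
	return (0);
}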

--matt


Re: [zfs-discuss] [zfs-code] DMU as general purpose transaction engine?

2007-09-05 Thread Matthew Ahrens
Atul Vidwansa wrote:
 ZFS Experts,
 
 Is it possible to use DMU as general purpose transaction engine? More
 specifically, in following order:
 
 1. Create transaction:
 tx = dmu_tx_create(os);
 error = dmu_tx_assign(tx, TXG_WAIT)
 
 2. Decide what to modify(say create new object):
 dmu_tx_hold_bonus(tx, DMU_NEW_OBJECT);
  dmu_tx_hold_bonus(tx, dzp->z_id);
  dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name);
 |
 |
 
 3. Commit transaction:
 dmu_tx_commit(tx);
 
 The reason I am asking for this particular order because I may not
 know the intent of transaction till late in the process.
 If it is not possible, can I at least declare that the transaction is
 going to change N objects (without specification of each object) and
 each change is M blocks at most (without specification of object and
 offset). If yes, how?

You must specify what will be modified (by using dmu_tx_hold_*) before doing 
dmu_tx_assign().  dmu_tx_assign() needs to know what may be modified in order 
to determine if there is enough space available.

You can try to solve this problem by doing dmu_tx_hold_*() on anything that 
*might* be modified.  Hopefully you have enough information up front to 
determine that to some degree of accuracy.  Otherwise, you must go far enough 
in your process to determine what will be modified before doing the 
dmu_tx_hold_*() and dmu_tx_assign().
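
A hedged sketch of that ordering, using only the calls quoted above (ZFS
kernel context, so take it as an illustration of the call order rather than a
drop-in function; example_create() and the header names are illustrative, and
error handling is abbreviated):

#include <sys/dmu.h>
#include <sys/dmu_tx.h>
#include <sys/zfs_znode.h>

static int
example_create(objset_t *os, znode_t *dzp, const char *name)
{
	dmu_tx_t *tx = dmu_tx_create(os);
	int error;

	/* 1. Declare everything that *might* be modified. */
	dmu_tx_hold_bonus(tx, DMU_NEW_OBJECT);		/* the new object */
	dmu_tx_hold_bonus(tx, dzp->z_id);		/* the parent znode */
	dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name);	/* the directory entry */

	/* 2. Only now reserve space; TXG_WAIT blocks until a txg has room. */
	error = dmu_tx_assign(tx, TXG_WAIT);
	if (error) {
		dmu_tx_abort(tx);			/* releases the holds */
		return (error);
	}

	/* 3. ... perform the modifications covered by the holds ... */

	/* 4. Commit; the changes become part of the assigned txg. */
	dmu_tx_commit(tx);
	return (0);
}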

--matt


Re: [zfs-discuss] Consequences of adding a root vdev later?

2007-09-05 Thread Bill Sommerfeld
On Wed, 2007-09-05 at 14:26 -0700, Richard Elling wrote:
 AFAIK, nobody has characterized resilvering, though this is about the 4th
 time this week someone has brought the topic up.  Has anyone done work here
 that we don't know about?  If so, please speak up :-)

I haven't been conducting controlled experiments, but I have been moving
a large pool around recently via a series of zpool replace operations,
and so have been keeping an eye on a bunch of resilvering.

The one conclusion I have so far is that, for the pool I'm moving, the
time to complete a disk-replacement resilver seems to be largely
independent of the number of disks being resilvered (so far, I've done
batches of up to seven replacements) and in the same ballpark as a
scrub.

To be conservative, I'm moving only one disk per raidz group per
pass.  

- Bill







[zfs-discuss] Serious ZFS problems

2007-09-05 Thread Tim Spriggs
Hello,

I think I have gained sufficient fool status for testing the 
fool-proof-ness of ZFS. I have a cluster of T1000 servers running 
Solaris 10 and two x4100s running an OpenSolaris distribution (Nexenta) which 
is at b68. Each T1000 hosts several zones, each of which has its own 
zpool associated with it. Each zpool is a mirrored configuration between 
an IBM N series NAS and another OpenSolaris box serving iSCSI from zvols. To 
move zones around, I move the zone configuration and then move the zpool 
from one T1000 to another and bring the zone up. Now for the problem.

For sake of brevity:

T1000-1: zpool export pool1
T1000-2: zpool export pool2
T1000-3: zpool import -f pool1
T1000-4: zpool import -f pool2
and other similar operations to move zone data around.

Then I 'init 6'd all the T1000s. The reason for the init 6 was so that 
all of the pools would completely let go of the iSCSI LUNs, so I could 
remove the static configurations from each T1000.

upon reboot, pool1 has the following problem:

WARNING: can't process intent log for pool1

and then attempts to export the pool fail with:

cannot open 'pool1': I/O error


pool2 can consistently make a T1000 (Solaris 10) kernel panic when imported. 
It will also make an x4100 (OpenSolaris) panic.


Any ideas?

Thanks in advance.
-Tim