Re: [zfs-discuss] zfs and small files
Claus Guttesen writes:

> I have many small - mostly jpg - files where the original file is
> approx. 1 MB and the thumbnail generated is approx. 4 KB. The files
> are currently on vxfs. I have copied all files from one partition
> onto a zfs ditto. The vxfs partition occupies 401 GB and zfs 449 GB.
> Most files uploaded are jpg and all thumbnails are always jpg.
>
> > Is there a problem?
>
> Not the disk usage itself. But if zfs takes up more space than vxfs
> (12 %), 80 TB will become 89 TB instead (our current storage) and
> add cost.
>
> > Also, how are you measuring this (what commands)?
>
> I did a 'df -h'.
>
> Will a different volblocksize (during creation of the partition)
> make better use of the available diskspace? Will (meta)data require
> less space if compression is enabled?

volblocksize won't have any effect on file systems; it applies to
zvols. Perhaps you mean recordsize? But recall that recordsize is the
maximum block size, not the actual one, which is decided dynamically.

> I read
> http://www.opensolaris.org/jive/thread.jspa?threadID=37673tstart=105
> which is very similar to my case except for the file type. But no
> clear pointers otherwise.

A good start would be to find the distribution of file sizes.

> The files are approx. 1 MB with a thumbnail of approx. 4 KB.

So the 1 MB files are stored as ~8 x 128K records. Because of

    5003563 use smaller tail block for last block of object

the last block of your file is partially used. It will depend on your
file-size distribution, but without that info we can only guess that
we're wasting an average of 64K per file, or 6%. If your distribution
is such that most files are slightly more than 1 MB, then we'd have
12% overhead from this effect. So using a 16K/32K recordsize could
well help, as files would be stored using ~64 x 16K blocks with an
overhead of 1-2% (0.5 blocks wasted every 64).

-r
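One quick way to gather that file-size distribution is to bucket the
size column of ls (a rough sketch; /tank/photos is a placeholder path
and the awk script buckets by 128K records):

    # find /tank/photos -type f -exec ls -l {} + | \
        awk '{ b = int($5 / 131072); n[b]++ }
             END { for (b in n) printf "%4d x 128K: %d files\n", b, n[b] }'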
Re: [zfs-discuss] zfs and small files
> So the 1 MB files are stored as ~8 x 128K records. Because of
> 5003563 (use smaller tail block for last block of object) the last
> block of your file is partially used. [...] So using a 16K/32K
> recordsize could well help, as files would be stored using ~64 x
> 16K blocks with an overhead of 1-2% (0.5 blocks wasted every 64).

I will (re)create the partition and modify the recordsize. I was
unwilling to do so when I read the man page, which discourages
changing this setting except for database workloads.

Does zfs use suballocation if a file does not use an entire record?
If not, the thumbnails probably waste the most space. They are
approx. 4 KB.

I'll be testing recordsizes from 1K upwards. Actually, 1K made zfs
very slow, but 2K seems fine. I'll report back when the entire
partition has been copied. When I find the sweet spot I'll try
enabling (default) compression.

Thank you.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
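For reference, recordsize is a per-filesystem property and only
affects files written after it is changed, so it can be set before
the data is copied back in (tank/photos is a placeholder name):

    # zfs create tank/photos
    # zfs set recordsize=16k tank/photos
    # zfs get recordsize tank/photos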
Re: [zfs-discuss] zfs and small files
Claus Guttesen writes:

> > So the 1 MB files are stored as ~8 x 128K records. [...] So using
> > a 16K/32K recordsize could well help, as files would be stored
> > using ~64 x 16K blocks with an overhead of 1-2% (0.5 blocks
> > wasted every 64).
>
> I will (re)create the partition and modify the recordsize. I was
> unwilling to do so when I read the man page, which discourages
> changing this setting except for database workloads.
>
> Does zfs use suballocation if a file does not use an entire record?
> If not, the thumbnails probably waste the most space. They are
> approx. 4 KB.

Files smaller than the recordsize are stored using a multiple of the
sector size, so small files should not factor into this equation.

> I'll be testing recordsizes from 1K upwards. Actually, 1K made zfs
> very slow, but 2K seems fine. I'll report back when the entire
> partition has been copied. When I find the sweet spot I'll try
> enabling (default) compression.

Beware, because at 2K you might be generating more indirect blocks.
For 1 MB files, the gains from using a recordsize smaller than 16K
start to be quite small.

-r
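A rough back-of-the-envelope of the tail waste (this ignores indirect
blocks and metadata, and assumes the average file slightly exceeds
1 MB, so roughly half of the last record goes unused):

    # awk 'BEGIN {
          filesz = 1048576 + 32768
          for (rs = 2048; rs <= 131072; rs *= 2)
              printf "recordsize %6d: ~%.1f%% wasted in the tail\n",
                     rs, 100 * (rs / 2) / filesz
      }'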
Re: [zfs-discuss] ZFS (and quota)
I'm CCing zfs-discuss@opensolaris.org, as this doesn't look like a
FreeBSD-specific problem. It looks like there is a problem with block
allocation(?) when we are near the quota limit. The tank/foo dataset
has its quota set to 10m.

Without quota:

    FreeBSD:
    # dd if=/dev/zero of=/tank/test bs=512 count=20480
    time: 0.7s

    Solaris:
    # dd if=/dev/zero of=/tank/test bs=512 count=20480
    time: 4.5s

With quota:

    FreeBSD:
    # dd if=/dev/zero of=/tank/foo/test bs=512 count=20480
    dd: /tank/foo/test: Disc quota exceeded
    time: 306.5s

    Solaris:
    # dd if=/dev/zero of=/tank/foo/test bs=512 count=20480
    write: Disc quota exceeded
    time: 602.7s

The CPU is almost entirely idle, but disk activity seems to be high.
Any ideas?

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: [zfs-discuss] ZFS panic when trying to import pool
Ok, I found the problem with 0x06: one disk was missing. But now I
have all my disks and I get 0x05:

Sep 21 10:25:53 unknown ^Mpanic[cpu0]/thread=ff0001e12c80:
assertion failed: dmu_read(os, smo->smo_object, offset, size,
entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c,
line: 339

ff0001e124f0 genunix:assfail3+b9 ()
ff0001e12590 zfs:space_map_load+2ef ()
ff0001e125d0 zfs:metaslab_activate+66 ()
ff0001e12690 zfs:metaslab_group_alloc+24e ()
ff0001e12760 zfs:metaslab_alloc_dva+192 ()
ff0001e12800 zfs:metaslab_alloc+82 ()
ff0001e12850 zfs:zio_dva_allocate+68 ()
ff0001e12870 zfs:zio_next_stage+b3 ()
ff0001e128a0 zfs:zio_checksum_generate+6e ()
ff0001e128c0 zfs:zio_next_stage+b3 ()
ff0001e12930 zfs:zio_write_compress+239 ()
ff0001e12950 zfs:zio_next_stage+b3 ()
ff0001e129a0 zfs:zio_wait_for_children+5d ()
ff0001e129c0 zfs:zio_wait_children_ready+20 ()
ff0001e129e0 zfs:zio_next_stage_async+bb ()
ff0001e12a00 zfs:zio_nowait+11 ()
ff0001e12a80 zfs:dmu_objset_sync+196 ()
ff0001e12ad0 zfs:dsl_dataset_sync+5d ()
ff0001e12b40 zfs:dsl_pool_sync+b5 ()
ff0001e12bd0 zfs:spa_sync+1c5 ()
ff0001e12c60 zfs:txg_sync_thread+19a ()
ff0001e12c70 unix:thread_start+8 ()

There are no SCSI errors for the disks, because those are virtual
disks. Also, for anyone who is interested, I wrote a little program
to show the properties of the vdev:
http://www.projectvolcano.org/zfs/list_vdev.c
Here is a sample output:

bash-3.00# ./list_vdev -d /dev/dsk/c1t12d0s0
Vdev properties for /dev/dsk/c1t12d0s0:
    version: 0x0003
    name: share02
    state: 0x0001
    txg: 0x003fd0e4
    pool_guid: 0x88f93fc54c215cfa
    top_guid: 0x65400f2e7db0c2a5
    guid: 0xfc3b9af2d3b6fd46
    vdev_tree:
        type: raidz
        id: 0x
        guid: 0x65400f2e7db0c2a5
        nparity: 0x0001
        metaslab_array: 0x000d
        metaslab_shift: 0x001e
        ashift: 0x0009
        asize: 0x00196e0c
        children: [
            [0]
                type: disk
                id: 0x
                guid: 0xfc3b9af2d3b6fd46
                path: /dev/dsk/c1t12d0s0
                devid: id1,[EMAIL PROTECTED]/a
                whole_disk: 0x0001
                DTL: 0x004e
            [1]
                type: disk
                id: 0x0001
                guid: 0x377cc1a2beb3c985
                path: /dev/dsk/c1t13d0s0
                devid: id1,[EMAIL PROTECTED]/a
                whole_disk: 0x0001
                DTL: 0x004d
            [2]
                type: disk
                id: 0x0002
                guid: 0xe97db62ad7fe325d
                path: /dev/dsk/c1t14d0s0
                devid: id1,[EMAIL PROTECTED]/a
                whole_disk: 0x0001
                DTL: 0x0091
        ]

So my question: is there a way to really know why I got IOE (0x05)?
Is there a way to find out in the debugger? How can I access it?
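For comparison, zdb can dump the same on-disk label information
directly (same device as in the output above):

    bash-3.00# zdb -l /dev/dsk/c1t12d0s0

zdb -l prints all four copies of the vdev label, which is a quick way
to check whether the labels on each disk agree with each other.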
[zfs-discuss] The ZFS-Man.
Hi.

I gave a talk about ZFS during EuroBSDCon 2007, and because it won
the best talk award and some found it funny, here it is:

    http://youtube.com/watch?v=o3TGM0T1CvE

A bit better version is here:

    http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf

BTW, inspired by the ZFS demos from the OpenSolaris page, I created a
few demos of ZFS on FreeBSD:

    http://youtube.com/results?search_query=freebsd+zfssearch=Search

And better versions:

    http://people.freebsd.org/~pjd/misc/zfs/

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/20/07 7:31 PM, Paul B. Henson wrote:

> On Thu, 20 Sep 2007, Tim Spriggs wrote:
>
> > It's an IBM re-branded NetApp which we are using for NFS and
> > iSCSI.

Yeah, it's fun to see IBM compete with its OEM provider NetApp.

> Ah, I see. Is it comparable storage though? Does it use SATA drives
> similar to the x4500, or more expensive/higher performance FC
> drives? Is it one of the models that allows connecting dual
> clustered heads and failing over the storage between them? I agree
> the x4500 is a sweet looking box, but when making price comparisons
> sometimes it's more than just the raw storage... I wish I could
> just drop in a couple of x4500's and not have to worry about the
> complexity of clustering sigh...

zfs send/receive. NetApp is great; we have about 6 varieties in
production here. But for what I pay in maintenance and up-front cost
on just 2 filers, I can buy an x4500 a year, and have a 3-year
warranty each time I buy. It just depends on the company you work
for.

I haven't played too much with anything but NetApp and StorageTek,
but once I got started on zfs I just knew it was the future; and I
think NetApp realizes that too. And if Apple does what I think it
will, it will only get better :)

Fast, Cheap, Easy - you only get 2. Zfs may change that.
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server
Gino wrote:

> > The x4500 is very sweet and the only thing stopping us from
> > buying two instead of another shelf is the fact that we have lost
> > pools on Sol10u3 servers and there is no easy way of making two
> > pools redundant (ie the complexity of clustering.) Simply sending
> > incremental snapshots is not a viable option. The pools we lost
> > were pools on iSCSI (in a mirrored config) and they were mostly
> > lost on zpool import/export. The lack of a recovery mechanism
> > really limits how much faith we can put into our data on ZFS.
> > It's safe as long as the pool is safe... but we've lost multiple
> > pools.
>
> Hello Tim,
> did you try SNV60+ or S10U4?
>
> Gino

Hi Gino,

We need Solaris proper for these systems, and we would have to
schedule a significant downtime to patch/update to U4.

-Tim
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On 9/20/07, Paul B. Henson wrote:

> Again though, that would imply two different storage locations
> visible to the clients? I'd really rather avoid that. For example,
> with our current Samba implementation, a user can just connect to
> '\\files.csupomona.edu\username' to access their home directory or
> '\\files.csupomona.edu\groupname' to access a shared group
> directory. They don't need to worry on which physical server it
> resides or determine what server name to connect to.

MS-DFS could be helpful here. You could have a virtual Samba instance
that generates MS-DFS redirects to the appropriate spot. At one point
in the past I wrote a script (long since lost - at a different job)
that would automatically convert automounter maps into the
appropriately formatted symbolic links used by the Samba MS-DFS
implementation. It worked quite well for giving one place to
administer the location mapping while providing transparency to the
end-users.

Mike

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
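A minimal sketch of what those links look like (server, share, and
path names are placeholders):

    # smb.conf (fragment)
    [global]
        host msdfs = yes

    [dfs]
        msdfs root = yes
        path = /export/dfsroot

    # each MS-DFS link is a specially formatted symlink:
    # ln -s 'msdfs:server1\username' /export/dfsroot/username
    # ln -s 'msdfs:server2\groupname' /export/dfsroot/groupname

A client opening \\files.csupomona.edu\dfs\username is then
redirected to whichever server actually holds the data.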
Re: [zfs-discuss] zoneadm clone doesn't support ZFS snapshots in s10u4?
Mike, Grant,

I reported the zoneadm.1m man page problem to the man page group. I
also added some stronger wording to the ZFS Admin Guide and the ZFS
FAQ about not using ZFS for zone root paths for the Solaris 10
release, and that upgrading or patching is not supported for either
the Solaris 10 or Solaris Express release. It is true that none of
the current install/patch tools recognize ZFS yet, and this has been
discussed many times before. You can follow the ZFS boot/root project
here:

    http://opensolaris.org/os/community/zfs/boot/

I've tried to add all the things you *can't* do with ZFS to the ZFS
FAQ, and the next topic to add is the inability to split mirrors. If
you are considering doing something with ZFS where recovering would
be difficult, then please ask here first. :-)

Cindy

Mike Gerdts wrote:

> On 9/19/07, grant beattie wrote:
>
> > according to the zoneadm(1m) man page on s10u4:
> >
> >     clone [-m copy] [-s zfs_snapshot] source_zone
> >
> >         Install a zone by copying an existing installed zone.
> >         This subcommand is an alternative way to install the
> >         zone.
>
> That's interesting... I reported this as a bug during the S10U2
> (11/06) beta and it got fixed for 11/06. My bug report was closed
> as 6480274, a duplicate of 6383119. This was wrong - I was
> reporting a man page bug, not the feature request (CR 6383119). The
> feature request from me was the previous bug that I filed in that
> beta program. :)
>
> Someone really wants this man page to be out of sync with the
> command. The rather consistent answer is that zoneadm clone will
> not do zfs until live upgrade does zfs. Since there is a new
> project in the works (Snap Upgrade) that is very much targeted at
> environments that use zfs, I would be surprised to see zfs support
> come into live upgrade.
Re: [zfs-discuss] The ZFS-Man.
On Sep 21, 2007, at 11:47 AM, Pawel Jakub Dawidek wrote:

> Hi.
>
> I gave a talk about ZFS during EuroBSDCon 2007, and because it won
> the best talk award and some found it funny, here it is:
>
>     http://youtube.com/watch?v=o3TGM0T1CvE
>
> a bit better version is here:
>
>     http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf

Looks like Jeff has been working out :)

> BTW, inspired by the ZFS demos from the OpenSolaris page, I created
> a few demos of ZFS on FreeBSD:
>
>     http://youtube.com/results?search_query=freebsd+zfssearch=Search
>
> And better versions:
>
>     http://people.freebsd.org/~pjd/misc/zfs/

Nice.

eric
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, Tim Spriggs wrote:

> The x4500 is very sweet and the only thing stopping us from buying
> two instead of another shelf is the fact that we have lost pools on
> Sol10u3 servers and there is no easy way of making two pools
> redundant (ie the complexity of clustering.) Simply sending
> incremental snapshots is not a viable option.
>
> The pools we lost were pools on iSCSI (in a mirrored config) and
> they were mostly lost on zpool import/export. The lack of a
> recovery mechanism really limits how much faith we can put into our
> data on ZFS. It's safe as long as the pool is safe... but we've
> lost multiple pools.

Lost data doesn't give me a warm fuzzy 8-/. Were you running an
officially supported version of Solaris at the time? If so, what did
Sun support have to say about this issue?

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] The ZFS-Man.
On Sep 21, 2007, at 14:57, eric kustarz wrote:

> > I gave a talk about ZFS during EuroBSDCon 2007, and because it
> > won the best talk award and some found it funny, here it is:
> >
> >     http://youtube.com/watch?v=o3TGM0T1CvE
> >
> > a bit better version is here:
> >
> >     http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf
>
> Looks like Jeff has been working out :)

my first thought too:

    http://blogs.sun.com/bonwick/resource/images/bonwick.portrait.jpg

funny - i always pictured this as UFS-man though:

    http://www.benbakerphoto.com/business/47573_8C-after.jpg

but what's going on with the sheep there?
Re: [zfs-discuss] The ZFS-Man.
Jonathan Edwards wrote:

> my first thought too:
>
>     http://blogs.sun.com/bonwick/resource/images/bonwick.portrait.jpg
>
> funny - i always pictured this as UFS-man though:
>
>     http://www.benbakerphoto.com/business/47573_8C-after.jpg
>
> but what's going on with the sheep there?

Got me, but they do look kind of nervous.

(Happy friday folks...)
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
Paul B. Henson wrote:

> On Thu, 20 Sep 2007, Tim Spriggs wrote:
>
> > [...] The lack of a recovery mechanism really limits how much
> > faith we can put into our data on ZFS. It's safe as long as the
> > pool is safe... but we've lost multiple pools.
>
> Lost data doesn't give me a warm fuzzy 8-/. Were you running an
> officially supported version of Solaris at the time? If so, what
> did Sun support have to say about this issue?

Sol 10 with just about all patches up to date. I joined this list in
hope of a good answer. After answering a few questions over two days
I had no hope of recovering the data. Don't import/export (especially
between systems) without serious cause, at least not with U3. I
haven't tried updating our servers yet and I don't intend to for a
while now.

The filesystems contained databases that were luckily redundant and
could be rebuilt, but our DBA was not too happy to have to do that at
3:00am.

I still have a pool that can not be mounted or exported. It shows up
with zpool list but nothing under zfs list. zpool export gives me an
IO error and does nothing. On the next downtime I am going to attempt
to yank the lun out from under its feet (as gently as I can) after I
have stopped all other services.

Still, we are using ZFS, but we are re-thinking how to deploy/manage
it. Our original model had us exporting/importing pools in order to
move zone data between machines. We had done the same with UFS on
iSCSI without a hitch. ZFS worked for about 8 zone moves and then
killed 2 zones. The major operational difference between the moves
involved a reboot of the global zones. The initial import worked, but
after the reboot the pools were in a bad state, reporting errors on
both drives in the mirror. I exported one (bad choice) and attempted
to gain access to the other. Now attempting to import the first pool
will panic a solaris/opensolaris box very reliably. The second is in
the state I described above. Also, the drive labels are intact
according to zdb.

When we don't move pools around, zfs seems to be stable on both
Solaris and OpenSolaris. I've done
snapshots/rollbacks/sends/receives/clones/... without problems. We
even have zvols exported as mirrored luns from an OpenSolaris box. It
mirrors the luns that the IBM/NetApp box exports and seems to be
doing fine with that. There are a lot of other people that seem to
have the same opinion and use zfs with direct attached storage.

-Tim

PS: when I have a lot of time I might try to reproduce this by:

    m2# zpool create test mirror iscsi_lun1 iscsi_lun2
    m2# zpool export test
    m1# zpool import -f test
    m1# reboot
    m2# reboot
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Sep 21, 2007, at 3:50 PM, Tim Spriggs wrote:

> [...]
>
> PS: when I have a lot of time I might try to reproduce this by:
>
>     m2# zpool create test mirror iscsi_lun1 iscsi_lun2
>     m2# zpool export test
>     m1# zpool import -f test
>     m1# reboot
>     m2# reboot

Since I haven't actually looked into what problem caused your pools
to become damaged/lost, I can only guess that it's possibly due to
the pool being actively imported on multiple machines (perhaps even
accidentally).
If it is that, you'll be happy to note that we specifically no longer
allow that to happen (unless you use the -f flag):

    http://blogs.sun.com/erickustarz/entry/poor_man_s_cluster_end
    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6282725

Looks like it just missed the s10u4 cutoff, but it should be in
s10_u5.

In your above example, there should be no reason why you have to use
the '-f' flag on import (the pool was cleanly exported) - when you're
moving the pool from system to system, forcing can get you into
trouble if things don't go exactly how you planned.

eric
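A sketch of the safer sequence (the lun names are placeholders, as in
the PS above):

    m2# zpool create test mirror iscsi_lun1 iscsi_lun2
    m2# zpool export test
    m1# zpool import test    # no -f needed after a clean export;
                             # reserve -f for a pool that still looks
                             # active on another host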
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
eric kustarz wrote:

> In your above example, there should be no reason why you have to
> use the '-f' flag on import (the pool was cleanly exported) - when
> you're moving the pool from system to system, forcing can get you
> into trouble if things don't go exactly how you planned.

That's a very possible prognosis. Even when the pools were exported
from one system, they were still marked as attached (thus the -f was
necessary). Since I rebooted both systems at the same time, I guess
it's possible that they both made a claim to the pool and corrupted
it. I'm glad this will be fixed in the future.

-Tim
Re: [zfs-discuss] zoneadm clone doesn't support ZFS snapshots in
grant beattie wrote:

> I don't have any advice, unfortunately, but I do know that in my
> case putting zones on UFS is simply not an option. there must be a
> way considering there is nothing in the documentation to suggest
> that zones on ZFS are not supported.

There's a very explicit "Do not place the zonepath on ZFS for this
release" in this doc:

    http://docs.sun.com/app/docs/doc/817-1592/z.conf.start-5?a=view

> one question though, why does patchadd care about filesystems in
> the first place? what if I put my zones on VxFS, or QFS? I don't
> see why it should make any difference to patchadd. live upgrade is
> obviously another kettle of fish entirely, though.

The patch and install tools can't figure out pools yet. If you have a
1 GB pool and 10 filesystems on it, df reports each as having 1 GB -
do you have 10 GB of capacity? The tools can't tell. Please check the
archives; this subject has been extensively discussed.

CT
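A quick illustration of that ambiguity (the pool and filesystem names
are placeholders):

    # zfs create tank/fs1
    # zfs create tank/fs2
    # df -h /tank/fs1 /tank/fs2
    # Each filesystem reports the whole pool's free space as its own
    # "avail", so a tool that sums df output overcounts the pool's
    # real capacity.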
Re: [zfs-discuss] zoneadm clone doesn't support ZFS snapshots in
On 9/20/07, Matthew Flanagan wrote:

> Mike,
>
> I followed your procedure for cloning zones and it worked well up
> until yesterday, when I tried applying the S10U4 kernel patch
> 12001-14 and it wouldn't apply because I had my zones on zfs :(

Thanks for sharing. That sucks.

> I'm still figuring out how to fix this other than moving all of my
> zones onto UFS.

How about a dtrace script that changes the fstyp in statvfs() returns
to say that it is ufs? :) I bet someone comes along and says that
isn't supported either...

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] zoneadm clone doesn't support ZFS snapshots in
On 9/21/07, Christine Tran wrote:

> The patch and install tools can't figure out pools yet. If you have
> a 1 GB pool and 10 filesystems on it, df reports each as having
> 1 GB - do you have 10 GB of capacity? The tools can't tell. Please
> check the archives; this subject has been extensively discussed.

Two responses come immediately to mind...

1) Thanks for protecting stupid/careless people from doing bad
   things.

2) UNIX has a longstanding tradition of adding a -f flag for cases
   when the sysadmin realizes there is additional risk but feels
   that appropriate precautions have been taken.

I would really like to ask Sun for a roadmap as to when this is going
to be supported. Since this is the zfs list (not a zones or install
list) and it is OpenSolaris (not Solaris), I guess I should probably
find a more appropriate forum.

So, for now I will use OpenSolaris where I can, wait patiently for
the new installer + snap upgrade basket, and wait for it to find its
way into Solaris in about a year or two. In the meantime, I'll
probably end up putting most zones on a particular competitor's NAS
devices and looking into how well their file system cloning
capabilities play in coordination with iSCSI.

<irony>Oh, wait! What if the NAS device runs out of space while I'm
patching? Better rule out the thin provisioning capabilities of the
HDS storage that Sun sells as well.</irony>

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, 20 Sep 2007, eric kustarz wrote:

> > As far as quotas, I was less than impressed with their
> > implementation.
>
> Would you mind going into more details here?

The feature set was fairly extensive: they supported volume quotas
for users or groups, or qtree quotas, which, similar to the ZFS
quota, would limit space for a particular directory and all of its
contents regardless of user/group ownership.

But all quotas were set in a single flat text file. Any time you
added a new quota, you needed to turn off quotas and then turn them
back on, and quota enforcement was disabled while it recalculated
space utilization. Like a lot of aspects of the filer, it seemed
functional but rather kludgy. I hate kludgy :(.

I'd have to go review the documentation to recall the other issues I
had with it; quotas were one of the last things we reviewed and I'd
about given up taking notes at that point.

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, Andy Lubel wrote:

> Yeah, it's fun to see IBM compete with its OEM provider NetApp.

Yes, we had both IBM and NetApp out as well. I'm not sure what the
point was... We do have some IBM SAN equipment on site; I suppose if
we had gone with the IBM variant we could have consolidated support.

> > sometimes it's more than just the raw storage... I wish I could
> > just drop in a couple of x4500's and not have to worry about the
> > complexity of clustering sigh...
>
> zfs send/receive.

If I understand correctly, that would be sort of a poor man's
replication? So you would end up with a physical copy on server2 of
all of the data on server1? What would you do when server1 crashed
and died? One of the benefits of a real cluster would be the
automatic failover, and failback when the server recovered.

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
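For reference, the send/receive replication being suggested is a
periodic incremental copy along these lines (host, dataset, and
snapshot names are placeholders):

    server1# zfs snapshot tank/home@rep2
    server1# zfs send -i tank/home@rep1 tank/home@rep2 | \
                 ssh server2 zfs receive tank/home

Failover in this scheme is manual: clients lose whatever changed
since the last snapshot was sent, and someone has to repoint them at
server2.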
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, James F. Hranicky wrote:

> > It just seems rather involved, and relatively inefficient, to
> > continuously be mounting/unmounting stuff all the time. One of
> > the applications to be deployed against the filesystem will be
> > web service; I can't really envision a web server with tens of
> > thousands of NFS mounts coming and going. Seems like a lot of
> > overhead.
>
> Well, that's why ZFS wouldn't work for us :-( .

Although, I'm just saying that from my gut - does anyone have any
actual experience with automounting thousands of file systems? Does
it work? Is it horribly inefficient? Poor performance? Resource
intensive?

> Makes sense - in that case you would be looking at multiple SMB
> servers, though.

Yes, with again the resultant problem of worrying about where a
user's files are when they want to access them :(.

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
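For what it's worth, the automounter side doesn't need one entry per
user; a single wildcard map entry covers every home directory, and
each one is only mounted while it is actually in use (the server name
is a placeholder):

    # /etc/auto_home
    *    filer:/export/home/&

The open question is only how well the client copes when thousands of
those mounts are live at once.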
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, Mike Gerdts wrote:

> MS-DFS could be helpful here. You could have a virtual samba
> instance that generates MS-DFS redirects to the appropriate spot.
> At one point in

That's true, although I rather detest Microsoft DFS (they stole the
acronym from DCE/DFS, even though the initial versions in particular
sucked feature-wise in comparison). Also, the current release version
of MacOS X does not support CIFS DFS referrals. I'm not sure if the
upcoming version is going to rectify that or not. Windows clients not
belonging to the domain also occasionally have problems accessing
shares across different servers.

Although it is definitely something to consider if I'm going to be
unable to achieve my single namespace by having one large server...

Thanks...

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Re: [zfs-discuss] zfs mount points (all-or-nothing)
msl wrote:

> Hello all,
>
> Is there a way to configure the zpool to legacy_mount and have all
> filesystems in that pool mounted automatically? I will try to
> explain better:
>
> - Imagine that I have a zfs pool with 1000 filesystems.
> - I want to control the mount/unmount of that pool, so I configured
>   the zpool to legacy_mount.
> - But I don't want to have to mount the other 1000 filesystems...
>   so when I issue a mount -F zfs mypool, all the filesystems would
>   be mounted too (I think the mount property is per-filesystem).

I don't quite follow what behavior you are looking for. When you say
you want to control the mount/unmount of the pool, do you mean just
the poolname filesystem, or all filesystems in the pool?

You may be looking for zfs set canmount=off poolname. This will cause
the poolname (top-most) filesystem to not be mounted, but all
filesystems below it will be mounted as usual.

--matt
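A short sketch of that behavior (mypool is the placeholder pool name
from the question):

    # zfs set canmount=off mypool
    # zfs mount -a      # mounts the 1000 child filesystems, but not
                        # the top-level mypool filesystem itself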
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Fri, 21 Sep 2007, Tim Spriggs wrote:

> Still, we are using ZFS, but we are re-thinking how to
> deploy/manage it. Our original model had us exporting/importing
> pools in order to move zone data between machines. We had done the
> same with UFS on iSCSI [...]
>
> When we don't move pools around, zfs seems to be stable on both
> Solaris and OpenSolaris. I've done
> snapshots/rollbacks/sends/receives/clones/...

Sounds like your problems are in an area we probably wouldn't be
delving into... Thanks for the detail.

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Re: [zfs-discuss] enterprise scale redundant Solaris 10/ZFS server providing NFSv4/CIFS
On Thu, Sep 20, 2007 at 12:49:29PM -0700, Paul B. Henson wrote:

> > > I was planning to provide CIFS services via Samba. I noticed a
> > > posting a while back from a Sun engineer working on integrating
> > > NFSv4/ZFS ACL support into Samba, but I'm not sure if that was
> > > ever completed and shipped, either in the Sun version or
> > > pending inclusion in the official version; does anyone happen
> > > to have an update on that? Also, I saw a patch proposing a
> > > different implementation of shadow copies that better supported
> > > ZFS snapshots; any thoughts on that would also be appreciated.
> >
> > This work is done and, AFAIK, has been integrated into S10 8/07.
>
> Excellent. I did a little further research myself on the Samba
> mailing lists, and it looks like ZFS ACL support was merged into
> the official 3.0.26 release. Unfortunately, the patch to improve
> shadow copy performance on top of ZFS still appears to be floating
> around the technical mailing list under discussion.

ZFS ACL support was going to be merged into 3.0.26, but 3.0.26 ended
up being a security-fix release and the merge got pushed back. The
next release will be 3.2.0 and ACL support will be in there. As
others have pointed out, though, Samba is included in Solaris 10
Update 4 along with support for ZFS ACLs, Active Directory, and SMF.

The patches for the shadow copy module can be found here:

    http://www.edplese.com/samba-with-zfs.html

There are hopefully only a few minor changes that I need to make to
them before submitting them again to the Samba team. I recently
compiled the module for someone to use with Samba as shipped with U4,
and he reported that it worked well. I've made the compiled module
available on this page as well, if anyone is interested in testing
it.

The patch doesn't improve performance anymore, in order to preserve
backwards compatibility with the existing module, but it adds
usability enhancements for both admins and end-users. It allows
shadow copy functionality to just work with ZFS snapshots, without
having to create symlinks to each snapshot in the root of each share.
For end-users, it allows the Previous Versions list to be sorted
chronologically, making it easier to use. If performance is an issue,
the patch can be modified to improve performance like the original
patch did, but this only affects directory listings and is likely
negligible in most cases.

> > > Is there any facility for managing ZFS remotely? We have a
> > > central identity management system that automatically
> > > provisions resources as necessary [...]
> >
> > This is a loaded question. There is a webconsole interface to ZFS
> > which can be run from most browsers. But I think you'll find that
> > the CLI is easier for remote management.
>
> Perhaps I should have been more clear - a remote facility available
> via programmatic access, not manual direct access by a user. If I
> wanted to do something myself, I would absolutely log in to the
> system and use the CLI. However, the question was regarding an
> automated process. For example, our Perl-based identity management
> system might create a user in the middle of the night based on the
> appearance of that user's identity in our authoritative database,
> and need to create a ZFS filesystem and quota for that user. So, I
> need to be able to manipulate ZFS remotely via a programmatic API.

While it won't help you in your case, since your users access the
files using protocols other than CIFS, if you use only CIFS it's
possible to configure Samba to automatically create a user's home
directory the first time the user connects to the server.
This is done using the "root preexec" share option in smb.conf, and
an example is provided at the above URL.

Ed Plese
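A minimal sketch of that approach (the dataset names and quota value
are placeholders; the real example lives at the URL above):

    [homes]
        path = /tank/home/%U
        root preexec = sh -c '/usr/sbin/zfs list tank/home/%U \
            >/dev/null 2>&1 || ( /usr/sbin/zfs create tank/home/%U && \
            /usr/sbin/zfs set quota=5g tank/home/%U )'

Samba substitutes %U with the connecting user's name, and "root
preexec" runs as root when the share is connected to, so the
filesystem exists by the time the user's session starts.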