Re: [zfs-discuss] Pools inside pools
On Thu, Sep 23, 2010 at 08:48, Haudy Kazemi kaze0...@umn.edu wrote:
> Mattias Pantzare wrote:
>> ZFS needs free memory for writes. If you fill your memory with dirty
>> data, ZFS has to flush that data to disk. If that disk is a virtual
>> disk in ZFS on the same computer, those writes need more memory from
>> the same memory pool and you have a deadlock. If you write to a zvol
>> on a different host (via iSCSI), those writes use memory in a
>> different memory pool (on the other computer). No deadlock.
>
> Isn't this a matter of not keeping enough free memory as a workspace?
> By free memory, I am referring to unallocated memory and also
> recoverable main memory used for shrinkable read caches (shrinkable by
> discarding cached data). If the system keeps enough free and
> recoverable memory around for workspace, why should the deadlock case
> ever arise? Slowness and page swapping might be expected to arise (as
> a result of a shrinking read cache and high memory pressure), but
> deadlocks too?

Yes. But what is enough reserved free memory? If you need 1 MB for a
normal configuration, you might need 2 MB when you are doing ZFS on ZFS
(I am just guessing). This is the same problem as mounting an NFS
server on itself via NFS -- also not supported.

The system has shrinkable caches and so on, but that space will
sometimes run out. All of it. There is also swap to use, but if swap is
itself on ZFS you are back to the same problem.

These things are also very hard to test.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Pools inside pools
On Wed, Sep 22, 2010 at 20:15, Markus Kovero markus.kov...@nebula.fi wrote:
>> Such configuration was known to cause deadlocks. Even if it works now
>> (which I don't expect to be the case) it will make your data be
>> cached twice. The CPU utilization will also be much higher, etc. All
>> in all I strongly recommend against such a setup.
>> -- Pawel Jakub Dawidek p...@freebsd.org, FreeBSD committer
>
> Well, CPU utilization can be tuned downwards by disabling checksums in
> the inner pools, as checksumming is done in the main pool. I'd be
> interested in bug ids for the deadlock issues and everything related.
> Caching twice is not an issue; prefetching could be, and it can be
> disabled. I don't understand what makes it difficult for ZFS to handle
> this kind of setup. The main pool (testpool) should just allow any
> writes/reads to/from the volume, not caring what they are, whereas
> anotherpool would just work as any other pool consisting of any other
> devices. This is quite similar to an iSCSI-replicated mirror pool,
> where you have a redundant pool created from iSCSI volumes locally and
> remotely.

ZFS needs free memory for writes. If you fill your memory with dirty
data, ZFS has to flush that data to disk. If that disk is a virtual
disk in ZFS on the same computer, those writes need more memory from
the same memory pool and you have a deadlock.

If you write to a zvol on a different host (via iSCSI), those writes
use memory in a different memory pool (on the other computer). No
deadlock.
Re: [zfs-discuss] Suggested RaidZ configuration...
On Wed, Sep 8, 2010 at 06:59, Edward Ned Harvey sh...@nedharvey.com wrote:
> I think the value you can take from this is: Why does the BPG say
> that? What is the reasoning behind it? Anything that is a rule of
> thumb either has reasoning behind it (you should know the reasoning)
> or it doesn't (you should ignore the rule of thumb, dismiss it as
> myth). Let's examine the myth that you should limit the number of
> drives in a vdev because of resilver time. The myth goes something
> like this: You shouldn't use more than ___ drives in a raidz_ vdev,
> because all the drives need to read during a resilver, so the more
> drives are present, the longer the resilver time.
>
> The truth of the matter is: only the used data is read. Because this
> is ZFS, it's smarter than a hardware solution which would have to read
> all disks in their entirety. In ZFS, if you have a 6-disk raidz1 with
> the capacity of 5 disks, and a total of 50G of data, then each disk
> has roughly 10G of data on it. During resilver, 5 disks will each read
> 10G of data, and 10G of data will be written to the new disk. If you
> have an 11-disk raidz1 with the capacity of 10 disks, then each disk
> has roughly 5G of data. 10 disks will each read 5G of data, and 5G of
> data will be written to the new disk. If anything, more disks means a
> faster resilver, because you're more easily able to saturate the bus,
> and you have a smaller amount of data that needs to be written to the
> replaced disk.

It is not a question of a vdev with 6 disks vs a vdev with 12 disks. It
is about 1 vdev with 12 disks vs 2 vdevs with 6 disks each. If you have
2 vdevs, you only have to read half the data compared to 1 vdev to
resilver a disk. Or look at it this way: you will put more data on a
12-disk vdev than on a 6-disk vdev. I/O other than the resilver will
also slow the resilver down more if you have large vdevs.
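The per-disk arithmetic in the example above can be sketched in a few lines (a back-of-the-envelope model only, assuming data plus parity spreads evenly across the vdev):

```python
def resilver_per_disk_gb(data_gb, disks, parity):
    """Rough amount each surviving disk must read (and the new disk must
    write) to resilver one disk of a raidz vdev, assuming data is spread
    evenly: data / (usable disks)."""
    return data_gb / (disks - parity)

# The numbers from the message above:
assert resilver_per_disk_gb(50, 6, 1) == 10   # 6-disk raidz1: ~10G per disk
assert resilver_per_disk_gb(50, 11, 1) == 5   # 11-disk raidz1: ~5G per disk
```

The counter-argument in the reply is that a pool of two 6-disk vdevs holds only half the thread's data per vdev, so a single-disk resilver touches half as much data as one 12-disk vdev would.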
Re: [zfs-discuss] Suggested RaidZ configuration...
On Wed, Sep 8, 2010 at 15:27, Edward Ned Harvey sh...@nedharvey.com wrote:
>> From: pantz...@gmail.com On Behalf Of Mattias Pantzare
>> It is about 1 vdev with 12 disks or 2 vdevs with 6 disks each. If you
>> have 2 vdevs you have to read half the data compared to 1 vdev to
>> resilver a disk.
>
> Let's suppose you have 1T of data. You have a 12-disk raidz2. So you
> have approx 100G on each disk, and you replace one disk. Then 11 disks
> will each read 100G, and the new disk will write 100G. Let's suppose
> you have 1T of data. You have 2 vdevs that are each a 6-disk raidz1.
> Then we'll estimate 500G is on each vdev, so each disk has approx
> 100G. You replace a disk. Then 5 disks will each read 100G, and 1 disk
> will write 100G. Both of the above situations resilver in equal time,
> unless there is a bus bottleneck. 21 disks in a single raidz3 will
> resilver just as fast as 7 disks in a raidz1, as long as you are
> avoiding the bus bottleneck. But 21 disks in a single raidz3 provides
> better redundancy than 3 vdevs each containing a 7-disk raidz1.
>
> In my personal experience, approx 5 disks can max out approx 1 bus.
> (It actually ranges from 2 to 7 disks, if you have an imbalance of
> cheap disks on a good bus, or good disks on a crap bus, but generally
> speaking people don't do that. Generally people get a good bus for
> good disks, and a cheap bus for cheap disks, so approx 5 disks max out
> approx 1 bus.) In my personal experience, servers are generally built
> with a separate bus for approx every 5-7 disk slots. So what it really
> comes down to is: instead of the Best Practices Guide saying "Don't
> put more than ___ disks into a single vdev", the BPG should say "Avoid
> the bus bandwidth bottleneck by constructing your vdevs using physical
> disks which are distributed across multiple buses, as necessary per
> the speed of your disks and buses."

This is assuming that you have no other I/O besides the scrub.
You should of course keep the number of disks in a vdev low for general
performance reasons unless you only have linear reads, as the random
IOPS of the whole vdev will be close to what a single disk can deliver.
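That rule of thumb can be illustrated with a tiny model (the 150 IOPS figure is a hypothetical number for a single 7200 RPM disk, not from this thread; real results vary with workload and cache):

```python
def pool_random_iops(vdevs, disk_iops=150):
    """Rule of thumb from this thread: each raidz vdev delivers roughly
    the random IOPS of a single member disk, so pool random IOPS scale
    with the number of vdevs, not the number of disks."""
    return vdevs * disk_iops

# Twelve disks arranged two ways:
one_wide_vdev = pool_random_iops(1)    # one 12-disk raidz2
two_vdevs = pool_random_iops(2)        # two 6-disk raidz1 vdevs
assert two_vdevs == 2 * one_wide_vdev
```

This is why the later "repost - high read iops" thread recommends many mirrored vdevs when random-read performance matters.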
Re: [zfs-discuss] zfs lists discrepancy after added a new vdev to pool
On Sat, Aug 28, 2010 at 02:54, Darin Perusich darin.perus...@cognigencorp.com wrote:
> Hello All,
> I'm sure this has been discussed previously but I haven't been able to
> find an answer to this. I've added another raidz1 vdev to an existing
> storage pool and the increased available storage isn't reflected in
> the 'zfs list' output. Why is this? The system in question is running
> Solaris 10 5/09 s10s_u7wos_08, kernel Generic_139555-08. The system
> does not have the latest patches, which might be the cure. Thanks!
> Here's what I'm seeing.
>
> zpool create datapool raidz1 c1t50060E800042AA70d0 c1t50060E800042AA70d1
>
> zpool status
>   pool: datapool
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME                       STATE     READ WRITE CKSUM
>         datapool                   ONLINE       0     0     0
>           raidz1                   ONLINE       0     0     0
>             c1t50060E800042AA70d0  ONLINE       0     0     0
>             c1t50060E800042AA70d1  ONLINE       0     0     0
>
> zfs list
> NAME       USED  AVAIL  REFER  MOUNTPOINT
> datapool   108K   196G    18K  /datapool
>
> zpool add datapool raidz1 c1t50060E800042AA70d2 c1t50060E800042AA70d3
>
> zpool status
>   pool: datapool
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME                       STATE     READ WRITE CKSUM
>         datapool                   ONLINE       0     0     0
>           raidz1                   ONLINE       0     0     0
>             c1t50060E800042AA70d0  ONLINE       0     0     0
>             c1t50060E800042AA70d1  ONLINE       0     0     0
>           raidz1                   ONLINE       0     0     0
>             c1t50060E800042AA70d2  ONLINE       0     0     0
>             c1t50060E800042AA70d3  ONLINE       0     0     0
>
> zfs list
> NAME       USED  AVAIL  REFER  MOUNTPOINT
> datapool   112K   392G    18K  /datapool

I think you have to explain your problem more: 392G is more than 196G,
so the added vdev is reflected in the 'zfs list' output.
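The numbers in that output line up with the usual raidz capacity arithmetic (a sketch; real pools reserve a little extra space, and the ~196G per-disk figure is inferred from the output, not stated in the message):

```python
def raidz_usable_gb(disks, parity, disk_size_gb):
    """A raidz vdev provides roughly (disks - parity) disks' worth of
    usable space."""
    return (disks - parity) * disk_size_gb

# Each 2-disk raidz1 vdev of ~196G disks yields ~196G usable:
first_vdev = raidz_usable_gb(2, 1, 196)
after_add = first_vdev + raidz_usable_gb(2, 1, 196)
assert first_vdev == 196   # matches the first 'zfs list'
assert after_add == 392    # matches the second 'zfs list'
```

So the pool did grow; the poster may simply have expected a different number.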
Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS
On Sun, May 30, 2010 at 23:37, Sandon Van Ness san...@van-ness.com wrote:
> I just wanted to make sure this is normal and expected. I fully
> expected that as the file-system filled up I would see more disk space
> being used than with other file-systems due to its features, but what
> I didn't expect was ~500-600GB to be missing from the total volume
> size right at file-system creation. Comparing two systems, one being
> JFS and one being ZFS, one being raidz2 and one being raid6, here are
> the differences I see:
>
> ZFS:
> r...@opensolaris:/data# df -k /data
> Filesystem         kbytes       used        avail capacity  Mounted on
> data          17024716800  258872352  16765843815     2%    /data
>
> JFS:
> r...@sabayonx86-64:~# df -k /data2
> Filesystem     1K-blocks     Used    Available Use% Mounted on
> /dev/sdd1    17577451416  2147912  17575303504   1% /data2
>
> zpool list shows the raw capacity, right?
>
> r...@opensolaris:/data# zpool list data
> NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
> data  18.1T   278G  17.9T   1%  1.00x  ONLINE  -
>
> OK, I would expect it to be rounded to 18.2, but that seems about
> right for 20 trillion bytes (what 20 x 1 TB is):
> r...@sabayonx86-64:~# echo | awk '{print 20000000000000/1024/1024/1024/1024}'
> 18.1899
>
> Now minus two drives for parity:
> r...@sabayonx86-64:~# echo | awk '{print 18000000000000/1024/1024/1024/1024}'
> 16.3709
>
> Yet when running zfs list it also lists the amount of storage as
> significantly smaller:
> r...@opensolaris:~# zfs list data
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> data   164K  15.9T  56.0K  /data
>
> I would expect this to be 16.4T. Taking the df -k values, JFS gives me
> a total volume size of:
> r...@sabayonx86-64:~# echo | awk '{print 17577451416/1024/1024/1024}'
> 16.3703
>
> and ZFS gives:
> r...@sabayonx86-64:~# echo | awk '{print 17024716800/1024/1024/1024}'
> 15.8555
>
> So basically with JFS I see no decrease in total volume size but a
> huge difference on ZFS. Is this normal/expected? Can anything be
> disabled to not lose 500-600 GB of space?
This may be the answer: http://www.cuddletech.com/blog/pivot/entry.php?id=1013
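Most of the apparent shrinkage in the awk one-liners above is just the decimal-TB vs. binary-TiB difference plus the parity disks; the remainder is the ZFS internal reservation the linked post discusses. The arithmetic can be checked directly:

```python
TIB = 1024 ** 4  # binary tebibyte, what zpool/zfs report as "T"

def tib(nbytes):
    return nbytes / TIB

raw = 20 * 10**12             # twenty 1 TB (decimal) drives
assert round(tib(raw), 1) == 18.2     # matches 'zpool list' raw size
data = 18 * 10**12            # minus two drives for raidz2 parity
assert round(tib(data), 2) == 16.37   # expected usable, before reservations
```

The gap between 16.37T expected and the 15.9T that 'zfs list' shows is the space ZFS holds back for its own use.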
Re: [zfs-discuss] Reverse lookup: inode to name lookup
On Sat, May 1, 2010 at 16:23, casper@sun.com wrote:
>> I understand you cannot look up names by inode number in general,
>> because that would present a security violation. Joe User should not
>> be able to find the name of an item that's in a directory where he
>> does not have permission. But, even if it can only be run by root, is
>> there some way to look up the name of an object based on its inode
>> number?
>
> Sure, that's typically how NFS works. The inode itself is not
> sufficient; an inode number might be recycled, and an old snapshot
> with the same inode number may refer to a different file.

No, an NFS client will not ask the NFS server for a name by sending the
inode or NFS handle. There is no need for an NFS client to do that.
There is no way to get a name from an inode number.
Re: [zfs-discuss] Reverse lookup: inode to name lookup
On Sat, May 1, 2010 at 16:49, casper@sun.com wrote:
>> No, an NFS client will not ask the NFS server for a name by sending
>> the inode or NFS handle. There is no need for an NFS client to do
>> that.
>
> The NFS clients -- certainly version 2 and 3 -- only use the file
> handle; the file handle can be decoded by the server.

The filehandle does not contain the name, only the FSid, the inode
number and the generation.

>> There is no way to get a name from an inode number.
>
> The NFS server knows how, so it is clearly possible. It is not
> exported to userland, but the kernel can find a file by its inumber.

The NFS server can find the file but not the file _name_. The inode
number is all that the NFS server needs; it does not need the file name
if it has the inode number.
Re: [zfs-discuss] Reverse lookup: inode to name lookup
> If the kernel (or root) can open an arbitrary directory by inode
> number, then the kernel (or root) can find the inode number of its
> parent by looking at the '..' entry, which the kernel (or root) can
> then open, and identify both: (a) the name of the child subdir whose
> inode number is already known, and (b) yet another '..' entry. The
> kernel (or root) can repeat this process recursively, up to the root
> of the filesystem tree. At that time, the kernel (or root) has
> completely identified the absolute path of the inode that it started
> with. The only question I want answered right now is: although it is
> possible, is it implemented? Is there any kind of function, or
> existing program, which can be run by root, to obtain either the
> complete path of a directory by inode number, or to simply open an
> inode by number, which would leave the recursion and absolute path
> generation yet to be completed?

You can do this in the kernel by calling vnodetopath(). I don't know if
it is exposed to user space. But it could be slow if you have large
directories, so you have to think about where you would use it.
Re: [zfs-discuss] Secure delete?
> OpenSolaris needs support for the TRIM command for SSDs. This command
> is issued to an SSD to indicate that a block is no longer in use and
> the SSD may erase it in preparation for future writes.
>
> There does not seem to be very much 'need', since there are other
> ways that an SSD can know that a block is no longer in use so it can
> be erased. In fact, ZFS already uses an algorithm (COW) which is
> friendly to SSDs.

What ways would those be?
Re: [zfs-discuss] Secure delete?
On Mon, Apr 12, 2010 at 19:19, David Magda dma...@ee.ryerson.ca wrote:
> On Mon, April 12, 2010 12:28, Tomas Ögren wrote:
>> On 12 April, 2010 - David Magda sent me these 0,7K bytes:
>>> On Mon, April 12, 2010 10:48, Tomas Ögren wrote:
>>>> For flash to overwrite a block, it needs to clear it first.. so
>>>> yes, by clearing it out in the background (after erasing) instead
>>>> of just before the timing-critical write(), you can make stuff go
>>>> faster.
>>>
>>> Except that ZFS does not overwrite blocks because it is
>>> copy-on-write.
>>
>> So CoW will enable infinite storage, so you never have to write on
>> the same place again? Cool.
>
> Your comment was regarding making write()s go faster by pre-clearing
> unused blocks so there are always writable blocks available. Because
> ZFS doesn't go to the same LBAs when writing data, the SSD doesn't
> have to worry about read-modify-write circumstances like it does with
> traditional file systems. Given that ZFS probably would not have to go
> back to old blocks until it's reached the end of the disk, that should
> give the SSD's firmware plenty of time to do block remapping and
> background erasing -- something that's done now anyway regardless of
> whether an SSD supports TRIM or not. You don't need TRIM to make ZFS
> go fast, though it doesn't hurt.

Why would the disk care whether the block was written recently? There
is old data on it that has to be preserved anyway; the SSD does not
know whether the old data is still important. ZFS will overwrite just
as any other filesystem. The only thing that makes ZFS SSD-friendly is
that it tries to make large writes, but that only works if you have few
synchronous writes.
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Fri, Apr 2, 2010 at 16:24, Edward Ned Harvey solar...@nedharvey.com wrote:
>> The purpose of the ZIL is to act like a fast log for synchronous
>> writes. It allows the system to quickly confirm a synchronous write
>> request with the minimum amount of work.
>
> Bob and Casper and some others clearly know a lot here. But I'm
> hearing conflicting information, and don't know what to believe. Does
> anyone here work on ZFS as an actual ZFS developer for Sun/Oracle?
> Someone who can claim "I can answer this question; I wrote that code,
> or at least have read it"? Questions to answer would be: Is a ZIL log
> device used only by sync() and fsync() system calls? Is it ever used
> to accelerate async writes?

sync() will tell the filesystems to flush writes to disk. sync() will
not use the ZIL; it will just start a new TXG, and it may return before
the writes are done. fsync() is what you are interested in.

> Suppose there is an application which sometimes does sync writes, and
> sometimes async writes. In fact, to make it easier, suppose two
> processes open two files, one of which always writes asynchronously,
> and one of which always writes synchronously. Suppose the ZIL is
> disabled. Is it possible for writes to be committed to disk
> out-of-order? Meaning, can a large async write be put into a TXG and
> committed to disk before a small sync write to a different file is
> committed to disk, even though the small sync write was issued by the
> application before the large async write? Remember, the point is: the
> ZIL is disabled. The question is whether the async write could
> possibly be committed to disk before the sync write.

Writes from a TXG will not be used until the whole TXG is committed to
disk. Everything from a half-written TXG will be ignored after a crash.
This means that the order of writes within a TXG is not important. The
only way to do a sync write without the ZIL is to start a new TXG after
the write. That costs a lot, so we have the ZIL for sync writes.
Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect
> These days I am a fan of forward-checked access lists, because anyone
> who owns a DNS server can say that IPAddressX reverse-resolves to
> aserver.google.com. They cannot set the forward lookup outside of
> their own domain, but they can set up a reverse lookup. The other
> advantage of forward-looking access lists is that you can use DNS
> aliases in access lists as well.

That is not true; you have to have a valid A record in the correct
domain. This is how it works (and how you should check your reverse
lookups in your applications):

1. Do a reverse lookup.
2. Do a forward lookup with the name from 1.
3. Check that the IP address is one of the addresses you got in 2.

Ignore the reverse lookup if the check in 3 fails.
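A sketch of those three steps in Python. The lookup functions are injectable parameters (my addition, so the cross-check logic can be shown with fake resolvers); the defaults use the standard socket calls:

```python
import socket

def verified_hostname(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS, per the three steps above:
    accept the PTR name only if its forward lookup contains the
    original address."""
    try:
        name, _aliases, _addrs = reverse(ip)    # step 1: reverse lookup
        _name, _aliases, addrs = forward(name)  # step 2: forward lookup
    except OSError:                             # any lookup failure
        return None
    return name if ip in addrs else None        # step 3: cross-check

# The attack described above: the attacker controls only the reverse
# zone, so the PTR claims 'aserver.google.com' but the real forward
# A record does not point back at the attacker's address.
fake_rev = lambda ip: ('aserver.google.com', [], [ip])
honest_fwd = lambda name: (name, [], ['216.58.0.1'])
assert verified_hostname('10.0.0.66', fake_rev, honest_fwd) is None
```

With a matching forward record the name is accepted, which is exactly the "valid A record in the correct domain" requirement from the reply.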
Re: [zfs-discuss] Is there something like udev in OpenSolaris
On Sat, Feb 20, 2010 at 11:14, Lutz Schumann presa...@storageconcepts.de wrote:
> Hello list, being a Linux guy I'm actually quite new to OpenSolaris.
> One thing I miss is udev. I found that when using SATA disks with ZFS,
> it always required manual intervention (cfgadm) to do SATA hot plug. I
> would like to automate the disk replacement, so that it is a fully
> automatic process without manual intervention if: a) the new disk
> contains no ZFS labels, and b) the new disk does not contain a
> partition table, i.e. it is a real replacement part. On Linux I would
> write a udev hot-plug script to automate this. Is there something like
> udev on OpenSolaris? (A place/hook that is executed every time new
> hardware is added/detected.)

Have you tried setting the autoreplace property to on on your pool
(zpool set autoreplace=on poolname)?
Re: [zfs-discuss] Is ZFS internal reservation excessive?
> Ext2/3 uses 5% by default for root's usage; 8% under FreeBSD for FFS.
> Solaris (10) uses a bit more nuance for its UFS:
>
> That reservation is to prevent users from exhausting disk space in
> such a way that even root cannot log in and solve the problem.

No, the reservation in UFS/FFS is there to keep performance up. It
becomes harder and harder to find free space as the disk fills. It is
even more important for ZFS to be able to find free space, as all
writes need free space. The root thing is just a side effect.
Re: [zfs-discuss] Repeating scrub does random fixes
On Sun, Jan 10, 2010 at 16:40, Gary Gendel g...@genashor.com wrote:
> I've been using a 5-disk raidz for years on an SXCE machine which I
> converted to OSOL. The only time I ever had ZFS problems in SXCE was
> with snv_120, which was fixed. So, now I'm at OSOL snv_111b and I'm
> finding that scrub repairs errors on random disks. If I repeat the
> scrub, it will fix errors on other disks. Occasionally it runs
> cleanly. That it doesn't happen in a consistent manner makes me
> believe it's not hardware related.

That is actually a good indication of hardware-related errors. Software
will do the same thing every time, but hardware errors are often
random. You are running an older version now, though; I would recommend
an upgrade.
Re: [zfs-discuss] Thin device support in ZFS?
On Wed, Dec 30, 2009 at 19:23, roland devz...@web.de wrote:
> Making transactional, logging filesystems thin-provisioning aware
> should be hard to do, as every new and every changed block is written
> to a new location. So what applies to ZFS should also apply to btrfs
> or nilfs or similar filesystems.

If that were a problem, it would be a problem for UFS when you write
new files... ZFS knows which blocks are free, and that is all you need
to send to the disk system.
Re: [zfs-discuss] repost - high read iops
On Tue, Dec 29, 2009 at 18:16, Brad bene...@yahoo.com wrote:
> @eric
>> As a general rule of thumb, each vdev has random performance roughly
>> the same as a single member of that vdev. Having six RAIDZ vdevs in a
>> pool should give roughly the performance of a stripe of six bare
>> drives, for random IO.
>
> It sounds like we'll need 16 vdevs striped in a pool to at least get
> the performance of 15 drives, plus another 16 mirrored for redundancy.
> If we are bounded in IOPS by the vdev, would it make sense to go with
> the bare minimum of drives (3) per vdev?

The minimum is 1 drive per vdev. The minimum with redundancy is 2, if
you use mirroring. You should use mirroring to get the best
performance.

>> This winds up looking similar to RAID10 in layout, in that you're
>> striping across a lot of disks that each consist of a mirror, though
>> the checksumming rules are different. Performance should also be
>> similar, though it's possible RAID10 may give slightly better random
>> read performance at the expense of some data quality guarantees,
>> since I don't believe RAID10 normally validates checksums on returned
>> data if the device didn't return an error. In normal practice, RAID10
>> and a pool of mirrored vdevs should benchmark against each other
>> within your margin of error.
>
> That's interesting to know that with ZFS's implementation of raid10 it
> doesn't have checksumming built in.

He was talking about RAID10, not mirroring in ZFS. ZFS will always use
checksums.
Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris
On Thu, Dec 24, 2009 at 04:36, Ian Collins i...@ianshome.com wrote:
> Mattias Pantzare wrote:
>>>> I'm not sure how to go about it. Basically, how should I format my
>>>> drives in FreeBSD to create a zpool which can be imported into
>>>> OpenSolaris?
>>>
>>> I'm not sure about BSD, but Solaris ZFS works with whole devices, so
>>> there isn't any OS-specific formatting involved. I assume BSD does
>>> the same.
>>
>> That is not true. ZFS will use an EFI partition table with one
>> partition if you give it the whole disk.
>
> An EFI label isn't OS-specific formatting!

It is: not all OSes will read an EFI label. Whole device on BSD is
really the whole device -- no partition table.
Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris
>> An EFI label isn't OS-specific formatting!
>>
>>> It is. Not all OSes will read an EFI label.
>>
>> You misunderstood the concept of OS-specific, I feel. EFI is indeed
>> OS independent; however, that doesn't necessarily imply that all OSes
>> can read EFI disks. My Commodore 128D could boot CP/M but couldn't
>> understand FAT32 - that doesn't mean that FAT32 isn't OS independent
>> either. The PC partition table is also not OS specific and is OS
>> independent.

Most partition tables are OS independent; they are often specified by
how you boot the platform. On a PC, EFI is very much OS specific, as
most OSes on that platform do not support EFI. EFI is platform
independent for Solaris: Solaris on SPARC and Solaris on PC use
different partition tables unless you use EFI, since EFI is supported
by Solaris on both SPARC and x86. This is mainly a Solaris problem, as
there is no reason for Solaris on SPARC not to read a PC partition
table. The reason that EFI is used by default for ZFS is that it is
platform independent (and that it can handle bigger disks on SPARC).
Unless you have to boot from it -- then it is very platform
dependent...

But the point was that ZFS on Solaris has to have a partition table, so
you must make a partition table on FreeBSD that Solaris can read. It
does not matter whether the format is OS specific or not.
Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris
>> I'm not sure how to go about it. Basically, how should I format my
>> drives in FreeBSD to create a zpool which can be imported into
>> OpenSolaris?
>
> I'm not sure about BSD, but Solaris ZFS works with whole devices, so
> there isn't any OS-specific formatting involved. I assume BSD does the
> same.

That is not true. ZFS will use an EFI partition table with one
partition if you give it the whole disk. My guess is that you should
put it in an EFI partition, but a normal partition should work. I would
test this in VirtualBox or VMware if I were you.
Re: [zfs-discuss] iSCSI with Deduplication, is there any point?
> I have already run into one little snag that I don't see any way of
> overcoming with my chosen method. I've upgraded to snv_129 with high
> hopes for getting the most out of deduplication, but using iSCSI
> volumes I'm not sure how I can gain any benefit from it. The volumes
> are a set size; Windows sees those volumes as that size despite any
> sort of block-level deduplication or compression taking place on the
> other side of the iSCSI connection. I can't create volumes that add up
> to more than the original pool size from what I can tell. I can see
> the pool is saving space, but it doesn't appear to become available to
> ZFS volumes. Dedup being pretty new, I haven't found much on the
> subject online.

Create sparse volumes: use -s when you create a volume (zfs create -s
-V <size> pool/volname), or change the reservation on your existing
volumes. Search for sparse in the zfs man page. And don't run out of
space. :-)
Re: [zfs-discuss] Something wrong with zfs mount
>> Is there a better solution to this problem? What if the machine
>> crashes?
>
> Crashes are abnormal conditions. If it crashes you should fix the
> problem to avoid future crashes, and probably you will need to clear
> the pool directory hierarchy prior to importing the pool.

Are you serious? I really hope that you have nothing to do with OS
development, or database development for that matter. A crash can be
something as simple as the battery in a portable computer running out
of power. A user should _not_ have to do anything manual to get the
system going again. We got rid of manual fsck many years ago; let's not
move back to the stone age!
Re: [zfs-discuss] ZFS Boot Recovery after Motherboard Death
On Sat, Dec 12, 2009 at 18:08, Richard Elling richard.ell...@gmail.com wrote:
> On Dec 12, 2009, at 12:53 AM, dick hoogendijk wrote:
>> On Sat, 2009-12-12 at 00:22 +0000, Moritz Willers wrote:
>>> The host identity had - of course - changed with the new motherboard
>>> and it no longer recognised the zpool as its own. 'zpool import -f
>>> rpool' to take ownership, reboot and it all worked no problem (which
>>> was amazing in itself as I had switched from AMD to Intel ...).
>>
>> Do I understand correctly if I read this as: OpenSolaris is able to
>> switch between systems without reinstalling? Just a zpool import -f
>> and everything runs? Wow, that would be an improvement and would make
>> things more like *BSD/Linux.
>
> Solaris has been able to do that for 20+ years. Why do you think it
> should be broken now?

Solaris has _not_ been able to do that for 20+ years. In fact, Sun has
always recommended a reinstall. You could do it if you really knew how,
but it was not easy. If you switch between identical systems it will of
course work fine (before ZFS, that is; now you may have to import the
pool on the new system).
Re: [zfs-discuss] moving files from one fs to another, splittin/merging
> Thanks for the info. Glad to hear it's in the works, too.

It is not in the works. If you look at the bug IDs in the bug database
you will find no indication of work being done on them.

> On Thu, 24 Sep 2009, Paul Archer wrote:
>> I may have missed something in the docs, but if I have a file in one
>> FS, and want to move it to another FS (assuming both filesystems are
>> in the same ZFS pool), is there a way to do it outside of the
>> standard mv/cp/rsync commands?
>
> Not yet. CR 6483179 covers this.
>
>> On a related(?) note, is there a way to split an existing filesystem?
>
> Not yet. CR 6400399 covers this.
>
> Regards, markm
Re: [zfs-discuss] Real help
On Mon, Sep 21, 2009 at 13:34, David Magda dma...@ee.ryerson.ca wrote:
> On Sep 21, 2009, at 06:52, Chris Ridd wrote:
>> Does zpool destroy prompt "are you sure" in any way? Some admin tools
>> do (beadm destroy for example) but there's not a lot of consistency.
>
> No it doesn't, which I always found strange. Personally I always
> thought you should be queried for a zfs destroy, but add a -f option
> for things like scripts. Not sure if things can be changed now.

You can import a destroyed pool; you can find them with zpool import
-D. But the problem in this case was not zpool destroy, it was zpool
create: zpool create will overwrite whatever was on the partition.
Re: [zfs-discuss] zfs send older version?
On Wed, Sep 16, 2009 at 09:34, Erik Trimble erik.trim...@sun.com wrote: Carson Gaspar wrote: Erik Trimble wrote: I haven't see this specific problem, but it occurs to me thus: For the reverse of the original problem, where (say) I back up a 'zfs send' stream to tape, then later on, after upgrading my system, I want to get that stream back. Does 'zfs receive' support reading a version X stream and dumping it into a version X+N zfs filesystem? If not, frankly, that's a higher priority than the reverse. Your question confuses me greatly - am I missing something? zfs recv of a full stream will create a new filesystem of the appropriate version, which you may then zfs upgrade if you wish. And restoring incrementals to a different fs rev doesn't make sense. As long as support for older fs versions isn't removed from the kernel, this shouldn't ever be a problem. You are correct in that restoring a full stream creates the appropriate versioned filesystem. That's not the problem. The /much/ more likely scenario is this: (1) Let's say I have a 2008.11 server. I back up the various ZFS filesystems, with both incremental and full streams off to tape. (2) I now upgrade that machine to 2009.05, and upgrade all the zpool/zfs filesystems to the later versions, which is what most people will do. (3) Now, I need to get back a snapshot from before step #2. I don't want a full stream recovery, just a little bit of data. I now am in the situation that I have a current (active) ZFS filesystem which has a later version than the (incremental) stream I stored earlier. This is what a typical recover instance is. If I can't recover an incremental into an existing filesystem, it effectively means my backups are lost and useless. (not quite true, but it creates a huge headache.) Congratulations! You now know why you should use a backup program instead of zfs send for your backups. (There are more reasons than this) zfs send streams are not designed for backups! 
(But a backup program that understands zfs send streams and uses those instead of recursing the filesystem would be nice...)
Re: [zfs-discuss] zfs send speed
On Tue, Aug 18, 2009 at 22:22, Paul Kraus pk1...@gmail.com wrote: Posted from the wrong address the first time, sorry. Is the speed of a 'zfs send' dependent on file size / number of files? We have a system with some large datasets (3.3 TB and about 35 million files) and conventional backups take a long time (using Netbackup 6.5 a FULL takes between two and three days; differential incrementals, even with very few files changing, take between 15 and 20 hours). We already use snapshots for day to day restores, but we need the 'real' backups for DR. Conventional backups can be faster than that! I have not used Netbackup but you should be able to configure it to run several backup streams in parallel. You may have to point Netbackup at subdirs instead of the filesystem root.
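The suggestion above -- one backup stream per top-level subdirectory, run in parallel -- can be sketched with plain tar. The paths and directory names here are made up for illustration; a real setup would point each stream at a subdir of the 3.3 TB dataset:

```shell
# Sketch: instead of one stream over the filesystem root, back up each
# top-level subdirectory as its own stream, several at a time.
SRC=$(mktemp -d)            # stands in for the large dataset
DST=$(mktemp -d)            # stands in for the backup target
mkdir -p "$SRC/users" "$SRC/mail" "$SRC/web"
echo a > "$SRC/users/f1"; echo b > "$SRC/mail/f2"; echo c > "$SRC/web/f3"

# One tar stream per subdirectory, all running in parallel.
for d in "$SRC"/*/; do
  name=$(basename "$d")
  tar -cf "$DST/$name.tar" -C "$SRC" "$name" &
done
wait
ls "$DST"
```

With millions of small files the per-file overhead dominates, so several concurrent walkers usually help far more than faster tape.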
Re: [zfs-discuss] file change log - was zfs fragmentation
It would be nice if ZFS had something similar to VxFS File Change Log. This feature is very useful for incremental backups and other directory walkers, providing they support FCL. I think this tangent deserves its own thread. :) To save a trip to google... http://sfdoccentral.symantec.com/sf/5.0MP3/linux/manpages/vxfs/man1m/fcladm.html This functionality would come in very handy. It would seem that it isn't too big of a deal to identify the files that changed, as this type of data is already presented via zpool status -v when corruption is detected. http://docs.sun.com/app/docs/doc/819-5461/gbctx?a=view In fact ZFS has a good transaction log, maybe the issue is there isn't software out there yet that uses it. Where is that log? ZIL does not log all transactions and is cleared very quickly.
Re: [zfs-discuss] zfs fragmentation
Adding another pool and copying all/some data over to it would only be a short-term solution. I'll have to disagree. What is the point of a filesystem that can grow to such a huge size and not have functionality built in to optimize data layout? Real world implementations of filesystems that are intended to live for years/decades need this functionality, don't they? Our mail system works well, only the backup doesn't perform well. All the features of ZFS that make reads perform well (prefetch, ARC) have little effect. We think backup is quite important. We do quite a few restores of months old data. Snapshots help in the short term, but for longer term restores we need to go to tape. Your scalability problem may be in your backup solution. The problem is not how many GB of data you have but the number of files. It has been a while since I worked with Networker so things may have changed. If you are doing backups directly to tape you may have a buffering problem. By simply staging backups on disk we got a lot faster backups. Have you configured Networker to do several simultaneous backups from your pool? You can do that by having several zfs filesystems on the same pool, or tell Networker to do backups one directory level down so that it thinks you have more filesystems. And don't forget to play with the parallelism settings in Networker. This made a huge difference for us on VxFS.
Re: [zfs-discuss] zfs fragmentation
On Sat, Aug 8, 2009 at 20:20, Ed Spencer ed_spen...@umanitoba.ca wrote: On Sat, 2009-08-08 at 08:14, Mattias Pantzare wrote: Your scalability problem may be in your backup solution. We've eliminated the backup system as being involved with the performance issues. The servers are Solaris 10 with the OS on UFS filesystems. (In zfs terms, the pool is old/mature). Solaris has been patched to a fairly current level. Copying data from the zfs filesystem to the local ufs filesystem enjoys the same throughput as the backup system. The test was simple. Create a test filesystem on the zfs pool. Restore production email data to it. Reboot the server. Back up the data (29 minutes for 15.8 GB of data). Reboot the server. Copy data from zfs to ufs using a 'cp -pr ...' command, which also took 29 minutes. Yes, that was expected. What happens if you run two cp -pr at the same time? I am guessing that two cp will take almost the same time as one. If you get twice the performance from two cp then you will get twice the performance from doing two backups in parallel.
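The two-cp experiment proposed above is easy to script. A minimal sketch using temp directories in place of the real mail spool -- the point is to compare wall-clock time of one copy versus two concurrent ones:

```shell
# Run two cp -pr jobs concurrently and check that both finish intact.
# If the combined wall-clock time of two concurrent copies is close to
# the time of one, the bottleneck is per-stream, not the disks -- and
# two parallel backup streams would roughly double throughput.
SRC1=$(mktemp -d); SRC2=$(mktemp -d)
DST1=$(mktemp -d); DST2=$(mktemp -d)
echo data1 > "$SRC1/file"; echo data2 > "$SRC2/file"

cp -pr "$SRC1/." "$DST1/" &
cp -pr "$SRC2/." "$DST2/" &
wait
```

On the real pool you would wrap each cp in `time` and copy two different restored mail directories.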
Re: [zfs-discuss] Shrinking a zpool?
If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. You can move zpools between computers, you can't move individual file systems. Remember that there is a SAN involved. The disk array does not run Solaris.
Re: [zfs-discuss] Shrinking a zpool?
On Thu, Aug 6, 2009 at 12:45, Ian Collins i...@ianshome.com wrote: Mattias Pantzare wrote: If they accept virtualisation, why can't they use individual filesystems (or zvol) rather than pools? What advantage do individual pools have over filesystems? I'd have thought the main disadvantage of pools is storage flexibility requires pool shrink, something ZFS provides at the filesystem (or zvol) level. You can move zpools between computers, you can't move individual file systems. send/receive? :-) What is the downtime for doing a send/receive? What is the downtime for zpool export, reconfigure LUN, zpool import? And you still need to shrink the pool. Move a 100GB application from server A to server B using send/receive and you will have 100GB stuck on server A that you can't use on server B where you really need it.
Re: [zfs-discuss] Shrinking a zpool?
On Thu, Aug 6, 2009 at 16:59, Ross no-re...@opensolaris.org wrote: But why do you have to attach to a pool? Surely you're just attaching to the root filesystem anyway? And as Richard says, since filesystems can be shrunk easily and it's just as easy to detach a filesystem from one machine and attach to it from another, why the emphasis on pools? What filesystems are you talking about? A zfs pool can be attached to one and only one computer at any given time. All file systems in that pool are attached to the same computer. For once I'm beginning to side with Richard, I just don't understand why data has to be in separate pools to do this. All accounting for data and free blocks is done at the pool level. That is why you can share space between file systems. You could write code that made ZFS a cluster file system, maybe just for the pool, but that is a lot of work and would require all attached computers to talk to each other.
Re: [zfs-discuss] No files but pool is full?
On Fri, Jul 24, 2009 at 09:33, Markus Kovero markus.kov...@nebula.fi wrote: During our tests we noticed very disturbing behavior, what would be causing this? System is running latest stable opensolaris. Any other means to remove ghost files rather than destroying pool and restoring from backups? You may have snapshots, try: zfs list -t snapshot
Re: [zfs-discuss] No files but pool is full?
On Fri, Jul 24, 2009 at 09:57, Markus Kovero markus.kov...@nebula.fi wrote: r...@~# zfs list -t snapshot NAME USED AVAIL REFER MOUNTPOINT rpool/ROOT/opensola...@install 146M - 2.82G - r...@~# Then it is probably some process that has a deleted file open. You can find those with: fuser -c /testpool But if you can't find the space after a reboot something is not right...
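The deleted-but-still-open situation fuser -c is hunting for above can be reproduced with plain shell. Once the directory entry is unlinked the file is invisible to ls, but its blocks stay allocated (and counted by df) until the last descriptor is closed:

```shell
# Demonstrate a deleted file whose space is still held by an open fd.
f=$(mktemp)
echo "ghost data" > "$f"
exec 3< "$f"                 # keep a read descriptor open
rm "$f"                      # directory entry is gone...
still_there=$(cat <&3)       # ...but the blocks are still readable
exec 3<&-                    # closing the fd finally frees the space
```

On Solaris, `fuser -c /testpool` lists the PIDs holding such files; killing or restarting those processes (or rebooting) releases the space.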
Re: [zfs-discuss] The zfs performance decrease when enable the MPxIO round-robin
On Sun, Jul 19, 2009 at 08:25, lf yang no-re...@opensolaris.org wrote: Hi Guys I have a SunFire X4200M2 and the Xyratex RS1600 JBOD which I try to run ZFS on. But I found a problem: I set mpxio-disable=yes in /kernel/drv/fp.conf to enable MPxIO, I assume you mean mpxio-disable=no and set load-balance=round-robin in /kernel/drv/scsi_vhci.conf to enable round-robin. The ZFS performance is very low, about a 50% performance decrease. If I disable MPxIO or just set load-balance=none, the performance is acceptable. I am confused. I googled and found this: http://xyratex.mobi/pdfs/products/storage-systems/tips/TIP107_Configuring_Solaris10_x86_for_Xyratex_storage_1-0.pdf It is a document from Xyratex; they have the same result but no explanation. Maybe it is a SunMdi bug? How can I fix this? That is probably a limitation in the hardware.
Re: [zfs-discuss] Question about ZFS Incremental Send/Receive
I feel like I understand what tar is doing, but I'm curious about what it is that ZFS is looking at that makes it a successful incremental send? That is, not send the entire file again. Does it have to do with how the application (tar in this example) does a file open, fopen(), and what mode is used? i.e. open for read, open for write, open for append. Or is it looking at a file system header, or checksum? I'm just trying to explain some observed behavior we're seeing during our testing. My proof of concept is to remote replicate these container files, which are created by a 3rd party application. ZFS knows what blocks were written since the first snapshot was taken. Filenames and the type of open are not important. If you open a file and rewrite all blocks in that file with the same content, all those blocks will be sent. If you rewrite 5 blocks, only 5 blocks are sent (plus the metadata that was updated). The way it works is that all blocks have a timestamp. Blocks with a timestamp newer than the first snapshot will be sent.
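The newer-than-the-snapshot selection described above has a rough file-level analogy in `find -newer`: a marker file plays the part of the snapshot, and only things modified after it are picked out. (This is only an analogy -- zfs send does this per block, using internal birth information, not per file by mtime.)

```shell
# File-level analogy of incremental send: a marker file stands in for
# the snapshot; -newer selects only what changed after it was taken.
d=$(mktemp -d)
echo one > "$d/a"; echo two > "$d/b"
touch "$d/.snapshot-marker"      # "take the snapshot"
sleep 1                          # ensure a strictly newer mtime
echo three > "$d/b"              # rewrite one file
changed=$(find "$d" -type f -newer "$d/.snapshot-marker")
echo "$changed"                  # only $d/b is listed
```

Rewriting a file with identical content still bumps its modification state, which is why an application that rewrites whole container files causes the whole file to be resent.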
Re: [zfs-discuss] How recoverable is an 'unrecoverable error'?
On Thu, Apr 16, 2009 at 11:38, Uwe Dippel udip...@gmail.com wrote: On Thu, Apr 16, 2009 at 1:05 AM, Fajar A. Nugraha fa...@fajar.net wrote: [...] Thanks, Fajar, et al. What this thread actually shows, alas, is that ZFS is rocket science. In 2009, one would expect a file system to 'just work'. Why would anyone want to have to 'status' it regularly, in case 'scrub' it, and if scrub doesn't do the trick (and still not knowing how serious the 'unrecoverable error' is - like in this case), 'clear' it, 'scrub' You do not have to status it regularly if you don't want to. Just as with any other file system. The difference is that you can. Just as you can and should do on your RAID system that you use with any other file system. If you do not have any problems ZFS will just work. If you have problems ZFS will show them to you much better than EXT3, FFS, UFS or other traditional filesystems. And often fix them for you. In many cases you would get corrupted data or have to run fsck for the same error on FFS/UFS. Scrub is much nicer than fsck; it is not easy to know the best answer to the questions that fsck will ask you if you have a serious metadata problem on FFS/UFS. And yes, you can get into trouble even on OpenBSD. You also have to look at the complexity of your volume manager, as ZFS is both a filesystem and volume manager in one.
Re: [zfs-discuss] Notations in zpool status
A useful way to obtain the mount point for a directory is with the 'df' command. Just do 'df .' while in a directory to see where its filesystem mount point is: % df . Filesystem 1K-blocks Used Available Use% Mounted on Sun_2540/home/bfriesen 119677846 65811409 53866437 55% /home/bfriesen Nice, I see by default it appears the gnu/bin is put ahead of /bin in $PATH, or maybe some of my meddling did it, but I see running the Solaris df gives several more confusing entries too: /system/contract (ctfs ): 0 blocks 2147483609 files Add -h or -k to df: df -h .
Re: [zfs-discuss] Zpools on USB zpool.cache
MP It would be nice to be able to move disks around when a system is MP powered off and not have to worry about a cache when I boot. You don't have to, unless you are talking about shared disks and importing a pool on another system while the original is powered off and the pool was not exported... For a configuration where disks are not shared among different systems you can move disks around without worrying about zpool.cache. So, what you are saying is that I can power off my computer, move my zfs disks to a different controller and then power on my computer and the zfs file systems will show up? zpool export is not always practical, especially on a root pool.
Re: [zfs-discuss] Zpools on USB zpool.cache
I suggest ZFS at boot should (multi-threaded) scan every disk for ZFS disks, and import the ones with the correct host name and with an import flag set, without using the cache file. Maybe just use the cache file for non-EFI disks/partitions, but without storing the pool name; but you should be able to tell ZFS to do a full scan which includes partitioned disks. Full scans are a bad thing, because they cannot scale. This is one good reason why zpool.cache exists. What do you mean by cannot scale? Is it common to not use the majority of disks available to a system? If you taste all buses in parallel there should not be a scalability problem. What problem are you trying to solve? It would be nice to be able to move disks around when a system is powered off and not have to worry about a cache when I boot.
Re: [zfs-discuss] Zpools on USB zpool.cache
On Mon, Mar 23, 2009 at 22:15, Richard Elling richard.ell...@gmail.com wrote: Mattias Pantzare wrote: I suggest ZFS at boot should (multi-threaded) scan every disk for ZFS disks, and import the ones with the correct host name and with an import flag set, without using the cache file. Maybe just use the cache file for non-EFI disks/partitions, but without storing the pool name; but you should be able to tell ZFS to do a full scan which includes partitioned disks. Full scans are a bad thing, because they cannot scale. This is one good reason why zpool.cache exists. What do you mean by cannot scale? Is it common to not use the majority of disks available to a system? No, it is uncommon. So, what do you mean by cannot scale? If you taste all buses in parallel there should not be a scalability problem. Don't think buses, think networks. NB, buses are on the way out, most modern designs are point-to-point (SAS, SATA, USB) or networked (iSCSI, SAN, NAS). Do you want to scan the internet for LUNs? Do you know how a device is made available to zfs, cache or no cache? All buses have to be probed when you do a reconfigure boot or run devfsadm. zfs will only see the devices that you see in /dev. If I can run zpool import in a reasonable amount of time the cache is not needed. Are there cases where I can't run zpool import? What problem are you trying to solve? It would be nice to be able to move disks around when a system is powered off and not have to worry about a cache when I boot. Why are you worrying about it? If I put my disks on a different controller zfs won't find them when I boot. That is bad. It is also an extra level of complexity.
Re: [zfs-discuss] Zpools on USB zpool.cache
On Tue, Mar 24, 2009 at 00:21, Tim t...@tcsac.net wrote: On Mon, Mar 23, 2009 at 4:45 PM, Mattias Pantzare pantz...@gmail.com wrote: If I put my disks on a different controller zfs won't find them when I boot. That is bad. It is also an extra level of complexity. Correct me if I'm wrong, but wading through all of your comments, I believe what you would like to see is zfs automatically scan if the cache is invalid vs. requiring manual intervention, no? That would be nice, but if there really is a problem with a scan that would not be good, as that would trigger the very problem that the cache is supposed to avoid. But I don't understand why we need it in the first place, except as a list of pools to import at boot.
Re: [zfs-discuss] Nexsan SATABeast and ZFS
On Tue, Mar 10, 2009 at 23:57, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Tue, 10 Mar 2009, Moore, Joe wrote: As far as workload, any time you use RAIDZ[2], ZFS must read the entire stripe (across all of the disks) in order to verify the checksum for that data block. This means that a 128k read (the default zfs blocksize) requires a 32kb read from each of 6 disks, which may include a relatively slow seek to the relevant part of the spinning rust. So for random I/O, even though the data is striped This is not quite true. Raidz2 is not the same as RAID6. ZFS has an independent checksum for its data blocks. The traditional RAID type technology is used to repair in case data corruption is detected. What he is saying is true. RAIDZ will spread blocks over all disks, and therefore requires full stripe reads to read a block. The good thing is that it will always do full stripe writes, so writes are fast. RAID6 has no blocks, so you can read any sector by reading from 1 disk; you only have to read from the other disks in the stripe in case of a fault.
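The 32 KB-per-disk figure quoted above falls out of simple arithmetic: with a 6-disk raidz2 there are 4 data disks, and the 128 KB record is split across them. A back-of-envelope sketch (illustrative only, not a claim about the exact on-disk layout, which also involves parity padding):

```shell
# Back-of-envelope for raidz block reads: a recordsize-sized block is
# split across the data disks, so every data disk sees one small read
# per logical block read.
recordsize_k=128
disks=6
parity=2                         # raidz2
data_disks=$((disks - parity))
per_disk_k=$((recordsize_k / data_disks))
echo "each data disk reads ~${per_disk_k}K per ${recordsize_k}K block"
```

This is why random-read IOPS of a raidz vdev behave like a single disk, while RAID6 can serve independent small reads from different disks.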
Re: [zfs-discuss] zfs streams data corruption
On Tue, Feb 24, 2009 at 19:18, Nicolas Williams nicolas.willi...@sun.com wrote: On Mon, Feb 23, 2009 at 10:05:31AM -0800, Christopher Mera wrote: I recently read up on Scott Dickson's blog with his solution for jumpstart/flashless cloning of ZFS root filesystem boxes. I have to say that it initially looks to work out cleanly, but of course there are kinks to be worked out that deal with auto mounting filesystems mostly. The issue that I'm having is that a few days after these cloned systems are brought up and reconfigured they are crashing and svc.configd refuses to start. When you snapshot a ZFS filesystem you get just that -- a snapshot at the filesystem level. That does not mean you get a snapshot at the _application_ level. Now, svc.configd is a daemon that keeps a SQLite2 database. If you snapshot the filesystem in the middle of a SQLite2 transaction you won't get the behavior that you want. In other words: quiesce your system before you snapshot its root filesystem for the purpose of replicating that root on other systems. That would be a bug in ZFS or SQLite2. A snapshot should be an atomic operation. The effect should be the same as a power failure in the middle of a transaction, and decent databases can cope with that.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring cache flushes correctly by issuing a commit and immediately reading back to see if it was indeed committed or not. Like a zfs test cXtX. Of course, then you can't just blame the hardware every time something in zfs breaks ;) A read of data in the disk cache will be served from the disk cache. You can't tell the disk to ignore its cache and read directly from the platter. The only way to test this is to write and then remove the power from the disk. Not easy in software.
Re: [zfs-discuss] ZFS: unreliable for professional usage?
What filesystem likes it when disks are pulled out from a LIVE filesystem? Try that on UFS and you're f** up too. Pulling a disk from a live filesystem is the same as pulling the power from the computer. All modern filesystems can handle that just fine. UFS with logging on does not even need fsck. Now if you have a disk that lies and doesn't write to the disk when it should, all bets are off.
Re: [zfs-discuss] Max size of log device?
On Sun, Feb 8, 2009 at 22:12, Vincent Fox vincent_b_...@yahoo.com wrote: Thanks I think I get it now. Do you think having log on a 15K RPM drive with the main pool composed of 10K RPM drives will show worthwhile improvements? Or am I chasing a few percentage points? I don't have money for new hardware SSD. Just recycling some old components here and there; there are a few 15K RPM drives on the shelf I thought I could throw strategically into the mix. Application will likely be NFS serving. Might use same setup for a list-serve system which does have local storage for archived emails etc. The 3310 has battery backed write cache, which is faster than any disk. You might get more from the cache if you use it only for the log. The RPM of the disks used for the log is not important when you have a RAM write cache in front of the disk.
Re: [zfs-discuss] Alternatives to increading the number of copies on a ZFS snapshot
On Sat, Feb 7, 2009 at 19:33, Sriram Narayanan sri...@belenix.org wrote: How do I set the number of copies on a snapshot? Based on the error message, I believe that I cannot do so. I already have a number of clones based on this snapshot, and would like the snapshot to have more copies now. For higher redundancy and peace of mind, what alternatives do I have? You have to set the number of copies before you write the file. Snapshots won't write anything so you can't change that on snapshots. Your best option (and only if you value your data) is mirroring. (zpool attach)
Re: [zfs-discuss] Why does a file on a ZFS change sizes?
On Tue, Feb 3, 2009 at 20:55, SQA sqa...@gmail.com wrote: I set up a ZFS system on a Linux x86 box. [b] zpool history History for 'raidpool': 2009-01-15.17:12:48 zpool create -f raidpool raidz1 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 2009-01-15.17:15:54 zfs create -o mountpoint=/vol01 -o sharenfs=on -o canmount=on raidpool/vol01[/b] I did not make the export (vol01) into a volume. I know you can set default blocksizes when you create volumes but you cannot make them exportable NFS exports. Thus, I did not make the NFS exports into volumes and I did not specify a blocksize on the NFS exports. I am assuming that vol01 is using variable blocksizes because I did not explicitly specify a blocksize. Thus, my assumption is that ZFS would use a blocksize that is the smallest power of 2; the smallest blocksize is 512 bytes while the biggest would be 128k. I use the stat command to check the filesize, the blocksize, and the # of blocks. I created a file that is exactly 512 bytes in size on /vol01 I do the following stat command: [b]stat --printf %n %b %B %s %o\n * [/b] The %b is the number of blocks used, %B is the blocksize. The number of blocks changes after a few minutes after the file is created: # stat --printf %n %b %B %s %o\n * file.512 [b]1[/b] 512 512 4096 # stat --printf %n %b %B %s %o\n * file.512 [b]1[/b] 512 512 4096 # stat --printf %n %b %B %s %o\n * file.512 [b]1[/b] 512 512 4096 Q1) Why does the # of blocks change after a few minutes? And why are we using 3 blocks when the file is only 512 bytes in size (in other words, only 1 block is needed)??? This makes it seem that the minimum blocksize isn't 512 bytes but 1536 bytes. You probably have a cut'n'paste error as all block numbers are 1 in your example. My guess is that the number of blocks is updated every 5 seconds. Q2) Is there a way to force ZFS to use 512 blocksizes? That means that if a file is 512 bytes in size or smaller, it should only use 512 bytes -- the number of blocks it uses should be 1.
It is, or at least it is on my Solaris system. But it has to store metadata in one block. Try creating a 600 byte file and it should use one more 512 byte block.
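The experiment suggested above is repeatable with dd and stat. Block counts (%b) are filesystem-dependent and, as noted in the thread, may lag a few seconds on ZFS, so only the byte sizes are safe to rely on:

```shell
# Create files of exactly 512 and 600 bytes and inspect them with stat.
# %s (size in bytes) is portable; %b (blocks) varies by filesystem and
# may be updated a few seconds after the write on ZFS.
d=$(mktemp -d)
dd if=/dev/zero of="$d/file.512" bs=512 count=1 2>/dev/null
dd if=/dev/zero of="$d/file.600" bs=600 count=1 2>/dev/null
# GNU stat, as used in the original post; harmless to skip elsewhere.
stat --printf '%n %b %B %s\n' "$d"/file.* 2>/dev/null || true
```

On a 512-byte-sector ZFS dataset the 600-byte file should show one more allocated block than the 512-byte one, since the extra 88 bytes spill into a second block.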
Re: [zfs-discuss] ZFS on partitions
On Wed, Jan 14, 2009 at 20:03, Tim t...@tcsac.net wrote: On Tue, Jan 13, 2009 at 6:26 AM, Brian Wilson bfwil...@doit.wisc.edu wrote: Does creating ZFS pools on multiple partitions on the same physical drive still run into the performance and other issues that putting pools in slices does? Is zfs going to own the whole drive or not? The *issue* is that zfs will not use the drive cache if it doesn't own the whole disk since it won't know whether or not it should be flushing cache at any given point in time. ZFS will always flush the disk cache at appropriate times. If ZFS thinks that it is alone it will turn on the disk's write cache. It could cause corruption if you had UFS and zfs on the same disk. It is safe to have UFS and ZFS on the same disk and it has always been safe. Write cache on the disk is not safe for UFS; that is why zfs will turn it on only if it is alone.
Re: [zfs-discuss] ZFS on partitions
ZFS will always flush the disk cache at appropriate times. If ZFS thinks that it is alone it will turn on the disk's write cache. I'm not sure if you're trying to argue or agree. If you're trying to argue, you're going to have to do a better job than zfs will always flush disk cache at appropriate times, because that's outright false in the case where zfs doesn't own the entire disk. That flush may very well produce an outcome zfs could never pre-determine. You can send flush cache commands to the disk as often as you wish; the only thing that happens is that the disk writes dirty sectors from its cache to the platter. That is, no writes will be done that should not have happened at some time anyway. This will not harm UFS or any other user of the disk. Other users can issue the flush cache command without affecting ZFS. Please read up on what the flush cache command does! ZFS will send flush cache commands even when it is not alone on the disk. There are many disks with write cache on by default. There have even been disks that won't turn it off even if told to. It could cause corruption if you had UFS and zfs on the same disk. It is safe to have UFS and ZFS on the same disk and it has always been safe. ***unless you turn on write cache. And without write cache, performance sucks. Hence me answering the OP's question. There was no mention of cache at all in the question. It was not clear that this sentence referred to your own text, hence the misunderstanding: It could cause corruption if you had UFS and zfs on the same disk. I read that as a separate statement. As for the performance sucks, that is putting it a bit harshly; you will get better performance with write cache, but the system will be perfectly usable without.
Re: [zfs-discuss] Mount ZFS pool on different system
Now I want to mount that external zfs hdd on a different notebook running solaris and supporting zfs as well. I am unable to do so. If I'd run zpool create, it would wipe out my external hdd, which I of course want to avoid. So how can I mount a zfs filesystem on a different machine without destroying it? zpool import
Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?
On Tue, Dec 30, 2008 at 11:30, Carsten Aulbert carsten.aulb...@aei.mpg.de wrote: Hi Marc, Marc Bevand wrote: Carsten Aulbert carsten.aulbert at aei.mpg.de writes: In RAID6 you have redundant parity, thus the controller can find out if the parity was correct or not. At least I think that to be true for Areca controllers :) Are you sure about that? The latest research I know of [1] says that although an algorithm does exist to theoretically recover from single-disk corruption in the case of RAID-6, it is *not* possible to detect dual-disk corruption with 100% certainty. And blindly running the said algorithm in such a case would even introduce corruption on a third disk. Well, I probably need to wade through the paper (and recall Galois field theory) before answering this. We did a few tests in a 16 disk RAID6 where we wrote data to the RAID, powered the system down, pulled out one disk, inserted it into another computer and changed the sector checksum of a few sectors (using hdparm's makebadsector utility). Then we reinserted this into the original box, powered it up and ran a volume check, and the controller did indeed find the corrupted sector and repaired it without destroying data on another disk (as far as we know and tested). You are talking about different types of errors. You tested errors that the disk can detect. That is not a problem on any RAID; that is what it is designed for. He was talking about errors that the disk can't detect (errors introduced by other parts of the system, writes to the wrong sector, or very bad luck). You can simulate those by writing different data to the sector.
Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression
Interestingly, the size fields under top add up to 950GB without getting to the bottom of the list, yet it shows NO swap being used, and 150MB free out of 768MB of RAM! So how can the size of the existing processes exceed the size of the virtual memory in use by a factor of 2, and the size of total virtual memory by a factor of 1.5? This is not the resident size - this is the total size! Size is how much address space the process has allocated. Part of that is executables and shared libraries (they are backed by the file, not by swap). A large portion of that is shared: the same memory is used by many processes. Processes can also allocate shared memory by other means. Memory is not a big problem for ZFS; address space is. You may have to give the kernel more address space on 32-bit CPUs. eeprom kernelbase=0x80000000 This will reduce the usable address space of user processes though. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression
If the critical working set of VM pages is larger than available memory, then the system will become exceedingly slow. This is indicated by a substantial amount of major page fault activity. Since disk is 10,000 times slower than RAM, major page faults can really slow things down dramatically. Imagine what happens if ZFS or an often-accessed part of the kernel is not able to fit in available RAM. ZFS and most of the kernel is locked in physical memory. Swap is never used for ZFS. In this case (NFS) everything is done in the kernel, so the working set cannot be larger than available memory. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sat, Nov 29, 2008 at 22:19, Ray Clark [EMAIL PROTECTED] wrote: Pantzer5: Thanks for the top size explanation. Re: eeprom kernelbase=0x80000000 So this makes the kernel load at the 2G mark? What is the default, something like C00... for 3G? Yes on both questions (I have not checked the hex conversions). This might not be your problem, but it is easy to test. My symptom was that zpool scrub made the computer go slower and slower and finally just stop. But this was a long time ago, so this might not be a problem today. Are PCI and AGP space in there too, such that kernel space is 4G - (kernelbase + PCI_Size + AGP_Size)? (Shot in the dark)? No. This is virtual memory. The big difference in memory usage between UFS and ZFS is that ZFS will have all data it caches mapped in the kernel address space. UFS leaves data unmapped. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
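As a quick sanity check on the arithmetic, here is a small Python helper computing the user/kernel split for the values mentioned in the thread (the stock default varies by release; 0xC0000000 is used here only because it matches the "C00... for 3G" guess above):

```python
def address_split(kernelbase: int):
    """On a 32-bit x86 system the 4 GB virtual address space is split at
    kernelbase: user mappings below it, kernel above. Lowering kernelbase
    trades user address space for kernel (and thus ZFS cache) address
    space. Returns (user GB, kernel GB)."""
    four_gb = 1 << 32
    gb = 1 << 30
    return kernelbase / gb, (four_gb - kernelbase) / gb

default = address_split(0xC0000000)  # roughly 3 GB user / 1 GB kernel
tuned = address_split(0x80000000)    # kernel loads at the 2 GB mark
```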
Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sun, Nov 30, 2008 at 00:04, Bob Friesenhahn [EMAIL PROTECTED] wrote: On Sat, 29 Nov 2008, Mattias Pantzare wrote: The big difference in memory usage between UFS and ZFS is that ZFS will have all data it caches mapped in the kernel address space. UFS leaves data unmapped. Another big difference I have heard about is that Solaris 10 on x86 only uses something like 64MB of filesystem caching by default for UFS. This is different than SPARC where the caching is allowed to grow. I am not sure if OpenSolaris maintains this arbitrary limit for x86. That is not true. I doubt that any Solaris version had that type of limit. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Slow death-spiral with zfs gzip-9 compression
On Sun, Nov 30, 2008 at 01:10, Bob Friesenhahn [EMAIL PROTECTED] wrote: On Sun, 30 Nov 2008, Mattias Pantzare wrote: Another big difference I have heard about is that Solaris 10 on x86 only uses something like 64MB of filesystem caching by default for UFS. This is different than SPARC where the caching is allowed to grow. I am not sure if OpenSolaris maintains this arbitrary limit for x86. That is not true. I doubt that any Solaris version had that type of limit. That is what I heard Jim Mauro tell us. I recall feeling a bit disturbed when I heard it. If it is true, perhaps it applies only to x86 32 bits, which has obvious memory restrictions. I recall that he showed this parameter via DTrace. However on my Solaris 10U5 AMD64 system I see this limit: 429293568 maximum memory allowed in buffer cache (bufhwm) which seems much higher than 64MB. The Solaris Tuning And Tools book says that by default the buffer cache is allowed to grow to 2% of physical memory. Obtain the value via sysdef | grep bufhwm My 32-bit Belenix system running under VirtualBox with 2GB allocated to the VM reports a value of 41,762,816. That is only a small cache used for file system metadata. File data caching is integrated into the normal memory management. http://docs.sun.com/app/docs/doc/817-0404/chapter2-37?a=view ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] continuous replication
I think you're confusing our clustering feature with the remote replication feature. With active-active clustering, you have two closely linked head nodes serving files from different zpools using JBODs connected to both head nodes. When one fails, the other imports the failed node's pool and can then serve those files. With remote replication, one appliance sends filesystems and volumes across the network to an otherwise separate appliance. Neither of these is performing synchronous data replication, though. That is _not_ active-active, that is active-passive. If you have an active-active system, I can access the same data via both controllers at the same time. I can't if it works like you just described. You can't call it active-active just because different volumes are controlled by different controllers. Most active-passive RAID controllers can do that. The data sheet talks about active-active clusters; how does that work? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] continuous replication
On Sat, Nov 15, 2008 at 00:46, Richard Elling [EMAIL PROTECTED] wrote: Adam Leventhal wrote: On Fri, Nov 14, 2008 at 10:48:25PM +0100, Mattias Pantzare wrote: That is _not_ active-active, that is active-passive. If you have a active-active system I can access the same data via both controllers at the same time. I can't if it works like you just described. You can't call it active-active just because different volumes are controlled by different controllers. Most active-passive RAID controllers can do that. The data sheet talks about active-active clusters, how does that work? What the Sun Storage 7000 Series does would more accurately be described as dual active-passive. This is ambiguous in the cluster market. It is common to describe HA clusters where each node can be offering services concurrently, as active/active, even though the services themselves are active/passive. This is to appease folks who feel that idle secondary servers are a bad thing. But this product is not in the cluster market. It is in the storage market. By your definition virtually all dual controller RAID boxes are active/active. You should talk to Veritas so that they can change all their documentation... Active/active and active/passive has a real technical meaning, don't let marketing destroy that! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on Fit-PC Slim?
Planning to stick in a 160-gig Samsung drive and use it for a lightweight household server. Probably some Samba usage, and a tiny bit of Apache and RADIUS. I don't need it to be super-fast, but slow as watching paint dry won't You know that you need a minimum of 2 disks to form a (mirrored) pool with ZFS? A pool with no redundancy is not a good idea! My pools with no redundancy work very well. Redundancy is better, but you can certainly run without it. You should do backups in all cases. work either. Just curious if anyone else has tried something similar everything I read says ZFS wants 1GB of RAM but doesn't say what size of penalty I would pay for having less. I could run Linux on it of course but now prefer to remain free of the tyranny of fsck. I don't think that there is enough CPU horse-power on this platform to run OpenSolaris - and you need approx 768MB (3/4 of a GB) of RAM just to install it. After that OpenSolaris will only increase in size over time To try to run it as a ZFS server would be madness - worse than watching paint dry. I don't know about the CPU, but 1GB of RAM on a home server works fine. I even have a 256MB Debian in VirtualBox on my server with 1GB RAM. Just turn X11 off. (/usr/dt/bin/dtconfig -d) The installation has a higher RAM requirement than the installed system, as you can't have swap for the installation. Before ZFS, Solaris improved its RAM usage with every release. Workstations are a different matter. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] recommendations on adding vdev to raidz zpool
On Sun, Oct 26, 2008 at 5:31 AM, Peter Baumgartner [EMAIL PROTECTED] wrote: I have a 7x150GB drive (+1 spare) raidz pool that I need to expand. There are 6 open drive bays, so I bought 6 300GB drives and went to add them as a raidz vdev to the existing zpool, but I didn't realize the raidz vdevs needed to have the same number of drives. (why is that?) They do not have to have the same number of drives; you can even mix raidz and plain disks. That is more a recommendation. Add -f to the command. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] recommendations on adding vdev to raidz zpool
On Sun, Oct 26, 2008 at 3:00 PM, Peter Baumgartner [EMAIL PROTECTED] wrote: On Sun, Oct 26, 2008 at 4:02 AM, Mattias Pantzare [EMAIL PROTECTED] wrote: On Sun, Oct 26, 2008 at 5:31 AM, Peter Baumgartner [EMAIL PROTECTED] wrote: I have a 7x150GB drive (+1 spare) raidz pool that I need to expand. There are 6 open drive bays, so I bought 6 300GB drives and went to add them as a raidz vdev to the existing zpool, but I didn't realize the raidz vdevs needed to have the same number of drives. (why is that?) They do not have to have the same number of drives; you can even mix raidz and plain disks. That is more a recommendation. Add -f to the command. What is the risk of creating a pool consisting of two raidz vdevs that don't have the same number of disks? Slightly different reliability and performance on different parts of the pool. Nothing to worry about in your case. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] resilver speed.
On Wed, Oct 8, 2008 at 10:29 AM, Ross [EMAIL PROTECTED] wrote: bounce Can anybody confirm how bug 6729696 is going to affect a busy system running synchronous NFS shares? Is the sync activity from NFS going to be enough to prevent resilvering from ever working, or have I misunderstood this bug? A synchronous write will not trigger a sync. The ZIL is used for synchronous writes. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How do I add my own Attributes to a ZAP object, and then search on it?
On Sun, Sep 14, 2008 at 12:37 AM, Anon K Adderlan [EMAIL PROTECTED] wrote: How do I add my own Attributes to a ZAP object, and then search on it? For example, I want to be able to attach the gamma value to each image, and be able to search and sort them based on it. From reading the on disk format documentation I've been led to believe that this would be done through ZAP objects, but what I really need is a reference to the C/C++ or Shell API, and whoever put together the Administration Guide has for some reason decided that the code segments should be in a white font on a white background. You can't access ZFS internals from applications. A database sounds like the right solution, but you could use extended attributes. This is from What's New in the Solaris 9 8/03 Operating Environment, http://docs.sun.com/app/docs/doc/817-0493/6mg9pruau?a=view Extended File Attributes The UFS, NFS, and TMPFS file systems have been enhanced to include extended file attributes. Application developers can associate specific attributes to a file. For example, a developer of a file management application for a windowing system might choose to associate a display icon with a file. Extended attributes are logically represented as files within a hidden directory that is associated with the target file. You can use the extended file attribute API and a set of shell commands to add and manipulate file system attributes. See the fsattr(5), openat(2), and runat(1) man pages for more information. Many file system commands in Solaris provide an attribute-aware option that you can use to query, copy, modify, or find file attributes. For more information, see the specific file system command in the man pages. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] pulling disks was: ZFS hangs/freezes after disk failure,
2008/8/27 Richard Elling [EMAIL PROTECTED]: Either the drives should be loaded with special firmware that returns errors earlier, or the software LVM should read redundant data and collect the statistic if the drive is well outside its usual response latency. ZFS will handle this case as well. How is ZFS handling this? Is there a timeout in ZFS? Not for this case, but if configured to manage redundancy, ZFS will read redundant data from alternate devices. No, ZFS will not; ZFS waits for the device driver to report an error, and after that it will read from alternate devices. ZFS could detect that there is probably a problem with the device and read from an alternate device much faster, while it waits for the device to answer. You can't do this at any other level than ZFS. One thing other LVM's seem like they may do better than ZFS, based on not-quite-the-same-scenario tests, is not freeze filesystems unrelated to the failing drive during the 30 seconds it's waiting for the I/O request to return an error. This is not operating in ZFS code. In what way is freezing a ZFS filesystem not operating in ZFS code? Notice that he wrote filesystems unrelated to the failing drive. At the ZFS level, this is dictated by the failmode property. But that is used after ZFS has detected an error? I find comparing unprotected ZFS configurations with LVMs using protected configurations to be disingenuous. I don't think anyone is doing that. What is your definition of unrecoverable reads? I wrote data, but when I try to read, I don't get back what I wrote. There is only one case where ZFS is better: that is when wrong data is returned. All other cases are managed by layers below ZFS. Wrong data returned is not normally called unrecoverable reads. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
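The faster fall-back read described here can be sketched in a few lines of Python (purely illustrative; the function names and the latency budget are invented, and this is a proposal being discussed, not something ZFS actually did):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def read_with_fallback(primary, mirror, latency_budget=0.05):
    """Issue the read to the primary device; if it exceeds its usual
    response latency, dispatch the same read to the mirror instead of
    waiting out the full driver timeout (often 30s or more). Only a
    layer that knows about the redundancy can do this."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut = pool.submit(primary)
        try:
            return fut.result(timeout=latency_budget), "primary"
        except FutureTimeout:
            # primary is suspiciously slow -- serve the read from the
            # mirror; the primary's eventual error is handled separately
            return mirror(), "mirror"

def healthy_disk():
    return b"data"

def hung_disk():
    time.sleep(1.0)  # simulates a drive that stopped answering
    return b"data"

fast = read_with_fallback(healthy_disk, hung_disk)
slow = read_with_fallback(hung_disk, healthy_disk)
```

Note that the pool still waits for the hung thread on shutdown; a real implementation would keep the outstanding I/O pending and reconcile it later.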
Re: [zfs-discuss] pulling disks was: ZFS hangs/freezes after disk failure,
2008/8/26 Richard Elling [EMAIL PROTECTED]: Doing a good job with this error is mostly about not freezing the whole filesystem for the 30sec it takes the drive to report the error. That is not a ZFS problem. Please file bugs in the appropriate category. Whose problem is it? It can't be the device driver, as that has no knowledge of zfs filesystems or redundancy. Either the drives should be loaded with special firmware that returns errors earlier, or the software LVM should read redundant data and collect the statistic if the drive is well outside its usual response latency. ZFS will handle this case as well. How is ZFS handling this? Is there a timeout in ZFS? One thing other LVM's seem like they may do better than ZFS, based on not-quite-the-same-scenario tests, is not freeze filesystems unrelated to the failing drive during the 30 seconds it's waiting for the I/O request to return an error. This is not operating in ZFS code. In what way is freezing a ZFS filesystem not operating in ZFS code? Notice that he wrote filesystems unrelated to the failing drive. In terms of FUD about ``silent corruption'', there is none of it when the drive clearly reports a sector is unreadable. Yes, traditional non-big-storage-vendor RAID5, and all software LVM's I know of except ZFS, depend on the drives to report unreadable sectors. And, generally, drives do. so let's be clear about that and not try to imply that the ``dominant failure mode'' causes silent corruption for everyone except ZFS and Netapp users---it doesn't. In my field data, the dominant failure mode for disks is unrecoverable reads. If your software does not handle this case, then you should be worried. We tend to recommend configuring ZFS to manage data redundancy for this reason. He is writing that all software LVM's will handle unrecoverable reads. What is your definition of unrecoverable reads? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] corrupt zfs stream? checksum mismatch
2008/8/13 Jonathan Wheeler [EMAIL PROTECTED]: So far we've established that in this case: *Version mismatches aren't causing the problem. *Receiving across the network isn't the issue (because I have the exact same issue restoring the stream directly on my file server). *All that's left was the initial send, and since zfs guarantees end to end data integrity, it should have been able to deal with any possible network randomness in the middle (zfs on both ends) - or at absolute worst, the zfs send command should have failed, if it encountered errors. Seems fair, no? So, is there a major bug here, or at least an oversight in the zfs send part of the code? Does zfs send not do checksumming, or, verification after sending? I'm not sure how else to interpret this data. zfs send can't do any verification after sending. It is sending to a pipe; it does not know that it is writing to a file. zfs receive can verify the data, as you know. ZFS is not involved in moving the data over the network when you are using NFS. There are many places where data can get corrupt even when you are using ZFS. Non-ECC memory is one example. There might be a bug in zfs, but that is hard to check as you can't reproduce the problem. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
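For illustration, here is a minimal Python sketch of a checksummed stream, showing why only the receiving end can detect corruption when the sender merely writes into a pipe (the framing is invented and is not the real zfs send format):

```python
import hashlib
import io

def send_stream(records, out):
    """Writer side: streams records and appends a trailer checksum. Like
    zfs send, it just writes to a file-like object/pipe -- it has no way
    to know whether the bytes survive the trip."""
    h = hashlib.sha256()
    for rec in records:
        h.update(rec)
        out.write(len(rec).to_bytes(4, "big") + rec)
    out.write((0).to_bytes(4, "big") + h.digest())  # length 0 = trailer

def recv_stream(inp):
    """Reader side: recomputes the checksum and rejects a corrupt stream,
    analogous to 'cannot receive: invalid stream (checksum mismatch)'."""
    h, records = hashlib.sha256(), []
    while True:
        n = int.from_bytes(inp.read(4), "big")
        if n == 0:
            if inp.read(32) != h.digest():
                raise ValueError("invalid stream (checksum mismatch)")
            return records
        rec = inp.read(n)
        h.update(rec)
        records.append(rec)

buf = io.BytesIO()
send_stream([b"block-1", b"block-2"], buf)
ok = recv_stream(io.BytesIO(buf.getvalue()))

# Corrupt one byte in transit: the sender never notices, the receiver does.
corrupt = bytearray(buf.getvalue())
corrupt[5] ^= 0xFF
try:
    recv_stream(io.BytesIO(bytes(corrupt)))
    caught = False
except ValueError:
    caught = True
```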
Re: [zfs-discuss] corrupt zfs stream? checksum mismatch
2008/8/10 Jonathan Wheeler [EMAIL PROTECTED]: Hi Folks, I'm in the very unsettling position of fearing that I've lost all of my data via a zfs send/receive operation, despite ZFS's legendary integrity. The error that I'm getting on restore is: receiving full stream of faith/[EMAIL PROTECTED] into Z/faith/[EMAIL PROTECTED] cannot receive: invalid stream (checksum mismatch) Background: I was running snv_91, and decided to upgrade to snv_95, converting to the much awaited zfs-root in the process. You could try to restore on a snv_91 system. zfs send streams are not for backups. This is from the zfs man page: The format of the stream is evolving. No backwards compatibility is guaranteed. You may not be able to receive your streams on future versions of ZFS. Or the file was corrupted when you transferred it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Block unification in ZFS
Therefore, I wonder if something like block unification (which seems to be an old idea, though I know of it primarily through Venti[1]) would be useful to ZFS. Since ZFS checksums all of the data passing through it, it seems natural to hook those checksums and have a hash table from checksum to block pointer. It would seem that one could write a shim vdev which used the ZAP and a host vdev to store this hash table and could inform the higher layers that, when writing a block, they should simply alias an earlier block (and increment its reference count -- already there for snapshots -- appropriately; naturally if the block's reference count becomes zero, its checksum should be deleted from the hash). Deduplication has been discussed many times, but it is not trivial to implement. There are no reference counts for blocks. Blocks have a timestamp that is compared to the creation time of snapshots to work out if a block can be freed when you destroy a snapshot. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
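A rough Python sketch of the proposed checksum-to-block table with reference counts (hypothetical names; this is the idea under discussion, not ZFS code -- as the reply notes, ZFS at the time had no per-block refcounts):

```python
import hashlib

class DedupStore:
    """Toy block store keyed by checksum, with per-block reference
    counts. Writing a duplicate block aliases the existing copy; a block
    is only freed when its last reference goes away."""
    def __init__(self):
        self.blocks = {}  # checksum -> data
        self.refs = {}    # checksum -> reference count

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key in self.blocks:
            self.refs[key] += 1       # alias the earlier block
        else:
            self.blocks[key] = data   # first copy: actually store it
            self.refs[key] = 1
        return key                    # acts as the "block pointer"

    def free(self, key: str):
        self.refs[key] -= 1
        if self.refs[key] == 0:       # last reference gone:
            del self.blocks[key]      # drop the block and its
            del self.refs[key]        # checksum table entry

store = DedupStore()
a = store.write(b"same payload")
b = store.write(b"same payload")   # duplicate: stored only once
c = store.write(b"other payload")
```

Note that a hash collision here would silently alias unrelated blocks; a real implementation must either use a cryptographically strong hash or verify the data on match.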
Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed
4. While reading an offline disk causes errors, writing does not! *** CAUSES DATA LOSS *** This is a big one: ZFS can continue writing to an unavailable pool. It doesn't always generate errors (I've seen it copy over 100MB before erroring), and if not spotted, this *will* cause data loss after you reboot. I discovered this while testing how ZFS coped with the removal of a hot plug SATA drive. I knew that the ZFS admin tools were hanging, but that redundant pools remained available. I wanted to see whether it was just the ZFS admin tools that were failing, or whether ZFS was also failing to send appropriate error messages back to the OS. This is not unique to zfs. If you need to know that your writes have reached stable store, you have to call fsync(). It is not enough to close a file. This is true even for UFS, but UFS won't delay writes for all operations, so you will notice faster. But you will still lose data. I have been able to undo rm -rf / on a FreeBSD system by pulling the power cord before it wrote the changes... Databases use fsync (or similar) before they close a transaction; that is one of the reasons databases like hardware write caches. cp will not. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
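The fsync() point can be shown in a few lines of Python (the file name is arbitrary):

```python
import os
import tempfile

def durable_write(path: str, data: bytes):
    """write() followed by close() only hands the data to the OS page
    cache; a power cut can still lose it. fsync() blocks until the data
    has reached stable storage -- this is what databases do before
    committing a transaction, and what a plain cp does not."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)   # do not return until the data is on disk
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "commit.log")
durable_write(path, b"transaction 42\n")
```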
Re: [zfs-discuss] copying a ZFS
2008/7/20 James Mauro [EMAIL PROTECTED]: Is there an optimal method of making a complete copy of a ZFS, aside from the conventional methods (tar, cpio)? We have an existing ZFS that was not created with the optimal recordsize. We wish to create a new ZFS with the optimal recordsize (8k), and copy all the data from the existing ZFS to the new ZFS. Obviously, we know how to do this using conventional utilities and commands. Is there a ZFS-specific method for doing that beats the heck out of tar, etc? (RTFM indicates there is not; I R'd the FM :^). Use zfs send | zfs receive if you wish to keep your snapshots or if you will be doing the copy several times. You can send just the changes between two snapshots. (zfs send is in the FM :-) This may or may not be a copy to the same zpool, and I'd also be interested in knowing if that makes a difference (I do not think it does)? It does not. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Confusion with snapshot send-receive
2008/6/21 Andrius [EMAIL PROTECTED]: Hi, there is a small confusion with send/receive. zfs andrius/sounds was snapshotted @421 and should be copied to the new zpool beta that is on an external USB disk. After /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv beta or /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv beta/sounds the answer comes: ssh: host1: node name or service name not known What have I done wrong? There is no computer named host1? That is a ssh error message; start by checking the ssh part by itself. If both zpools are on the same computer you don't have to use ssh. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
The problem with that argument is that 10,000 users on one vxfs or UFS filesystem is no problem at all, be it /var/mail or home directories. You don't even need a fast server for that. 10,000 zfs file systems is a problem. So, if it makes you happier, substitute mail with home directories. If you feel strongly, please pile onto CR 6557894 http://bugs.opensolaris.org/view_bug.do?bug_id=6557894 If we continue to talk about it on the alias, we will just end up finding ways to solve the business problem using available technologies. If I need to count usage I can use du. But if you can implement space usage info on a per-uid basis you are not far from quota per uid... A single file system serving 10,000 home directories doesn't scale either, unless the vast majority are unused -- in which case it is a practical problem for much less than 10,000 home directories. I think you will find that the people who scale out have a better long-term strategy. We have a file system (vxfs) that is serving 30,000 home directories. Yes, most of those are unused, but we still have to have them as we don't know when the student will use it. If this were zfs we would have to create 30,000 file systems. Every file system has a cost in RAM and in performance. So, in ufs or vxfs unused home directories cost close to nothing. In zfs they have a very real cost. The limitations of UFS do become apparent as you try to scale to the size permitted with ZFS. For example, the largest UFS file system supported is 16 TBytes, or 1/4 of a thumper. So if you are telling me that you are serving 10,000 home directories in a 16 TByte UFS file system with quotas (1.6 GBytes/user? I've got 16 GBytes in my phone :-), then I will definitely buy you a beer. And aspirin. I'll bring a calendar so we can measure the fsck time when the log can't be replayed. Actually, you'd probably run out of inodes long before you filled it up. I wonder how long it would take to run quotacheck? But I digress. 
Let's just agree that UFS won't scale well and the people who do serve UFS as home directories for large populations tend to use multiple file systems. We have 30,000 accounts on a 1TByte file system. If we need to, we could make 16 1TB file systems, no problem. But 30,000 file systems on one server? Maybe not so good... If we could lower the cost of a zfs file system to zero, all would be good for my usage. The best thing to do is probably AFS on ZFS. AFS can handle many volumes (file systems) and ZFS is very good at the storage. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
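The du-style per-uid accounting mentioned above can be sketched in Python (illustrative only; a real per-uid quota implementation would maintain this map incrementally inside the filesystem rather than rescanning):

```python
import os
import tempfile
from collections import defaultdict

def usage_by_uid(root: str) -> dict:
    """du-style accounting: walk a tree and sum file sizes per owner
    uid. A filesystem that kept this map up to date on every write and
    unlink would be most of the way to per-uid quotas -- which is the
    point being made about counting space usage on a per-uid basis."""
    totals = defaultdict(int)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            totals[st.st_uid] += st.st_size
    return dict(totals)

# Demo on a scratch directory (both files owned by the current user).
root = tempfile.mkdtemp()
with open(os.path.join(root, "a.txt"), "wb") as f:
    f.write(b"x" * 1000)
with open(os.path.join(root, "b.txt"), "wb") as f:
    f.write(b"y" * 500)
totals = usage_by_uid(root)
```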
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
2008/6/6 Richard Elling [EMAIL PROTECTED]: Richard L. Hamilton wrote: A single /var/mail doesn't work well for 10,000 users either. When you start getting into that scale of service provisioning, you might look at how the big boys do it... Apple, Verizon, Google, Amazon, etc. You should also look at e-mail systems designed to scale to large numbers of users which implement limits without resorting to file system quotas. Such e-mail systems actually tell users that their mailbox is too full rather than just failing to deliver mail. So please, when we start having this conversation again, lets leave /var/mail out. I'm not recommending such a configuration; I quite agree that it is neither scalable nor robust. I was going to post some history of scaling mail, but I blogged it instead. http://blogs.sun.com/relling/entry/on_var_mail_and_quotas -- richard The problem with that argument is that 10,000 users on one vxfs or UFS filesystem is no problem at all, be it /var/mail or home directories. You don't even need a fast server for that. 10,000 zfs file systems is a problem. So, if it makes you happier, substitute mail with home directories. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
A single /var/mail doesn't work well for 10,000 users either. When you start getting into that scale of service provisioning, you might look at how the big boys do it... Apple, Verizon, Google, Amazon, etc. You [EMAIL PROTECTED] /var/mail echo *|wc 1 20632 185597 [EMAIL PROTECTED] /var/mail /usr/platform/sun4u/sbin/prtdiag System Configuration: Sun Microsystems sun4u Sun Enterprise 220R (2 X UltraSPARC-II 450MHz) System clock frequency: 113 MHz Memory size: 2048 Megabytes So, 10,000 mail accounts on a new server is not a problem. Of course depending on usage patterns. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] help with a BIG problem, can't import my zpool anymore
2008/5/24 Hernan Freschi [EMAIL PROTECTED]: I let it run while watching TOP, and this is what I got just before it hung. Look at free mem. Is this memory allocated to the kernel? Can I allow the kernel to swap? No, the kernel will not use swap for this. But most of the memory used by the kernel is probably in caches that should release memory when needed. Is this a 32 or 64 bit system? ZFS will sometimes use all kernel address space on a 32-bit system. You can give the kernel more address space with this command (only on a 32-bit system): eeprom kernelbase=0x50000000 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] raidz in zfs questions
2. in a raidz do all the disks have to be the same size? I think this one has been answered, but I'll add/ask this: I'm not sure what would happen if you had 3x 320gb and 3x 1tb in a 6 disk raidz array. I know you'd have a 6 * 320gb array, but I don't know if the unused space on the 3x 1tb could be made into another raidz array. If zfs is limited in this way, you could work around it by making 320gb and 1tb-320gb partitions on the 1tb disks. You have to use partitions. FYI, if you have some data that doesn't really need to be redundant, you can simplify your setup with block copies and maybe get closer to raidz efficiency than a straight mirror. Then you could easily replace any disk any time with a bigger one, or add a disk any time. To do this, just make one file system with copies=2 and store important stuff there. Store less important stuff in a copies=1 file system. That is bad advice. Both copies may end up on the same disk. zfs will try to put your copies on different disks, but it won't tell you if it can't. Use mirror or RAIDZ if your data is important! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] raidz in zfs questions
2008/3/7, Paul Kraus [EMAIL PROTECTED]: On Thu, Mar 6, 2008 at 8:56 PM, MC [EMAIL PROTECTED] wrote: 1. In zfs can you currently add more disks to an existing raidz? This is important to me as I slowly add disks to my system one at a time. No, but Solaris and Linux raid5 can do this (in Linux, grow with mdadm). Be aware that growing an SLVM / DiskSuite RAID5 doesn't really grow the RAID5 set, it just concats more components onto the end of it. If those components are mirrors then you still have redundancy, if they aren't then you don't for that data that ends up out there. I don't consider growing RAID5 this way with DiskSuite a good way to go, short or long term. No, that is not how it works. If you grow a RAID5 set the new data is concatenated to the original RAID5, but it IS protected by the parity in the RAID5 set. You have redundancy but not the best performance. From the docs: http://docs.sun.com/app/docs/doc/816-4520/about-raid5-1?a=view You can expand a RAID-5 volume by concatenating additional components to the volume. Concatenating a new component to an existing RAID-5 volume decreases the overall performance of the volume because the data on concatenations is sequential. Data is not striped across all components. The original components of the volume have data and parity striped across all components. This striping is lost for the concatenated component. However, the data is still recoverable from errors because the parity is used during the component I/O. The resulting RAID-5 volume continues to handle a single component failure. Concatenated components also differ in the sense that they do not have parity striped on any of the regions. Thus, the entire contents of the component are available for data. Any performance enhancements for large or sequential writes are lost when components are concatenated. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
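The point in the quoted docs, that a concatenated component is still covered by the parity, can be illustrated with a toy single-parity XOR stripe in Python (invented names; real RAID-5 rotates parity and works on sectors, not four-byte blocks):

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity math)."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

class Raid5Stripe:
    """One stripe: N data columns plus XOR parity. Whether a data column
    was part of the original stripe or a later concatenated component
    makes no difference to recovery -- parity covers it either way."""
    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(self.data)

    def append_component(self, block: bytes):
        # Growing the set: fold the new column into the existing parity.
        self.data.append(block)
        self.parity = xor_blocks([self.parity, block])

    def recover(self, lost_index: int) -> bytes:
        # XOR of all surviving columns plus parity yields the lost one.
        survivors = [b for i, b in enumerate(self.data) if i != lost_index]
        return xor_blocks(survivors + [self.parity])

stripe = Raid5Stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe.append_component(b"DDDD")   # the "concatenated" component
recovered = stripe.recover(3)      # lose the newly added disk...
```

Losing either an original column or the concatenated one is recoverable; only the striping (and thus write performance) differs, as the docs say.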
Re: [zfs-discuss] zfs 32bits
2008/3/6, Brian Hechinger [EMAIL PROTECTED]: On Thu, Mar 06, 2008 at 11:39:25AM +0100, [EMAIL PROTECTED] wrote: I think it's specifically problematic on 32-bit systems with large amounts of RAM. Then you run out of virtual address space in the kernel quickly; a small amount of RAM (I have one with 512MB) works fine. I have a 32-bit machine with 4GB of RAM. I've been researching this for some time now, but can't find it anywhere. At some point, someone posted a system config tweak to increase the amount of memory available to the ARC on a 32-bit platform. Who was that, and could you please re-post that tweak?

I don't know how to change the ARC size, but use this to increase kernel address space: eeprom kernelbase=0x50000000 Your user address space will shrink when you do that.
Re: [zfs-discuss] Regression with ZFS best practice
I've just put my first ZFS into production, and users are complaining about some regressions. One problem for them is that now they can't see all the user directories in the automount point: the homedirs used to be part of a single UFS and were browsable with the correct autofs option. Now, following the ZFS best practice, each user has his own FS, but being all shared separately, they're not browsable anymore. Is there a way to work around that and have the same behaviour as before, i.e., all homedirs shown in /home, whether they're mounted or not?

Remove -nobrowse from the map in auto_master.
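For reference, the suggested change amounts to a one-word edit of the master map. A hypothetical /etc/auto_master fragment (map name and options shown for illustration):

```
# /etc/auto_master (illustrative)
# Before: home directories appear only once they are mounted
#/home    auto_home    -nobrowse
# After: every map entry shows up in /home, mounted or not
/home     auto_home    -browse
```

With browsability on, readdir of /home lists all entries in the auto_home map without triggering mounts; the mount still happens lazily on first access.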
Re: [zfs-discuss] Recommendations for per-user NFS shared home directories?
2008/2/17, Bob Friesenhahn [EMAIL PROTECTED]: I am attempting to create per-user ZFS filesystems under an exported /home ZFS filesystem. This would work fine except that the ownership/permissions settings applied to the mount point of those per-user filesystems on the server are not seen by NFS clients. Instead NFS clients see directory ownership of root:other (Solaris 9 clients), root:wheel (OS X clients), and root:daemon (FreeBSD clients). Only Solaris 10 clients seem to preserve original ownership and permissions.

Have the clients mounted your per-user filesystems? It is not enough to mount /home.
Re: [zfs-discuss] Recommendations for per-user NFS shared home directories?
2008/2/17, Bob Friesenhahn [EMAIL PROTECTED]: On Sun, 17 Feb 2008, Mattias Pantzare wrote: Have the clients mounted your per-user filesystems? It is not enough to mount /home. It is enough to mount /home if the client is Solaris 10. I did not want to mess with creating per-user mounts for all of my different types of systems, so I punted and put all the users in one filesystem. Probably the ZFS documentation, which suggests creating per-user home directories, should be updated so that the existing drawbacks are also known.

This is standard NFS behavior; it has nothing to do with ZFS. Solaris has some new features in this area. You should use automount for your mounts if you have many clients. Change the automount map and all clients will mount the new filesystem if needed. You can move some users to a new server with very little work, just change the mapping for that user. You should be able to get all your systems to read the automount maps from NIS or LDAP.
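A hypothetical auto_home map illustrates the point about moving users: each entry names the serving host, so relocating one user is a one-line change, and the map itself can be distributed via NIS or LDAP (all hostnames and usernames below are made up):

```
# auto_home map (illustrative entries)
alice    server1:/export/home/alice
bob      server1:/export/home/bob
# carol was moved to a new server by editing only this line:
carol    server2:/export/home/carol
```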
Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes
If you created them after, then no worries, but if I understand correctly, if the *file* was created with 128K recordsize, then it'll keep that forever... Files have nothing to do with it. The recordsize is a file system parameter. It gets a little more complicated because the recordsize is actually the maximum recordsize, not the minimum. Please read the manpage:

Changing the file system's recordsize only affects files created afterward; existing files are unaffected.

Nothing is rewritten in the file system when you change recordsize, so it stays the same for existing files.
Re: [zfs-discuss] RAIDz2 reporting odd (smaller) size
2008/2/13, Sam [EMAIL PROTECTED]: I saw some other people have a similar problem, but reports claimed this was 'fixed in release 42', which is many months old; I'm running the latest version. I made a RAIDz2 of 8x500GB which should give me a 3TB pool:

Disk manufacturers use SI units, where 1k is 1000. ZFS uses computer units, where 1k is 1024. So your 500GB is really 465GB. Check the exact number with format.
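The arithmetic behind the "missing" space, sketched in Python. It assumes raidz2's overhead of two parity disks, as in the 8-disk pool described above:

```python
# Decimal ("disk label") gigabytes vs binary gibibytes, and the expected
# usable size of an 8 x 500 GB raidz2 (two disks' worth of parity).

GB = 10**9        # what the disk manufacturer means by "GB"
GiB = 2**30       # what most tools report as "GB"

disk_bytes = 500 * GB
print(round(disk_bytes / GiB, 1))    # -> 465.7 (each "500 GB" disk)

data_disks = 8 - 2                   # raidz2 keeps 2 disks of parity
usable_tib = data_disks * disk_bytes / 2**40
print(round(usable_tib, 2))          # -> 2.73 (TiB, not the hoped-for 3 "TB")
```

So even before filesystem overhead, the pool is expected to report roughly 2.7T rather than 3T.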
Re: [zfs-discuss] Modify fsid/guid of dataset for NFS failover
2007/11/10, asa [EMAIL PROTECTED]: Hello all. I am working on an NFS failover scenario between two servers. I am getting stale file handle errors on my (Linux) client, which point to there being a mismatch in the fsids of my two filesystems when the failover occurs. I understand that the fsid_guid attribute, which is then used as the fsid in an NFS share, is created at zfs create time, but I would like to see and modify that value on any particular zfs filesystem after creation. More details were discussed at http://www.mail-archive.com/zfs- [EMAIL PROTECTED]/msg03662.html but this was talking about the same filesystem sitting on a SAN failing over between two nodes. On a Linux NFS server one can specify in the NFS exports -o fsid=num, which can be an arbitrary number; that would seem to fix this issue for me, but it seems to be unsupported on Solaris.

As the fsid is created when the file system is created, it will be the same when you mount it on a different NFS server. Why change it? Or are you trying to match two different file systems? Then you would also have to match all inode numbers on your files. That is not possible at all.
Re: [zfs-discuss] Error: Volume size exceeds limit for this system
2007/11/9, Anton B. Rang [EMAIL PROTECTED]: The comment in the header file where this error is defined says: /* volume is too large for 32-bit system */ So it does look like it's a 32-bit CPU issue. Odd, since file systems don't normally have any sort of dependence on the CPU type.

This is not a file system limit, it is a device limit. A zfs volume is a device, not a file system.
Re: [zfs-discuss] zfs/zpools iscsi
2007/10/12, Krzys [EMAIL PROTECTED]: Hello all, sorry if somebody already asked this or not. I was playing today with iSCSI and I was able to create a zpool and then via iSCSI I can see it on two other hosts. I was curious if I could use zfs to have it shared on those two hosts, but apparently I was unable to do it for obvious reasons. On my Linux Oracle RAC I was using OCFS, which works just as I need it; does anyone know if such a thing could be achieved with zfs, maybe if not now then in the future? Is there anything that I could do at this moment to be able to have my two other Solaris clients see my zpool that I am presenting via iSCSI to them both? Is there any solution out there of this kind?

Why not use NFS?
Re: [zfs-discuss] Multi-level ZFS in a SAN
2007/9/23, James L Baker [EMAIL PROTECTED]: I'm a small-time sysadmin with big storage aspirations (I'll be honest: for a planned MythTV back-end, and *ahem*, other storage), and I've recently discovered ZFS. I'm thinking about putting together a homebrew SAN with a NAS head, and am wondering if the following will work (hoping the formatting will stick!):

SAN Box 1:
  8-disk raid-z2 -- iSCSI over GbE --+
                                     |
SAN Box 2:                           |   NAS Head:
  8-disk raid-z2 -- iSCSI over GbE --+-- N-volume zfs pool -- NFS/SMB
                                     |
SAN Box N:                           |
  8-disk raid-z2 -- iSCSI over GbE --+

In plain English: for each SAN box, combining 8 (or so) disks in a ZFS raid-z2 pool, sharing the pool over GbE via iSCSI, then combining it with other (similar) SAN volumes in a non-redundant zfs pool on the NAS head, working out the partitioning, quotas, etc. there.

It would probably be better to iSCSI-export the raw disks on the SAN boxes to the NAS head and let the NAS head do raidz2. That will make it easier to move disks between computers if you have to. Then you will have a redundant zfs pool on the NAS head without losing any disk space. You could do 3-way raidz so that you can lose any SAN box.
Re: [zfs-discuss] Please help! ZFS crash burn in SXCE b70!
The problems I'm experiencing are as follows: ZFS creates the storage pool just fine, sees no errors on the drives, and seems to work great... right up until I attempt to put data on the drives. After only a few moments of transfer, things start to go wrong. The system doesn't power off, it just beeps 4-5 times. The X session dies and the monitor turns off (doesn't drop back to a console). All network access dies. It seems that the system panics (is it called something else in Solaris-land?). The HD access light stays on (though I can hear no drives doing anything strenuous), and the CD light blinks. This has happened two or three times, every time I've tried to start copying data to the ZFS pool. I've been transferring over the network, via SCP or NFS.

This could be a hardware problem. Bad power supply for the load? Try removing 2 of the large disks.
Re: [zfs-discuss] zfs space efficiency
2007/6/25, [EMAIL PROTECTED] [EMAIL PROTECTED]: I wouldn't de-duplicate without actually verifying that two blocks were actually bitwise identical. Absolutely not, indeed. But the nice property of hashes is that if the hashes don't match then the inputs do not either. I.e., the likelihood of having to do a full bitwise compare is vanishingly small; the likelihood of it returning equal is high.

For this application (deduplicating data) the likelihood of matching hashes is very high. In fact it has to be, otherwise there would not be any data to deduplicate. In the cp example, all writes would have matching hashes and all need a verify.
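A minimal Python sketch of the check being discussed: a hash match only nominates a candidate duplicate, and a bitwise compare confirms it before the block is shared. The in-memory block store and the choice of SHA-256 here are invented for illustration, not anything ZFS does:

```python
import hashlib

# Hash match nominates a duplicate; bitwise compare confirms it.
store = {}  # sha256 digest -> block bytes (toy block store)

def write_block(block: bytes) -> bytes:
    digest = hashlib.sha256(block).digest()
    existing = store.get(digest)
    if existing is not None and existing == block:
        return digest        # verified duplicate: share the stored block
    store[digest] = block    # new data (or a vanishingly rare collision)
    return digest

a = write_block(b"same data")
b = write_block(b"same data")   # as in the cp example: hashes match
print(a == b, len(store))       # -> True 1  (one physical copy kept)
```

The expensive path (full compare) runs exactly when dedup is paying off, which is the point made above: in a cp-style workload every write takes it.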
Re: [zfs-discuss] zfs kills box, memory related?
2007/6/10, arb [EMAIL PROTECTED]: Hello, I'm new to OpenSolaris and ZFS so my apologies if my questions are naive! I've got Solaris Express (b52) and a zfs mirror, but this command locks up my box within 5 seconds: % cmp first_4GB_file second_4GB_file It's not just these two 4GB files; any serious work in the filesystem (but I suspect the larger the file the worse it gets) brings the box to its knees. I've tried setting the maximum ARC size (setting c, p, c_max with mdb) but it doesn't help. Any other suggestions?

If this is on a 32-bit machine, you may be running out of virtual memory for the kernel. You can try this, and reboot: eeprom kernelbase=0x50000000 This will limit your userspace processes to about 1Gb. I would download a new DVD and do an upgrade from that.
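The address-space split implied by the tweak can be checked with quick arithmetic, assuming the commonly cited value kernelbase=0x50000000: on 32-bit x86 everything below kernelbase is user space and the rest belongs to the kernel.

```python
# User/kernel split of the 4 GiB 32-bit virtual address space as a
# function of kernelbase (value assumed; raising it shrinks user space
# and grows the kernel's share, which is what helps the ARC here).

TOTAL_VA = 2**32
kernelbase = 0x50000000

user_bytes = kernelbase              # addresses below kernelbase
kernel_bytes = TOTAL_VA - kernelbase # addresses at/above kernelbase

print(round(user_bytes / 2**30, 2))    # -> 1.25 GiB for user processes
print(round(kernel_bytes / 2**30, 2))  # -> 2.75 GiB for the kernel
```

That matches the warning in the reply: user processes get roughly 1.25 GiB ("about 1Gb"), traded away so the kernel has more room for caches.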