Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?
On 27-Jan-07, at 10:15 PM, Anantha N. Srirama wrote:
> > ... ZFS will not stop alpha particle induced memory corruption after data has been received by the server and verified to be correct. Sadly I've been hit with that as well.
> My brother points out that you can use a rad-hardened CPU. ECC should take care of the RAM. :-) I wonder when the former will become data centre best practice?

Alpha particles which hit CPUs must have their origin inside said CPU. (Alpha particles do not penetrate skin or paper, let alone system cases or CPU packaging.)

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?
[EMAIL PROTECTED] wrote:
> Alpha particles which hit CPUs must have their origin inside said CPU. (Alpha particles do not penetrate skin or paper, let alone system cases or CPU packaging.)

But, but, but, they'll get my brain without this nice shiny aluminum cap I made! Cosmic (aka gamma) radiation, folks. And, I think we've jumped the shark.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?
[EMAIL PROTECTED] wrote:
> Alpha particles which hit CPUs must have their origin inside said CPU. (Alpha particles do not penetrate skin or paper, let alone system cases or CPU packaging.)

Gamma rays cannot be shielded in any sensible way.

Jörg

-- 
EMail: [EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
[EMAIL PROTECTED] (uni)
[EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
[zfs-discuss] Re: can I use zfs on just a partition?
> Take note though, that giving ZFS the entire disk gives a possible performance win, as ZFS will only enable the write cache for the disk if it is given the entire disk.

Really? Why is this? Is this tunable somehow/somewhere? Can I enable the write cache if only using a dedicated partition?
Re: [zfs-discuss] Re: can I use zfs on just a partition?
> > Take note though, that giving ZFS the entire disk gives a possible performance win, as ZFS will only enable the write cache for the disk if it is given the entire disk.
> Really? Why is this?

In the old days, Sun never enabled the write cache on devices because of reliability issues. (Sun SCSI disks were shipped with caches disabled, but the OS never bothered to change the caching behaviour; the behaviour on SCSI drives is persistent. On ATA drives, the drive cache was specifically disabled; there the behaviour is not persistent, and drives default to write cache on.)

This behaviour was changed under competitive pressure for SATA disks; they now default to write cache on, set using the sata:sata_write_cache variable. The change came about with ZFS and the addition of a mechanism to flush the write cache (ZFS needs this to guarantee transactional safety).

> Is this tunable somehow/somewhere? Can I enable the write cache if only using a dedicated partition?

It does put the additional data at somewhat of a risk; not really for swap, but perhaps not nice for UFS.

Casper
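For reference, the tunable Casper mentions lives in /etc/system; this is a sketch only (run as root, takes effect on reboot, and verify the variable against your Solaris release before relying on it):

```shell
# Append the sata framework tunable Casper mentions to /etc/system.
# 1 = write cache enabled by default on SATA drives (takes effect on reboot).
echo 'set sata:sata_write_cache = 1' >> /etc/system
```

Per individual disk, the expert mode of format(1M) (`format -e`, then the cache / write_cache menu) can also toggle the cache on a drive ZFS only uses a slice of -- but, as Casper notes, any non-ZFS data on that disk is then exposed to the risk of an unflushed cache.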
Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?
On 28-Jan-07, at 7:59 AM, [EMAIL PROTECTED] wrote:
> Alpha particles which hit CPUs must have their origin inside said CPU. (Alpha particles do not penetrate skin or paper, let alone system cases or CPU packaging.)
>
> Casper

Thanks. But what about cosmic rays?

--T
Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?
> > Alpha particles which hit CPUs must have their origin inside said CPU. (Alpha particles do not penetrate skin or paper, let alone system cases or CPU packaging.)
> Thanks. But what about cosmic rays?

I was just in pedantic mode; "cosmic rays" is the term covering all the different particles, including alpha, beta and gamma rays.

Alpha rays don't reach us from the cosmos; they are caught long before they can do any harm. Ditto beta rays. Both have an electrical charge that makes passing magnetic fields or passing through materials difficult. Both do exist down here, but are commonly caused by the slow radioactive decay of our natural environment.

Gamma rays are photons with high energy; they are not captured by magnetic fields (such as those existing in atoms: electrons, protons). They need to take a direct hit before they're stopped, and they can only be stopped by dense materials, such as lead. Unfortunately, naturally occurring lead is polluted by polonium and uranium and is an alpha/beta source in its own right. That's why 100-year-old lead from roofs is worth more money than new lead: its radioisotopes have been depleted.

Casper
[zfs-discuss] Re: Re: Re: ZFS or UFS - what to do?
You're right that storage-level snapshots are filesystem agnostic. I'm not sure why you believe you won't be able to restore individual files by using a NetApp snapshot? In the case of ZFS you'd take a periodic snapshot and use it to restore files; in the case of NetApp you can do the same (of course you have the additional step of mounting the new snapshot volume). Is this convenience tipping the scales for you to pursue ZFS?
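As a concrete sketch of the ZFS side of this comparison (pool and filesystem names are invented for illustration):

```shell
# Take a periodic snapshot of the filesystem
zfs snapshot tank/home@monday

# Snapshots are directly visible under the hidden .zfs/snapshot
# directory of the mounted filesystem, so restoring one file is a
# plain copy -- no extra volume to create or mount:
cp /tank/home/.zfs/snapshot/monday/lost-file /tank/home/lost-file
```

On the NetApp side, the equivalent snapshot first has to be exposed and mounted as a new volume before files can be copied back, which is the additional step mentioned above.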
[zfs-discuss] bug id 6381203
Hello, what is the status of the fix for bug 6381203 in S10 U3? (deadlock due to I/O while assigning (tc_lock held)) Was it integrated? Is there a patch?

Thanks,
-- leon
Re: [zfs-discuss] zfs rewrite?
On Fri, Jan 26, 2007 at 06:08:50PM -0800, Darren Dunham wrote:
> > What do you guys think about implementing a 'zfs/zpool rewrite' command? It'll read every block older than the date when the command was executed and write it again (using the standard ZFS COW mechanism, similar to how resilvering works, but the data is read from the same disk it is written to).
> #1 How do you control I/O overhead?

The same way it is handled for scrub and resilver.

> #2 Snapshot blocks are never rewritten at the moment. Most of your suggestions seem to imply working on the live data, but doing that for snapshots as well might be tricky.

Good point, see below.

> > 3. I created a file system with a huge amount of data, where most of the data is read-only. I change my server from an intel to a sparc64 machine. Adaptive endianness only changes byte order to native on write, and because the file system is mostly read-only, it'll need to byteswap all the time. And here comes 'zfs rewrite'!
> It's only the metadata that is modified anyway, not the file data. I would hope that this could be done more easily than a full tree rewrite (and again the issue with snapshots). Also, the overhead there probably isn't going to be very high (since the metadata will be cached in most cases).

Agreed. Probably in this case there should be a rewrite-only-metadata mode. I agree the overhead is probably not high, but on the other hand, I'm quite sure there are workloads which will see the difference, e.g. 'find / -name something'.

> Other than that, I'm guessing something like this will be necessary to implement disk evacuation/removal. If you have to rewrite data from one disk to elsewhere in the pool, then rewriting the entire tree shouldn't be much harder.

How did I forget about this one? :) That's right. I believe ZFS will gain such an ability at some point, and rewrite functionality fits very nicely here: mark the disk/mirror/raid-z as no-more-writes and start the rewrite process (probably limited to this entity).
To implement such functionality there also has to be a way to migrate snapshot data, so sooner or later there will be a need for moving snapshot blocks.

-- 
Pawel Jakub Dawidek http://www.wheel.pl
[EMAIL PROTECTED] http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
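To make the proposal concrete, a hypothetical invocation might look like the sketch below. None of this exists; the command name and flags are invented purely to illustrate the modes discussed in this thread:

```shell
# Hypothetical: rewrite every block written before now,
# throttled the same way scrub/resilver are
zpool rewrite tank

# Hypothetical metadata-only mode, for the adaptive-endianness case
# where only metadata needs to be rewritten in native byte order
zpool rewrite -m tank

# Hypothetical evacuation mode: mark a vdev no-more-writes, then
# rewrite its blocks elsewhere in the pool prior to removal
zpool rewrite -e c1t2d0 tank
```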
Re: [zfs-discuss] bug id 6381203
Hi Leon,

This was fixed in March 2006, and is in S10_U2.

Neil.

Leon Koll wrote on 01/28/07 08:58:
> Hello, what is the status of the fix for bug 6381203 in S10 U3? (deadlock due to I/O while assigning (tc_lock held)) Was it integrated? Is there a patch?
[zfs-discuss] Re: bug id 6381203
Too bad... I was in the situation where every zpool ... command was stuck (as well as the df command), and my hope was that it was a known/fixed bug. I could not save the core files, and I'm not sure I can reproduce the bug.

Thank you for the quick reply,
-- leon
Re: [zfs-discuss] Re: Re: ZFS or UFS - what to do?
On Sat, Jan 27, 2007 at 04:15:30PM -0800, Anantha N. Srirama wrote:
> I'm not sure what benefit you foresee by running a COW filesystem (ZFS) on a COW array (NetApp).

The application requires a filesystem with POSIX semantics. My first choice would be NFS from the NetApp, but this won't work in this case. My next choice is an iSCSI LUN with a local filesystem on it. I'm assuming that since ZFS is more modern than UFS, ZFS would be the better of the two, even though the JBOD-oriented features of ZFS will not be used.

ZFS does seem to be more manageable than UFS. Filesystems that draw their space from a common pool are ideal for our application. The ability to expand a pool by adding another device, or by extending an existing device, is also ideal. Another feature is snapshots, which I've mentioned earlier.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
Re: [zfs-discuss] Re: Re: Re: ZFS or UFS - what to do?
On Sun, Jan 28, 2007 at 06:19:25AM -0800, Anantha N. Srirama wrote:
> You're right that storage-level snapshots are filesystem agnostic. I'm not sure why you believe you won't be able to restore individual files by using a NetApp snapshot? [...] Is this convenience tipping the scales for you to pursue ZFS?

Yes; we'd run out of LUNs. We're talking about two weeks of daily snapshots on six filesystems. Each snapshot on the NetApp would become a separate iSCSI LUN. They need to be mounted on the server so that our admins can locate and restore missing files when necessary.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
Re: [zfs-discuss] zfs rewrite?
On January 28, 2007 4:59:48 PM +0100 Pawel Jakub Dawidek [EMAIL PROTECTED] wrote:
> > It's only the metadata that is modified anyway, not the file data. [...] Also, the overhead there probably isn't going to be very high (since the metadata will be cached in most cases).
> Agreed. Probably in this case there should be a rewrite-only-metadata mode. I agree the overhead is probably not high, but on the other hand, I'm quite sure there are workloads which will see the difference, e.g. 'find / -name something'.

I'd imagine even for that it wouldn't matter. The I/O time will dwarf any time spent byte-swapping. Easily tested, though. Make sure you set atime=off so that your find isn't causing write I/O.

-frank
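The test Frank suggests could be run roughly like this (the filesystem name is illustrative):

```shell
# Disable atime updates so the find generates no write I/O
zfs set atime=off tank/fs

# Time a metadata-heavy walk; on a foreign-endian (byte-swapped) pool,
# any swapping cost would show up as extra CPU time relative to wall time
time find /tank/fs -name something > /dev/null
```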
Re: [zfs-discuss] Re: high density SAS
Anton B. Rang wrote:
> > How badly can you mess up a JBOD?
> Two words: vibration, cooling. Three more: power, signal quality. I've seen even individual drive cases with bad enough signal quality to cause bit errors.

Yep, if I crank up the amp to over 1 kW, then on some frequencies I see lots of noise on USB links, as an example. You may have noticed that many vendors are now making USB cables with toroids built in. There is still some black art involved in eliminating noise problems. However, one easy way to do it is well proven in the PCB design space. We leverage that with Thumper, which has no internal disk cables. In fact, you should notice that many Sun designs have few, if any, internal cables. Cables are a source of reliability issues, so they are best when they don't exist.

[waxing nostalgic] When the designers were planning the Shinkansen (the Japanese high-speed train system) they had 150 years of train accident data to study. Not surprisingly, most train accidents occurred at crossings. To help avoid accidents, they eliminated crossings. Good design is a good thing.

-- richard
Re: [zfs-discuss] data wanted: disk kstats
Hello Richard,

Friday, January 26, 2007, 11:36:07 PM, you wrote:
> We've been talking a lot recently about failure rates and types of failures. As you may know, I do look at field data and generally don't ask the group for more data. But this time, for various reasons (I might have found a bug or deficiency), I'm soliciting for more data at large.
> What I'd like to gather is the error rates per bytes transferred. This data is collected in kstats, but is reset when you reboot. One of the features of my vast collection of field data is that it is often collected rather soon after a reboot. Thus, there aren't very many bytes transferred yet, and the corresponding error rates tend to be small (often 0). A perfect collection would be from a machine connected to lots of busy disks which has been up for a very long time.
> Can you help? It is real simple. Just email me the output of:

I've sent it to you off-list. Will the results (total statistics, not site-specific) be made publicly available by you (here?)?

-- 
Best regards,
Robert mailto:[EMAIL PROTECTED]
http://milek.blogspot.com
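The exact command Richard asked for was elided above. Counters of the kind he describes live in the per-device kstats, so a comparable (guessed, not necessarily his) collection would look like:

```shell
# Per-device error counters (soft/hard/transport errors) for sd disks
kstat -p sderr

# Per-device I/O statistics, including bytes transferred
# (nread/nwritten), from which an errors-per-byte rate can be derived
kstat -p sd:::nread sd:::nwritten
```

Note that, as Richard says, these counters reset on reboot, so output from a long-lived, busy machine is what matters.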
Re[2]: [zfs-discuss] zfs rewrite?
Hello Jeff,

Saturday, January 27, 2007, 8:27:09 AM, you wrote:
> You're all correct. File data is never byte-swapped. Most metadata needs to be byte-swapped, but it's generally only 1-2% of your space. So the overhead shouldn't be significant, even if you never rewrite.

I remember some time ago Sun touted that ZFS has some interesting new technology to deal with endianness, and that a patent was pending for it. Can you share what it was about?

-- 
Best regards,
Robert mailto:[EMAIL PROTECTED]
http://milek.blogspot.com
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Hello Anantha,

Friday, January 26, 2007, 5:06:46 PM, you wrote:
> All my feedback is based on Solaris 10 Update 2 (aka 06/06) and I've no comments on NFS. I strongly recommend that you use ZFS data redundancy (z1, z2, or mirror) and simply delegate the Engenio to stripe the data for performance.

Striping on an array and then doing redundancy with ZFS has at least one drawback: what if one of the disks fails? You've got to replace the bad disk, re-create the stripe on the array, and resilver on ZFS (or stay with a hot spare). A lot of hassle.

-- 
Best regards,
Robert mailto:[EMAIL PROTECTED]
http://milek.blogspot.com
Re[2]: [zfs-discuss] ZFS or UFS - what to do?
Hello Francois,

Friday, January 26, 2007, 4:09:43 PM, you wrote:
> On Fri, 2007-01-26 at 06:16 -0800, Jeffery Malloch wrote:
> > Hi Folks, I am currently in the midst of setting up a completely new file server using a pretty well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 product (I work for LSI Logic so Engenio is a no-brainer). I have configured a couple of zpools from volume groups on the Engenio box - 1x2.5TB and 1x3.75TB. I then created sub zfs filesystems below that and set quotas and sharenfs'd them, so that it appears that these file systems are dynamically shrinkable and growable. It looks very good... I can see the correct file system sizes on all types of machines (Linux 32/64-bit and of course Solaris boxes) and if I resize the quota it's picked up in NFS right away. But I would be the first in our organization to use this in an enterprise system, so I definitely have some concerns that I'm hoping someone here can address.
> > 1. How stable is ZFS? The Engenio box is completely configured for RAID5 with hot spares
> That partly defeats the purpose of ZFS. ZFS offers raid-z and raid-z2 (double parity) with all the advantages of raid-5 or raid-6 but without several of the raid-5 issues. It also has features that a raid-5 controller could never do: ensure data integrity from the kernel to the disk, and self-correction.

Not always true. Actually, you can get much more performance for some workloads doing raid-5 in hardware than raid-z. Also, with some other entry-level arrays there are limits on how many LUNs can be presented, and you actually can't expose each disk as a LUN due to the limit (yes, Sun's 3510).

> > ... and the write cache (8GB) has battery backup, so I'm not too concerned from a hardware side.
> Whereas the cache/battery backup is a requirement if you run raid-5, it is not for zfs.

Still, that doesn't mean it won't help for some workloads.

> > 2. Recommended config.
> The most reliable setup is a JBOD + zfs.

I would argue with this. No matter what, you still get a less reliable setup using ZFS on top of a simple JBOD than with a Symmetrix box. It's just that in many cases that simple JBOD can be good enough.

> But if you have cache on your box, there might be some magic setup you have to do for that box, and I'm sure somebody on the list will help you with that. I don't have an Engenio.

There's a workaround for Engenio devices.

-- 
Best regards,
Robert mailto:[EMAIL PROTECTED]
http://milek.blogspot.com
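For readers following along, the setup Jeffery describes maps onto just a few commands (pool, LUN, and filesystem names are invented for illustration):

```shell
# One pool per Engenio volume group; the array LUN appears as one device
zpool create tank c4t600A0B8000123456d0

# Sub-filesystems below it, each with a quota and shared over NFS;
# "shrinking" or "growing" a share is then just a quota change,
# which NFS clients pick up immediately
zfs create tank/projects
zfs set quota=500g tank/projects
zfs set sharenfs=on tank/projects
```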
[zfs-discuss] Re: Re: ZFS or UFS - what to do?
Agreed; I guess I didn't articulate my point/thought very well. The best config is to present JBODs and let ZFS provide the data protection. This has been a very stimulating conversation thread; it is shedding new light on how best to use ZFS.