[zfs-discuss] Re: Re[5]: Re: Re: Due to 128KB limit in ZFS it can't saturate disks

2006-05-16 Thread Anton B. Rang
One issue is what we mean by saturation. It's easy to bring a disk to 100% busy. We need to keep this discussion in the context of a workload. Generally when people care about streaming throughput of a disk, it's because they are reading or writing a single large file, and they want to reach

[zfs-discuss] Re: user undo

2006-05-26 Thread Anton B. Rang
Anything that attempts to append characters on the end of the filename will run into trouble when the file name is already at NAME_MAX. One simple solution is to restrict the total length of the name to NAME_MAX, truncating the original filename as necessary to allow appending. This does

[zfs-discuss] Re: Re: [osol-discuss] Re: I wish Sun would open-source QFS... /was: Re: Re: Distributed File System for Solaris

2006-06-01 Thread Anton B. Rang
We'll be much better able to help you reach your performance goals if you can state them as performance goals. In particular, knowing the latency requirements is important. Uncompressed HD video runs at 1.5 Gbps; two streams would require 3 Gbps, or 375 MB/sec. The requirement for real-time
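For the arithmetic above (figures as stated in the message, 8 bits per byte, protocol overhead ignored), a quick sanity check:
  # two uncompressed HD streams at 1.5 Gbps each, expressed in MB/sec
  $ echo "2 * 1.5 * 1000 / 8" | bc
  375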

[zfs-discuss] Re: 3510 configuration for ZFS

2006-06-01 Thread Anton B. Rang
What about small random writes? Won't those also require reading from all disks in RAID-Z to read the blocks for update, where in mirroring only one disk need be accessed? Or am I missing something? (It seems like RAID-Z is similar to RAID-3 in its performance characteristics, since both

[zfs-discuss] Re: Re: disk write cache, redux

2006-06-15 Thread Anton B. Rang
The write cache decouples the actual write to disk from the data transfer from the host. For a streaming operation, this means that the disk can typically stream data onto tracks with almost no latency (because the cache can aggregate multiple I/O operations into full tracks which can be

[zfs-discuss] Re: ZFS questions (hybrid HDs)

2006-06-21 Thread Anton B. Rang
Actually, while Seagate's little white paper doesn't explicitly say so, the FLASH is used for a write cache and that provides one of the major benefits: Writes to the disk rarely need to spin up the motor. Probably 90+% of all writes to disk will fit into the cache in a typical laptop

[zfs-discuss] Re: raidz and 512 byte files

2006-07-14 Thread Anton B. Rang
In RAID-Z, the width of the stripe can vary. For a small block (such as would hold 512 bytes or less of file data), the stripe will be 1 data block and either 1 or 2 parity blocks. So a full-stripe write will simply look like either mirroring (RAIDZ1) or mirror plus one additional ECC block

[zfs-discuss] Re: Clones and rm -rf

2006-08-03 Thread Anton B. Rang
I'd filed 6452505 (zfs create should set permissions on underlying mountpoint) so that this shouldn't cause problems in the future.

[zfs-discuss] Re: Lots of seeks?

2006-08-08 Thread Anton B. Rang
So while I'm feeling optimistic :-) we really ought to be able to do this in two I/O operations. If we have, say, 500K of data to write (including all of the metadata), we should be able to allocate a contiguous 500K block on disk and write that with a single operation. Then we update the

[zfs-discuss] Re: user quotas vs filesystem quotas?

2006-08-15 Thread Anton B. Rang
One problem with this approach is that software expects /var/mail to be full of files, not directories, for each user. I don't think you can get the right semantics out of ZFS for this yet (loopback mounting a file comes to mind, but breaks down if something tries to delete the user's mailbox

[zfs-discuss] Re: Re: user quotas vs filesystem quotas?

2006-08-15 Thread Anton B. Rang
Delivering into $HOME raises some new failure modes if the home directory servers are NFS mounted, but otherwise often works OK. However, in some cases it's simply impossible--for instance, in a secure NFS environment where the home directory can't be mounted without a Kerberos ticket. I think

[zfs-discuss] Re: SCSI synchronize cache cmd

2006-08-21 Thread Anton B. Rang
Yes, ZFS uses this command very frequently. However, it only does this if the whole disk is under the control of ZFS, I believe; so a workaround could be to use slices rather than whole disks when creating a ZFS pool on a buggy device.
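A sketch of that workaround, with a hypothetical disk c1t0d0 standing in for the buggy device (the slice would be sized with format(1M) first):
  # whole disk: ZFS manages the device, including frequent cache flushes
  zpool create tank c1t0d0
  # slice instead: per the note above, avoids tripping the firmware bug
  zpool create tank c1t0d0s0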

[zfs-discuss] Re: Re: SCSI synchronize cache cmd

2006-08-22 Thread Anton B. Rang
Bill, I realized just now that we're actually sending the wrong variant of SYNCHRONIZE CACHE, at least for SCSI devices which support SBC-2. SBC-2 (or possibly even SBC-1, I don't have it handy) added the SYNC_NV bit to the command. If SYNC_NV is set to 0, the device is required to flush data

[zfs-discuss] Re: Re: Re: SCSI synchronize cache cmd

2006-08-22 Thread Anton B. Rang
Filed as 6462690. If our storage qualification test suite doesn't yet check for support of this bit, we might want to get that added; it would be useful to know (and gently nudge vendors who don't yet support it).

[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320

2006-09-07 Thread Anton B. Rang
The bigger problem with system utilization for software RAID is the cache, not the CPU cycles proper. Simply preparing to write 1 MB of data will flush half of a 2 MB L2 cache. This hurts overall system performance far more than the few microseconds that XORing the data takes. (A similar

[zfs-discuss] Re: Re: How to destroy a pool wich you can't import

2006-09-07 Thread Anton B. Rang
A determined administrator can always get around any checks and cause problems. We should do our very best to prevent data loss, though! This case is particularly bad since simply booting a machine can permanently damage the pool. And why would we want a pool imported on another host, or not

[zfs-discuss] Re: Proposal: multiple copies of user data

2006-09-12 Thread Anton B. Rang
The biggest problem I see with this is one of observability: if not all of the data is encrypted yet, what should the encryption property say? If it says encryption is on, then the admin might think the data is safe, but if it says it is off that isn't the truth either, because some of it may be

[zfs-discuss] Re: Proposal: multiple copies of user data

2006-09-12 Thread Anton B. Rang
True - I'm a laptop user myself. But as I said, I'd assume the whole disk would fail (it does in my experience). That's usually the case, but single-block failures can occur as well. They're rare (check the uncorrectable bit error rate specifications) but if they happen to hit a critical file,

[zfs-discuss] Re: Re: Proposal: multiple copies of user data

2006-09-12 Thread Anton B. Rang
And if we are still writing to the file systems at that time ? New writes should be done according to the new state (if encryption is being enabled, all new writes are encrypted), since the goal is that eventually the whole disk will be in the new state. The completion percentage should

[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320

2006-09-13 Thread Anton B. Rang
It would be interesting to have a zfs enabled HBA to offload the checksum and parity calculations. How much of zfs would such an HBA have to understand? That's an interesting question. For parity, it's actually pretty easy. One can envision an HBA which took a group of related write commands

[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320

2006-09-13 Thread Anton B. Rang
just measured quickly that a 1.2Ghz sparc can do [400-500]MB/sec of encoding (time spent in misnamed function vdev_raidz_reconstruct) for a 3 disk raid-z group. Strange, that seems very low. Ah, I see. The current code loops through each buffer, either copying or XORing it into the parity.

[zfs-discuss] Re: Re: Recommendation ZFS on StorEdge 3320

2006-09-13 Thread Anton B. Rang
With ZFS however the in-between cache is obsolete, as individual disk caches can be used directly. I also openly question whether even the dedicated RAID HW is faster than the newest CPUs in modern servers. Individual disk caches are typically in the 8-16 MB range; for 15 disks, that gives you
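Spelling out that arithmetic (8-16 MB per disk, 15 disks, as assumed above):
  $ echo "15 * 8" | bc
  120
  $ echo "15 * 16" | bc
  240
so roughly 120-240 MB of aggregate disk cache, compared with the much larger cache in a dedicated array controller.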

[zfs-discuss] Re: Comments on a ZFS multiple use of a pool, RFE.

2006-09-13 Thread Anton B. Rang
I think there are at least two separate issues here. The first is that ZFS doesn't support multiple hosts accessing the same pool. That's simply a matter of telling people. UFS doesn't support multiple hosts, but it doesn't have any special features to prevent administrators from *trying* it.

[zfs-discuss] Re: Re: marvel cards.. as recommended

2006-09-13 Thread Anton B. Rang
If I'm reading the source correctly, for the $60xx boards, the only supported revision is $09. Yours is $07, which presumably has some errata with no workaround, and which the Solaris driver refuses to support. Hope you can return it ... ?

[zfs-discuss] Re: Re: marvel cards.. as recommended

2006-09-13 Thread Anton B. Rang
A quick peek at the Linux source shows a small workaround in place for the 07 revision... maybe if you file a bug against Solaris to support this revision it might be possible to get it added, at least if that's the only issue.

[zfs-discuss] Re: Proposal: multiple copies of user data

2006-09-13 Thread Anton B. Rang
Is this true for single-sector, vs. single-ZFS-block, errors? (Yes, it's pathological and probably nobody really cares.) I didn't see anything in the code which falls back on single-sector reads. (It's slightly annoying that the interface to the block device drivers loses the SCSI error

[zfs-discuss] Re: Re: Comments on a ZFS multiple use of a pool, RFE.

2006-09-14 Thread Anton B. Rang
If you *never* want to import a pool automatically on reboot you just have to delete the /etc/zfs/zpool.cache file before the zfs module is being loaded. This could be integrated into SMF. Or you could always use import -R / create -R for your pool management. Of course, there's no way to
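A sketch of the -R approach mentioned above (pool and device names are placeholders); the intent is that a pool created or imported with an alternate root is not recorded for automatic import at the next boot:
  # create the pool under an alternate root so it is not cached
  zpool create -R / tank mirror c1t0d0 c1t1d0
  # after a reboot, bring it back by hand the same way
  zpool import -R / tank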

[zfs-discuss] Re: please remove my ignorance of raiding and mirroring

2006-09-29 Thread Anton B. Rang
Mirroring is more efficient for small reads (the size of one ZFS block or less) because only one disk has to be accessed. Since RAID-Z spreads a ZFS block across multiple disks, and the data from all disks is required to verify the checksum, every read accesses every disk. Mirror read: 1.
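One way to see this, assuming spare disks to experiment with, is to build one pool of each type and watch per-device activity under a small-read workload:
  zpool create mpool mirror c2t0d0 c2t1d0
  zpool create rzpool raidz c3t0d0 c3t1d0 c3t2d0
  # with random small reads running, the per-device counters show each read
  # serviced by a single side of the mirror, but by every disk of the raidz
  zpool iostat -v 5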

[zfs-discuss] Re: jbod questions

2006-09-29 Thread Anton B. Rang
Actually, random writes on a RAID-5, while not performing that well because of the pre-read, don't require a full stripe read (or write). They only require reading the old data and parity, then writing the new data and parity. This is quite a bit better than a full stripe, since only two

[zfs-discuss] Re: single memory allocation in the ZFS intent log

2006-10-05 Thread Anton B. Rang
Hi Mitchell, I do work for Sun, but I don't consider myself biased towards the slab allocator or any other Solaris or Sun code. I know we've got plenty of improvements to make! That said, your example is not multi-threaded. There are two major performance issues which come up with a list

[zfs-discuss] Re: A versioning FS

2006-10-06 Thread Anton B. Rang
ClearCase is a version control system, though — not the same as file versioning.

[zfs-discuss] Re: A versioning FS

2006-10-06 Thread Anton B. Rang
I think our problem is that we look at FV from different angles. I look at it from the point of view of people who have NEVER used FV, and you look at it from the view of people who have ALWAYS used FV. That's certainly a part of it. It's interesting reading this discussion, as someone who

[zfs-discuss] Re: A versioning FS

2006-10-06 Thread Anton B. Rang
People are oriented to their files, not to snapshots. True, though with NetApp-style snapshots, it's not that difficult to translate 'src/file.c' to '.snapshot/hourly.0/src/file.c' and see what it was like an hour ago. I imagine that a syntax like '.snapshot/22:20/src/file.c' would also be
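On ZFS the same trick works through the .zfs directory at the root of each filesystem (snapshot and dataset names are placeholders; 'zfs set snapdir=visible' makes .zfs show up in directory listings):
  # what did src/file.c look like an hour ago? (run from the filesystem root)
  zfs snapshot tank/home@hourly.0       # e.g. taken periodically from cron
  diff .zfs/snapshot/hourly.0/src/file.c src/file.c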

[zfs-discuss] Re: A versioning FS

2006-10-06 Thread Anton B. Rang
Versioning cannot be automated; taking periodic snapshots != capturing application state. But I think we have existence proofs of operating systems which do automate versioning. It's true that capturing a new version each time a file has been modified and closed may not be perfect, but if it

[zfs-discuss] Re: Re: Snapshots of an active file

2006-10-08 Thread Anton B. Rang
I'm showing my lack of knowledge on this one but I thought SAM-FS could do something like this. Anyone know for sure? It's not quite the same, and not out-of-the-box. SAM-FS has the ability to create an archive copy of files onto disk or tape when the files are closed after having been

[zfs-discuss] Re: Unbootable system recovery

2006-10-08 Thread Anton B. Rang
The scan order won't make any difference to ZFS, as it identifies the drives by a label written to them, rather than by their controller path. Perhaps someone in ZFS support could analyze the panic to determine the cause, or look at the disk labels; have you made the core file available to Sun?

[zfs-discuss] Re: Where is the ZFS configuration data stored?

2006-10-12 Thread Anton B. Rang
The configuration data is stored on the disk devices themselves, at least primarily. There is also a copy of the basic configuration data in the file /etc/zfs/zpool.cache on the boot device. If this file is missing, ZFS will not automatically import pools, but you can manually import them.
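A minimal sketch of that manual path (pool name is a placeholder):
  # with no /etc/zfs/zpool.cache nothing is imported automatically, but the
  # on-disk labels remain, so scan attached devices and list what is found
  zpool import
  # then import the pool by name (or by its numeric pool ID)
  zpool import tank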

[zfs-discuss] Re: zfs/raid configuration question for an application

2006-10-12 Thread Anton B. Rang
Mirroring will give you the best performance for small write operations. If you can get by with two disks, I’d divide each of them into two slices, s0 and s1, say. Set up an SVM mirror between d0s0 and d1s0 and use that for your root. Set up a ZFS mirror between d0s1 and d1s1 and use that for
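A sketch of that two-disk layout with hypothetical devices (s0 for the SVM root mirror, s1 for the ZFS mirror, plus a small s7 slice on each disk for the SVM state database replicas); the slices come from format(1M), and the remaining root-mirroring steps (metaroot, vfstab, reboot) are omitted:
  # state database replicas, then a two-way SVM mirror for root
  metadb -a -f -c 2 c0t0d0s7 c0t1d0s7
  metainit -f d10 1 1 c0t0d0s0
  metainit d20 1 1 c0t1d0s0
  metainit d0 -m d10
  metattach d0 d20
  # ZFS mirror on the second slice of each disk for the data
  zpool create datapool mirror c0t0d0s1 c0t1d0s1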

[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar

2006-10-12 Thread Anton B. Rang
fsync() should theoretically be better because O_SYNC requires that each write() include writing not only the data but also the inode and all indirect blocks back to the disk.

[zfs-discuss] Re: zfs/raid configuration question for an application

2006-10-12 Thread Anton B. Rang
Yes, set the block size to 8K, to avoid a read-modify-write cycle inside ZFS. As you suggest, using a separate mirror for the transaction log will only be useful if you're on different disks -- otherwise you will be forcing the disk head to move back and forth between slices each time you
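A sketch of that tuning (dataset name is a placeholder; recordsize only affects blocks written after the property is set, so set it before loading data):
  zfs create tank/db
  zfs set recordsize=8k tank/db
  zfs get recordsize tank/db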

[zfs-discuss] Re: Re: zfs/raid configuration question for an

2006-10-13 Thread Anton B. Rang
Most ZFS improvements should be available through patches. Some may require moving to a future update (for instance, ZFS booting, which may have other implications throughout the system). On most systems, you won’t see a lot of difference between hardware or software mirroring. The benefit of

[zfs-discuss] Re: [nfs-discuss] Re: Re: NFS Performance and Tar

2006-10-13 Thread Anton B. Rang
For what it's worth, close-to-open consistency was added to Linux NFS in the 2.4.20 kernel (late 2002 timeframe). This might be the source of some of the confusion.

[zfs-discuss] Re: Self-tuning recordsize

2006-10-13 Thread Anton B. Rang
One technique would be to keep a histogram of read write sizes. Presumably one would want to do this only during a “tuning phase” after the file was first created, or when access patterns change. (A shift to smaller record sizes can be detected by a large proportion of write operations which
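One way to gather such a histogram today, from outside the filesystem, is a DTrace one-liner (shown system-wide for write(2) sizes; an illustration of the idea, not the proposed self-tuning mechanism itself):
  # power-of-two histogram of write() sizes, printed on Ctrl-C
  dtrace -n 'syscall::write:entry { @["write size (bytes)"] = quantize(arg2); }'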

[zfs-discuss] Re: Self-tuning recordsize

2006-10-17 Thread Anton B. Rang
No, the reason to try to match recordsize to the write size is so that a small write does not turn into a large read + a large write. In configurations where the disk is kept busy, multiplying 8K of data transfer up to 256K hurts. This is really orthogonal to the cache — in fact, if we had a

[zfs-discuss] Re: Mirrored Raidz

2006-10-24 Thread Anton B. Rang
Our thinking is that if you want more redundancy than RAID-Z, you should use RAID-Z with double parity, which provides more reliability and more usable storage than a mirror of RAID-Zs would. This is only true if the drives have either independent or identical failure modes, I think. Consider

[zfs-discuss] Re: raid-z random read performance

2006-11-02 Thread Anton B. Rang
I don't see how you can get both end-to-end data integrity and read avoidance. Checksum the individual RAID-5 blocks, rather than the entire stripe? In more detail: Allow the pointer to the block to contain one checksum per device used (the count will vary if you're using a RAID-Z style

[zfs-discuss] Re: df -e in ZFS

2006-11-09 Thread Anton B. Rang
A UFS file system has a fixed number of inodes, set when the file system is created. df can simply report how many of those have been used, and how many are free. Most file systems, including ZFS and QFS, allocate inodes dynamically. In this case, there really isn’t a “number of files free”

[zfs-discuss] Re: Re: df -e in ZFS

2006-11-10 Thread Anton B. Rang
The reason I want to use up the inodes is that we need to test the behavior when both blocks and inodes are used up. If only the blocks are filled, creating an empty file still succeeds. Pretty much the only way to tell if you've used up all the space available for file nodes is to

[zfs-discuss] Re: I/O patterns during a zpool replace: why write to the disk being replaced?

2006-11-10 Thread Anton B. Rang
I'd appreciate it if only people who have made changes to the ZFS codebase found in opensolaris respond further to this thread. Well. I haven't made changes, but I can read code. When replacing a device, ZFS internally takes the device being replaced and creates a mirror between the old and
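That internal mirror is visible from the command line while a replacement is under way (device names are placeholders):
  zpool replace tank c1t2d0 c1t3d0
  # during the resilver, zpool status shows the old and new devices grouped
  # under a temporary 'replacing' vdev; the old one is detached on completion
  zpool status tank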

[zfs-discuss] Re: zfs corrupted my data!

2006-11-28 Thread Anton B. Rang
With zfs, there's this ominous message saying destroy the filesystem and restore from tape. That's not so good, for one corrupt file. It is strictly correct that to restore the data you'd need to refer to a backup, in this case. It is not, however, correct that to restore the data you

[zfs-discuss] Re: zfs corrupted my data!

2006-11-28 Thread Anton B. Rang
No, you still have the hardware problem. What hardware problem? There seems to be an unspoken assumption that any checksum error detected by ZFS is caused by a relatively high error rate in the underlying hardware. There are at least two classes of hardware-related errors. One class are those

[zfs-discuss] Re: Re: Production ZFS Server Death (06/06)

2006-12-03 Thread Anton B. Rang
RAID level what? How is anything salvagable if you lose your only copy? [ ... ] ZFS does store multiple copies of metadata in a single vdev, so I assume we're talking about data here. I believe we're talking about metadata, as that is the case where ZFS reports that the pool (as opposed

[zfs-discuss] Re: Re: Production ZFS Server Death (06/06)

2006-12-03 Thread Anton B. Rang
Bit errors happen. When they do, data is corrupted. This is rather an oversimplification. Single-bit errors *on the media* happen relatively frequently. In fact, multi-bit errors are not too uncommon either. Hence there is a lot of error-correction data written to the disc media. The

[zfs-discuss] Re: ZFS on multi-volume

2006-12-04 Thread Anton B. Rang
It is possible to configure ZFS in the way you describe, but your performance will be limited by the older array. All mirror writes have to be stored on both arrays before they are considered complete, so writes will be as slow as the slowest disk or array involved. ZFS does not currently

[zfs-discuss] Re: ZFS related kernel panic

2006-12-04 Thread Anton B. Rang
And to panic? How can that in any sane way be good way to protect the application? *BANG* - no chance at all for the application to handle the problem... I agree -- a disk error should never be fatal to the system; at worst, the file system should appear to have been forcibly unmounted (and

[zfs-discuss] Re: need Clarification on ZFS

2006-12-05 Thread Anton B. Rang
is there any command to know the presence of a ZFS file system on a device? fstyp is the Solaris command to determine what type of file system may be present on a disk:
# fstyp /dev/dsk/c0t1d0s6
zfs
When a device is shared between two machines [ ... ] You can use the same mount/unmount

[zfs-discuss] Re: Re: ZFS related kernel panic

2006-12-05 Thread Anton B. Rang
But it's still not the application's problem to handle the underlying device failure. But it is the application's problem to handle an error writing to the file system -- that's why the file system is allowed to return errors. ;-) Some applications might not check them, some applications

[zfs-discuss] Re: Shared ZFS pools

2006-12-05 Thread Anton B. Rang
You specify the mirroring configuration. The top-level vdevs are implicitly striped. So if you, for instance, request something like zpool create mirror AA BA mirror AB BB then you will have a pool consisting of a stripe of two mirrors. Each mirror will have one copy of its data at each

[zfs-discuss] Re: Re: Managed to corrupt my pool

2006-12-05 Thread Anton B. Rang
I think the pool is busted. Even the message printed in your previous email is bad:
DATASET  OBJECT  RANGE
15       0       lvl=4294967295 blkid=0
as level is way out of range. I think this could be from dmu_objset_open_impl(). It sets object to 0 and level to -1 (=

[zfs-discuss] Re: raidz DEGRADED state

2006-12-05 Thread Anton B. Rang
Creating an array configuration with one element being a sparse file, then removing that file, comes to mind, but I wouldn't want to be the first to attempt it. ;-)

[zfs-discuss] Re: A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-07 Thread Anton B. Rang
I'm still confused though, I believe that locking an adaptive mutex will spin for a short period then context switch and so they shouldn't be burning CPU - at least not .4s worth! An adaptive mutex will spin as long as the thread which holds the mutex is on CPU. If the lock is moderately

[zfs-discuss] Re: System pause peculiarity with mysql on zfs

2006-12-07 Thread Anton B. Rang
This does look like the ATA driver bug rather than a ZFS issue per se. (For the curious, the reason ZFS triggers this when UFS doesn't is because ZFS sends a synchronize cache command to the disk, which is not handled in DMA mode by the controller; and for this particular controller, switching

[zfs-discuss] Re: ZFS Usage in Warehousing (lengthy intro)

2006-12-09 Thread Anton B. Rang
If your database performance is dominated by sequential reads, ZFS may not be the best solution from a performance perspective. Because ZFS uses a write-anywhere layout, any database table which is being updated will quickly become scattered on the disk, so that sequential read patterns become

[zfs-discuss] Re: Netapp to Solaris/ZFS issues

2006-12-12 Thread Anton B. Rang
NetApp can actually grow their RAID groups, but they recommend adding an entire RAID group at once instead. If you add a disk to a RAID group on NetApp, I believe you need to manually start a reallocate process to balance data across the disks.

[zfs-discuss] Re: ZFS Storage Pool advice

2006-12-12 Thread Anton B. Rang
Are you looking purely for performance, or for the added reliability that ZFS can give you? If the latter, then you would want to configure across multiple LUNs in either a mirrored or RAID configuration. This does require sacrificing some storage in exchange for the peace of mind that any

[zfs-discuss] Re: Re: Sol10u3 -- is du bug fixed?

2006-12-12 Thread Anton B. Rang
Is there an easy way to determine whether a pool has this fix applied or not?

[zfs-discuss] Re: ZFS behavior under heavy load (I/O that is)

2006-12-12 Thread Anton B. Rang
I think you may be observing that fsync() is slow. The file will be written, and visible to other processes via the in-memory cache, before the data has been pushed to disk. vi forces the data out via fsync, and that can be quite slow when the file system is under load, especially before a fix

[zfs-discuss] Re: Uber block corruption?

2006-12-12 Thread Anton B. Rang
Also note that the UB is written to every vdev (4 per disk) so the chances of all UBs being corrupted is rather low. The chances that they're corrupted by the storage system, yes. However, they are all sourced from the same in-memory buffer, so an undetected in-memory error (e.g. kernel bug)

[zfs-discuss] Re: ZFS Storage Pool advice

2006-12-12 Thread Anton B. Rang
We're looking for pure performance. What will be contained in the LUNs is student user account files that they will access, and department share files like MS Word documents, Excel files, and PDFs. There will be no applications on the ZFS storage pools or pool. Does this help on what strategy

[zfs-discuss] Re: ZFS and write caching (SATA)

2006-12-12 Thread Anton B. Rang
It took manufacturers of SCSI drives some years to get this right. Around 1997 or so we were still seeing drives at my former employer that didn't properly flush their caches under all circumstances (and had other interesting behaviours WRT caching). Lots of ATA disks never did bother to

[zfs-discuss] Re: Kickstart hot spare attachment

2006-12-12 Thread Anton B. Rang
If the SCSI commands hang forever, then there is nothing that ZFS can do, as a single write will never return. The more likely case is that the commands are continually timing out with very long response times, and ZFS will continue to talk to them forever. It looks like the sd driver

[zfs-discuss] Re: Re: Disappearing directories

2006-12-15 Thread Anton B. Rang
The implication in what you've written is that ZFS doesn't report an error if it detects an invalid checksum. Is that correct? No, sorry I wasn't more clear. ZFS detects and reports the invalid checksum. If the checksum error occurs on a directory, this can result in an error being returned

[zfs-discuss] Re: Re: Disappearing directories

2006-12-15 Thread Anton B. Rang
Just to make sure there's no confusion ;-), this error message was added to 'ls' after Solaris 10, and hasn't been backported yet. (Bug 4985395, *ls* does not report errors from getdents().)

[zfs-discuss] Re: ZFS and SE 3511

2006-12-18 Thread Anton B. Rang
I have a Sun SE 3511 array with 5 x 500 GB SATA-I disks in a RAID 5. This 2 TB logical drive is partitioned into 10 x 200GB slices. I gave 4 of these slices to a Solaris 10 U2 machine and added each of them to a concat (non-raid) zpool as listed below: This is certainly a supportable

[zfs-discuss] Re: Mailing list issues (Re: Re: [security-discuss] Thoughts on ZFS Secure Delete - without using Crypto)

2006-12-18 Thread Anton B. Rang
BTW, Jeff's posts to zfs-discuss are being rejected with this message [ ... ] ... while the spam is coming through loud and clear. ;-)

[zfs-discuss] Re: ZFS in a SAN environment

2006-12-19 Thread Anton B. Rang
I thought this is what the T10 OSD spec was set up to address. We've already got device manufacturers beginning to design and code to the spec. Precisely. The interface to block-based devices forces much of the knowledge that the file system and application have about access patterns to be

[zfs-discuss] Re: Re[2]: ZFS in a SAN environment

2006-12-19 Thread Anton B. Rang
INFORMATION: If a member of this striped zpool becomes unavailable or develops corruption, Solaris will kernel panic and reboot to protect your data. OK, I'm puzzled. Am I the only one on this list who believes that a kernel panic, instead of EIO, represents a bug?

[zfs-discuss] Re: !

2006-12-22 Thread Anton B. Rang
Unfortunately there are some cases where the disks lose data; these cannot be detected by traditional filesystems, but can be with ZFS:
* bit rot: some bits on the disk get flipped (~ 1 in 10^11)
* phantom writes: a disk 'forgets' to write data (~ 1 in 10^8)
* misdirected reads/writes: disk

[zfs-discuss] Re: Re: zfs list and snapshots..

2006-12-22 Thread Anton B. Rang
Do you have more than one snapshot? If you have a file system a, and create two snapshots [EMAIL PROTECTED] and [EMAIL PROTECTED], then any space shared between the two snapshots does not get accounted for anywhere visible. Only once one of those two is deleted, so that all the space is
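A small illustration of that accounting, with placeholder names:
  zfs snapshot tank/a@snap1
  zfs snapshot tank/a@snap2
  # space referenced by both snapshots is charged to neither USED column
  zfs list -t snapshot
  # once one snapshot is destroyed, the shared space is uniquely referenced
  # by the survivor and finally shows up in its USED column
  zfs destroy tank/a@snap1
  zfs list -t snapshot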

[zfs-discuss] Re: Re: ZFS and SE 3511

2006-12-23 Thread Anton B. Rang
Hmm... But, how is my current configuration (1 striped zpool consisting of 4 x 200 GB luns from a hardware RAID 5 logical drive) analogous to taking a single disk, partitioning it into several partitions, then striping across those partitions if each 200 GB lun is presented to solaris as a

[zfs-discuss] Re: Re: [security-discuss] Thoughts on ZFS Secure Delete - without using Crypto

2007-01-01 Thread Anton B. Rang
Good point. Verifying that the new überblock is readable isn’t actually sufficient, since it might become unreadable in the future. You’d need to wait for several transaction groups, until the block was unreachable by the oldest remaining überblock, to be safe in this sense. On the other

[zfs-discuss] Re: Re[2]: RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Anton B. Rang
In our recent experience RAID-5, due to the 2 reads, an XOR calc and a write op per write instruction, is usually much slower than RAID-10 (two write ops). Any advice is greatly appreciated. RAIDZ and RAIDZ2 do not suffer from this malady (the RAID5 write hole). 1. This isn't the write

[zfs-discuss] Re: RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Anton B. Rang
Is there some reason why a small read on a raidz2 is not statistically very likely to require I/O on only one device? Assuming a non-degraded pool of course. ZFS stores its checksums for RAIDZ/RAIDZ2 in such a way that all disks must be read to compute and verify the checksum. This

[zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Anton B. Rang
What happens when a sub-block is missing (single disk failure)? Surely it doesn't have to discard the entire checksum and simply trust the remaining blocks? The checksum is over the data, not the data+parity. So when a disk fails, the data is first reconstructed, and then the block checksum

[zfs-discuss] Re: ZFS direct IO

2007-01-05 Thread Anton B. Rang
DIRECT IO is a set of performance optimisations to circumvent shortcomings of a given filesystem. Direct I/O as generally understood (i.e. not UFS-specific) is an optimization which allows data to be transferred directly between user data buffers and disk, without a memory-to-memory copy.

[zfs-discuss] Re: Solid State Drives?

2007-01-05 Thread Anton B. Rang
If [SSD or Flash] devices become more prevalent, and/or cheaper, I'm curious what ways ZFS could be made to best take advantage of them? The intent log is a possibility, but this would work better with SSD than Flash; Flash writes can actually be slower than sequential writes to a real disk.

[zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-05 Thread Anton B. Rang
It's not about the checksum but about how a fs block is stored in raid-z[12] case - it's spread out to all non-parity disks so in order to read one fs block you have to read from all disks except parity disks. However, if we didn't need to verify the checksum, we wouldn't have to read the

[zfs-discuss] Re: Re: Solid State Drives?

2007-01-05 Thread Anton B. Rang
Summary (1.8-inch form factor): write: 35 MB/s, read: 62 MB/s, IOPS: 7,000. That is on par with a 5400 rpm disk, except for the 100x more small, random read IOPS. The biggest issue is the pricing, which will become interestingly competitive for mortals this year. $600+ for a 32 GB device

[zfs-discuss] Re: Re: Re: Solid State Drives?

2007-01-06 Thread Anton B. Rang
$600+ for a 32 GB device isn't exactly competitive, though the low-power and random access are attractive. Look at previous SSD offerings. $600 is a steal. ;) This isn't a performance-oriented SSD, since it's using Flash RAM (limited lifetime, slow writes). It's really meant as a hard

[zfs-discuss] Re: Implementation Question

2007-01-17 Thread Anton B. Rang
Turnaround question - why *should* ZFS define an underlying storage arrangement at the filesystem level? It would be nice to provide it at the directory hierarchy level, but since file systems in ZFS are cheap, providing it at the file system level instead might be reasonable. (I say might be

[zfs-discuss] Re: Re: Re: Heavy writes freezing system

2007-01-17 Thread Anton B. Rang
Yes, Anantha is correct that is the bug id, which could be responsible for more disk writes than expected. I believe, though, that this would explain at most a factor of 2 of write expansion (user data getting pushed to disk once in the intent log, then again in its final location). If the

[zfs-discuss] Re: External drive enclosures + Sun Server for mass storage

2007-01-20 Thread Anton B. Rang
To me, hard drives today are as much a commodity item as network cable, GBICs, NICs, DVD drives, etc. They are and they aren't. Reliability, particularly in high-heat, high-vibration environments, can vary quite a bit. For Sun to charge 4-8 times street price for hard drives that they order just

[zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Anton B. Rang
1. How stable is ZFS? It's a new file system; there will be bugs. It appears to be well-tested, though. There are a few known issues; for instance, a write failure can panic the system under some circumstances. UFS has known issues too. 2. Recommended config. Above, I have a fairly

[zfs-discuss] Re: high density SAS

2007-01-26 Thread Anton B. Rang
How badly can you mess up a JBOD? Two words: vibration, cooling. Three more: power, signal quality. I've seen even individual drive cases with bad enough signal quality to cause bit errors.

[zfs-discuss] Re: hot spares - in standby?

2007-02-02 Thread Anton B. Rang
Often, the spare is up and running but for whatever reason you'll have a bad block on it and you'll die during the reconstruct. Shouldn't SCSI/ATA block sparing handle this? Reconstruction should be purely a matter of writing, so bit rot shouldn't be an issue; or are there cases I'm not

[zfs-discuss] Re: ZFS panic on B54

2007-02-02 Thread Anton B. Rang
The affected DIMM? Did you have memory errors before this? The message you posted looked like ZFS encountered an error writing to the drive (which could, admittedly, have been caused by bad memory).

[zfs-discuss] Re: ZFS and question on repetative data migrating to it efficiently...

2007-02-02 Thread Anton B. Rang
In general, your backup software should handle making incremental dumps, even from a split mirror. What are you using to write data to tape? Are you simply dumping the whole file system, rather than using standard backup software? ZFS snapshots use a pure copy-on-write model. If you have a
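If the backup tool can consume a stream, the copy-on-write snapshots also give you incremental dumps directly (names and paths are placeholders):
  zfs snapshot tank/data@monday
  # ... a day of changes later ...
  zfs snapshot tank/data@tuesday
  # send only the blocks that changed between the two snapshots
  zfs send -i tank/data@monday tank/data@tuesday > /backup/data.mon-tue.incr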

[zfs-discuss] Re: Best Practises = Keep Pool Below 80%?

2007-02-13 Thread Anton B. Rang
The space management algorithms in many file systems don't always perform well when they can't find a free block of the desired size. There's often a cliff where on average, once the file system is too full, performance drops off exponentially. UFS deals with this by reserving space explicitly

[zfs-discuss] Re: Google paper on disk reliability

2007-02-20 Thread Anton B. Rang
It turns out that even rather poor prediction accuracy is good enough to make a big difference (10x) in the failure probability of a RAID system. See Gordon Hughes and Joseph Murray, Reliability and Security of RAID Storage Systems and D2D Archives Using SATA Disk Drives, ACM Transactions on

[zfs-discuss] Re: ZFS checksum error detection

2007-03-16 Thread Anton B. Rang
It's possible (if unlikely) that you are only getting checksum errors on metadata. Since ZFS always internally mirrors its metadata, even on non-redundant pools, it can recover from metadata corruption which does not affect all copies. (If there is only one LUN, the mirroring happens at
