Re: [zfs-discuss] dedup accounting anomaly / dedup experiments

2010-07-03 Thread Lutz Schumann
Actually it does if you have compression turned on and the blocks compress away to 0 bytes. See http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zio.c#zio_write_bp_init, specifically line 1005: if (psize == 0) { ...
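A quick way to observe this behaviour (a sketch; the dataset and file names are made up): with compression on, a file of zeros ends up occupying essentially no space.
# zfs create -o compression=on rpool/ztest
# dd if=/dev/zero of=/rpool/ztest/zeros bs=1024k count=1024
# sync; zfs get used,compressratio rpool/ztest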

Re: [zfs-discuss] dedup accounting anomaly / dedup experiments

2010-07-02 Thread Lutz Schumann
"Hi, I don't know about the rest of your test, but writing zeroes to a ZFS filesystem is probably not a very good test, because ZFS recognizes these blocks of zeroes and doesn't actually write anything. Unless maybe encryption is on, but maybe not even then." Not true. If I want ZFS to

[zfs-discuss] dedup accounting anomaly / dedup experiments

2010-07-01 Thread Lutz Schumann
Hello list, I wanted to test deduplication a little and did an experiment. My question was: can I dedup infinitely, or is there an upper limit? So for that I did a very basic test. - I created a ramdisk pool (1 GB) - enabled dedup and - wrote zeros to it (in one single file) until an error is
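A minimal sketch of such a test (ramdisk, pool, and file names are hypothetical):
# ramdiskadm -a ddtest 1g
# zpool create ddpool /dev/ramdisk/ddtest
# zfs set dedup=on ddpool
# dd if=/dev/zero of=/ddpool/zeros bs=128k        (run until the write fails)
# zpool list ddpool; zdb -DD ddpool               (dedup ratio and DDT statistics)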

[zfs-discuss] ZFS synchronous and asynchronous I/O

2010-05-28 Thread Lutz Schumann
Hello ZFS gurus on the list :) I started with ZFS approx. 1.something years ago and have been following the discussions here for some time now. What confused me all the time is the different parameters and ZFS tunables and how they affect data integrity and availability. Now I took some time and tried

[zfs-discuss] ZIL Ramdisk Tests and very poor OpenSolaris Ramdisk Performance

2010-05-12 Thread Lutz Schumann
Hello, probably a lot of people have done this, now it's my turn. I wanted to test the performance of COMSTAR over 8 Gb FC. My idea was to create a pool from a ramdisk, a thin-provisioned zvol over it, and do some benchmarks. However, performance is worse than to the disk backend. So I measured
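A sketch of the setup described (names are hypothetical; the COMSTAR view/target-group configuration is omitted):
# ramdiskadm -a rdtest 900m
# zpool create rdpool /dev/ramdisk/rdtest
# zfs create -s -V 2g rdpool/lun0
# sbdadm create-lu /dev/zvol/rdsk/rdpool/lun0
# stmfadm add-view <lu-guid>                      (GUID as printed by sbdadm create-lu)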

[zfs-discuss] ZFS Disk Drive Qualification

2010-05-09 Thread Lutz Schumann
Hello, I see strange behaviour when qualifying disk drives for ZFS. The tests I want to run should make sure that the drives honour the cache flush command. For this I do the following: 1) Create single-disk pools (only one disk in the pool) 2) Perform I/O on the pools. This is done via SQLite
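Step 1 might look like this (device and pool names are hypothetical; the SQLite-driven I/O and the cache-flush check are as described in the post):
# zpool create qual01 c1t0d0
# zpool create qual02 c1t1d0
# zpool status -v qual01 qual02                   (check for errors after the test run)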

Re: [zfs-discuss] Freeing unused space in thin provisioned zvols

2010-05-08 Thread Lutz Schumann
I have to come back to this issue after a while because it just hit me. I have a VMware vSphere 4 test host. I have various machines in there to do tests for performance and other stuff. So a lot of I/O benchmarks are done and a lot of data is created during these benchmarks. The vSphere test

Re: [zfs-discuss] Exporting iSCSI - it's still getting all the ZFS protection, right?

2010-05-08 Thread Lutz Schumann
Everything that has reached the storage will be written to disk as sent. However, watch out for the writeback cache setting of COMSTAR. If you enable a writeback cache AND your machine boots very fast (< 2 minutes), you may have data integrity issues because Windows thinks the target was just
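For reference, a hedged sketch of checking and changing that setting with stmfadm (the wcd property, i.e. write cache disabled, is from memory and worth verifying; the LU GUID is a placeholder):
# stmfadm list-lu -v                        (look for the "Writeback Cache" line per LU)
# stmfadm modify-lu -p wcd=true <lu-guid>   (disable the writeback cache for that LU)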

Re: [zfs-discuss] Spare in use although disk is healthy?

2010-05-02 Thread Lutz Schumann
Hello, thanks for the feedback and sorry for the delay in answering. I checked the log and fmadm. It seems the log does not show changes, however fmadm shows: Apr 23 2010 18:32:26.363495457 ereport.io.scsi.cmd.disk.dev.rqs.derr Apr 23 2010 18:32:26.363482031
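To dig further into such reports (generic FMA commands, not specific to this case):
# fmdump -e          (list the error reports with timestamps)
# fmdump -eV         (full ereport detail, including the affected device path)
# fmadm faulty       (any faults actually diagnosed from those ereports)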

Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-05-01 Thread Lutz Schumann
I was going through this posting and it seems that there is some personal tension :). However, going back to the technical problem of scrubbing a 200 TB pool, I think this issue needs to be addressed. One warning up front: this writing is rather long, and if you'd like to jump to the part dealing

[zfs-discuss] Spare in use although disk is healthy?

2010-04-26 Thread Lutz Schumann
Hello list, a pool shows some strange status: volume: zfs01vol state: ONLINE scrub: scrub completed after 1h21m with 0 errors on Sat Apr 24 04:22:38 2010 config: NAME STATE READ WRITE CKSUM zfs01vol ONLINE 0 0 0 mirror ONLINE

[zfs-discuss] Which zfs options are replicated

2010-04-04 Thread Lutz Schumann
Hello list, I started playing around with COMSTAR in snv_134. In the snv_116 version of ZFS, a new hidden property for the COMSTAR metadata was introduced (stmf_sbd_lu). This makes it possible to migrate from the legacy iSCSI target daemon to COMSTAR without data loss, which is great. Before

Re: [zfs-discuss] RAID-Z with Permanent errors detected in files

2010-04-02 Thread Lutz Schumann
"I guess it will then remain a mystery how this happened, since I'm very careful when engaging the commands and I'm sure that I didn't miss the raidz parameter." You can be sure by calling zpool history. Robert
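For example (pool name hypothetical):
# zpool history -il tank     (-i adds internal events, -l adds user and host per entry)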

Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-03-30 Thread Lutz Schumann
Hello, wanted to know if there are any updates on this topic ? Regards, Robert

[zfs-discuss] ZFS prioritization for filesystems / zvols?

2010-03-07 Thread Lutz Schumann
Hello list, when consolidating storage services, it may be required to prioritize I/O, e.g. the important SAP database gets all the I/O we can deliver and that it needs, while the test systems should use what's left. While this is a difficult topic in disk-based systems (even little I/O with long

[zfs-discuss] Clear vdev information from disk

2010-02-28 Thread Lutz Schumann
Hello list, it is damn difficult to destroy ZFS labels :) I am trying to remove the vdev labels of disks previously used in a pool. According to http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/ondiskformat0822.pdf I created a script that removes the first 512 KB and the last 512 KB,
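The script itself was not posted; a sketch of the approach it describes (device name hypothetical, and the seek value has to be computed from the device size):
# dd if=/dev/zero of=/dev/rdsk/c1t0d0s0 bs=512k count=1
# dd if=/dev/zero of=/dev/rdsk/c1t0d0s0 bs=512k count=1 seek=<device size in 512 KB blocks, minus 1>
The first command overwrites labels L0/L1 at the front of the device, the second L2/L3 at the end.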

[zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Lutz Schumann
Hello list, ZFS can be used for both file-level (zfs) and block-level access (zvol). When using zvols, those are always thin provisioned (space is allocated on first write). We use zvols with COMSTAR to do iSCSI and FC access - and excuse me in advance - but this may also be a more COMSTAR
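For reference, a sparse zvol sketch (names hypothetical):
# zfs create -s -V 100g tank/lun0
# zfs get volsize,used,refreservation tank/lun0    (used stays near zero until blocks are written)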

Re: [zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Lutz Schumann
This would be an idea and I thought about this. However, I see the following problems: 1) Using deduplication: this will reduce the on-disk size, however the DDT will grow forever, and for the deletion of zvols this will mean a lot of time and work (see other threads regarding DDT memory issues

Re: [zfs-discuss] Recommendations for an l2arc device?

2010-02-26 Thread Lutz Schumann
I use the Intel X25-V and I like it :) Actually I have 2 in a striped setup. 40 MB/sec write (just enough for ZIL filling), something like 130 MB/sec reads. Just enough.

Re: [zfs-discuss] Recommendations for an l2arc device?

2010-02-26 Thread Lutz Schumann
"... with the Intel product... but save a few more pennies up and get the X25-M. The extra boost on read and write performance is worth it." Or use multiple X25-V (the L2ARC is not filled fast anyhow, so write speed does not matter). You can get 4 of them for 1 160 GB X25-M. With 4 X25-V you get ~500 MB/sec

Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-24 Thread Lutz Schumann
I fully agree. This needs fixing. I can think of so many situations where device names change in OpenSolaris (especially with movable pools). This problem can lead to serious data corruption. Besides persistent L2ARC (which is much more difficult, I would say) - making the L2ARC also rely on

[zfs-discuss] Is there something like udev in OpenSolaris

2010-02-20 Thread Lutz Schumann
Hello list, being a Linux guy I'm actually quite new to OpenSolaris. One thing I miss is udev. I found that when using SATA disks with ZFS, it always requires manual intervention (cfgadm) to do SATA hot plug. I would like to automate the disk replacement, so that it is a fully automatic
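The manual steps in question look roughly like this (attachment point names are hypothetical):
# cfgadm -al | grep sata                (list SATA attachment points and their state)
# cfgadm -c configure sata1/3           (bring a newly inserted disk to 'configured')
# cfgadm -c unconfigure sata1/3         (before pulling a disk)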

[zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Lutz Schumann
Hello, an idea popped into my mind while talking about security and intrusion detection. Host-based intrusion detection may use checksumming for file change tracking. It works like this: once installed and knowing the software is OK, a baseline is created. Then in every check, verify the current status

Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Lutz Schumann
"Only with the zdb(1M) tool, but note that the checksums are NOT of files but of the ZFS blocks." Thanks - blocks, right (doh) - that's what I was missing. Damn, it would be so nice :(

[zfs-discuss] Disk devices missing but ZFS uses them?

2010-02-07 Thread Lutz Schumann
Hello, I have a strange issue. I have a setup with a 24-disk enclosure connected with an LSI 3801-R. I created two pools. The pools have 24 healthy disks. I disabled LUN persistency on the LSI adapter. When I cold boot the server (power off by pulling all power cables), a warning is shown on the

Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-01 Thread Lutz Schumann
I tested some more and found that pool disks are picked up. Head1: Cachedevice1 (c0t0d0); Head2: Cachedevice2 (c0t0d0); Pool: shared, c1tXdY. I created a pool on shared storage, added the cache device on Head1, then switched the pool to Head2 (export + import). Created a pool on head1 containing

Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-01 Thread Lutz Schumann
"Created a pool on head1 containing just the cache device (c0t0d0)." This is not possible, unless there is a bug. You cannot create a pool with only a cache device. I have verified this on b131: # zpool create norealpool cache /dev/ramdisk/rc1 fails with: invalid vdev

Re: [zfs-discuss] b131 OpenSol x86, 4 disk RAIDZ-1 scrub performance way down.

2010-02-01 Thread Lutz Schumann
When you send data (as the mate did), all data is rewritten and the settings you made (dedup etc.) are effectively applied. If you change a parameter (dedup, compression), this only holds true for NEWLY written data. If you do not change data, all data is still duplicated. Also, when you send, all
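A sketch of rewriting existing data via send/receive so the new settings take effect (names hypothetical):
# zfs set dedup=on tank/data                                   (affects newly written blocks only)
# zfs snapshot tank/data@rewrite
# zfs send tank/data@rewrite | zfs receive tank/data_dedup     (the received copy is written with the current settings)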

Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Lutz Schumann
Some very interesting insights on the availability calculations: http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl For streaming also look at: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6732803 Regards, Robert

Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-28 Thread Lutz Schumann
Actually I tested this. If I add an L2ARC device to the syspool, it is not used when issuing I/O to the data pool (note: on the root pool it must not be a whole disk, but only a slice of it, otherwise ZFS complains that root disks may not contain an EFI label). So this does not work -

Re: [zfs-discuss] Sun Storage J4400 SATA Interposer Card

2010-01-28 Thread Lutz Schumann
No picture, but something like this: http://www.provantage.com/supermicro-aoc-smp-lsiss9252~7SUP91MC.htm ?

[zfs-discuss] Large scale ZFS deployments out there (>200 disks)

2010-01-28 Thread Lutz Schumann
While thinking about ZFS as the next generation filesystem without limits I am wondering if the real world is ready for this kind of incredible technology ... I'm actually speaking of hardware :) ZFS can handle a lot of devices. Once in the import bug

Re: [zfs-discuss] raidz using partitions

2010-01-28 Thread Lutz Schumann
Also write performance may drop because of write cache disable: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools Just a hint, have not tested this. Robert

Re: [zfs-discuss] ZFS cache flush ignored by certain devices ?

2010-01-25 Thread Lutz Schumann
One problem with the write cache is that I do not know if it is needed for wear leveling. As mentioned, disabling the write cache might be OK in terms of performance (I want to use MLC SSDs as data disks, not as ZIL, to have an SSD-only appliance - I'm looking for read speed for dedup, zfs send

Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-24 Thread Lutz Schumann
Thanks for the feedback, Richard. Does that mean that the L2ARC can be part of ANY pool and that there is only ONE L2ARC for all pools active on the machine? Thesis: - there is one L2ARC on the machine for all pools - all active pools share the same L2ARC - the L2ARC can be part of any

[zfs-discuss] Drive Identification

2010-01-24 Thread Lutz Schumann
Is there a way (besides format and causing heavy I/O on the device in question) to identify a drive? Is there some kind of SES (enclosure service) support for this? (e.g. "now let the red LED blink") Regards, Robert

[zfs-discuss] Degraded pool members excluded from writes?

2010-01-24 Thread Lutz Schumann
Hello, I'm testing with snv_131 (NexentaCore 3 alpha 4). I did a bonnie benchmark on my disks and pulled a disk while benchmarking. Everything went smoothly, however I found that the now degraded device is excluded from the writes. So this is my pool after I have pulled the disk: pool:

Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-23 Thread Lutz Schumann
Hi, I found some time and was able to test again: - verify with the unique uid of the device - verify with autoreplace=off. Indeed, autoreplace was set to yes for the pools, so I disabled autoreplace: VOL PROPERTY VALUE SOURCE nxvol2 autoreplace off default

[zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-20 Thread Lutz Schumann
Hello, we tested clustering with ZFS and the setup looks like this: - 2 head nodes (nodea, nodeb) - the head nodes contain L2ARC devices (nodea_l2arc, nodeb_l2arc) - two external JBODs - two mirror zpools (pool1, pool2) - each mirror is a mirror of one disk from each JBOD - no ZIL (anyone knows

Re: [zfs-discuss] Mirror of SAN Boxes with ZFS ? (split site mirror)

2010-01-20 Thread Lutz Schumann
Actually I found some time (and a reason) to test this. Environment: - 1 OpenSolaris server - one SLES10 iSCSI target - two LUNs exported via iSCSI to the OpenSolaris server. I did some resilver tests to see how ZFS resilvers devices. Prep: on the OpenSolaris box, create a pool (myiscsi) with one mirror pair made from the

Re: [zfs-discuss] Is the disk a member of a zpool?

2010-01-15 Thread Lutz Schumann
The on-disk layout is shown here: http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/ondiskformat0822.pdf You can use the name/value pairs in the vdev label (I guess). Unfortunately I do not know of any scripts.
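One option that needs no script (device name hypothetical): zdb can dump the labels directly.
# zdb -l /dev/rdsk/c1t0d0s0     (prints the four vdev labels as name/value pairs: pool name, guids, vdev tree)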

Re: [zfs-discuss] ZFS cache flush ignored by certain devices ?

2010-01-11 Thread Lutz Schumann
Maybe it got lost in this much text :) .. thus this re-post. Does anyone know the impact of disabling the write cache on the write amplification factor of the Intel SSDs? How can I permanently disable the write cache on the Intel X25-M SSDs? Thanks, Robert
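One way to toggle the drive write cache from Solaris is format in expert mode (menu names from memory and worth verifying; whether the setting survives a power cycle depends on the drive):
# format -e                 (then select the SSD from the disk list)
format> cache
cache> write_cache
write_cache> disable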

Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-11 Thread Lutz Schumann
Ok, tested this myself ... (same hardware used for both tests). OpenSolaris snv_104 (actually Nexenta Core 2), 100 snaps: r...@nexenta:/volumes# time for i in $(seq 1 100); do zfs snapshot ssd/v...@test1_$i; done gives real 0m24.991s, user 0m0.297s, sys 0m0.679s. Import:

Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-11 Thread Lutz Schumann
Since you mention the fixes/bugs, I have a more general question: is there a way to see all commits to OSOL that are related to a bug report? Background: I'm interested in how e.g. the zfs import bug was fixed.

Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-11 Thread Lutz Schumann
.. however ... a lot of snaps still have an impact on system performance. After the import of the 1 snaps volume, I saw devfsadm eating up all CPU. If you are snapshotting ZFS volumes, then each will create an entry in the device tree. In other words, if these were file systems

Re: [zfs-discuss] internal backup power supplies?

2010-01-11 Thread Lutz Schumann
Actually for the ZIL you may use the ACard (RAM-based SATA disk + BBU + CompactFlash write-out). For the data disks there is no solution yet - would be nice. However, I prefer the supercapacitor-on-disk method. Why? Because the recharge logic is challenging. There needs to be communication

[zfs-discuss] ZFS cache flush ignored by certain devices ?

2010-01-10 Thread Lutz Schumann
A very interesting thread (http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/) and some thinking about the design of SSDs led to an experiment I did with the Intel X25-M SSD. The question was: is my data safe once it has reached the

Re: [zfs-discuss] ZFS cache flush ignored by certain devices ?

2010-01-10 Thread Lutz Schumann
I managed to disable the write cache (I did not know of a tool on Solaris, however hdadm from the EON NAS binary kit does the job): same power disruption test with a Seagate HDD and write cache disabled ... ---

Re: [zfs-discuss] ZFS cache flush ignored by certain devices ?

2010-01-10 Thread Lutz Schumann
Actually the performance decrease when disabling the write cache on the SSD is approx. 3x (i.e. 66%). Setup: node1 = Linux client with open-iscsi; server = COMSTAR (cache=write through) + zvol (recordsize=8k, compression=off) --- with the SSD disk write cache disabled: node1:/mnt/ssd# iozone

Re: [zfs-discuss] ssd pool + ssd cache ?

2010-01-09 Thread Lutz Schumann
Depends. a) Pool design: 5 x SSD as RAID-Z = the space of 4 SSDs with the read I/O performance of one drive. Adding 5 cheap 40 GB L2ARC devices (which are pooled) increases the read performance for your working window of 200 GB. If you have a pool of mirrors, adding L2ARC does not make sense. b) SSD type: is

Re: [zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)

2010-01-09 Thread Lutz Schumann
I finally managed to resolve this. I received some useful info from Richard Elling (without list CC): (ME) However, I still think the plain IDE driver also needs a timeout to handle disk failures, because cables etc. can fail. (Richard) Yes, this is a little bit odd. The sd driver should be in

[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)

2010-01-08 Thread Lutz Schumann
Hello, today I wanted to test that the failure of the L2ARC device is not crucial to the pool. I added an Intel X25-M Postville (160 GB) as a cache device to a 54-disk mirror pool. Then I started a SYNC iozone on the pool: iozone -ec -r 32k -s 2048m -l 2 -i 0 -i 2 -o Pool: pool mirror-0

Re: [zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)

2010-01-08 Thread Lutz Schumann
Ok, I now waited 30 minutes - still hung. After that I also pulled the SATA cable to the L2ARC device - still no success (I waited 10 minutes). After 10 minutes I put the L2ARC device back (SATA + power); 20 seconds after that the system continued to run. dmesg shows: Jan 8 15:41:57

Re: [zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)

2010-01-08 Thread Lutz Schumann
Ok, after browsing I found that the SATA disks are not shown via cfgadm. I found http://opensolaris.org/jive/message.jspa?messageID=287791&tstart=0 which states that you have to set the mode to AHCI to enable hot-plug etc. However, I still think the plain IDE driver also needs a timeout to handle

Re: [zfs-discuss] ZFS Dedup Performance

2010-01-08 Thread Lutz Schumann
See the reads on the pool with the low I/O? I suspect reading the DDT causes the writes to slow down. See this bug: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566. It seems to give some background. Can you test setting primarycache=metadata on the volume you test?
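The suggested test (dataset name hypothetical):
# zfs set primarycache=metadata tank/testvol      (ARC then caches only metadata for this dataset)
# zfs get primarycache,secondarycache tank/testvol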

Re: [zfs-discuss] ZFS Web Administration?

2010-01-07 Thread Lutz Schumann
You could use NexentaStor (www.nexenta.com), which is a commercial storage appliance; it is based on OpenSolaris, but it is not just a package to install. Also there is EON NAS (www.genunix.org).

Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-06 Thread Lutz Schumann
Snapshots do not impact write performance. Deletion of snapshots also seems to take roughly constant time per snapshot (total time = number of snapshots x some constant). However, see http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6761786. When importing a pool with many snapshots (which happens

[zfs-discuss] Mirror of SAN Boxes with ZFS ? (split site mirror)

2009-12-22 Thread Lutz Schumann
Hello, I'm thinking about a setup that looks like this: - 2 headnodes with FC connectivity (OpenSolaris) - 2 backend FC srtorages (Disk Shelves with RAID Controllers presenting a huge 15 TB RAID5) - 2 datacenters (distance 1 km with dark fibre) - one headnode and one storage in each data