Re: [zfs-discuss] improve meta data performance

2010-02-19 Thread Kjetil Torgrim Homme
Chris Banal cba...@gmail.com writes:

 We have a SunFire X4500 running Solaris 10U5 which does about 5-8k nfs
 ops, of which about 90% are meta data. In hindsight it would have been
 significantly better to use a mirrored configuration, but we opted for
 4 x (9+2) raidz2 at the time. We cannot take the downtime necessary
 to change the zpool configuration.

 We need to improve the meta data performance with little to no
 money. Does anyone have any suggestions?

I believe the latest Solaris update will improve metadata caching.
It's always good to be up to date on patches, no?

 Is there such a thing as a Sun supported NVRAM PCI-X card compatible
 with the X4500 which can be used as an L2ARC?

I think they only have PCIe, and it hardly qualifies as little to no
money.

  http://www.sun.com/storage/disk_systems/sss/f20/specs.xml

I'll second the recommendations for Intel X25-M for L2ARC if you can
spare a SATA slot for it.
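
For what it's worth, adding an SSD as L2ARC is a one-liner (pool and device
names below are only placeholders):

  zpool add tank cache c5t0d0   # attach the SSD as a cache (L2ARC) device
  zpool iostat -v tank          # the cache vdev and its usage show up here
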
-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS mirrored boot disks

2010-02-19 Thread Terry Hull
Interestingly, with the machine running, I can pull the first drive in the
mirror, replace it with an unformatted one, format it, mirror rpool over to it,
install the boot loader, and at that point the machine will boot with no
problems. It's just when the first disk is missing that I have a problem with
it.

--
Terry
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS mirrored boot disks

2010-02-19 Thread Fajar A. Nugraha
On Fri, Feb 19, 2010 at 7:42 PM, Terry Hull t...@nrg-inc.com wrote:
 Interestingly, with the machine running, I can pull the first drive in the 
 mirror, replace it with an unformatted one, format it, mirror rpool over to 
 it, install the boot loader, and at that point the machine will boot with no 
 problems. It's just when the first disk is missing that I have a problem 
 with it.

I had a problem cloning a disk for an xVM domU where it hung just after
displaying the hostname, similar to your result. I had to boot from a livecd,
force-import and export the pool, and reboot. That worked, so you
might want to try it.
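
In outline (assuming the root pool is named rpool):

  # booted from the live CD / installation media
  zpool import -f rpool    # force-import the pool that was never cleanly exported
  zpool export rpool       # export it so the installed system imports it cleanly
  reboot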

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Growing ZFS Volume with SMI/VTOC label

2010-02-19 Thread Tony MacDoodle
Is it possible to grow a ZFS volume on a SPARC system with a SMI/VTOC label
without losing data as the OS is built on this volume?


Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disk controllers changing the names of disks

2010-02-19 Thread Markus Kovero
 I am curious how admins are dealing with controllers like the Dell Perc 5 and
 6 that can change the device name on a disk if a disk fails and the machine
 reboots. These controllers are not nicely behaved in that they happily fill
 in the device numbers for the physical drive that is missing. In that case,
 how can you recover the zpool that was on the disk? I understand if the pool
 was exported, you can then re-import it. However, what happens if the machine
 completely dies and you have no chance to export the pool?

 --
 Terry
 -- 

You can still import it, although you might lose some in-flight data that was
being written during the crash, and the import can take a while to finish
transactions. Otherwise it will be fine.
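
Roughly (pool name is just an example); ZFS finds the pool by its on-disk
labels, so the renamed controller devices don't matter:

  zpool import          # scan devices and list importable pools
  zpool import -f tank  # force the import, since the pool was never exported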

Yours
Markus Kovero

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing ZFS Volume with SMI/VTOC label

2010-02-19 Thread Tony MacDoodle
So in a ZFS boot disk configuration (rpool) in a running environment, it's
not possible?

On Fri, Feb 19, 2010 at 9:25 AM, casper@sun.com wrote:



 Is it possible to grow a ZFS volume on a SPARC system with a SMI/VTOC
 label
 without losing data as the OS is built on this volume?


 Sure, as long as the new partition starts on the same block and is longer.

 It was a bit more difficult with UFS, but for zfs it is very simple.

 I had a few systems with two ufs root slices using live upgrade:

        <slice 1><slice 2><swap>

 First I booted from <slice 2>
 ludelete "slice1"
 zpool create rpool "slice1"
 lucreate -p rpool
 luactivate slice1
 init 6
 from the zfs root:
 ludelete slice2
 format:
         remove slice2;
         grow slice1 to incorporate slice2
         label

 At that time I needed to reboot to get the new device size reflected in
 zpool list; today that is no longer needed

 Casper


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing ZFS Volume with SMI/VTOC label

2010-02-19 Thread Casper . Dik

So in a ZFS boot disk configuration (rpool) in a running environment, it's
not possible?

The example I gave does grow the rpool while running from the rpool.

But you need a recent version of zfs to grow the pool while it is in use.
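
For example, on builds that support it (device name below is hypothetical):

  zpool set autoexpand=on rpool    # grow the pool automatically when a device grows
  zpool online -e rpool c0t0d0s0   # or expand this one device explicitly
  zpool list rpool                 # the new size shows up without a reboot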

On Fri, Feb 19, 2010 at 9:25 AM, casper@sun.com wrote:



 Is it possible to grow a ZFS volume on a SPARC system with a SMI/VTOC
 label
 without losing data as the OS is built on this volume?


 Sure, as long as the new partition starts on the same block and is longer.

 It was a bit more difficult with UFS, but for zfs it is very simple.

 I had a few systems with two ufs root slices using live upgrade:

        <slice 1><slice 2><swap>

 First I booted from <slice 2>
 ludelete "slice1"
 zpool create rpool "slice1"
 lucreate -p rpool
 luactivate slice1
 init 6
 from the zfs root:
 ludelete slice2
 format:
         remove slice2;
         grow slice1 to incorporate slice2
         label

 At that time I needed to reboot to get the new device size reflected in
 zpool list; today that is no longer needed

 Casper





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS mirrored boot disks

2010-02-19 Thread David Dyer-Bennet

On Fri, February 19, 2010 00:32, Terry Hull wrote:
 I have a machine with the Supermicro 8 port SATA card installed.  I have
 had no problem creating a mirrored boot disk using the oft-repeated
 scheme:

 prtvtoc /dev/rdsk/c4t0d0s2 | fmthard -s - /dev/rdsk/c4t1d0s2
 zpool attach rpool c4t0d0s0 c4t1d0s0
 wait for sync
 installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t1d0s0

 Unfortunately when I shut the machine down and remove the primary boot
 disk, it will no longer boot.  I get the boot loader, and if I turn off
 the splash screen I see it get to the point of displaying the host name.
 At that point, it hangs forever.  From the posts I've seen it looks like
 this is a very standard scheme that just works.  What could be missing from
 my procedure?

 I am running Build 132, if that matters.

Disk boot order in your BIOS?

I know that I succeeded in booting off the third (of four) disks in a
mirror group Wednesday evening, but only after altering the disk boot
order in the BIOS.  Using that exact controller card, come to think of
it.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS unit of compression

2010-02-19 Thread Thanos Makatos
Hello.

I want to know what the unit of compression in ZFS is. Is it 4 KB or larger? Is 
it tunable?

Thanks.

Thanos
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS unit of compression

2010-02-19 Thread Darren J Moffat

On 19/02/2010 15:43, Thanos Makatos wrote:

Hello.

I want to know what the unit of compression in ZFS is. Is it 4 KB or larger? Is 
it tunable?


I don't understand what you mean.

For user data, ZFS compresses ZFS blocks; these are 512 bytes minimum 
up to 128k maximum and depend on the configuration of the dataset 
(recordsize property) and the write pattern of the applications using it.


If a block doesn't compress by more than 12.5%, ZFS stores the 
uncompressed data instead - note this is not tunable and is hardcoded to 
the same value for all compression methods.


The only tunable for compression is the choice of compression algorithm 
for the filesystem.


What problem do you think you have, or what are you trying to solve?

If you read the source for the lzjb algorithm used in ZFS, the Lempel 
size is 1k; is that what you mean?
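
In practice that just means something like (dataset name is hypothetical):

  zfs set compression=lzjb tank/fs   # or gzip, gzip-1 .. gzip-9
  zfs set recordsize=128k tank/fs    # upper bound on the block size that gets compressed
  zfs get compression,compressratio,recordsize tank/fs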


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Edward Ned Harvey
One more thing I'd like to add here:

The PERC cache measurably and significantly accelerates small disk writes.
However, for read operations, it is insignificant compared to system ram,
both in terms of size and speed.  There is no significant performance
improvement by enabling adaptive readahead in the PERC.  I will recommend
instead, the PERC should be enabled for Write Back, and have the readahead
disabled.  Fortunately this is the default configuration on a new perc
volume, so unless you changed it, you should be fine.

It may be smart to double check, and ensure your OS does adaptive readahead.

In Linux (rhel/centos) you can check that the "readahead" service is
loading.  I noticed this is enabled by default in runlevel 5, but disabled
by default in runlevel 3.  Interesting.

I don't know how to check Solaris or OpenSolaris, to ensure adaptive
readahead is enabled.
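
One way to check on Solaris/OpenSolaris is the zfs_prefetch_disable tunable
(this is just a sketch; 0 means ZFS file-level prefetch is enabled):

  echo "zfs_prefetch_disable/D" | mdb -k
  # to disable it persistently: set zfs:zfs_prefetch_disable = 1 in /etc/system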




On 2/18/10 8:08 AM, Edward Ned Harvey sola...@nedharvey.com wrote:

 Ok, I've done all the tests I plan to complete.  For highest performance, it
 seems:
 ·The measure I think is the most relevant for typical operation is the
 fastest random read /write / mix.  (Thanks Bob, for suggesting I do this
 test.)
 The winner is clearly striped mirrors in ZFS
 
 ·The fastest sustained sequential write is striped mirrors via ZFS, or
 maybe raidz
 
 ·The fastest sustained sequential read is striped mirrors via ZFS, or
 maybe raidz
 
  
 Here are the results:
 ·Results summary of Bob's method
 
http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
 
 ·Raw results of Bob's method
 http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip
 
 ·Results summary of Ned's method
 
http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf
 
 ·Raw results of Ned's method
 http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip
 
  
  
  
  
  
 
 From: Edward Ned Harvey [mailto:sola...@nedharvey.com]
 Sent: Saturday, February 13, 2010 9:07 AM
 To: opensolaris-disc...@opensolaris.org; zfs-discuss@opensolaris.org
 Subject: ZFS performance benchmarks in various configurations
  
 I have a new server, with 7 disks in it.  I am performing benchmarks on it
 before putting it into production, to substantiate claims I make, like
 "striping mirrors is faster than raidz" and so on.  Would anybody like me to
 test any particular configuration? Unfortunately I don't have any SSD, so I
 can't do any meaningful test on the ZIL etc.  Unless someone in the Boston
 area has a 2.5" SAS SSD they wouldn't mind lending for a few hours.  ;-)
  
 My hardware configuration:  Dell PE 2970 with 8 cores.  Normally 32G, but I
 pulled it all out to get it down to 4G of ram.  (Easier to benchmark disks
 when the file operations aren't all cached.)  ;-)  Solaris 10 10/09.  PERC 6/i
 controller.  All disks are configured in PERC for Adaptive ReadAhead, and
 Write Back, JBOD.  7 disks present, each SAS 15krpm 160G.  OS is occupying 1
 disk, so I have 6 disks to play with.
  
 I am currently running the following tests:
  
 Will test, including the time to flush(), various record sizes inside file
 sizes up to 16G, sequential write and sequential read. Not doing any mixed
 read/write requests.  Not doing any random read/write.
 iozone -Reab somefile.wks -g 17G -i 1 -i 0
  
 Configurations being tested:
 ·Single disk
 
 ·2-way mirror
 
 ·3-way mirror
 
 ·4-way mirror
 
 ·5-way mirror
 
 ·6-way mirror
 
 ·Two mirrors striped (or concatenated)
 
 ·Three mirrors striped (or concatenated)
 
 ·5-disk raidz
 
 ·6-disk raidz
 
 ·6-disk raidz2
 
  
 Hypothesized results:
 ·N-way mirrors write at the same speed of a single disk
 
 ·N-way mirrors read n-times faster than a single disk
 
 ·Two mirrors striped read and write 2x faster than a single mirror
 
 ·Three mirrors striped read and write 3x faster than a single mirror
 
 ·Raidz and raidz2:  No hypothesis. Some people say they perform
 comparable to many disks working together. Some people say it's slower than a
 single disk.  Waiting to see the results.
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Günther
hello
i have made some benchmarks with my napp-it zfs-server (screenshot):

http://www.napp-it.org/bench.pdf

- 2gb vs 4gb vs 8gb ram
- mirror vs raidz vs raidz2 vs raidz3
- dedup and compress enabled vs disabled

result in short:
8gb ram vs 2gb: +10% .. +500% more power (green drives)
compress and dedup enabled: +50% .. +300%
mirror vs raidz: fastest is raidz, slowest mirror, raidz level +/-20%

gea
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Felix Buenemann

Hi,

I'm currently testing a Mtron Pro 7500 16GB SLC SSD as a ZIL device and 
seeing very poor performance for small file writes via NFS.


Copying a source code directory with around 4000 small files to the ZFS 
pool over NFS without the SSD log device yields around 1000 IOPS (pool 
of 8 sata shared mirrors).


When adding the SSD as ZIL, performance drops to 50 IOPS!

I can see similarly poor performance when creating a ZFS pool on the SSD 
and sharing it via NFS. However copy the files locally on the server 
from the sata to the ssd pool only takes a few seconds.


The SSD's specs reveal:
sequential r/w 512B: 83,000/51,000
sequential r/w 4KB: 21,000/13,000
random r/w 512B: 19,000/130
random r/w 4KB: 12,000/130

So it is apparent, that the SSD has really poor random writes.

But I was under the impression, that the ZIL is mostly sequential writes 
or was I misinformed here?


Maybe the cache syncs bring the device to its knees?
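
For reference, adding and removing the SSD as a log device for these tests is
just the following (pool and device names are placeholders):

  zpool add tank log c4t0d0    # dedicate the SSD as a separate log (slog) device
  zpool remove tank c4t0d0     # recent builds allow removing the log vdev again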

Best Regards,
Felix Buenemann

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Bob Friesenhahn

On Fri, 19 Feb 2010, Felix Buenemann wrote:


So it is apparent, that the SSD has really poor random writes.

But I was under the impression, that the ZIL is mostly sequential writes or 
was I misinformed here?


Maybe the cache syncs bring the device to it's knees?


That's what it seems like.  This particular device must actually be 
obeying the cache sync request rather than just pretending to, like 
many SSDs.


Most SSDs are very good at seeking, and very good at random reads, but 
most are rather poor at small synchronous writes.  The ones which are 
good at small synchronous writes cost more.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Felix Buenemann

Am 19.02.10 19:30, schrieb Bob Friesenhahn:

On Fri, 19 Feb 2010, Felix Buenemann wrote:


So it is apparent, that the SSD has really poor random writes.

But I was under the impression, that the ZIL is mostly sequential
writes or was I misinformed here?

Maybe the cache syncs bring the device to it's knees?


That's what it seems like. This particular device must actually being
obeying the cache sync request rather than just pretending to like many
SSDs.

Most SSDs are very good at seeking, and very good at random reads, but
most are rather poor at small synchronous writes. The ones which are
good at small synchronous writes cost more.


Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around 
300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD.




Bob


- Felix

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread David Dyer-Bennet

On Fri, February 19, 2010 12:50, Felix Buenemann wrote:


 Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around
 300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD.

Well, but the Intel X25-M is the drive that really first cracked the
problem (earlier high-performance drives were hideously expensive and
rather brute force).  Which was relatively recently.  The industry is
still evolving rapidly.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Idiots Guide to Running a NAS with ZFS/OpenSolaris

2010-02-19 Thread Orvar Korvar
I can strongly recommend this series of articles
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/

Very good! :o)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Bob Friesenhahn

On Fri, 19 Feb 2010, David Dyer-Bennet wrote:


Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around
300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD.


Well, but the Intel X25-M is the drive that really first cracked the
problem (earlier high-performance drives were hideously expensive and
rather brute force).  Which was relatively recently.  The industry is
still evolving rapidly.


What is the problem that the X25-M cracked?  The X25-M is 
demonstrated to ignore cache sync and toss transactions.  As such, it 
is useless for a ZIL.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Felix Buenemann

Am 19.02.10 20:50, schrieb Bob Friesenhahn:

On Fri, 19 Feb 2010, David Dyer-Bennet wrote:


Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around
300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD.


Well, but the Intel X25-M is the drive that really first cracked the
problem (earlier high-performance drives were hideously expensive and
rather brute force). Which was relatively recently. The industry is
still evolving rapidly.


What is the problem is it that the X25-M cracked? The X25-M is
demonstrated to ignore cache sync and toss transactions. As such, it is
useless for a ZIL.


Yes, I see no difference with the X25-M with both zfs_nocacheflush=0 and 
zfs_nocacheflush=1. After setting zfs_nocacheflush=1, the Mtron SSD also 
performed at around 1000 IOPS, which is still useless, because the array 
performs the same IOPS without dedicated ZIL.
Looking at the X25-E (SLC) benchmarks it should be able to do about 3000 
IOPS, which would improve array performance.
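
For anyone reproducing the test: the flush behaviour can be toggled like this
(it affects every pool on the system, so it is really only for experiments):

  echo "zfs_nocacheflush/W0t1" | mdb -kw   # live change: stop issuing cache flushes
  # or persistently, in /etc/system:  set zfs:zfs_nocacheflush = 1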


I think I'll try one of these inexpensive battery-backed PCI RAM drives 
from Gigabyte and see how many IOPS they can pull.




Bob


- Felix


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread David Dyer-Bennet

On Fri, February 19, 2010 13:50, Bob Friesenhahn wrote:
 On Fri, 19 Feb 2010, David Dyer-Bennet wrote:

 Too bad, I'm getting ~1000 IOPS with an Intel X25-M G2 MLC and around
 300 with a regular USB stick, so 50 IOPS is really poor for an SLC SSD.

 Well, but the Intel X25-M is the drive that really first cracked the
 problem (earlier high-performance drives were hideously expensive and
 rather brute force).  Which was relatively recently.  The industry is
 still evolving rapidly.

 What is the problem is it that the X25-M cracked?  The X25-M is
 demonstrated to ignore cache sync and toss transactions.  As such, it
 is useless for a ZIL.

But it's finally useful as, for example, a notebook boot drive.  No
previous vaguely affordable design was.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Marion Hakanson
felix.buenem...@googlemail.com said:
 I think I'll try one of thise inexpensive battery-backed PCI RAM drives  from
 Gigabyte and see how much IOPS they can pull. 

Another poster, Tracy Bernath, got decent ZIL IOPS from an OCZ Vertex unit.
Dunno if that's sufficient for your purposes, but it looked pretty good
for the money.

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSDs with a SCSI SCA interface?

2010-02-19 Thread Eric Sproul
On 12/ 4/09 02:06 AM, Erik Trimble wrote:
 Hey folks.
 
 I've looked around quite a bit, and I can't find something like this:
 
 I have a bunch of older systems which use Ultra320 SCA hot-swap
 connectors for their internal drives. (e.g. v20z and similar)
 
 I'd love to be able to use modern flash SSDs with these systems, but I
 have yet to find someone who makes anything that would fit the bill.
 
 I need either:
 
 (a) a SSD with an Ultra160/320 parallel interface (I can always find an
 interface adapter, so I'm not particular about whether it's a 68-pin or
 SCA)

Bitmicro makes one: http://www.bitmicro.com/products_edisk_altima_35_u320.php

They also make a version with a 4Gb FC interface.  Haven't tried either one, but
found Bitmicro when researching SSD options for a V890.

Eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] rule of thumb for scrub

2010-02-19 Thread Harry Putnam
I think I asked this before but apparently have lost track of the
answers I got.

I'm wanting a general rule of thumb for how often to `scrub'.

My setup is a home NAS and general zfs server so it does not see heavy
use.

I'm up to build 129 and do update fairly often, just the last few
builds were a bit too problematic.

My disks are setup in 3 mirrored pairs.  They do get regular use when
my other machines access the zfs server for backups, and the nfs
served directories shared all around.

But it's still only home usage, no business involved, though maybe a bit of
a heavy hobbyist user.

With that in mind what would be a good safe plan for `scrubbing'?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Richard Elling
On Feb 19, 2010, at 8:35 AM, Edward Ned Harvey wrote:
 One more thing I’d like to add here:
 
 The PERC cache measurably and significantly accelerates small disk writes.  
 However, for read operations, it is insignificant compared to system ram, 
 both in terms of size and speed.  There is no significant performance 
 improvement by enabling adaptive readahead in the PERC.  I will recommend 
 instead, the PERC should be enabled for Write Back, and have the readahead 
 disabled.  Fortunately this is the default configuration on a new perc 
 volume, so unless you changed it, you should be fine.
 
 It may be smart to double check, and ensure your OS does adaptive readahead.  
 In Linux (rhel/centos) you can check that the “readahead” service is loading. 
  I noticed this is enabled by default in runlevel 5, but disabled by default 
 in runlevel 3.  Interesting.
 
 I don’t know how to check solaris or opensolaris, to ensure adaptive 
 readahead is enabled.

ZFS has intelligent prefetching.  AFAIK, Solaris disk drivers do not prefetch.

 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ragnar Sundblad

On 18 feb 2010, at 13.55, Phil Harman wrote:

...
 Whilst the latest bug fixes put the world to rights again with respect to 
 correctness, it may be that some of our performance workaround are still 
 unsafe (i.e. if my iSCSI client assumes all writes are synchronised to 
 nonvolatile storage, I'd better be pretty sure of the failure modes before I 
 work around that).

But are there any clients that assume that an iSCSI volume is synchronous?

Isn't an iSCSI target supposed to behave like any other SCSI disk
(pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
With that I mean: A disk which understands SCSI commands with an
optional write cache that could be turned off, with cache sync
command, and all those things.
Put another way, isn't it the OS/file system's responsibility to
use the SCSI disk responsibly regardless of the underlying
protocol?

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Ragnar Sundblad

On 19 feb 2010, at 17.35, Edward Ned Harvey wrote:

 The PERC cache measurably and significantly accelerates small disk writes.  
 However, for read operations, it is insignificant compared to system ram, 
 both in terms of size and speed.  There is no significant performance 
 improvement by enabling adaptive readahead in the PERC.  I will recommend 
 instead, the PERC should be enabled for Write Back, and have the readahead 
 disabled.  Fortunately this is the default configuration on a new perc 
 volume, so unless you changed it, you should be fine.

If I understand correctly, ZFS nowadays will only flush data to
non-volatile storage (such as a RAID controller NVRAM), and not
all the way out to disks. (To solve performance problems with some
storage systems, and I believe that it also is the right thing
to do under normal circumstances.)

Doesn't this mean that if you enable write back, and you have
a single, non-mirrored raid controller, and your raid controller
dies on you so that you lose the contents of the nvram, you have
a potentially corrupt file system?

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Felix Buenemann

Am 19.02.10 21:29, schrieb Marion Hakanson:

felix.buenem...@googlemail.com said:

I think I'll try one of thise inexpensive battery-backed PCI RAM drives  from
Gigabyte and see how much IOPS they can pull.


Another poster, Tracy Bernath, got decent ZIL IOPS from an OCZ Vertex unit.
Dunno if that's sufficient for your purposes, but it looked pretty good
for the money.


I found the Hyperdrive 5/5M, which is a half-height drive bay sata 
ramdisk with battery backup and auto-backup to compact flash at power 
failure.
It promises 65,000 IOPS and thus should be great for ZIL. It's pretty 
reasonably priced (~230 EUR) and stacked with 4GB or 8GB DDR2-ECC should 
be more than sufficient.


http://www.hyperossystems.co.uk/07042003/hardware.htm



Marion


- Felix


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ross Walker

On Feb 19, 2010, at 4:57 PM, Ragnar Sundblad ra...@csc.kth.se wrote:



On 18 feb 2010, at 13.55, Phil Harman wrote:

...
Whilst the latest bug fixes put the world to rights again with  
respect to correctness, it may be that some of our performance  
workaround are still unsafe (i.e. if my iSCSI client assumes all  
writes are synchronised to nonvolatile storage, I'd better be  
pretty sure of the failure modes before I work around that).


But are there any clients that assume that an iSCSI volume is  
synchronous?


Isn't an iSCSI target supposed to behave like any other SCSI disk
(pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
With that I mean: A disk which understands SCSI commands with an
optional write cache that could be turned off, with cache sync
command, and all those things.
Put in another way, isn't is the OS/file systems responsibility to
use the SCSI disk responsibly regardless of the underlying
protocol?


That was my argument a while back.

If you use /dev/dsk then all writes should be asynchronous and WCE  
should be on and the initiator should issue a 'sync' to make sure it's  
in NV storage, if you use /dev/rdsk all writes should be synchronous  
and WCE should be off. RCD should be off in all cases and the ARC  
should cache all it can.


Making COMSTAR always start with /dev/rdsk and flip to /dev/dsk if the  
initiator flags write cache is the wrong way to go about it. It's more  
complicated than it needs to be and it leaves setting the storage  
policy up to the system admin rather than the storage admin.


It would be better to put effort into supporting FUA and DPO options  
in the target than dynamically changing a volume's cache policy from  
the initiator side.


-Ross
 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] rule of thumb for scrub

2010-02-19 Thread Cindy Swearingen

Hi Harry,

Our current scrubbing guideline is described here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Run zpool scrub on a regular basis to identify data integrity problems.
If you have consumer-quality drives, consider a weekly scrubbing
schedule. If you have datacenter-quality drives, consider a monthly
scrubbing schedule.
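
A simple way to automate that is a root cron job, for example (pool name is
hypothetical):

  # weekly scrub, Sundays at 02:00
  0 2 * * 0 /usr/sbin/zpool scrub tank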

Thanks,

Cindy

On 02/19/10 14:28, Harry Putnam wrote:

I think I asked this before but apparently have lost track of the
answers I got.

I'm wanting a general rule of thumb for how often to `scrub'.

My setup is a home NAS and general zfs server so it does not see heavy
use.

I'm up to build 129 and do update fairly often, just the last few
builds were a bit too problematic.

My disks are setup in 3 mirrored pairs.  They do get regular use when
my other machines access the zfs server for backups, and the nfs
served directories shared all around.

But still only home usage no business involved but maybe a bit of
a heavy hobbist user.

With that in mind what would be a good safe plan for `scrubbing'?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lost disk geometry

2010-02-19 Thread Daniel Carosone
On Fri, Feb 19, 2010 at 01:15:17PM -0600, David Dyer-Bennet wrote:
 
 On Fri, February 19, 2010 13:09, David Dyer-Bennet wrote:
 
  Anybody know what the proper geometry is for a WD1600BEKT-6-1A13?  It's
  not even in the data sheets any more!

any such geometry has been entirely fictitious since ZBR disks emerged
in, oh, about 1990.

 One further point -- I can't seem to enter the geometry the second disk
 has manually for the first; when I enter 152615 for number of sectors, it
 says this is out of range.

It's probably reading some garbage as a label.  dd 0's over the start
of it and try again, perhaps with a hotplug or reboot in between if
necessary. 

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Phil Harman

On 19/02/2010 21:57, Ragnar Sundblad wrote:

On 18 feb 2010, at 13.55, Phil Harman wrote:
   

Whilst the latest bug fixes put the world to rights again with respect to 
correctness, it may be that some of our performance workaround are still unsafe 
(i.e. if my iSCSI client assumes all writes are synchronised to nonvolatile 
storage, I'd better be pretty sure of the failure modes before I work around 
that).
 

But are there any clients that assume that an iSCSI volume is synchronous?

Isn't an iSCSI target supposed to behave like any other SCSI disk
(pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
With that I mean: A disk which understands SCSI commands with an
optional write cache that could be turned off, with cache sync
command, and all those things.
Put in another way, isn't is the OS/file systems responsibility to
use the SCSI disk responsibly regardless of the underlying
protocol?

/ragge
   


Yes, that would be nice wouldn't it? But the world is seldom that 
simple, is it? For example, Sun's first implementation of zvol was 
unsafe by default, with no cache flush option either.


A few years back we used to note that one of the reasons Solaris was 
slower than Linux at fileystems microbenchmarks was because Linux ran 
with the write caches on (whereas we would never be that foolhardy).


And then this seems to claim that NTFS may not be that smart either ...

  http://blogs.sun.com/roch/entry/iscsi_unleashed

(see the WCE Settings paragraph)

I'm only going on what I've read.

Cheers,
Phil

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disk controllers changing the names of disks

2010-02-19 Thread Freddie Cash
On FreeBSD, I avoid this issue completely by labelling either the entire disk 
(via glabel(8)) or individual slices/partitions (via either glabel(8) or gpt 
labels).  Use the label name to build the vdevs.  Then it doesn't matter where 
the drive is connected, or how the device node is named/numbered, everything 
Just Works(tm).  :)

Hopefully, there are similar tools for labelling disks/partitions on Solaris 
systems.
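
A minimal sketch of that on FreeBSD (device and label names are only
illustrative):

  glabel label disk01 /dev/da0
  glabel label disk02 /dev/da1
  zpool create tank mirror label/disk01 label/disk02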
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Neil Perrin



If I understand correctly, ZFS now adays will only flush data to
non volatile storage (such as a RAID controller NVRAM), and not
all the way out to disks. (To solve performance problems with some
storage systems, and I believe that it also is the right thing
to do under normal circumstances.)

Doesn't this mean that if you enable write back, and you have
a single, non-mirrored raid-controller, and your raid controller
dies on you so that you loose the contents of the nvram, you have
a potentially corrupt file system?


ZFS requires that all writes be flushed to non-volatile storage.
This is needed both for transaction group (txg) commits, to ensure pool integrity,
and for the ZIL to satisfy the synchronous requirement of fsync/O_DSYNC etc.
If the caches weren't flushed then it would indeed be quicker but the pool
would be susceptible to corruption. Sadly some hardware doesn't honour
cache flushes and this can cause corruption.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Eugen Leitl
On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote:

 I found the Hyperdrive 5/5M, which is a half-height drive bay sata 
 ramdisk with battery backup and auto-backup to compact flash at power 
 failure.
 Promises 65,000 IOPS and thus should be great for ZIL. It's pretty 
 reasonable priced (~230 EUR) and stacked with 4GB or 8GB DDR2-ECC should 
 be more than sufficient.

Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of
system memory, and a cheap UPS?
 
 http://www.hyperossystems.co.uk/07042003/hardware.htm

-- 
Eugen* Leitl leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lost disk geometry

2010-02-19 Thread David Dyer-Bennet

On Fri, February 19, 2010 16:21, Daniel Carosone wrote:
 On Fri, Feb 19, 2010 at 01:15:17PM -0600, David Dyer-Bennet wrote:

 On Fri, February 19, 2010 13:09, David Dyer-Bennet wrote:

  Anybody know what the proper geometry is for a WD1600BEKT-6-1A13?
 It's
  not even in the data sheets any more!

 any such geometry has been entirely fictitious since ZBR disks emerged
 in, oh, about 1990.

Sure, but there still have to be values put into format to satisfy it!

Had to look up ZBR, but indeed I guessed correctly that it was the
transition to variable numbers of sector per track (to give much more
uniform linear size to each sector) that you were referring to.  Yep,
totally and utterly fictitious.

 One further point -- I can't seem to enter the geometry the second disk
 has manually for the first; when I enter 152615 for number of sectors,
 it
 says this is out of range.

 It's probably reading some garbage as a label.  dd 0's over the start
 of it and try again, perhaps with a hotplug or reboot in between if
 necessary.

The details of interaction between what's already written there, and what
can be written there by the tools, are driving me quite insane (as Cindy
said the other day!).

I found some of my earlier tests weren't valid since I apparently omitted
writing out the labels in a couple of key cases.  Now I've got two
slightly different geometries going again, but they're working in the
mirror (the old disks in the mirror are much smaller, so anything that
works and gives access to over 50% of the new disk will attach to the
mirror; but I want to get it right before detaching the old disks).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Ragnar Sundblad

On 19 feb 2010, at 23.40, Eugen Leitl wrote:

 On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote:
 
 I found the Hyperdrive 5/5M, which is a half-height drive bay sata 
 ramdisk with battery backup and auto-backup to compact flash at power 
 failure.
 Promises 65,000 IOPS and thus should be great for ZIL. It's pretty 
 reasonable priced (~230 EUR) and stacked with 4GB or 8GB DDR2-ECC should 
 be more than sufficient.
 
 Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of
 system memory, and a cheap UPS?

System memory can't replace a slog, since a slog is supposed to be
non-volatile.

A UPS plus disabling the ZIL, or disabling synchronization, could possibly
achieve the same result (or maybe better) IOPS-wise.
This would probably work given that your computer never crashes
in an uncontrolled manner. If it does, some data may be lost
(and possibly the entire pool lost, if you are unlucky).

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Daniel Carosone
On Fri, Feb 19, 2010 at 11:51:29PM +0100, Ragnar Sundblad wrote:
 
 On 19 feb 2010, at 23.40, Eugen Leitl wrote:
  On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote:
  I found the Hyperdrive 5/5M, which is a half-height drive bay sata 
  ramdisk with battery backup and auto-backup to compact flash at power 
  failure.
  Promises 65,000 IOPS and thus should be great for ZIL. It's pretty 
  reasonable priced (~230 EUR) and stacked with 4GB or 8GB DDR2-ECC should 
  be more than sufficient.

These are the same as the acard devices we've discussed here
previously; earlier hyperdrive models were their own design.  Very
interesting, and my personal favourite, but I don't know of anyone
actually reporting results yet with them as ZIL.

If you have more memory in them than is needed for ZIL, with some
partitioning you could make a small fast pool on them for swap space
and other purposes.  I was originally looking at these for Postgres
WAL logfiles, before there was slog and on a different platform..

Also, if you have enough non-ECC memory there's a mode where it adds
its own redundancy for reduced space, which could allow reusing
existing kit - replace non-ecc system memory with ecc. 

  Wouldn't it be better investing these 300-350 EUR into 16 GByte or more of
  system memory, and a cheap UPS?
 
 System memory can't replace a slog, since a slog is supposed to be
 non-volatile.

System memory might already be maxed out, too.  

--
Dan.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Toby Thain


On 19-Feb-10, at 5:40 PM, Eugen Leitl wrote:


On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote:


I found the Hyperdrive 5/5M, which is a half-height drive bay sata
ramdisk with battery backup and auto-backup to compact flash at power
failure.
Promises 65,000 IOPS and thus should be great for ZIL. It's pretty
reasonable priced (~230 EUR) and stacked with 4GB or 8GB DDR2-ECC  
should

be more than sufficient.


Wouldn't it be better investing these 300-350 EUR into 16 GByte or  
more of

system memory, and a cheap UPS?



That would depend on the read/write mix, I think?

--Toby





http://www.hyperossystems.co.uk/07042003/hardware.htm


--
Eugen* Leitl leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Rob Logan

 An UPS plus disabling zil, or disabling synchronization, could possibly
 achieve the same result (or maybe better) iops wise.
Even with the fastest slog, disabling the zil will always be faster... 
(fewer bytes to move)

 This would probably work given that your computer never crashes
 in an uncontrolled manner. If it does, some data may be lost
 (and possibly the entire pool lost, if you are unlucky).
the pool would never be at risk, but when your server
reboots, its clients will be confused that things
they sent, and the server promised it had saved, are gone.
For some clients, this small loss might be the loss of their 
entire dataset.
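
For completeness: on builds of that era the ZIL is disabled system-wide with
the zil_disable tunable, and for the reasons above this is for testing only:

  # in /etc/system, followed by a reboot
  set zfs:zil_disable = 1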

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ragnar Sundblad

On 19 feb 2010, at 23.20, Ross Walker wrote:

 On Feb 19, 2010, at 4:57 PM, Ragnar Sundblad ra...@csc.kth.se wrote:
 
 
 On 18 feb 2010, at 13.55, Phil Harman wrote:
 
 ...
 Whilst the latest bug fixes put the world to rights again with respect to 
 correctness, it may be that some of our performance workaround are still 
 unsafe (i.e. if my iSCSI client assumes all writes are synchronised to 
 nonvolatile storage, I'd better be pretty sure of the failure modes before 
 I work around that).
 
 But are there any clients that assume that an iSCSI volume is synchronous?
 
 Isn't an iSCSI target supposed to behave like any other SCSI disk
 (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
 With that I mean: A disk which understands SCSI commands with an
 optional write cache that could be turned off, with cache sync
 command, and all those things.
 Put in another way, isn't is the OS/file systems responsibility to
 use the SCSI disk responsibly regardless of the underlying
 protocol?
 
 That was my argument a while back.
 
 If you use /dev/dsk then all writes should be asynchronous and WCE should be 
 on and the initiator should issue a 'sync' to make sure it's in NV storage, 
 if you use /dev/rdsk all writes should be synchronous and WCE should be off. 
 RCD should be off in all cases and the ARC should cache all it can.
 
 Making COMSTAR always start with /dev/rdsk and flip to /dev/dsk if the 
 initiator flags write cache is the wrong way to go about it. It's more 
 complicated then it needs to be and it leaves setting the storage policy up 
 to the system admin rather then the storage admin.
 
 It would be better to put effort into supporting FUA and DPO options in the 
 target then dynamically changing a volume's cache policy from the initiator 
 side.

But wouldn't the most disk-like behavior then be to implement all the
FUA, DPO, cache mode page, flush cache, etc., have COMSTAR implement
a cache just like disks do, maybe have a user knob to set the cache size
(typically 32 MB or so on modern disks, which could probably be used here too
as a default), and still use /dev/rdsk devices?

That could seem, in my naive limited little mind and humble opinion, as
a pretty good approximation of how real disks work, and no OS should have
to be more surprised than usual of how a SCSI disk works.

Maybe COMSTAR already does this, or parts of it?

Or am I wrong?

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] l2arc current usage (population size)

2010-02-19 Thread Christo Kutrovsky
Hello,

How do you tell how much of your l2arc is populated? I've been looking for a 
while now, can't seem to find it.

Must be easy, as this blog entry shows it over time:

http://blogs.sun.com/brendan/entry/l2arc_screenshots

And follow up, can you tell how much of each data set is in the arc or l2arc?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Thomas Garner
 These are the same as the acard devices we've discussed here
 previously; earlier hyperdrive models were their own design.  Very
 interesting, and my personal favourite, but I don't know of anyone
 actually reporting results yet with them as ZIL.

Here's one report:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg27739.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ragnar Sundblad

On 19 feb 2010, at 23.22, Phil Harman wrote:

 On 19/02/2010 21:57, Ragnar Sundblad wrote:
 On 18 feb 2010, at 13.55, Phil Harman wrote:
   
 Whilst the latest bug fixes put the world to rights again with respect to 
 correctness, it may be that some of our performance workaround are still 
 unsafe (i.e. if my iSCSI client assumes all writes are synchronised to 
 nonvolatile storage, I'd better be pretty sure of the failure modes before 
 I work around that).
 
 But are there any clients that assume that an iSCSI volume is synchronous?
 
 Isn't an iSCSI target supposed to behave like any other SCSI disk
 (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
 With that I mean: A disk which understands SCSI commands with an
 optional write cache that could be turned off, with cache sync
 command, and all those things.
 Put in another way, isn't is the OS/file systems responsibility to
 use the SCSI disk responsibly regardless of the underlying
 protocol?
 
 /ragge
   
 
 Yes, that would be nice wouldn't it? But the world is seldom that simple, is 
 it? For example, Sun's first implementation of zvol was unsafe by default, 
 with no cache flush option either.
 
 A few years back we used to note that one of the reasons Solaris was slower 
 than Linux at fileystems microbenchmarks was because Linux ran with the write 
 caches on (whereas we would never be that foolhardy).

(Exactly, and there is more "better fast than safe" evilness in that OS too, 
especially in the file system area. That is why I never use it for anything 
that should store anything.)

 And then this seems to claim that NTFS may not be that smart either ...
 
  http://blogs.sun.com/roch/entry/iscsi_unleashed
 
 (see the WCE Settings paragraph)
 
 I'm only going on what I've read.

But - all normal disks come with write caching enabled, so in both the Linux 
case and the NTFS case this is how they always operate, with all disks, so why 
should an iSCSI lun behave any differently?

If they can't handle the write cache (handle syncing, barriers, ordering and all 
that), they should turn the cache off, just as Solaris does in almost all cases 
except when you use an entire disk for zfs (I believe because Solaris UFS was 
never really adapted to write caches). And they should do that for all SCSI 
disks.

(I seem to recall that in the bad old days you had to disable the write cache 
yourself if you used a disk on SunOS, but that was probably because it 
wasn't standardized, and you did it with a jumper on the controller board.)

So - I just do not understand why an iSCSI lun should not try to emulate how 
all other SCSI disks work as much as possible? This must be the most compatible 
mode of operation, or am I wrong?

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] l2arc current usage (population size)

2010-02-19 Thread Tomas Ögren
On 19 February, 2010 - Christo Kutrovsky sent me these 0,5K bytes:

 Hello,
 
 How do you tell how much of your l2arc is populated? I've been looking for a 
 while now, can't seem to find it.
 
 Must be easy, as this blog entry shows it over time:
 
 http://blogs.sun.com/brendan/entry/l2arc_screenshots
 
 And follow up, can you tell how much of each data set is in the arc or l2arc?

kstat -m zfs
(p, c, l2arc_size)

arc_stat.pl is good, but doesn't show l2arc..
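
The individual counters can also be pulled directly (statistic names as found
in the arcstats kstat on recent builds):

  kstat -p zfs:0:arcstats:size zfs:0:arcstats:p zfs:0:arcstats:c zfs:0:arcstats:l2_size
  kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses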

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Ragnar Sundblad

On 20 feb 2010, at 02.34, Rob Logan wrote:

 
 An UPS plus disabling zil, or disabling synchronization, could possibly
 achieve the same result (or maybe better) iops wise.
 Even with the fastest slog, disabling zil will always be faster... 
 (less bytes to move)
 
 This would probably work given that your computer never crashes
 in an uncontrolled manner. If it does, some data may be lost
 (and possibly the entire pool lost, if you are unlucky).
 the pool would never be at risk, but when your server
 reboots, its clients will be confused that things
 it sent, and the server promised it had saved, are gone.
 For some clients, this small loss might be the loss of their 
 entire dataset.

No, the entire pool shouldn't be at risk, you are right of course,
I don't know what I was thinking. Sorry!

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Felix Buenemann

Am 20.02.10 01:33, schrieb Toby Thain:


On 19-Feb-10, at 5:40 PM, Eugen Leitl wrote:


On Fri, Feb 19, 2010 at 11:17:29PM +0100, Felix Buenemann wrote:


I found the Hyperdrive 5/5M, which is a half-height drive bay sata
ramdisk with battery backup and auto-backup to compact flash at power
failure.
Promises 65,000 IOPS and thus should be great for ZIL. It's pretty
reasonable priced (~230 EUR) and stacked with 4GB or 8GB DDR2-ECC should
be more than sufficient.


Wouldn't it be better investing these 300-350 EUR into 16 GByte or
more of
system memory, and a cheap UPS?



That would depend on the read/write mix, I think?


Well the workload will include MaxDB (SAP), Exchange and file services 
(SMB), with the opensolaris box acting as a VMFS iSCSI target for VMware 
vSphere. Due to the mixed workload it's hard to predict how exactly the 
I/O distribution will look like, so I'm trying to build a system that 
can hold up in various usage scenarios.


I've been testing with NFS because it loads the ZIL heavily.

Btw. in my testing I didn't really see a performance improvement with 
the ZIL disabled over the on-disk ZIL, but I've only been testing with a single 
NFS client. Or do I need multiple concurrent clients to benefit from an 
external ZIL?


Also is there a guideline on sizing the ZIL? I think in most cases even 
1GB would be enough, but I haven't done any heavy testing.




--Toby


- Felix

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss