Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Auke Folkerts
On Wed, Dec 02, 2009 at 03:57:47AM -0800, Brian McKerr wrote:
> I previously had a Linux NFS server that I had mounted 'async' and, as one
> would expect, NFS performance was pretty good, getting close to 900Mb/s.
> Now that I have moved to OpenSolaris, NFS performance is not very good,
> I'm guessing mainly due to the 'sync' nature of NFS.  I've seen various
> threads and most point at two options:
>
> 1. Disable the ZIL
> 2. Add independent log device(s)


We have experienced the same performance penalty using NFS over ZFS.  The
issue is indeed caused by the synchronous semantics of NFS. More precisely,
it is caused by the fact that ZFS promises correct behaviour while e.g. a
Linux NFS server (using async) does not.  The issue is described in great
detail at http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
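
For context, the Linux behaviour referred to is the "async" export option.
A hypothetical /etc/exports line on such a Linux server looks like:

  /export  192.168.0.0/24(rw,async,no_subtree_check)

With "async" the server acknowledges writes before they reach stable
storage, which is exactly the guarantee ZFS refuses to break by default.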

If you want the same behaviour as you had with your Linux NFS server,
you can disable the ZIL.  Doing so should give the same guarantees as
the Linux NFS service.

The big issue with disabling the ZIL is that it is system-wide. Although
it could be an acceptable tradeoff for one filesystem, it is not necessarily
a good system-wide setting. That is why I think the option to disable the
ZIL should be per-filesystem (which should be possible, because a ZIL is
actually kept per filesystem).
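
To make that concrete: at the moment, disabling the ZIL means a system-wide
kernel tunable rather than a dataset property. A sketch, assuming the
tunable name on your build matches:

  # echo "set zfs:zil_disable = 1" >> /etc/system
  # reboot

This affects every filesystem on the machine, which is exactly the problem.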

As for adding HDDs as ZIL devices, I'd advise against it. We have tried
this and performance decreased.  Using SSDs for the ZIL is probably
the way to go.  A final option is to accept the situation as it is, arguing
that you have traded performance for increased reliability.
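
If you do go the SSD route, adding a mirrored log device is a one-liner
(pool and device names here are hypothetical):

  # zpool add tank log mirror c1t0d0 c1t1d0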

Regards,
Auke
-- 
 Auke Folkerts 
 University of Amsterdam




Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread mbr

Hello,

Edward Ned Harvey wrote:

> Yes, I have an SSD for the ZIL.  Just one SSD, 32G.  But if this is the
> problem, then you'll have the same poor performance on the local machine
> that you have over NFS.  So I'm curious to see if you have the same poor
> performance locally.  The ZIL does not need to be reliable; if it fails,
> ZFS will begin writing the intent log to the main storage pool, and
> performance will suffer until a new SSD is put into production.


I am also planning to install an SSD as a ZIL log device. Is it really true
that there are no problems if the log device fails and it is not mirrored?

What about the data that were on the ZIL log SSD at the time of failure? Is
a copy of the data still in the machine's memory, from where it can be used
to commit the transactions to the stable storage pool?

What if the machine reboots after the SSD has failed?
The ZFS Best Practices Guide recommends mirroring the log:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations

> Mirroring the log device is highly recommended.
> Protecting the log device by mirroring will allow you to access the
> storage pool even if a log device has failed. Failure of the log device
> may cause the storage pool to be inaccessible if you are running a
> Solaris Nevada release prior to build 96, or a Solaris 10 release prior
> to the Solaris 10 10/09 release. For more information, see CR 6707530.
>
> http://bugs.opensolaris.org/view_bug.do?bug_id=6707530

No problems with that if I use Sol10U8, then?

Regards,
Michael.



Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, mbr wrote:


> What about the data that were on the ZIL log SSD at the time of failure?
> Is a copy of the data still in the machine's memory, from where it can be
> used to commit the transactions to the stable storage pool?


The intent log SSD is used as 'write only' unless the system reboots,
in which case it is used to support recovery.  The system memory is
used as the write path in the normal case.  Once the data is written
to the intent log, the data is declared to be written as far as
higher-level applications are concerned.
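
You can watch this behaviour with something like the following (pool name
hypothetical); under synchronous write load the log device shows writes
but essentially no reads:

  # zpool iostat -v tank 1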


If the intent log SSD fails and the system spontaneously reboots, then 
data may be lost.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread mbr

Hello,

Bob Friesenhahn wrote:

> On Thu, 3 Dec 2009, mbr wrote:
>> What about the data that were on the ZIL log SSD at the time of failure?
>> Is a copy of the data still in the machine's memory, from where it can
>> be used to commit the transactions to the stable storage pool?
>
> The intent log SSD is used as 'write only' unless the system reboots, in
> which case it is used to support recovery.  The system memory is used as
> the write path in the normal case.  Once the data is written to the
> intent log, the data is declared to be written as far as higher-level
> applications are concerned.


Thank you, Bob, for the clarification.
So I don't need a mirrored ZIL log for safety reasons: all the information
is still in memory and will be used from there by default if only the ZIL
log SSD fails.

> If the intent log SSD fails and the system spontaneously reboots, then
> data may be lost.


I can live with the data loss, as long as the machine comes up with the
faulty ZIL log SSD but otherwise without problems and with a clean zpool.

Does the following bug have any consequences here?

 Bug ID     6538021
 Synopsis   Need a way to force pool startup when zil cannot be replayed
 State      3-Accepted (Yes, that is a problem)
 Link       http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021

Michael.


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, mbr wrote:


> Does the following bug have any consequences here?
>
>  Bug ID     6538021
>  Synopsis   Need a way to force pool startup when zil cannot be replayed
>  State      3-Accepted (Yes, that is a problem)
>  Link       http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021


I don't know the status of this, but it does make sense to require the
user to explicitly choose to corrupt/lose data in the storage pool.  It
could be that the log device is just temporarily missing and can be
restored, so zfs should not do this by default.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-03 Thread Neil Perrin



On 12/03/09 09:21, mbr wrote:

> Hello,
>
> Bob Friesenhahn wrote:
>> On Thu, 3 Dec 2009, mbr wrote:
>>> What about the data that were on the ZIL log SSD at the time of
>>> failure? Is a copy of the data still in the machine's memory, from
>>> where it can be used to commit the transactions to the stable storage
>>> pool?
>>
>> The intent log SSD is used as 'write only' unless the system reboots,
>> in which case it is used to support recovery.  The system memory is
>> used as the write path in the normal case.  Once the data is written
>> to the intent log, the data is declared to be written as far as
>> higher-level applications are concerned.
>
> Thank you, Bob, for the clarification.
> So I don't need a mirrored ZIL log for safety reasons: all the
> information is still in memory and will be used from there by default
> if only the ZIL log SSD fails.


Mirrored log devices are advised to improve reliability. As previously
mentioned, if a log device fails during writing, or is temporarily full,
then we use the main pool devices to chain the log blocks. If we get read
errors when trying to replay the intent log (after a crash/power failure),
then the admin is given the option to ignore the log and continue, or to
somehow fix the device (e.g. re-attach it) and then retry. Multiple log
devices would provide extra reliability here.
We do not look in memory for the log records if we can't get the records
from the log blocks.
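
Roughly, the admin's options at that point look like this (pool and device
names are hypothetical):

  # zpool status -x tank        # log device shows as FAULTED/UNAVAIL
  # zpool online tank c4t0d0    # retry after re-attaching the device
  # zpool remove tank c4t0d0    # or give up on the log device entirely
                                # (needs log device removal support)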



>> If the intent log SSD fails and the system spontaneously reboots, then
>> data may be lost.
>
> I can live with the data loss, as long as the machine comes up with the
> faulty ZIL log SSD but otherwise without problems and with a clean zpool.


The log records are not required for consistency of the pool (it's not a
journal).



> Does the following bug have any consequences here?
>
>  Bug ID     6538021
>  Synopsis   Need a way to force pool startup when zil cannot be replayed
>  State      3-Accepted (Yes, that is a problem)
>  Link       http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021


Er, that bug should probably be closed as a duplicate;
we now have this functionality.



> Michael.


[zfs-discuss] Separate Zil on HDD ?

2009-12-02 Thread Brian McKerr
Hi all,

I have a home server based on SNV_127 with 8 disks;

2 x 500GB mirrored root pool
6 x 1TB raidz2 data pool

This server performs a few functions;

NFS : for several 'lab' ESX virtual machines
NFS : mythtv storage (videos, music, recordings etc)
Samba : for home directories for all networked PCs

I backup the important data to external USB hdd each day.


I previously had a Linux NFS server that I had mounted 'async' and, as one
would expect, NFS performance was pretty good, getting close to 900Mb/s.
Now that I have moved to OpenSolaris, NFS performance is not very good, I'm
guessing mainly due to the 'sync' nature of NFS.  I've seen various threads
and most point at two options:

1. Disable the ZIL
2. Add independent log device/s

I happen to have 2 x 250GB Western Digital RE3 7200rpm (RAID edition, rated
for 24x7 usage etc.) hard drives sitting doing nothing, and was wondering
whether it might speed up NFS, and possibly general filesystem usage, to
add these devices as log devices to the data pool.  I understand that an
SSD is considered ideal for log devices, but I'm thinking that these two
drives should at least be better than having the ZIL 'inside' the zpool.

If adding these devices, should I add them as a mirror, or as individual
devices to get some sort of load balancing (according to the zpool manpage)
and perhaps a little bit more performance?

I'm running zpool version 19, which 'zpool upgrade -v' shows me as having
'log device removal' support. Can I easily remove these devices if I find
that they have resulted in little/no performance improvement?
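
For reference, what I have in mind is something like this (device names
hypothetical):

  # zpool add tank log mirror c2t0d0 c2t1d0   # as a mirror
  # zpool add tank log c2t0d0 c2t1d0          # or as individual devices
  # zpool remove tank c2t0d0                  # removal, if it doesn't help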

Any help/tips greatly appreciated.

Cheers.
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-02 Thread Ross Walker
On Dec 2, 2009, at 6:57 AM, Brian McKerr br...@datamatters.com.au wrote:

> [...]
>
> I happen to have 2 x 250GB Western Digital RE3 7200rpm (RAID edition,
> rated for 24x7 usage etc.) hard drives sitting doing nothing, and was
> wondering whether it might speed up NFS, and possibly general filesystem
> usage, to add these devices as log devices to the data pool.  I
> understand that an SSD is considered ideal for log devices, but I'm
> thinking that these two drives should at least be better than having the
> ZIL 'inside' the zpool.
>
> If adding these devices, should I add them as a mirror, or as individual
> devices to get some sort of load balancing (according to the zpool
> manpage) and perhaps a little bit more performance?
>
> [...]

It wouldn't hurt to try, but I'd be surprised if it helped much, if at
all. The idea of a separate ZIL is to locate it on a device with lower
latency than the pool, which is what actually buys you performance on
synchronous writes.


What speed are you trying to achieve for writes? Wire speed? Well, it's
achievable, but only with an app that uses larger block sizes and allows
more than one transaction in flight at a time.


I wouldn't disable the ZIL, but would look at tuning the client side; or
you could invest in a controller with a large battery-backed write cache
and a good JBOD mode, or a small fast SSD drive.
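
By client-side tuning I mean mount options along these lines (values are
illustrative only, not a recommendation):

  mount -t nfs -o rw,hard,intr,tcp,rsize=32768,wsize=32768 \
      server:/tank/vm /mnt/vm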


-Ross
 


Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-02 Thread Edward Ned Harvey
> I previously had a Linux NFS server that I had mounted 'async' and, as
> one would expect, NFS performance was pretty good, getting close to
> 900Mb/s. Now that I have moved to OpenSolaris, NFS performance is not
> very good, I'm guessing mainly due to the 'sync' nature of NFS.  I've
> seen various threads and most point at two options:
>
> 1. Disable the ZIL
> 2. Add independent log device(s)

Really, your question isn't about a ZIL on HDD (as the subject says) but
about NFS performance.

I'll tell you a couple of things.  I have a Solaris ZFS and NFS server at
work which noticeably outperforms the previous NFS server.  Here are the
differences in our setup:

Yes, I have an SSD for the ZIL.  Just one SSD, 32G.  But if this is the
problem, then you'll have the same poor performance on the local machine
that you have over NFS.  So I'm curious to see if you have the same poor
performance locally.  The ZIL does not need to be reliable; if it fails,
ZFS will begin writing the intent log to the main storage pool, and
performance will suffer until a new SSD is put into production.

Another thing: you have six disks in raidz2.  That is six disks with the
capacity of four.  You should get noticeably better performance with
3 x 2-disk mirrors, i.e. six disks with the capacity of three.  But if your
bottleneck is Ethernet, this difference might be irrelevant.
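
For comparison, the mirrored layout would be built roughly like this
(hypothetical device names):

  # zpool create tank mirror c1t0d0 c1t1d0 \
                      mirror c1t2d0 c1t3d0 \
                      mirror c1t4d0 c1t5d0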

I have nothing special in my dfstab.
cat /etc/dfs/dfstab
share -F nfs -o ro=host1,rw=host2:host3,root=host2:host3,anon=4294967294 \
    /path-to-export

But when I mount it from linux, I took great care to create this config:
cat /etc/auto.master
/-  /etc/auto.direct --timeout=1200

cat /etc/auto.direct
/mountpoint  -fstype=nfs,noacl,rw,hard,intr,posix  solarisserver:/path-to-export


I'm interested to hear if this sheds any light for you.




Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-02 Thread Rob Logan


> 2 x 500GB mirrored root pool
> 6 x 1TB raidz2 data pool
> I happen to have 2 x 250GB Western Digital RE3 7200rpm
> be better than having the ZIL 'inside' the zpool.

Listing two log devices (a stripe) would have more spindles than your
single raidz2 vdev.  But for low-cost fun one might make a tiny slice on
each of the six raidz2 disks and list six log devices (a 6-way stripe),
and not bother adding the other two disks.
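
Something like this, assuming a small slice 7 has been carved out on each
disk (slice names hypothetical):

  # zpool add tank log c1t0d0s7 c1t1d0s7 c1t2d0s7 \
                       c1t3d0s7 c1t4d0s7 c1t5d0s7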

Rob




Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-02 Thread Eric D. Mudama

On Wed, Dec  2 at 10:59, Rob Logan wrote:



>> 2 x 500GB mirrored root pool
>> 6 x 1TB raidz2 data pool
>> I happen to have 2 x 250GB Western Digital RE3 7200rpm
>> be better than having the ZIL 'inside' the zpool.
>
> Listing two log devices (a stripe) would have more spindles than your
> single raidz2 vdev.  But for low-cost fun one might make a tiny slice on
> each of the six raidz2 disks and list six log devices (a 6-way stripe),
> and not bother adding the other two disks.


But if you did that, a synchronous write (FUA, or followed by a cache
flush) would incur a significant latency penalty, especially if NCQ was
being used.

The size of the ZIL is usually tiny; striping it doesn't make any sense
to me.

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org
