Thanks Nate. Compression is currently set to lzjb. I will do some testing with dd.

Joachim

Joachim Jacob
Contact details: http://www.bits.vib.be/index.php/about/80-team




On Wed 11 Sep 2013 04:03:55 PM CEST, Nate Coraor wrote:
Hi all,

I'm a big fan of ZFS; we have long used it behind Galaxy Main.  Some of our 
older servers are (still) Solaris, and the newest is FreeBSD.

I've lately been using SmartOS for virtualization, and while it has a drawback 
as a fileserver (currently the NFS server can only run in the global zone, 
which is not ideal on SmartOS), there are other illumos derivatives that would 
probably be great for this task (e.g. OmniOS).  Native ZFS in the OS in which 
it is developed is a win for me, especially when you are serving via plain 
NFS.  For more complex network filesystems, Linux is probably preferable.

I considered a separate ZIL and L2ARC for the latest ZFS server, but DTrace 
revealed that I probably would not see much of a performance benefit with our 
usage patterns.  The memory usage you're seeing is to be expected: ZFS will 
pretty much consume whatever RAM is available for caching, but that memory is 
freed if something else needs it.

I wouldn't suggest rsync for performance testing.  I typically do things like 
timed writes of blocks read from /dev/zero using dd, so that the source 
filesystem and checksumming algorithm are taken out of the equation.  Dedup 
and compression will of course cause a significant write penalty.  If you can 
live with the reduced space savings, lzjb performs significantly better than 
gzip; gzip-1 is a nice compromise between the default gzip level and lzjb.
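
A rough sketch of such a test, run from the NFS client (the mount point and 
sizes below are placeholders; note that zeroes compress to almost nothing, so 
on a compressed dataset this mostly exercises the NFS path and metadata 
rather than the disks):

# Timed sequential write of 10 GiB; conv=fdatasync makes dd wait until the
# data has actually been flushed to the server
dd if=/dev/zero of=/mnt/galaxy-nfs/ddtest bs=1M count=10240 conv=fdatasync

# Timed sequential read back (drop the client page cache first if possible)
dd if=/mnt/galaxy-nfs/ddtest of=/dev/null bs=1M

rm /mnt/galaxy-nfs/ddtest

Switching the compression algorithm is a per-dataset setting and only affects 
newly written blocks, e.g. zfs set compression=gzip-1 tank/galaxydb.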

--nate

On Sep 11, 2013, at 3:45 AM, Joachim Jacob | VIB | wrote:

Thank you all for the reactions.

Some details about my current ZFS and Galaxy setup:

- Galaxy runs as a single virtual machine, currently with 20 cores and 80GB 
RAM; this will soon be 32 cores and about 160GB RAM.
- The postgres database is on the virtual machine itself.
- The 'files' and 'job_working_dir' directories are on an NFS-exported 
directory hosted on the VM's host machine.
- The NFS-exported directory is a raidz1 dataset.
- The raidz1 runs on 7 550GB SAS disks, which are (unfortunately) behind a 
hardware RAID controller and passed through as single-disk RAID0 volumes 
(JBOD is not available). So raidz1 runs on 7 RAID0 volumes, with the PERC 
H700 controller set to: no read-ahead, write-through, 8 KB stripe size, disk 
cache policy enabled.
- Compression and deduplication are enabled.
- The directory on which the ZFS dataset is mounted is exported to the Galaxy 
virtual machine using the native Linux NFS daemon. 'zfs sharenfs' did not 
work for me (ownerships were not set correctly; this may need more 
investigation, but I have seen several reports that the sharenfs option in 
ZFS on Linux does not behave well). A sketch of such an /etc/exports entry 
follows this list.
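
A minimal /etc/exports entry of the kind I mean (the client name and options 
here are placeholders, not my exact configuration; missing no_root_squash is 
one common cause of ownership problems):

# /etc/exports on the ZFS host
/mnt/galaxydb   galaxy-vm(rw,sync,no_subtree_check,no_root_squash)

# reload the export table after editing
exportfs -ra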

The numbers:
- My initial files database (ext4 on RAID5) is 3.0TB in size. On ZFS, with 
compression and deduplication, this database is *1.8TB* (-40%).
- I have not yet added a SLOG device for the ZIL or an L2ARC, so that I have 
a clear picture of the baseline performance I can get. Would you advise 
putting the ZIL on a SLOG first, or adding an SSD to host the L2ARC?
- The cost of this storage saving is RAM: ZFS is currently using *284GB RAM* 
continuously! (Commands to inspect the ARC and dedup table are sketched 
after this list.)
- Write and read speeds from the Galaxy VM over NFS are *~40MB/s* and 
*~100MB/s* respectively (tested by simply copying with rsync; I still need 
to check the presentation and scripts of Anne Black-Ziegelbein). This is a 
66% decrease from the write and read speeds previously achieved (ext4 on 
hardware RAID5), but I feel that the benefits (deduplication, backups via 
snapshots, data integrity) outweigh the IO performance hit.
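
For anyone curious, the ARC size and the dedup table (DDT) footprint behind 
that RAM figure can be checked with commands like these ('tank' is the pool 
name from the output further down; exact output formats vary between ZFS on 
Linux versions):

# current ARC size and maximum on ZFS on Linux
awk '$1 == "size" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats

# DDT summary: number of entries and per-entry size on disk / in core
zpool status -D tank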

(I am setting this ZFS up on a "new" server; it is actually two years old now 
and has served another project well.)

Our Galaxy is currently using this ZFS setup with success!

For your interest, my settings on the 'galaxydb' ZFS dataset are below. (I 
was wondering whether some more wizardry can be applied here; two candidate 
tweaks are listed after the output.)

************
[root@r910bits ~]# zfs get all tank/galaxydb
NAME           PROPERTY              VALUE                  SOURCE
tank/galaxydb  type                  filesystem             -
tank/galaxydb  creation              Mon Sep  9 12:44 2013  -
tank/galaxydb  used                  1.81T                  -
tank/galaxydb  available             1.66T                  -
tank/galaxydb  referenced            1.81T                  -
tank/galaxydb  compressratio         1.66x                  -
tank/galaxydb  mounted               yes                    -
tank/galaxydb  quota                 none                   default
tank/galaxydb  reservation           none                   default
tank/galaxydb  recordsize            128K                   default
tank/galaxydb  mountpoint            /mnt/galaxydb          local
tank/galaxydb  sharenfs              rw=@galaxy             local
tank/galaxydb  checksum              on                     default
tank/galaxydb  compression           lzjb                   local
tank/galaxydb  atime                 on                     default
tank/galaxydb  devices               on                     default
tank/galaxydb  exec                  on                     default
tank/galaxydb  setuid                on                     default
tank/galaxydb  readonly              off                    default
tank/galaxydb  zoned                 off                    default
tank/galaxydb  snapdir               hidden                 default
tank/galaxydb  aclinherit            restricted             default
tank/galaxydb  canmount              on                     default
tank/galaxydb  xattr                 on                     default
tank/galaxydb  copies                1                      default
tank/galaxydb  version               5                      -
tank/galaxydb  utf8only              off                    -
tank/galaxydb  normalization         none                   -
tank/galaxydb  casesensitivity       sensitive              -
tank/galaxydb  vscan                 off                    default
tank/galaxydb  nbmand                off                    default
tank/galaxydb  sharesmb              off                    default
tank/galaxydb  refquota              none                   default
tank/galaxydb  refreservation        none                   default
tank/galaxydb  primarycache          all                    default
tank/galaxydb  secondarycache        all                    default
tank/galaxydb  usedbysnapshots       0                      -
tank/galaxydb  usedbydataset         1.81T                  -
tank/galaxydb  usedbychildren        0                      -
tank/galaxydb  usedbyrefreservation  0                      -
tank/galaxydb  logbias               latency                default
tank/galaxydb  dedup                 on                     local
tank/galaxydb  mlslabel              none                   default
tank/galaxydb  sync                  standard               default
tank/galaxydb  refcompressratio      1.66x                  -
tank/galaxydb  written               1.81T                  -
tank/galaxydb  snapdev               hidden                 default
*****************
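
Two knobs I have seen suggested for NFS-served datasets like this, which I 
have not benchmarked here yet (xattr=sa needs a ZFS on Linux release that 
supports it):

# stop writing an access-time update on every read
zfs set atime=off tank/galaxydb

# store extended attributes in inodes rather than hidden directories (ZoL)
zfs set xattr=sa tank/galaxydb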



Cheers,
Joachim

Joachim Jacob
Contact details: http://www.bits.vib.be/index.php/about/80-team


On 09/10/2013 11:29 PM, Guest, Simon wrote:
Hi Joachim,

At AgResearch we are using ZFS for our HPC storage, which is used by our 
internal Galaxy instance.  Currently we are running on FreeNAS (a FreeBSD 
derivative), but we are transitioning to ZFS on Linux.  We export the 
filesystem over NFS (10Gb Ethernet), but not the database (PostgreSQL).  In 
general, you want block storage for a database, so I suggest you look for a 
solution other than NFS to host that; one option is sketched below.
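
One option, sketched here with placeholder names and sizes, is a ZFS zvol 
presented to the VM as a block device (e.g. over iSCSI), with volblocksize 
matched to the 8K PostgreSQL page size:

# create a 200G zvol for the database
zfs create -V 200G -o volblocksize=8K tank/pgdata

# on ZFS on Linux the zvol appears under /dev/zvol/tank/pgdata and can be
# exported over iSCSI or attached to the VM, then formatted as usual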

Our experience with ZFS has been very positive.  However, FreeNAS is not really 
suited to our needs - it's more of a storage appliance, probably great for a 
home NAS.  Hence the planned transition.

I strongly recommend you follow the discussion on the ZFS discuss mailing 
list.  There's a lot to learn about ZFS configuration, much more than you 
will glean from a few posts here.
http://zfsonlinux.org/lists.html

cheers,
Simon

-----Original Message-----
From: galaxy-dev-boun...@lists.bx.psu.edu [mailto:galaxy-dev-
boun...@lists.bx.psu.edu] On Behalf Of Joachim Jacob | VIB |
Sent: Tuesday, 10 September 2013 10:29 p.m.
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] ZFS storage recommendations

Hi all,

I am performing some tests to move my Galaxy database to ZFS. Does anybody
have experience with ZFS on Linux, and any recommendations or experiences for
optimizing performance? The purpose is to share the database over NFS with the
Galaxy VM.


Thanks,
Joachim.

--
Joachim Jacob
Contact details: http://www.bits.vib.be/index.php/about/80-team





___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/
