Robin,

The NUFA translator sounds perfect for my setup. Do you have a reference for setting it up? I can't find much documentation about it on the website, except a few references in the mailing list to "the old, now obsolete NUFA translator".
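From the volume-option list in your mail below, I'm guessing it's just a matter of setting the option on the volume, i.e. something like the following (volume name is a placeholder, and this is only my guess from your list, not something I've tested):

    gluster volume set myvolume cluster.nufa enable
    gluster volume info myvolume

and then checking that cluster.nufa shows up under "Options Reconfigured" in the volume info. Or does NUFA also need something on the client side?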
Thanks - Alex

On Mar 11, 2014, at 3:37 AM, Robin Jonsson <[email protected]> wrote:

> Alexander:
>
> I have also experienced the stalls you are describing. This was in a 2-brick
> setup running replicated volumes used by a 20-node HPC cluster.
>
> In my case this was solved by:
>
> * Replace FUSE with NFS
>     * This is by far the biggest booster
> * RAM disks for the scratch directories (not connected to gluster at all)
>     * If you're not sure where these directories are, run
>       'gluster volume top <volume> write list-cnt 10'
> * 'tuned-adm profile; tuned-adm profile rhs-high-throughput' on all storage
>   bricks
> * The following volume options
>     * cluster.nufa: enable
>     * performance.quick-read: on
>     * performance.open-behind: on
> * Mount option on clients
>     * noatime
>         * Use only where access time isn't needed.
>         * Major booster for small file writes in my case, even with the
>           FUSE client.
>
> Hope this helps,
>
> Regards,
> Robin
>
>
> On 10 Mar 2014, at 19:06, Alexander Valys <[email protected]> wrote:
>
>> A quick performance question.
>>
>> I have a small cluster of 4 machines, 64 cores in total. I am running a
>> scientific simulation on them, which writes at between 0.1 and 10 MB/s
>> (total) to roughly 64 HDF5 files. Each HDF5 file is written by only one
>> process. The writes are not continuous, but consist of writing roughly 1 MB
>> of data to each file every few seconds.
>>
>> Writing to HDF5 involves a lot of reading of file metadata and random
>> seeking within the file, since we are actually writing to about 30 datasets
>> inside each file. I am hosting the output on a distributed gluster volume
>> (one brick local to each machine) to provide a unified namespace for the
>> (very rare) case when each process needs to read the others' files.
>>
>> I am seeing somewhat lower performance than I expected, i.e. roughly a
>> factor of 4 less throughput than when each node writes locally to the bare
>> drives. I expected the write-behind cache to buffer each write, but the
>> writes seem to be flushed across the network almost immediately, regardless
>> of what write-behind cache size I use (32 MB currently), and the simulation
>> stalls while waiting for the I/O operation to finish. Anyone have any
>> suggestions as to what to look at? I am using gluster 3.4.2 on Ubuntu 12.04.
>> I have flush-behind turned on, have mounted the volume with
>> direct-io-mode=disable, and have the cache size set to 256M.
>>
>> The nodes are connected via a dedicated gigabit Ethernet network, carrying
>> only gluster traffic (no simulation traffic).
>>
>> (Sorry if this message comes through twice; I sent it yesterday but was not
>> subscribed.)
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
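For reference, a rough, untested sketch of how Robin's suggestions above might be applied. The volume name "myvolume", server "node1", and mount point "/mnt/scratch" are placeholders, and the NFS mount line assumes gluster's built-in NFSv3 server rather than a separate export:

    gluster volume set myvolume cluster.nufa enable
    gluster volume set myvolume performance.quick-read on
    gluster volume set myvolume performance.open-behind on

    tuned-adm profile rhs-high-throughput        (on each storage brick)

    mount -t nfs -o vers=3,noatime node1:/myvolume /mnt/scratch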
