Alexander: I have also experienced the stalls you are describing. This was in a two-brick setup running replicated volumes, used by a 20-node HPC cluster.
In my case this was solved by:
* Replace FUSE with NFS (see the example commands after this list)
    * This is by far the biggest booster
* RAM disks for the scratch directories (not connected to Gluster at all)
    * If you're not sure where these directories are, run 'gluster volume top <volume> write list-cnt 10'
* 'tuned-adm profile rhs-high-throughput' on all storage bricks
* The following volume options (set commands shown after this list)
    * cluster.nufa: enable
    * performance.quick-read: on
    * performance.open-behind: on
* Mount option on clients
    * noatime
    * Use only where access time isn't needed.
    * Major booster for small file writes in my case, even with the FUSE client.
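
For reference, here is a minimal sketch of the commands involved, assuming a
volume named gv0 served from host server1 (both are placeholders, substitute
your own names and mount points):

    # on any storage node: apply the volume options
    gluster volume set gv0 cluster.nufa enable
    gluster volume set gv0 performance.quick-read on
    gluster volume set gv0 performance.open-behind on

    # on every storage brick: switch to the throughput-oriented profile
    tuned-adm profile rhs-high-throughput

    # on each client: mount over NFSv3 with noatime instead of FUSE
    mount -t nfs -o vers=3,noatime server1:/gv0 /mnt/gv0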
Hope this helps,
Regards,
Robin
On 10 Mar 2014, at 19:06, Alexander Valys <[email protected]> wrote:
> A quick performance question.
>
> I have a small cluster of 4 machines, 64 cores in total. I am running a
> scientific simulation on them, which writes at between 0.1 and 10 MB/s
> (total) to roughly 64 HDF5 files. Each HDF5 file is written by only one
> process. The writes are not continuous, but consist of writing roughly 1 MB
> of data to each file every few seconds.
>
> Writing to HDF5 involves a lot of reading the file metadata and random
> seeking within the file, since we are actually writing to about 30 datasets
> inside each file. I am hosting the output on a distributed gluster volume
> (one brick local to each machine) to provide a unified namespace for the
> (very rare) case when each process needs to read the other's files.
>
> I am seeing somewhat lower performance than I expected: roughly a quarter of
> the throughput each node gets when writing locally to the bare drives. I
> expected the write-behind cache to buffer each write, but it seems
> that the writes are being quickly flushed across the network regardless of
> what write-behind cache size I use (32 MB currently), and the simulation
> stalls while waiting for the I/O operation to finish. Anyone have any
> suggestions as to what to look at? I am using gluster 3.4.2 on ubuntu 12.04.
> I have flush-behind turned on, and have mounted the volume with
> direct-io-mode=disable, and have the cache size set to 256M.
>
> The nodes are connected via a dedicated gigabit ethernet network, carrying
> only gluster traffic (no simulation traffic).
>
> (sorry if this message comes through twice, I sent it yesterday but was not
> subscribed)
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
