Hi,
On 20/11/17 04:23, 성백재 wrote:
> Hello, list.
>
> We are developing a storage system using 10 NVMe drives (our current
> test set). Using MD RAID10 + CLVM/GFS2 across four hosts, we achieve
> up to 22 GB/s on reads.
>
> However, we have run into a GFS2/DLM problem: each host frequently
> logs “dlm: gfs2: send_repeat_remove” kernel messages, and I/O
> throughput becomes low and unstable.
>
> I found a commit in the GFS2 tree that mentions the send_repeat_remove
> function
> (https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?id=96006ea6d4eea73466e90ef353bf34e507724e77).
>
> Details of the test environment:
> Four hosts share the 10 NVMe drives, and each host runs CLVM/GFS2 on
> top of clustered MD RAID1 + MD RAID0 (i.e. the RAID10 above).
> The GFS2 file system holds 2,000 directories, each containing 1,900
> media files (3 MB on average).
> Each host runs 20 NGINX threads, and each thread reads randomly chosen
> media files on demand.
> The Linux kernel version is 4.11.8.
>
> Can you offer any suggestions or pointers for solving this problem?
>
> Thank you in advance :)
>
> Best regards,
> /Jay Sung
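
For a sense of scale, the test environment described in the report works out roughly as follows. This is a back-of-envelope sketch that assumes decimal units and that each request reads one whole average-sized (3 MB) file; the constant names are illustrative only. Since each GFS2 inode is protected by a glock backed by a DLM lock, a host reading a file whose glock it does not already hold will typically have to go through the DLM, so the file rate at peak throughput gives a crude feel for how much lock activity this workload can generate.

```python
# Back-of-envelope scale of the test workload described in the report.
# All inputs are the report's own figures; 3 MB is an average file size.
DIRECTORIES = 2_000
FILES_PER_DIR = 1_900
AVG_FILE_MB = 3            # average media file size, decimal megabytes
HOSTS = 4
THREADS_PER_HOST = 20      # NGINX threads per host
PEAK_READ_GB_S = 22        # reported maximum aggregate read throughput

total_files = DIRECTORIES * FILES_PER_DIR
total_tb = total_files * AVG_FILE_MB / 1_000_000   # MB -> TB (decimal)
readers = HOSTS * THREADS_PER_HOST
# Assumption: every request reads exactly one whole average-sized file.
files_per_s = PEAK_READ_GB_S * 1_000 / AVG_FILE_MB  # GB/s -> MB/s -> files/s

print(f"files: {total_files:,}")
print(f"data: ~{total_tb:.1f} TB")
print(f"concurrent readers: {readers}")
print(f"whole-file reads per second at peak: ~{files_per_s:,.0f}")
```

With these figures it prints 3,800,000 files, about 11.4 TB of data, 80 concurrent readers and roughly 7,333 whole-file reads per second at the 22 GB/s peak.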

I'm copying in our DLM experts. It would be good to open a bug in Red
Hat's Bugzilla to track this issue (and a customer case too, if you are
a customer). It looks like something that will need some investigation
to get to the bottom of what is going on. I suspect that a tcpdump of
the DLM traffic captured while the issue occurs would be the first thing
to try, so that we can match the message against the protocol dump. That
may not be easy, since I suspect there is a large quantity of DLM
traffic in your setup, which will make finding the specific messages
trickier.

Just out of interest, what kind of network is this running over? How
much bandwidth is DLM taking up?
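
On the bandwidth question, one rough way to get a number is to capture on the cluster interface with tcpdump and add up the frames to and from the DLM port. The sketch below uses only the Python standard library and rests on several assumptions: the DLM uses TCP on its default port 21064 (SCTP is not handled), the capture is a classic pcap file (what tcpdump -w normally writes, not pcapng) taken on a single Ethernet interface rather than "-i any", and the traffic is untagged IPv4. It only counts bytes; it does not decode DLM messages.

```python
import struct
from typing import Tuple

# Assumption: the DLM talks TCP on its default port, 21064. Adjust if your
# cluster is configured differently; SCTP transport is not handled here.
DLM_TCP_PORT = 21064


def _is_dlm_tcp(frame: bytes, port: int) -> bool:
    """True if an Ethernet II frame carries IPv4/TCP to or from `port`."""
    # Ethernet II header is 14 bytes; the EtherType sits at offset 12.
    if len(frame) < 14 or struct.unpack_from("!H", frame, 12)[0] != 0x0800:
        return False  # not IPv4 (VLAN-tagged and IPv6 frames are skipped)
    ip = frame[14:]
    if len(ip) < 20:
        return False
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ip[9] != 6 or len(ip) < ihl + 4:
        return False  # not TCP, or the ports were not captured
    sport, dport = struct.unpack_from("!HH", ip, ihl)
    return port in (sport, dport)


def dlm_bandwidth(pcap: bytes, port: int = DLM_TCP_PORT) -> Tuple[int, float]:
    """Return (DLM bytes on the wire, average bytes/second over the capture).

    Expects a classic libpcap file from a single Ethernet interface; pcapng
    and Linux "cooked" captures (tcpdump -i any) are rejected.
    """
    if len(pcap) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack_from("<I", pcap, 0)[0]
    if magic in (0xA1B2C3D4, 0xA1B23C4D):
        endian = "<"  # file written little-endian
    elif magic in (0xD4C3B2A1, 0x4D3CB2A1):
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic pcap file")
    # The 0xA1B23C4D variant stores nanoseconds instead of microseconds.
    frac_scale = 1e9 if magic in (0xA1B23C4D, 0x4D3CB2A1) else 1e6
    linktype = struct.unpack_from(endian + "I", pcap, 20)[0] & 0xFFFF
    if linktype != 1:
        raise ValueError("only Ethernet (link type 1) captures are handled")

    total = 0
    first = last = None
    off = 24
    while off + 16 <= len(pcap):
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", pcap, off)
        frame = pcap[off + 16:off + 16 + incl_len]
        off += 16 + incl_len
        if len(frame) < incl_len:
            break  # capture was cut off mid-record
        ts = ts_sec + ts_frac / frac_scale
        if first is None:
            first = ts
        last = ts
        if _is_dlm_tcp(frame, port):
            # orig_len is the on-the-wire size, even if snaplen truncated the copy.
            total += orig_len
    duration = last - first if first is not None else 0.0
    return total, (total / duration if duration > 0 else 0.0)
```

For example, capture with something like "tcpdump -i eth0 -w dlm.pcap tcp port 21064" while the send_repeat_remove messages are appearing (eth0 is a placeholder for the cluster interface), pass the file's contents, read in binary mode, to dlm_bandwidth(), and multiply the returned rate by 8 for bits per second. Reading the whole capture into memory keeps the sketch short but is only sensible for short captures.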
Steve.
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster