[Ocfs2-users] Problem with OCFS2 disk on some moments (slow until stalls)

Area de Sistemas Mon, 14 Sep 2015 01:24:08 -0700

Hello everyone,

We have a problem in a 3-member OCFS2 cluster used to serve a web/PHP application that accesses (reads and/or writes) files located in the OCFS2 volume.

The problem appears only sometimes (apparently during high-load periods).
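
One way to check whether the slowdowns really coincide with load is to time a directory listing and record the load average at the same moment. The following is only a sketch: the helper name `probe_listing` and the mount point `/mnt/ocfs2` are assumptions for illustration and are not part of the setup described here.

```python
import os
import time


def probe_listing(path):
    """Time one directory listing and capture the 1-minute load average."""
    start = time.monotonic()
    entries = os.listdir(path)      # reads the directory, as a plain "ls" does
    elapsed = time.monotonic() - start
    load1, _, _ = os.getloadavg()   # available on Unix-like systems only
    return {
        "path": path,
        "entries": len(entries),
        "seconds": elapsed,
        "load1": load1,
    }


if __name__ == "__main__":
    # Hypothetical mount point; replace it with the real OCFS2 mount.
    target = "/mnt/ocfs2"
    if os.path.isdir(target):
        sample = probe_listing(target)
        print("{seconds:.3f}s for {entries} entries, load1={load1:.2f}".format(**sample))
```

Run periodically (for example from cron) on each node, the samples would show whether listing time climbs together with the load average or ahead of it.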


SYMPTOMS:
- access to OCFS2 content becomes slower and slower until it stalls
    * a "ls" command that normally takes <=1s takes 30s, 40s, 1m,...
- load average of the system grows to 150, 200 or even more

- high iowait values: 70-90%
    * but CPU usage is low

- many messages like these appear in the syslog:
    (httpd,XXXXX,Y):ocfs2_rename:1474 ERROR: status = -13
  or
    (httpd,XXXXX,Y):ocfs2_unlink:951 ERROR: status = -2

  and the more "worrying":
     kernel: INFO: task httpd:3488 blocked for more than 120 seconds.
     kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     kernel: httpd           D c6fe5d74     0  3488   1616 0x00000080
     kernel: c6fe5e04 00000082 00000000 c6fe5d74 c6fe5d74 000041fd c6fe5d88 c0439b18
     kernel: c0b976c0 c0b976c0 c0b976c0 c0b976c0 ed0f0ac0 c6fe5de8 c0b976c0 f75ac6c0
     kernel: f2f0cd60 c0a95060 00000001 c6fe5dbc c0874b8d c6fe5de8 f8fd9a86 00000001

     kernel: Call Trace:
     kernel: [<c0439b18>] ? default_spin_lock_flags+0x8/0x10
     kernel: [<c0874b8d>] ? _raw_spin_lock+0xd/0x10
     kernel: [<f8fd9a86>] ? ocfs2_dentry_revalidate+0xc6/0x2d0 [ocfs2]
     kernel: [<f8ff17be>] ? ocfs2_permission+0xfe/0x110 [ocfs2]
     kernel: [<f905b6f0>] ? ocfs2_acl_chmod+0xd0/0xd0 [ocfs2]
     kernel: [<c0873105>] schedule+0x35/0x50
     kernel: [<c0873b2e>] __mutex_lock_slowpath+0xbe/0x120
     ....
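
The negative status values in the ocfs2 messages above are kernel errno codes: on Linux, 13 is EACCES (permission denied) and 2 is ENOENT (no such file or directory). A minimal sketch for extracting and decoding them from syslog lines follows. The regular expression is an assumption inferred only from the two sample lines quoted here, and `decode_ocfs2_error` is a hypothetical helper name.

```python
import errno
import os
import re

# Pattern inferred from the two sample lines in this report (an assumption);
# the third field inside the parentheses is matched but not interpreted.
LINE_RE = re.compile(
    r"\((?P<comm>[^,]+),(?P<pid>[^,]+),[^)]+\):"
    r"(?P<func>\w+):(?P<line>\d+) ERROR: status = (?P<status>-?\d+)"
)


def decode_ocfs2_error(text):
    """Return (function, errno name, description), or None if nothing matches."""
    m = LINE_RE.search(text)
    if m is None:
        return None
    code = abs(int(m.group("status")))
    name = errno.errorcode.get(code, "UNKNOWN")
    return m.group("func"), name, os.strerror(code)


if __name__ == "__main__":
    for sample in (
        "(httpd,XXXXX,Y):ocfs2_rename:1474 ERROR: status = -13",
        "(httpd,XXXXX,Y):ocfs2_unlink:951 ERROR: status = -2",
    ):
        print(decode_ocfs2_error(sample))
```

Counting the decoded results per function and per errno name over a whole log file would show whether the errors cluster around the stall periods or occur all the time.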


(UNACCEPTABLE) WORKAROUND:
   stop httpd (really slow)
   stop ocfs2 service (really slow)
   start ocfs2 and httpd

MORE INFO:
- OS information:
    Oracle Linux 6.4 32bit
    4GB RAM

    uname -a: 2.6.39-400.109.6.el6uek.i686 #1 SMP Wed Aug 28 09:55:10 PDT 2013 i686 i686 i386 GNU/Linux
    * anyway: we have another 5-node cluster with Oracle Linux 7.1 (so a 64-bit OS)
      serving a newer version of the same application, and the problems are similar,
      so it appears to be not an OS problem but a more specific OCFS2 problem
      (bug? some tuning? other?)


- standard configuration

    * if you want, I can show the cluster.conf configuration, but it is the "expected configuration"


- standard configuration in o2cb:
    Driver for "configfs": Loaded
    Filesystem "configfs": Mounted
    Stack glue driver: Loaded
    Stack plugin "o2cb": Loaded
    Driver for "ocfs2_dlmfs": Loaded
    Filesystem "ocfs2_dlmfs": Mounted
    Checking O2CB cluster "MoodleOCFS2": Online
      Heartbeat dead threshold: 31
      Network idle timeout: 30000
      Network keepalive delay: 2000
      Network reconnect delay: 2000
      Heartbeat mode: Local
    Checking O2CB heartbeat: Active

- mount options: _netdev,rw,noatime
    * so other options (commit, data, ...) have their default values
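
For reference, the o2cb values listed above can be converted to seconds as sketched below. The disk heartbeat conversion, (threshold - 1) * 2 seconds, is the formula commonly cited from the OCFS2 documentation and should be treated as an assumption to verify against the installed version; the network values are reported in milliseconds. The dictionary keys are illustrative names, not o2cb configuration keys.

```python
# Values copied from the o2cb status output above; key names are illustrative.
O2CB_STATUS = {
    "heartbeat_dead_threshold": 31,      # heartbeat iterations
    "network_idle_timeout_ms": 30000,
    "network_keepalive_delay_ms": 2000,
    "network_reconnect_delay_ms": 2000,
}


def heartbeat_timeout_seconds(threshold, interval_s=2):
    """Assumed conversion: one heartbeat every 2 s, timeout = (threshold - 1) * 2."""
    return (threshold - 1) * interval_s


def summarize(status):
    """Convert the raw o2cb values to seconds for easier comparison."""
    return {
        "disk_heartbeat_timeout_s": heartbeat_timeout_seconds(
            status["heartbeat_dead_threshold"]
        ),
        "network_idle_timeout_s": status["network_idle_timeout_ms"] / 1000,
        "network_keepalive_delay_s": status["network_keepalive_delay_ms"] / 1000,
        "network_reconnect_delay_s": status["network_reconnect_delay_ms"] / 1000,
    }


if __name__ == "__main__":
    for key, value in summarize(O2CB_STATUS).items():
        print(key, value)
```

Under that assumption, a dead threshold of 31 corresponds to a 60-second disk heartbeat timeout, and the network idle timeout is 30 seconds.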


Any ideas or suggestions?

Regards.

--
Area de Sistemas
Servicio de las Tecnologias de la Informacion y Comunicaciones (STIC)
Universidad de Valladolid
Edificio Alfonso VIII, C/Real de Burgos s/n. 47011, Valladolid - ESPAÑA
Telefono: 983 18-6410, Fax: 983 423271
E-mail: siste...@uva.es

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
