Hello Scott,
I had help from an Oracle developer, Srinivas, and he fixed my issue.   Thanks 
again Srini !!

Disclaimer: I am not technically knowledgeable about the OCFS2 filesystem, so I 
will explain what I understood from a discussion with Srini.

The main issue was that my filesystem was getting more and more fragmented and 
the default write pre-allocation window (localalloc bitmap) was too big for 
such a fragmented filesystem.  From what I understood, when you write to an 
OCFS2 filesystem, the filesystem reserves a chunk of space before it begins 
writing.  Even if you are writing a very small file, the filesystem will 
always reserve that chunk of space (I assume this helps reduce 
fragmentation?!).
With my filesystem setup, the size of the pre-allocated chunks was 136 MB, 
which meant the filesystem needed to find 136 MB of contiguous space every 
time a write was being done.  That caused delays because it had a hard time 
finding them ...
Srini showed me how to reduce the pre-allocated chunks size (localalloc bitmap) 
to a smaller size (16 MB instead of 136 MB) and, since then, everything works 
as new.   The solution to my problem was to add localalloc=16 to my filesystem 
mount options, umount/mount the filesystem and everything was fixed.
[root@fileserv01 ~]# grep tier2-ocfs2 /etc/fstab
LABEL=tier2-ocfs2 /tier2-ocfs2 ocfs2 _netdev,nodev,noatime,errors=panic,data=writeback,noacl,nouser_xattr,commit=60,localalloc=16 0 0
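
To double-check that the option really made it into a mount option string, something like the sketch below could be used.  The function name localalloc_from_opts is made up for this illustration; it only parses a comma-separated option string (such as the fourth field of the fstab line above) and does not query the filesystem itself.

```shell
#!/bin/sh
# Print the localalloc value (in MB) found in a comma-separated mount
# option string, or print nothing if the option is absent.
# NOTE: localalloc_from_opts is a hypothetical helper, not an OCFS2 tool.
localalloc_from_opts() {
    printf '%s\n' "$1" | tr ',' '\n' \
        | sed -n 's/^localalloc=\([0-9][0-9]*\)$/\1/p'
}

# Option string copied from the fstab entry shown above.
opts='_netdev,nodev,noatime,errors=panic,data=writeback,noacl,nouser_xattr,commit=60,localalloc=16'
localalloc_from_opts "$opts"
```

As described above, the new value only takes effect after the filesystem has been unmounted and mounted again.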

For info, you can view your current localalloc setting by looking at the 
fs_state in the debugfs.
You first need to mount the virtual debugfs filesystem if it's not already 
mounted: 
[root@fileserv01 ~]# grep debugfs /etc/fstab
debugfs /sys/kernel/debug debugfs 0 0
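
If you are not sure whether debugfs is already mounted, a small check like the one below may help.  It is only a sketch: the helper name debugfs_mounted is an assumption of this example, and it simply looks for a debugfs entry in a mounts table (by default /proc/mounts).  Mounting by hand is normally done as root with "mount -t debugfs debugfs /sys/kernel/debug".

```shell
#!/bin/sh
# Return success if a filesystem of type debugfs appears in the given
# mounts table (defaults to /proc/mounts).
# NOTE: debugfs_mounted is a hypothetical helper name.
debugfs_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

if debugfs_mounted; then
    echo "debugfs is already mounted"
else
    echo "debugfs is not mounted; as root: mount -t debugfs debugfs /sys/kernel/debug"
fi
```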
My localalloc settings before the change:
[root@fileserv01 ~]# grep "LocalAlloc =" /sys/kernel/debug/ocfs2/*/fs_state
LocalAlloc => State: 1  Descriptor: 0  Size: 17441 bits  Default: 29696 bits
My localalloc settings after the change:
[root@fileserv01 ~]# grep "LocalAlloc =" /sys/kernel/debug/ocfs2/*/fs_state
LocalAlloc => State: 1  Descriptor: 0  Size: 2048 bits  Default: 2048 bits
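
The sizes above are reported in bits rather than MB.  The sketch below shows one way to convert them, under two assumptions that are not stated in this post: that each bit stands for one cluster, and that the cluster size is 8 KB (inferred from 2048 bits matching localalloc=16).  With those assumptions, 17441 bits also comes out at about 136 MB.  The helper names are made up for this example.

```shell
#!/bin/sh
# Extract the "Size: N bits" value from a LocalAlloc line read on stdin.
# NOTE: localalloc_size_bits and bits_to_mb are hypothetical helpers.
localalloc_size_bits() {
    sed -n 's/.*Size: \([0-9][0-9]*\) bits.*/\1/p'
}

# Convert a bit count to whole MB.
# ASSUMPTION: one bit per cluster, 8192-byte clusters unless overridden.
bits_to_mb() {
    bits=$1
    cluster_bytes=${2:-8192}
    echo $(( bits * cluster_bytes / 1024 / 1024 ))
}

before='LocalAlloc => State: 1  Descriptor: 0  Size: 17441 bits  Default: 29696 bits'
after='LocalAlloc => State: 1  Descriptor: 0  Size: 2048 bits  Default: 2048 bits'

bits_to_mb "$(printf '%s\n' "$before" | localalloc_size_bits)"   # prints 136
bits_to_mb "$(printf '%s\n' "$after" | localalloc_size_bits)"    # prints 16
```

If your filesystem uses a different cluster size, pass it as the second argument to bits_to_mb.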

What is missing from my post above is Srini's analysis, in which he found that 
there were not many "136 MB" chunks of contiguous space on my filesystem, and 
therefore that this tuning was definitely going to help.
I hope this post may help you and others.
Jeff
p.s.   sadly, there is currently no defragmentation tool for OCFS2






From: skempin...@sjrwmd.com
To: jpaterso...@hotmail.com; ocfs2-users@oss.oracle.com
Subject: RE: [Ocfs2-users] OCFS2 hanging on writes
Date: Wed, 31 Oct 2012 12:30:00 +0000







Jeff,




Have you found a resolution to this issue?



Lately we've been experiencing intermittent freezing, so I'm curious to hear 
more about your issue.



Thanks, Scott





From: ocfs2-users-boun...@oss.oracle.com [ocfs2-users-boun...@oss.oracle.com] 
on behalf of Jeff Paterson [jpaterso...@hotmail.com]

Sent: Thursday, October 25, 2012 9:32 PM

To: ocfs2-users@oss.oracle.com

Subject: [Ocfs2-users] OCFS2 hanging on writes






Hello,







I need help with our OCFS2 (1.8.0) filesystem.  We have been having problems 
with it for a couple of days.  When we write to it, it hangs.



The "hanging pattern" is easily reproducible.  If I write a 1GB file on the 
filesystem, it does the following:
        - write ~200 MB of data on the disk in 1 second
        - freeze for about 10 seconds
        - write ~200 MB of data on the disk in 1 second
        - freeze for about 10 seconds
        - write ~200 MB of data on the disk in 1 second
        - freeze for about 10 seconds
        (and so on)
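
A pattern like the one above could be measured with a small script such as the following.  This is only a sketch and not the method used in this post: the function name, the chunked dd approach and the threshold are all assumptions of this example, and timing is limited to whole seconds.  Writes go through the page cache (no fsync), so short stalls may not show up.

```shell
#!/bin/sh
# Write a file in fixed-size chunks and report every chunk that takes
# longer than a threshold (whole seconds).
# NOTE: write_with_stall_report is a hypothetical helper.
# Arguments: target file, number of chunks, chunk size in MB, threshold in s.
write_with_stall_report() {
    target=$1
    chunks=$2
    chunk_mb=$3
    threshold_s=$4
    : > "$target"                      # start from an empty file
    i=1
    while [ "$i" -le "$chunks" ]; do
        start=$(date +%s)
        # Append one chunk of zeros to the target file.
        dd if=/dev/zero bs=1048576 count="$chunk_mb" 2>/dev/null >> "$target"
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -gt "$threshold_s" ]; then
            echo "chunk $i: stalled for about ${elapsed}s"
        fi
        i=$(( i + 1 ))
    done
}

# Example (path is illustrative): five 200 MB chunks, report stalls over 2 s.
# write_with_stall_report /tier2-ocfs2/stalltest.bin 5 200 2
```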



When the freezes occur:
        - other write operations (from other processes) on the same node also 
freeze
        - write operations on other nodes are not affected by the freezes on 
the frozen node
  
Read operations (on any cluster node, even the one with frozen writes) don't 
seem to be affected by the freezes.  One thing is certain: read operations 
alone don't cause the filesystem to freeze.




For info, before the problem began to appear we could sustain 640 MB/s writes 
without any freeze.



I tried to mount the filesystem on a single node to avoid issues that could 
happen with inter-node communications and the problem was still there.






Filesystem details


The filesystem is 18 TB and it is currently 72% full.
Mount options are the following: 
rw,nodev,_netdev,noatime,errors=panic,data=writeback,noacl,nouser_xattr,commit=60,heartbeat=local
All Features: backup-super strict-journal-super sparse extended-slotmap 
inline-data metaecc indexed-dirs refcount discontig-bg unwritten







There is nothing special in the system logs besides application errors caused 
by the freezes.






Would a fsck.ocfs2 help?   How long would it take for 18 TB?



Is there a flag I can enable in debugfs.ocfs2 to get a better idea of what is 
happening and why it is freezing like that?






Any help would be greatly appreciated.



Thanks in advance,



Jeff






                                          
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users
