Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-05 Thread Srinivas Eeda
On 6/3/2010 10:37 PM, Wengang Wang wrote: Srini, On 10-06-03 19:17, Srinivas Eeda wrote: Can you please explain the idea of the new flag DLM_LOCK_RES_DE_DROP_REF :) If the idea of the fix is to address the race between purging and recovery, I am wondering DLM_LOCK_RES_DROPPING_REF

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-03 Thread Srinivas Eeda
Comments inline On 6/3/2010 9:37 AM, Wengang Wang wrote: Changes to V1: 1 move the msleep to the second runs when the lockres is in recovery so the purging work on other lockres' can go. 2 do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set and don't resend deref in this

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master

2010-06-03 Thread Srinivas Eeda
On 6/3/2010 6:43 PM, Wengang Wang wrote: Srini, On 10-06-03 18:06, Srinivas Eeda wrote: Comments inline On 6/3/2010 9:37 AM, Wengang Wang wrote: Changes to V1: 1 move the msleep to the second runs when the lockres is in recovery so the purging work on other lockres' can go. 2 do

Re: [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occures

2010-05-24 Thread Srinivas Eeda
Thanks for doing this patch. I have a small comment: I wonder whether there could be a window where node B sent the lock info to node C as part of recovery and removed the DLM_LOCK_RES_RECOVERING flag while dlm_thread was still purging it. In that case dlm_thread will still continue to remove it from

Re: [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occures

2010-05-24 Thread Srinivas Eeda
On 5/24/2010 7:50 PM, Wengang Wang wrote: delay deref message if DLM_LOCK_RES_RECOVERING is set (which means recovery got to the lockres before dlm_thread could), move the lockres to the end of the purgelist and retry later. If you meant checking before sending DEREF, it could cause a
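The "move the lockres to the end of the purgelist and retry later" idea quoted above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the o2dlm code: `struct lockres`, `struct purge_list`, `purge_pass()` and the flag value `LOCKRES_RECOVERING` are hypothetical stand-ins, and "purging" is reduced to counting.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag value only; the real flag is defined in the o2dlm headers. */
#define LOCKRES_RECOVERING 0x1u

/* Hypothetical, simplified lock resource: a state bitmask and a list link. */
struct lockres {
    unsigned int state;
    struct lockres *next;
};

/* Hypothetical purge list kept as a singly linked FIFO. */
struct purge_list {
    struct lockres *head;
    struct lockres *tail;
};

static void purge_list_append(struct purge_list *pl, struct lockres *res)
{
    assert(pl != NULL && res != NULL);
    res->next = NULL;
    if (pl->tail)
        pl->tail->next = res;
    else
        pl->head = res;
    pl->tail = res;
}

static struct lockres *purge_list_pop(struct purge_list *pl)
{
    struct lockres *res = pl->head;

    if (!res)
        return NULL;
    pl->head = res->next;
    if (!pl->head)
        pl->tail = NULL;
    res->next = NULL;
    return res;
}

/*
 * One pass over the entries present when the pass starts.
 * A lockres that is in recovery is requeued at the tail so that it is
 * retried on a later pass; any other lockres is "purged" (only counted
 * here; this is where a deref message would be sent).
 * Returns the number of entries purged in this pass.
 */
static size_t purge_pass(struct purge_list *pl)
{
    size_t purged = 0;
    struct lockres *stop = pl->tail; /* last entry present at pass start */
    struct lockres *res;

    while ((res = purge_list_pop(pl)) != NULL) {
        if (res->state & LOCKRES_RECOVERING)
            purge_list_append(pl, res); /* delay: retry later */
        else
            purged++;
        if (res == stop)
            break; /* do not revisit entries requeued during this pass */
    }
    return purged;
}
```

Stopping at the entry that was last when the pass began keeps a pass from spinning on resources that are still in recovery, so purging of the other resources can proceed.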

[Ocfs2-devel] o2net: log socket state changes

2010-03-31 Thread Srinivas Eeda
This patch logs socket state changes that lead to socket shutdown. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/cluster/tcp.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index 334f231..8bda1ea

[Ocfs2-devel] o2net patch to lock socket shutdown message

2010-03-31 Thread Srinivas Eeda
The following patch logs socket shutdown messages. Below is the snippet of how the message looks (new message ends with ... shutdown, state #) [r...@el532p-3 ~]# mount /dev/hdb /vol1 Mar 31 11:14:18 el532p-3 kernel: connection to node el532p-2 (num 64) at 10.35.70.104: shutdown, state 8 Mar

[Ocfs2-devel] o2net: log socket state changes

2010-03-30 Thread Srinivas Eeda
This patch logs socket state changes that lead to socket shutdown. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/cluster/tcp.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index 334f231..6d0d228

[Ocfs2-devel] [PATCH 1/1] ocfs2: Fix a race in o2dlm lockres mastery(backport to 1.4)

2010-03-23 Thread Srinivas Eeda
informing the master directly. This is easily fixed by holding the dlm spinlock a little longer in the mastery handler. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/dlm/dlmmaster.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlm

[Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery

2010-03-22 Thread Srinivas Eeda
DLM_ASSERT_RESPONSE_MASTERY_REF) which creates a hole that results in loss of refmap bit on the master node. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/dlm/dlmmaster.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm

Re: [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery

2010-03-22 Thread SRINIVAS EEDA
Sunil, Joel, thanks for modifying the comments :) On 3/22/2010 6:47 PM, Joel Becker wrote: On Mon, Mar 22, 2010 at 06:20:32PM -0700, Sunil Mushran wrote: yes, your wording is better. and yes, dlm-spinlock is the top level lock. This patch is now in the 'fixes' branch of

[Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol (revision 2)

2010-02-18 Thread Srinivas Eeda
delivery. However the intention of this feature was to send a keepalive message every timeout seconds. This patch sends a message for every keepalive time interval. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/cluster/tcp.c |6 +- 1 files changed, 5 insertions(+), 1
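The difference between sending a keepalive only after the link has been idle and sending one every keepalive interval, as discussed in this thread, can be modelled roughly as follows. This is a simplified sketch, not the o2net implementation: both function names are hypothetical, time is in milliseconds, and the idle-based model assumes the timer is simply re-armed on every received message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Idle-based model: the keepalive timer is re-armed whenever a message
 * arrives, so a keepalive goes out only after interval_ms of silence.
 * rx_ms holds the arrival times of received messages in ascending order.
 * Returns the number of keepalives sent before duration_ms.
 */
static unsigned int keepalives_idle_based(const unsigned int *rx_ms, size_t n_rx,
                                          unsigned int interval_ms,
                                          unsigned int duration_ms)
{
    unsigned int sent = 0;
    unsigned int deadline = interval_ms;
    size_t i;

    assert(interval_ms > 0);
    for (i = 0; i < n_rx && rx_ms[i] < duration_ms; i++) {
        /* Timer expiries that happen before this message arrives. */
        while (deadline <= rx_ms[i]) {
            sent++;
            deadline += interval_ms;
        }
        /* Hearing from the peer cancels and requeues the timer. */
        deadline = rx_ms[i] + interval_ms;
    }
    /* Expiries after the last received message. */
    while (deadline < duration_ms) {
        sent++;
        deadline += interval_ms;
    }
    return sent;
}

/*
 * Periodic model: one keepalive per interval, regardless of other traffic.
 * Counts the expiries at interval_ms, 2*interval_ms, ... before duration_ms.
 */
static unsigned int keepalives_periodic(unsigned int interval_ms,
                                        unsigned int duration_ms)
{
    assert(interval_ms > 0);
    if (duration_ms == 0)
        return 0;
    return (duration_ms - 1) / interval_ms;
}
```

In this model a busy link produces no idle-based keepalives at all, while the periodic variant always produces the same count; on a completely silent link the two agree.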

Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

2010-02-17 Thread Srinivas Eeda
they received. So nodes with this patch will always receive a response message. So, in a mixed setup, both nodes will always hear the heartbeat from each other :). thanks, --Srini Joel Becker wrote: On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote: case

Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

2010-02-17 Thread Srinivas Eeda
No harm, just doubles heartbeat messages which is not required at all. Sunil Mushran wrote: What's the harm in leaving it in? Srinivas Eeda wrote: Each node that has this patch would send a O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds(default). So, nodes without this patch would always

Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

2010-02-17 Thread Srinivas Eeda
alive every 2 seconds. Sunil Mushran wrote: How will it double? The node will send a keepalive only if it has not heard from the other node for 2 secs. Srinivas Eeda wrote: No harm, just doubles heartbeat messages which is not required at all. Sunil Mushran wrote: What's the harm in leaving

Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

2010-02-17 Thread srinivas eeda
. As in, not wait for the response to requeue. But we'll still be smart about it in the sense that not send a hb even if the nodes are communicating otherwise. Srinivas Eeda wrote: In old code a node cancels and re queues keep alive message when it hears from the other node. If it didn't hear in 2

Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

2010-02-17 Thread srinivas eeda
Yea, they don't expect/wait for a response for keep alive message. On 2/17/2010 5:49 PM, Joel Becker wrote: On Wed, Feb 17, 2010 at 10:24:30AM -0800, Srinivas Eeda wrote: Each node that has this patch would send a O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds(default). So, nodes without

[Ocfs2-devel] [PATCH 2/3] o2net: delay enotconn for sends receives till quorum decision

2010-01-28 Thread Srinivas Eeda
messages to/from evicted node. If network connection comes back before the eviction, quorum decision is cancelled and messaging resumes. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/cluster/tcp.c | 69 +++ fs/ocfs2/cluster

[Ocfs2-devel] [PATCH 1/3] o2net: rollback reconnect on network timeout.

2010-01-28 Thread Srinivas Eeda
This patch rolls back an earlier fix that tries to re-establish the network connection when a network timeout happens. Reconnect was recycling sockets, which resulted in lost messages and, in turn, hangs. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/cluster/tcp.c | 50

[Ocfs2-devel] o2net patches

2010-01-28 Thread Srinivas Eeda
The following 3 patches: 1. roll back the reconnect fix 2. delay ENOTCONN for sends and receives until a node reconnects or dies after a lost connection 3. correct the keepalive protocol Thanks, --Srini ___ Ocfs2-devel mailing list

[Ocfs2-devel] [PATCH] ocfs2: avoid panic for local mounts on corruptions

2009-11-23 Thread Srinivas Eeda
When a file system is mounted local, it may be enough to remount it read only on seeing corruptions. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/super.c | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c

[Ocfs2-devel] [PATCH] o2net: delay ENOTCONN for sends receives till quorum decision

2009-11-19 Thread Srinivas Eeda
is cancelled and messaging resumes. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/cluster/tcp.c | 94 +++ fs/ocfs2/cluster/tcp_internal.h |9 ++-- 2 files changed, 60 insertions(+), 43 deletions(-) diff --git a/fs/ocfs2

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots

2009-07-17 Thread Srinivas Eeda
Tao Ma wrote: Hi Joel, This reply may be really too late. :) Joel Becker wrote: On Wed, Jun 10, 2009 at 01:37:53PM +0800, Tao Ma wrote: I also have some thoughts for it. Wish it isn't too late. Well, if we come up with changes it will affect what I push, but that's OK.

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots

2009-06-10 Thread Srinivas Eeda
in ocfs2_super when reflink is ongoing (I will do it). Make sense? Yes, I can restrict the node to recovering its own and offline slots. I can make the node recover its own slot every time the timer fires and the offline slots in a round-robin way (the current way). Regards, Tao Srinivas Eeda wrote

[Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots

2009-06-04 Thread Srinivas Eeda
such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/dlmglue.c | 51 ++ fs/ocfs2/dlmglue.h | 10 fs/ocfs2

[Ocfs2-devel] Backport that adds delayed orphan scan timer to 1.4

2009-06-04 Thread Srinivas Eeda
The next two patches are backports of the orphan scan timer patches to ocfs2-1.4 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] [PATCH 2/2] ocfs2 patch to track delayed orphan scan timer statistics

2009-06-03 Thread Srinivas Eeda
Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan= Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com Signed-off-by: Sunil Mushran sunil.mush...@oracle.com --- fs/ocfs2/journal.c

[Ocfs2-devel] Patches that adds delayed orphan scan timer (rev 3)

2009-06-03 Thread Srinivas Eeda
Resending after implementing review comments.

[Ocfs2-devel] [PATCH 2/2] ocfs2 patch to track delayed orphan scan timer statistics

2009-06-02 Thread Srinivas Eeda
Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan= Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/journal.c |2 ++ fs/ocfs2/ocfs2.h |4 +++- fs/ocfs2

[Ocfs2-devel] [PATCH 1/2] OCFS2: timer to queue scan of all orphan slots

2009-06-02 Thread Srinivas Eeda
at a time. It is done once every X milliseconds, where X is a value between ORPHAN_SCAN_SCHEDULE_TIMEOUT/2 and ORPHAN_SCAN_SCHEDULE_TIMEOUT milliseconds. Each time the scan is done by a different node, so eventually the node that has the inode cached will get to wipe the file. Signed-off-by: Srinivas Eeda
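The randomized scan interval described in this patch summary can be sketched as below. This is a minimal illustration under stated assumptions: the value given to `ORPHAN_SCAN_SCHEDULE_TIMEOUT` is a placeholder, `orphan_scan_interval_ms()` is a hypothetical helper, and the caller is assumed to supply the random number from whatever source it prefers.

```c
#include <assert.h>

/* Assumed value in milliseconds, for illustration only. */
#define ORPHAN_SCAN_SCHEDULE_TIMEOUT 300000u

/*
 * Map an arbitrary random number onto the closed range
 * [ORPHAN_SCAN_SCHEDULE_TIMEOUT / 2, ORPHAN_SCAN_SCHEDULE_TIMEOUT].
 * Because every node draws its own delay, the scans drift apart and
 * different nodes end up doing the work over time.
 */
static unsigned int orphan_scan_interval_ms(unsigned int rnd)
{
    unsigned int half = ORPHAN_SCAN_SCHEDULE_TIMEOUT / 2;
    unsigned int interval = half + rnd % (half + 1);

    /* Postcondition: the interval stays inside the documented range. */
    assert(interval >= half && interval <= ORPHAN_SCAN_SCHEDULE_TIMEOUT);
    return interval;
}
```

A timer callback would call this helper each time it re-arms itself, so that consecutive scans on one node are not evenly spaced.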

[Ocfs2-devel] Patches that adds delayed orphan scan timer

2009-06-02 Thread Srinivas Eeda
Resending after adding another patch to display delayed orphan scan statistics.

[Ocfs2-devel] [PATCH 1/2] ocfs2: timer to queue scan of all orphan slots

2009-06-02 Thread Srinivas Eeda
the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/dlmglue.c | 47 + fs/ocfs2/dlmglue.h | 11 + fs/ocfs2/journal.c | 106 +++ fs

[Ocfs2-devel] [PATCH 2/2] ocfs2 patch to track delayed orphan scan timer statistics

2009-06-02 Thread Srinivas Eeda
Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan= Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/journal.c |2 ++ fs/ocfs2/ocfs2.h |4 +++- fs/ocfs2

[Ocfs2-devel] Patches that adds delayed orphan scan timer (rev 2)

2009-06-02 Thread Srinivas Eeda
Resending after implementing review comments.

[Ocfs2-devel] [PATCH 1/1] OCFS2: timer to queue scan of all orphan slots

2009-05-21 Thread Srinivas Eeda
at a time. It is done once every X milliseconds, where X is a value between ORPHAN_SCAN_SCHEDULE_TIMEOUT/2 and ORPHAN_SCAN_SCHEDULE_TIMEOUT milliseconds. Each time the scan is done by a different node, so eventually the node that has the inode cached will get to wipe the file. Signed-off-by: Srinivas Eeda

[Ocfs2-devel] [PATCH 1/1] OCFS2: timer to queue scan of all orphan slots

2009-05-19 Thread Srinivas Eeda
-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/dlmglue.c | 58 + fs/ocfs2/dlmglue.h |8 +++ fs/ocfs2/journal.c | 109 +++ fs/ocfs2/journal.h | 12 + fs/ocfs2/ocfs2.h|2 + fs

Re: [Ocfs2-devel] FW: Oracle 9204 installation on linux x86-64 on ocfs

2009-05-04 Thread Srinivas Eeda
Did you use the -o datavolume, nointr options for mounting? keyur patel wrote: Hello All, I have installed Oracle Cluster Manager on linux x86-64 nit. I am using ocfs file system for quorum file. But I am getting following error. Please see ocfs configuration below. I would appreciate, if

Re: [Ocfs2-devel] orphan cleanup

2009-04-30 Thread Srinivas Eeda
hmm, even if we queue the orphan recovery, inode may not get cleaned if the inode is still around on some node right? The node where the inode is still cached will vote no again? Sunil Mushran wrote: Joel Becker wrote: Srini, Ok, you can go ahead and cook up the background orphan

[Ocfs2-devel] Backport to 1.4 of patch that recovers orphans from offline slots

2009-04-07 Thread Srinivas Eeda
The following patch is a backport of patch that recovers orphans from offline slots. It is being backported from mainline to 1.4 mainline patch: 0001-Patch-to-recover-orphans-in-offline-slots-during-rec.patch Thanks, --Srini ___ Ocfs2-devel mailing

[Ocfs2-devel] [PATCH 1/1] ocfs2: recover orphans in offline slots during recovery and mount

2009-03-06 Thread Srinivas Eeda
recovers its own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com Signed-off-by: Joel Becker joel.bec...@oracle.com --- fs/ocfs2/journal.c | 140

[Ocfs2-devel] [PATCH 1/1] Patch to recover orphans in offline slots during recovery and mount

2009-03-05 Thread Srinivas Eeda
recovers its own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/journal.c | 136

[Ocfs2-devel] [PATCH 1/1] Patch to recover orphans in offline slots during recovery and mount

2009-03-04 Thread Srinivas Eeda
recovers its own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/journal.c | 106

[Ocfs2-devel] [PATCH 1/1] Patch to recover orphans from the slot during mount

2009-02-27 Thread Srinivas Eeda
are clean they will not queue to recover their orphan directory. This patch queues to recover orphans when the slot is next used. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/journal.c | 21 - 1 files changed, 8 insertions(+), 13 deletions(-) diff --git

[Ocfs2-devel] Patch to move ocfs2_slot_info to slot_map.h

2009-02-27 Thread Srinivas Eeda
The next 3 patches do the following: 1) move the ocfs2_slot_info struct from slot_map.c to slot_map.h 2) recover orphans during mount even if the journal is clean 3) recover orphans in offline slots

[Ocfs2-devel] [PATCH 1/1] Patch to clean orphans in all offline slots during recovery.

2009-02-19 Thread Srinivas Eeda
things: a) Recover orphans during mount of the slot that it is using. b) Recover orphans in all offline slots during recovery. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com --- fs/ocfs2/journal.c | 44 +++- 1 files changed, 27 insertions(+), 17

Re: [Ocfs2-devel] mount gnenerates an error : Unable to access cluster service while starting heartbeat

2007-03-28 Thread Srinivas Eeda
That might be because you have configured user mode dlm. What does cat /sys/o2cb/heartbeat_mode show, user? If so, run /etc/init.d/o2cb configure and answer n for the following: Use user-space driven heartbeat? (y/n) [y] n thanks, --Srini Andy Johnson wrote: Hello, I have
