On 6/3/2010 10:37 PM, Wengang Wang wrote:
Srini,
On 10-06-03 19:17, Srinivas Eeda wrote:
Can you please explain the idea of the new flag
DLM_LOCK_RES_DE_DROP_REF :)
If the idea of the fix is to address the race between purging and
recovery, I am wondering whether DLM_LOCK_RES_DROPPING_REF
Comments inline
On 6/3/2010 9:37 AM, Wengang Wang wrote:
Changes from V1:
1. Move the msleep to the second run when the lockres is in recovery, so that purging work on other lockres' can proceed.
2. Do not inform the recovery master if DLM_LOCK_RES_DROPPING_REF is set, and don't resend the deref in this
On 6/3/2010 6:43 PM, Wengang Wang wrote:
Srini,
On 10-06-03 18:06, Srinivas Eeda wrote:
Comments inline
On 6/3/2010 9:37 AM, Wengang Wang wrote:
Changes from V1:
1. Move the msleep to the second run when the lockres is in recovery, so that purging work on other lockres' can proceed.
2. Do
Thanks for doing this patch. I have a small comment: I am wondering if there
could be a window where node B sends the lock info to node C as part of
recovery and removes the DLM_LOCK_RES_RECOVERING flag while dlm_thread is
still purging it. In that case dlm_thread will still continue to remove
it from
On 5/24/2010 7:50 PM, Wengang Wang wrote:
Delay the deref message if DLM_LOCK_RES_RECOVERING is set (which means
recovery got to the lockres before dlm_thread could): move the
lockres to the end of the purge list and retry later.
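A minimal sketch of that retry logic, assuming the dlm_ctxt and dlm_lock_resource fields from fs/ocfs2/dlm/dlmcommon.h; this only illustrates the idea described above and is not the actual patch:

	/* Sketch only: defer purging a lockres that recovery reached first.
	 * Returns nonzero if the caller should skip the deref for now. */
	static int dlm_defer_purge_if_recovering(struct dlm_ctxt *dlm,
						 struct dlm_lock_resource *res)
	{
		int defer = 0;

		assert_spin_locked(&dlm->spinlock);

		spin_lock(&res->spinlock);
		if (res->state & DLM_LOCK_RES_RECOVERING) {
			/* Recovery got to the lockres before dlm_thread could:
			 * don't send the deref now, push the lockres to the
			 * tail of the purge list and let a later pass retry. */
			list_move_tail(&res->purge, &dlm->purge_list);
			defer = 1;
		}
		spin_unlock(&res->spinlock);

		return defer;
	}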
If you meant checking before sending DEREF, it could cause a
This patch logs socket state changes that lead to socket shutdown.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/cluster/tcp.c |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 334f231..8bda1ea
The following patch logs socket shutdown messages. Below is a snippet of how
the message looks (the new message ends with '... shutdown, state #'):
[r...@el532p-3 ~]# mount /dev/hdb /vol1
Mar 31 11:14:18 el532p-3 kernel: connection to node el532p-2 (num 64) at
10.35.70.104: shutdown, state 8
Mar
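A hedged sketch of how such a message could be emitted on the shutdown path in fs/ocfs2/cluster/tcp.c. SC_NODEF_FMT and SC_NODEF_ARGS() are the existing tcp.c helpers that print "node %s (num %u) at <ip>:<port>"; the exact placement and wording of the real three-line diff are not reproduced here:

	/* Sketch only: log the TCP state that triggered the shutdown. */
	static void o2net_log_shutdown(struct o2net_sock_container *sc,
				       struct sock *sk)
	{
		mlog(ML_NOTICE, "connection to " SC_NODEF_FMT " shutdown, state %d\n",
		     SC_NODEF_ARGS(sc), sk->sk_state);
	}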
This patch logs socket state changes that lead to socket shutdown.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/cluster/tcp.c |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index 334f231..6d0d228
informing the
master directly. This is easily fixed by holding the dlm spinlock a
little longer in the mastery handler.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/dlm/dlmmaster.c |4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/fs/ocfs2/dlm
DLM_ASSERT_RESPONSE_MASTERY_REF) which creates a
hole that results in the loss of the refmap bit on the master node.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/dlm/dlmmaster.c |4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm
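A rough sketch of the fix described above (holding dlm->spinlock across the refmap update in the mastery handler); the helper name and argument order below are assumptions, not the literal dlmmaster.c diff:

	/* Sketch only: record the requester's reference while dlm->spinlock is
	 * still held, so no window exists between answering with
	 * DLM_ASSERT_RESPONSE_MASTERY_REF and setting the refmap bit. */
	static void dlm_set_refmap_under_dlm_lock(struct dlm_ctxt *dlm,
						  struct dlm_lock_resource *res,
						  u8 node_idx)
	{
		assert_spin_locked(&dlm->spinlock);

		spin_lock(&res->spinlock);
		dlm_lockres_set_refmap_bit(node_idx, res); /* argument order assumed */
		spin_unlock(&res->spinlock);
	}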
Sunil, Joel, thanks for modifying the comments :)
On 3/22/2010 6:47 PM, Joel Becker wrote:
On Mon, Mar 22, 2010 at 06:20:32PM -0700, Sunil Mushran wrote:
Yes, your wording is better. And yes, dlm->spinlock is the
top-level lock.
This patch is now in the 'fixes' branch of
delivery. However the intention of this feature was to send
a keepalive message every timeout seconds. This patch sends a message for
every keepalive time interval.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/cluster/tcp.c |6 +-
1 files changed, 5 insertions(+), 1
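A minimal sketch of the intended behaviour. The sc_keepalive_work member is modeled on fs/ocfs2/cluster/tcp.c but should be treated as an assumption here, and the real patch also goes through the o2net workqueue and socket-container refcounting:

	#define O2NET_KEEPALIVE_DELAY_MS	2000	/* assumed 2-second default */

	/* Sketch only: re-arm the keepalive work for every keepalive interval,
	 * instead of only once per (longer) idle-timeout period. */
	static void o2net_arm_keepalive(struct o2net_sock_container *sc)
	{
		cancel_delayed_work(&sc->sc_keepalive_work);
		schedule_delayed_work(&sc->sc_keepalive_work,
				      msecs_to_jiffies(O2NET_KEEPALIVE_DELAY_MS));
	}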
they received. So
nodes with this patch will always receive a response message.
So, in a mixed setup, both nodes will always hear the heartbeat from
each other :).
thanks,
--Srini
Joel Becker wrote:
On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
case
No harm, it just doubles the heartbeat messages, which is not required at all.
Sunil Mushran wrote:
What's the harm in leaving it in?
Srinivas Eeda wrote:
Each node that has this patch would send an O2NET_MSG_KEEP_REQ_MAGIC
every 2 seconds (default). So, nodes without this patch would always
get a keepalive every 2 seconds.
Sunil Mushran wrote:
How will it double? The node will send a keepalive only if it has
not heard from the other node for 2 secs.
Srinivas Eeda wrote:
No harm, it just doubles the heartbeat messages, which is not required at all.
Sunil Mushran wrote:
What's the harm in leaving
As in, not wait for the response to requeue. But we'll still be smart
about it, in the sense that we won't send a hb if the nodes are already
communicating otherwise.
Srinivas Eeda wrote:
In the old code a node cancels and re-queues the keepalive message when it
hears from the other node. If it didn't hear in 2
Yeah, they don't expect/wait for a response to the keepalive message.
On 2/17/2010 5:49 PM, Joel Becker wrote:
On Wed, Feb 17, 2010 at 10:24:30AM -0800, Srinivas Eeda wrote:
Each node that has this patch would send an O2NET_MSG_KEEP_REQ_MAGIC
every 2 seconds (default). So, nodes without
messages
to/from the evicted node. If the network connection comes back before the eviction,
the quorum decision is cancelled and messaging resumes.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/cluster/tcp.c | 69 +++
fs/ocfs2/cluster
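A purely illustrative sketch of the behaviour described above; the delayed-work member, the delay constant, and both helper names below are hypothetical, not the actual tcp.c members:

	#define O2NET_QUORUM_DELAY_MS	30000	/* hypothetical value */

	/* Sketch only: start a delayed quorum decision when the connection to
	 * a node is lost, and cancel it if the node reconnects in time. */
	static void o2net_peer_lost(struct o2net_node *nn)
	{
		schedule_delayed_work(&nn->nn_quorum_work,	/* hypothetical member */
				      msecs_to_jiffies(O2NET_QUORUM_DELAY_MS));
	}

	static void o2net_peer_back(struct o2net_node *nn)
	{
		/* the connection came back before eviction: messaging resumes */
		cancel_delayed_work(&nn->nn_quorum_work);	/* hypothetical member */
	}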
This patch rolls back an earlier fix that tried to re-establish the network
connection when a network timeout happens. The reconnect was recycling sockets,
which resulted in lost messages and hence hangs.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/cluster/tcp.c | 50
The following 3 patches:
1. roll back the reconnect fix
2. delay ENOTCONN for sends and receives until a node reconnects/dies after a
lost connection
3. correct the keepalive protocol
Thanks,
--Srini
When a file system is mounted locally, it may be enough to remount it read-only
on seeing corruption.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/super.c | 10 ++
1 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
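A small sketch of the idea, assuming 2.6.x-era flag names; ocfs2_mount_local() and OCFS2_SB() exist in fs/ocfs2, but this is not the literal super.c change:

	/* Sketch only: on corruption, degrade a local (non-clustered) mount to
	 * read-only instead of taking the harsher cluster-mount error action. */
	static void ocfs2_maybe_remount_ro(struct super_block *sb)
	{
		struct ocfs2_super *osb = OCFS2_SB(sb);

		if (ocfs2_mount_local(osb) && !(sb->s_flags & MS_RDONLY)) {
			sb->s_flags |= MS_RDONLY;
			printk(KERN_ERR "OCFS2: device %s: corruption detected, "
			       "remounting read-only.\n", sb->s_id);
		}
	}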
is cancelled and messaging resumes.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/cluster/tcp.c | 94 +++
fs/ocfs2/cluster/tcp_internal.h |9 ++--
2 files changed, 60 insertions(+), 43 deletions(-)
diff --git a/fs/ocfs2
Tao Ma wrote:
Hi Joel,
This reply may be really too late. :)
Joel Becker wrote:
On Wed, Jun 10, 2009 at 01:37:53PM +0800, Tao Ma wrote:
I also have some thoughts for it. Wish it isn't too late.
Well, if we come up with changes it will affect what I push, but
that's OK.
in ocfs2_super when reflink is ongoing (I will do it).
Make sense?
Yes, I can restrict the node to recovering its own and the offline slots. I
can make the node recover its own slot every time the timer fires,
and the offline slots in a round-robin way (the current way).
Regards,
Tao
Srinivas Eeda wrote
such inodes. Care has been taken to distribute the
workload across the cluster so that no one node has to perform the task all the
time.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/dlmglue.c | 51 ++
fs/ocfs2/dlmglue.h | 10
fs/ocfs2
The next two patches are a backport of the orphan scan timer patches to ocfs2-1.4.
Patch to track delayed orphan scan timer statistics.
Modifies ocfs2_osb_dump to print the following:
Orphan Scan= Local: 10 Global: 21 Last Scan: 67 seconds ago
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
Signed-off-by: Sunil Mushran sunil.mush...@oracle.com
---
fs/ocfs2/journal.c
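A hedged sketch of the output formatting; the parameter names below are placeholders for the counters the patch adds, not the actual struct members:

	/* Sketch only: format the orphan scan statistics line shown above. */
	static int ocfs2_orphan_scan_dump(char *buf, int len, int local_scans,
					  int global_scans, unsigned long last_scan)
	{
		return snprintf(buf, len,
				"Orphan Scan= Local: %d Global: %d Last Scan: %lu seconds ago\n",
				local_scans, global_scans,
				(jiffies - last_scan) / HZ);
	}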
Resending after implementing review comments.
Patch to track delayed orphan scan timer statistics.
Modifies ocfs2_osb_dump to print the following:
Orphan Scan= Local: 10 Global: 21 Last Scan: 67 seconds ago
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/journal.c |2 ++
fs/ocfs2/ocfs2.h |4 +++-
fs/ocfs2
at a time. It is done once every X milliseconds, where X is a value between
ORPHAN_SCAN_SCHEDULE_TIMEOUT/2 and ORPHAN_SCAN_SCHEDULE_TIMEOUT.
Each time the scan is done by a different node, so eventually the node that has the
inode cached will get to wipe the file.
Signed-off-by: Srinivas Eeda
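A minimal sketch of that randomized scheduling, assuming ORPHAN_SCAN_SCHEDULE_TIMEOUT is expressed in milliseconds; it mirrors the description above rather than the exact journal.c code:

	/* Sketch only: pick a delay between ORPHAN_SCAN_SCHEDULE_TIMEOUT/2 and
	 * ORPHAN_SCAN_SCHEDULE_TIMEOUT milliseconds for the next orphan scan. */
	static unsigned long ocfs2_orphan_scan_timeout(void)
	{
		unsigned long time;

		get_random_bytes(&time, sizeof(time));
		time = ORPHAN_SCAN_SCHEDULE_TIMEOUT / 2 +
		       (time % (ORPHAN_SCAN_SCHEDULE_TIMEOUT / 2));
		return msecs_to_jiffies(time);
	}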
Resending after adding another patch to display delayed orphan scan statistics.
the cluster so that no one node has to perform the task all the
time.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/dlmglue.c | 47 +
fs/ocfs2/dlmglue.h | 11 +
fs/ocfs2/journal.c | 106 +++
fs
Patch to track delayed orphan scan timer statistics.
Modifies ocfs2_osb_dump to print the following:
Orphan Scan= Local: 10 Global: 21 Last Scan: 67 seconds ago
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/journal.c |2 ++
fs/ocfs2/ocfs2.h |4 +++-
fs/ocfs2
Resending after implementing review comments.
at a time. It is done once every X milliseconds, where X is a value between
ORPHAN_SCAN_SCHEDULE_TIMEOUT/2 and ORPHAN_SCAN_SCHEDULE_TIMEOUT.
Each time the scan is done by a different node, so eventually the node that has the
inode cached will get to wipe the file.
Signed-off-by: Srinivas Eeda
-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/dlmglue.c | 58 +
fs/ocfs2/dlmglue.h |8 +++
fs/ocfs2/journal.c | 109 +++
fs/ocfs2/journal.h | 12 +
fs/ocfs2/ocfs2.h|2 +
fs
Did you use the -o datavolume,nointr options for mounting?
keyur patel wrote:
Hello All,
I have installed Oracle Cluster Manager on Linux x86-64. I am
using the ocfs file system for the quorum file. But I am getting the following
error. Please see the ocfs configuration below. I would appreciate it if
Hmm, even if we queue the orphan recovery, the inode may not get cleaned if
the inode is still around on some node, right? The node where the inode
is still cached will vote no again?
Sunil Mushran wrote:
Joel Becker wrote:
Srini,
Ok, you can go ahead and cook up the background orphan
The following patch is a backport of the patch that recovers orphans from offline
slots. It is being backported from mainline to 1.4.
mainline patch: 0001-Patch-to-recover-orphans-in-offline-slots-during-rec.patch
Thanks,
--Srini
recovers its own slot, which
leaves orphans in offline slots.
This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
Signed-off-by: Joel Becker joel.bec...@oracle.com
---
fs/ocfs2/journal.c | 140
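A rough sketch of the loop described above; ocfs2_queue_orphan_recovery() and ocfs2_slot_in_use() are hypothetical helpers standing in for the journal.c recovery-completion machinery, whose exact interface is not shown here:

	/* Sketch only: queue orphan cleanup for every slot that is offline. */
	static void ocfs2_queue_offline_slot_orphans(struct ocfs2_super *osb)
	{
		int slot;

		for (slot = 0; slot < osb->max_slots; slot++) {
			/* skip our own slot and any slot another node is using */
			if (slot == osb->slot_num || ocfs2_slot_in_use(osb, slot))
				continue;

			ocfs2_queue_orphan_recovery(osb, slot);
		}
	}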
recovers its own slot, which
leaves orphans in offline slots.
This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/journal.c | 136
recovers its own slot, which
leaves orphans in offline slots.
This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/journal.c | 106
are
clean, they will not queue to recover their orphan directory.
This patch queues to recover orphans when the slot is next used.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/journal.c | 21 -
1 files changed, 8 insertions(+), 13 deletions(-)
diff --git
The next 3 patches do the following:
1) move the ocfs2_slot_info struct from slot_map.c to slot_map.h
2) recover orphans during mount even if the journal is clean
3) recover orphans in offline slots
things:
a) Recover orphans during mount of the slot that it is using.
b) Recover orphans in all offline slots during recovery.
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
---
fs/ocfs2/journal.c | 44 +++-
1 files changed, 27 insertions(+), 17
That might be because you have configured user-mode DLM. What does cat
/sys/o2cb/heartbeat_mode show? If it shows 'user', run /etc/init.d/o2cb
configure and answer 'n' to the following prompt:
Use user-space driven heartbeat? (y/n) [y] n
thanks,
--Srini
Andy Johnson wrote:
Hello,
I have