You are indeed correct. Removing "/lib/modules/2.6.18-53.1.6.el5xen/extra/ocfs2/" on all cluster members. Will report back with results.
Thanks!
On Feb 8, 2008, at 1:43 PM, Sunil Mushran wrote:
Appears the self-built ocfs2 modules are still in play.
Do:
$ find /lib/modules/`uname -r` -name \*ocfs\* -exec echo -n "{} " \; -exec rpm -qf {} \;
This will list all the ocfs2 modules and the package, if any, that owns each one.
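Another quick cross-check, if module-init-tools is handy, is to see which ocfs2.ko modprobe would actually load and who owns it:

# path of the module that 'modprobe ocfs2' would load
modinfo -n ocfs2
# "...is not owned by any package" here would point at a self-built module
rpm -qf `modinfo -n ocfs2`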
Alok Dhir wrote:
Great call on the netconsole -- I had no idea it existed until your
advice - you learn something new every day :)
Here's my repeatable oops on CentOS 5.1 Xen dom0, 2.6.18-53.1.6.el5xen x86_64, using the OSS packages 'ocfs2-2.6.18-53.1.6.el5xen-1.2.8-2.el5'.
As soon as I kick off 'iozone -A':
Kernel BUG at fs/inode.c:250
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:04:00.0/0000:05:00.0/0000:06:00.0/0000:07:00.0/irq
CPU 5
Modules linked in: ocfs2(U) netconsole netloop netbk blktap blkbk
ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink
ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge ipv6
autofs4 hidp rfcomm l2cap bluetooth ocfs2_dlmfs(U) ocfs2_dlm(U)
ocfs2_nodemanager(U) configfs sunrpc ib_iser rdma_cm ib_cm iw_cm
ib_addr ib_local_sa ib_sa ib_mad ib_core iscsi_tcp libiscsi
scsi_transport_iscsi dm_multipath video sbs backlight i2c_ec
i2c_core button battery asus_acpi ac parport_pc lp parport joydev
sr_mod ide_cd serial_core serio_raw cdrom pcspkr shpchp bnx2
dm_snapshot dm_zero dm_mirror dm_mod mppVhba(U) usb_storage
ata_piix libata megaraid_sas mppUpper(U) sg(U) sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 31841, comm: iozone Not tainted 2.6.18-53.1.6.el5xen #1
RIP: e030:[<ffffffff80222b19>] [<ffffffff80222b19>] clear_inode+0x1b/0x123
RSP: e02b:ffff8803c2803e28 EFLAGS: 00010202
RAX: ffff8803c3f2ff20 RBX: ffff8803c3f2fd98 RCX: 0000000000000000
RDX: ffffffffff578140 RSI: ffff8803c2803e48 RDI: ffff8803c3f2fd98
RBP: 0000000000000000 R08: ffff8803dbe1b6c0 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880002cb8c00 R12: ffff8803c3f2fac0
R13: ffff8803d6e46000 R14: 0000000000000002 R15: 0000000000000000
FS: 00002aaaaaac5ee0(0000) GS:ffffffff80599280(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process iozone (pid: 31841, threadinfo ffff8803c2802000, task ffff8803e345a7a0)
Stack: ffff8803c3f2fd98 ffffffff8869ac19 ffff8803c3f2fd98 ffff8803c3857130
       0000000000000000 0000000000000002 ffffffffffffffff 0000000000000200
       ffff8803c3f2fd98 ffffffff8869a4f0
Call Trace:
[<ffffffff8869ac19>] :ocfs2:ocfs2_delete_inode+0x729/0x79a
[<ffffffff8869a4f0>] :ocfs2:ocfs2_delete_inode+0x0/0x79a
[<ffffffff8022f811>] generic_delete_inode+0xc6/0x143
[<ffffffff88699df5>] :ocfs2:ocfs2_drop_inode+0x117/0x16e
[<ffffffff8023c6ea>] do_unlinkat+0xd5/0x141
[<ffffffff8025d291>] tracesys+0x47/0xb2
[<ffffffff8025d2f1>] tracesys+0xa7/0xb2
Code: 0f 0b 68 e3 7f 47 80 c2 fa 00 48 8b 83 08 02 00 00 a8 10 75
RIP [<ffffffff80222b19>] clear_inode+0x1b/0x123
RSP <ffff8803c2803e28>
<0>Kernel panic - not syncing: Fatal exception
On Feb 7, 2008, at 6:56 PM, Sunil Mushran wrote:
Set up netconsole on the cluster members (domU?) to get a stack trace.
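Something along these lines should do it; the interface, addresses, MAC and port below are only placeholders for your environment:

# on the box that oopses: stream console output over UDP to a log host
modprobe netconsole netconsole=6666@192.168.1.62/eth0,6666@192.168.1.200/00:11:22:33:44:55
# on the log host: capture whatever arrives (listen syntax varies by netcat flavor)
nc -u -l 6666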
Alok Dhir wrote:
Thanks again for your prompt assistance earlier today - we seem to have gotten past the fs/inode.c bug in domU by using the OSS-packaged ocfs2 kernel modules. The cluster comes up and mounts on all boxes, and appears to work.
However, we have now run into a broader issue - in dom0, any of the cluster member servers will spontaneously reboot when I start an 'iozone -A' in an ocfs2 filesystem. I am unable to check the kernel panic message, as the box reboots immediately despite 'kernel.panic=0' being set in sysctl (which is supposed to mean 'do not reboot on panic'). There are also no entries in /var/log/messages when this happens.
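For reference, these are the knobs I understand to be involved, though I am not sure whether they even apply when dom0 itself crashes (the hypervisor may reboot the box on its own):

# kernel.panic is the reboot delay after a panic (0 = never reboot);
# kernel.panic_on_oops controls whether an oops escalates to a full panic
sysctl kernel.panic kernel.panic_on_oops
sysctl -w kernel.panic=0
sysctl -w kernel.panic_on_oops=0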
I realize there's not much debugging you can do without the panic message, but I'm wondering if perhaps this new version has some bug that was not in 1.2.7 (with our self-built 1.2.7, only the domU servers rebooted - dom0 was stable).
Are others running this new version with success? Under RHEL/CentOS 5.1 Xen dom0/domU?
On Feb 7, 2008, at 1:40 PM, Sunil Mushran wrote:
Is the IP address correct? If not, correct it.
# netstat -tan
See if that port is already in use. If so, use another.
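For example, on devxen0 (port and address taken from your cluster.conf below):

# is anything already listening on the o2net port?
netstat -tan | grep 7777
# is the ip_address configured for devxen0 actually assigned on this box?
ip addr show | grep 'inet '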
Alok Dhir wrote:
Ah - thanks for the clarification.
I'm left with one perplexing problem - on one of the hosts, 'devxen0', o2cb refuses to start. The box is configured identically to at least 2 other cluster hosts, and all were imaged the exact same way, except that devxen0 has 32GB RAM where the others have 16GB or less.
Any clues where to look?
[EMAIL PROTECTED]:~] service o2cb enable
Writing O2CB configuration: OK
Starting O2CB cluster ocfs2: Failed
Cluster ocfs2 created
Node beast added
o2cb_ctl: Internal logic failure while adding node devxen0
Stopping O2CB cluster ocfs2: OK
--This is in syslog when this happens:
Feb 7 13:26:50 devxen0 kernel: (17194,6):o2net_open_listening_sock:1867 ERROR: unable to bind socket at 196.168.1.72:7777, ret=-99
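(Side note: ret=-99 appears to be -EADDRNOTAVAIL, "Cannot assign requested address", which would fit an ip_address in cluster.conf that is not actually configured on this box. A quick way to confirm the errno name; the exact header path may vary by distro:)

grep -w 99 /usr/include/asm-generic/errno.h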
--Box config:
[EMAIL PROTECTED]:~] uname -a
Linux devxen0.symplicity.com 2.6.18-53.1.6.el5xen #1 SMP Wed Jan 23 11:59:21 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
--Here is cluster.conf:
---
node:
        ip_port = 7777
        ip_address = 192.168.1.62
        number = 0
        name = beast
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 196.168.1.72
        number = 1
        name = devxen0
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.1.73
        number = 2
        name = devxen1
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.1.74
        number = 3
        name = devxen2
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.1.70
        number = 4
        name = fs1
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.1.71
        number = 5
        name = fs2
        cluster = ocfs2
node:
        ip_port = 7777
        ip_address = 192.168.1.80
        number = 6
        name = vdb1
        cluster = ocfs2
cluster:
        node_count = 7
        name = ocfs2
---
On Feb 7, 2008, at 1:23 PM, Sunil Mushran wrote:
Yes, but they are being backported into ocfs2 1.4, which is yet to be released.
You are on ocfs2 1.2.
Alok Dhir wrote:
I've seen that -- I was under the impression that some of
those were being backported into the release kernels.
Thanks,
Alok
On Feb 7, 2008, at 1:15 PM, Sunil Mushran wrote:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2-new-features.html
Alok Dhir wrote:
We were indeed using a self-built module due to the lack of an OSS one for the latest kernel. Thanks for your response; I will test with the new version.
What are we leaving on the table by not using the latest mainline kernel?
On Feb 7, 2008, at 12:56 PM, Sunil Mushran wrote:
Are you building ocfs2 with this kernel or are using the
ones we
provide for RHEL5?
I am assuming you have built it yourself as we did not
release
packages for the latest 2.6.18-53.1.6 kernel till last
night.
If you are using your own, then use the one from oss.
If you are using the one from oss, then file a bugzilla
with the
full oops trace.
Thanks
Sunil
Alok K. Dhir wrote:
Hello all - we're evaluating OCFS2 in our development
environment to see if it meets our needs.
We're testing it with an iSCSI storage array (Dell MD3000i) and 5 servers running CentOS 5.1 (2.6.18-53.1.6.el5xen).
1) Each of the 5 servers is running the CentOS 5.1 open-iscsi initiator, and sees the volumes exposed by the array just fine. So far so good.
2) Created a volume group using the exposed iscsi volumes
and created a few LVM2 logical volumes.
3) vgscan; vgchange -a y; on all the cluster members. All see the "md3000vg" volume group. Looking good. (We have no intention of changing the LVM2 configuration much, if at all, and can make sure any such changes are done while the volumes are offline on all cluster members, so theoretically this should not be a problem.)
4) mkfs.ocfs2 /dev/md3000vg/testvol0 -- works great
5) mount on all Xen dom0 boxes in the cluster, works great.
6) create a VM on one of the cluster members, set up
iscsi, vgscan, md3000vg shows up -- looking good.
7) install ocfs2, 'service o2cb enable', starts up fine.
mount /dev/md3000vg/testvol0, works fine.
** Thanks for making it this far -- this is where it gets interesting
8) run 'iozone' in domU against ocfs2 share - BANG -
immediate kernel panic, repeatable all day long.
"kernel BUG at fs/inode.c"
So my questions:
1) should this work?
2) if not, what should we do differently?
3) currently we're tracking the latest RHEL/CentOS 5.1 kernels -- would we have better luck using the latest mainline kernel?
Thanks for any assistance.
Alok Dhir
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users