Re: [Ocfs2-users] issues with my ocfs2 cluster

2018-01-10 Thread Luis Freitas
Hi Jim,
      Have you tried updating the kernel as Changwei suggested?
      The messages indicate kernel 4.4.0; a quick search shows this
Ubuntu release should be able to run kernel 4.10.
Best Regards,
Luis

On Wednesday, January 10, 2018 4:12 PM, Jim Okken  wrote:
 

 hello again list,
We seem to be having issues on more servers. According to the Linux
developers here: "the kernel is stuck in a spin lock during a disk operation."

The call traces are below. I see a lot of ocfs2 in them, but I don't
know how to read them. Can you tell me whether the issue comes from OCFS2?

thanks,
--Jim
2018-01-06T17:10:02.194362+00:00 node-115 kernel: [87885.155288] Modules linked
in: vhost_net vhost macvtap macvlan ip6table_raw xt_mac xt_tcpudp xt_physdev
br_netfilter veth ebtable_filter ebtables openvswitch ocfs2 quota_tree
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue
configfs ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter
xt_comment xt_CT iptable_raw ip_tables x_tables xfs bridge 8021q garp mrp stp
llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
ipmi_ssif crc32_pclmul ghash_clmulni_intel kvm_intel aesni_intel aes_x86_64 kvm
lrw gf128mul glue_helper ablk_helper irqbypass cryptd hpilo 8250_fintek
serio_raw ioatdma ipmi_si sb_edac edac_core ipmi_msghandler shpchp dca
acpi_power_meter lpc_ich mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor dm_round_robin ses enclosure
scsi_transport_sas raid6_pq libcrc32c raid1 raid0 multipath linear uas
usb_storage psmouse lpfc be2net vxlan ip6_udp_tunnel scsi_transport_fc
udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
2018-01-06T17:10:02.194364+00:00 node-115 kernel: [87885.157143] CPU: 15 PID: 11936 Comm: qemu-system-x86 Not tainted 4.4.0-98-generic #121-Ubuntu
2018-01-06T17:10:02.194366+00:00 node-115 kernel: [87885.157144] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
2018-01-06T17:10:02.194367+00:00 node-115 kernel: [87885.157280] task: 882036ff ti: 881f80ca task.ti: 881f80ca
2018-01-06T17:10:02.194400+00:00 node-115 kernel: [87885.157281] RIP: 0010:[]  [] native_queued_spin_lock_slowpath+0x15c/0x170
2018-01-06T17:10:02.194414+00:00 node-115 kernel: [87885.157566] RSP: 0018:88203f143c30  EFLAGS: 0202
2018-01-06T17:10:02.194416+00:00 node-115 kernel: [87885.157567] RAX: 0101 RBX: 8820046c83f0 RCX: 0001
2018-01-06T17:10:02.194418+00:00 node-115 kernel: [87885.157705] RDX: 0101 RSI: 0001 RDI: 8820046c83ec
2018-01-06T17:10:02.194440+00:00 node-115 kernel: [87885.157705] RBP: 88203f143c30 R08: 0101 R09: 811924a7
2018-01-06T17:10:02.194442+00:00 node-115 kernel: [87885.157706] R10: ea0040d6d680 R11: 0800 R12: 8820046c83ec
2018-01-06T17:10:02.194443+00:00 node-115 kernel: [87885.157707] R13: 0800 R14: 4c63ee00 R15: 0800
2018-01-06T17:10:02.19+00:00 node-115 kernel: [87885.157708] FS: 7fbcbb7eec00() GS:88203f14() knlGS:
2018-01-06T17:10:02.19+00:00 node-115 kernel: [87885.157709] CS: 0010 DS:  ES:  CR0: 80050033
2018-01-06T17:10:02.194445+00:00 node-115 kernel: [87885.157710] CR2: 7f54266a8000 CR3: 000fcc2f2000 CR4: 001426e0
2018-01-06T17:10:02.194446+00:00 node-115 kernel: [87885.157711] Stack:
2018-01-06T17:10:02.194448+00:00 node-115 kernel: [87885.157712] 88203f143c40 81844421 88203f143c60 81842535
2018-01-06T17:10:02.194449+00:00 node-115 kernel: [87885.157714] 881e88a9ca80 8820046c84b0 88203f143c70 8184257b
2018-01-06T17:10:02.194450+00:00 node-115 kernel: [87885.157716] 88203f143ca0 c074158d 881e5d3beb80 0800
2018-01-06T17:10:02.194450+00:00 node-115 kernel: [87885.157717] Call Trace:
2018-01-06T17:10:02.194451+00:00 node-115 kernel: [87885.157718]
2018-01-06T17:10:02.194453+00:00 node-115 kernel: [87885.157725] [] _raw_spin_lock+0x21/0x30
2018-01-06T17:10:02.194454+00:00 node-115 kernel: [87885.157727] [] __mutex_unlock_slowpath+0x25/0x50
2018-01-06T17:10:02.194456+00:00 node-115 kernel: [87885.157729] [] mutex_unlock+0x1b/0x20
2018-01-06T17:10:02.194457+00:00 node-115 kernel: [87885.157766] [] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]
2018-01-06T17:10:02.194458+00:00 node-115 kernel: [87885.157770] [] dio_complete+0x11c/0x1c0
2018-01-06T17:10:02.194460+00:00 node-115 kernel: [87885.157771] [] dio_bio_end_aio+0x73/0x100
2018-01-06T17:10:02.194461+00:00 node-115 kernel: [87885.157774] []

Re: [Ocfs2-users] Backup issues

2012-04-04 Thread Luis Freitas
Dirk,
 
   Also, you should have noatime enabled on this filesystem; check your mount
options. Otherwise the rsync will end up causing access times to be updated on
every file it reads.
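A minimal /etc/fstab sketch for mounting an OCFS2 volume with noatime (the device path and mount point below are placeholders, not anything from this thread):

```
# /etc/fstab entry for the OCFS2 volume (device and mount point are examples)
/dev/mapper/ocfs2vol  /srv/ocfs2  ocfs2  _netdev,noatime  0 0
```

The _netdev option delays mounting until the network (and hence the o2cb cluster stack) is up.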
 
Regards,
Luis



From: Eduardo Diaz - Gmail ediaz...@gmail.com
To: Dirk Bonenkamp - ProActive d...@proactive.nl 
Cc: ocfs2-users@oss.oracle.com 
Sent: Tuesday, April 3, 2012 11:26 AM
Subject: Re: [Ocfs2-users] Backup issues


For backups I use a local copy made with tar, but I use rsync too.

I have not compared them directly, but if you post some statistics from rsync
we can say more.

Use rsync --stats.

I have not noticed a speed difference, but did you run a full rsync with
--progress to see which files are slow and at what speed they transfer?

regards



On Mon, Apr 2, 2012 at 9:04 PM, Dirk Bonenkamp - ProActive d...@proactive.nl 
wrote:

Hi All,

I'm currently testing an OCFS2 set-up, and I'm having issues with
creating backups.

I have a 2-node cluster, running OCFS2 on a dual-primary DRBD device.

The file system is 3.7 TB, of which 211 GB is used: about 1.5 million
files in 95 directories.

Everything works fine, except for the backups, which are taking far more
time than on 'regular' file systems.

I'm using rsync for my backups. When I rsync the file system above, it
takes more than an hour, even without any modifications to the file system.

Network / disk speed is good. I can rsync a 10 GB file from the OCFS2
filesystem to the same backup server at just under 100 MB/s.

I know some penalty is to be expected from a clustered file
system, but this is a lot. Rsyncing an ext3 file system double the size
(in MB and file count) of this one takes about 600 seconds...

Has anybody some advice on a backup strategy for me? Or some tuning tips?

Thanks in advance,

Dirk

--
http://www.proactive.nl
T       023 - 5422299
F       023 - 5422728

www.proactive.nl http://www.proactive.nl



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users



Re: [Ocfs2-users] partition offset/alignment on SAN devices.

2010-07-09 Thread Luis Freitas
Thomas and James,

   Usually the partition is aligned to a cylinder boundary. In the case of a
LUN, the cylinder boundary may be meaningless. Funny, I never thought
about this before.

  There are some tools that can overwrite a few sectors at the start of the
disk (lilo, grub, DOS fdisk) and expect the disk to have a partition table;
those sectors are left unused for this reason.

   I don't know if OCFS2 has provisions for leaving this space unused.

Regards,
Luis




From: thomas.zimol...@bmi.bund.de thomas.zimol...@bmi.bund.de
To: james.mas...@tradefair.com; ocfs2-users@oss.oracle.com
Sent: Fri, July 9, 2010 11:24:38 AM
Subject: Re: [Ocfs2-users] partition offset/alignment on SAN devices.

 mkfs.ocfs2 -L SOMELABEL /dev/dm-10
 This 64k (32/128k, whatever) issue is usually only a problem if you've used
fdisk to create a partition to put your data on. For hysterical reasons the
first partition is created an arbitrary amount of kb into the disk. This
almost never lines up with the LUN raid-stripe. Then the I/O going
from app -> FS -> LUN -> disks overlaps block sizes - causing unnecessary
I/O through the chain and sub-optimal performance.

Hi folks,

we had that issue too with our CX4-480, and the guys from EMC told us not
to forget the alignment.

So we first wondered how to automate this, because we would have had to align
the partitions on quite a few LUNs (more than 40).
Since you'd have to use the expert options of fdisk, we didn't find a quick
solution with sfdisk, though with some further investigation there would
surely be one.

But in the end this was not necessary anyway. We specifically asked EMC about
this and they confirmed that alignment only matters when using partitions at
all. We don't use partitions, so there was nothing to do.
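The arithmetic behind the alignment concern is simple to sketch (stripe and sector sizes below are example values, not anything specific to the arrays discussed here):

```python
def aligned_start(first_usable: int, stripe_kib: int, sector_size: int = 512) -> int:
    """Round first_usable up to the next sector on a RAID-stripe boundary."""
    stripe_sectors = stripe_kib * 1024 // sector_size
    # ceiling division, then convert back to sectors
    return -(-first_usable // stripe_sectors) * stripe_sectors

# Classic DOS fdisk starts the first partition at sector 63, which is
# never stripe-aligned; for a 64 KiB stripe the next aligned sector is 128.
print(aligned_start(63, 64))    # -> 128
print(aligned_start(1, 1024))   # -> 2048 (the common 1 MiB alignment)
```

Modern partitioners start partitions at sector 2048 (1 MiB) by default, which is a multiple of every common stripe size, so the issue mostly affects tools that still honor the legacy 63-sector offset.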

IMHO, concerning the risk of destroying whatever is on the LUN: if someone
thinks the LUN is unused just because he/she sees no partition on it, isn't
that more a lack of diligence?
We have a similar situation with ASM devices: even if you have a partition on
one, it's not mountable (as there's ASM data inside it and no FS). So you'd
always have to check more than that to be sure the device is unused.

Maybe you can clarify that with the vendor of your shared disks.

Greetings,
Thomas

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users




Re: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?

2009-12-14 Thread Luis Freitas
Brian,
 
   Hmm, I was not aware of this. It seems Novell uses another volume manager,
called EVMS, not CLVM (?).
 
From:
http://wiki.novell.com/index.php/Linux_Data_Management
 
Some Open Source OCFS2 Features

Oracle Linux Certification matrix
OCFS2 project web site
OCFS2 Development Roadmap

Oracle Cluster File System v2 (OCFS2) is an open source cluster management and

    * No exclusive write lock capability yet (now every lock request
      returns: successful). This feature is a candidate for SLE10 SP2
      (Q1 2008).
    * OCFS2 on top of a software mirror is not supported yet
    * Can be managed by EVMS

    * OCFS2 offers integration with heartbeat2. Heartbeat2 offers a
      Resource Agent 'md group take over' (which enables fail-over of
      host-based mirroring of SAN volumes), but OCFS2 on top of a
      software mirror is not supported.
...

Best Regards,
Luis Freitas

--- On Fri, 12/11/09, Brian Kroth bpkr...@gmail.com wrote:


From: Brian Kroth bpkr...@gmail.com
Subject: Re: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?
To: Luis Freitas lfreita...@yahoo.com
Cc: Patrick J. LoPresti lopre...@gmail.com, ocfs2-users@oss.oracle.com
Date: Friday, December 11, 2009, 2:09 PM


Luis Freitas lfreita...@yahoo.com 2009-12-11 05:40:
 Patrick,
 
    Depending on what you are using, you could use the volume manager
    to do the striping, but you need to use CLVM. So if you can, go for
    Heartbeat2+CLVM+OCFS2, all integrated.
 
    Not sure but I think Heartbeat2+OCFS2 is only available on the
    vanilla kernels, not on the enterprise ones. Maybe Suse has
    support, I don't know, you will have to check.
 
 Best Regards,
 Luis Freitas

Just to elaborate on these comments.  Last time I checked, CLVM required
the openais/cman cluster stack, which neither heartbeat nor ocfs2 uses
(by default).  The userspace stack option for ocfs2 in recent mainline
kernels added support for the openais stack, and pacemaker is required to
make heartbeat work with that rather than with its own cluster stack.

Now, you can do a basic LVM linear span (concatenation, or whatever you
want to call it) without any cluster stack, as long as it's not striped
and as long as you heed Sunil's warning about fat-fingering changes to
the thing while more than one host is using it.

That means that if you want to add another LUN to the span you can't do
it on the fly.  You have to do something like this:

# On all nodes:
umount /ocfs2

# On all nodes but one:
vgchange -an ocfs2span
# Or, to be extra safe:
halt -p

# On the remaining node:
vgextend ocfs2span /dev/newlun
lvextend -l+100%FREE /dev/mapper/ocfs2span-lv
tunefs.ocfs2 -S /dev/mapper/ocfs2span-lv

# You might actually need the fs mounted for that last bit, I forget.
# Probably a fsck somewhere in there would be wise as well.

# Bring the other nodes back up.

Brian

 --- On Wed, 12/9/09, Patrick J. LoPresti lopre...@gmail.com wrote:
 
   From: Patrick J. LoPresti lopre...@gmail.com
   Subject: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?
   To: ocfs2-users@oss.oracle.com, linux-r...@vger.kernel.org
   Date: Wednesday, December 9, 2009, 9:03 PM
 
   Is it possible to run an OCFS2 file system on top of Linux software RAID?
 
   Here is my situation.  I have four identical disk chassis that perform
   hardware RAID internally.  Each chassis has a pair of fiber channel
   ports, and I can assign the same LUN to both ports.  I want to connect
   all of these chassis to two Linux systems.  I want the two Linux
   systems to share a file system that is striped across all four chassis
   for performance.
 
   I know I can use software RAID (mdadm) to do RAID-0 striping across
   the four chassis on a single machine; I have tried this, it works
   fine, and the performance is tremendous.  I also know I can use OCFS2
   to create a single filesystem on a single chassis that is shared
   between my two Linux systems.  What I want is to combine these two
   things.
 
   Suse's documentation
   (http://www.novell.com/documentation/sles11/stor_admin/?page=/documentation/sles11/stor_admin/data/raidyast.html)
   says:
 
   IMPORTANT:Software RAID is not supported underneath clustered file
   systems such as OCFS2, because RAID does not support concurrent
   activation. If you want RAID for OCFS2, you need the RAID to be
   handled by the storage subsystem.
 
   Because my disk chassis already perform hardware RAID-5, I only need
   Linux to do the striping (RAID-0) in software.  So for me, there is no
   issue about which node should rebuild the RAID etc.  I understand
   that Linux md stores meta-data on the partitions and is not cluster
   aware, but will this create problems for OCFS2 even if it is just RAID
   0?
 
   Has anybody tried something like this?  Are there alternative RAID-0
   solutions for Linux that would be expected to work?
 
   Thank you.
 
   - Pat
 

Re: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?

2009-12-14 Thread Luis Freitas
Sunil,
 
 I am getting too old for all these cluster stacks! It seems that heartbeat2 is
deprecated...

Can ocfs2 be integrated with pacemaker in the same way as was possible with
heartbeat2 on Suse 10? I know the RedHat cluster stack cannot, so I used to
consider this an additional feature of Suse Linux.
 
Best Regards,
Luis Freitas
 


--- On Mon, 12/14/09, Sunil Mushran sunil.mush...@oracle.com wrote:


From: Sunil Mushran sunil.mush...@oracle.com
Subject: Re: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?
To: Luis Freitas lfreita...@yahoo.com
Cc: Brian Kroth bpkr...@gmail.com, ocfs2-users@oss.oracle.com
Date: Monday, December 14, 2009, 4:20 PM


That's old.

sles11 has added the pacemaker cluster stack that works with clvm.

Luis Freitas wrote:
 Brian,
  
    Hmm, I was not aware of this. Seems Novel uses other volume 
 manager, called EVMS, not CLVM (?).
  
 From:
 http://wiki.novell.com/index.php/Linux_Data_Management
  


         Some Open Source OCFS2 Features

 Oracle Linux Certification matrix 
 http://www.novell.com/products/server/oracle/matrix.html
 OCFS2 project web site http://oss.oracle.com/projects/ocfs2/
 OCFS2 Development Roadmap http://oss.oracle.com/osswiki/OCFS2/Roadmap
 Oracle Cluster File System v2 (OCFS2) is an open source cluster 
 management and
 

     * No exclusive write lock capability yet (now every lock request
       returns: successful). This feature is candidate for SLE10 SP2
       (Q1 2008).
     * OCFS2 on top of a software mirror is not supported yet
      * Can be managed by EVMS

     * OCFS2 offers integration with heartbeat2. Heartbeat2 offers a
       Resource Agent 'md group take over'. (which enables fail-over of
       host based mirroring of SAN volumes), but OCFS2 on top of a
       software mirror is not supported.

 ...

 Best Regards,
 Luis Freitas

 --- On *Fri, 12/11/09, Brian Kroth /bpkr...@gmail.com/* wrote:


     From: Brian Kroth bpkr...@gmail.com
     Subject: Re: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?
     To: Luis Freitas lfreita...@yahoo.com
     Cc: Patrick J. LoPresti lopre...@gmail.com,
     ocfs2-us...@oss.oracle.com
     Date: Friday, December 11, 2009, 2:09 PM

      Luis Freitas lfreita...@yahoo.com 2009-12-11 05:40:
      Patrick,
     
     Depending on what you are using, you could use the volume manager
     to do the striping, but you need to use CLVM. So if you can,
     go for
     Heartbeat2+CLVM+OCFS2, all integrated.
     
     Not sure but I think Heartbeat2+OCFS2 is only available on the
     vanilla kernels, not on the enterprise ones. Maybe Suse has
     support, I don't know, you will have to check.
     
      Best Regards,
      Luis Freitas

     Just to elaborate on these comments.  Last time I checked CLVM
     required
     the openais/cman cluster stack, which neither heartbeat nor ocfs2 use
     (by default).  The userspace stack option for ocfs2 in recent mainline
     kernels added support for the openais stack and pacemaker is
     required to
     make heartbeat work with that rather than use it's own cluster stack.

     Now, you can do an basic LVM linear span, concatenation, or
     whatever you
     want to call it without any cluster stack, so long as it's not striped
     and so long as you heed Sunil's warning about fat fingering changes to
     the thing while more than one host is using it.

     That means that if you want to add another LUN to the span you
     can't do
     it on the fly.  You have to do something like this:

     # On all nodes:
     umount /ocfs2

     # On all nodes but one:
     vgchange -an ocfs2span
     # Or, to be extra safe:
     halt -p

     # On the remaining node:
     vgextend ocfs2span /dev/newlun
     lvextend -l+100%FREE /dev/mapper/ocfs2span-lv
     tunefs.ocfs2 -S /dev/mapper/ocfs2span-lv

     # You might actually need the fs mounted for that last bit, I forget.
     # Probably a fsck somewhere in there would be wise as well.

     # Bring the other nodes back up.

     Brian

       --- On Wed, 12/9/09, Patrick J. LoPresti lopre...@gmail.com wrote:
     
     From: Patrick J. LoPresti lopre...@gmail.com
    Subject: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?
     To: ocfs2-users@oss.oracle.com, linux-r...@vger.kernel.org
    Date: Wednesday, December 9, 2009, 9:03 PM
     
    Is it possible to run an OCFS2 file system on top of Linux
     software RAID?
     
    Here is my situation.  I have four identical disk chassis that
     perform
    hardware RAID internally

Re: [Ocfs2-users] ocfs2 heartbeat and drbd in dual primary

2009-12-09 Thread Luis Freitas
Unni,

    Well, with this setup you can't properly deal with a split-brain scenario.

    DRBD has its own heartbeat mechanism; I don't know exactly how it works,
but it certainly has one, and OCFS2 has one too. If they are not integrated you
could have a scenario where OCFS2 kills one node and DRBD kills the other, and
you end up with neither.

    OCFS2 chooses the first node in the case of a network failure. So you
should either use OCFS2 and DRBD integrated with heartbeat2 (available only for
non-enterprise distros) or set up DRBD to choose the first node too.

   Btw, someone actually using DRBD can comment better on this. I use OCFS2 for
RAC, so I have to use shared storage.

Best Regards,
Luis Freitas

--- On Tue, 12/8/09, unni krishnan unnikrishna...@gmail.com wrote:

From: unni krishnan unnikrishna...@gmail.com
Subject: Re: [Ocfs2-users] ocfs2 heartbeat and drbd in dual primary
To: ocfs2-users@oss.oracle.com
Date: Tuesday, December 8, 2009, 3:13 AM

Any idea ?

On Mon, Dec 7, 2009 at 10:13 PM, unni krishnan unnikrishna...@gmail.com wrote:
 Hi,

 I am new to the world of cluster file system and also with ocfs2. I am
 setting up a cluster as a research project and it looks like :

 http://picasaweb.google.com/lh/photo/A7CUF3b_SkzJ0NZKbkyyng?feat=directlink

 Heartbeat will not manage ocfs2. Also, since in my case drbd is in
 dual-primary mode, it does not seem necessary to add it to heartbeat
 or pacemaker (that's just my view, and I am not sure it is correct).

 In my case the crm resources are the two VPSs in the picture. I am
 using the built-in heartbeat provided by ocfs2 and the ocfs2 dlm.

 My questions are:

 Since ocfs2 and drbd are not added to crm, is there anything bad or any
 disaster that could happen with this setup?

 Please suggest the best method to deal with ocfs2 + drbd in dual
 primary + heartbeat + pacemaker.

 I am using :

 OpenVZ for VPS
 CentOS 5 as OS

 --
 Regards,
 Unni




-- 
Regards,
Unni

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users




Re: [Ocfs2-users] 2 node OCFS2 clusters

2009-11-16 Thread Luis Freitas
Srinivas,

  If this is true then I would suggest OCFS2 is not taking the best decision in
this scenario.

  The node that still has network connectivity should survive, instead of the
lowest node number. Oracle CRS has heuristics to detect whether the network is
down, and in that scenario the node that lost network connectivity is evicted.
That is why it is required to use a switch between the two nodes, instead of a
crossover cable.

   OCFS2 should do the same.

Best Regards,
Luis Freitas

--- On Mon, 11/16/09, Srinivas Eeda srinivas.e...@oracle.com wrote:

 From: Srinivas Eeda srinivas.e...@oracle.com
 Subject: Re: [Ocfs2-users] 2 node OCFS2 clusters
 To: Thompson, Mark mark.thomp...@uk.experian.com
 Cc: ocfs2-users@oss.oracle.com
 Date: Monday, November 16, 2009, 2:05 PM
 
 
 
   
 
  
 Thompson, Mark wrote:

   Hi Srini,

   Thanks for the response.

   So are the following statements correct:

   If I stop the networking on node 1, node 0 will continue to allow
   OCFS2 filesystems to work and not reboot itself.

   If I stop the networking on node 0, node 1 (now being the lowest
   node?) will continue to allow OCFS2 filesystems to work and not
   reboot itself.

 In both cases node 0 will survive, because that is the node that has the
 lowest node number (defined in cluster.conf). This applies to the
 scenario where the interconnect went down but the nodes are healthy and
 are heartbeating to the disk.

   I guess I just need to know if it's possible to have a 2 node OCFS2
   cluster that will cope with either one of the nodes dying, and have
   the remaining node still provide service.

 If node 0 itself panics or reboots, then node 1 will survive.

   Regards,

   Mark

   From: Srinivas Eeda [mailto:srinivas.e...@oracle.com]
   Sent: 16 November 2009 14:57
   To: Thompson, Mark
   Cc: ocfs2-users@oss.oracle.com
   Subject: Re: [Ocfs2-users] 2 node OCFS2 clusters

   In a cluster with more than 2 nodes, if the network on one node goes
   down, that node will evict itself but the other nodes will survive.
   But in a two node cluster, the node with the lowest node number will
   survive no matter on which node the network went down.

   thanks,
   --Srini

   Thompson, Mark wrote:

     Hi,

     This is my first post here so please be gentle with me.

     My question is, can you have a 2 node OCFS2 cluster, disconnect one
     node from the network, and have the remaining node continue to
     function normally? Currently we have a 2 node cluster and if we
     stop the NIC that has the OCFS2 o2cb net connection running on it,
     the other node will reboot itself. I have researched having a 2
     node OCFS2 cluster but so far I have been unable to find a clear
     solution. I have looked at the FAQ regarding quorum, and my OCFS2
     init scripts are enabled etc.

     Is this possible, or should we look at alternative solutions?

     Regards,

     Mark
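For reference, the node numbers discussed in this thread come from /etc/ocfs2/cluster.conf; a minimal two-node sketch (names, addresses, and port are placeholders):

```
cluster:
	node_count = 2
	name = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.1
	number = 0
	name = node0
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.2
	number = 1
	name = node1
	cluster = ocfs2
```

In a two-node o2cb cluster it is the node with the lower `number` here that wins the quorum race when the interconnect fails.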


  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] Heartbeat Timeout Threshold

2009-08-10 Thread Luis Freitas
Bret,

   These are my two cents on this subject. If I am not mistaken, there are two
heartbeats, one on the network and one on the disk. Failure of either one of
them will cause a node to be evicted.

If you have network bonding, depending on your configuration and the
network topology, when one path fails the switches might have to reconfigure
the path to that MAC address, and that can take some time. This can be
reduced by forcing ARP broadcasts (gratuitous ARP) on the new path, so the
network equipment between your servers can reconfigure itself faster.

For the disk heartbeat, assuming that:

- You have dual FC cards on each server
- You have dual FC switches connected to each other
- You have a storage with two or more FC ports, connected to the switches.

   You have an FC card timeout (probably set in the HBA firmware or the
driver), the multipath timeout, and a storage controller timeout.

   Each of these needs to be smaller than your cluster heartbeat timeout in
order for all the nodes to survive a component failure. For example, an EMC
storage array has two internal controllers (SPs) and an SP failover on the
order of two minutes.

   During this time the LUNs that are routed through the failed SP will be
unresponsive, and the FC card on your server will not report this to the O/S
until its timeout is reached. After the failover inside the EMC storage, the
multipath software on your server will also need to establish the new path to
the LUN using the surviving SP.
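The o2cb disk-heartbeat timeout and its threshold setting are related by timeout = (threshold - 1) * 2 seconds; a small helper to pick the threshold that covers a given failover window:

```python
import math

def o2cb_heartbeat_threshold(required_seconds: float) -> int:
    """Smallest o2cb heartbeat threshold whose disk timeout,
    (threshold - 1) * 2 seconds, is at least required_seconds."""
    return math.ceil(required_seconds / 2) + 1

# A two-minute SP failover needs threshold 61; the default 31 covers 60 s.
print(o2cb_heartbeat_threshold(120))  # -> 61
print(o2cb_heartbeat_threshold(60))   # -> 31
```

These are the same numbers that come up later in this thread (the default of 31 giving roughly one minute, and 61 giving two).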

Best Regards,
Luis Freitas



--- On Fri, 8/7/09, Brett Worth br...@worth.id.au wrote:

 From: Brett Worth br...@worth.id.au
 Subject: [Ocfs2-users] Heartbeat Timeout Threshold
 To: ocfs2-users@oss.oracle.com
 Date: Friday, August 7, 2009, 9:29 PM
 I've been using OCFS2 on a 3-way
 Centos 5.2 Xen cluster for a while now, using it to share
 the VM disk images.  In this way I can have
 live and transparent VM migration.

 I'd been having intermittent (every 2-3 weeks) incidents
 where a server would self-fence.
 After configuring netconsole I managed to see that the
 fencing was due to a heartbeat
 threshold timeout, so I have now increased all three servers
 to have a threshold of 61 (i.e.
 2 minutes) from the default of 31 (i.e. 1 minute).  So far
 there have been no panics.
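The threshold change described above is typically made persistent in the o2cb configuration file; a sketch (the path and values vary by distro):

```
# /etc/sysconfig/o2cb (Debian/Ubuntu: /etc/default/o2cb)
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
# (61 - 1) * 2 = 120 seconds of disk-heartbeat timeout
O2CB_HEARTBEAT_THRESHOLD=61
```

The value takes effect when the o2cb service is restarted, which is why a coordinated cluster outage is needed.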
 
 I do have a couple of questions though:
 
 1. To get this timeout applied I had to have a complete
 cluster outage so that I could
 make all the changes simultaneously.  Making the
 change on a single node prevented it from
 joining in the fun.  Do all parameters really need to
 match before a node can join?  The
 timeout threshold seems to be one that could differ from
 node to node.
 
 2. Even though this appears to have fixed the problem, 2
 minutes is a long time to wait
 for a heartbeat.  Even one minute seems like a very
 long time.  I assume that missing a
 heartbeat would be a symptom of a very busy filesystem but
 for a packet to take over a
 minute to get over the wire is odd.  Or is it that the
 heartbeats are actually being lost
 for an extended period?  Is this a network
 problem?  All my nodes communicate heartbeat on
 a dedicated VLAN.
 
 Regards
 Brett
 PS:  If anyone is planning to do Xen like this, my main
 piece of advice is that you must
 put a ceiling on how much RAM the Dom0 domain can
 use.  If you don't, it will expand to use
 all non-VM memory for buffer cache, so that when you try to
 migrate a VM to it there is
 no RAM left.
 
 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users
 


  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] OCFS2 hosting and running binaries

2009-06-09 Thread Luis Freitas

Saul,

   OCFS2 1.2 does not support indexed directories, so you will need a cleaning
procedure for the database dump directories to keep the number of log files
reasonable, unless you don't mind an ls hanging when you try to find a trace.
I am not sure if OCFS 1. has support for indexed directories on the enterprise
tree.
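Such a cleaning procedure could be as simple as a periodic job that prunes old files from the dump directories; a sketch (the directory path and retention period are placeholders):

```python
import os
import time

def prune_old_files(directory: str, max_age_days: int) -> int:
    """Delete regular files in `directory` older than max_age_days.
    Returns the number of files removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed += 1
    return removed

# Example: keep the last 14 days of trace/dump files.
# prune_old_files("/u01/app/oracle/admin/ORCL/bdump", 14)
```

Run from cron on each node, this keeps the directory small enough that unindexed lookups stay tolerable.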

   Also, when sharing database binaries there is a need to use a special type
of file that is different on each node for certain configuration files. OCFS2
1.2 enterprise has support for this. This was necessary on 10g; I'm not sure
about 11g.

There are some files that need to be outside of OCFS2 1.2, so you will need
to create some symlinks for each database too. And the mount options for
database files and binaries/traces are different, so you will need to plan your
filesystems in a way that keeps datafiles, logfiles, and archivelogs separated
from your binaries and trace/dump locations.

One important point is that when you do this you will be unable to do a
rolling patch application. Depending on your availability requirements this
could be important.

Also, you introduce a single point of failure. If someone accidentally
damages or deletes an important database or configuration file, the entire
cluster will fail. With non-shared binaries filesystems one would expect the
other nodes to continue operating if this happens.

Regards,
Luis

--- On Tue, 6/9/09, Sunil Mushran sunil.mush...@oracle.com wrote:

 From: Sunil Mushran sunil.mush...@oracle.com
 Subject: Re: [Ocfs2-users] OCFS2 hosting and running binaries
 To: Saul Gabay sa...@herbalife.com
 Cc: Server Ops_Linux serverops_li...@herbalife.com, 
 ocfs2-users@oss.oracle.com
 Date: Tuesday, June 9, 2009, 5:20 PM
 Sure. One can use ocfs2 to host almost anything. The one exception is the 
 crs_home. crs_home needs to be on a local volume.
 
 OCFS 1.2/1.4 has two limits. Like ext3, the number of sub-directories in _a_ 
 directory cannot exceed 32000. (There is no limit to the number of subdirs in 
 a volume.) The other limit is the volume size; currently the max is 16T. 
 There is no limit to the number of files in a volume. (The two limits have 
 been relaxed in mainline for a few kernel versions.)
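The 32000 sub-directory limit can be watched with a quick count; a sketch (the demo directory is a throwaway /tmp path, standing in for a directory on an OCFS2 mount):

```shell
#!/bin/sh
# Count immediate subdirectories of a directory to see how close it is to
# the 32000-per-directory limit mentioned above. Demo uses a throwaway dir.
target=/tmp/ocfs2_subdir_demo
rm -rf "$target"
mkdir -p "$target/a" "$target/b" "$target/c"
touch "$target/notadir"   # plain files do not count toward the limit
n=$(find "$target" -mindepth 1 -maxdepth 1 -type d | wc -l)
echo "$n subdirectories (ocfs2 1.2/1.4 limit: 32000)"
```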
 
 As far as performance goes, I have yet to see a benchmark that shows ocfs2 
 slower than gfs/gfs2.
 
 For certification, please check metalink.
 
 Sunil
 
 Saul Gabay wrote:
 
  We are currently using OCFS2 to host multiple Oracle 10g RAC databases on 
  Itanium servers running Redhat AS 4.7. We are running this OCFS2 version so 
  far with no issues:
 
  ocfs2-2.6.9-78.0.13.EL-1.2.9-1.el4
 
  We would like to use OCFS2 to host binaries files for the database and/or 
  application.
 
  This will be 4 active nodes mounting an OCFS2-formatted LUN through iSCSI.
 
  What are the issues, caveats, or things we need to be aware of if we take 
  this approach?
 
  Like, is there a limit on the number of files or directories hosted on 
  OCFS2?
 
  Is there a performance issue / degradation in comparison with GFS hosting 
  binaries files?
 
  What are the good, bad, and ugly of OCFS2 in comparison with GFS hosting 
  binaries files?
 
  Is OCFS2 certified by Oracle to run database/application binaries?
 
  Please advise what your experience on this topic is; it will be greatly 
  appreciated.
 
  Saul
 
 
 
 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users
 


  



Re: [Ocfs2-users] [Fwd: Re: Unable to fix corrupt directories with fsck.ocfs2]

2009-05-20 Thread Luis Freitas

Robin,

   To me, "anyone else" includes the kernel of the current node.

   Well, if it is unclear, the man page should be revised. Also, a big 
warning message in fsck.ocfs2 would be nice; after all, we all make mistakes. 
But this is only my two cents.

   Running fsck on any journaled filesystem will replay the journal. This 
will cause corruption if the filesystem is mounted read/write, even if the 
filesystem was not corrupted in the first place.

   You could mount it read-only, but you risk getting a kernel panic when the 
filesystem suddenly changes if fsck corrects something. I am not aware of any 
filesystem that can withstand an online fsck. Sun ZFS can do online 
correction, but it doesn't have a fsck tool.
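A defensive wrapper along the lines being discussed is easy to sketch: refuse to fsck anything that is still in the local mount table. The device name is hypothetical, and this only covers the local node; cluster-wide checking is what fsck.ocfs2 itself does through the DLM unless -F is given.

```shell
#!/bin/sh
# Sketch of a pre-fsck guard. The mounts table is a parameter so the check
# is testable; in real use it is /proc/mounts.
is_mounted() {
    # $1 = device path, $2 = mounts table
    grep -q "^$1 " "$2" 2>/dev/null
}

dev=/dev/mapper/ocfs2vol          # hypothetical device name
if is_mounted "$dev" /proc/mounts; then
    echo "refusing: $dev is mounted on this node" >&2
    exit 1
fi
echo "$dev not mounted locally; a full check would be: fsck.ocfs2 -fy $dev"
```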

Regards,
Luis   

--- On Tue, 5/19/09, Robin Garner robin.gar...@scu.edu.au wrote:

 From: Robin Garner robin.gar...@scu.edu.au
 Subject: Re: [Ocfs2-users] [Fwd: Re: Unable to fix corrupt directories with 
 fsck.ocfs2]
 To: ocfs2-users@oss.oracle.com
 Date: Tuesday, May 19, 2009, 11:05 PM
 Joel Becker wrote:
  On Tue, May 19, 2009 at 02:49:31PM +1000, Robin Garner wrote:
  Robin Garner wrote:
  Yes.  This is a 24/7 application (at least during semester), and arranging 
  extended downtime is a challenge.
 
      Ok, you ran fsck against a live filesystem and skipped the cluster 
  locking with the '-F' option.  So now you have two problems.
 
  1) The original directory problem.
  2) The duplicate blocks created by your fsck of a mounted filesystem.
 
      Do you have backups?
 
  Joel
  
 
 OK, now I'm confused:
 
 The man page for fsck.ocfs2 says
 
          -F      Usually fsck.ocfs2 will check with cluster services and the
                  DLM to make sure that no one else in the cluster is actively
                  using the device before proceeding. -F skips this check and
                  should only be used when it can be guaranteed that there can
                  be no other users of the device while fsck.ocfs2 is running.
 
  To me and my colleagues, "no one else in the cluster is actively using the 
  device" means that the filesystem must be mounted on *at most* one node in 
  the cluster (the node doing the fsck).  That's what we did.
 
  This filesystem is normally mounted by both nodes of a 2-node cluster. We 
  had cleanly unmounted the filesystem on the other node.  fsck.ocfs2 without 
  '-F' gave errors, but then mounted.ocfs2 claimed the disk was mounted on 
  both nodes.  Eventually we shut down the other node, and mounted.ocfs2 
  still thought it had it mounted.  At this point we used '-F'.
 
  I can't see any reference in the man page about not doing an fsck on a 
  mounted disk.
 
  e2fsck, for example, says this:
 
    WARNING!!! Running e2fsck on a mounted file system may cause
    SEVERE filesystem damage.
 
    Do you really want to continue (y/n)?
 
  when you try to fsck a mounted filesystem.  May I suggest that fsck.ocfs2 
  do something similar?  Perhaps 'everyone knows' you can't run fsck on a 
  mounted filesystem, but we were assuming that ocfs2, being a modern cluster 
  filesystem, might be a little more advanced.  Apparently not.
 
  We'll try to salvage the data another way (we believe the directory 
  corruption is some way down the directory tree), and pull missing data back 
  from backups.
 
 Robin
 
 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users
 


  



Re: [Ocfs2-users] Encountered disk I/O error 19502

2009-04-06 Thread Luis Freitas

Diane,

   Are you using ASM and OCFS2? Some of the log messages point to a disk group.

   Can you post a copy of your /etc/fstab with the mount options?

Regards,
Luis


--- On Mon, 4/6/09, Diane Petersen diane_peter...@yahoo.com wrote:

 From: Diane Petersen diane_peter...@yahoo.com
 Subject: Re: [Ocfs2-users] Encountered disk I/O error 19502
 To: Karim Alkhayer kkha...@gmail.com, ocfs2-users@oss.oracle.com
 Date: Monday, April 6, 2009, 1:42 PM
 Hi,
 
  We already have TAF implemented; unfortunately that doesn't help. I suppose 
  TAF might help if the instance were terminated, but that's not what 
  happens; instead it terminates these individual sessions directly.
 
  This happens on both nodes during writes to the OCFS2 partition at random 
  times, but never at the same time. There is nothing else in the db alert 
  log or crs logs other than what I've included below.
 
 Thanks,
 Diane Petersen
 ServerCare, Inc.
 
 
 
 
 
 From: Karim Alkhayer kkha...@gmail.com
 To: Diane Petersen diane_peter...@yahoo.com;
 ocfs2-users@oss.oracle.com
 Sent: Monday, April 6, 2009 9:11:06 AM
 Subject: RE: [Ocfs2-users] Encountered disk I/O error 19502
 
  
 Hello Diane,
  
 I believe that implementing TAF could help a bit in this
 case, at
 least to become transparent to the end users, unless of
 course, the following
 points are blocking in your case:
  
  1.   ALTER SESSION statements are lost: Statements such as ALTER SESSION 
  ... are not automatically re-issued to the server following a failover. 
  This can have a significant effect on application behavior. For example:
 
       ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD';
       select sysdate from dual;
       Result: 2009-01-31
 
       -- fail over the connection --
 
       select sysdate from dual;
       Result: 31-JAN-09
 
  2.   In-progress transactions must be rolled back
  3.   Continuing work on existing cursors may raise an error (eg: ORA-25401 
       cannot continue fetches)
  4.   Failed-over selects may take time to re-position (when 
       FAILOVER_TYPE=SELECT)
  5.   Client awareness of a failover
  
  Can we have an overview of the database setup, nature of transactions, and 
  parameters?
 
  It would also help to examine the troublesome node's behavior and recovery 
  measures.
  
 Best regards,
 Karim Alkhayer
  
 From:ocfs2-users-boun...@oss.oracle.com
 [mailto:ocfs2-users-boun...@oss.oracle.com] On
 Behalf Of Diane Petersen
 Sent: Monday, April 06, 2009 4:06 PM
 To: ocfs2-users@oss.oracle.com
 Subject: [Ocfs2-users] Encountered disk I/O error 19502
  
 Hi,
 
  We have a 2-node 11g RAC database running OCFS2 1.4.1-1.el5 with Linux 
  kernel 2.6.18-92.1.17.el5 64-bit. Lately we've been seeing errors on both 
  nodes almost every other day. The system administrator has checked the SAN 
  array and said there are no issues being reported.
 
  Another part of the problem: it appears the instances alter the 
  service_names parameter, not allowing new connections to the node with the 
  reported error, but also terminating sessions already connected using the 
  RAC service. The errors all start with "Encountered disk I/O error 19502" 
  and contain the following:
  ARC2: Encountered disk I/O error 19502 (ifxdb2)
  Errors in file /u01/app/oracle/diag/rdbms/ifxdb/ifxdb2/trace/ifxdb2_arc2_15414.trc:
  ORA-19502: write error on file /u03/arch/2_1917_656008464.dbf, block number 155649 (block size=512)
  ORA-27072: File I/O error
  Linux-x86_64 Error: 5: Input/output error
  Additional information: 4
  Additional information: 155649
  Additional information: -1
  ORA-19502: write error on file /u03/arch/2_1917_656008464.dbf, block number 155649 (block size=512)
  Errors in file /u01/app/oracle/diag/rdbms/ifxdb/ifxdb2/trace/ifxdb2_arc2_15414.trc:
  ORA-19502: write error on file /u03/arch/2_1917_656008464.dbf, block number 155649 (block size=512)
  ORA-27072: File I/O error
  Linux-x86_64 Error: 5: Input/output error
  Additional information: 4
  Additional information: 155649
  Additional information: -1
  ORA-19502: write error on file /u03/arch/2_1917_656008464.dbf, block number 155649 (block size=512)
  ARC2: I/O error 19502 archiving log 10 to '/u03/arch/2_1917_656008464.dbf'
  ARCH: Archival stopped, error occurred. Will continue retrying
  ORACLE Instance ifxdb2 - Archival Error
  ORA-16038: log 10 sequence# 1917 cannot be archived
  ORA-19502: write error on file , block number  (block size=)
  ORA-00312: online log 10 thread 2: '+REDO1/ifxdb/onlinelog/group_10.265.656605479'
  Errors in file /u01/app/oracle/diag/rdbms/ifxdb/ifxdb2/trace/ifxdb2_arc2_15414.trc:
  ORA-16038: log 10 sequence# 1917 cannot be archived
  ORA-19502: write error on file , block number  (block size=)
  ORA-00312: online log 10 thread 2: '+REDO1/ifxdb/onlinelog/group_10.265.656605479'
  Sun Apr 05 15:05:16 2009
  ALTER SYSTEM SET service_names='ifxdb.gointranet.com' SCOPE=MEMORY
 

Re: [Ocfs2-users] OCFS2 FS with BACKUP Tools/Vendors

2009-04-02 Thread Luis Freitas


   We use Veritas Netbackup 6.0 (with MP7) to back up some export files on an 
OCFS2 filesystem. We didn't have any issues so far, but we didn't test it too 
much either.

   OCFS2 can be regarded as a regular file system, unlike OCFS1, which could 
not be accessed without a special version of cp and tar. But I doubt Veritas 
would directly support any issues we eventually find when backing up (or 
restoring) files from OCFS2.

   If you really want support, you might need to copy the files from OCFS2 
with operating system commands to an ext3 filesystem and then back up from 
there.
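The staging approach in the last paragraph is just a copy to a locally supported filesystem before the backup tool runs; a sketch with hypothetical /tmp stand-in paths:

```shell
#!/bin/sh
# Copy from the cluster fs to a local (e.g. ext3) staging area, then let the
# backup tool pick it up from there. Paths are /tmp stand-ins for a real
# OCFS2 mount and a local staging filesystem.
src=/tmp/ocfs2_exports
stage=/tmp/backup_stage
rm -rf "$src" "$stage"
mkdir -p "$src" "$stage"
echo "export data" > "$src/exp1.dmp"
cp -a "$src/." "$stage/"       # -a preserves times/permissions
ls "$stage"
```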

Regards,
Luis

--- On Thu, 4/2/09, Daniel Keisling daniel.keisl...@ppdi.com wrote:

 From: Daniel Keisling daniel.keisl...@ppdi.com
 Subject: Re: [Ocfs2-users] OCFS2 FS with BACKUP Tools/Vendors
 To: Bumpass, Brian brian.bump...@wachovia.com, ocfs2-users@oss.oracle.com
 Date: Thursday, April 2, 2009, 5:22 PM
 I use HP Data Protector.  OCFS2 is supported in v6.0.
 
 
 
 
   From: ocfs2-users-boun...@oss.oracle.com
 [mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of
 Bumpass, Brian
   Sent: Thursday, April 02, 2009 1:31 PM
   To: ocfs2-users@oss.oracle.com
   Subject: [Ocfs2-users] OCFS2 FS with BACKUP Tools/Vendors
   
   
 
    My apology up front if this has been discussed already.  I've reviewed 
  the archives back to Nov. 2005 and found little of anything.
 
    I need some information concerning support for OCFS2 by backup products.  
  Currently we use IBM/Tivoli's TSM tool.  They don't support OCFS2 
  filesystems, and it looks like they have no intent of supporting the FS in 
  upcoming releases.  Note: they do support their own NAS FS, GPFS, but this 
  costs extra.
 
    Additionally, in the small testing I have done, a file under an OCFS2 FS 
  backs up and recovers quite nicely.  I have not tested using ACL lists, but 
  don't really care about those.  This issue comes down to support.
 
    So... I guess what I am looking for is some indication of where the user 
  community with OCFS2 and doing backups has been along similar issues.
 
    Sorry... The environment being supported is SLES 10 SP2 64-bit on DELL & 
  HP hardware.
 
    Thanks in advance,
 
    -B
 

 

 
 
 __
 This email transmission and any documents, files or
 previous email
 messages attached to it may contain information that is
 confidential or
 legally privileged. If you are not the intended recipient
 or a person
 responsible for delivering this transmission to the
 intended recipient,
 you are hereby notified that you must not read this
 transmission and
 that any disclosure, copying, printing, distribution or use
 of this
 transmission is strictly prohibited. If you have received
 this transmission
 in error, please immediately notify the sender by telephone
 or return email
 and delete the original transmission and its attachments
 without reading
 or saving in any manner.
 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users


  



Re: [Ocfs2-users] ASM over OCFS2 vs. Standard locally managed tablespaces

2009-02-09 Thread Luis Freitas
Karim,

   I don't see why one would run ASM over OCFS2. It seems to be a useless 
overhead. Either you run ASM or you run OCFS2.

   Btw, neither ASM nor OCFS2 is smart enough to detect that some LUNs are 
faster than others. ASM expects each diskgroup to be comprised of LUNs of 
similar performance in order for its load balancing algorithms to work. 
OCFS2, as far as I know, doesn't have this type of management built in.

See:
http://www.oracle.com/technology/products/database/asm/pdf/take%20the%20guesswork%20out%20of%20db%20tuning%2001-06.pdf
Section: ASM Best practices and principals.

   About performance, ASM is said to have performance similar to raw devices 
in a SAME layout, being tightly integrated with Oracle. OCFS2 has some 
overheads that are inherent to a file system, like cache management, locking, 
and context switching, so it is likely to use more CPU power than ASM. But I 
don't remember any specific benchmark comparing the two.

    Also, keep in mind that when you use a filesystem you are using part of 
the memory for the filesystem cache. When using raw devices or ASM you would 
need to allocate this memory to the block buffer in order to compare results.

Regards,
Luis

 Hello All,



 Are there any benchmarks on the performance of ASM over OCFS2 vs. standard 
 locally managed tablespaces?
 
 In our environment, data files hosting tables/lobs are stored on a RAID6 
 disk array with 10K rpm disks, whilst indices are stored on a different 
 RAID6 disk array with 15K rpm disks.
 
 We're using Oracle-managed files for the rollback/undo and temporary 
 tablespaces.
 
 Would ASM over OCFS2 be smart enough to detect the fast LUNs?



 Appreciate your thoughts.



 Best regards,

 Karim

 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] ASM over OCFS2 vs. Standard locally managed tablespaces

2009-02-09 Thread Luis Freitas
Karim,

  This is one big environment.

  I don't see how ASM over OCFS2 would give easier administration than ASM 
alone or OCFS2 alone. The only situation where I see it as reasonable to use 
ASM over a cooked filesystem is when using a NAS device that doesn't support 
direct block access. Also, I don't understand why you say that RAC needs a 
shared filesystem; when you use ASM you don't need a shared filesystem.

   If you go with ASM, you will need to install the cluster services on each 
server that shares an ASM diskgroup, even if it has no RAC databases. The 
same goes for OCFS2: you will need to install the OCFS2 services on each 
server that shares an OCFS2 filesystem. If you do both, you will have to 
install both.

   ASM has some interesting storage-like features, for example extended 
clusters and online disk reorganization. You can do some of these with OCFS2, 
for example adding a disk. But try to remove an OCFS2 volume with the 
database online without disrupting your users; ASM can do that.

   On the other hand, ASM is less transparent. You have little control over 
how the data is laid out, and the only tool to manage files is an ftp-like 
client, which you need to use to delete dangling files or to back something 
up manually. Database backups usually need to go through RMAN. On OCFS2 you 
can use standard operating system commands to manage the datafiles. ASM also 
has no recovery tools, like fsck.

Regards,
Luis

--- On Mon, 2/9/09, Karim Alkhayer kkha...@gmail.com wrote:
From: Karim Alkhayer kkha...@gmail.com
Subject: RE: [Ocfs2-users] ASM over OCFS2 vs. Standard locally managed 
tablespaces
To: lfreita...@yahoo.com, ocfs2-users@oss.oracle.com
Date: Monday, February 9, 2009, 10:47 AM




 
 






We're using OCFS2 for RAC on top of SLES9, which we're going to upgrade to 
SLES10. Around 10 TB of RAID6 multi-disk arrays, 5 databases on RAC, and 5 
single-instance standbys for the primary site.

As there is no AI component in ASM to detect the fast LUNs, and RAC on SLES 
requires a shared file system: on a set of identical LUNs, in terms of 
capacity and speed, ASM should take care of distributing the balance over the 
LUNs, and OCFS2 is expected to work even better if these LUNs are placed on 
several disk groups (arrays).

How would this scenario (ASM over OCFS2) work? What are the cons and pros? 
Keep in mind that the goal of such a concept is to provide performance and 
reliability with the least possible administration.

Appreciate your thoughts.

Best regards,

Karim

   

   



From: ocfs2-users-boun...@oss.oracle.com 
[mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Luis Freitas
Sent: Monday, February 09, 2009 2:16 PM
To: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ASM over OCFS2 vs. Standard locally managed 
tablespaces



   


 
  

___

Ocfs2-users mailing list

Ocfs2-users@oss.oracle.com

http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] ocfs2 hangs during webserver usage

2009-01-27 Thread Luis Freitas
David,

  You probably could get away with writing the log files to OCFS2, but in 
different files.

   The access_log gets a small write on each access, and the writes will be 
serialized, since no two processes can write to it at the same time.

   So it should be related more to the size of the disk queues and the speed 
of the interconnect than to the speed of the SAN itself. If you keep writing 
to the same file from different nodes, OCFS2 will need to keep flushing pages 
and bouncing write locks from one node to another on each page hit. This is 
very unscalable, as you get a single serialized write on each page access.
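A sketch of the per-node-log alternative (hypothetical paths): each node's httpd writes its own access_log, and reports merge them afterwards, so no write lock ever bounces between nodes.

```shell
#!/bin/sh
# In httpd.conf each node would get its own file, e.g.:
#   CustomLog "/mnt/ocfs2/logs/access_log.node1" combined
# (hypothetical paths). The demo below builds two fake per-node logs in /tmp
# and merges the already-sorted files into one stream for reporting.
mkdir -p /tmp/ocfs2demo
printf '10:00 GET /a\n10:02 GET /c\n' > /tmp/ocfs2demo/access_log.node1
printf '10:01 GET /b\n' > /tmp/ocfs2demo/access_log.node2
sort -m /tmp/ocfs2demo/access_log.node* > /tmp/ocfs2demo/access_log.merged
cat /tmp/ocfs2demo/access_log.merged
```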

   The extent size would play a role in this also, as Sunil pointed out. You 
could check whether the extent size is different on your test environment. 
The mkfs tool might have defaulted to a larger extent size if the total size 
of the filesystem is larger. (Sunil, correct me on this if I am wrong.)

Regards,
Luis
   

--- On Tue, 1/27/09, jmose...@corp.xanadoo.com jmose...@corp.xanadoo.com 
wrote:
From: jmose...@corp.xanadoo.com jmose...@corp.xanadoo.com
Subject: Re: [Ocfs2-users] ocfs2 hangs during webserver usage
To: David Johle djo...@industrialinfo.com
Cc: lfreita...@yahoo.com, ocfs2-users@oss.oracle.com, 
ocfs2-users-boun...@oss.oracle.com
Date: Tuesday, January 27, 2009, 10:32 PM

As others have indicated, I don't think that's going to work very well.
You've got two different nodes trying to write to the same file constantly.
I would keep each server's log on a locally mounted file system, or simply
keep the logs on the OCFS2 filesystem, but have each node write to
different log files.

Yeah, that makes parsing access_logs slightly more of a problem for
producing hit reports, etc, but I think you'll notice performance improve.


James Moseley




   
 From: David Johle djo...@industrialinfo.com
 Sent by: ocfs2-users-bounce...@oss.oracle.com
 To: lfreita...@yahoo.com
 Cc: ocfs2-users@oss.oracle.com
 Date: 01/27/2009 04:38 PM
 Subject: Re: [Ocfs2-users] ocfs2 hangs during webserver usage




Yes, that is the case: multiple nodes with the same log file open, being 
written to at once.

It worked well during all my testing-environment stress tests, and even 
worked great in production for over a month.



At 02:56 PM 1/27/2009, Luis Freitas wrote:
David,

 You said you were keeping the apache log files on OCFS2.

  Are you using the same log files (access_log and error_log) for all the 
 nodes? That is, a single access_log that is written by both nodes 
 simultaneously?

Regards,
Luis


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users






Re: [Ocfs2-users] ocfs2 hangs during webserver usage

2009-01-27 Thread Luis Freitas
Sunil,

  Hmm, I thought you meant the cluster size.

Regards,
Luis 

--- On Wed, 1/28/09, Sunil Mushran sunil.mush...@oracle.com wrote:
From: Sunil Mushran sunil.mush...@oracle.com
Subject: Re: [Ocfs2-users] ocfs2 hangs during webserver usage
To: lfreita...@yahoo.com
Cc: ocfs2-users@oss.oracle.com
Date: Wednesday, January 28, 2009, 1:08 AM

Luis Freitas wrote:
The extent size would play a role in this also, as Sunil pointed. You could 
check if the extent size is different on your test environment. The mkfs tool 
might have defaulted to a larger extent size if the total size of the 
filesystem is larger. (Sunil, correct me on this if I am wrong.)

The extent size I was referring to is not related to any on-disk size. It 
depends on the location the write() is issued to. Say you have a 100K sized 
file and you issue a write() at offset 1G. If the fs supports sparse files, 
it will allocate (and init) blocks only around the 1G location, leaving a 
hole in between. A non-sparse-file-supporting fs, on the other hand, will 
need to allocate (and init) the entire space between 100K and 1G.
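Sunil's example is easy to reproduce with dd: write one byte at a ~1 GiB offset and compare the apparent size with the allocated blocks (the du figure depends on whether the filesystem holding /tmp supports sparse files).

```shell
#!/bin/sh
# Create a sparse file: seek ~1 GiB into the file, then write a single byte.
# The apparent size is just over 1 GiB, but on a sparse-file-capable fs only
# the blocks around the offset are actually allocated.
f=/tmp/sparse_demo.bin
rm -f "$f"
dd if=/dev/zero of="$f" bs=1 count=1 seek=$((1024*1024*1024)) 2>/dev/null
echo "apparent size: $(wc -c < "$f") bytes"
du -k "$f"    # allocated space; a few KiB on ext4/xfs, ~1 GiB without sparse support
```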




  ___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] OCFS2 replication

2009-01-23 Thread Luis Freitas
Brian,

   We have EMC MirrorView here, but it doesn't allow an active-active mirror. 
We mount only the primary LUN on both servers; the mirror LUN needs to be 
promoted to primary and then mounted in case of a disaster recovery. We have 
a script to do that.

Regards,
Luis

--- On Thu, 1/22/09, Brian Kroth bpkr...@gmail.com wrote:
From: Brian Kroth bpkr...@gmail.com
Subject: Re: [Ocfs2-users] OCFS2 replication
To: CA Lists li...@creativeanvil.com
Cc: ocfs2-users@oss.oracle.com ocfs2-users@oss.oracle.com
Date: Thursday, January 22, 2009, 9:35 PM

Our iSCSI SAN (EqualLogic) does block-level replication, so we were thinking 
of trying to set something up soon so that we could have some nodes in 
another building, connected via fiber, to provide site-level failover.  I'll 
report back our experiences when we do that, but I imagine it would be 
similar to drbd with a nice interconnect.
Brian
On Jan 22, 2009, at 2:08 PM, CA Lists li...@creativeanvil.com wrote:



Can't say I've replicated it between two sites, but definitely between two 
physical servers. I used drbd in my particular case. Here's a small blog 
entry I put together a while back about what I did. Hopefully it's helpful:

http://www.creativeanvil.com/blog/2008/how-to-create-an-iscsi-san-using-heartbeat-drbd-and-ocfs2/










Joe Koenig



Creative Anvil, Inc.
Phone: 314.692.0338
1346 Baur Blvd.
Olivette, MO 63132
j...@creativeanvil.com
http://www.creativeanvil.com





David Schüler wrote:

What about drbd?

From: ocfs2-users-boun...@oss.oracle.com 
[mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Garcia, Raymundo
Sent: Thursday, 22 January 2009 20:46
To: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] OCFS2 replication

RSYNC is not real time… any other suggestion…? I tried RSYNC already…

From: ocfs2-users-boun...@oss.oracle.com 
[mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Sérgio Surkamp
Sent: Thursday, January 22, 2009 12:32 PM
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] OCFS2 replication

Try rsync.

Garcia, Raymundo wrote:
Hello… I am trying to replicate an OCFS2 filesystem in site A to another 
OCFS2-based partition in site B. I have tried several products (InMage, 
SteelEye, etc.) without any luck; those programs help me replicate the 
filesystem but not the mounted OCFS2. I assume that this is because most 
software-based replication systems work on the block level instead of the 
file level. I wonder if anyone has tried to replicate OCFS2 between 2 sites…

Thanks

Raymundo Garcia

The information contained in this message may be confidential and legally 
protected under applicable law. The message is intended solely for the 
addressee(s). If you are not the intended recipient, you are hereby notified 
that any use, forwarding, dissemination, or reproduction of this message is 
strictly prohibited and may be unlawful. If you are not the intended 
recipient, please contact the sender by return e-mail and destroy all copies 
of the original message.

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Regards,

--
Sérgio Surkamp | Network Manager
ser...@gruposinternet.com.br

*Grupos Internet S.A.*
R. Lauro Linhares, 2123 Torre B - Sala 201
Trindade - Florianópolis - SC
+55 48 3234-4109
http://www.gruposinternet.com.br
  
  
  

  




  
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users




Re: [Ocfs2-users] Filesystem Block Size w// DB_BLOCK_SIZE

2008-12-20 Thread Luis Freitas
Karin,

   Even if they won't upgrade the entire system, I see no reason not to 
upgrade the kernel.

   On SUSE, upgrading the kernel would upgrade the OCFS2 version that is 
being used, and if there is trouble you could revert to the older kernel.

Regards,
Luis


--- On Sat, 12/20/08, Karim Alkhayer kkha...@gmail.com wrote:

 From: Karim Alkhayer kkha...@gmail.com
 Subject: RE: [Ocfs2-users] Filesystem Block Size w// DB_BLOCK_SIZE
 To: 'Sunil Mushran' sunil.mush...@oracle.com
 Cc: lfreita...@yahoo.com, ocfs2-users@oss.oracle.com
 Date: Saturday, December 20, 2008, 8:57 AM
 Folks,
 
  We're using two StoreWay FDA storage arrays with around 5 TB on each. The 
  connection from the servers to the storage is through fibre channel. RAID 6 
  is used with 7400 RPM disks for non-index data files, whereas 10K RPM disks 
  are used for the indices. ASM is not used; the tablespaces are sized and 
  organized based on the business requirements.
 
  The platform provider is hesitant to upgrade SLES from SP3 to SP4 so that 
  we can benefit from the latest possible version of OCFS2. Therefore, any 
  potential enhancement within the current scenario would be the best chance 
  for now.
 
  Appreciate your thoughts
 
 Cheers,
 Karim
 
 
 -Original Message-
 From: Sunil Mushran [mailto:sunil.mush...@oracle.com] 
 Sent: Friday, December 19, 2008 11:57 PM
 To: Karim Alkhayer
 Cc: lfreita...@yahoo.com; ocfs2-users@oss.oracle.com
 Subject: Re: [Ocfs2-users] Filesystem Block Size w//
 DB_BLOCK_SIZE
 
  True.
 
  The only point I would like to add is that you are using a 2+ year old 
  version of the fs. You should upgrade to at least SLES9 SP4.
 
 Luis Freitas wrote:
  Karim,
 
This is not OCFS2 related, it is more related to the
 disk hardware
 capabilities and how it works.
 
That will depend on your OS, HBAs and storage, and
 the workload.
 
There is a maximum queue depth associated with each
 LUN, so if you use
 several LUNs on the same device, you could achieve more
 outstanding scsi
 commands open on the controller. But each port will also
 have a maximum
 queue depth that cannot be excedeed, so at some point using
 extra LUNs wont
 give you this advantage.
 
 If the storage/disk has more outstanding requests, it could provide
  better performance by reordering them for a larger overall throughput,
  given that the storage hardware supports this. It probably does, since
  even low-end SATA disks support reordering nowadays.
 
 On the other hand, the database (or ASM, for that matter) has no idea
  that these LUNs are from the same device, so it will spread the data
  evenly across them, and your data will end up scattered across the disk
  instead of concentrated at the start of the disks. This brings a
  performance penalty, most noticeable on full table scan operations,
  since they read the data sequentially from start to end. If you tune
  the tables' extent size you can work around this problem.
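 On Linux, the per-LUN queue depth Luis describes can be inspected
  through sysfs. A minimal sketch, assuming the common
  /sys/block/&lt;dev&gt;/device/queue_depth attribute (exact paths vary by
  kernel and HBA driver):

```shell
# List the current SCSI queue depth for each sd* disk, where the sysfs
# attribute exists. Assumes the usual /sys/block/<dev>/device/queue_depth
# layout; harmlessly prints nothing where it is absent.
show_queue_depths() {
    for qd in /sys/block/sd*/device/queue_depth; do
        [ -r "$qd" ] || continue          # skip unmatched glob / missing attr
        dev=${qd#/sys/block/}             # e.g. sda/device/queue_depth
        dev=${dev%%/*}                    # keep just "sda"
        echo "$dev queue_depth=$(cat "$qd")"
    done
}
show_queue_depths
```

 The per-port (HBA) limit mentioned above is driver-specific and is
  usually a module parameter rather than a sysfs attribute.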
 
  Regards,
  Luis
 
  --- On Thu, 12/18/08, Karim Alkhayer
 kkha...@gmail.com wrote:

  From: Karim Alkhayer kkha...@gmail.com
  Subject: [Ocfs2-users] Filesystem Block Size w//
 DB_BLOCK_SIZE
  To: ocfs2-users@oss.oracle.com
  Date: Thursday, December 18, 2008, 9:20 PM
  Hello All,
 
  We're hosting DB1 and DB2  with db_block_size 
 set to 
  8K, 16K respectively
 
  File system creation is done with mkfs.ocfs2 -b 4K -C 32K
  -N 4 -L LABEL
  /dev/mapper/
 
  Mount is done with:  ocfs2 
 _netdev,datavolume,nointr 0 0
 
  I'd like to know if we can separate most of
 the
  tablespaces on different
  LUNs, even if they're on the same disk group
 sometimes,
  is it possible to
  gain better performance? Is the impact limited to
 the time
  of creating the
  tablespaces only (assuming they're pre-sized
 properly)?
 
  Current OCFS2 version is 1.2.1
 
  Current OCFS2 components:
  ocfs2-tools-1.1.4-0.5
  ocfs2console-1.1.4-0.5
 
  # uname -r
  Kernel 2.6.5-7.257-default
 
  # cat /etc/SuSE-release
  SUSE LINUX Enterprise Server 9 (ia64) VERSION = 9
  PATCHLEVEL = 3
 
  Oracle 10.1.0.5
 
  Appreciate your input 
 
  Best regards,
 
  Karim
 


  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] hardware needed for OCFS

2008-12-19 Thread Luis Freitas

  If your NAS can do iSCSI you could use it to provide a shared block device.

  The performance won't be as good as a SAN, as the data has to go through the 
kernel TCP/IP stack, but it can be comparable if you have an iSCSI IP accelerator 
board that simulates an HBA.

Regards,
Luis


--- On Fri, 12/19/08, David Coulson da...@davidcoulson.net wrote:

 From: David Coulson da...@davidcoulson.net
 Subject: Re: [Ocfs2-users] hardware needed for OCFS
 To: Pete Kay pete...@gmail.com
 Cc: ocfs2-users@oss.oracle.com
 Date: Friday, December 19, 2008, 9:57 AM
 OCFS2 requires a block device (local disk, direct attached,
 SAN), so 
 it's not going to work with a NAS (Samba, NFS, etc)
 mount. You can, of 
  course, share an OCFS2 filesystem using Samba or NFS.
 
 Pete Kay wrote:
  Hi,
 
  Does OCFS require NAS hardware to run or does normal
 PC hard disk work?
 
  Thanks,
  Pete
 
 
 
  ___
  Ocfs2-users mailing list
  Ocfs2-users@oss.oracle.com
  http://oss.oracle.com/mailman/listinfo/ocfs2-users


  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] NFS Failover

2008-12-10 Thread Luis Freitas

   I found some IBM papers on GPFS that indicates they have a working solution 
for this.

   They seem to use a standard feature of the NFS client, NFS lock recovery, 
but the server was modified to initiate this process when one of the nodes dies, 
and also to maintain lock coherence.

   I am pasting text from one of the papers below:

http://www.redbooks.ibm.com/redpapers/pdfs/redp4400.pdf

The following system prerequisites must be met before you begin the 
installation and configuration:

 A Linux 2.6 kernel

Distributions currently supported are Red Hat Enterprise Linux (RHEL) versions 
4 and 5 and SUSE Linux Enterprise Server (SLES) versions 9 and 10.

 Operating system patches

– If NLM locking is required, a kernel patch that updates the lockd daemon to 
propagate locks to the clustered file system must be applied. This patch is 
currently available at:

http://sourceforge.net/tracker/?atid=719124&group_id=130828&func=browse

Depending on the version of SLES you are using, this patch might exist 
partially. If this condition exists, you might need to resolve certain 
conflicts. Contact your support organization if necessary.

– To permit NFS clients to reclaim their locks with a new server after 
failover, the reclaim message from statd must appear to come from the IP 
address of the failing node (and not the node that took over, which is the one 
that will actually send the message).

On SUSE, statd runs in the kernel and does not implement the interface to 
support this requirement (notification only, -N option). Therefore, on SUSE, 
the common NFS utilities (sm-notify in the user space) are needed to implement 
this function.

The patches required for the util-linux package are:

• Support statd notification by name (patch-10113)
http://support.novell.com/techcenter/psdb/2c7941abcdf7a155ecb86b309245e468.html

• Specify a host name for the -v option (patch-10852)
http://support.novell.com/techcenter/psdb/e6a5a6d9614d9475759cc0cd033571e8.html

• Allow selection of IP source address on command line (patch-9617)
http://support.novell.com/techcenter/psdb/c11e14914101b2debe30f242448e1f5d.html/

– For RHEL, use of nfs-utils 1.0.7 is required for rpc.statd fixes. See:
http://www.redhat.com/
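
Putting the statd pieces together: the surviving node has to send the reclaim 
notification while claiming the failed node's identity. A hypothetical 
invocation, sketched from the sm-notify options the patches above describe 
(verify the flags against your util-linux/nfs-utils version):

```shell
# Run on the node that took over, after the virtual IP has moved.
# -f forces (re)notification; -v names the address the notification
# should claim to come from. Both the flag semantics and the address
# below are assumptions for illustration.
sm-notify -f -v 192.0.2.10   # virtual IP of the failed NFS server
```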

   They also use load balancing with DNS round robin (?!); I am not sure how 
they can make this work.

Regards,
Luis

--- On Tue, 12/9/08, Sunil Mushran [EMAIL PROTECTED] wrote:

 From: Sunil Mushran [EMAIL PROTECTED]
 Subject: Re: [Ocfs2-users] NFS Failover
 To: [EMAIL PROTECTED]
 Cc: ocfs2-users@oss.oracle.com
 Date: Tuesday, December 9, 2008, 5:09 PM
 I forgot about fsid. That's how it identifies the
 device. Yes, it needs
 to be the same.
 
 Yes, the inode numbers are consistent. It is the block
 number
 of the inode on disk.
 
 Afraid cannot help you with failover lockd.
 
 Sunil
 
 Luis Freitas wrote:
  Sunil,
 
  They are not waiting; the kernel reconnects after a
  few seconds, but it just doesn't like the other NFS server: any
  attempt to access directories or files after the virtual IP
  failover to the other NFS server resulted in errors.
  Unfortunately I don't have the exact error message here
  anymore.
 
  We found a parameter on the NFS server that seems
  to fix it: fsid. If you set this to the same number on both
  servers it forces both of them to use the same identifiers.
  It seems that if you don't, you need to guarantee that the mount
  is done on the same device on both servers, and we cannot do
  this since we are using PowerPath. 
 
 I would like to confirm whether the inode numbers are
  consistent across servers.
 
That is:
 
  [EMAIL PROTECTED] concurrents]$ ls -il
  total 8
  131545 drwxr-xr-x  2  100 users 4096 Dec  9 12:12
 admin
  131543 drwxrwxrwx  2 root dba   4096 Dec  4 08:53
 lost+found
  [EMAIL PROTECTED] concurrents]$
 
  Is directory admin (or other
  directories/files) always inode number 131545, no
  matter what server we are on? It seems to be so, but I would
  like to confirm.
 
 
  About the metadata changes: this share will be used
  for log files (actually, for Oracle eBusiness Suite
  concurrent log and output files), so we can tolerate it if a
  few of the latest files are lost during the failover. The
  user can simply run his report again. Also, if some processes
  hang or die during the failover it can be tolerated, as the
  internal manager can restart them. Preferably, processes
  should die instead of hanging.
 
  But I am concerned about dangling locks on the
  server. Not sure how to handle those. In the NFS-HA docs
  some files in /var/lib/nfs are copied using scripts every
  few seconds, but this does not seem to be a foolproof way. 
 
  I looked over the NFS-HA docs sent to the list;
  they are useful, but also very Linux-HA
  centric, and require the heartbeat2 package. I won't install
  another cluster stack, since I already have CRS here. 
 
  Does anyone have pointers to a similar setup with CRS?
 
  Best Regards,
  Luis

Re: [Ocfs2-users] NFS Failover

2008-12-09 Thread Luis Freitas
Sunil,

   They are not waiting; the kernel reconnects after a few seconds, but it just 
doesn't like the other NFS server: any attempt to access directories or files 
after the virtual IP failover to the other NFS server resulted in errors. 
Unfortunately I don't have the exact error message here anymore.

   We found a parameter on the NFS server that seems to fix it: fsid. If you 
set this to the same number on both servers it forces both of them to use the 
same identifiers. It seems that if you don't, you need to guarantee that the 
mount is done on the same device on both servers, and we cannot do this since 
we are using PowerPath. 
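
  The fsid workaround described above is set per export in /etc/exports. A 
sketch of what an identical export on both servers might look like (the path, 
network, and fsid value are illustrative assumptions; the point is that fsid 
must be the same number on both nodes):

```shell
# /etc/exports -- keep this line identical on both NFS servers so the
# file handles handed to clients survive the virtual-IP failover.
# fsid=45 is an arbitrary example value.
/ocfs2/concurrents  192.168.1.0/24(rw,sync,fsid=45)
```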

  I would like to confirm whether the inode numbers are consistent across servers.

  That is:

[EMAIL PROTECTED] concurrents]$ ls -il
total 8
131545 drwxr-xr-x  2  100 users 4096 Dec  9 12:12 admin
131543 drwxrwxrwx  2 root dba   4096 Dec  4 08:53 lost+found
[EMAIL PROTECTED] concurrents]$

   Is directory admin (or other directories/files) always inode number 
131545, no matter what server we are on? It seems to be so, but I would like to 
confirm.


   About the metadata changes: this share will be used for log files (actually, 
for Oracle eBusiness Suite concurrent log and output files), so we can 
tolerate it if a few of the latest files are lost during the failover. The user 
can simply run his report again. Also, if some processes hang or die during the 
failover it can be tolerated, as the internal manager can restart them. 
Preferably, processes should die instead of hanging.

   But I am concerned about dangling locks on the server. Not sure how to 
handle those. In the NFS-HA docs some files in /var/lib/nfs are copied using 
scripts every few seconds, but this does not seem to be a foolproof way. 

   I looked over the NFS-HA docs sent to the list; they are useful, but 
also very Linux-HA centric, and require the heartbeat2 package. I won't 
install another cluster stack, since I already have CRS here. 

   Does anyone have pointers to a similar setup with CRS?

Best Regards,
Luis

--- On Mon, 12/8/08, Sunil Mushran [EMAIL PROTECTED] wrote:

 From: Sunil Mushran [EMAIL PROTECTED]
 Subject: Re: [Ocfs2-users] NFS Failover
 To: [EMAIL PROTECTED]
 Cc: ocfs2-users@oss.oracle.com
 Date: Monday, December 8, 2008, 11:47 PM
 While the NFS protocol is stateless and thus should handle
 failing over, the procedures themselves are synchronous. Meaning, I am
 not sure how an NFS client will handle getting an OK for some metadata
 change (mkdir, etc.) just before a server dies and is recovered by
 another node. If the op did not make it to the journal, it would be a
 null op. But the NFS client would not know that, as the server has
 failed over. This is a question for NFS.
 
 What is the stack of the nfs clients? As in, what are they
 waiting on?
 
 Luis Freitas wrote:
  Hi list,
 
  I need to implement a highly available NFS server.
 Since we already have OCFS2 here for RAC, and already have a
 virtual IP on the RAC server that fails over automatically to
 the other node, it seems a natural choice to use it for
 our NFS needs too. We are using OCFS2 1.2. (Upgrade to 1.4 is
 not in our current plans.)
 
  We did a preliminary failover test, and the client
 that mounts the filesystem (actually a Solaris box) doesn't
 like the failover. We expect some errors and minor data loss
 and can tolerate them as a transient condition, but the
 problem is that the mounted filesystem on the client becomes
 useless until we unmount and remount it.
 
  I suspect that NFS uses inode numbers on the underlying
 filesystem to create handles that it passes on
 to clients, but I am not sure how this is done.
 
  Does anyone know if we can achieve a failover without
 needing to remount the NFS share on the clients? Are any special
 options needed when mounting the OCFS2 filesystem, when
 exporting it as NFS, or on the client?
 
  Best Regards,
  Luis
 


  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] NFS Failover

2008-12-08 Thread Luis Freitas
Hi list,

   I need to implement a highly available NFS server. Since we already have OCFS2 
here for RAC, and already have a virtual IP on the RAC server that fails over 
automatically to the other node, it seems a natural choice to use it for our 
NFS needs too. We are using OCFS2 1.2. (Upgrade to 1.4 is not in our current 
plans.)

   We did a preliminary failover test, and the client that mounts the 
filesystem (actually a Solaris box) doesn't like the failover. We expect some 
errors and minor data loss and can tolerate them as a transient condition, but 
the problem is that the mounted filesystem on the client becomes useless until 
we unmount and remount it.

   I suspect that NFS uses inode numbers on the underlying filesystem to create 
handles that it passes on to clients, but I am not sure how this is done.

   Does anyone know if we can achieve a failover without needing to remount the 
NFS share on the clients? Are any special options needed when mounting the OCFS2 
filesystem, when exporting it as NFS, or on the client?

Best Regards,
Luis

   


  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] Lost write in archive logs: has it ever happened?

2008-12-03 Thread Luis Freitas

   Depending on your configuration, Data Guard will transfer the changes in 
the online log directly to the standby, so that the archived logs are 
recreated there. It doesn't transfer the archived logs from disk; instead it 
transfers the redo log entries directly to the other host as they are 
generated, and the archivelogs are recreated there.

   So the remote copy is created independently from the local copy.

   The problem could be in the process writing the archivelog to disk or in the 
operating system/hardware. 

   If it is an operating system or hardware issue, it is probably corrupting 
your datafiles as well. You should find fractured blocks or other corruptions 
indicating lost writes, like discrepancies between tables and indexes 
and rollback errors due to incorrect block SCNs. You can check this by running 
an ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE; job on all the tables. This 
command will also compare all the tables with their indexes, so it will 
effectively read all data blocks in the database and complain if it finds 
corrupted blocks or if the tables and indexes have discrepancies. Some types of 
tables can't be verified like this, so some errors indicating this are normal. 
(Btw, this command locks the tables while it is running.)
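
One common way to run this across every table is to generate the statements and 
feed the result to sqlplus. A minimal sketch of the generation step only; in a 
real run the table list would come from DBA_TABLES (the table names below are 
examples):

```shell
# Emit one ANALYZE ... VALIDATE STRUCTURE CASCADE statement per table
# name read from stdin. The output can then be piped into sqlplus.
gen_analyze() {
    while IFS= read -r tab; do
        [ -n "$tab" ] || continue
        echo "ANALYZE TABLE $tab VALIDATE STRUCTURE CASCADE;"
    done
}
printf 'EMP\nDEPT\n' | gen_analyze
```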

    Can you post the error that appears when applying the archivelog?

Regards,
Luis

    

--- On Wed, 12/3/08, Silviu Marin-Caea [EMAIL PROTECTED] wrote:
From: Silviu Marin-Caea [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] Lost write in archive logs: has it ever happened?
To: ocfs2-users@oss.oracle.com
Date: Wednesday, December 3, 2008, 1:17 PM

On Monday 22 September 2008 15:02:36 Silviu Marin-Caea wrote:
 We have 2 nodes with OCFS2 1.2.3 (SLES9).  The archive logs are generated
 on an OCFS2 volume (mounted with nointr,datavolume).  It has happened 3
 times in one year that some archivelog had a lost write.  We have detected
 this when applying the archivelogs on the standby database (with
 dataguard).  We had to copy some datafiles from the production database to
 the standby and let it resume the recovery process.

 Has it ever occurred a data loss of this kind (lost write) on an OCFS2
 volume, version 1.2.3 x86_64?

 We had 32 bit servers before with OCFS2 that was even older than 1.2.3 and
 those servers never had such a problem with archivelogs.

 The storage is Dell/EMC Clariion CX3-40.  The storage on the old servers
 was CX300.

 We are worried that this lost writes could occur not only in archivelogs
 but in the datafiles as well...

 Not saying that OCFS2 is the cause, the problem might be with something
 else, but we must investigate everything.

OCFS2 is not the cause.  The error just occurred again and this time we had 
the archivelogs multiplexed on both OCFS2 and local storage (reiserfs).  Both 
archives have identical MD5 sums.

There was no lost write, just some bullshit that Oracle support tries to feed 
us.

There is still an unknown, but it's not related to OCFS2.  The unknown is that 
archives on the standby database have different MD5 sums than the ones on 
production.  All the archives, not just the corrupt one.  Does dataguard 
intervene in some way in the archives during transmission?  I thought it was 
just supposed to transfer them unchanged, then apply them on the standby.


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users




Re: [Ocfs2-users] Please urgent help required - OCFS2 and VPN again

2008-12-02 Thread Luis Freitas
Lorenzo,

   My 2 cents. This is purely speculation, since I have never worked with an 
environment like the one you have there.

   You have a very different configuration from what is usual with OCFS2. The 
filesystem is tested on systems that have a fast network connection between 
nodes, so it is probably not tuned for environments where the network bandwidth 
is low.

    You might get some improvement if you change some of the VM tunables 
(/proc/sys/vm/*). On 2.6 there are not many of them for the filesystem cache, 
and some seem to have no effect.

   vfs_cache_pressure could give you some control over the number of inodes 
cached by the kernel. Try either increasing or decreasing it. Some of the 
problems you describe, like the long umount time, might actually be caused by 
keeping a large number of structures in memory; since the network is slow, a 
long time is needed to clear all of them.

   Swappiness controls how aggressively pages are swapped out. Since you don't 
have swap on the OCFS2 filesystem it should not have much impact. (You don't 
have swap there, right?) But you may be able to force the kernel to release 
memory so that it can be used by OCFS2. Then again, this could actually make 
the problem worse.

   There used to be a parameter that controlled how much of the cache is used 
for inodes and how much for data blocks, dcache_priority. But it is no longer 
available on 2.6, and I could not find an equivalent.
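
   The tunables mentioned above live under /proc/sys/vm and can be read and 
set like this (the values shown are arbitrary examples, not recommendations):

```shell
# Read the current values (safe anywhere):
cat /proc/sys/vm/vfs_cache_pressure   # default is typically 100
cat /proc/sys/vm/swappiness           # default is typically 60

# Change them (as root); example values only:
# echo 200 > /proc/sys/vm/vfs_cache_pressure  # reclaim inode/dentry cache harder
# echo 10  > /proc/sys/vm/swappiness          # swap out less aggressively
```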

Regards,
Luis

--- On Tue, 12/2/08, Lorenzo Milesi [EMAIL PROTECTED] wrote:
From: Lorenzo Milesi [EMAIL PROTECTED]
Subject: [Ocfs2-users] Please urgent help required - OCFS2 and VPN again
To: ocfs2-users@oss.oracle.com
Date: Tuesday, December 2, 2008, 8:59 AM

Hi all...

I already wrote before on the list about the solution I have at a
customer running DRBD8+OCFS2 on two remote sites connected via VPN.
The different suggestions helped improving the situation, but still
we're having big troubles. We've also upgraded the old server with a
new
and much more powerful one but there was nearly no improvement at all!
The situation is resumed as:
SITE A: Dual Core 2GHz Pentium, 1Gb ram, 1 SATA hdd for /, 3 SATA hdd in
software raid5, DRBD on /dev/md0. 
SITE B: Quad Core 2.4HGz Pentium, 2Gb ram, 3 SATA HDD in software raid5,
DRBD on /dev/md1.

The two sites are connected using two ADSL, with TWO bonded VPN.

Both machines run Debian Etch fully updated, kernel 2.6.26-bpo.1-686 SMP
with deadline scheduler, DRBD 8.0.13, OCFS2 1.4.1-1. 
The shared data partition is 187G, 30 of which used.

The recent upgrade to OCFS2 1.4 and kernel 2.6.26 didn't improve the
performances as much as I expected.

The main problems we have are:
1. very high load average: this was previously caused by very high
iowait percentages, but with the new server the load is high while top
says the machine is 99-100% idle! 
2. very slow dir browsing: Sunil pointed me to the user guide, where he
talks about inode stat. How can I raise inode cache memory? I've done
several searches without result... The server actually uses less than
300Mb of ram out of the 1Gb installed...
3. very long umount time: I often (not always) experience an extremely
long umount time. While the process is executing, iftop
says there is a high volume of network transfer. I suppose it's
transferring file locks, but is it possible that it stays stuck for more
than one hour, and still going?

This is the configuration file of OCFS2. The quad-core is file-server-2.

#/etc/ocfs2/cluster.conf
node:
ip_port = 
ip_address = 192.168.0.1
number = 0
name = file-server-1
cluster = ocfs2
node:
ip_port = 
ip_address = 192.168.2.31
number = 1
name = file-server-2
cluster = ocfs2
cluster:
node_count = 2
name = ocfs2


What is stunning me is that on file-server-2 we run an rsync backup during the
night to a local machine on the network, and it takes less than 20m! Doing the
same on the other server throws the load average to the stars!

We're in a critical situation because this solution has been deployed for a
long time and is not yet working as expected. 
If nobody has suggestions, we have no problem paying for qualified support to
solve these problems. In that case please contact me directly. 
Sunil, can I get Oracle support for this?

Thank you.
-- 
Lorenzo Milesi - [EMAIL PROTECTED]

YetOpen S.r.l. - http://www.yetopen.it/
C.so E. Filiberto, 74 23900 Lecco - ITALY -
Tel 0341 220 205 - Fax 178 607 8199

GPG/PGP Key-Id: 0xE704E230 - http://keyserver.linux.it

 D.Lgs. 196/2003 

Please note that all the information contained in this message is
confidential and intended exclusively for the addressee. If you have
received this message in error, please delete it without copying it,
do not forward it to third parties, and notify us as soon as
possible.
Thank you.


___

Re: [Ocfs2-users] New node..new problems

2008-10-10 Thread Luis Freitas
Dante,

   Your old debian is running OCFS 1.4 and your new Centos is running OCFS 1.2, 
right?

   If you are running Centos 5.0 you should be able to install OCFS 1.4. 

   If not, you will need to umount your Debian before mounting the CentOS. 
Beware that there are functionalities in OCFS 1.4 that are not available in 
1.2, which might impact your applications.

   Also, I am not sure the disk layout is fully compatible if certain OCFS 
1.4 filesystem options were enabled on your old cluster. The best option would 
be to upgrade to OCFS 1.4 on the CentOS cluster.

Regards,
Luis

--- On Fri, 10/10/08, Dante Garro [EMAIL PROTECTED] wrote:
From: Dante Garro [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] New node..new problems
To: 'Tao Ma' [EMAIL PROTECTED]
Cc: 'ocfs2-users@oss.oracle.com' ocfs2-users@oss.oracle.com
Date: Friday, October 10, 2008, 9:05 AM

Thanks Tao, I've set up the same on both nodes and the cluster comes
online.
Now, when I try to mount the following errors appears on node 1 (new
CentOS):
(2512,1):o2net_connect_expired:1585 ERROR: no connection established with
node 0 after 30.0 seconds, giving up and returning errors.
(3022,1):dlm_request_join:901 ERROR: status = -107
(3022,1):dlm_try_to_join_domain:1049 ERROR: status = -107
(3022,1):dlm_join_domain:1321 ERROR: status = -107
(3022,1):dlm_register_domain:1514 ERROR: status = -107
(3022,1):ocfs2_dlm_init:2024 ERROR: status = -107
(3022,1):ocfs2_mount_volume:1133 ERROR: status = -107
ocfs2: Unmounting device (147,0) on (node 1)

And the following on node 0 (old Debian)

 (2228,0):o2net_check_handshake:1093 node nodo2 (num 1) at
192.168.168.2: advertised net protocol version 103 but 2 is required,
disconnecting

I believe the Debian message is clear: protocol version incompatibility.

Is there a way to resolve it?

Thanks

Dante


-Original Message-
From: Tao Ma [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 10, 2008, 1:25 AM
To: Dante Garro
CC: 'Sunil Mushran'; 'ocfs2-users@oss.oracle.com'
Subject: Re: [Ocfs2-users] New node..new problems


Hi,
Dante Garro wrote:
 Sunil, now I realize the messages are related to node 0, but the 
 new node is node 1, and it does not care about the value I've set up; it 
always says 14000 ms.
 Does this change your diagnosis?
Node1 starts the connection with node0, so you see the messages related to node0
on node1. It looks like your configuration on node1 is wrong.
Please make sure the value of O2CB_HEARTBEAT_THRESHOLD in
/etc/sysconfig/o2cb on node1 is the same as that on node0.

Regards,
Tao
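
The file Tao refers to is a plain shell-style config that must be identical on
every node; a sketch (the threshold value is an arbitrary example — what
matters is that it matches cluster-wide):

```shell
# /etc/sysconfig/o2cb -- fragment; keep identical on all nodes.
# The "dead count ... ms" reported in the logs is derived from this
# threshold and the heartbeat interval, so a mismatch shows up as the
# 14000 ms vs 13000 ms error below.
O2CB_ENABLED=true
O2CB_HEARTBEAT_THRESHOLD=31   # example value; must match on every node

# After editing, restart the stack on that node:
# /etc/init.d/o2cb restart
```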

 
 
 -Original Message-
 From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: 
 Thursday, October 9, 2008 06:02 PM
 To: Dante Garro
 CC: 'ocfs2-users@oss.oracle.com'
 Subject: Re: [Ocfs2-users] New node..new problems
 
 Yeah, the cluster timeouts are not consistent. Update and restart the 
 cluster on the new node (or all nodes, as the case might be).
 
 Hint: cat /sys/kernel/config/cluster/clustername/idle_timeout_ms
 to see the active heartbeat threshold.
 
 Dante Garro wrote:
 Hi all, because of problems with the ocfs2 release of the Debian 
 distribution I decided to remake my cluster, replacing it with a CentOS-based
installation.
 I started by replacing one of the nodes, keeping the other working.

 On this recently created node the following errors appear:

 drbd0: Writing meta data super block now.
 (2558,1):o2hb_check_slot:881 ERROR: Node 0 on device drbd0 has a dead 
 count of 14000 ms, but our count is 13000 ms.
 Please double check your configuration values for
 'O2CB_HEARTBEAT_THRESHOLD'
 OCFS2 1.2.9 Wed Sep 24 19:26:41 PDT 2008 (build
 a693806cb619dd7f225004092b675ede)
 (2520,1):o2net_connect_expired:1585 ERROR: no connection established 
 with node 0 after 30.0 seconds, giving up and returning errors.
 (2556,1):dlm_request_join:901 ERROR: status = -107
 (2556,1):dlm_try_to_join_domain:1049 ERROR: status = -107
 (2556,1):dlm_join_domain:1321 ERROR: status = -107
 (2556,1):dlm_register_domain:1514 ERROR: status = -107
 (2556,1):ocfs2_dlm_init:2024 ERROR: status = -107
 (2556,1):ocfs2_mount_volume:1133 ERROR: status = -107
 ocfs2: Unmounting device (147,0) on (node 1)
 (2591,1):o2hb_check_slot:881 ERROR: Node 0 on device drbd0 has a dead 
 count of 14000 ms, but our count is 13000 ms.
 Please double check your configuration values for
 'O2CB_HEARTBEAT_THRESHOLD'
 (2520,1):o2net_connect_expired:1585 ERROR: no connection established 
 with node 0 after 30.0 seconds, giving up and returning errors.
 (2589,1):dlm_request_join:901 ERROR: status = -107
 (2589,1):dlm_try_to_join_domain:1049 ERROR: status = -107
 (2589,1):dlm_join_domain:1321 ERROR: status = -107
 (2589,1):dlm_register_domain:1514 ERROR: status = -107
 (2589,1):ocfs2_dlm_init:2024 ERROR: status = -107
 (2589,1):ocfs2_mount_volume:1133 ERROR: status = -107
 ocfs2: Unmounting device (147,0) on (node 1)

 I've changed the parameter O2CB_HEARTBEAT_THRESHOLD as O2CB 
 advised me, but it didn't resolve the issue.

 I hope someone could 

Re: [Ocfs2-users] Lost write in archive logs: has it ever happened?

2008-09-22 Thread Luis Freitas
Silviu,

   When I had this kind of issue it was usually caused by a bad HBA or a 
power failure. I am assuming it is not the latter, as you would be aware of it.

   It is a difficult situation: since the controller only malfunctions 
sporadically, it is difficult to prove that it is the cause or to get it 
changed under warranty. And your database slowly gets corrupted, until someday 
it crashes and won't start up. If this is the cause, it is surely happening on 
the datafiles also.

   To be safe you should run an ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE; 
on all your database tables, and look for fractured or bad blocks in the 
datafiles using dbv or RMAN. A fractured block is one that has a different 
timestamp at the beginning and the end, so it was only partially written to 
the disk.

   You could also try swapping the HBA with another server to see if the 
problem disappears.

Regards,
Luis

--- On Mon, 9/22/08, Silviu Marin-Caea [EMAIL PROTECTED] wrote:

 From: Silviu Marin-Caea [EMAIL PROTECTED]
 Subject: [Ocfs2-users] Lost write in archive logs: has it ever happened?
 To: ocfs2-users@oss.oracle.com
 Date: Monday, September 22, 2008, 9:02 AM
 We have 2 nodes with OCFS2 1.2.3 (SLES9).  The archive logs
 are generated on 
 an OCFS2 volume (mounted with nointr,datavolume).  It has
 happened 3 times in 
 one year that some archivelog had a lost write.  We have
 detected this when 
 applying the archivelogs on the standby database (with
 dataguard).  We had to 
 copy some datafiles from the production database to the
 standby and let it 
 resume the recovery process.
 
 Has it ever occurred a data loss of this kind (lost write)
 on an OCFS2 volume, 
 version 1.2.3 x86_64?
 
 We had 32 bit servers before with OCFS2 that was even older
 than 1.2.3 and 
 those servers never had such a problem with archivelogs.
 
 The storage is Dell/EMC Clariion CX3-40.  The storage on
 the old servers was 
 CX300.
 
 We are worried that this lost writes could occur not only
 in archivelogs but 
 in the datafiles as well...
 
 Not saying that OCFS2 is the cause, the problem might be
 with something else, 
 but we must investigate everything.
 
 Thank you
 
 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users


  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] Lost write in archive logs: has it ever happened?

2008-09-22 Thread Luis Freitas
Silviu,

Just so you be warned, the ANALYZE TABLE... command locks the tables 
during its execution.

Regards,
Luis


--- On Mon, 9/22/08, Luis Freitas [EMAIL PROTECTED] wrote:

 From: Luis Freitas [EMAIL PROTECTED]
 Subject: Re: [Ocfs2-users] Lost write in archive logs: has it ever happened?
 To: ocfs2-users@oss.oracle.com, Silviu Marin-Caea [EMAIL PROTECTED]
 Date: Monday, September 22, 2008, 12:54 PM
 Silviu,
 
 When I had this kind of issue it was usually caused by
  a bad HBA or a power failure. I am assuming it is not the
  latter, as you would be aware of it.
  
 It is a difficult situation: since the controller only
  malfunctions sporadically, it is difficult to prove that it
  is the cause or to get it changed under warranty. And your
  database slowly gets corrupted, until someday it crashes and
  won't start up. If this is the cause, it is surely happening on
  the datafiles also.
  
 To be safe you should run an ANALYZE TABLE ...
  VALIDATE STRUCTURE CASCADE; on all your database
  tables, and look for fractured or bad blocks in the
  datafiles using dbv or RMAN. A fractured block is one that
  has a different timestamp at the beginning and the end, so it
  was only partially written to the disk.
  
 You could also try swapping the HBA with another
  server to see if the problem disappears.
 
 Regards,
 Luis
 
 --- On Mon, 9/22/08, Silviu Marin-Caea
 [EMAIL PROTECTED] wrote:
 
  From: Silviu Marin-Caea [EMAIL PROTECTED]
  Subject: [Ocfs2-users] Lost write in archive logs: has
 it ever happened?
  To: ocfs2-users@oss.oracle.com
  Date: Monday, September 22, 2008, 9:02 AM
 We have 2 nodes with OCFS2 1.2.3 (SLES9). The archive logs are generated on an OCFS2 volume (mounted with nointr,datavolume). It has happened 3 times in one year that some archivelog had a lost write. We detected this when applying the archivelogs on the standby database (with Data Guard). We had to copy some datafiles from the production database to the standby and let it resume the recovery process.
 
 Has a data loss of this kind (a lost write) ever occurred on an OCFS2 volume, version 1.2.3 x86_64?
 
 We had 32-bit servers before, with OCFS2 even older than 1.2.3, and those servers never had such a problem with archivelogs.
 
 The storage is a Dell/EMC CLARiiON CX3-40. The storage on the old servers was a CX300.
 
 We are worried that these lost writes could occur not only in archivelogs but in the datafiles as well...
 
 Not saying that OCFS2 is the cause; the problem might be with something else, but we must investigate everything.
 
 Thank you
 
  
  ___
  Ocfs2-users mailing list
  Ocfs2-users@oss.oracle.com
  http://oss.oracle.com/mailman/listinfo/ocfs2-users
 
 
   
 


Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

2008-07-29 Thread Luis Freitas
Tina,
 
   The raw devices are being deprecated on Linux. Since you are using Fedora instead of an enterprise distro, these changes are already in place.
 
   You can use disk devices directly with the 11g clusterware. It would probably also work with the mainline OCFS2, which doesn't have the datavolume option, since that option only makes the clusterware write to these devices using O_DIRECT; but AFAIK this is not tested on mainline OCFS2, and since the clusterware uses some heuristics to turn on these options it might not work with OCFS2.
 
   If you have Metalink access, check note 401132.1. Beware that this is broken on 10gR1 and probably the initial 10gR2 patchsets. I didn't find references to patches that add the O_DIRECT support to the 10g version of the CRS, so I am not sure when this was fixed.
 
Regards,
Luis

--- On Tue, 7/29/08, Tina Soles [EMAIL PROTECTED] wrote:

From: Tina Soles [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9
To: [EMAIL PROTECTED], Tao Ma [EMAIL PROTECTED]
Cc: Randy Gustin [EMAIL PROTECTED], ocfs2-users@oss.oracle.com
Date: Tuesday, July 29, 2008, 4:19 PM

OK, another snag.  Fedora 9 does not support RAW devices, so I can't
configure the voting disk or OCR disk to be as such.  Any suggestions? I
think I'm up a creek here...

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Sunday, June 29, 2008 11:20 PM
To: Tao Ma; Tina Soles
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

datavolume mount option is only on ocfs2 for enterprise kernels.

For the most part, you shouldn't require it for using it as a datastore.
Instead, set the init.ora parameter filesystemio_options to DIRECTIO.
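As a hedged sketch (the parameter values come from standard Oracle documentation, not from this thread): `filesystemio_options` accepts NONE, ASYNCH, DIRECTIO, and SETALL, and the change needs an spfile plus an instance bounce to take effect.

```shell
# Run as a privileged database user; takes effect at the next restart.
sqlplus -s / as sysdba <<'EOF'
ALTER SYSTEM SET filesystemio_options = 'DIRECTIO' SCOPE = SPFILE;
EOF
```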

The only bit that won't work here is using ocfs2 for the voting disk and
ocr... as datavolume is necessary to force enable odirect.
But you can always use raw for that.

Sunil


Re: [Ocfs2-users] Recommended block size for a mail environment

2008-07-22 Thread Luis Freitas

   Funny, I could not find an option in mkfs.ocfs2 to specify the number of inodes or the inodes-per-bytes ratio?

Regards,
Luis
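A hedged aside (not from the thread): unlike mke2fs, mkfs.ocfs2 has no inode-count or bytes-per-inode knob because OCFS2 allocates inodes dynamically; what you choose at format time are the block and cluster sizes, for example:

```shell
# Device, label and slot count are placeholders. -b sets the block size,
# -C the cluster size, -N the number of node slots, -L the volume label.
mkfs.ocfs2 -b 2K -C 4K -N 4 -L mailvol /dev/sdb1
```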


--- On Tue, 7/22/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
Subject: [Ocfs2-users] Recommended block size for a mail environment
To: ocfs2-users@oss.oracle.com
Date: Tuesday, July 22, 2008, 9:22 AM

Hello all,

I have a mail environment, with a situation similar to a previous
mail, "Different size with du and ls". My environment is:

Debian Etch 4.0
Postfix 2.3.8-2
ocfs2-tools 1.2.1-1.3
kernel 2.6.18-4-amd64

My df shows the following:

df -Th
Filesystem  Type   Size  Used Avail Use% Mounted on
/dev/sdb1   ocfs2  1,0T  894G  131G  88% /mails

df -Thi
Filesystem  Type  Inodes IUsed IFree IUse% Mounted on
/dev/sdb1   ocfs2    16M   14M  2,1M   88% /mails

but when I do a du -sh /mails it shows me 480GB, quite different
from the 894G that df reports. The OCFS2 block size is 4K. I made a script to
compute the distribution of mail sizes, and it is as follows:

0k to 1k - 8%
0k to 2k - 13%
0k to 3k - 22%
0k to 4k - 35%
4.1k to 6k - 15%
6.1 to 8k - 10%
8.1k to 10k - 9%
10.1k to 12k - 7%
rest - 24%
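The gap between du and df can be estimated with simple round-up arithmetic; this is a generic sketch, not a measurement from the poster's system:

```shell
# Bytes a file actually occupies when allocation rounds up to whole blocks.
blocks_used() {  # usage: blocks_used <file_bytes> <block_bytes>
  echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

blocks_used 1024 4096   # a 1 KiB mail consumes a full 4 KiB block
blocks_used 1024 2048   # with 2 KiB blocks the same mail wastes only 1 KiB
blocks_used 5000 4096   # a 5000-byte mail needs two 4 KiB blocks
```

With most mails at 4 KiB or below, roughly half of each allocated block can end up empty, which is consistent with the ~50% difference between du and df above.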

My debugfs.ocfs2 doesn't have the -R option:

# debugfs.ocfs2 -R stats /dev/sdb1
debugfs.ocfs2: invalid option -- R
debugfs.ocfs2 1.2.1


My questions are:

Is 4K a good block size for this situation?
How can I avoid wasting space as I am now (~50%)?
If I choose to use 2K, will the available inodes double (16M to 32M)?

Thanks a lot!

Jeronimo


Universidade Federal da Bahia - http://www.portal.ufba.br




Re: [Ocfs2-users] ocfs2 datavolume option and oracle

2008-07-08 Thread Luis Freitas
Jamin,

   If you are using 10g you can create two 170MB partitions for the OCR and 
three 20MB partitions for the voting disks, and leave the rest in a single 
partition for the OCFS2 filesystem.

   I never installed 11g RAC, but the manual says that each partition must be 
280MB or larger, so you will need at least two 280MB partitions, for one OCR and 
one voting disk; preferably five, for two OCR copies and three voting disks.

Regards,
Luis

--- On Tue, 7/8/08, Sunil Mushran [EMAIL PROTECTED] wrote:
From: Sunil Mushran [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] ocfs2 datavolume option and oracle
To: [EMAIL PROTECTED] [EMAIL PROTECTED]
Cc: ocfs2-users@oss.oracle.com
Date: Tuesday, July 8, 2008, 12:18 PM

[EMAIL PROTECTED] wrote:
 By raw are you meaning raw device access without a filesystem like ocfs2 
 on the volume for the voting disk?  Or am I not following?
   
Raw means specifying the block device directly. So make two
partitions, say, sdd1 and sdd2, and feed that (/dev/sdd1, etc)
to the tool that creates the voting disk and ocr.


[Ocfs2-users] ocfs2 limits

2008-07-03 Thread Luis Freitas
Sunil,

  Does this apply only to subdirectories, or also to the number of files inside 
a directory?

   Oracle Applications can easily have hundreds of thousands of files inside 
the concurrent output and log directories.

Regards,
Luis

--- On Thu, 7/3/08, Sunil Mushran [EMAIL PROTECTED] wrote:
From: Sunil Mushran [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] ocfs2 fencing problem
To: Gabriele Di Giambelardini [EMAIL PROTECTED]
Cc: Kuang, Howard [WHQKT] [EMAIL PROTECTED], ocfs2-users@oss.oracle.com
Date: Thursday, July 3, 2008, 1:54 PM

Gabriele Di Giambelardini wrote:
 Hi all, some time ago I read that OCFS2 has a limit on 
 subfolders. Is it possible that OCFS2 has these problems when this 
 limit is exceeded?

 Or does somebody know the limit?

http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#LIMITS


Re: [Ocfs2-users] RAC Shared Disk..?

2008-06-16 Thread Luis Freitas
Abhishek,

It depends on what you intend to use OCFS2 for. Since you want to test Oracle RAC, the best options would be an iSCSI setup or a FireWire shared disk, to get a configuration close to what is supported by Oracle. For both options you will need a third computer to "simulate" the storage, and for FireWire you also need FireWire adapters for the three computers. For iSCSI you can use OpenFiler, which has a browser-based interface to create the iSCSI volume; it can run on the same Ethernet where you run the cluster interconnect, or on a third separate adapter on each server. Nowadays iSCSI seems to be preferred over FireWire; the Oracle people here may have a better indication of which to use.

If you only intended to run other services, like Apache, you could try DRBD. But for installing RAC I would not recommend it, as it would be very different from what a RAC production environment should look like.

You can also simulate the two nodes and the "shared storage" on a single machine using VMware Server. There are some howtos on this subject at www.oracle-base.com (but none for the particular setup you want to achieve).

Regards,
Luis

--- On Mon, 6/16/08, Abhishek Sahu [EMAIL PROTECTED] wrote:
From: Abhishek Sahu [EMAIL PROTECTED]
Subject: [Ocfs2-users] RAC Shared Disk..?
To: Ocfs2-users@oss.oracle.com
Date: Monday, June 16, 2008, 9:06 AM

Dear All,

I am preparing a two node cluster on simple Linux desktop systems. Both systems have 80 GB of hard disk; I don't have any additional storage.

Now I want to know the procedure for making shared disk space out of these two available hard disks. Can anybody help me out with this experiment?

Also, if anybody can send me a detailed installation document for RAC on Linux (other than the one on the Oracle website), it would be a great help for me.

Thanks in advance,

Abhishek



 


Re: [Ocfs2-users] CRS/CSS and OCFS2

2008-06-06 Thread Luis Freitas
Martin,

Sunil and Mark are Oracle employees involved in the development of OCFS2; I am just a user, :-).

Regards,
Luis

--- On Fri, 6/6/08, Schmitter, Martin [EMAIL PROTECTED] wrote:
From: Schmitter, Martin [EMAIL PROTECTED]
Subject: AW: [Ocfs2-users] CRS/CSS and OCFS2
To: "[EMAIL PROTECTED]" [EMAIL PROTECTED], "[EMAIL PROTECTED]" [EMAIL PROTECTED]
Cc: "ocfs2-users@oss.oracle.com" ocfs2-users@oss.oracle.com
Date: Friday, June 6, 2008, 4:31 AM
 
 


Hi Luis,
 
 
 
Now I am a bit confused, because I asked this question a few months ago: how must the timing settings of OCFS2 and CRS relate?

Sunil and Mark stated that OCFS2 must be the leading system!

If I get it right, the SAN failover comes first, then OCFS2, and last but not least CRS.


BR



Martin Schmitter


-- 



OPITZ CONSULTING Gummersbach GmbH
Martin Schmitter - Fachinformatiker
Kirchstr. 6 - 51647 Gummersbach
Telefon: +49 2261 6001-0
Mobil: +49 173 2808193
http://www.opitz-consulting.de

Geschäftsführer: Bernhard Opitz, Dr. Jürgen Abel, Ulrich Kramer
HRB-Nr.39163 Amtsgericht Köln






Von: [EMAIL PROTECTED] [EMAIL PROTECTED] im Auftrag von Luis Freitas [EMAIL PROTECTED]
Gesendet: Donnerstag, 5. Juni 2008 18:32
An: [EMAIL PROTECTED]
Cc: ocfs2-users@oss.oracle.com
Betreff: Re: [Ocfs2-users] CRS/CSS and OCFS2







Alexandra,

 I usually make sure that one of the timeouts is large enough so that the other node's death is detected before that node "self-fences".

 To solve the problem you could configure the OCFS2 timeouts to be larger than the CRS timeouts, so that CRS fences the node and OCFS2 detects the other node as dead before it takes any action.

 Maybe Sunil has a better solution that I am not aware of.

 This is particular to OCFS2 and CRS, which is kind of funny since both are developed by Oracle. With vendor clusterware (Sun Cluster, Veritas, etc.) CRS is integrated with the vendor clusterware stack, so this kind of situation does not occur.

 Btw, CRS is kind of picky about its interfaces: if it detects a link down on the interface, it will shut down the services on the node. This is why I asked about the crossover cable; when using a crossover cable and one node goes down, the interface goes down on the other node and things do not work as expected.

Regards,
Luis
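On O2CB 1.2.x-era installs the timeouts discussed here are set in /etc/sysconfig/o2cb (or interactively via `service o2cb configure`); the values below are illustrative placeholders, not a recommendation from this thread:

```shell
# Disk heartbeat: a node fences after (threshold - 1) * 2 seconds without
# seeing its own disk heartbeat, so 31 means roughly 60 seconds.
O2CB_HEARTBEAT_THRESHOLD=31
# Network idle timeout between nodes, in milliseconds.
O2CB_IDLE_TIMEOUT_MS=30000
```

The idea is to pick values large enough that the CRS eviction fires first.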

--- On Thu, 6/5/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

From: [EMAIL PROTECTED] [EMAIL PROTECTED]
Subject: [Ocfs2-users] CRS/CSS and OCFS2
To: ocfs2-users@oss.oracle.com
Date: Thursday, June 5, 2008, 12:38 PM


Hi Sunil, 

sorry for the delay but I was ill the last 10 days.


a. We do not use a crossover cable between the two nodes. The two systems are seated in two SANs in different building with redundant switches and HBA's inbetween.


b.ocfs2-node numbers: [EMAIL PROTECTED] etc]$ cat /etc/ocfs2/cluster.conf

node: 
ip_port =  
ip_address = 10.190.59.5 
number = 0 
name = byaz05.bayer-ag.com 
cluster = ocfs2 

node: 
ip_port =  
ip_address = 10.190.59.6 
number = 1 
name = byaz10.bayer-ag.com 
cluster = ocfs2 

cluster: 
node_count = 2 
name = ocfs2 

Clusterconfiguration css/crs: 
/u01/app/oracle/product/crs/log/byaz05/crsd

2008-04-25 09:29:01.855: [ OCRMAS][1210108256]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 2

2008-04-25 09:29:01.862: [ OCRRAW][1210108256]proprioo: for disk 0 (/dev/raw/raw101), id match (1), my id set (1723799148,1710759834) total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes (1), total
 votes (2) 
2008-04-25 09:29:01.862: [ OCRRAW][1210108256]proprioo: for disk 1 (/dev/raw/raw201), id match (1), my id set (1723799148,1710759834) total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes (1), total
 votes (2) 

/u01/app/oracle/product/crs/log/byaz10/crsd

2008-04-25 10:21:28.781: [ OCRMAS][1210108256]th_master:13: I AM THE NEW OCR MASTER at incar 4. Node Number 1

2008-04-25 10:21:28.781: [ OCRMSG][1505941856]prom_rpc:1: NULL con. Probably got disconnected due to a remote server failure.

2008-04-25 10:21:29.324: [ OCRRAW][1210108256]proprioo: for disk 0 (/dev/raw/raw101), id match (1), my id set (1723799148,1710759834) total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes (1), total
 votes (2) 
2008-04-25 10:21:29.324: [ OCRRAW][1210108256]proprioo: for disk 1 (/dev/raw/raw201), id match (1), my id set (1723799148,1710759834) total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes (1), total
 votes (2) 
2008-04-25 10:21:29.351: [ OCRMAS][1210108256]th_master: Deleted ver keys from cache (master)


So the two nodes have the following nodenumbers:


 

 Fencing the node with the higher node number, ocfs2 would have fenced byaz10 and crs/css would have fenced byaz05. This is exactly the behaviour we observed. But how can this be solved? Oracle says it's certified to use ocfs2 with
 RAC. Then the software combination used is nearly the same as we us

Re: [Ocfs2-users] CRS/CSS and OCFS2

2008-06-05 Thread Luis Freitas
Alexandra,

 I usually make sure that one of the timeouts is large enough so that the other node's death is detected before that node "self-fences". To solve the problem you could configure the OCFS2 timeouts to be larger than the CRS timeouts, so that CRS fences the node and OCFS2 detects the other node as dead before it takes any action. Maybe Sunil has a better solution that I am not aware of.

 This is particular to OCFS2 and CRS, which is kind of funny since both are developed by Oracle. With vendor clusterware (Sun Cluster, Veritas, etc.) CRS is integrated with the vendor clusterware stack, so this kind of situation does not occur.

 Btw, CRS is kind of picky about its interfaces: if it detects a link down on the interface, it will shut down the services on the node. This is why I asked about the crossover cable; when using a crossover cable and one node goes down, the interface goes down on the other node and things do not work as expected.

Regards,
Luis

--- On Thu, 6/5/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
Subject: [Ocfs2-users] CRS/CSS and OCFS2
To: ocfs2-users@oss.oracle.com
Date: Thursday, June 5, 2008, 12:38 PM
Hi Sunil,

sorry for the delay but I was ill the
last 10 days.

a. We do not use a crossover cable between
the two nodes. The two systems are seated in two SANs in different building
with redundant switches and HBA's inbetween.

b.ocfs2-node numbers: [EMAIL PROTECTED]
etc]$ cat /etc/ocfs2/cluster.conf
node:
    ip_port = 
    ip_address = 10.190.59.5
    number = 0
    name = byaz05.bayer-ag.com
    cluster = ocfs2

node:
    ip_port = 
    ip_address = 10.190.59.6
    number = 1
    name = byaz10.bayer-ag.com
    cluster = ocfs2

cluster:
    node_count = 2
    name = ocfs2

Clusterconfiguration css/crs:
/u01/app/oracle/product/crs/log/byaz05/crsd
2008-04-25 09:29:01.855: [ OCRMAS][1210108256]th_master:12:
I AM THE NEW OCR MASTER at incar 1. Node Number 2
2008-04-25 09:29:01.862: [ OCRRAW][1210108256]proprioo:
for disk 0 (/dev/raw/raw101), id match (1), my id set (1723799148,1710759834)
total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes
(1), total votes (2)
2008-04-25 09:29:01.862: [ OCRRAW][1210108256]proprioo:
for disk 1 (/dev/raw/raw201), id match (1), my id set (1723799148,1710759834)
total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes
(1), total votes (2)

/u01/app/oracle/product/crs/log/byaz10/crsd
2008-04-25 10:21:28.781: [ OCRMAS][1210108256]th_master:13:
I AM THE NEW OCR MASTER at incar 4. Node Number 1
2008-04-25 10:21:28.781: [ OCRMSG][1505941856]prom_rpc:1:
NULL con. Probably got disconnected due to a remote server failure.
2008-04-25 10:21:29.324: [ OCRRAW][1210108256]proprioo:
for disk 0 (/dev/raw/raw101), id match (1), my id set (1723799148,1710759834)
total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes
(1), total votes (2)
2008-04-25 10:21:29.324: [ OCRRAW][1210108256]proprioo:
for disk 1 (/dev/raw/raw201), id match (1), my id set (1723799148,1710759834)
total id sets (1), 1st set (1723799148,1710759834), 2nd set (0,0) my votes
(1), total votes (2)
2008-04-25 10:21:29.351: [ OCRMAS][1210108256]th_master:
Deleted ver keys from cache (master)

So the two nodes have the following
nodenumbers:



Fencing the node with the higher node number, ocfs2 would have fenced byaz10
and crs/css would have fenced byaz05. This is exactly the behaviour we
observed. But how can this be solved? Oracle says it's certified to use ocfs2
with RAC. Then the software combination used is nearly the same as we use it.
How can the combination of the two systems (ocfs2/css) fencing different nodes
be avoided?


Greets,
Alex


In such a situation, ocfs2 fences the higher node
number. afaik,
css does the same. What are the css node numbers for the two nodes?

alexandra.strauss
at bayerbbs.com wrote:

 Hello,

 I refer to you hoping you may help me with my problem... We have got an issue here and opened an SR at Metalink, but until now we have got no useful information towards solving our problem. The SR number is 6855815.994...

 We wanted to protect 9i single-instance databases with 10g Clusterware following the third-party-tool approach. There are no RAC databases involved. But we want to achieve high availability, as the databases are business-critical systems. We want to make the systems able to relocate to another machine in case of failure to keep downtimes low... To achieve this we want to use OCFS2 for the filesystem. Relocation is done by script with the help of CRS.

 So we took two systems (byaz05 and byaz10) and installed the following software: 10g CRS (10.2.0.3), Oracle Software 9.2.0.8 and OCFS2 1.2.8.

 We found the following Metalink notes and adjusted the heartbeat and timeouts for OCFS2: Metalink Note 395878.1: Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to avoid unnecessary node fencing, panic and reboot
 Metalink Note 

Re: [Ocfs2-users] huge something problem

2008-06-02 Thread Luis Freitas
Alexandre,

   What are you running on the servers? (NFS server? Apache? Oracle?)

Regards,
Luis


--- On Mon, 6/2/08, Alexandre Racine [EMAIL PROTECTED] wrote:
From: Alexandre Racine [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] huge something problem
To: Sunil Mushran [EMAIL PROTECTED]
Cc: ocfs2-users@oss.oracle.com
Date: Monday, June 2, 2008, 11:17 AM

Ok. I have the same problem now (server load at 8.00, no processor
higher than 5% utilization, and a user can't access his folder).

I ran your commands, but there is no real data here...

[EMAIL PROTECTED] /mnt/data/testOCFS2 $ sudo ./scanlocks2.sh
racinea@ srv2 /mnt/data/testOCFS2 $ w
 10:16:44 up 9 days, 19:10,  2 users,  load average: 8.43, 8.36, 8.14
USER TTYLOGIN@   IDLE   JCPU   PCPU WHAT
racinea  pts/1 10:130.00s  0.02s  0.00s w
racinea@ srv2 /mnt/data/testOCFS2 $ sudo ./listdomains.sh
41535574BDEB4720B2CE7819A631DF10  /dev/sdd
/home


What else could I try?
Thanks.



Alexandre Racine
[EMAIL PROTECTED]
514-461-1300 poste 3303

> -Original Message-
> From: Sunil Mushran [mailto:[EMAIL PROTECTED]
> Sent: 27 mai 2008 14:26
> To: Alexandre Racine
> Cc: ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] huge something problem
> 
> Alexandre Racine wrote:
> > Excellent, that works great! Now that I have the locks and the domain
> > name, what should I do to unlock them? (Or fix the problem.)
> >
> The locking and unlocking is handled by the dlm. I'm working
> on updating the wikis with more information on debugging such
> issues. For the time being, ping me with the info.
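As a hedged pointer (not confirmed in this thread): later ocfs2-tools releases added debugfs.ocfs2 commands for dumping lock state, so depending on the installed tools version the lock resources can be inspected directly:

```shell
# List filesystem lock resources known to the DLM for this volume.
# Device path is a placeholder; run against the mounted OCFS2 device.
debugfs.ocfs2 -R "fs_locks" /dev/sdd
```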


Re: [Ocfs2-users] CRS/CSS and OCFS2

2008-05-27 Thread Luis Freitas
Alexandra,

   You could use only CRS, with ext3 instead of ocfs2, for this kind of use. 
You would need to register a script to force-umount the filesystem on the 
primary node and mount it on the node you are failing over to. (It would be 
nice to be able to check whether the filesystem is mounted elsewhere before 
attempting to mount it, but I am not sure how to do this.)

   Are you using a cross-over cable for the private interconnect?

Regards,
Luis
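The umount-here/mount-there logic described above can be sketched as shell functions; everything below (paths, the lazy-umount fallback) is illustrative, not a tested CRS action script:

```shell
# Helpers for a cold-failover ext3 volume moved between two nodes.
fs_mounted() {            # is the given directory currently a mount point?
  mountpoint -q "$1"
}

release_fs() {            # run on the node giving up the volume
  if fs_mounted "$1"; then
    umount "$1" || umount -l "$1"   # fall back to lazy unmount if busy
  fi
}

take_fs() {               # run on the node taking over: take_fs <dev> <mnt>
  fs_mounted "$2" || mount -t ext3 "$1" "$2"
}
```

For example, `release_fs /u02/oradata` on the failing node, then `take_fs /dev/mapper/datavg-data /u02/oradata` on the survivor.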

--- On Fri, 6/27/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
Subject: [Ocfs2-users] CRS/CSS and OCFS2
To: ocfs2-users@oss.oracle.com
Date: Friday, June 27, 2008, 10:41 AM



Hello,



I refer to you hoping you may help me with my problem... We have got an issue
here and opened an SR at Metalink, but until now we have got no useful
information towards solving our problem. The SR number is 6855815.994...



We wanted to protect 9i Single-Instance
Databases with 10g Clusterware following the third-party-tool approach.
There are no RAC-databases involved. But we want to achieve high availability
as the databases are business critical systems. We want to make the systems
able to 
relocate
to another machine in case of failure to keep downtimes low... To achieve
this we want to use OCFS2 for the filesystem. Relocate is done by script
with help of CRS.



So we took two systems (byaz05 and byaz10)
and installed the following software: 10g CRS (10.2.0.3) and Oracle Software
9.2.0.8 and OCFS2 1.2.8



We found the following Metalink notes and adjusted the heartbeat and timeouts
for OCFS2:
Metalink Note 395878.1: Heartbeat/Voting/Quorum Related Timeout Configuration
for Linux, OCFS2, RAC Stack to avoid unnecessary node fencing, panic and reboot
Metalink Note 391771.1: OCFS2 - FREQUENTLY ASKED QUESTIONS (here in particular
the section on fencing and quorum)
Metalink Note 434255.1: Common reasons for OCFS2 Kernel Panic or Reboot Issues
Metalink Note 457423.1: OCFS2 Fencing, Network, and Disk Heartbeat Timeout
Configuration



We did no changes to the CRS/CSS default settings until
now.



During HA-testing we observed unexpected behaviour of the system. We
deactivated the bond for the private interconnect and expected only one node
to go down. But both nodes went down. As it seems to me, one node was rebooted
by OCFS2 and the other one by CRS/CSS.



Timestamp
--
10:21:06        bond1 disabled (eth1)
/var/log/messages byaz05
Apr 25 10:21:06 byaz05 kernel: bonding: bond1: link status definitely down for interface eth1, disabling it
Apr 25 10:21:06 byaz05 kernel: bonding: bond1: making interface eth5 the new active one.

10:21:09        bond1 disabled (eth5)
/var/log/messages byaz05
Apr 25 10:21:09 byaz05 kernel: bonding: bond1: link status definitely down for interface eth5, disabling it
Apr 25 10:21:09 byaz05 kernel: bonding: bond1: now running without any active interface !

10:21:23        o2net - no longer connected
/var/log/messages byaz05
Apr 25 10:21:23 byaz05 kernel: o2net: no longer connected to node byaz10.bayer-ag.com (num 1) at 10.190.59.6:
/var/log/messages byaz10
Apr 25 10:21:23 byaz10 kernel: o2net: no longer connected to node byaz05.bayer-ag.com (num 0) at 10.190.59.5:

10:21:27        CSSD failure 134
10:21:29        Reboot initiated by CRS
/var/log/messages byaz05
Apr 25 10:21:27 byaz05 logger: Oracle clsomon failed with fatal status 12.
Apr 25 10:21:27 byaz05 logger: Oracle CSSD failure 134.
Apr 25 10:21:27 byaz05 su(pam_unix)[25839]: session closed for user oracle
Apr 25 10:21:27 byaz05 logger: Oracle CRS failure.  Rebooting for cluster integrity.
Apr 25 10:21:27 byaz05 kernel: md: stopping all md devices.
Apr 25 10:21:27 byaz05 kernel: md: md0 switched to read-only mode.
Apr 25 10:21:29 byaz05 logger: Oracle CRS failure.  Rebooting for cluster integrity.
Apr 25 10:21:29 byaz05 kernel: e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
Apr 25 10:21:29 byaz05 logger: Oracle init script ceding reboot to sibling 27383.

10:21:58        Reboot initiated by OCFS2(?)
/var/log/messages byaz10
Apr 25 10:21:58 byaz10 su(pam_unix)[4595]: session opened for user oracle by (uid=0)
Apr 25 10:21:58 byaz10 su(pam_unix)[4595]: session closed for user oracle
Apr 25 10:25:58 byaz10 syslogd 1.4.1: restart.
Apr 25 10:25:58 byaz10 syslog: syslogd startup succeeded
Apr 25 10:25:58 byaz10 kernel: klogd 1.4.1, log 

Re: [Ocfs2-users] CRS/CSS and OCFS2

2008-05-27 Thread Luis Freitas
Hmm,

   There is a lazy umount:

       -l     Lazy unmount. Detach the filesystem from the filesystem hierar-
              chy now, and cleanup all references to the filesystem as soon
              as it is not busy anymore. This option allows a "busy" filesys-
              tem to be unmounted.  (Requires kernel 2.4.11 or later.)

   I am not sure whether this prevents writes to the filesystem after the 
umount completes, or whether there is some way to fence the device, so these 
points would need to be verified.

   This is the way other cold-failover HA solutions work; they don't use 
cluster filesystems. If there is no way to force the filesystem to be unmounted, 
or to fence the device, it should be possible to configure CRS to evict the 
primary node before the filesystem is mounted on the secondary node. (Or use 
some external fence device, like a SAN switch or a power appliance.)

   Btw, is OCFS2 officially supported with Oracle 9i single-instance databases?

Regards,
Luis
--- On Tue, 5/27/08, Sunil Mushran [EMAIL PROTECTED] wrote:
From: Sunil Mushran [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] CRS/CSS and OCFS2
To: [EMAIL PROTECTED]
Cc: ocfs2-users@oss.oracle.com, [EMAIL PROTECTED]
Date: Tuesday, May 27, 2008, 5:07 PM

AFAIK:
a. There is no force umount in Linux.
b. There is no way to know whether a local fs is mounted on another node.

Luis Freitas wrote:
> Alexandra,
>
>    You could use only CRS and ext3 instead of ocfs2 for this kind of 
> use. You would need to register a script to force umount the 
> filesystem on the primary node and mount it on the node you are 
> failing over to, it would be nice to be able to check if the 
> filesystem is mounted before atempting to mount it, but I am not sure 
> on how to do this)
>
>    Are you using a cross-over cable for the private interconnect?
>
> Regards,
> Luis
>
> --- On *Fri, 6/27/08, [EMAIL PROTECTED] 
> /[EMAIL PROTECTED]/* wrote:
>
> From: [EMAIL PROTECTED] [EMAIL PROTECTED]
> Subject: [Ocfs2-users] CRS/CSS and OCFS2
> To: ocfs2-users@oss.oracle.com
> Date: Friday, June 27, 2008, 10:41 AM
>
>
> Hello,
>
> I refer to you hoping you may help me with my problem... We have
> got an issue here and opened an SR at Metalink but until now, we
> got no useful information in solving our problem. SR-Number is
> 6855815.994...
>
> We wanted to protect 9i Single-Instance Databases with 10g
> Clusterware following the third-party-tool approach. There are no
> RAC-databases involved. But we want to achieve high availability
> as the databases are business critical systems. We want to make
> the systems able to
> relocate to another machine in case of failure to keep downtimes
> low... To achieve this we want to use OCFS2 for the filesystem.
> Relocate is done by script with help of CRS.
>
> So we took two systems (byaz05 and byaz10) and installed the
> following software: 10g CRS (10.2.0.3) and Oracle Software 9.2.0.8
> and OCFS2 1.2.8
>
> We found the following Metalink notes and adjusted the heartbeat
> and timeouts for OCFS2: Metalink Note 395878.1:
> Heartbeat/Voting/Quorum Related Timeout Configuration for Linux,
> OCFS2, RAC Stack to avoid unnecessary node fencing, panic and reboot
> Metalink Note 391771.1: OCFS2 - FREQUENTLY ASKED QUESTIONS (here
> in particular the section on fencing and quorum)
> Metalink Note 434255.1: Common reasons for OCFS2 Kernel Panic or
> Reboot Issues
> Metalink Note 457423.1: OCFS2 Fencing, Network, and Disk Heartbeat
> Timeout Configuration
>
> We did no changes to the CRS/CSS default settings until now.
>
> During HA-testing we watched unexpected behaviour of the system.
> We deactivated the bond for private interconnect and expected only
> one node to go down. But we faced both nodes going down. As it
> seems to me one node was rebooted from OCFS2 and the other one
> from CRS/CSS.
>
> Timestamp
> --
>
> 10:21:06    bond1 disabled (eth1)
> /var/log/messages byaz05
> Apr 25 10:21:06 byaz05 kernel: bonding: bond1: link status
> definitely down for interface eth1, disabling it
> Apr 25 10:21:06 byaz05 kernel: bonding: bond1: making interface
> eth5 the new active one.
>
> 10:21:09    bond1 disabled (eth5)
> /var/log/messages byaz05
> Apr 25 10:21:09 byaz05 kernel: bonding

Re: [Ocfs2-users] Node reboot during network outage

2008-04-23 Thread Luis Freitas
 If you expect one of the switches to remain alive, you should configure your bonding driver timeout low to force it to fail over quickly to the other switch.

 What kind of bonding are you using? There are several different modes on Linux, for different types of switch configurations. I have used balance-tlb with good results; active-backup could work too.

 The main problem I see is the time the switch takes to identify the different path. I had some problems with virtual IP failover due to the kernel not broadcasting something to the switch if forwarding is not enabled. Although virtual IPs are very different from network bonding, you might want to try enabling ip_forward to see if the switches reconfigure faster (echo 1 > /proc/sys/net/ipv4/ip_forward).

 Some people here have asked the O2CB to allow for multiple heartbeat interfaces; it would provide an alternative to this problem, as the switches would not need to reconfigure.

Regards,
Luis

--- On Tue, 4/22/08, Brendan Beveridge [EMAIL PROTECTED] wrote:

From: Brendan Beveridge [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] Node reboot during network outage
To:
Cc: "ocfs2-users@oss.oracle.com" ocfs2-users@oss.oracle.com
Date: Tuesday, April 22, 2008, 8:06 PM

STP shouldn't have anything to do with the nodes still seeing each other when the switch fails. Have you checked that your bonding config is correct? i.e.

cat /proc/net/bonding/bond0

and check that it fails over to the other eth when the switch goes down.

Cheers
Brendan

Sunil Mushran wrote:
 The issue is not the time the switch takes to reboot. The issue is the amount of time the secondary switch takes to find a unique path. http://en.wikipedia.org/wiki/Spanning_tree_protocol

 Mick Waters wrote:
 Thanks Sunil,
 The network switch is brand new but has a fairly complex configuration due to us running a number of VLANs - however, we have found that it has always taken quite a while to reboot. I'll try increasing the idle timeout as suggested and let you know what happens. However, surely this is only treating the symptoms of what is, after all, a contrived scenario. Rebooting the switch is supposed to test what would happen if we had a real network outage. What if the switch were to stay down? My issue is that we have an alternative route via the other NIC in the bond and the other switch. The affected nodes in the cluster shouldn't fence because they should still be able to see all of the other nodes in the cluster via this other route. Does this make sense?
 Regards,
 Mick.

 -Original Message-
 From: Sunil Mushran [mailto:[EMAIL PROTECTED]
 Sent: 22 April 2008 17:40
 To: Mick Waters
 Cc: ocfs2-users@oss.oracle.com
 Subject: Re: [Ocfs2-users] Node reboot during network outage

 The interface died at 14:25:44 and recovered at 14:27:43. That's two minutes. One solution is to increase o2cb_idle_timeout to > 2 mins. A better solution would be to look into your router setup to determine why it is taking 2 minutes for the router to reconfigure.

 Mick Waters wrote:
 Hi, my company is in the process of moving our web and database servers to new hardware. We have an HP EVA 4100 SAN which is being used by two database servers running in an Oracle 10g cluster, and that works fine. We have gone to extreme lengths to ensure high availability. The SAN has twin disk arrays, twin controllers, and all servers have dual fibre interfaces. Networking is (should be) similarly redundant, with bonded NICs connected in a two-switch configuration, two firewalls and so on.
 We also want to share regular Linux filesystems between our servers - HP DL580 G5s running RedHat AS 5 (kernel 2.6.18-53.1.14.el5) - and we chose OCFS2 (1.2.8) to manage the cluster. As stated, each server in the 4-node cluster has a bonded interface set up as bond0 in a two-switch configuration (each NIC in the bond is connected to a different switch). Because this is a two-switch configuration, we are running the bond in active-standby mode, and this works just fine.
 Our problem occurred when we were doing failover testing, where we simulated the loss of one of the network switches by powering it off. The result was that the servers rebooted, and this made a mockery of our attempts at an HA solution. Here is a short section from /var/log/messages following a reboot of one of the switches to simulate an outage:
 --
 Apr 22 14:25:44 mtkws01p1 kernel: bonding: bond0: backup interface eth0 is now down
 Apr 22 14:25:44 mtkws01p1 kernel: bnx2: eth0 NIC Link is Down
 Apr 22 14:26:13 mtkws01p1 kernel: o2net: connection to node mtkdb01p2 (num 1) at 10.1.3.50: has been idle for 30.0 seconds, shutting it down.
 Apr 22 14:26:13 mtkws01p1 kernel: (0,12):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1208870743.673433 now 1208870773.673192 dr 1208870743.673427 adv 1208870743.673433:1208870743.673434 func (97690d75:2) 1208870697.670758:1208870697.670760)
 Apr 22 14:26:13 mtkws01p1 kernel: o2net: no longer 
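Brendan's check above can be scripted so a monitoring job notices when the bond loses a slave. A minimal sketch; the sample status text below is illustrative, not captured from a real host, and it follows the usual /proc/net/bonding format:

```shell
# Parse the status the bonding driver exposes via /proc/net/bonding/bond0.
# parse_bond prints the currently active slave, then one mii= line per
# "MII Status" field (the first is the bond itself, the rest are the slaves).
parse_bond() {
  awk -F': ' '/Currently Active Slave/ {print "active=" $2}
              /MII Status/             {print "mii=" $2}' "$1"
}

# Illustrative sample; on a real host you would pass /proc/net/bonding/bond0.
cat > /tmp/bond0.sample <<'EOF'
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
Slave Interface: eth0
MII Status: up
Slave Interface: eth1
MII Status: down
EOF

parse_bond /tmp/bond0.sample
```

For the sample above this prints `active=eth0` followed by `mii=up`, `mii=up`, `mii=down` - the last line showing the standby slave with a failed link, which is exactly the condition that leaves the bond with no path when the surviving switch goes away.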

Re: [Ocfs2-users] AoE+ocfs2 = Heartbeat write timeout to device

2008-03-08 Thread Luis Freitas
Sunil,

Can I configure this heartbeat to use a high-priority (realtime) 
scheduling class?

 If I simply increase the timeout, it could still time out in heavy I/O 
situations, like several different threads queuing large amounts of writes. The 
kernel should know this is a high-priority write so that it is put at the head of 
the queue.

Regards,
Luis

Sunil Mushran [EMAIL PROTECTED] wrote: The older 12 sec default timeout was 
too low. It has been bumped
up to 60 secs. The FAQ has details on this.
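Both numbers Sunil quotes come from the O2CB_HEARTBEAT_THRESHOLD setting in /etc/sysconfig/o2cb: per the OCFS2 FAQ, the disk heartbeat timeout is (threshold - 1) * 2 seconds. A small sketch of that arithmetic:

```python
import math

# O2CB disk heartbeat arithmetic as described in the OCFS2 FAQ:
# timeout_ms = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2000.

def timeout_ms(threshold: int) -> int:
    """Disk heartbeat timeout implied by an O2CB_HEARTBEAT_THRESHOLD value."""
    return (threshold - 1) * 2000

def threshold_for(timeout: int) -> int:
    """Smallest threshold giving at least `timeout` milliseconds."""
    return math.ceil(timeout / 2000) + 1

# The old 12 s default corresponds to a threshold of 7, the new 60 s to 31.
assert timeout_ms(7) == 12000
assert timeout_ms(31) == 60000
assert threshold_for(60000) == 31
```

So the "Heartbeat write timeout ... after 12000 milliseconds" message in the report matches the old threshold of 7; the newer default of 31 gives the 60-second window.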

[EMAIL PROTECTED] wrote:
 Hi,

 I have a problem regarding 100Mbit Ethernet, AoE and ocfs2. I set up 2 boxes
 connected via 100Mbit Ethernet to their ATA-over-Ethernet storage. The
 ocfs2 filesystem resides on such an AoE partition. If I produce high
 throughput to that ocfs2 partition on one node, it reboots after some
 seconds.

 I use dd for testing, like dd if=/dev/zero of=test bs=1M count=1000
 If I write 100Mb of data to the disk everything is fine. If I write 1Gb of
 data to the disk, the node reboots after some seconds and prints the
 following error:

 (9,0):o2hb_write_timeout:167 ERROR: Heartbeat write timeout to device
 etherd/e402.0 after 12000 milliseconds
 (9,0):o2hb_stop_all_regions:1865 ERROR: stopping heartbeat on all active
 regions.

 This couldn't be caused by lost heartbeat packets. I set up a separate
 network for heartbeat to track this problem.

 Actually, I know that 100Mbit Ethernet is a bottleneck, but this should not
 cause the system to reboot, right? Even if I switched to Gigabit
 Ethernet, it may be the bottleneck in the future.

 Has someone experienced this already? Do you know how to solve this issue?
 Please help, I need to run some tests...
 Your help is really appreciated.

 Cheers,
 Holger


 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users
   


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


   
-
Never miss a thing.   Make Yahoo your homepage.___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] Possibility for networkless installation (for long distance shared storage)

2008-01-03 Thread Luis Freitas
Michael,

   You might be able to route IP over your FC network. I know that some Unixes 
do this, but I am not sure how to do it on Linux.

Regards,
Luis

Mark Fasheh [EMAIL PROTECTED] wrote: On Thu, Jan 03, 2008 at 08:47:37AM 
-0800, Michael M. wrote:
 As we have fiber channel, and we have fiber connections, sometimes direct
 p-to-p, where we could run it into our fiber channel switch, and share a disk
 array over hundreds of feet, to hundreds of miles, but where servers 
 wouldn’t
 be able to talk via fast enough network connection, does (or will) ocfs2
 support a mode where there is no network connection needed, and all voting, 
 and
 cluster “talking” can be done to the block device itself, including the
 heartbeat?

Ocfs (v1) used to do that, but we moved away from it for Ocfs2 because it
quickly became unusably slow once all the additional locking traffic for
general purpose usage was added.
 --Mark

--
Mark Fasheh
Principal Software Developer, Oracle
[EMAIL PROTECTED]

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


   
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] Large difference between space used on reiserfs vs ocfs2

2007-12-25 Thread Luis Freitas

  If you have lots of small files, it could happen. The basic space allocation 
unit is one cluster. If I am not mistaken, on OCFS2 the default is 4k.

  A quick search on Google shows that ReiserFS can pack small files 
together and save space.

  You could try formatting your OCFS2 partition with a smaller cluster size, near 
the average file size.

   Another possible explanation (that is, without filesystem corruption) 
is a large file that was deleted but is still in use by some process. The space 
remains allocated until you kill the process.
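The allocation effect Luis describes is easy to quantify. A hypothetical sketch (the file sizes and cluster sizes are chosen for illustration; it models only the round-up-to-a-cluster effect, not metadata overhead):

```python
import math

def allocated_bytes(file_sizes, cluster_size):
    """Space consumed when each file rounds up to a whole number of clusters."""
    return sum(math.ceil(size / cluster_size) * cluster_size
               for size in file_sizes)

sizes = [500, 1200, 3000]             # 4700 bytes of actual data
print(allocated_bytes(sizes, 4096))   # 4 KiB clusters  -> 12288
print(allocated_bytes(sizes, 32768))  # 32 KiB clusters -> 98304
```

With many small files, a 32 KiB allocation unit can consume roughly eight times the space of a 4 KiB one, which is the kind of df discrepancy reported below.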

Regards,
Luis

Michael M. [EMAIL PROTECTED] wrote:Dear list,
   
  I have all of my web files for my apache servers on multiple machines placed 
on ocfs2 volumes. I recently did an rsync to a reiserfs volume on an external 
usb harddrive, and df reports the following:
   
  (ocfs2 is on top, reiserfs on bottom)
   
  /dev/sde5  218757056 176814592  41942464  81% /mnt/www
  /dev/sdd1  732549604 117493932 615055672  17% /mnt/pba-web
   
  Is this normal?
   
  Thanks,
  Michael
  
  ___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

   
-
Looking for last minute shopping deals?  Find them fast with Yahoo! Search.___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] Fwd: Returned mail: see transcript for details

2007-11-28 Thread Luis Freitas
Anjan,

   Yes, it should be like that. 

   Also, see if you can follow Mark's suggestion, as he knows this stuff better 
than me. You could get an iSCSI server running with a third machine and 
Openfiler and test to see if you get the same results. (BTW, I never installed 
Openfiler; last time I installed a test system like this I used NFS :-P. 
Neither is supported, of course.)

Regards,
Luis

Anjan Chakraborty [EMAIL PROTECTED] wrote: Luis,
  Let me understand your suggestion. Are you suggesting:
  $ORA_CRS_HOME -- without datavolume,nointr
  $ORACLE_HOME  -- without datavolume,nointr
  But place OCR, voting disks & future datafiles with datavolume,nointr
   
  Did I understand correctly?
  I have also gone back to OCFS2 1.2.5-6.
  Okay, if this is the suggestion -- I will surely try.
  Thanks.
  Anjan

Luis Freitas [EMAIL PROTECTED] wrote:
  
  Anjan,

The two installs I did with x86_64 were with ASM, with kernel versions that 
today are very old, some years ago. Here I am using x86 (even though the hardware 
is x86_64 capable) due to a management decision not to go to x86_64. 

 Anyone else on the list?

 About the mount points, the CRS home is an ORACLE_HOME, so you shouldn't 
mount it with datavolume. If you want to keep the CRS ORACLE_HOME and the RDBMS 
ORACLE_HOME each on its own filesystem, you should have at least a third partition 
for the cluster registry, voting disks and datafiles, mounted with the 
datavolume option, and mount those two without it. 
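A minimal /etc/fstab sketch of the layout described above - both ORACLE_HOMEs without datavolume, and a third partition carrying the cluster registry, voting disks and datafiles with it. The device names and mount points here are hypothetical, not taken from the poster's setup:

```
# Hypothetical three-partition layout (OCFS2 1.2-era mount options).
/dev/sdb3  /u01/crs      ocfs2  _netdev                    0 0  # CRS ORACLE_HOME
/dev/sdb4  /u01/oracle   ocfs2  _netdev                    0 0  # RDBMS ORACLE_HOME
/dev/sdb5  /u01/oradata  ocfs2  _netdev,datavolume,nointr  0 0  # OCR, voting disks, datafiles
```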

If you need to get this working fast, I think you should try an older OCFS2 
module version. If not, keep working with Mark and the other Oracle people here 
on the list, as they can diagnose these errors and get them fixed.

What is the firewire disk you are using?

Regards,
Luis


Anjan Chakraborty [EMAIL PROTECTED]  wrote:  

Luis Freitas [EMAIL PROTECTED] wrote:  Luis,
As far as I know, it's not recommended to mount ORACLE_HOME with 
datavolume,nointr, but that's not the case with CRS. So, I have different mount 
options for ORA_CRS_HOME (home/oracle/ocrvotcrs) & ORACLE_HOME 
(home/oracle/orasys). If I am wrong, please correct me. Also, I don't have any 
datafiles yet because I don't have the basic thing -- the Oracle binary under 
ORACLE_HOME. Once I have that, I will have datafiles & the mount will contain 
datavolume,nointr.
Do you know if anybody has successfully implemented OCFS2 1.2.7-1 on Red Hat 4.5 
(kernel 2.6.9-55.EL) under x86_64 architecture? 
What exactly do you have?
Thanks for your continuous effort to help me.
I hope somebody else will also start joining this discussion & help me.
I am really stuck -- can't advance at all -- need help desperately.
Anjan


Anjan,

   Well, it is only a suggestion.

   About those mount options, perhaps the Oracle people can give you feedback 
on that also. But I noticed that you have two mount points, and you seem to be 
installing CRS binaries on one and RDBMS binaries on the other. So, where are 
the datafiles? 

You shouldn't be installing the CRS home on /home/oracle/ocrvotcrs as it has 
the datavolume option. It is not clear if you are trying this.

The filesystems where you have binaries should not have the datavolume 
option, and the filesystems with datafiles, voting disks and the cluster 
registry need to have the datavolume option.

Regards,
Luis

Anjan Chakraborty  [EMAIL PROTECTED] wrote:Mark,
  Thanks to both of you for trying to help me.
  I have alreaddy communicated to Luis that I have to have OCFS2 because my 
software needs Clustering technology that is absent in EXT3. Moreover, both CRS 
 RDBMS homes should also be on shared/clustered system -- so, OCFS2 is the 
only choice.
  If you can help me to understand/resolve this issue, I will really appreciate 
that.
  Please note that my FireWire shared drive works perfectly when I use RAW but 
as soon as I am trying to use OCFS2, all the problems started.
  Thanks.
  Anjan 

Mark Fasheh [EMAIL PROTECTED] wrote:
  Luis,
Thanks  for helping here. I have one comment regarding moving the
shared home to ext3. If Anjan's setup is having issues like this, moving his
oracle home to a local disk would only hide problems which take longer to
reproduce for the crs and data files partitions. What needs to happen is
that the root cause of his problems is discovered and fixed.

Keep in mind, I'm not saying someone couldn't use ext3 for their oracle
install - it's an excellent choice for running a non-shared oracle home.
It's just not a good shared disk diagnostic tool ;)

Once again, thanks for stepping up and lending a hand.
--Mark

On Tue, Nov 27, 2007 at 08:50:28AM -0800, Luis Freitas wrote:
 Anjan,
 
 You dont need to share the database binaries, only the CRS and the
 datafiles. You can do it to save disk space, but it is not mandatory. The CRS
 and datafiles are much less stressfull to the filesystem structures as there 
 is
  a reduced number of large files, although they usually have a heavy i/o load,
 and stress the disk subsystem and the locking

Re: [Ocfs2-users] OCFS2 on CentOS 4.5 for CRS/RAC

2007-11-27 Thread Luis Freitas
Anjan,

   You don't need to share the database binaries, only the CRS and the 
datafiles. You can do it to save disk space, but it is not mandatory. The CRS 
and datafiles are much less stressful to the filesystem structures, as there is 
a reduced number of large files, although they usually have a heavy I/O load 
and stress the disk subsystem and the locking algorithms.

So you can have two separate ext3 filesystems located at the same place on 
each server, and one or more OCFS2 shared filesystems for the CRS and the 
database datafiles. The Oracle installer takes care of copying the binaries 
between the servers during the installation.

It might be useful to try a lower version like 1.2.6, as you are using the 
latest version available. I am using 1.2.4-2 here with RH 4.0 and kernel 
2.6.9-42 and it seems rather stable; I only needed to increase the timeouts. (But 
I don't have the oracle_home shared.)

Also, you might have a hardware problem somewhere on the SAN. And I still 
have to check those mount options you sent...

One detail: I don't know if the CentOS distro includes the OCFS2 module. Are 
you using the modules downloaded from the oss.oracle.com site for the 
equivalent RH 4.0 kernel, or modules built by CentOS? If using CentOS modules, 
you might get better results by changing to the Oracle-built modules for the 
equivalent RH 4.0 kernel.

Regards,
Luis

Anjan Chakraborty [EMAIL PROTECTED] wrote: Luis,
  I am intending to use CRS/RAC, which needs a cluster file system. How does EXT3 
fall into that area?
  Thanks.
  Anjan

Thanks a lot for the response. Here is what I am doing:
1. 
mkfs.ocfs2 -b 4K -C 32K -N 4 -L ocrvotcrs /dev/sdb3   -- for CRS
mkfs.ocfs2 -b 4K -C 32K -N 4 -L orasys /dev/sdb4  -- for RDBMS
 
2. Then mounting using /etc/fstab:
/dev/sdb3 /home/oracle/ocrvotcrs ocfs2 _netdev,datavolume,nointr 
0 0
/dev/sdb4 /home/oracle/orasys ocfs2 _netdev 0 0
If you find anything wrong here, can you please tell what to do?
It's a non-production system, so I can experiment with whatever you 
suggest and won't hold you responsible for that.
Thanks.
Anjan
 
 
 
Luis Freitas [EMAIL PROTECTED] wrote:
  Anjan,

  Are you installing the binaries on OCFS2 too? How are you mounting the 
filesystem?

  You might want to try using ext3 for the binaries and OCFS2 only for datafiles 
and archives, until you get this fixed.

Regards,
Luis

Anjan Chakraborty [EMAIL PROTECTED] wrote:Hi,
  I sent an email to Mark Fasheh of Oracle Corp. & posted this issue at OTN 
under the Linux thread this morning. I hope that someone among you might have 
experienced this and can help. On that basis, I am sending this to you too. I 
am stuck & will really appreciate it if you can shed some light on this. 
  Thanks.
  Anjan
  
***
  I have a 2-node CentOS 4.5 x86_64 system (kernel 2.6.9-55.EL). On this I 
installed Oracle OCFS2 1.2.7-1 (with exact kernel matching). After this I 
installed Oracle CRS 10.2.0.1 and that installation went fine. Then I tried to 
install Oracle RDBMS 10.2.0.1 and all the problems started from there. The 
/var/log/messages file got filled up with messages (giving some to avoid 
confusion):
ocfs2_read_locked_inode: .. : ERROR: Invalid  dinode #0 signature =
ocfs2_lookup: .. : ERROR: Unable to create inode 

Then OUI gave several error messages, e.g.
 "Invalid stored block length on file ./em/em.war" followed by "I/O error 
in file"
Errors invoking the files ins_rdbms.mk and ins_ldap.mk

Then /var/log/messages gave:
OCFS2: ERROR (device ): ocfs2_extend_file: Dinode # .. has bad 
signature O' # I 
And the installation failed & CRS died. And the machines rebooted.
  I ran fsck.ocfs2 -n /dev/, and it came clean.
I have tested this several times & always the same thing happens.
If I use RAW partitions, everything works fine. So, the problem may be in the 
OCFS2 & OS/Oracle interaction -- but I am not sure how to bypass this.
I have to have OCFS2 -- I can't use RAW for various reasons.
Can somebody please help me to resolve this?
Thanks. 
  
***
  
  
-
  Be a better pen pal. Text or chat with friends inside Yahoo! Mail. See 
how.___
Ocfs2-users mailing  list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


   
-
Get easy, one-click access to your favorites.  Make Yahoo! your homepage.___
Ocfs2-users mailing list
Ocfs2-users

Re: [Ocfs2-users] OCFS2 on CentOS 4.5 for CRS/RAC

2007-11-27 Thread Luis Freitas
Anjan,

   Well, it is only a suggestion.

   About those mount options, perhaps the Oracle people can give you feedback 
on that also. But I noticed that you have two mount points, and you seem to be 
installing CRS binaries on one and RDBMS binaries on the other. So, where are 
the datafiles? 

You shouldn't be installing the CRS home on /home/oracle/ocrvotcrs as it has 
the datavolume option. It is not clear if you are trying this.

The filesystems where you have binaries should not have the datavolume 
option, and the filesystems with datafiles, voting disks and the cluster 
registry need to have the datavolume option.

Regards,
Luis

Anjan Chakraborty [EMAIL PROTECTED] wrote: Mark,
  Thanks to both of you for trying to help me.
  I have alreaddy communicated to Luis that I have to have OCFS2 because my 
software needs Clustering technology that is absent in EXT3. Moreover, both CRS 
 RDBMS homes should also be on shared/clustered system -- so, OCFS2 is the 
only choice.
  If you can help me to understand/resolve this issue, I will really appreciate 
that.
  Please note that my FireWire shared drive works perfectly when I use RAW but 
as soon as I am trying to use OCFS2, all the problems started.
  Thanks.
  Anjan 

Mark Fasheh [EMAIL PROTECTED] wrote:
  Luis,
Thanks for helping here. I have one comment regarding moving the
shared home to ext3. If Anjan's setup is having issues like this, moving his
oracle home to a local disk  would only hide problems which take longer to
reproduce for the crs and data files partitions. What needs to happen is
that the root cause of his problems is discovered and fixed.

Keep in mind, I'm not saying someone couldn't use ext3 for their oracle
install - it's an excellent choice for running a non-shared oracle home.
It's just not a good shared disk diagnostic tool ;)

Once again, thanks for stepping up and lending a hand.
--Mark

On Tue, Nov 27, 2007 at 08:50:28AM -0800, Luis Freitas wrote:
 Anjan,
 
 You dont need to share the database binaries, only the CRS and the
 datafiles. You can do it to save disk space, but it is not mandatory. The CRS
 and datafiles are much less stressfull to the filesystem structures as there 
 is
 a reduced number of large files, although they usually have a heavy i/o load,
 and stress the disk subsystem and the locking algorithms.
 
 So you  can have two separated ext3 filesystems located at the same place on
 each server, and one or more ocfs2 shared filesystems for the CRS and the
 database datafiles. The Oracle installer takes care of copying the binaries
 between the servers during the installation.
 
 It might be usefull to try a lower version like 1.2.6, as you are using the
 latest version available. I am using 1.2.4-2 here with RH 4.0 and kernel
 2.6.9-42 and it seems rather stable, only needed to increase the timeouts. 
 (But
 I dont have the oracle_home shared.)
 
 Also you might have a hardware problem somewhere on the SAN. And I still
 have to check those mount options you sent...
 
 One detail. I dont know if the Centos distro includes the OCFS2 module. Are
 you using the modules downloaded from the oss.oracle.com site for the
 equivalent RH 4.0 kernel, or modules built by Centos? If using CENTOS  modules
 you might get better results by changing to the Oracle built modules for the
 equivalent RH 4.0 kernel.
 
 Regards,
 Luis
 
 Anjan Chakraborty wrote:
 
 Luis,
 I am intending to use CRS/RAC that needs a Cluster File System. How does
 EXT3 falls into that area?
 Thanks.
 Anjan
 
 Thanks a lot for the response. Here is what I am doing:
 
 1.
 mkfs.ocfs2 -b 4K -C 32K -N 4 -L ocrvotcrs /dev/sdb3 -- for CRS
 mkfs.ocfs2 -b 4K -C 32K -N 4 -L orasys /dev/sdb4 -- for RDBMS
 
 2. Then mounting using /etc/fstab:
 /dev/sdb3 /home/oracle/ocrvotcrs ocfs2 _netdev,datavolume,nointr
 0 0
 /dev/sdb4 /home/oracle/orasys ocfs2 _netdev 0 0
 If you find anything wrong here, can you please tell what to do?
 It's a non-production system  so I can experiment with whatever  you
 suggest and won't held you responsible for that.
 Thanks.
 Anjan
 
 
 Luis Freitas wrote:
 
 Anjan,
 
 Are you installing the binaries on OCSF2 too? How are you mounting
 the filesystem?
 
 You might want to try using ext3 for the binaries and OCF2 only for
 datafiles and archives, until you get this fixed.
 
 Regards,
 Luis
 
 Anjan Chakraborty wrote:
 
 Hi,
 I sent an email to Mark Fisheh of Oracle Corp.  posted this issue
 at OTN under Linux thread this morning. I hope that someone among
 you might have experienced this and can help. On that basis, I am
 sending this to you too. I am stuck  will really appreciate if you
 can shed some light on this.
 Thanks.
 Anjan
  
 ***
 I have a 2 node CentOS 4.5 86_64 system (kernel 2.6.9-55.EL). On
 this I installed Oracle OCFS2 1.2.7-1 (with exact kernel matching).
 After this I installed Oracle CRS 10.2.0.1 and that installation

[Ocfs2-users] Fwd: Returned mail: see transcript for details

2007-11-27 Thread Luis Freitas

 Anjan,

 The two installs I did with x86_64 were with ASM, with kernel versions that 
today are very old, some years ago. Here I am using x86 (even though the hardware 
is x86_64 capable) due to a management decision not to go to x86_64. 

 Anyone else on the list?

 About the mount points, the CRS home is an ORACLE_HOME, so you shouldn't 
mount it with datavolume. If you want to keep the CRS ORACLE_HOME and the RDBMS 
ORACLE_HOME each on its own filesystem, you should have at least a third partition 
for the cluster registry, voting disks and datafiles, mounted with the 
datavolume option, and mount those two without it. 

If you need to get this working fast, I think you should try an older OCFS2 
module version. If not, keep working with Mark and the other Oracle people here 
on the list, as they can diagnose these errors and get them fixed.

What is the firewire  disk you are using?

Regards,
Luis


Anjan Chakraborty [EMAIL PROTECTED] wrote: 

Luis Freitas [EMAIL PROTECTED] wrote:Luis,
As far as I know, it's not recommended to mount ORACLE_HOME with 
datavolume,nointr but that's not the case with CRS. So, I have different mount 
options for ORA_CRS_HOME (home/oracle/ocrvotcrs)  ORACLE_HOME 
(home/oracle/orasys). If I am wrong, please correct me. Also, I don't have any 
datafile yet because I don't have the basic thing -- Oracle binary under 
ORACLE_HOME. Once I have that, I will have datafiles  mount will contain 
datavolume,nointr.
Do you know anybody has successfully implemented OCFS2 1.2.7-1 on Red Hat4.5 
(kernel  2.6.9-55.EL) under 86_64 architecture? 
What exactly you have?
Thanks for your continuous effort to help me.
I hope somebody else will also start joining this discussion  help me.
I am really stuck -- can't advance at all -- need  help desperately.
Anjan


Anjan,

   Well it is only a sugestion.

   About those mount options, perhaps the Oracle people can give you a feedback 
on that also. But I noticed that you have two mount points, and seems to be 
installing CRS binaries on one, and RDBMS binaries on the other. So, where are 
datafiles? 

You shouldnt be installing the CRS home on /home/oracle/ocrvotcrs as it has 
the datavolume option. It is not clear if you are trying this.

The filesystems where you have binaries should not have the datavolume 
option, and the filesystems with datafiles, voting disks and the cluster 
registry need to have the datavolume  option.

Regards,
Luis

Anjan Chakraborty [EMAIL PROTECTED] wrote: Mark,
  Thanks  to both of you for trying to help  me.
  I have alreaddy communicated to Luis that I have to have OCFS2 because my 
software needs Clustering technology that is absent in EXT3. Moreover, both CRS 
 RDBMS homes should also be on shared/clustered system -- so, OCFS2 is the 
only choice.
  If you can help me to understand/resolve this issue, I will really appreciate 
that.
  Please note that my FireWire shared drive works perfectly when I use RAW but 
as soon as I am trying to use OCFS2, all the problems started.
  Thanks.
  Anjan 

Mark Fasheh [EMAIL PROTECTED] wrote:
  Luis,
Thanks for helping here. I have one comment regarding moving the
shared home to ext3. If Anjan's setup is having issues like this, moving his
oracle home to a local disk  would only hide  problems which take longer  to
reproduce for the crs and data files partitions. What needs to happen is
that the root cause of his problems is discovered and fixed.

Keep in mind, I'm not saying someone couldn't use ext3 for their oracle
install - it's an excellent choice for running a non-shared oracle home.
It's just not a good shared disk diagnostic tool ;)

Once again, thanks for stepping up and lending a hand.
--Mark

On Tue, Nov 27, 2007 at 08:50:28AM -0800, Luis Freitas wrote:
 Anjan,
 
 You dont need to share the database binaries, only the CRS and the
 datafiles. You can do it to save disk space, but it is not mandatory. The CRS
 and datafiles  are much less stressfull to the filesystem structures as there 
 is
 a reduced number of large files, although they usually have a heavy i/o load,
 and stress the disk subsystem and the locking algorithms.
 
 So you  can have two  separated ext3 filesystems  located at the same place on
 each server, and one or more ocfs2 shared filesystems for the CRS and the
 database datafiles. The Oracle installer takes care of copying the binaries
 between the servers during the installation.
 
 It might be usefull to try a lower version like 1.2.6, as you are using the
 latest version available. I am using 1.2.4-2 here with RH 4.0 and kernel
 2.6.9-42 and it seems rather stable, only needed to increase the timeouts. 
 (But
 I dont have the oracle_home shared.)
 
 Also you might have a hardware problem somewhere on the SAN. And I still
 have to check those mount options you  sent...
 
 One detail. I dont know if the Centos distro includes the OCFS2 module. Are
 you using the modules downloaded from the oss.oracle.com site for the
 equivalent RH

Re: [Ocfs2-users] Cluster setup

2007-10-11 Thread Luis Freitas

 Well, I stand corrected, and I do think he has a point. OCFS2 is heavily 
tested on Oracle RAC-related scenarios: large files, high I/O rates, etc. In 
other scenarios, dealing with ACLs and multiple user IDs, it might not be so 
thoroughly tested. But I do see a large effort to resolve any issues that are 
reported.

 Also, there is the point that it is designed for Oracle database use. So 
the fencing and reboots are intentional, and they happen fast. All nodes in 
Oracle RAC freeze when one of the nodes is unreachable, either by SAN or 
network failure; this is inevitable due to the architecture of the database 
itself. So it is highly desirable that the misbehaving node go down fast, so 
that the other nodes can start crash recovery of the failed node, during which 
the database is still hung, since the crashed node's memory structures are lost 
and need to be recovered from the redo and rollback segments. 

One thing that I think could be improved is that there is no communication 
between O2CB and CRS, so if the timeouts are not set up correctly, one can 
have a situation where O2CB resolves to fence one node, and CRS fences the 
other. This seems not to happen with vendor clusterware, which always has to be 
integrated with CRS to be supported with Oracle RAC.

Another thing that would be interesting, if technically feasible, would be an 
option for OCFS2 to behave more like NFS and, instead of rebooting, fence 
all I/O to the filesystems on the evicted node, waiting to see if it can rejoin 
the cluster. When this situation occurs, the fenced node would need to consider 
all the blocks in its buffer cache invalid, so that other nodes could modify 
them safely. I am not sure how filesystem locks could be handled.

Regards,
Luis

Sunil Mushran [EMAIL PROTECTED] wrote: Randy Ramsdell wrote:
 I am not taking sides but I think Alexei's postings are a positive 
 contribution to this project and more of a contribution than the 
 lurkers who write nothing to the list. His feedback does have merit 
 and should be considered valuable although it is critical of ocfs2. 
 We, although I strongly disagreed, have stopped using ocfs2 and we 
 WERE using it in a production environment.

The so-called lurkers file bugs, validate patches. That helps everybody.

His feedback is useless. Ranting about the core design does not help
anyone. Dreaming up possible bugs, without actually looking for them,
is useless.

When you encounter a bug, file a bugzilla with as much relevant
information as possible. If you do that, we can fix the bug and everyone
benefits. Sending emails like a broken record is not a positive 
contribution.

BTW, I hear great things about Vista.

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


   
-
Need a vacation? Get great deals to amazing places on Yahoo! Travel. ___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] Cluster setup

2007-10-10 Thread Luis Freitas
Alexei,
   
I do not agree with the heavily loaded part. Oracle runs certification tests 
for their database, so OCFS2 must have passed through this certification 
process, which must include high-load scenarios. Last time I checked, LVM2 was 
not supported for RAC use, so it probably was not tested.
   
I do agree about the OCFS2+LVM and OCFS2+LVM+NFS combinations not being well 
tested. Also, I suspect one cannot run an active-active NFS cluster without 
special NFS software. It would need to be active-passive.
   
  Regards,
  Luis

Alexei_Roudnev [EMAIL PROTECTED] wrote:
  Yes, it can be done.

The question is reliability:
- OCFSv2 is not very stable when dealing with millions of files;
- OCFSv2 clusters tend to self-fence after small SAN storage glitches (this is 
by design, so you can't eliminate it even by tuning all the timeouts - only 
improve it);
- OCFSv2 + LVM + NFS is not well-tested territory.

It should work in theory, and it works in practice under average load and FS 
size. No one knows how it behaves on very big storage and very big file 
systems after 1-2 years of active use. I managed to get a stable OCFSv2 system 
here, after applying a few patches and discovering a few issues, BUT
I use it on a lightly loaded file system (which is not critical at all) to get 
more statistics on its behavior before I use it for anything else.

Comparing with heartbeat + LVM + ReiserFS + NFS:
- all technologies in the stack are well tested and heavily used;
- heartbeat has external fencing (STONITH), so it is extremely reliable in 
the long term - it can recover from almost any failure (though sometimes it 
doesn't detect a failure, it's true);
- ReiserFS (or ext3) has proved to be very stable on huge file systems (it is 
widely used, so we don't expect any problems here).
One problem comes from Novell - since they stopped using it as the default, I 
can't trust ReiserFS on SLES10 (because it is not the default), but we can 
still trust it on SLES9 etc. (where it is the default).

A common rule: if you want a reliable system, use defaults where possible. 
OCFSv2 + NFS is not a default yet (though OCFSv2 has improved dramatically 
over the last 2 years).

- Original Message - 
From: Pavel Georgiev 

To: 
Sent: Wednesday, October 10, 2007 1:25 AM
Subject: Re: [Ocfs2-users] Cluster setup


 How about using just OCFSv2 as I described in my first mail - two servers
 export their storage, the rest of the servers mount it and a failure of 
 any
 of the two storage servers remains transparent to the clients. Can this be
 done with OCFSv2?


 On Tuesday 09 October 2007 21:46:15 Alexei_Roudnev wrote:
 You better use

 LVM + heartbeat + NFS + cold failover cluster.

 It works 100% stable and is 100% safe from the bugs (and it allows online
 resizing, if your HBA or iSCSI can add lun's on the fly).

 Combining NFS + LVM + OCFSv2 can cause many unpredictable problems, esp. 
 on
 the unusual (for OCFSv2) system (such as Ubuntu).

 - Original Message -
 From: Brian Anderson 
 To: 
 Sent: Tuesday, October 09, 2007 11:35 AM
 Subject: RE: [Ocfs2-users] Cluster setup



 Not exactly. I'm in a similar boat right now. I have 3 NFS servers all
 mounting an OCFS2 volume. Each NFS server has its own IP, and the
 clients load balance manually... some mount fs1, others fs2, and the
 rest fs3. In an ideal world, I'd have the NFS cluster presenting a
 single IP, and failing over / load balancing some other way.

 I'm looking at NFS v4 as one potential avenue (no single IP, but it does
 let you fail over from 1 server to the next in line), and commercial
 products such as IBRIX.




 Brian

  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Sunil Mushran
  Sent: Tuesday, October 09, 2007 2:27 PM
  To: Luis Freitas
  Cc: ocfs2-users@oss.oracle.com
  Subject: Re: [Ocfs2-users] Cluster setup
 
  Unsure what you mean. If the two servers mount the same
  ocfs2 volume and export them via nfs, isn't that clustered nfs?
 
  Luis Freitas wrote:
   Is there any cluster NFS solution out there? (Two NFS
 
  servers sharing
 
   the same filesystem with distributed locking and failover
 
  capability)
 
   Regards,
   Luis
  
   */Sunil Mushran /* wrote:
  
   Appears what you are looking for is a mix of ocfs2 and nfs.
   The storage servers mount the shared disks and the reexport
   them via nfs to the remaining servers.
  
   ubuntu 6.06 is too old. If you are stuck on Ubuntu LTS, the
   next version 7.10 should have all you want.
  
   Pavel Georgiev wrote:
Hi List,
   
I`m trying to build a cluster storage with commodity
 
  hardware in
 
   a way that
  
the all the data would be on  1 server. It should
 
  have the meet
 
   the
  
following requirements:
1) If one of the servers goes down, the cluster
 
  should continue
 
   to work with
  
rw access from all clients.
2) Clients that mount the storage should not be part
 
  of cluster
 
   (not export
  
any disk storage) - I have few servers

Re: [Ocfs2-users] Re: OCF2 and LVM

2007-10-09 Thread Luis Freitas


   In OCFSv1 this could only be done with the Oracle-patched binutils, but it 
should be safe to back up OCFSv2 files directly.

   Any input from the developers?

Regards,
Luis

Shivaprasad Kambalimath [EMAIL PROTECTED] wrote: 

We use EMC's BCV for OCFS2 split mirror and it works just fine (10.2.0.3
and RH Linux 4). The only problem is we can't back up to tape directly
from ocfs2 (we are using Legato, and I haven't heard of any other third-party
s/w doing this without RMAN). So we copy files to ext3, zip them,
and then take a tape backup.

Thanks, 
Shiva Kambalimath

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, October 08, 2007 12:00 PM
To: ocfs2-users@oss.oracle.com
Subject: Ocfs2-users Digest, Vol 46, Issue 7

Send Ocfs2-users mailing list submissions to
 ocfs2-users@oss.oracle.com

To subscribe or unsubscribe via the World Wide Web, visit
 http://oss.oracle.com/mailman/listinfo/ocfs2-users
or, via email, send a message with subject or body 'help' to
 [EMAIL PROTECTED]

You can reach the person managing the list at
 [EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific
than Re: Contents of Ocfs2-users digest...


Today's Topics:

   1. OCF2 and LVM (Riccardo Paganini)
   2. Re: OCF2 and LVM (Jordi Prats)
   3. Re: OCF2 and LVM (Alexei_Roudnev)


--

Message: 1
Date: Mon, 08 Oct 2007 12:35:54 +0200
From: Riccardo Paganini 
Subject: [Ocfs2-users] OCF2 and LVM
To: ocfs2-users@oss.oracle.com
Message-ID: 
Content-Type: text/plain;charset=iso-8859-1;format=flowed

Does anybody know if there is a certified procedure to back up 
a RAC DB 10.2.0.3 based on OCFS2 
via split mirror or snapshot technology?


Using Linux LVM and OCFS2, does anybody know if it is 
possible to dynamically extend an OCFS2 filesystem 
once the underlying LVM volume has been extended?

Thanks in advance
Riccardo Paganini



--

Message: 2
Date: Mon, 08 Oct 2007 20:20:08 +0200
From: Jordi Prats 
Subject: Re: [Ocfs2-users] OCF2 and LVM
To: Riccardo Paganini 
Cc: ocfs2-users@oss.oracle.com
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,
I'm not an expert on Oracle, but I don't think a snapshot is a 
recommended backup solution. You should use the Data Pump or exp utilities 
to back up your database.

To extend your fs, tunefs.ocfs2 requires you to unmount it, according to
its man page:

==
tunefs.ocfs2 is used to adjust OCFS2 file system parameters on disk. 
In order to prevent data loss, tunefs.ocfs2 will not perform any 
action on the specified device if it is mounted on any node in the 
cluster. This tool requires the O2CB cluster to be online.
==

The concrete command should be: tunefs.ocfs2 -S /dev/LVM_volume

I don't know if there's any way to extend it without unmounting it.

If you want to extend your fs without unmounting it, you should use ASM 
instead of OCFS2.
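Putting the two steps together, and assuming hypothetical volume group and mount point names (vg01/ocfs2lv mounted at /u02 - these names are not from the original mails), the offline resize would look roughly like this:

```shell
# All names here are hypothetical. The filesystem must be unmounted
# on EVERY cluster node, with the O2CB cluster still online.
umount /u02                           # repeat on each node
lvextend -L +10G /dev/vg01/ocfs2lv    # grow the underlying LVM volume
tunefs.ocfs2 -S /dev/vg01/ocfs2lv     # grow OCFS2 to fill the device
mount /u02                            # remount on each node
```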

regards,
Jordi

Riccardo Paganini wrote:
 Does anybody know if there is a certified procedure to back up a RAC 
 DB 10.2.0.3 based on OCFS2 
 via split mirror or snapshot technology?
 
 
 Using Linux LVM and OCFS2, does anybody know if it is possible to 
 dynamically extend an OCFS2 filesystem 
 once the underlying LVM volume has been extended?
 
 
 Thanks in advance
 Riccardo Paganini
 
 
 



--

Message: 3
Date: Mon, 8 Oct 2007 11:49:29 -0700
From: Alexei_Roudnev 
Subject: Re: [Ocfs2-users] OCF2 and LVM
To: Jordi Prats , Riccardo Paganini
 
Cc: ocfs2-users@oss.oracle.com
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; format=flowed; charset=iso-8859-1;
 reply-type=response

I don't know of any LVM technology that can make a snapshot in a clustered 
environment, so better forget this idea.

The recommended method for Oracle backups is RMAN. Any deviation from this 
brings you into a swamp of possible Oracle bugs.


- Original Message - 
From: Jordi Prats 
To: Riccardo Paganini 
Cc: 
Sent: Monday, October 08, 2007 11:20 AM
Subject: Re: [Ocfs2-users] OCF2 and LVM


 Hi,
 I'm not an expert on Oracle, but I don't think a snapshot is a 
 recommended backup solution. You should use the Data Pump or exp utilities 
 to back up your database.

 To extend your fs, tunefs.ocfs2 requires you to unmount it, according to 
 its man page:

 ==
 tunefs.ocfs2 is used to adjust OCFS2 file system parameters on disk. In 
 order to prevent data loss, tunefs.ocfs2 will not perform any action 
 on the specified device if it is mounted on any node in the cluster. This 
 tool requires the O2CB cluster to be online.
 ==

 The concrete command should be: tunefs.ocfs2 -S /dev/LVM_volume

 I don't know if there's any way to extend it without unmounting it.

RE: [Ocfs2-users] OCFS2 mount problem at Linux reboot when device names are non persistent.

2007-08-27 Thread Luis Freitas
 
  Has anyone got this working on EMC PowerPath?
   
  I remember seeing somewhere some special configuration to get mounting by 
label working with multipath devices, but I could not find it again.
   
  Regards,
  Luis

Ricardo Fernandez [EMAIL PROTECTED] wrote:
  Hi people, 

Thanks a lot for your answer!
It worked perfectly.

Best regards
Ricardo



-Original Message-
From: Sunil Mushran [mailto:[EMAIL PROTECTED] 
Sent: viernes, 24 de agosto de 2007 14:26
To: Randy Ramsdell
Cc: ocfs2-users@oss.oracle.com; Ricardo Fernandez
Subject: Re: [Ocfs2-users] OCFS2 mount problem at Linux reboot when
device names are non persistent.

While mount-by-UUID will work, mount-by-label should also work.
The one gotcha in the latter is that it expects the device to be
partitioned. That is, it will not mount by label if the device is /dev/sda,
but it will if the device is /dev/sda1, sda2, etc.
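To make the gotcha concrete, here is a hedged sketch (the device names and the label are invented for illustration):

```shell
# Label written at format time; /dev/sdb1 is a partition.
mkfs.ocfs2 -L oradata /dev/sdb1
mount -L oradata /mnt/oradata    # found: the label is on a partition
# Had the filesystem been created on the whole disk (/dev/sdb),
# mount -L oradata would fail to locate the label.
```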

Randy Ramsdell wrote:
 Ricardo Fernandez wrote:
 
 Hi,
 
 I have the following problem when the servers accessing OCFS2 reboot:

 as the Linux device names are non persistent, at reboot they usually 
 change, and then OCFS2 can't mount the device because it is expecting

 a different device name as stated in the fstab file. (it is specified

 in the format /dev/sdx as the instructions of the OCFS2 installation 
 manual say) If I change the device name to the new name, it works 
 fine. But this is not an acceptable solution, as each node should be 
 able to start in a fully automatic way. (without human intervention)
 
 I thought that the purpose of the disk LABEL I added when 
 formatting the partition with OCFS2 was exactly this (am I right?). I 
 changed the fstab to use the LABEL option, and also tried to mount it 
 from the command line using the LABEL option, but it didn't work. Is 
 there any bug or known issue on this topic? I guess that if I pin 
 the device name with udev it will work, but I would really expect 
 OCFS2 to solve this problem (it is not a new one, and most of 
 the file systems I know can handle it).
 
 I would appreciate any help on this topic.
 
 
 Thanks a lot
 Ricardo
 
 I work with: 
 
 RHEL 4
 Local SCSI devices
 External devices locates in an EVA8000 SAN, accessed through a fibre
channel bus. The OCFS2 file system is on one of these.
 

 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users
 
 
 Do not use a label; use the UUID name and the _netdev fstab option.
 This is the UUID of a volume we have:
 /dev/disk/by-uuid/be12775a-ec1c-4ed7-a06b-f30a081a0603

 UUIDs are unique and never change, so they are ideal for what you are 
 describing.
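As a sketch of what that looks like in /etc/fstab (the mount point and options are assumptions; the UUID is the example above):

```shell
# /etc/fstab: mount by UUID so the entry survives device renaming;
# _netdev defers mounting until the network (and O2CB) is up.
/dev/disk/by-uuid/be12775a-ec1c-4ed7-a06b-f30a081a0603  /u02  ocfs2  _netdev,defaults  0 0
```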

 Randy Ramsdell

 ___
 Ocfs2-users mailing list
 Ocfs2-users@oss.oracle.com
 http://oss.oracle.com/mailman/listinfo/ocfs2-users
 


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


   

Re: [Ocfs2-users] Urgent :: 11i on OCFS2.. I mean APPL_TOP, COMMON_TOP etc..

2007-08-24 Thread Luis Freitas
 
  Does anyone have comments on the shared mmap?
   
  What could happen if a file is mmapped by, say, 3000 different processes in a 4 
node cluster, and someone went there and changed its contents? This is a 
possible situation in a large production APPS environment, or in any Forms-based 
application.
   
  Regards,
  Luis
  
Alexei_Roudnev [EMAIL PROTECTED] wrote:
  It's really a good question. We should know _exactly which 
combinations are used internally_ (OS, file system, NAS/SAN).
   
  In reality those particular combinations are the most stable and preferred ones 
- no matter what is _officially certified_.
   
- Original Message - 
  From: Luis Freitas 
  To: [EMAIL PROTECTED] ; ocfs2-users@oss.oracle.com 
  Sent: Wednesday, August 22, 2007 6:46 PM
  Subject: Re: [Ocfs2-users] Urgent :: 11i on OCFS2.. I mean 
APPL_TOP,COMMON_TOP etc..
  

  Prabhakar,
   
   This post is only my personal opinion; I do not work for Oracle nor have 
any close contact with the support and development groups. But I have never heard of 
anyone using this combination.
   
   This is kind of a gray area, since Oracle usually provides support for 
OCFS2 in conjunction with the RAC database. The APPS people are a different group 
internally at Oracle, so even if this is officially supported you could get 
into situations where the support analysts start to redirect the TAR between 
these groups, or argue that since the issues are not related to the Oracle 
Database or its binaries they will not fix them.
   
  I would not do it except with a very large customer that could escalate 
any issues to get a fast resolution from support. It is probably better to get 
NAS storage and map it using NFS.
   
 That said, since it is supported for the database binaries, there should 
not be immediate problems with other files. The FAQ about shared APPL_TOP on 
Metalink says that any disk-sharing technology can be used.
   
  One thing that comes to my attention is that Oracle Forms uses memory-mapped 
files extensively. I don't know how completely this is implemented on 
OCFS2 and whether it works well. Can anyone comment on this? There are some posts 
on this list about problems with shared mmap.
   
  Regards,
  Luis
  

[EMAIL PROTECTED] wrote:
  All,

Is Shared APPL_TOP certified with 11i on OCFS2? Please send me the information 
as soon as possible..

We have two RAC nodes on ASM file system and one middle tier right now. Now 
want to setup Shared APPL_TOP so need your help whether this is a certied with 
OCFS2 or not.

OS is SUSE 9 SP3.

Regards,
Prabhakar Tammina.
(614)598-3487(cell).


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users



   

Re: [Ocfs2-users] Using OCFS2 for Shared APPL_TOP

2007-08-24 Thread Luis Freitas
 
  I can't read Russian... Does it work?
   
  Regards,
  Luis

Alexei_Roudnev [EMAIL PROTECTED] wrote:
  There is discussion about it on www.sql.ru, but it is in Russian.


- Original Message - 
From: Santosh Udupa 
To: 
Sent: Thursday, August 23, 2007 12:04 PM
Subject: [Ocfs2-users] Using OCFS2 for Shared APPL_TOP



Hello all,

Has anybody seen any issue with using OCFS2 for shared APPL_TOP in 
E-Business Suite 11i and Release 12?

Thanks,
Santosh


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users




   

Re: [Ocfs2-users] cluvfy fails in pre crs install

2007-08-22 Thread Luis Freitas
Harry,
   
   Do you have DISPLAY set in the root shell? Try running xclock and see if 
it appears correctly.
   
   I am not sure about this, but I think you need to load the 
environment variables defined in your oracle account when running 
these tools as root. Usually I log on with the oracle account and do a su 
without the - parameter, so the environment is not overwritten.
   
  Also, while you are the root user, check whether any java 
executable on the path is not from the JDK bundled in ORACLE_HOME/jdk. 
This could be either GNU gcj or the IBM JDK, and it can cause problems with the 
graphical tools if used by mistake:
   
  which java
  java -version
   
  Regards,
  Luis
  
Harry Ronis [EMAIL PROTECTED] wrote:
Did this, blew away clusterware as per the Metalink note, reinstalled 
clusterware (this time properly filling in the two-node cluster), and got 
clusterware installed on both. Thanks.
$ ssh hoffman /apps/crs/oracle/product/10/app/bin/olsnodes -n
hackman 1
hoffman 2
[EMAIL PROTECTED] /apps/crs/oracle/product/10/app/bin
$ ./olsnodes -n
hackman 1
hoffman  2

Now this: need to manually run vipca as ROOT to configure the virtual IPs. 
When running vipca, this is what happens:
AS ROOT
[EMAIL PROTECTED] bin]# ./vipca
Exception in thread main [EMAIL PROTECTED] bin]#

[EMAIL PROTECTED] bin]# ./vipca
Exception in thread main [EMAIL PROTECTED] bin]#

I believe this is the LAST HUMP in clusterware


  -Original Message- 
From: Luis Freitas 
Sent: Aug 21, 2007 3:35 PM 
To: Harry Ronis , [EMAIL PROTECTED] 
Cc: ocfs2-users@oss.oracle.com 
Subject: Re: [Ocfs2-users] cluvfy fails in pre crs install 

  Harry,
   
  From this it looks like you forgot to fill in the second node's details in the 
installer. There is a screen at the beginning of the installation where the 
installer asks for all the node hostnames, public node interfaces, and private 
node interfaces, and it brings up only the first node's details as defaults. This 
root.sh and the rest of the installation get the node names from that screen.
   
  But the only way to confirm this would be to look at the installer logs.
   
   You probably can add the second node, but it may be easier to clean up and 
reinstall. There is a note with a procedure to manually remove CRS on Metalink; 
you need to stop the services, restore inittab, and remove some rc.d scripts.
   
  Regards,
  Luis

Harry Ronis [EMAIL PROTECTED] wrote:
  Why is it only seeing 1 node when it runs root.sh -- second node empty?


  -Original Message- 
From: Harry Ronis 
Sent: Aug 21, 2007 1:03 PM 
To: Luis Freitas , [EMAIL PROTECTED] 
Cc: ocfs2-users@oss.oracle.com 
Subject: Re: [Ocfs2-users] cluvfy fails in pre crs install 

Running install -- it is asking to run root.sh from hackman only.
It didn't populate anything on hoffman yet -- is this normal?

Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node nodenumber: nodename private interconnect name hostname
node 1: hackman hackman-priv hackman
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /ocfs2/crs/vdisk1
Now formatting voting device: /ocfs2/crs/vdisk2
Now formatting voting device: /ocfs2/crs/vdisk3
Format of 3 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
hackman
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), eth0 is not public. Public interfaces should be used 
to configure virtual IPs.


  -Original Message- 
From: Luis Freitas 
Sent: Aug 21, 2007 11:25 AM 
To: Harry Ronis , [EMAIL PROTECTED] 
Cc: ocfs2-users@oss.oracle.com 
Subject: Re: [Ocfs2-users] cluvfy fails in pre crs install 

  Harry,
   
 This tool fails here too if I put these -c and -q parameters for the 
storage checks. Even so RAC is installed and running. I would not pay much 
attention to this at this point. 
   
  Also as far as I know these parameters are originaly intended to work 
with raw devices. It is a generic tool that run in all

Re: [Ocfs2-users] cluvfy fails in pre crs install

2007-08-21 Thread Luis Freitas
Harry,
   
 This tool fails here too if I put these -c and -q parameters for the 
storage checks. Even so, RAC is installed and running. I would not pay much 
attention to this at this point. 
   
  Also, as far as I know these parameters are originally intended to work 
with raw devices. It is a generic tool that runs on all platforms, and the 
developer probably did not pay much attention to OCFS2, which is specific 
to the Linux platform.
   
  That error in the VIP checking is also documented, and you can ignore it 
at this point. But you will probably get an error at the end of the CRS install, 
and in that case there is a manual procedure that needs to be done with vipca 
before you start to install the database.
   
  touch /ocfs2/ocr2 
  touch /ocfs2/vdisk1
   
  Regards,
  Luis

Harry Ronis [EMAIL PROTECTED] wrote:
Am at a LOSS.
Shared storage seems OK -- yet cluvfy FAILS for ANY shared storage command:
[EMAIL PROTECTED] /apps/oracle/stage/clusterware/cluvfy
$ ./runcluvfy.sh comp ssa  -n hackman,hoffman -verbose

Verifying shared storage accessibility

Checking shared storage accessibility...


Shared storage check failed on nodes hoffman,hackman.

Verification of shared storage accessibility was unsuccessful on all the nodes.
[EMAIL PROTECTED] /apps/oracle/stage/clusterware/cluvfy


  -Original Message- 
From: Marcos E. Matsunaga 
Sent: Aug 21, 2007 8:26 AM 
To: hr 
Cc: ocfs2-users@oss.oracle.com 
Subject: Re: [Ocfs2-users] cluvfy fails in pre crs install 

Try to create the ocr/voting under a directory on /ocfs2. If I'm not wrong, 
there was a bug on CRS install identifying ocr/voting disks directly under the 
mountpoint.

Also, make sure both nodes can mount the same partition. Create a file on one 
node and see if you can remove it from the other node.

Regards,Marcos Eduardo MatsunagaOracle USA  Linux Engineering  


hr wrote:

./runcluvfy.sh stage -pre crsinst -n hackman,hoffman -c /ocfs2/ocr2 -q /ocfs2/vdisk1 -verbose
   
  output
   
  Performing pre-checks for cluster services setup

Checking node reachability...

Check: Node reachability from node hackman
Destination Node Reachable?
 
hackman yes
hoffman yes
Result: Node reachability check passed from node hackman.


Checking user equivalence...

Check: User equivalence for user oracle
Node Name Comment
 
hoffman passed
hackman passed
Result: User equivalence check passed for user oracle.

Checking administrative privileges...

Check: Existence of user oracle
Node Name User Exists Comment
  
hoffman yes passed
hackman yes passed
Result: User existence check passed for oracle.

Ch

Administrative privileges check failed.

Checking node connectivity...


Suitable interfaces for the private interconnect on subnet 10.32.250.0:
hoffman eth0:10.32.250.175
hackman eth0:10.32.250.155

Suitable interfaces for the private interconnect on subnet 192.168.1.0:
hoffman eth1:192.168.1.2
hackman eth1:192.168.1.1

ERROR:
Could not find a suitable set of interfaces for VIPs.

Result: Node connectivity check failed.


Checking shared storage accessibility...

ERROR: /ocfs2/ocr2
Could not find the storage


Shared storage check failed on nodes hoffman,hackman.

Checking shared storage accessibility...

ERROR: /ocfs2/vdisk1
Could not find the storage


Shared storage check failed on nodes hoffman,hackman.

Checking system requirements for 'crs'...
No checks registered for this product.




  Harry Ronis
  cell: 646 529 8853
  home: 718 224 9793
  work: 212 465 2826
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

   

Re: [Ocfs2-users] cluvfy fails in pre crs install

2007-08-21 Thread Luis Freitas
Harry,
   
  From this it looks like you forgot to fill in the second node's details in the 
installer. There is a screen at the beginning of the installation where the 
installer asks for all the node hostnames, public node interfaces, and private 
node interfaces, and it brings up only the first node's details as defaults. This 
root.sh and the rest of the installation get the node names from that screen.
   
  But the only way to confirm this would be to look at the installer logs.
   
   You probably can add the second node, but it may be easier to clean up and 
reinstall. There is a note with a procedure to manually remove CRS on Metalink; 
you need to stop the services, restore inittab, and remove some rc.d scripts.
   
  Regards,
  Luis

Harry Ronis [EMAIL PROTECTED] wrote:
Why is it only seeing 1 node when it runs root.sh -- second node empty?


  -Original Message- 
From: Harry Ronis 
Sent: Aug 21, 2007 1:03 PM 
To: Luis Freitas , [EMAIL PROTECTED] 
Cc: ocfs2-users@oss.oracle.com 
Subject: Re: [Ocfs2-users] cluvfy fails in pre crs install 

Running install -- it is asking to run root.sh from hackman only.
It didn't populate anything on hoffman yet -- is this normal?

Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node nodenumber: nodename private interconnect name hostname
node 1: hackman hackman-priv hackman
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /ocfs2/crs/vdisk1
Now formatting voting device: /ocfs2/crs/vdisk2
Now formatting voting device: /ocfs2/crs/vdisk3
Format of 3 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
hackman
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), eth0 is not public. Public interfaces should be used 
to configure virtual IPs.


  -Original Message- 
From: Luis Freitas 
Sent: Aug 21, 2007 11:25 AM 
To: Harry Ronis , [EMAIL PROTECTED] 
Cc: ocfs2-users@oss.oracle.com 
Subject: Re: [Ocfs2-users] cluvfy fails in pre crs install 

  Harry,
   
 This tool fails here too if I put these -c and -q parameters for the 
storage checks. Even so, RAC is installed and running. I would not pay much 
attention to this at this point. 
   
  Also, as far as I know these parameters are originally intended to work 
with raw devices. It is a generic tool that runs on all platforms, and the 
developer probably did not pay much attention to OCFS2, which is specific 
to the Linux platform.
   
  That error in the VIP checking is also documented, and you can ignore it 
at this point. But you will probably get an error at the end of the CRS install, 
and in that case there is a manual procedure that needs to be done with vipca 
before you start to install the database.
   
  touch /ocfs2/ocr2 
  touch /ocfs2/vdisk1
   
  Regards,
  Luis

Harry Ronis [EMAIL PROTECTED] wrote:
  Am at a LOSS.
Shared storage seems OK -- yet cluvfy FAILS for ANY shared storage command:
[EMAIL PROTECTED] /apps/oracle/stage/clusterware/cluvfy
$ ./runcluvfy.sh comp ssa  -n hackman,hoffman -verbose

Verifying shared storage accessibility

Checking shared storage accessibility...


Shared storage check failed on nodes hoffman,hackman.

Verification of shared storage accessibility was unsuccessful on all the nodes.
[EMAIL PROTECTED] /apps/oracle/stage/clusterware/cluvfy


  -Original Message- 
From: Marcos E. Matsunaga 
Sent: Aug 21, 2007 8:26 AM 
To: hr 
Cc: ocfs2-users@oss.oracle.com 
Subject: Re: [Ocfs2-users] cluvfy fails in pre crs install 

Try to create the ocr/voting under a directory on /ocfs2. If I'm not wrong, 
there was a bug on CRS install identifying ocr/voting disks directly under the 
mountpoint.

Also, make sure both nodes can mount the same partition. Create a file on one 
node and see if you can remove it from the other node.

Regards,Marcos Eduardo MatsunagaOracle USA  Linux Engineering  


hr wrote:

./runcluvfy.sh stage -pre crsinst -n hackman,hoffman -c /ocfs2/ocr2 -q /ocfs2/vdisk1 -verbose
   
  output
   
  Performing pre-checks for cluster services setup

Checking node reachability

Re: [Ocfs2-users] Replication not works

2007-08-14 Thread Luis Freitas
Yohan,
   
 The device you are using for OCFS2 needs to be shared between the nodes, either 
a shared device on external storage or a mirrored device using DRBD or iSCSI. 
For Oracle database use in a certified configuration you need external 
storage, as DRBD is not an officially supported option for Oracle Database.
   
  OCFS2 does not replicate the data; it only manages the filesystem so 
that it can be mounted simultaneously on both nodes.
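For the mirrored-device option, the replication happens in the block layer underneath OCFS2. A rough sketch of a dual-primary DRBD resource (the hostnames, devices, and addresses are hypothetical, and as noted above this is not a supported configuration for Oracle Database):

```shell
# /etc/drbd.d/r0.res (hypothetical): DRBD mirrors /dev/sda3 between
# the two nodes; allow-two-primaries lets OCFS2 mount on both at once.
resource r0 {
  net { allow-two-primaries; }
  on t1 { device /dev/drbd0; disk /dev/sda3; address 192.168.1.1:7788; meta-disk internal; }
  on t2 { device /dev/drbd0; disk /dev/sda3; address 192.168.1.2:7788; meta-disk internal; }
}
```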
   
  Regards,
  Luis

Yohan [EMAIL PROTECTED] wrote:
  Dr J Pelan wrote:
 On Fri, 10 Aug 2007, Yohan wrote:
 
 [EMAIL PROTECTED]:~# mounted.ocfs2 -f
 Device FS Nodes
 /dev/sda3 ocfs2 t1

 [EMAIL PROTECTED]:~# mounted.ocfs2 -f
 Device FS Nodes
 /dev/sda3 ocfs2 t2
 
 Tell us about /dev/sda.
/dev/sda is the same on the two nodes

Disk /dev/sda: 73.4 GB
Device Boot Start End Blocks Id System
/dev/sda1 * 1 2432 19535008+ 83 Linux
/dev/sda2 2433 2675 1951897+ 82 Linux swap / Solaris
/dev/sda3 2676 2800 1004062+ 83 Linux

Did I make a mistake here?

 The use of the word 'replication' is interesting.
 
My first goal is to have clusters of 3 nodes. Each node needs to be a 
mirror of the others, and all of them are accessed read/write.
Isn't that what ocfs2 does?


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


   

Re: [Ocfs2-users] Kernel panic with OCFS2 1.2.6 for EL5

2007-07-05 Thread Luis Freitas
 
  I always get these annoying messages with EMC storage arrays.
   
 They seem to present an unusable LUN to the operating system, and 
the kernel keeps trying to access it.
   
 On RHEL 4 these messages appear in dmesg; I remember RHEL 3 showed 
them in /var/log/messages as well:
   
  SCSI error : 0 0 0 0 return code = 0x2
SCSI error : 0 0 0 0 return code = 0x2
SCSI error : 1 0 0 1 return code = 0x2
SCSI error : 0 0 0 0 return code = 0x2
SCSI error : 0 0 0 0 return code = 0x2
SCSI error : 1 0 0 0 return code = 0x2

 Does anyone know how to prevent the kernel from trying to access this LUN?
   
  Regards,
  Luis Freitas

Daniel [EMAIL PROTECTED] wrote:
  Hello

System: Two brand new Dell 1950 servers with dual Intel Quadcore Xeon connected 
to an EMC CX3-20 SAN. Running CentOS 5 x86_64 - both with kernel 
2.6.18-8.1.6-el5 x86_64.

I just noticed a panic on one of the servers: 

Jul  2 04:08:52 megasrv2 kernel: (3568,2):dlm_drop_lockres_ref:2289 ERROR: 
while dropping ref on 
87B24E40651A4C7C858EF03ED6F3595F:M00021af916b7dfbde4 (master=0) got 
-22.
Jul  2 04:08:52 megasrv2 kernel: (3568,2):dlm_print_one_lock_resource:294 
lockres: M00021af916b7dfbde4, owner=0, state=64 
Jul  2 04:08:52 megasrv2 kernel: (3568,2):__dlm_print_one_lock_resource:309 
lockres: M00021af916b7dfbde4, owner=0, state=64
Jul  2 04:08:52 megasrv2 kernel: (3568,2):__dlm_print_one_lock_resource:311   
last used: 4747810336, on purge list: yes 
Jul  2 04:08:52 megasrv2 kernel: (3568,2):dlm_print_lockres_refmap:277   refmap 
nodes: [ ], inflight=0
Jul  2 04:08:52 megasrv2 kernel: (3568,2):__dlm_print_one_lock_resource:313   
granted queue: 
Jul  2 04:08:52 megasrv2 kernel: (3568,2):__dlm_print_one_lock_resource:328   
converting queue: 
Jul  2 04:08:52 megasrv2 kernel: (3568,2):__dlm_print_one_lock_resource:343   
blocked queue: 
Jul  2 04:08:52 megasrv2 kernel: --- [cut here ] - [please bite 
here ] -

After booting the server I'm getting a lot of the following messages: 

Jul  5 11:09:54 megasrv2 kernel: Additional sense: Logical unit not ready, 
manual intervention required
Jul  5 11:09:54 megasrv2 kernel: end_request: I/O error, dev sdd, sector 0
Jul  5 11:09:54 megasrv2 kernel: Buffer I/O error on device sdd, logical block 
0 
Jul  5 11:09:54 megasrv2 kernel: sd 1:0:0:2: Device not ready: 6: Current: 
sense key: Not Ready

But I guess this one has something to do with EMC PowerPath as sdd is not a 
valid device. And there is no PowerPath for use with RHEL5 yet... 

I'm sorry I haven't had the time to investigate this much. But right now I have 
no clue what caused this panic, or if it will happen again...

Re: [Ocfs2-users] Best way to use multiple ocvfs2 volumes

2007-06-15 Thread Luis Freitas
Randy,
 
 If you have one big volume with several partitions, you will have to stop 
all the servers whenever you want to create or delete a partition, or be 
extremely careful. With LVM2 and CLVM you could work with one big disk without 
this kind of trouble.

Personally I prefer having multiple small volumes and letting the storage 
do the hard work. You will still need to reboot the server or reload the HBA 
module to force a bus scan in order to see new volumes, but you can do this one 
server at a time. 

  Before doing this kind of reorganization on production with everything 
running, I would recommend testing it well. RedHat 2.1 used to rescan the bus 
dynamically and I had some bad experiences, like sda suddenly becoming sdb with 
the filesystems mounted. I believe current kernels handle this better, and you 
should have no trouble if you are using software like PowerPath.

 I do not know if there is any way to force a bus rescan without reloading 
the HBA module. Suggestions, anyone?
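For what it's worth, 2.6 kernels expose a rescan trigger in sysfs, so a reboot or module reload may not be needed. A hedged sketch; the sysfs path is the standard 2.6 location, and the root directory is a parameter only so the function can be exercised safely before pointing it at the real tree:

```shell
# Ask every SCSI host to rescan its bus by writing the "- - -" wildcard
# (channel target lun) to its sysfs scan file.
rescan_scsi_hosts() {
    root="${1:-/sys/class/scsi_host}"
    for host in "$root"/host*; do
        [ -w "$host/scan" ] || continue
        echo "- - -" > "$host/scan"
        echo "rescanned ${host##*/}"
    done
}
```

Newly discovered LUNs then show up in dmesg; device names can still shift across rescans, so multipath software or persistent names remain advisable, as noted above.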

Regards,
Luis

Marcos E. Matsunaga [EMAIL PROTECTED] wrote:
Randy,
 
 You can either create multiple partitions in a volume and mount them or just 
use the whole volume, it doesn't matter. Make sure the disks are accessible on 
all nodes, define your cluster, format and mount. Works just like other FS, 
with the exception of the cluster part.
 
 A single cluster can handle a large number of partitions without problem.
 
 Randy Ramsdell wrote:
Hi,

What would be the best way to implement multi-use of a ocfs2 cluster?

We need to have several (2-4) different mount points that provide
clustered data for different services. I don't think we can have 2
different clusters on 1 machine so would mounting multiple ocfs2
partitions/volumes for each of these services on the same cluster work?
Or what would be the best way to handle this?

We could use 1 large volume with multiple partitions or several volumes
with a partition/raw device.

Thanks,
Randy Ramsdell

 
-- 

Regards,

Marcos Eduardo Matsunaga

Oracle USA
Linux Engineering




Re: [Ocfs2-users] PBL with RMAN and ocfs2

2007-05-10 Thread Luis Freitas
Gaetano,
   
  If o2cb or CRS is killing the machine, it usually shows up in 
/var/log/messages with lines explaining what happened. Take a look at 
/var/log/messages just before the last "syslogd x.x.x: restart" entry.
   
  Regards,
  Luis
  


Gaetano Giunta wrote:
 Hello.

 On a 2 node RAC 10.2.0.3 setup, on RH ES 4.4 x86_64, with ocfs2 1.2.5-1, we 
 are experiencing some trouble with RMAN: when the archive log destination is 
 on an ASM partition, and the backup destination is on ocfs2, running

 backup archivelog all format 
 '/home/SANstorage/oracle/backup/rman/dump_log/FULL_20070509_154916/arc_%d_%u' 
 delete input;

 consistently causes a reboot.

 The rman catalog is clean, and has been crosschecked in every way.

 We tried on both nodes, and the node executing the backup always reboots.
 I am thus inclined to think that it is not the ocfs2 dlm that triggers the 
 reboot, because in that case the victim would always be the second node.

 I also tested the same command using as backup destination /tmp, and all was 
 fine. The backup file of the archived logs is 1249843712 in size.

 Our local oracle guy went through metalink and said there is no open 
 bug/patch for that at this time.

 Any suggestions ???

 Thanks
 Gaetano Giunta

 
 


RE: [Ocfs2-users] Hi

2007-05-07 Thread Luis Freitas
Ulf,
   
  I have implemented a few RAC databases and worked with OCFS, OCFS2, 
NetApp and ASM on different machines.
   
 In my opinion, OCFS2 currently seems to be rather stable for Oracle use, 
except for some race conditions with CRS when the CRS files are on OCFS2, and 
the small default timeouts. Both can be tuned or worked around so that they do 
not cause major headaches.
   
  One thing that caused me trouble was that 10g has asynchronous I/O 
enabled by default (a change from 9i), and the async I/O contexts deplete 
very quickly if the limit is not raised when using OCFS2. Somehow this seems 
to be completely undocumented except for some old RedHat papers, and it seems 
to cause problems in CRS. You can see it happening when 
/proc/sys/fs/aio-nr reaches the value in /proc/sys/fs/aio-max-nr.
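The depletion described above can be watched directly. A small sketch; `aio-nr`/`aio-max-nr` are the standard fs sysctls, and the proc root is a parameter only so the function can be exercised against sample files:

```shell
# Report async I/O context usage against the system-wide limit, and warn
# once usage crosses 90% of fs.aio-max-nr (the point at which new
# io_setup() calls are about to start failing).
aio_usage() {
    root="${1:-/proc/sys/fs}"
    nr=$(cat "$root/aio-nr")
    max=$(cat "$root/aio-max-nr")
    echo "aio contexts: $nr / $max"
    [ $((nr * 10)) -ge $((max * 9)) ] && echo "WARNING: raise fs.aio-max-nr"
    return 0
}
```

When the warning fires, raising the limit (e.g. `sysctl -w fs.aio-max-nr=...`, persisted in /etc/sysctl.conf) is the usual remedy.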
   
  Aside from that, the problems I usually have are with optimizer changes or 
some 10g-specific bugs causing ORA-600. One thing that is very important is to 
gather statistics, including system statistics, as there is a big change in the 
optimizer: it now also considers CPU cost when choosing the execution plan. 
This optimizer change also appears when applying 9.2.0.7 and 9.2.0.8.
   
  Hope you have better luck on the next try.
   
  Regards,
  Luis
  
Ulf Zimmermann [EMAIL PROTECTED] wrote:
  
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:ocfs2-users-
 [EMAIL PROTECTED] On Behalf Of Sunil Mushran
 Sent: 05/07/2007 10:47
 To: Alexei_Roudnev
 Cc: Ocfs2-users@oss.oracle.com
 Subject: Re: [Ocfs2-users] Hi
 
 None of what you have written allows you to use our resources to
 spread your opinions as official recommendation.
 
 Alexei_Roudnev wrote:
 
  Oracle itself does not have a SINGLE opinion (to be curious, I heard a
  strong recommendation against OCFSv2 from Oracle support, which I cannot
  agree with), so we can't treat your recommendations as official either -
  you are interested in OCFSv2 while users are not (users are interested in
  making our data centers run smoothly). The only _official_ thing is the
  _certification matrix_.
 
  - Original Message -
  Alexei,
  While you are free to use this forum to share your opinions, do not
  couch these opinions as official recommendations. When push comes to
  shove, we are helping users not you. We develop, build, distribute
  the software, not you. So it may serve to community better if you
  let us offer the official recommendations and not you.
 
  Sunil

Just to add some comments from a user of Oracle 9i with OCFSv1 on RedHat
AS2.1 who tried to upgrade to EL4 and OCFSv2 and failed miserably:

Oracle support pretty much told us the problems we were running into are
problems of OCFSv2, and they weren't really willing to help us. The
feeling we got was that two Oracle departments (the one writing
the Database RAC engine and the one writing OCFSv2) are fighting with
each other.

In general I have a very low opinion of Oracle and their quality of code
and tools. Like patch revision numbering? Does not exist. Patch tools
supposed to patch all machines in a cluster? You wish. Decent error
messages? They have never heard of those.

We ended up with staying on AS2.1 and OCFSv1 for now and just migrating
our data to a new SAN.

Regards, Ulf.

-
ATC-Onlane Inc., T: 650-532-6382, F: 650-532-6441
4600 Bohannon Drive, Suite 100, Menlo Park, CA 94025
-


Re: [Ocfs2-users] High on buffers and deep on swap

2007-04-09 Thread Luis Freitas
Alexei,
   
 How can I relate the information on slabtop to the actual memory used by 
buffers?
   
 I see this on slabtop:
   
   Active / Total Objects (% used): 603822 / 649643 (92.9%)
 Active / Total Slabs (% used)  : 47216 / 47216 (100.0%)
 Active / Total Caches (% used) : 97 / 133 (72.9%)
 Active / Total Size (% used)   : 176601.08K / 181508.49K (97.3%)
 Minimum / Average / Maximum Object : 0.01K / 0.28K / 128.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
202461 202451  99%    0.54K   2892       37    115692K ext3_inode_cache
232206 232153  99%    0.15K   8931       26     35724K dentry_cache
 29974  26092  87%    0.27K   2141       14      8564K radix_tree_node
 68250  62400  91%    0.05K    910       75      3640K buffer_head
   855    855 100%    4.00K    855        1      3420K pmd
   647    647 100%    4.00K    647        1      2588K size-4096
  8595   7694  89%    0.25K    573       15      2292K filp
 20835  17331  83%    0.09K    463       45      1852K vm_area_struct
   780    767  98%    2.00K    390        2      1560K size-2048
 18849   7962  42%    0.06K    309       61      1236K size-64
  2440    632  25%    0.50K    305        8      1220K size-512
  2926   2889  98%    0.34K    266       11      1064K inode_cache
   256    256 100%    3.00K    128        2      1024K biovec-(256)
   600    592  98%    1.38K    120        5       960K task_struct
   515    512  99%    1.38K    103        5       824K pirpIo
  5084   2365  46%    0.12K    164       31       656K size-128

   
The largest slab cache is only about 115 MB (ext3_inode_cache), but free 
shows over 3 GB in the cached column:
   
  [EMAIL PROTECTED] ~]$ free
             total       used       free     shared    buffers     cached
Mem:       5190736    4461420     729316          0     141836    3265464
-/+ buffers/cache:    1054120    4136616
Swap:      2048248          0    2048248
[EMAIL PROTECTED] ~]$
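To relate the slab numbers to the cached column of free, /proc/meminfo can be compared directly, since 2.6 kernels report the slab total there alongside the page cache. A minimal sketch; the meminfo path is a parameter only so it can be run against a saved sample:

```shell
# Print the buffer, page-cache, and slab totals from a meminfo file.
# Slab (where inode and dentry caches live) is accounted separately from
# Cached, which is why slabtop totals stay far below the "cached" column
# of free: "cached" is mostly page-cache pages, not slab objects.
meminfo_cache() {
    awk '/^(Buffers|Cached|Slab):/ { printf "%s %s kB\n", $1, $2 }' \
        "${1:-/proc/meminfo}"
}
```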

   
  Regards,
  Luis
   
  

Alexei_Roudnev [EMAIL PROTECTED] wrote:
  Did you run slabtop ? It can show unreleased buffers in the system.
   
- Original Message - 
  From: Luis Freitas 
  To: Alexei_Roudnev ; Brian Sieler ; ocfs2-users@oss.oracle.com 
  Sent: Monday, April 09, 2007 2:29 PM
  Subject: Re: [Ocfs2-users] High on buffers and deep on swap
  

  Alexei,
   
  Yes, it seems to have no effect, which is also very strange. On 2.4, 
vm.freepages had a very easy to notice effect.
   
 There are other people on the list having problems with buffers not being 
released, and some of them are forcing the kernel cache to be flushed with:
   
  echo 3 > /proc/sys/vm/drop_caches
   
  But I don't see this parameter on RHAS 4.0.
   
   Also, to be fair this seems to be a generic VM issue, I see this on 
servers that are not running ocfs2 too. And I only see this behavior on 
machines with more than 2Gb of memory.
   
  Regards,
  Luis
   
  Regards,
Luis

Alexei_Roudnev [EMAIL PROTECTED] wrote:
  Did you try the vm.swappiness parameter?
   
  (/proc/sys/vm/swappiness)
   
- 
  Original Message - 
  From: Brian Sieler 
  To: 'Luis Freitas' ; ocfs2-users@oss.oracle.com 
  Sent: Sunday, April 08, 2007 10:52 PM
  Subject: RE: [Ocfs2-users] High on buffers and deep on swap
  

Luis, yes I am experiencing what appears to be a similar problem you are 
describing. See my post from just a few minutes ago on another thread.
   
  I run a 2-node cluster with OCFS2/RAC on 2.6.9-34.0.2.ELsmp (RHEL 4.0) as 
well.
   
               total       used       free     shared    buffers     cached
  Mem:       4044496    4005516      38980          0      34108    2236636
  -/+ buffers/cache:    1734772    2309724
  Swap:      2097144     648244    1448900
   
  If you’ve uncovered anything since posting this message, please pass it along?
   
  
-
  
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Luis Freitas
Sent: Friday, February 23, 2007 5:32 PM
To: ocfs2-users@oss.oracle.com
Subject: [Ocfs2-users] High on buffers and deep on swap

   
Hi,

 

   This is a bit off topic, hope there is not a problem.

 

   Anyone out there experiencing high swapping with the kernel retaining a 
large amount of buffers? This used to be a problem on 2.4, and I usually 
changed /proc/sys/vm/freepages to fix it. But on 2.6 this parameter no longer 
exists.

 

One of the servers here is holding over 3.5Gb of cache even when using 
over 700Mb of swap, and free memory is always low.

 

[EMAIL PROTECTED] ~]$ free
             total       used       free     shared    buffers     cached
Mem:       5190736    4810880     379856          0     143032    3583868
-/+ buffers/cache:    1083980    4106756
Swap:      2048248     723064    1325184

Re: [Ocfs2-users] ocfs2 cluster becomes unresponsive

2007-03-13 Thread Luis Freitas
Andy,
   
 To diagnose this kind of hang, I found it helpful to keep a high-priority 
shell open on the server. This shell usually keeps working even during heavy 
swapping or other situations where the system becomes unresponsive. You can 
start one with this command:
   
  nice -n -20 bash
   
  From this shell you can run top or vmstat to see what is happening when 
the server is unresponsive. Just be careful not to run any command that might 
generate large output or use a lot of CPU, as you might hang the server 
yourself.
   
  Regards,
  Luis

Andy Kipp [EMAIL PROTECTED] wrote:
  I checked bugzilla and what is happening is almost identical to bug #819. 
However, the dead node continues to heartbeat, yet is unresponsive. No log 
output at all is generated on the dead node. This has been happening for a 
few months however frequency is increasing. Is there any information I can 
provide to hopefully figure this out?

- Andy
-- 

Andrew Kipp
Network Administrator
Velcro USA Inc.
Email: [EMAIL PROTECTED]
Work: (603) 222-4844

CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to 
which it is addressed and may contain confidential and/or privileged material. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply e mail and 
destroy all copies of the original message. If you are the intended recipient 
but do not wish to receive communications through this medium, please so advise 
immediately.


 On 3/9/2007 at 9:39 PM, in message [EMAIL PROTECTED], Sunil Mushran
wrote:
 File a bugzilla with the messages from all three nodes. Appears
 node 2 went down but kept heartbeating. Strange. The messages
 from node 2 may shed more light.
 
 Andy Kipp wrote:
 We are running OCFS2 on SLES9 machines using a FC SAN. Without warning both 
 nodes will become unresponsive. Can not access either machine via ssh or 
 terminal (hangs after typing in username). However the machine still responds 
 to pings. This continues until one node is rebooted, at which time the second 
 node resumes normal operations. 

 I am not entirely sure that this is an OCFS2 problem at all; however, the 
 syslog shows it had issues. Here is the log from the node that was not 
 rebooted. The node that was rebooted contained no log information. The system 
 appears to have gone down at about 3 AM, until the node was rebooted at 
 around 7:15.

 Mar 8 03:06:32 groupwise-1-mht kernel: o2net: connection to node 
 groupwise-2-mht (num 2) at 192.168.1.3: has been idle for 10 seconds, 
 shutting it down.
 Mar 8 03:06:32 groupwise-1-mht kernel: (0,2):o2net_idle_timer:1310 here are 
 some times that might help debug the situation: (tmr 1173341182.367220 now 
 1173341192.367244 dr 1173341182.367213 adv 
 1173341182.367228:1173341182.367229 func (05ce6220:2) 
 1173341182.367221:1173341182.367224)
 Mar 8 03:06:32 groupwise-1-mht kernel: o2net: no longer connected to node 
 groupwise-2-mht (num 2) at 192.168.1.3:
 Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_do_master_request:1330 
 ERROR: link to 2 went down!
 Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_get_lock_resource:914 
 ERROR: status = -112
 Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_send_proxy_ast_msg:458 
 ERROR: status = -107
 Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_flush_asts:607 ERROR: 
 status = -107
 Mar 8 03:19:54 groupwise-1-mht kernel: 
 (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
 Mar 8 03:19:54 groupwise-1-mht last message repeated 127 times
 Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_do_master_request:1330 
 ERROR: link to 2 went down!
 Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_get_lock_resource:914 
 ERROR: status = -107
 Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_do_master_request:1330 
 ERROR: link to 2 went down!
 Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_get_lock_resource:914 
 ERROR: status = -107
 Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_do_master_request:1330 
 ERROR: link to 2 went down!
 Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_get_lock_resource:914 
 ERROR: status = -107
 Mar 8 03:45:29 groupwise-1-mht -- MARK --
 Mar 8 04:15:02 groupwise-1-mht kernel: 
 (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
 Mar 8 04:15:03 groupwise-1-mht last message repeated 383 times
 Mar 8 06:27:54 groupwise-1-mht kernel: 
 (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
 Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
 Mar 8 06:27:54 groupwise-1-mht kernel: 
 (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
 Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
 Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_do_master_request:1330 
 ERROR: link to 2 went down!
 Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_get_lock_resource:914 
 ERROR: status = -107
 Mar 8 06:52:45 

Re: [Ocfs2-users] High on buffers and deep on swap

2007-02-24 Thread Luis Freitas
Hi,
   
  This particular server is 2.6.9-34.ELsmp (RedHat 4.0) as this is our 
production cluster and we did not update it recently.
   
  Regards,
  Luis

Sunil Mushran [EMAIL PROTECTED] wrote:
  Hmmm... the last time I saw your numbers, ocfs2's footprint was 15M.
You'll have to do better than that.

In any case, Luis's problem is the relationship between swap and cached buffers.
Which kernel is this?

John Lange wrote:
 It seems that ocfs has an unfixed memory leak even in the most recent
 version.

 I hope to make a more detailed bug report on Monday.

 John

 On Fri, 2007-02-23 at 15:31 -0800, Luis Freitas wrote:
 
 Hi,
 
 This is a bit off topic, hope there is not a problem.
 
 Anyone out there experiencing high swapping with the kernel
 retaining a large amount of buffers? This used to be a problem on 2.4,
 and I usually changed /proc/sys/vm/freepages to fix it. But on 2.6
 this parameter no longer exists.
 
 One of the servers here is holding over 3.5Gb of cache even when
 using over 700Mb of swap, and free memory is always low.
 
  [EMAIL PROTECTED] ~]$ free
               total       used       free     shared    buffers     cached
  Mem:       5190736    4810880     379856          0     143032    3583868
  -/+ buffers/cache:    1083980    4106756
  Swap:      2048248     723064    1325184
  [EMAIL PROTECTED] ~]$

 I am tuning /proc/sys/vm/swappiness, but this seems to have no
 effect at all. Changed from 60 to 10 and seems to have no effect. The
 server runs Oracle RAC with OCFS2.
 
 Regards,
 Luis
 
 

 



[Ocfs2-users] High on buffers and deep on swap

2007-02-23 Thread Luis Freitas
Hi,
   
 This is a bit off topic, hope there is not a problem.
   
 Anyone out there experiencing high swapping with the kernel retaining a 
large amount of buffers? This used to be a problem on 2.4, and I usually 
changed /proc/sys/vm/freepages to fix it. But on 2.6 this parameter no longer 
exists.
   
  One of the servers here is holding over 3.5Gb of cache even when using 
over 700Mb of swap, and free memory is always low.
   
  [EMAIL PROTECTED] ~]$ free
             total       used       free     shared    buffers     cached
Mem:       5190736    4810880     379856          0     143032    3583868
-/+ buffers/cache:    1083980    4106756
Swap:      2048248     723064    1325184
[EMAIL PROTECTED] ~]$

 I am tuning /proc/sys/vm/swappiness, but it seems to have no effect at 
all; I changed it from 60 to 10 and saw no difference. The server runs Oracle 
RAC with OCFS2.
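For reference, a sketch of checking and changing the knob. The vm root is a parameter only so the functions can be tried against a scratch copy; on a live box the default /proc/sys/vm path applies, and `sysctl -w vm.swappiness=10` is the equivalent:

```shell
# Read and set vm.swappiness via its proc file. A value written this way
# survives only until reboot; persist it in /etc/sysctl.conf.
get_swappiness() { cat "${1:-/proc/sys/vm}/swappiness"; }
set_swappiness() { echo "$2" > "${1:-/proc/sys/vm}/swappiness"; }
```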
   
  Regards,
  Luis
   
   

 

Re: Hmm, here is an example. Re: [Ocfs2-users] Also just a comment to the Oracle guys

2007-02-11 Thread Luis Freitas
Alexei,
 
 I think you have a point too; maybe OCFS2 could behave like NetApp and 
simply hang when there is a problem, leaving the fencing to CRS or whatever 
other clusterware is in use.
 
  Does anyone from Oracle have an opinion on this?
 
 Regards,
 Luis

Alexei_Roudnev [EMAIL PROTECTED] wrote: Absolutely. I know how redo and 
RAC interact; you are absolutely correct.
  
 Sometimes CSSD reboots one node and that's all - good luck. Sometimes OCFS 
reboots one node and CSSD reboots another node - bad luck. That's why it is 
important not to mix different cluster managers on the same servers, or at 
least to let them interact and make a similar decision about _who is master 
today_ (i.e. who will survive a split-brain situation).
  
 RAC is a slightly simpler case because Oracle is usually the primary 
service - so if it decides to reboot, that is a reasonable decision. OCFSv2 is 
another story - sometimes it is a _secondary_ service (for example, used for 
backups only), and if it is secondary then it would be better for it to stop 
working than to reboot.
  
 This reveals 2 big problems (both Oracle and OCFSv2 are affected):
 - A single-interface heartbeat is not reliable. You CANNOT build a reliable 
cluster using a single heartbeat channel. Classical clusters (Veritas VCS) 
use 2 - 4 different heartbeat media (we use 2 independent Ethernet hubs and 
2 generic Ethernet LANs in Veritas, 2 Ethernets + 1 serial in Linux clusters, 
2 Ethernets + 1 serial in a Cisco PIX cluster, and so on). Neither OCFS nor 
Oracle RAC can use more than one (to be precise, you can configure a few 
interfaces for the RAC interconnect in the SPFILE, but that will not affect 
CSSD). In addition, the OCFS defaults are very strange and unrealistic - 
Ethernet and FC cannot guarantee heartbeat times better than 1 minute on 
average (I mean, in case of any network reconfiguration the heartbeat will 
experience a 30 - 50 second delay), so if you configure a 12-second timeout 
(the default in OCFSv2) you are at least naive.
  
 - Self-fencing is triggered too easily. Again, if an OCFS node loses its 
connection to the disk, it should not self-fence - it can send data to other 
nodes (or request it from them), it can unmount the filesystem and try to 
remount it, or it can release control and resume operations later. Immediate 
fencing is necessary in SOME cases, but not in all. If the FS has no pending 
operations, then fencing by reboot makes little difference compared to a 
simple _remount_. It is not as simple as I explain here, but the truth is 
that the fencing decisions are not flexible enough, and they decrease 
reliability dramatically (I posted a list of scenarios in which fencing 
should not happen).
  
 In addition, I have noticed other problems with OCFSv2 too (such as 
excessive CPU usage in some cases).
  
 I use OCFSv2, even in production. But I do it with a grain of salt, have a 
backup plan for _how to run without it_, and don't use it for heavily loaded 
filesystems with millions of files (I use heartbeat, reiserfs, and APC switch 
fencing - and 3 independent heartbeats, with 40-second timeouts). So far I have 
had one glitch with OCFSv2 (when it remounted read-only on one node) and that's 
all - no other problems in production (OCFSv2 is used during starts/stops 
only, so it is safe). But I run stress tests in the lab, I am running it in 
the lab clusters now (including RAC), and the conclusion is simple - as a 
cluster, it is not reliable; as a filesystem, it may have hidden bugs, so be 
extra careful with it.
  
 PS. Good point - it improves every month. Some problems are in the past 
already. 
  
 PPS. All these lab reboots were caused by extremely heavy load or by 
hardware failures (simulated or real). It works better in real life. But 
experience tells me that if I can break something in the lab in 3 days, it is 
a matter of a few months before it breaks in production.
  
- Original Message - 
   From: Luis Freitas 
   To: Ocfs2-users@oss.oracle.com 
   Sent: Saturday, February 10, 2007 4:52 PM
   Subject: Re: Hmm, here is an example. Re: [Ocfs2-users] Also just a 
comment to the Oracle guys
   

    Alexei,

    Actually your log seems to show that CSSD (Oracle CRS) rebooted the 
node before OCFS2 got a chance to do it.

    On a RAC cluster, if the interconnect is interrupted, all the nodes 
hang until a split brain resolution is complete and the recovery of all the 
crashed nodes is completed. This is needed because every read on an Oracle 
data block needs a ping to the other nodes. 

    The view of the data must be consistent: when one node reads a 
particular data block, the Oracle Database first pings the other nodes to 
ensure that they did not modify the block and have not yet flushed it to 
disk. Another node may even forward a reply with the block, preventing the 
disk access (Cache Fusion). 

    When a split brain occurs, there is the loss of these blocks not 
flushed

Re: Hmm, here is an example. Re: [Ocfs2-users] Also just a comment to the Oracle guys

2007-02-10 Thread Luis Freitas
Alexei,
   
  Actually your log seems to show that CSSD (Oracle CRS) rebooted the node 
before OCFS2 got a chance to do it.
   
  On a RAC cluster, if the interconnect is interrupted, all the nodes hang 
until a split brain resolution is complete and the recovery of all the crashed 
nodes is completed. This is needed because every read on an Oracle data block 
needs a ping to the other nodes. 
   
  The view of the data must be consistent: when one node reads a particular 
data block, the Oracle Database first pings the other nodes to ensure that they 
did not modify the block and have not yet flushed it to disk. Another node 
may even forward a reply with the block, preventing the disk access (Cache 
Fusion). 
   
  When a split brain occurs, there is the loss of these blocks not flushed 
to disk, and they are rebuilt using the redo threads of the particular nodes 
that crashed. During this interval all the database instances freeze, since 
before the node recovery is complete there is no way to guarantee that a block 
read from disk has not been altered on the crashed node.
   
  So the fencing is needed even if there is no disk activity, as the entire 
cluster becomes hang the moment the interconnect is down. And the timeout for 
the fencing must be as small as possible to prevent a long cluster 
reconfiguration delay. Of course the timeout must be tuned so as to be larger 
than ethernet switch failovers, or storage controller or disk multipath 
failovers. Or if possible the failover times should be reduced.
   
 Now, on the other hand, I too am having problems with OCFS2. It seems much 
less robust than ASM and the previous version, OCFS, especially under heavy disk 
activity. But I do expect these problems to be solved in the near future, as 
the 2.4 kernel VM problems were.
   
  Regards,
  Luis
  
Alexei_Roudnev [EMAIL PROTECTED] wrote:
  Additional info - the node had no active OCFSv2 operations at all (OCFSv2 
is used for backups only, and from another node only). So if the system had 
just SUSPENDED all FS operations and tried to rejoin the cluster, it all could 
have worked (moreover, the connection to the disk system was intact, so it 
could have closed the filesystem gracefully).
   
  It reveals 3 problems at once:
  - a single heartbeat link (instead of multiple links);
  - a timeout that is too short (Ethernet can't guarantee 10 seconds; it can 
guarantee 1 minute minimum);
  - fencing even when the system is passive and could remount / reconnect 
instead of rebooting.
   
  All we did in the lab was _disconnect one of the trunks between switches for 
a few seconds, then plug it back into the socket_. No other application failed 
(including heartbeat clusters). The database cluster was not doing anything on 
OCFS at the time of the failure (not even backups).
   
  I will try heartbeat between loopback interfaces (and the OCFS protocol) next 
time (I am just curious whether it can ride out 10 seconds of network 
reconfiguration).
   
  ...
  Feb  1 12:19:13 testrac12 kernel: o2net: connection to node testrac11 (num 0) 
at 10.254.32.111: has been idle for 10 seconds, shutting it down. 
Feb  1 12:19:13 testrac12 kernel: (13,3):o2net_idle_timer:1310 here are some 
times that might help debug the situation: (tmr 1170361135.521061 now 
1170361145.520476 dr 1170361141.852795 adv 1170361135.521063:1170361135.521064 
func (c4378452:505) 1170361067.762941:1170361067.762967) 
Feb  1 12:19:13 testrac12 kernel: o2net: no longer connected to node testrac11 
(num 0) at 10.254.32.111: 
Feb  1 12:19:13 testrac12 kernel: (1855,3):dlm_send_remote_convert_request:398 
ERROR: status = -107 
Feb  1 12:19:13 testrac12 kernel: (1855,3):dlm_wait_for_node_death:371 
5AECFF0BBCF74F069A3B8FF79F09FB5A: waiting 5000ms for notification of death of 
node 0 
Feb  1 12:19:13 testrac12 kernel: (1855,1):dlm_send_remote_convert_request:398 
ERROR: status = -107 
Feb  1 12:19:13 testrac12 kernel: (1855,1):dlm_wait_for_node_death:371 
5AECFF0BBCF74F069A3B8FF79F09FB5A: waiting 5000ms for notification of death of 
node 0 
Feb  1 12:22:22 testrac12 kernel: (1855,2):dlm_send_remote_convert_request:398 
ERROR: status = -107 
Feb  1 12:22:22 testrac12 kernel: (1855,2):dlm_wait_for_node_death:371 
5AECFF0BBCF74F069A3B8FF79F09FB5A: waiting 5000ms for notification of death of 
node 0 
Feb  1 12:22:27 testrac12 kernel: (13,3):o2quo_make_decision:144 ERROR: fencing 
this node because it is connected to a half-quorum of 1 out of 2 nodes which 
doesn't include the lowest active node 0 
Feb  1 12:22:27 testrac12 kernel: (13,3):o2hb_stop_all_regions:1889 ERROR: 
stopping heartbeat on all active regions. 
Feb  1 12:22:27 testrac12 kernel: Kernel panic: ocfs2 is very sorry to be 
fencing this system by panicing 
Feb  1 12:22:27 testrac12 kernel: 
Feb  1 12:22:28 testrac12 su: pam_unix2: session finished for user oracle, 
service su 
Feb  1 12:22:29 testrac12 logger: Oracle CSSD failure.  Rebooting for cluster 
integrity. 
Feb  1 12:22:32 testrac12 su: pam_unix2: session finished for user 

Re: [Ocfs2-users] Success!

2007-02-06 Thread Luis Freitas
Brandon,
   
 Can you post details about the disk layout you are using, and the disk models? 
(RAID5/RAID10, 15K RPM or 10K RPM, 36 GB or 76 GB, how many disks in each RAID 
group)
   
  Best Regards,
Luis

Brandon Lamb [EMAIL PROTECTED] wrote:
  Well today I did clean installs on 3 test machines.

For the server I used the openfiler (latest from sourceforge iso) as
the server os, using a 500 gig sata drive and exporting as iscsi.

I then installed CentOS 4.4 on two servers, ran the yum updates, and
then installed the ocfs2 kernel and tools, devel and debug rpms.

After my previous experience I was able to get up and running very
quickly, since I didn't have to relearn how to do the iSCSI setup and
edit the ocfs2 config.

I haven't had a chance to really test it yet, as the machines just
finished updating a while ago, but they can mount the drive, so I am
excited to see some progress.

On one node I was able to copy a 771 MB maildir from /tmp (hda3) to the
ocfs2 mount over iSCSI. It took about 1 minute over gigabit Ethernet
(no jumbo frames, just a cheap switch).

I used the -T mail option for mkfs.ocfs2 and mounted with
_netdev,nointr in fstab.
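Brandon's format and mount choices look roughly like the following sketch; the device path, label, and mount point are placeholders, not from the original message:

```shell
# -T mail tunes block/cluster sizes for many small files (maildir-style).
mkfs.ocfs2 -T mail -L maildata /dev/sdb1

# /etc/fstab entry: _netdev defers the mount until networking (and thus
# the iSCSI session) is up; nointr keeps signals from interrupting I/O.
#   /dev/sdb1  /mail  ocfs2  _netdev,nointr  0 0
```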

I briefly went over our PRTG logs, and it looks like our current NFS
server that stores our mail data moves about 114 gigabytes of traffic a
day on our switch (total in and out). Doing the simple math, and not
accounting for bursts, that is about 1.6 megabytes per second I think,
so I am hoping this will work out quite well for us.
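That back-of-envelope math can be checked with shell arithmetic; whether a "gigabyte" in the PRTG logs means 10^9 or 2^30 bytes is an assumption here (2^30 is used below):

```shell
# Average throughput for 114 GiB/day, in bytes per second.
bytes_per_day=$(( 114 * 1024 * 1024 * 1024 ))
bytes_per_sec=$(( bytes_per_day / 86400 ))
echo "$bytes_per_sec bytes/s"    # ~1416742 bytes/s, i.e. about 1.35 MiB/s
```

That is in the same ballpark as the ~1.6 MB/s estimate above, and tiny next to gigabit Ethernet's theoretical ~110 MB/s ceiling, so the link itself should not be the bottleneck.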

One thing I did notice is that the server (running Openfiler) went up
to a load of 7+. I'm wondering about just reinstalling it with CentOS
and installing the iscsi-target from source (a newer version, I think).
This is a Pentium D 3 GHz machine with 8 GB of RAM. I was surprised to
see it go to 7 with just 1 node copying data to it. But this is an
iSCSI issue, I assume, so I'll mess with that plus network tuning.

All in all, I am happy with the results today, mostly because I have
had no crashes, errors, kernel panics, reboots, or anything.

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


 

Re: [Ocfs2-users] expand ocfs2

2007-01-24 Thread Luis Freitas
Antonio,
   
 The last time I looked, about 6 months ago, LVM2 still wasn't cluster aware, 
so changing volume sizes on a cluster disk could lead to data corruption. At 
the least, a reboot of all involved nodes would be required after any LVM 
change, or some other procedure to force LVM to re-read the volumes from disk 
on the remote nodes.
   
 With fdisk, you can force the other nodes to re-read the partition table from 
disk by running fdisk on each node and forcing a write of the (unchanged) 
partition table.
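A quick sketch of that re-read trick; the commands are illustrative, need root, and assume /dev/sda is the shared disk:

```shell
# After one node repartitions the shared disk, make every OTHER node
# re-read the partition table. The simplest way, where available:
blockdev --rereadpt /dev/sda

# ...or open the disk in fdisk and simply write the unchanged table,
# which forces the kernel to re-read it:
#   fdisk /dev/sda    (then press 'w')
```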
   
  Regards,
  Luis

Antonio Trujillo Carmona [EMAIL PROTECTED] wrote:
  On Wed, 2007-01-24 at 16:49 +0100, Antonio Trujillo Carmona wrote:
 Hello, I'm trying to set up 2-node access with a ProLiant and an HBA
 connected to an EVA. My question is: is it possible to expand an OCFS2
 filesystem? And which is better, in order to use the multipath device:
 formatting /dev/sda directly, or using LVM to create a logical volume
 and formatting that?
 
OK, I found the answer to my first question in the FAQ:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#RESIZE.
What about the second one? Can anyone guide me on dealing with multipath?
-- 
Antonio Trujillo Carmona 
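For the resize question, the FAQ answer referenced above amounts to growing the filesystem with tunefs.ocfs2 once the underlying LUN or partition has been enlarged. A hedged sketch; option availability depends on the ocfs2-tools version, and early releases could only resize offline, with the volume unmounted on all nodes:

```shell
# Grow the OCFS2 volume to fill the (already enlarged) device.
# Early ocfs2-tools require the volume to be unmounted cluster-wide.
#   tunefs.ocfs2 -S /dev/sda1
```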





Re: [Ocfs2-users] Correction

2007-01-16 Thread Luis Freitas
 
 Well, I don't know, but I have this issue also: OCFS2 is much slower than 
OCFS when doing file copies.
   
  Regards,
  Luis

GOKHAN [EMAIL PROTECTED] wrote:
  Why is db2 so much slower than db1?
  Anybody have an idea?

  Thanks, Luis, for the correction.
   
  
Message: 4
Date: Tue, 16 Jan 2007 03:17:17 -0800 (PST)
From: Luis Freitas [EMAIL PROTECTED]
Subject: Re: [Ocfs2-users] ocfs Vs ocfs2
To: ocfs2-users@oss.oracle.com
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=iso-8859-1

Hi,
   
   It seems that you inverted the times on the chart. 
   
  Regards,
  Luis

GOKHAN [EMAIL PROTECTED] wrote:
  Hi everybody, this is my first post.
  I have two test servers (both of them idle):
  db1 : RHEL4, OCFS2
  db2 : RHEL3, OCFS
   
  I tested the IO on both of them.
  The results are below.
   
                    db1 (time)   db2 (time)   Test command
  dd 1GB (write)    0m0.796s     0m18.420s    time dd if=/dev/zero of=./sill.t bs=1M count=1000
  dd 1GB (read)     0m0.241s     8m16.406s    time dd of=/dev/zero if=./sill.t bs=1M count=1000
  cp 1GB            0m0.986s     7m32.452s    time cp sill.t sill2.t
   
  Why is db1 so much slower than db2?
  Anybody have an idea?
   
  My production database is Oracle 9205 on RHEL3 and OCFS.
  I am testing 9208 on RHEL4 and OCFS2.
  I will try to decide whether to upgrade or not...
   
  Thx..
   
   


  



--

Message: 5
Date: Tue, 16 Jan 2007 09:20:28 -0600
From: Brian Sieler [EMAIL PROTECTED]
Subject: [Ocfs2-users] OCFS2 crash
To: ocfs2-users@oss.oracle.com
Message-ID:
[EMAIL PROTECTED]
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed

Using a 2-node clustered file system on a Dell/EMC SAN, RHEL
2.6.9-34.0.2.ELsmp x86_64.

Config:

O2CB_HEARTBEAT_THRESHOLD=30

Kernel param: elevator=deadline (per FAQ)

These log entries appear and the server crashes. It has happened twice now,
at three-week intervals, each time during a heavy IO operation:

Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_setup_one_bio:371 ERROR:
Could not alloc slots BIO!
Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_read_slots:507 ERROR: status = -12
Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_do_disk_heartbeat:973
ERROR: status = -12
Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_setup_one_bio:371 ERROR:
Could not alloc slots BIO!
Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_read_slots:507 ERROR: status = -12
Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_do_disk_heartbeat:973 ERROR: status

Can't find much on any of these errors… what does "507 ERROR: status = -12" mean?
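For what it's worth, the "status = -12" in those o2hb messages is a negative errno: 12 is ENOMEM, which matches the "Could not alloc slots BIO!" line — the heartbeat thread failed to allocate memory, presumably under pressure from the heavy IO. A quick way to look such codes up:

```shell
# Map an errno number to its message (12 = ENOMEM).
python3 -c "import os; print(os.strerror(12))"   # prints: Cannot allocate memory
```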

Any help appreciated

-- 
Brian



--

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


End of Ocfs2-users Digest, Vol 37, Issue 12
***





  


 