[Linux-HA] ocf:heartbeat:exportfs fails tests

2012-07-12 Thread EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)
Hi,

I am wondering why migrating a simple nfs resource leads to fencing of the 
cluster node.

Running resource-agents-3.9.2-0.25.5 on SLES11-SP2 fully updated.

ocf-tester -n test -o fsid=1 -o directory=/SHARED/nfs/home -o 
options=rw,mountpoint -o clientspec=10.0.0.0/255.0.0.0 -o 
wait_for_leasetime_on_stop=true -o unlock_on_stop=true 
/usr/lib/ocf/resource.d/heartbeat/exportfs
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/exportfs...
* rc=0: Monitoring a stopped resource should return 7
* rc=0: The initial probe for a stopped resource should return 7 or 5 even if 
all binaries are missing
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* Your agent does not support the reload action (optional)
Tests failed: /usr/lib/ocf/resource.d/heartbeat/exportfs failed 5 tests

Is there anything wrong with my testing?

Is it normal that released resource agents fail their included ocf-tester tests? Is 
this problem SuSE-specific?

Best regards

Martin Konold

Robert Bosch GmbH
Automotive Electronics
Postfach 13 42
72703 Reutlingen
GERMANY
www.bosch.com

Tel. +49 7121 35 3322

Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
Aufsichtsratsvorsitzender: Franz Fehrenbach; Geschäftsführung: Volkmar Denner, 
Siegfried Dais;
Stefan Asenkerschbaumer, Bernd Bohr, Rudolf Colm, Dirk Hoheisel, Christoph 
Kübel, Uwe Raschke,
Wolf-Henning Scheider, Werner Struth, Peter Tyroller



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] ocf:heartbeat:exportfs fails tests

2012-07-12 Thread EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)
Hi,

 I am wondering why migrating a simple nfs resource leads to fencing of the 
 cluster node.

Still trying to resolve this.

 * Your agent does not support the reload action (optional) Tests failed: 
 /usr/lib/ocf/resource.d/heartbeat/exportfs failed 5 tests

 Is there anything wrong with my testing?

I figured out that actually putting the cluster in maintenance mode before 
doing the test helps significantly ;-)
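
For the archive, roughly what that looks like with the crm shell (just a sketch,
assuming the Pacemaker crm shell is available):

# keep the cluster from reacting while ocf-tester starts/stops the resource
crm configure property maintenance-mode=true
# ... run ocf-tester ...
# re-enable normal cluster management afterwards
crm configure property maintenance-mode=false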

Sorry for the noise!

Best regards

Martin Konold

Robert Bosch GmbH
Automotive Electronics
Postfach 13 42
72703 Reutlingen
GERMANY
www.bosch.com

Tel. +49 7121 35 3322

Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
Aufsichtsratsvorsitzender: Franz Fehrenbach; Geschäftsführung: Volkmar Denner, 
Siegfried Dais; Stefan Asenkerschbaumer, Bernd Bohr, Rudolf Colm, Dirk 
Hoheisel, Christoph Kübel, Uwe Raschke, Wolf-Henning Scheider, Werner Struth, 
Peter Tyroller



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Pacemaker and software RAID using shared storage.

2012-07-12 Thread Caspar Smit
Hi all,

I'm exploring the possibility to create a shared storage cluster using
software raid (mdadm).

The idea is to have two normal servers, each equipped with a SAS HBA
(non-RAID) with external SAS connections (e.g. an LSI 9200-8e card), and then
a dual-controller JBOD which connects to both servers so that both
servers can see all disks in the JBOD.

Now the interesting part. I would like to create a software raid6 set (or
multiple) with the disks in the JBOD and have the possibility to use the
raid6 in an active/passive cluster.

For instance, create a raid6 set (md0) from the first 10 disks in the JBOD on
server1, create a filesystem on md0, and export the filesystem as an NFS
export (using exportfs).

When migrating the resources it should stop exportfs, unmount the filesystem,
and stop md0 (mdadm -S /dev/md0) on server1, then assemble md0 on server2
(mdadm -A /dev/md0), mount the filesystem, and start exportfs.
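
Roughly what I have in mind, sketched in crm syntax with the only md-related
agent I could find (resource names, mount point and fstype below are
placeholders, not a tested configuration):

primitive p_raid6 ocf:heartbeat:Raid1 \
        params raidconf="/etc/mdadm.conf" raiddev="/dev/md0" \
        op monitor interval=60s timeout=60s
primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/md0" directory="/srv/export" fstype="ext4" \
        op monitor interval=60s timeout=60s
primitive p_export ocf:heartbeat:exportfs \
        params directory="/srv/export" clientspec="10.0.0.0/255.0.0.0" \
               options="rw" fsid=1 \
        op monitor interval=30s
group g_nfs p_raid6 p_fs p_export

The group would give the ordering described above: stop in reverse order
(exportfs, filesystem, md0) on server1, start in order on server2.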

Is this scenario possible, and/or is it obsolete/superseded by another
mechanism? I couldn't find anything regarding mdadm in combination with
pacemaker except a fairly old Raid1 resource agent which only seems to
support raid1 sets.

Any other recommendations for this scenario?

Kind regards,

Caspar Smit
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] ocf:heartbeat:exportfs fails tests

2012-07-12 Thread EXTERNAL Konold Martin (erfrakon, RtP2/TEF72)

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of EXTERNAL Konold 
Martin (erfrakon, RtP2/TEF72)
Sent: Thursday, 12 July 2012 10:08
To: 'General Linux-HA mailing list'
Subject: Re: [Linux-HA] ocf:heartbeat:exportfs fails tests

Hi,

  I am wondering why migrating a simple nfs resource leads to fencing of the 
  cluster node.

 Still trying to resolve this.

For readers of the archive: this can be caused by insufficient timeouts.

ocf:heartbeat:exportfs needs a stop timeout of at least 100s in order to work
properly.
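
In crm syntax that means something like this (a sketch only; the resource name
is an example and 120s is simply a value comfortably above 100s, while the
parameters match my setup above):

primitive p_exportfs ocf:heartbeat:exportfs \
        params clientspec="10.0.0.0/255.0.0.0" directory="/SHARED/nfs/home" \
               fsid=1 options="rw,mountpoint" wait_for_leasetime_on_stop=true \
               unlock_on_stop=true \
        op monitor interval=30s \
        op stop interval=0 timeout=120s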

Best regards

Martin Konold

Robert Bosch GmbH
Automotive Electronics
Postfach 13 42
72703 Reutlingen
GERMANY
www.bosch.com

Tel. +49 7121 35 3322

Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
Aufsichtsratsvorsitzender: Franz Fehrenbach; Geschäftsführung: Volkmar Denner, 
Siegfried Dais; Stefan Asenkerschbaumer, Bernd Bohr, Rudolf Colm, Dirk 
Hoheisel, Christoph Kübel, Uwe Raschke, Wolf-Henning Scheider, Werner Struth, 
Peter Tyroller
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Pacemaker and software RAID using shared storage.

2012-07-12 Thread Lars Marowsky-Bree
On 2012-07-12T10:31:53, Caspar Smit c.s...@truebit.nl wrote:

 Now the interesting part. I would like to create a software raid6 set
 (or multiple) with the disks in the JBOD and have the possibility to
 use
 the raid6 in an active/passive cluster.

Sure. md RAID in a fail-over configuration is managed by the Raid1
resource agent just fine; don't worry, it will also handle raid6, the
name is historical.

It probably ought to be renamed to Raid. But think of the 1 not as a
reference to the RAID level - it might be the version of the RA. ;-)


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Services within XEN DomU oom-killed

2012-07-12 Thread Helmut Wollmersdorfer
Hi,


today one DomU running mainly Apache, Typo3, MySQL became unusable.

Last message from Nagios:

Ram : 98%, Swap : 100% :  99, 90 : CRITICAL

Logging in with xm console I saw oom-killer messages and did an xm  
destroy.

Is there any solution to automatically destroy and restart a DomU in  
such a case?

The current crm config is as follows (only the relevant part of one  
DomU):


xen11:/# crm configure show
node $id=xxx xen11
node $id=yyy xen10


primitive xen_drbd2_1 ocf:linbit:drbd \
params drbd_resource=drbd2_1 \
op monitor interval=15s \
op start interval=0 timeout=240s \
op stop interval=0 timeout=100s
primitive xen_drbd2_2 ocf:linbit:drbd \
params drbd_resource=drbd2_2 \
op monitor interval=15s \
op start interval=0 timeout=240s \
op stop interval=0 timeout=100s
primitive xen_typo3 ocf:heartbeat:Xen \
params xmfile=/etc/xen/typo3.cfg \
op monitor interval=3s timeout=30s \
op start interval=0 timeout=60s \
op stop interval=0 timeout=40s \
meta target-role=Started allow-migrate=false is-managed=true


group group_drbd2 xen_drbd2_1 xen_drbd2_2

ms DrbdClone2 group_drbd2 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

location cli-prefer-xen_typo3 xen_typo3 \
rule $id=cli-prefer-rule-xen_typo3 inf: #uname eq xen10

colocation xen_typo3_and_drbd inf: xen_typo3 DrbdClone2:Master

order xen_typo3_after_drbd inf: DrbdClone2:promote xen_typo3:start

property $id=cib-bootstrap-options \
dc-version=1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b \
cluster-infrastructure=Heartbeat \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1331211005 \
maintenance-mode=false
rsc_defaults $id=rsc-options \
resource-stickiness=100


TIA

Helmut Wollmersdorfer




___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Services within XEN DomU oom-killed

2012-07-12 Thread Dejan Muhamedagic
Hi,

On Thu, Jul 12, 2012 at 03:05:28PM +0200, Helmut Wollmersdorfer wrote:
 Hi,
 
 
 today one DomU running mainly Apache, Typo3, MySQL went unusable.
 
 Last message from Nagios:
 
 Ram : 98%, Swap : 100% :  99, 90 : CRITICAL
 
 Logging in with xm console I saw oom-killer messages and did an xm  
 destroy.
 
 Is there any solution to automatically destroy and restart a DomU in  
 such a case?

You can add the script hook to your xen resource. See:

 crm ra info Xen
...
monitor_scripts (string): list of space separated monitor scripts
To additionally monitor services within the unprivileged domain,
add this parameter with a list of scripts to monitor.
...

A script should exit with 0 on success. I guess that you can
just do a curl against a URL which exercises apache and/or mysql.
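
For example (an untested sketch; the script path and URL are placeholders):

cat > /usr/local/bin/check_typo3 <<'EOF'
#!/bin/sh
# exit 0 only while the web application inside the DomU still answers
curl -sf -m 10 http://typo3.example.com/ > /dev/null
EOF
chmod +x /usr/local/bin/check_typo3

and then reference it from the Xen primitive, e.g.:

primitive xen_typo3 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/typo3.cfg" \
               monitor_scripts="/usr/local/bin/check_typo3" \
        ... (existing op/meta lines unchanged)

Note that the monitor timeout then has to cover the script's runtime as well.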

Thanks,

Dejan


 The current crm config is as follows (only the relevant part of one  
 DomU):
 
 
 xen11:/# crm configure show
 node $id=xxx xen11
 node $id=yyy xen10
 
 
 primitive xen_drbd2_1 ocf:linbit:drbd \
   params drbd_resource=drbd2_1 \
   op monitor interval=15s \
   op start interval=0 timeout=240s \
   op stop interval=0 timeout=100s
 primitive xen_drbd2_2 ocf:linbit:drbd \
   params drbd_resource=drbd2_2 \
   op monitor interval=15s \
   op start interval=0 timeout=240s \
   op stop interval=0 timeout=100s
 primitive xen_typo3 ocf:heartbeat:Xen \
   params xmfile=/etc/xen/typo3.cfg \
   op monitor interval=3s timeout=30s \
   op start interval=0 timeout=60s \
   op stop interval=0 timeout=40s \
   meta target-role=Started allow-migrate=false is-managed=true
 
 
 group group_drbd2 xen_drbd2_1 xen_drbd2_2
 
 ms DrbdClone2 group_drbd2 \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
 
 location cli-prefer-xen_typo3 xen_typo3 \
   rule $id=cli-prefer-rule-xen_typo3 inf: #uname eq xen10
 
 colocation xen_typo3_and_drbd inf: xen_typo3 DrbdClone2:Master
 
 order xen_typo3_after_drbd inf: DrbdClone2:promote xen_typo3:start
 
 property $id=cib-bootstrap-options \
   dc-version=1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b \
   cluster-infrastructure=Heartbeat \
   stonith-enabled=false \
   no-quorum-policy=ignore \
   last-lrm-refresh=1331211005 \
   maintenance-mode=false
 rsc_defaults $id=rsc-options \
   resource-stickiness=100
 
 
 TIA
 
 Helmut Wollmersdorfer
 
 
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] ocf:heartbeat:exportfs fails tests

2012-07-12 Thread Dejan Muhamedagic
Hi,

On Thu, Jul 12, 2012 at 12:10:28PM +0200, EXTERNAL Konold Martin (erfrakon, 
RtP2/TEF72) wrote:
 
 -Original Message-
 From: linux-ha-boun...@lists.linux-ha.org 
 [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of EXTERNAL Konold 
 Martin (erfrakon, RtP2/TEF72)
  Sent: Thursday, 12 July 2012 10:08
 To: 'General Linux-HA mailing list'
 Subject: Re: [Linux-HA] ocf:heartbeat:exportfs fails tests
 
 Hi,
 
   I am wondering why migrating a simple nfs resource leads to fencing of 
   the cluster node.
 
  Still trying to resolve this.
 
  For readers of the archive: This can be caused by insufficient timeouts.
 
 ocf:heartbeat:exportfs needs at least 100s timeout for stopping in order to 
 work properly

The stop timeout is advertised as 10s for exportfs. That's probably
too short. But do you know why it takes so long to stop?
100s sounds excessive.

Thanks,

Dejan

 Best regards
 
 Martin Konold
 
 Robert Bosch GmbH
 Automotive Electronics
 Postfach 13 42
 72703 Reutlingen
 GERMANY
 www.bosch.com
 
 Tel. +49 7121 35 3322
 
 Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
 Aufsichtsratsvorsitzender: Franz Fehrenbach; Geschäftsführung: Volkmar 
 Denner, Siegfried Dais; Stefan Asenkerschbaumer, Bernd Bohr, Rudolf Colm, 
 Dirk Hoheisel, Christoph Kübel, Uwe Raschke, Wolf-Henning Scheider, Werner 
 Struth, Peter Tyroller
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat over VPN

2012-07-12 Thread Dejan Muhamedagic
Hi,

On Wed, Jul 11, 2012 at 04:24:42AM +0700, Nanang Purnomo wrote:
 I want to implement a failover cluster server with heartbeat, but the
 problem is that I use a VPN network. Can heartbeat be run across two
 different networks?

Sure. Just make sure that the port is open and that various
parameters fit your network.

Now, if it's a two-node cluster, you need a stonith solution
which runs over an independent medium. If that's not
possible, you'll need an arbitrator at a third site.
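
For heartbeat the network side mostly means the ha.cf media and timing
settings, e.g. something like this (a sketch; the interface, address and
timings are placeholders you would tune to your VPN's latency):

# ha.cf: unicast over the VPN interface instead of broadcast/multicast
udpport 694
ucast tun0 10.8.0.2
keepalive 2
warntime 10
deadtime 30
initdead 60

694/udp is the default heartbeat port, and deadtime in particular should be
generous enough to ride out latency spikes on the VPN.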

Thanks,

Dejan

 I hope you can give me a solution, please.
 
 
 Best Regards,
 Nanang
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Choosing Monitoring System

2012-07-12 Thread Dejan Muhamedagic
Hi,

On Wed, Jul 11, 2012 at 08:40:13PM +0200, Michael Schwartzkopff wrote:
Hi,

I would like to build an HA solution with 2 servers, with 1 in
hot-standby, i.e. fail-over. The advice so far was to use GlusterFS
(replicated) for the filesystem and MySQL replication (master-slave)
for the database. The purpose is a web server (apache) with a Typo3
CMS.

In the event of a failure I need to run a script to perform the actual
failover (switch the IP via a web request (Hetzner robot) and change MySQL
replication settings).

Would you recommend using corosync/pacemaker to monitor the database,
gluster and apache, or should I simply check the availability of a
DB-generated web page from a script in a loop?

Any other recommendations?
   
   Yes!
    Set up a real cluster, use DRBD to replicate the data between the nodes and
    pacemaker to monitor the resources (IP, filesystem, database and
    webserver).
    pacemaker will do the failover in case of problems.
    
    See: http://www.linbit.at/training/webseminare-auf-abruf/mysql-replikation-mit-pacemaker/
   
Since I don't have a lot of corosync experience it seems to me like a
huge task (overkill) for a simple monitoring of a failover scenario.

What do you think?
   
   I heard there is a very good book from O'Reilly ;-)
   
   Greetings,
   
   --
   Dr. Michael Schwartzkopff
  
  Hi Michael,
  
  the problems start when I start to think about fencing: I am working with
  a standard root server; it simply doesn't have STONITH hardware. I've
  read various threads which basically say that if you don't have proper fencing
  then it's all your fault ... You don't really suggest using
  ssh-stonith for production ... 
 
 STONITH idea for Hetzner:
 http://lists.linux-ha.org/pipermail/linux-ha/2011-May/043187.html

external/hetzner is distributed with cluster-glue.

Thanks,

Dejan

  I could externally trigger a hardware reset
  of a machine and reroute the ip to the other server. Would that be a valid
  stonith action? I did actually like the book, but fencing takes some fun
  out of the whole setup (makes it a lot more complicated). Also I tend to
  believe it's best to use the database replication methods if available
  rather than do it via the file system.
 
 DRBD is block replication, not file system sync.
 
  I didn't like drbd too much, since there is no easy way to gain access to
  the filesystem on the slave machine. I never got the whole stack in dual
  primary mode with ocfs2 running. Gluster easily provides for this.
  
  Stefan
  
  P.s.  :-) I did actually like the book, but the fencing part is a killer
 
 I would be happy on a 5-star comment on amazon ;-)
 
 Do as you like. But setting up a Linux cluster including fencing and DRBD is 
 not a big deal. And if you compare it with writing your own failover scripts 
 ...
 
 -- 
 Dr. Michael Schwartzkopff
 Guardinistr. 63
 81375 München
 
 Tel: (0163) 172 50 98



 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] DRBD performance on IBM M5110e controller

2012-07-12 Thread Ken Dechick
Hello list, 

Odd one for you today. We recently started looking at the new IBM x3650M4 
server as our next high-end machine for all clients. We have deployed several 
standalone installations at smaller clients, running CentOS 6.2 - things have 
been running very well - this is a monster of a server - 1U rack mount w/ 16 
600GB HDDs in RAID 10 configuration with 132GB RAM. 

A couple of days ago I began setting up a pair of machines in HA, as that's the 
next logical step. Suddenly I am running into a DRBD performance issue I do not 
understand. I set this pair of machines up the same way I always did under 
CentOS 5.3: 

- 2 servers in active/passive configuration 
IBM x3650M4 with 4.3TB DRBD partition across 16 600GB RAID10 2.5" HDDs 
RAID Controller: IBM MegaRAID M5110e (LSI SAS2208 Thunderbolt) 
132GB system RAM 
- CentOS 6.2 on Kernel 2.6.32-220.7.1.el6.x86_64 
- heartbeat v3.0.4 
- pacemaker v1.1.6 
- DRBD v8.4.1 
- using the standard tuning I developed over the past couple of years with IBM 
hardware and the handy DRBD tuning guide: 
- deadline scheduler via elevator=deadline in kernel command line 
- using this drbd.conf: 


global { usage-count yes; } 
common { 
handlers { 
pri-on-incon-degr /usr/local/bin/support_drbd_deg; 
split-brain /usr/local/bin/support_drbd_sb; 
fence-peer /usr/lib/drbd/crm-fence-peer.sh; 
fence-peer /usr/lib64/heartbeat/drbd-peer-outdater -t 5; 
after-resync-target /usr/lib/drbd/crm-unfence-peer.sh; 
} 
disk { 
resync-rate 300M; # limit the bandwidth which may be used by background 
# synchronizations; use 30M for 1Gb NIC, 300M for 10Gb NIC 
al-extents 3833; # Must be prime, number of active sets. 
on-io-error detach; # What to do when the lower level device errors. 
disk-barrier no; 
disk-flushes no; 
md-flushes no; 
fencing resource-only; 
#size 1000G; # for setting exact size of DRBD resource - DO NOT uncomment 
this!! 
#become-primary-on node-name # use this for DRBD withOUT heartbeat 
} 
net { 
protocol C; 
verify-alg md5; # can also use md5, crc32c, etc. 
csums-alg md5; # can also use md5, crc32c, etc. 
#timeout 60; # 6 seconds (unit = 0.1 seconds) 
#connect-int 10; # 10 seconds (unit = 1 second) 
#ping-int 10; # 10 seconds (unit = 1 second) 
#ping-timeout 5; # 500 ms (unit = 0.1 seconds) 
unplug-watermark 131072; # flush RAID controller buffers 
max-buffers 8; #datablock buffers used before writing to disk. 
max-epoch-size 2; # set max transfer size 
sndbuf-size 0; 
rcvbuf-size 0; 
ko-count 4; # Peer is dead if this count is exceeded. 
after-sb-0pri discard-zero-changes; 
after-sb-1pri consensus; 
after-sb-2pri disconnect; 
rr-conflict disconnect; 
cram-hmac-alg sha256; 
} 
} 
resource drbd0 { 
options { 
cpu-mask 0; 
on-no-data-accessible io-error; 
} 
device /dev/drbd0; 
disk /dev/sda4; 
meta-disk internal; 
on mofpeasHA1 { 
address 10.211.32.1:7789; 
} 
on mofpeasHA2 { 
address 10.211.32.2:7789; 
} 
} 

I did of course diligently read through all of the documentation for DRBD 
v8.4.1 with it being my first time above v8.3.7 on CentOS 5.3. Found these new 
option that sounded flavorful: 

options { 
cpu-mask 0; 
on-no-data-accessible io-error; 
} 

Also found that many options had moved around ( resync-rate replacing the old 
rate, no more syncer section, etc ), so I modified the config we have been 
using for a few years to reflect all these new changes I learned about. The 
above config seems to work just fine at this point. 

Doing a simple test of copying 4.5GB of data from my DRBD partition directly to 
memory (/dev/shm/.) - I have plenty of room there: 

tmpfs 64G 4.5G 59G 8% /dev/shm 

- echo 3 > /proc/sys/vm/drop_caches - first I drop caches for an accurate test 
- time cp -rp /usr/medent/tapetest/ /dev/shm/. - here I copy a dir with 
roughly 4.5GB of real-world data to system memory 

real 0m57.623s 
user 0m0.001s 
sys 0m0.188s 


Wow that takes a long time - almost a full minute. Doesn't seem right as this 
machine is blazing fast. So I clear the cache and /dev/shm, then try the same 
test but pulling the same data from a non-DRBD partition: 


- echo 3 > /proc/sys/vm/drop_caches - first I drop caches for an accurate test 
- time cp -rp /root/tapetest/ /dev/shm/. 

real 0m7.625s 
user 0m0.064s 
sys 0m3.272s 

- Quite a large difference there!! 
I can reproduce this over and over again - it happens whether DRBD is online and 
fully replicating or if I take one node down to run without any replication 
going on, just to be sure. 
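
For what it's worth, a raw sequential read test would take the filesystem out 
of the picture (sketch only; /dev/sda3 here stands in for whichever non-DRBD 
partition is used for comparison): 

sync; echo 3 > /proc/sys/vm/drop_caches 
# read 4GB straight off the DRBD device, bypassing the page cache 
dd if=/dev/drbd0 of=/dev/null bs=1M count=4096 iflag=direct 
# the same read from a non-DRBD device for comparison 
dd if=/dev/sda3 of=/dev/null bs=1M count=4096 iflag=direct 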

- We did so much tuning in DRBD back with CentOS 5 and the IBM MegaRAID M5015 in 
a similar RAID10 setup that I would hate to strip my config down to the defaults 
after install and start from scratch. 


-- 
Kenneth DeChick 
Linux Systems Administrator 
-- MEDENT -- 


Kirk to Enterprise -- beam down yeoman Rand and a six-pack. 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] ocf:heartbeat:exportfs fails tests

2012-07-12 Thread Martin Marji Cermak
On 13 July 2012 01:07, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Hi,

 On Thu, Jul 12, 2012 at 12:10:28PM +0200, EXTERNAL Konold Martin
 (erfrakon, RtP2/TEF72) wrote:
 
  -Original Message-
  From: linux-ha-boun...@lists.linux-ha.org
  [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of EXTERNAL Konold
  Martin (erfrakon, RtP2/TEF72)
  Sent: Thursday, 12 July 2012 10:08
  To: 'General Linux-HA mailing list'
  Subject: Re: [Linux-HA] ocf:heartbeat:exportfs fails tests
 
  Hi,
 
I am wondering why migrating a simple nfs resource leads to fencing
of the cluster node.
 
   Still trying to resolve this.
 
  For readers of the archive: This can be caused by insufficient
  timeouts.
 
  ocf:heartbeat:exportfs needs at least 100s timeout for stopping in order
  to work properly

 stop timeout is advertised as 10s for exportfs. That's probably
 too short. But, do you know why does it take so long to stop?
 100s sounds excessive.


I believe 100s will be needed only if you have the
wait_for_leasetime_on_stop parameter set to true.
It's needed only if you use NFSv4. It causes the RA to wait the NFS grace
period, which is usually 90 seconds, when stopping.

If you don't use NFSv4, then set this parameter to false and you will
save the NFS grace period's worth of time.
If you do use NFSv4, you can try configuring your NFS server to use a
lower grace period - see chapter 6.1, NFSv4 lease time, of this very
good document:
www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
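
On Linux the lease time can be inspected and, before nfsd is (re)started,
lowered via procfs - roughly like this (a sketch; whether a value as low as
10s is sensible depends on your clients):

# current NFSv4 lease time in seconds (usually 90)
cat /proc/fs/nfsd/nfsv4leasetime
# lower it before the NFS server starts, e.g. from the nfsserver init script
echo 10 > /proc/fs/nfsd/nfsv4leasetime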

Kind regards,
Marji


 Thanks,

 Dejan

  Best regards
 
  Martin Konold
 
  Robert Bosch GmbH
  Automotive Electronics
  Postfach 13 42
  72703 Reutlingen
  GERMANY
  www.bosch.com
 
  Tel. +49 7121 35 3322
 
  Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart, HRB 14000;
  Aufsichtsratsvorsitzender: Franz Fehrenbach; Geschäftsführung: Volkmar
  Denner, Siegfried Dais; Stefan Asenkerschbaumer, Bernd Bohr, Rudolf Colm,
  Dirk Hoheisel, Christoph Kübel, Uwe Raschke, Wolf-Henning Scheider, Werner
  Struth, Peter Tyroller
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Services within XEN DomU oom-killed

2012-07-12 Thread Vadym Chepkov

On Jul 12, 2012, at 9:05 AM, Helmut Wollmersdorfer wrote:

 location cli-prefer-xen_typo3 xen_typo3 \
   rule $id=cli-prefer-rule-xen_typo3 inf: #uname eq xen10


Did you forget?

crm resource unmove xen_typo3


Cheers,
Vadym

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems