Re: [Linux-ha-dev] [Problem] The designation of the S option seems to have a problem.
Hi Dejan,

I agree with your opinion, too. I think that deprecating the hb_report included in glue is right. If the deprecation is decided, we will use the hb_report function of crm_report or the crm shell.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Dejan Muhamedagic
> To: MLLIST-HA-DEV
> Date: 2016/5/3, Tue 20:58
> Subject: Re: [Linux-ha-dev] [Problem] The designation of the S option seems to have a problem.
>
> Hi Hideo-san,
>
> On Mon, May 02, 2016 at 04:57:09PM +0900, renayama19661...@ybb.ne.jp wrote:
>> Hi All,
>>
>> The S option of hb_report does not work well.
>> Mr. Kristoffer made similar modifications in hb_report of the crm shell.
>>
>> * https://github.com/ClusterLabs/crmsh/issues/137
>>
>> I just request this correction in glue.
>
> Thanks for the patch. But I think that we should deprecate
> hb_report in favour of crm report; no use keeping two copies
> around.
>
> Cheers,
>
> Dejan

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
[Linux-ha-dev] [Problem] The designation of the S option seems to have a problem.
Hi All,

The S option of hb_report does not work well. Mr. Kristoffer made similar modifications in hb_report of the crm shell.

* https://github.com/ClusterLabs/crmsh/issues/137

I just request this correction in glue.

Best Regards,
Hideo Yamauchi.

option_s.patch
Description: Binary data
Re: [Linux-ha-dev] [Patch] oracle RA - Change of the judgment of the check_mon_user processing.
Hi Dejan,

Thank you for the comments.

> The patch looks good. I wonder if this string is also translated:
>
>     415   if echo $output | grep -w EXPIRED >/dev/null; then
>
> Also, could you verify if common_sql_filter() needs modifications?

I will confirm it once again tomorrow. I will send a patch again, if necessary.

Best Regards,
Hideo Yamauchi.

----- Original Message -----
> From: Dejan Muhamedagic <deja...@fastmail.fm>
> To: renayama19661...@ybb.ne.jp; High-Availability Linux Development List <linux-ha-dev@lists.linux-ha.org>
> Date: 2014/7/22, Tue 18:53
> Subject: Re: [Linux-ha-dev] [Patch] oracle RA - Change of the judgment of the check_mon_user processing.
>
> On Tue, Jul 22, 2014 at 11:57:04AM +0900, renayama19661...@ybb.ne.jp wrote:
> > Hi All,
> >
> > Consideration is necessary for when NLS_LANG is set to another
> > language in the oracle resource agent. I attached a patch.
>
> The patch looks good. I wonder if this string is also translated:
>
>     415   if echo $output | grep -w EXPIRED >/dev/null; then
>
> Also, could you verify if common_sql_filter() needs modifications?
>
> Cheers,
> Dejan
>
> > Best Regards,
> > Hideo Yamauchi.
Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent.
Hi Dejan,

All right!!

> Is that with the latest version?

I am confirming the RA now on Oracle 12c. It is the latest version of Oracle.

Many Thanks!
Hideo Yamauchi.

----- Original Message -----
> From: Dejan Muhamedagic <deja...@fastmail.fm>
> To: renayama19661...@ybb.ne.jp; High-Availability Linux Development List <linux-ha-dev@lists.linux-ha.org>
> Date: 2014/7/22, Tue 18:46
> Subject: Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent.
>
> Hi Hideo-san,
>
> On Tue, Jul 22, 2014 at 11:07:29AM +0900, renayama19661...@ybb.ne.jp wrote:
> > Hi All,
> >
> > I am going to explain the next change to our users.
> >
> > * https://github.com/ClusterLabs/resource-agents/pull/367
> > * https://github.com/ClusterLabs/resource-agents/pull/439
> >
> > Let me confirm whether the following is what the patch intends.
> >
> > 1) Because it was a problem that the OCFMON user was added without
> >    the Oracle administrator knowing it, the patch changed it so that
> >    the user is specified explicitly.
>
> The OCFMON user and password parameters are optional, hence in this
> respect nothing really changed. The user is still created by the RA.
> However, it is good that they're now visible in the meta-data.
>
> > 2) The patch changed the password expiry of OCFMON. (The default
> >    password expiry may be 180 days.)
>
> That's the problem we had with the previous version. Now there's a
> profile created for the monitoring user which has unlimited password
> expiry. If the password expired in the meantime, due to a missing
> profile, then it is reset. If the monitor still fails, the RA tries as
> sysdba again.
>
> > 3) The patch kept compatibility with the old RA.
>
> Yes.
>
> > Is there a main point of any other patches?
>
> No.
>
> > If there is really a problem that occurred before this change, please
> > tell me. I intend to show the problem that happened to a user.
> >
> > * For example, the OCFMON password expired and the oracle monitor
> >   failed.
>
> Is that with the latest version?
>
> Cheers,
> Dejan
>
> > I am going to send a patch later.
> >
> > Best Regards,
> > Hideo Yamauchi.
Re: [Linux-ha-dev] [Question] About the change of the oracle resource agent.
Hi Dejan,

I confirmed it in an environment where NLS_LANG was set to Japanese (Japanese_Japan.AL32UTF8). I changed the expiration date of the OCFMON user and advanced the system date by one year. I confirmed that the following processing worked correctly (on Oracle 12c):

Confirmed 1) After the OCFMON user became expired (EXPIRED), the monitor processing as the sysdba user succeeds.
Confirmed 2) The grep judgment of the EXPIRED character string is carried out correctly.
Confirmed 3) When we start oracle again after the OCFMON user expired, the expiry of the OCFMON user is reset.

>     415   if echo $output | grep -w EXPIRED >/dev/null; then
>
> Also, could you verify if common_sql_filter() needs modifications?

As a result, the correction of this grep was not necessary (Confirmed 2, Confirmed 3).

Best Regards,
Hideo Yamauchi.

--- On Tue, 2014/7/22, renayama19661...@ybb.ne.jp wrote:
> (snip)
[Linux-ha-dev] [Question] About the change of the oracle resource agent.
Hi All,

I am going to explain the next change to our users.

* https://github.com/ClusterLabs/resource-agents/pull/367
* https://github.com/ClusterLabs/resource-agents/pull/439

Let me confirm whether the following is what the patch intends.

1) Because it was a problem that the OCFMON user was added without the Oracle administrator knowing it, the patch changed it so that the user is specified explicitly.
2) The patch changed the password expiry of OCFMON. (The default password expiry may be 180 days.)
3) The patch kept compatibility with the old RA.

Is there a main point of any other patches? If there is really a problem that occurred before this change, please tell me. I intend to show the problem that happened to a user.

* For example, the OCFMON password expired and the oracle monitor failed.

I am going to send a patch later.

Best Regards,
Hideo Yamauchi.
[Linux-ha-dev] [Patch] oracle RA - Change of the judgment of the check_mon_user processing.
Hi All,

Consideration is necessary for when NLS_LANG is set to another language in the oracle resource agent. I attached a patch.

Best Regards,
Hideo Yamauchi.

trac2891.patch
Description: Binary data
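[Editor's sketch] The locale problem this patch addresses can be illustrated with plain shell. This is only a sketch: the real RA code differs, the localized status string below is a made-up example, and pinning NLS_LANG is one common remedy, not necessarily what the patch does.

```shell
# With NLS_LANG set to a non-English locale, Oracle translates status
# text, so an English-only match can miss an expired account.
output_en="ACCOUNT_STATUS EXPIRED"
output_ja="ACCOUNT_STATUS 期限切れ"   # hypothetical localized output

check_expired() {
    # Fragile: matches only the English message (the RA's grep from line 415).
    echo "$1" | grep -w EXPIRED >/dev/null
}

check_expired "$output_en" && echo "en: detected"
check_expired "$output_ja" || echo "ja: missed (locale-dependent)"

# One common remedy is to pin the session language before querying, e.g.
#   NLS_LANG=AMERICAN_AMERICA.AL32UTF8 sqlplus ...
# so that status strings are always returned in English.
```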
Re: [Linux-ha-dev] [Patch:crmsh] Correction of the mistake of the processing to transfer comment.
Hi Kristoffer,

> Sorry, I should have mentioned that I applied the patch to the
> development version, not to 1.2.5, when testing. I suspect that the
> difference is that in older versions, comments were stripped completely
> from the configuration, but in newer versions, comments are kept.
> However, it seems that with this patch there are comments generated in
> the XML code that the CLI syntax cannot represent. I have not had time
> to completely investigate. I will look into the problem further and let
> you know what I find.

It turned out that the patch I contributed was unnecessary after all. The rpm which we used seems to have had a problem. I withdraw the patch.

Best Regards,
Hideo Yamauchi.

--- On Wed, 2014/1/15, Kristoffer Grönlund <kgronl...@suse.com> wrote:

> On Tue, 14 Jan 2014 12:31:29 +0900 (JST) renayama19661...@ybb.ne.jp wrote:
> > Hi Kristoffer,
> >
> > In addition, the error did not happen in the edit test. The edit test
> > passed whether or not I applied my patch. Which test command did you
> > execute?
>
> Sorry, I should have mentioned that I applied the patch to the
> development version, not to 1.2.5, when testing. (snip)
>
> Thank you,
>
> > * on crmsh-7cd5688c164d.tar (tip)
> > (snip)
> > [root@rh64-2744 test]# ./regression.sh
> > confbasic.     checking... PASS
> > confbasic-xml. checking... PASS
> > edit.          checking... PASS
> > (snip)
> >
> > * on crmsh-ef3f08547688 (1.2.5)
> > (snip)
> > [root@rh64-2744 test]# ./regression.sh
> > confbasic.     checking... PASS
> > confbasic-xml. checking... FAIL
> > edit.          checking... PASS
> > (snip)
> >
> > Best Regards,
> > Hideo Yamauchi.
> --
> // Kristoffer Grönlund // kgronl...@suse.com
Re: [Linux-ha-dev] [Patch:crmsh] Correction of the mistake of the processing to transfer comment.
Hi Kristoffer,

As for the problem, the contents of the rpm of crmsh 1.2.5 which we used seem to have had a problem. The problem did not occur in a crm which I built from the source code of crmsh 1.2.5. The application of the patch which I contributed seems to be unnecessary. I will confirm the details and contact you again.

Best Regards,
Hideo Yamauchi.

--- On Sun, 2014/1/12, renayama19661...@ybb.ne.jp wrote:

> Hi Kristoffer,
>
> Thank you for the comment. I will look at it the day after tomorrow.
>
> Best Regards,
> Hideo Yamauchi.
>
> --- On Fri, 2014/1/10, Kristoffer Grönlund <kgronl...@suse.com> wrote:
>
> > On Fri, 10 Jan 2014 16:27:47 +0900 (JST) renayama19661...@ybb.ne.jp wrote:
> > > Hi Dejan,
> > >
> > > I send a patch for crmsh 1.2.5. A similar correction seems to be
> > > necessary for the latest crmsh.
> > >
> > > Best Regards,
> > > Hideo Yamauchi
> >
> > Hello,
> >
> > Thank you for the patch! I tried applying the patch to the latest
> > crmsh, but when running the regression test suite, I got an error. I
> > think the patch is fixing a bug, but unfortunately it seems to reveal
> > a different problem. Maybe you can help me figure out what is going
> > wrong!
> > Failing test case output included below:
> >
> > [ 77s] Fri Jan 10 12:38:47 UTC 2014: BEGIN testcase edit
> > [ 77s] --
> > [ 77s] testcase edit failed
> > [ 77s] output is in crmtestout/edit.out
> > [ 77s] diff (from crmtestout/edit.diff):
> > [ 77s] --- /usr/share/crmsh/tests/testcases/edit.exp 2014-01-10 12:38:36.0 +
> > [ 77s] +++ - 2014-01-10 12:38:53.149599264 +
> > [ 77s] @@ -84,4 +84,8 @@
> > [ 77s] .TRY configure rsc_defaults $id=rsc_options failure-timeout=10m
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] .TRY configure filter sed 's/2m/60s/' cib-bootstrap-options
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] .TRY configure show rsc_options
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] rsc_defaults $id=rsc_options \
> > [ 77s] @@ -89,3 +93,6 @@
> > [ 77s] .TRY configure property stonith-enabled=true
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] .TRY configure show cib-bootstrap-options
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] property $id=cib-bootstrap-options \
> > [ 77s] @@ -94,4 +101,8 @@
> > [ 77s] .TRY configure filter 'sed s/stonith-enabled=.true.//'
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] +ERROR: 13: syntax: Unknown command near xml parsing 'xml <rsc_location id="loc-d1" rsc="d1"> <!--# --> <rule id="r1" score="-INFINITY" boolean-op="or"> <expression operation="not_defined" attribute="webserver" id="loc-d1-expression"/> <expression attribute="mem" type="number" operation="lte" value="0" id="loc-d1-expression-3"/> </rule> <rule id="loc-d1-rule" score="-INFINITY"> <expression operation="not_defined" attribute="a2" id="loc-d1-expression-2"/> </rule> <rule id="r2" score-attribute="webserver"> <expression operation="defined" attribute="webserver" id="loc-d1-expression-0"/> </rule> <!--# --> </rsc_location>'
> > [ 77s] .TRY configure show cib-bootstrap-options
> > [ 77s] +INFO: object loc-d1 cannot be represented in the CLI notation
> > [ 77s] property $id=cib-bootstrap-options \
> > [ 77s] - default-action-timeout=60s
> > [ 77s] + default-action-timeout=60s \
> > [ 77s] + stonith-enabled=true
> > [ 77s] --
> > [ 77s] Fri Jan 10 12:38:53 UTC 2014: END testcase edit
> >
> > --
> > // Kristoffer Grönlund // kgronl...@suse.com
Re: [Linux-ha-dev] [Patch:crmsh] Correction of the mistake of the processing to transfer comment.
Hi Kristoffer,

In addition, the error did not happen in the edit test. The edit test passed whether or not I applied my patch. Which test command did you execute?

* on crmsh-7cd5688c164d.tar (tip)
(snip)
[root@rh64-2744 test]# ./regression.sh
confbasic.     checking... PASS
confbasic-xml. checking... PASS
edit.          checking... PASS
(snip)

* on crmsh-ef3f08547688 (1.2.5)
(snip)
[root@rh64-2744 test]# ./regression.sh
confbasic.     checking... PASS
confbasic-xml. checking... FAIL
edit.          checking... PASS
(snip)

Best Regards,
Hideo Yamauchi.

--- On Tue, 2014/1/14, renayama19661...@ybb.ne.jp wrote:

> Hi Kristoffer,
>
> As for the problem, the contents of the rpm of crmsh 1.2.5 which we
> used seem to have had a problem. The problem did not occur in a crm
> which I built from the source code of crmsh 1.2.5. The application of
> the patch which I contributed seems to be unnecessary. I will confirm
> the details and contact you again.
>
> Best Regards,
> Hideo Yamauchi.
>
> --- On Sun, 2014/1/12, renayama19661...@ybb.ne.jp wrote:
>
> > Hi Kristoffer,
> >
> > Thank you for the comment. I will look at it the day after tomorrow.
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> > --- On Fri, 2014/1/10, Kristoffer Grönlund <kgronl...@suse.com> wrote:
> >
> > > On Fri, 10 Jan 2014 16:27:47 +0900 (JST) renayama19661...@ybb.ne.jp wrote:
> > > > Hi Dejan,
> > > >
> > > > I send a patch for crmsh 1.2.5. A similar correction seems to be
> > > > necessary for the latest crmsh.
> > >
> > > Hello,
> > >
> > > Thank you for the patch! I tried applying the patch to the latest
> > > crmsh, but when running the regression test suite, I got an error.
> > > I think the patch is fixing a bug, but unfortunately it seems to
> > > reveal a different problem. Maybe you can help me figure out what
> > > is going wrong!
> > > Failing test case output included below:
> > > (snip)
> > >
> > > --
> > > // Kristoffer Grönlund // kgronl...@suse.com
Re: [Linux-ha-dev] [Patch:crmsh] Correction of the mistake of the processing to transfer comment.
Hi Kristoffer,

Thank you for the comment. I will look at it the day after tomorrow.

Best Regards,
Hideo Yamauchi.

--- On Fri, 2014/1/10, Kristoffer Grönlund <kgronl...@suse.com> wrote:

> On Fri, 10 Jan 2014 16:27:47 +0900 (JST) renayama19661...@ybb.ne.jp wrote:
> > Hi Dejan,
> >
> > I send a patch for crmsh 1.2.5. A similar correction seems to be
> > necessary for the latest crmsh.
> >
> > Best Regards,
> > Hideo Yamauchi
>
> Hello,
>
> Thank you for the patch! I tried applying the patch to the latest
> crmsh, but when running the regression test suite, I got an error. I
> think the patch is fixing a bug, but unfortunately it seems to reveal a
> different problem. Maybe you can help me figure out what is going
> wrong!
>
> Failing test case output included below:
> (snip)
>
> --
> // Kristoffer Grönlund // kgronl...@suse.com
[Linux-ha-dev] [Patch:crmsh] Correction of the mistake of the processing to transfer comment.
Hi Dejan,

I send a patch for crmsh 1.2.5. A similar correction seems to be necessary for the latest crmsh.

Best Regards,
Hideo Yamauchi

trac2744.patch
Description: Binary data
Re: [Linux-ha-dev] [Patch]Exit code of reset of external/libvirt is wrong.
Hi Dejan,

Thank you for the comments.

> Many thanks for the patch. Applied (slightly modified). I wonder if we
> should also ignore the outcome of libvirt_start. What are the chances
> that it fails?

I encountered a libvirt_start failure for the first time when I used libvirt in a vSphere environment. In the vSphere environment, the plugin can carry out libvirt_stop of the failover host of vSphere HA, but cannot carry out libvirt_start. I think this is probably a problem that does not happen with KVM.

Best Regards,
Hideo Yamauchi.

--- On Fri, 2013/7/5, Dejan Muhamedagic <de...@suse.de> wrote:

> Hi Hideo-san,
>
> On Fri, Jul 05, 2013 at 08:51:14AM +0900, renayama19661...@ybb.ne.jp wrote:
> > Hi All,
> >
> > The exit code of reset of external/libvirt is wrong.
>
> Indeed. Quite sloppy the latest change, my apologies.
>
> > I attached a patch.
>
> Many thanks for the patch. Applied (slightly modified). I wonder if we
> should also ignore the outcome of libvirt_start. What are the chances
> that it fails?
>
> Cheers,
> Dejan
[Linux-ha-dev] [Patch]Exit code of reset of external/libvirt is wrong.
Hi All,

The exit code of reset of external/libvirt is wrong. I attached a patch.

Best Regards,
Hideo Yamauchi.

libvirt.patch
Description: Binary data
Re: [Linux-ha-dev] [Problem] external/vcenter fails in stonith of the guest of the similar name.
Hi Dejan,

> > Please revise it to add the character ^ to the search.
>
> Applied. Thanks!

I confirmed it. (http://hg.linux-ha.org/glue/rev/0809ed6abeb7)

Many Thanks,
Hideo Yamauchi.

--- On Tue, 2012/10/23, Dejan Muhamedagic <de...@suse.de> wrote:

> Hi Hideo-san,
>
> On Mon, Oct 22, 2012 at 09:20:53AM +0900, renayama19661...@ybb.ne.jp wrote:
> > Hi All,
> >
> > external/vcenter fails in stonith of a guest with a similar name.
> > For example, when the two guests sr2 and backup-sr2 exist, carrying
> > out stonith of sr2 actually fences backup-sr2.
> >
> > The problem is caused by the following search:
> >
> >   $vm = Vim::find_entity_view(view_type => "VirtualMachine",
> >     filter => { name => qr/\Q$host_to_vm{$targetHost}\E/i });
> >
> > It seems to be caused by the fact that the correction that Mr. Lars
> > pointed out before was left out.
> >
> > * http://lists.community.tummy.com/pipermail/linux-ha-dev/2011-April/018397.html
> >
> > (snip)
> > Unless this filter thing has a special mode where it internally does
> > a $x eq $y for scalars and $x =~ $y for explicitly designated qr//
> > Regexp objects, I'd suggest to here also do
> >   filter => { name => qr/^\Q$realTarget\E$/i }
> > (snip)
> >
> > Please revise it to add the character ^ to the search.
>
> Applied. Thanks!
>
> Dejan
[Linux-ha-dev] [Problem] external/vcenter fails in stonith of the guest of the similar name.
Hi All,

external/vcenter fails in stonith of a guest with a similar name. For example, when the two guests sr2 and backup-sr2 exist, carrying out stonith of sr2 actually fences backup-sr2.

The problem is caused by the following search:

  $vm = Vim::find_entity_view(view_type => "VirtualMachine",
    filter => { name => qr/\Q$host_to_vm{$targetHost}\E/i });

It seems to be caused by the fact that the correction that Mr. Lars pointed out before was left out.

* http://lists.community.tummy.com/pipermail/linux-ha-dev/2011-April/018397.html

(snip)
Unless this filter thing has a special mode where it internally does a $x eq $y for scalars and $x =~ $y for explicitly designated qr// Regexp objects, I'd suggest to here also do
  filter => { name => qr/^\Q$realTarget\E$/i }
(snip)

Please revise it to add the character ^ to the search.

Best Regards,
Hideo Yamauchi.
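[Editor's sketch] The substring-match pitfall reported above can be reproduced with plain grep; this is only an illustration, with grep standing in for the Perl qr// filter and the guest names taken from the report.

```shell
# Two guests exist; we want to match exactly "sr2".
guests="sr2
backup-sr2"

# Unanchored pattern: both guest names contain "sr2", so the wrong VM
# can be selected for fencing.
unanchored=$(echo "$guests" | grep -c 'sr2')   # 2

# Anchored pattern, as Lars suggested (^...$): only the exact name matches.
anchored=$(echo "$guests" | grep -c '^sr2$')   # 1

echo "unanchored=$unanchored anchored=$anchored"
```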
Re: [Linux-ha-dev] [Patch] The problem that the cord of the digest cord of crmd becomes mismatched for.
Hi Dejan, Hi Andrew,

I confirmed the update with the patch in glue.

* http://hg.linux-ha.org/glue/rev/579e45f957b6

Many Thanks!
Hideo Yamauchi.

--- On Fri, 2012/10/12, Dejan Muhamedagic <de...@suse.de> wrote:

> Hi,
>
> On Fri, Oct 12, 2012 at 08:31:21AM +0900, renayama19661...@ybb.ne.jp wrote:
> > Hi Andrew, Hi Dejan,
> >
> > > Makes sense to me. With the patch, the effective options are
> > > create+op rather than create+op1+op2+op3...
> >
> > Would it be meaningful to change the structure of the op-done
> > message? I cannot change the op message when I consider other
> > influences. I think that the patch is right for the op message of
> > the present lrmd and crmd. We want to apply the patch to glue early
> > if we can.
>
> I'll do some testing first.
>
> Cheers,
> Dejan
>
> > Best Regards,
> > Hideo Yamauchi.
> >
> > --- On Thu, 2012/10/11, Andrew Beekhof <beek...@gmail.com> wrote:
> >
> > > On Wed, Oct 10, 2012 at 11:21 PM, Dejan Muhamedagic <de...@suse.de> wrote:
> > > > Hi Hideo-san,
> > > >
> > > > On Wed, Oct 10, 2012 at 03:22:08PM +0900, renayama19661...@ybb.ne.jp wrote:
> > > > > Hi All,
> > > > >
> > > > > We found that pacemaker could not judge a result of an lrmd
> > > > > operation correctly. When we load the following crm
> > > > > configuration, a parameter of the start operation is given
> > > > > back to crmd as part of the result of the monitor operation.
> > > > >
> > > > > (snip)
> > > > > primitive prmDiskd ocf:pacemaker:Dummy \
> > > > >     params name=diskcheck_status_internal device=/dev/vda interval=30 \
> > > > >     op start interval=0 timeout=60s on-fail=restart prereq=fencing \
> > > > >     op monitor interval=30s timeout=60s on-fail=restart \
> > > > >     op stop interval=0s timeout=60s on-fail=block
> > > > > (snip)
> > > > >
> > > > > This is because lrmd gives back the prereq parameter of start
> > > > > as part of the result of the monitor operation. As a result,
> > > > > crmd judges the parameters that lrmd actually ran the monitor
> > > > > operation with as mismatched with the parameters of the
> > > > > monitor operation that crmd asked lrmd for.
> > > > >
> > > > > We can confirm this problem with the following commands in
> > > > > Pacemaker 1.0.12.
> > > > >
> > > > > Command 1) The crm_verify command outputs the difference in
> > > > > the digest code.
> > > > > [root@rh63-heartbeat1 ~]# crm_verify -L
> > > > > crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
> > > > >
> > > > > Command 2) The ptest command outputs the difference in the
> > > > > digest code, too.
> > > > >
> > > > > [root@rh63-heartbeat1 ~]# ptest -L -VV
> > > > > ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
> > > > > ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
> > > > > [root@rh63-heartbeat1 ~]#
> > > > >
> > > > > Command 3) After the cibadmin -B command, pengine restarts the
> > > > > monitor of the resource unnecessarily.
> > > > >
> > > > > Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6
> > > > > Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp: Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1
> > > > > Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave resource prmDiskd:0#011(Started rh63-heartbeat1)
> > > > > Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > > > > Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked transition 2: 1 actions in 1 synapses
> > > > > Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from /var/lib/pengine/pe-input-2.bz2
> > > > > Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: Initiating action 1: monitor prmDiskd:0_monitor_3 on rh63-heartbeat1 (local)
> > > > > Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 op=prmDiskd:0_monitor_3 )
> > > > > Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] prereq=[fencing]
[Linux-ha-dev] [Patch] The problem that the digest code of crmd becomes mismatched.
Hi All, We found pacemaker that we could not judge a result of the operation of lrmd well. When we carry out following crm, a parameter of the operation of start is given back to crmd as a result of operation of monitor. (snip) primitive prmDiskd ocf:pacemaker:Dummy \ params name=diskcheck_status_internal device=/dev/vda interval=30 \ op start interval=0 timeout=60s on-fail=restart prereq=fencing \ op monitor interval=30s timeout=60s on-fail=restart \ op stop interval=0s timeout=60s on-fail=block (snip) This is because lrmd gives back prereq parameter of start as a result of monitor operation. As a result, crmd judge mismatched with a parameter of the monitor operation that crmd asked lrmd for for the parameter that Irmd carried out of the monitor operation. We can confirm this problem by the next command in Pacemaker1.0.12. Command 1) crm_verify command outputs the difference in digest cord. [root@rh63-heartbeat1 ~]# crm_verify -L crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 Command 2) The ptest command outputs the difference in digest cord, too. [root@rh63-heartbeat1 ~]# ptest -L -VV ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not fencing unseen nodes ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 [root@rh63-heartbeat1 ~]# Command 3) By cibadmin -B command, pengine restart monitor of an unnecessary resource. Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. 
d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp: Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave resource prmDiskd:0#011(Started rh63-heartbeat1) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: State transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked transition 2: 1 actions in 1 synapses Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from /var/lib/pengine/pe-input-2.bz2 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: Initiating action 1: monitor prmDiskd:0_monitor_3 on rh63-heartbeat1 (local) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 op=prmDiskd:0_monitor_3 ) Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_interval=[3] CRM_meta_timeout=[6] cancelled Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 monitor[5] (pid 20009) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_3 (call=4, status=1, cib-update=0, confirmed=true) Cancelled Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: operation monitor[5] on prmDiskd:0 for client 19839: pid 20009 exited 
with return code 0 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: append_digest: yamauchi Calculated digest 7d7c9f601095389fc7cc0c6b29c61a7a for prmDiskd:0_monitor_3 (0:0;1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6). Source: parameters device=/dev/vda name=diskcheck_status_internal interval=30 prereq=fencing CRM_meta_timeout=6/ Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_3 (call=5, rc=0, cib-update=53, confirmed=false) ok Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: match_graph_event: Action prmDiskd:0_monitor_3 (1) confirmed on rh63-heartbeat1 (rc=0) It is a problem to judge crmd that a digest cord is changed in not changing the parameter at all. I made a patch. The lrmd always gives back only a parameter depended on to a result from crmd and is a patch copying
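The root cause described above (lrmd echoing the start operation's `prereq` parameter back in the monitor result) makes the digest crmd recorded disagree with the one it recalculates. As a rough illustration only (Pacemaker's real check in `check_action_definition` digests the parameter XML, not a plain string), hashing the same parameter list with and without the stray key shows why the two digests can never match:

```shell
# Hypothetical illustration: a single leaked key in the parameter set
# produces a completely different digest, so crmd flags the op as changed.
monitor_params="device=/dev/vda name=diskcheck_status_internal interval=30"
leaked_params="$monitor_params prereq=fencing"  # 'prereq' leaked from the start op

d_recorded=$(printf '%s' "$monitor_params" | md5sum | cut -d' ' -f1)
d_recalc=$(printf '%s' "$leaked_params" | md5sum | cut -d' ' -f1)

if [ "$d_recorded" != "$d_recalc" ]; then
    echo "digest mismatch"
fi
```

This is the same shape of disagreement as the logged `recorded 7d7c... vs. d38c...` message: identical configuration, differing hash inputs.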
Re: [Linux-ha-dev] [Patch] The problem that the digest code of crmd becomes mismatched.
Hi Dejan, Thank you for comments. I wait for comment of Andrew. I hope that a problem is settled with a patch. Many thanks, Hideo Yamauhci. --- On Wed, 2012/10/10, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Wed, Oct 10, 2012 at 03:22:08PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We found pacemaker that we could not judge a result of the operation of lrmd well. When we carry out following crm, a parameter of the operation of start is given back to crmd as a result of operation of monitor. (snip) primitive prmDiskd ocf:pacemaker:Dummy \ params name=diskcheck_status_internal device=/dev/vda interval=30 \ op start interval=0 timeout=60s on-fail=restart prereq=fencing \ op monitor interval=30s timeout=60s on-fail=restart \ op stop interval=0s timeout=60s on-fail=block (snip) This is because lrmd gives back prereq parameter of start as a result of monitor operation. As a result, crmd judge mismatched with a parameter of the monitor operation that crmd asked lrmd for for the parameter that Irmd carried out of the monitor operation. We can confirm this problem by the next command in Pacemaker1.0.12. Command 1) crm_verify command outputs the difference in digest cord. [root@rh63-heartbeat1 ~]# crm_verify -L crm_verify[19988]: 2012/10/10_20:29:58 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 Command 2) The ptest command outputs the difference in digest cord, too. [root@rh63-heartbeat1 ~]# ptest -L -VV ptest[19992]: 2012/10/10_20:30:19 WARN: unpack_nodes: Blind faith: not fencing unseen nodes ptest[19992]: 2012/10/10_20:30:19 CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. 
d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 [root@rh63-heartbeat1 ~]# Command 3) By cibadmin -B command, pengine restart monitor of an unnecessary resource. Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: CRIT: check_action_definition: Parameters to prmDiskd:0_monitor_3 on rh63-heartbeat1 changed: recorded 7d7c9f601095389fc7cc0c6b29c61a7a vs. d38c85388dea5e8e2568c3d699eb9cce (reload:3.0.1) 0:0;6:1:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: RecurringOp: Start recurring monitor (30s) for prmDiskd:0 on rh63-heartbeat1 Oct 10 20:31:00 rh63-heartbeat1 pengine: [19842]: notice: LogActions: Leave resource prmDiskd:0#011(Started rh63-heartbeat1) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_state_transition: State transition S_POLICY_ENGINE - S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: unpack_graph: Unpacked transition 2: 1 actions in 1 synapses Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1349868660-20) derived from /var/lib/pengine/pe-input-2.bz2 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: te_rsc_command: Initiating action 1: monitor prmDiskd:0_monitor_3 on rh63-heartbeat1 (local) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: do_lrm_rsc_op: Performing key=1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6 op=prmDiskd:0_monitor_3 ) Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: cancel_op: operation monitor[4] on prmDiskd:0 for client 19839, its parameters: CRM_meta_clone=[0] CRM_meta_prereq=[fencing] device=[/dev/vda] name=[diskcheck_status_internal] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[30] prereq=[fencing] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] CRM_meta_interval=[3] 
CRM_meta_timeout=[6] cancelled Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: rsc:prmDiskd:0 monitor[5] (pid 20009) Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: process_lrm_event: LRM operation prmDiskd:0_monitor_3 (call=4, status=1, cib-update=0, confirmed=true) Cancelled Oct 10 20:31:00 rh63-heartbeat1 lrmd: [19836]: info: operation monitor[5] on prmDiskd:0 for client 19839: pid 20009 exited with return code 0 Oct 10 20:31:00 rh63-heartbeat1 crmd: [19839]: info: append_digest: yamauchi Calculated digest 7d7c9f601095389fc7cc0c6b29c61a7a for prmDiskd:0_monitor_3 (0:0;1:2:0:ca6a5ad2-0340-4769-bab7-289a00862ba6). Source: parameters device=/dev/vda name=diskcheck_status_internal interval=30 prereq=fencing CRM_meta_timeout=6/ Oct 10 20:31:00 rh63-heartbeat1 crmd:
Re: [Linux-ha-dev] improvements of the libvirt stonith plugin
Hi All, We confirmed the connection of libvirt of Esx(vmware) on RHEL6.3. When we are connected to Esxi, different results are provided like RHEL5. [root@rh63-1 ~]# virsh -c esx://root@192.168.133.1/?no_verify=1 destroy sr2 Enter root's password for 192.168.133.1: error: Failed to destroy domain sr2 error: Requested operation is not valid: Domain is not powered on Because the following result is provided, the patch which Mr.Matsuo contributed becomes useful in Esx. [root@rh63-1 ~]# virsh -c esx://root@192.168.133.2/?no_verify=1 dominfo sr1 Enter root's password for 192.168.133.2: Id: 68 Name: sr1 UUID: 423b6068-2b19-b80d-0ef2-0c64e3ee25b3 OS Type:hvm State: running CPU(s): 2 Max memory: 2097152 kB Used memory:2097152 kB Persistent: yes Autostart: disable Managed save: unknown [root@rh63-1 ~]# virsh -c esx://root@192.168.133.1/?no_verify=1 dominfo sr2 Enter root's password for 192.168.133.1: Id: - Name: sr2 UUID: 423b9c27-cded-5616-b5c9-f04f4215b663 OS Type:hvm State: shut off CPU(s): 2 Max memory: 2097152 kB Used memory:2097152 kB Persistent: yes Autostart: disable Managed save: unknown Best Regards, Hideo Yamauchi. --- On Fri, 2012/6/1, Takatoshi MATSUO matsuo@gmail.com wrote: Hi I found that fencing of libvirt plugin is failed on Xen(RHEL5), because virsh's outputs are different. KVM on RHEL6 - # virsh destroy host1 error: Failed to destroy domain host1 error: Requested operation is not valid: domain is not running - Xen on RHEL5 - # virsh destroy host1 error: Failed to destroy domain host1 error: invalid argument in Domain host1 isn't running. - I attached a patch. Regards, Takatoshi MATSUO ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
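The reports above show three different phrasings for the same "domain already stopped" condition, depending on the hypervisor. A minimal sketch of a tolerant check (the function name `is_already_off` is mine, not the plugin's; the real plugin parses virsh output differently):

```shell
# Sketch only: accept the three known virsh "not running" phrasings in one place.
is_already_off() {
    echo "$1" | grep -Eqi "is not running|isn't running|is not powered on"
}

is_already_off "error: Requested operation is not valid: domain is not running" \
    && echo "KVM/RHEL6: already off"
is_already_off "error: invalid argument in Domain host1 isn't running." \
    && echo "Xen/RHEL5: already off"
is_already_off "error: Requested operation is not valid: Domain is not powered on" \
    && echo "ESX: already off"
```

Matching on a case-insensitive alternation rather than one exact string is what makes the contributed patch useful on ESX as well.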
Re: [Linux-ha-dev] improvements of the libvirt stonith plugin
Hi Dejan, I confirmed the adoption of the patch. Many Thanks! Hideo Yamauchi. --- On Fri, 2012/7/13, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Fri, Jul 13, 2012 at 03:52:09PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We confirmed the connection of libvirt of Esx(vmware) on RHEL6.3. When we are connected to Esxi, different results are provided like RHEL5. [root@rh63-1 ~]# virsh -c esx://root@192.168.133.1/?no_verify=1 destroy sr2 Enter root's password for 192.168.133.1: error: Failed to destroy domain sr2 error: Requested operation is not valid: Domain is not powered on Because the following result is provided, the patch which Mr.Matsuo contributed becomes useful in Esx. Good. I missed the patch somehow, sorry about that. Applied now. Many thanks for the patch to Takatoshi MATSUO. I also modified the search string a bit in a later changeset. Cheers, Dejan [root@rh63-1 ~]# virsh -c esx://root@192.168.133.2/?no_verify=1 dominfo sr1 Enter root's password for 192.168.133.2: Id: 68 Name: sr1 UUID: 423b6068-2b19-b80d-0ef2-0c64e3ee25b3 OS Type: hvm State: running CPU(s): 2 Max memory: 2097152 kB Used memory: 2097152 kB Persistent: yes Autostart: disable Managed save: unknown [root@rh63-1 ~]# virsh -c esx://root@192.168.133.1/?no_verify=1 dominfo sr2 Enter root's password for 192.168.133.1: Id: - Name: sr2 UUID: 423b9c27-cded-5616-b5c9-f04f4215b663 OS Type: hvm State: shut off CPU(s): 2 Max memory: 2097152 kB Used memory: 2097152 kB Persistent: yes Autostart: disable Managed save: unknown Best Regards, Hideo Yamauchi. --- On Fri, 2012/6/1, Takatoshi MATSUO matsuo@gmail.com wrote: Hi I found that fencing of libvirt plugin is failed on Xen(RHEL5), because virsh's outputs are different. KVM on RHEL6 - # virsh destroy host1 error: Failed to destroy domain host1 error: Requested operation is not valid: domain is not running - Xen on RHEL5 - # virsh destroy host1 error: Failed to destroy domain host1 error: invalid argument in Domain host1 isn't running. 
- I attached a patch. Regards, Takatoshi MATSUO ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] pull request 82 for postfix ra *call for help*
Hi Raoul, As for me, you understood a meaning. And I understood that plural contents were not set at data_dir. It is that this loop is a loop in consideration of the expansion of the future directory check. Is my understanding wrong? Many Thanks. Hideo Yamauchi. --- On Wed, 2012/5/16, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hello Hideo-san! On 16.05.2012 06:22, renayama19661...@ybb.ne.jp wrote: Hi Raoul, I forgot it. Is not it necessary to convert a comma into the space from data_dir if you leave a loop of data_dir? example) data_dir=`echo $data_dir | tr ',' ' '` I think we still have a major misunderstanding :) This loop is *not* about looping multiple data directories (multiple data directories are not possible and an error is issued by the new patch) This loop is kept in place if we want to loop different, additional directories, for example the data_dir *and* the mail_spool_directory *and* the queue_directory. As of now, we do not loop more directories but the loop does not harm in any way, so I would rather keep it there. Can anyone help me to express myself in a better way or help me understand the real issue which Hideo-san wants to address? *Please* :) Cheers, Raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] pull request 82 for postfix ra *call for help*
Hi Raoul, Thank you for comments. I agree to your correction. I am sorry that I confused you. Many Thanks! Hideo Yamauchi. --- On Wed, 2012/5/16, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hello Hideo-san! On 16.05.2012 08:12, renayama19661...@ybb.ne.jp wrote: Hi Raoul, As for me, you understood a meaning. And I understood that plural contents were not set at data_dir. It is that this loop is a loop in consideration of the expansion of the future directory check. Mhm, I *think* so. So can we agree that there is nothing left to do and I can issue another pull request? :) Otherwise, I'm confused on what you're expecting from me. (If it is simply removing the loop because there currently is *no need* for looping, to which i agree, I would still refrain from this particular change because we would not anything here.) Thanks, Raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] The patch which fixes a memory leak.
Hi Lars, Pushed to http://hg.linux-ha.org/glue We confirmed that a problem was settled with your patch. Many Thanks! Hideo Yamauchi. --- On Thu, 2012/5/17, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Wed, May 16, 2012 at 09:33:48AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Lars, In the environment where we confirmed leak, I confirm your patch. Pushed to http://hg.linux-ha.org/glue Thanks, -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] The patch which fixes a memory leak.
Hi Lars, Sorry...An answer was late. In the environment where we confirmed leak, I confirm your patch. Many Thanks, Hideo Yamauchi. --- On Wed, 2012/5/16, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Tue, May 15, 2012 at 11:14:53AM +0200, Lars Ellenberg wrote: On Mon, May 14, 2012 at 05:44:55PM +0200, Lars Ellenberg wrote: By the way, I suspect Lars' suggestion would work fine. I would certainly explain what the better patch is in the comments when you apply this one. Hm. Looks like it *does* explode (aka segfault) Continuing my monologue ... it may just have been incomplete. The patch below seems to work just fine. I managed to occasionally trigger the Attempt to remove timeout (%u) with NULL source message, but I have seen that one without the patch as well, so that may just be some other oddity somewhere: double removal of timeout resources ;-) We can find and drop those later, they look harmless enough. I do not see any memleak anywhere anymore with this patch applied. Comments/review/testing welcome. 
# HG changeset patch
# User Lars Ellenberg l...@linbit.com
# Date 1337066453 -7200
# Node ID e63dd41f46b7bd150a23a62303bde6be78305c9c
# Parent 63d968249025b245e38b1da6d0202438ec45ebf3
[mq]: potential-fix-for-timer-leak

diff --git a/lib/clplumbing/GSource.c b/lib/clplumbing/GSource.c
--- a/lib/clplumbing/GSource.c
+++ b/lib/clplumbing/GSource.c
@@ -1507,6 +1507,7 @@
 	g_source_set_callback(source, function, data, notify);
 	append->gsourceid = g_source_attach(source, NULL);
+	g_source_unref(source);
 	return append->gsourceid;
 }
@@ -1517,14 +1518,12 @@
 	GSource* source = g_main_context_find_source_by_id(NULL,tag);
 	struct GTimeoutAppend* append = GTIMEOUT(source);
 
-	g_source_remove(tag);
-
 	if (source == NULL){
 		cl_log(LOG_ERR, "Attempt to remove timeout (%u) with NULL source", tag);
 	}else{
 		g_assert(IS_TIMEOUTSRC(append));
-		g_source_unref(source);
+		g_source_remove(tag);
 	}
 
 	return;

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] pull request 82 for postfix ra
Hi Raoul, I think the only patch left is postfix.patch.1121 from http://www.gossamer-threads.com/lists/linuxha/dev/76532#76532 right? diff -r aaf72a017c98 postfix --- a/postfixMon Nov 21 10:32:33 2011 +0900 +++ b/postfixMon Nov 21 10:34:08 2011 +0900 @@ -264,7 +264,13 @@ fi if ocf_is_true $status_support; then -data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` +orig_data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` +data_dir=`echo $orig_data_dir | tr ',' ' '` +dcount=`echo $data_dir | wc -w` +if [ $dcount -gt 1 ]; then +ocf_log err Postfix data directory '$orig_data_dir' cannot set plural parameters. +return $OCF_ERR_PERM +fi if [ ! -d $data_dir ]; then if ocf_is_probe; then ocf_log info Postfix data directory '$data_dir' not readable during probe. i would slightly modify this: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/heartbeat/postfix b/heartbeat/postfix index 273d5c9..2f4ab13 100755 --- a/heartbeat/postfix +++ b/heartbeat/postfix @@ -264,6 +264,11 @@ postfix_validate_all() if ocf_is_true $status_support; then data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` +data_dir_count=`echo $data_dir | tr ',' ' ' | wc -w` +if [ $data_dir_count -gt 1 ]; then + ocf_log err Postfix data directory '$orig_data_dir' cannot be set to multiple directories. +return $OCF_ERR_INSTALLED +fi if [ ! -d $data_dir ]; then if ocf_is_probe; then ocf_log info Postfix data directory '$data_dir' not readable during probe. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Thanks! I agree to the patch which you changed. what do you think about that? @@ -278,16 +284,14 @@ # check directory permissions if ocf_is_true $status_support; then user=`postconf $OPTION_CONFIG_DIR -h mail_owner 2/dev/null` -for dir in $data_dir; do -if ! su -s /bin/sh - $user -c test -w $dir; then -if ocf_is_probe; then -ocf_log info Directory '$dir' is not writable by user '$user' during probe. 
-else -ocf_log err Directory '$dir' is not writable by user '$user'. -return $OCF_ERR_PERM; -fi +if ! su -s /bin/sh - $user -c test -w $data_dir; then +if ocf_is_probe; then +ocf_log info Directory '$data_dir' is not writable by user '$user' during probe. +else +ocf_log err Directory '$data_dir' is not writable by user '$user'. +return $OCF_ERR_PERM; fi -done +fi fi fi As outlined, i see no benefit in removing the loop and would like to keep it in case we want to check some other directories in the future. Okay. But, therefore does not the loop of data_dir have to change it as follows? -for dir in $data_dir; do +for dir in $data_dir; do Many Thanks, Hideo Yamauchi. --- On Tue, 2012/5/15, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for comments. I am slightly busy. I confirm it and will send an email tomorrow. Best Regards, Hideo Yamauchi. --- On Fri, 2012/5/11, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 2012-05-11 02:09, renayama19661...@ybb.ne.jp wrote: Hi Raoul, Hi Dejan, Thank you for the reflection to a repository. To Raoul : The matter of the next email is still left. Please tell your opinion. * http://www.gossamer-threads.com/lists/linuxha/dev/76409 I think the only patch left is postfix.patch.1121 from http://www.gossamer-threads.com/lists/linuxha/dev/76532#76532 right? diff -r aaf72a017c98 postfix --- a/postfix Mon Nov 21 10:32:33 2011 +0900 +++ b/postfix Mon Nov 21 10:34:08 2011 +0900 @@ -264,7 +264,13 @@ fi if ocf_is_true $status_support; then - data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + orig_data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + data_dir=`echo $orig_data_dir | tr ',' ' '` + dcount=`echo $data_dir | wc -w` + if [ $dcount -gt 1 ]; then + ocf_log err Postfix data directory
Re: [Linux-ha-dev] pull request 82 for postfix ra
Hi Raoul, I forgot it. Is not it necessary to convert a comma into the space from data_dir if you leave a loop of data_dir? example) data_dir=`echo $data_dir | tr ',' ' '` Best Regards, Hideo Yamauchi. --- On Wed, 2012/5/16, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, I think the only patch left is postfix.patch.1121 from http://www.gossamer-threads.com/lists/linuxha/dev/76532#76532 right? diff -r aaf72a017c98 postfix --- a/postfix Mon Nov 21 10:32:33 2011 +0900 +++ b/postfix Mon Nov 21 10:34:08 2011 +0900 @@ -264,7 +264,13 @@ fi if ocf_is_true $status_support; then - data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + orig_data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + data_dir=`echo $orig_data_dir | tr ',' ' '` + dcount=`echo $data_dir | wc -w` + if [ $dcount -gt 1 ]; then + ocf_log err Postfix data directory '$orig_data_dir' cannot set plural parameters. + return $OCF_ERR_PERM + fi if [ ! -d $data_dir ]; then if ocf_is_probe; then ocf_log info Postfix data directory '$data_dir' not readable during probe. i would slightly modify this: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/heartbeat/postfix b/heartbeat/postfix index 273d5c9..2f4ab13 100755 --- a/heartbeat/postfix +++ b/heartbeat/postfix @@ -264,6 +264,11 @@ postfix_validate_all() if ocf_is_true $status_support; then data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + data_dir_count=`echo $data_dir | tr ',' ' ' | wc -w` + if [ $data_dir_count -gt 1 ]; then + ocf_log err Postfix data directory '$orig_data_dir' cannot be set to multiple directories. + return $OCF_ERR_INSTALLED + fi if [ ! -d $data_dir ]; then if ocf_is_probe; then ocf_log info Postfix data directory '$data_dir' not readable during probe. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Thanks! I agree to the patch which you changed. what do you think about that? 
@@ -278,16 +284,14 @@ # check directory permissions if ocf_is_true $status_support; then user=`postconf $OPTION_CONFIG_DIR -h mail_owner 2/dev/null` - for dir in $data_dir; do - if ! su -s /bin/sh - $user -c test -w $dir; then - if ocf_is_probe; then - ocf_log info Directory '$dir' is not writable by user '$user' during probe. - else - ocf_log err Directory '$dir' is not writable by user '$user'. - return $OCF_ERR_PERM; - fi + if ! su -s /bin/sh - $user -c test -w $data_dir; then + if ocf_is_probe; then + ocf_log info Directory '$data_dir' is not writable by user '$user' during probe. + else + ocf_log err Directory '$data_dir' is not writable by user '$user'. + return $OCF_ERR_PERM; fi - done + fi fi fi As outlined, i see no benefit in removing the loop and would like to keep it in case we want to check some other directories in the future. Okay. But, therefore does not the loop of data_dir have to change it as follows? - for dir in $data_dir; do + for dir in $data_dir; do Many Thanks, Hideo Yamauchi. --- On Tue, 2012/5/15, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for comments. I am slightly busy. I confirm it and will send an email tomorrow. Best Regards, Hideo Yamauchi. --- On Fri, 2012/5/11, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 2012-05-11 02:09, renayama19661...@ybb.ne.jp wrote: Hi Raoul, Hi Dejan, Thank you for the reflection to a repository. To Raoul : The matter of the next email is still left. Please tell your opinion. * http://www.gossamer-threads.com/lists/linuxha/dev/76409 I think the only patch left is postfix.patch.1121 from http://www.gossamer-threads.com/lists/linuxha/dev/76532#76532 right? diff -r aaf72a017c98 postfix --- a/postfix Mon Nov 21 10:32:33 2011 +0900 +++ b/postfix Mon Nov 21 10:34:08 2011 +0900 @@ -264,7 +264,13 @@ fi if
Re: [Linux-ha-dev] pull request 82 for postfix ra
Hi Raoul, Thank you for comments. I am slightly busy. I confirm it and will send an email tomorrow. Best Regards, Hideo Yamauchi. --- On Fri, 2012/5/11, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 2012-05-11 02:09, renayama19661...@ybb.ne.jp wrote: Hi Raoul, Hi Dejan, Thank you for the reflection to a repository. To Raoul : The matter of the next email is still left. Please tell your opinion. * http://www.gossamer-threads.com/lists/linuxha/dev/76409 I think the only patch left is postfix.patch.1121 from http://www.gossamer-threads.com/lists/linuxha/dev/76532#76532 right? diff -r aaf72a017c98 postfix --- a/postfix Mon Nov 21 10:32:33 2011 +0900 +++ b/postfix Mon Nov 21 10:34:08 2011 +0900 @@ -264,7 +264,13 @@ fi if ocf_is_true $status_support; then - data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + orig_data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + data_dir=`echo $orig_data_dir | tr ',' ' '` + dcount=`echo $data_dir | wc -w` + if [ $dcount -gt 1 ]; then + ocf_log err Postfix data directory '$orig_data_dir' cannot set plural parameters. + return $OCF_ERR_PERM + fi if [ ! -d $data_dir ]; then if ocf_is_probe; then ocf_log info Postfix data directory '$data_dir' not readable during probe. i would slightly modify this: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/heartbeat/postfix b/heartbeat/postfix index 273d5c9..2f4ab13 100755 --- a/heartbeat/postfix +++ b/heartbeat/postfix @@ -264,6 +264,11 @@ postfix_validate_all() if ocf_is_true $status_support; then data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2/dev/null` + data_dir_count=`echo $data_dir | tr ',' ' ' | wc -w` + if [ $data_dir_count -gt 1 ]; then + ocf_log err Postfix data directory '$orig_data_dir' cannot be set to multiple directories. + return $OCF_ERR_INSTALLED + fi if [ ! -d $data_dir ]; then if ocf_is_probe; then ocf_log info Postfix data directory '$data_dir' not readable during probe. 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - what do you think about that? @@ -278,16 +284,14 @@ # check directory permissions if ocf_is_true $status_support; then user=`postconf $OPTION_CONFIG_DIR -h mail_owner 2/dev/null` - for dir in $data_dir; do - if ! su -s /bin/sh - $user -c test -w $dir; then - if ocf_is_probe; then - ocf_log info Directory '$dir' is not writable by user '$user' during probe. - else - ocf_log err Directory '$dir' is not writable by user '$user'. - return $OCF_ERR_PERM; - fi + if ! su -s /bin/sh - $user -c test -w $data_dir; then + if ocf_is_probe; then + ocf_log info Directory '$data_dir' is not writable by user '$user' during probe. + else + ocf_log err Directory '$data_dir' is not writable by user '$user'. + return $OCF_ERR_PERM; fi - done + fi fi fi As outlined, i see no benefit in removing the loop and would like to keep it in case we want to check some other directories in the future. quoting http://www.gossamer-threads.com/lists/linuxha/dev/76453#76453 : the current loop: for dir in $data_dir; do ... done (looping exactly one dir) could easily be enhanced to check more dirs, e.g.: for dir in $data_dir $data_dir/active $data_dir/incoming; do ... done (looping three dirs) without having to re-introduce the loop. Cheers, Raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
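The validate_all change discussed in this thread rejects a comma-separated `data_directory`. A standalone sketch of just that check (the `$data_dir` value below is a stand-in for the output of `postconf $OPTION_CONFIG_DIR -h data_directory 2>/dev/null`, which is not available here):

```shell
# Stand-in value; in the resource agent this comes from postconf.
data_dir="/var/lib/postfix"

# Count words after turning commas into spaces: more than one word means
# the parameter was (invalidly) set to multiple directories.
data_dir_count=$(echo "$data_dir" | tr ',' ' ' | wc -w)
if [ "$data_dir_count" -gt 1 ]; then
    echo "Postfix data directory '$data_dir' cannot be set to multiple directories." >&2
else
    echo "single data_directory: $data_dir"
fi
```

With a value like `/var/lib/postfix,/srv/postfix` the count would be 2 and the error branch would be taken, which is the condition Raoul's variant maps to `$OCF_ERR_INSTALLED`.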
Re: [Linux-ha-dev] pull request 82 for postfix ra
Hi Raoul, Hi Dejan, Thank you for the reflection to a repository. To Raoul : The matter of the next email is still left. Please tell your opinion. * http://www.gossamer-threads.com/lists/linuxha/dev/76409 Best Regards, Hideo Yamauchi. --- On Thu, 2012/5/10, Dejan Muhamedagic de...@suse.de wrote: Hi Raoul, On Thu, May 10, 2012 at 01:21:42PM +0200, Raoul Bhatia [IPAX] wrote: Hi! While reviewing my repository and patch collection for the resource agents, i opened a pull request [1] for the postfix patches that have been lurking around in my repository since a couple of months. I think there is some outstanding discussion with Hideo-san but I would like to pick them up afterwards. Comments and feedback is welcome! I just pulled the patches. Thanks! Cheers, Dejan Thanks, Raoul [1] https://github.com/ClusterLabs/resource-agents/pull/82 -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.
Hi Alan, Thank you for comments. FYI: there is code in the heartbeat communication layer which is quite happy to simulate lost packets. I made it difficult to turn on accidentally. Read the code for details if you're interested. All right. Many Thanks, Hideo Yamauchi. --- On Tue, 2012/5/8, Alan Robertson al...@unix.sh wrote: FYI: there is code in the heartbeat communication layer which is quite happy to simulate lost packets. I made it difficult to turn on accidentally. Read the code for details if you're interested. On 04/30/2012 10:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Lars, We confirmed that this problem occurred with v1 mode of Heartbeat. * The problem happens with the v2 mode in the same way. We confirmed a problem in the next procedure. Step 1) Put a special device extinguishing a communication packet of Heartbeat in the network. Step 2) Between nodes, the retransmission of the message is carried out repeatedly. Step 3) Then the memory of the master process increases little by little. As a result of the ps command of the master process -- * node1 (start) 32126 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process (One hour later) 32126 ? SLs 0:03 0 182 54729 7868 0.0 heartbeat: master control process (Two hour later) 32126 ? SLs 0:08 0 182 55317 8456 0.0 heartbeat: master control process (Four hours later) 32126 ? SLs 0:24 0 182 56673 9812 0.0 heartbeat: master control process * node2 (start) 31928 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process (One hour later) 31928 ? SLs 0:02 0 182 54481 7620 0.0 heartbeat: master control process (Two hour later) 31928 ? SLs 0:08 0 182 55353 8492 0.0 heartbeat: master control process (Four hours later) 31928 ? SLs 0:23 0 182 56689 9828 0.0 heartbeat: master control process The state of the memory leak seems to vary according to a node with the quantity of the retransmission. The increase of this memory disappears by applying my patch. 
And the similar correspondence seems to be necessary in send_reqnodes_msg(), but this is like little leak. Best Regards, Hideo Yamauchi. --- On Sat, 2012/4/28, renayama19661...@ybb.ne.jprenayama19661...@ybb.ne.jp wrote: Hi Lars, Thank you for comments. Have you actually been able to measure that memory leak you observed, and you can confirm this patch will fix it? Because I don't think this patch has any effect. Yes. I really measured leak. I can show a result next week. #Japan is a holiday until Tuesday. send_rexmit_request() is only used as paramter to Gmain_timeout_add_full, and it returns FALSE always, which should cause the respective sourceid to be auto-removed. It seems to be necessary to release gsource somehow or other. The similar liberation seems to be carried out in lrmd. Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/27, Lars Ellenberglars.ellenb...@linbit.com wrote: On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We gave test that assumed remote cluster environment. And we tested packet lost. The retransmission timer of Heartbeat causes memory leak. I donate a patch. Please confirm the contents of the patch. And please reflect a patch in a repository of Heartbeat. Have you actually been able to measure that memory leak you observed, and you can confirm this patch will fix it? Because I don't think this patch has any effect. send_rexmit_request() is only used as paramter to Gmain_timeout_add_full, and it returns FALSE always, which should cause the respective sourceid to be auto-removed. 
diff -r 106ca984041b heartbeat/hb_rexmit.c
--- a/heartbeat/hb_rexmit.c Thu Apr 26 19:28:26 2012 +0900
+++ b/heartbeat/hb_rexmit.c Thu Apr 26 19:31:44 2012 +0900
@@ -164,6 +164,8 @@
     seqno_t seq = (seqno_t) ri->seq;
     struct node_info* node = ri->node;
     struct ha_msg* hmsg;
+    unsigned long sourceid;
+    gpointer value;

     if (STRNCMP_CONST(node->status, UPSTATUS) != 0
         && STRNCMP_CONST(node->status, ACTIVESTATUS) != 0) {
@@ -196,11 +198,17 @@

     node->track.last_rexmit_req = time_longclock();

-    if (!g_hash_table_remove(rexmit_hash_table, ri)){
-        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
-               "for seq/node(%ld %s)",
-               __FUNCTION__, ri->seq, ri->node->nodename);
-        return FALSE;
+    value = g_hash_table_lookup(rexmit_hash_table, ri);
+    if (value != NULL) {
+        sourceid = (unsigned long) value;
+        Gmain_timeout_remove(sourceid);
+
+        if (!g_hash_table_remove(rexmit_hash_table, ri)){
+            cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
+                   "for seq/node(%ld %s)",
+                   __FUNCTION__, ri->seq, ri->node->nodename);
+            return FALSE;
+        }
     }
Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.
Hi Lars, Thank you for comments. And when it passes more than a full day * node1 32126 ? SLs 79:52 0 182 71189 24328 0.1 heartbeat: master control process * node2 31928 ? SLs 77:01 0 182 70869 24008 0.1 heartbeat: master control process Oh, I see. This is a design choice (maybe not even intentional) of the Gmain_* wrappers used throughout the heartbeat code. The real glib g_timeout_add_full(), and most other similar functions, internally do id = g_source_attach(source, ...); g_source_unref(source); return id; Thus in g_main_dispatch, the need_destroy = ! dispatch (...) if (need_destroy) g_source_destroy_internal() in fact ends up destroying it, if dispatch() returns FALSE, as documented: The function is called repeatedly until it returns FALSE, at which point the timeout is automatically destroyed and the function will not be called again. Not so with the heartbeat/glue/libplumbing Gmain_timeout_add_full. It does not g_source_unref(), so we keep the extra reference around until someone explicitly calls Gmain_timeout_remove(). Talk about principle of least surprise :( Changing this behaviour to match glib's, i.e. unref'ing after g_source_attach, would seem like the correct thing to do, but is likely to break other pieces of code in subtle ways, so it may not be the right thing to do at this point. Thank you for detailed explanation. If you found the method that is appropriate than the correction that I suggested, I approve of it. I'm going to take your patch more or less as is. If it does not show up soon, prod me again. All right. Many Thanks! Hideo Yamauchi. Thank you for tracking this down. Best Regards, Hideo Yamauchi. --- On Tue, 2012/5/1, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Lars, We confirmed that this problem occurred with v1 mode of Heartbeat. * The problem happens with the v2 mode in the same way. We confirmed a problem in the next procedure. 
Step 1) Put a special device extinguishing a communication packet of Heartbeat in the network. Step 2) Between nodes, the retransmission of the message is carried out repeatedly. Step 3) Then the memory of the master process increases little by little. As a result of the ps command of the master process -- * node1 (start) 32126 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process (One hour later) 32126 ? SLs 0:03 0 182 54729 7868 0.0 heartbeat: master control process (Two hour later) 32126 ? SLs 0:08 0 182 55317 8456 0.0 heartbeat: master control process (Four hours later) 32126 ? SLs 0:24 0 182 56673 9812 0.0 heartbeat: master control process * node2 (start) 31928 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process (One hour later) 31928 ? SLs 0:02 0 182 54481 7620 0.0 heartbeat: master control process (Two hour later) 31928 ? SLs 0:08 0 182 55353 8492 0.0 heartbeat: master control process (Four hours later) 31928 ? SLs 0:23 0 182 56689 9828 0.0 heartbeat: master control process The state of the memory leak seems to vary according to a node with the quantity of the retransmission. The increase of this memory disappears by applying my patch. And the similar correspondence seems to be necessary in send_reqnodes_msg(), but this is like little leak. Best Regards, Hideo Yamauchi. --- On Sat, 2012/4/28, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Lars, Thank you for comments. Have you actually been able to measure that memory leak you observed, and you can confirm this patch will fix it? Because I don't think this patch has any effect. Yes. I really measured leak. I can show a result next week. #Japan is a holiday until Tuesday. send_rexmit_request() is only used as paramter to Gmain_timeout_add_full, and it returns FALSE always, which should cause the respective sourceid to be auto-removed. It seems to be necessary to release gsource somehow or other. The similar liberation seems to be carried out in lrmd. 
Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/27, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We gave test that assumed remote cluster environment. And we tested packet lost. The retransmission timer of Heartbeat causes memory leak. I donate a
Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.
Hi Lars,

And after more than a full day has passed:

* node1
32126 ? SLs 79:52 0 182 71189 24328 0.1 heartbeat: master control process
* node2
31928 ? SLs 77:01 0 182 70869 24008 0.1 heartbeat: master control process

Best Regards,
Hideo Yamauchi.

--- On Tue, 2012/5/1, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote:

Hi Lars,

We confirmed that this problem occurred with v1 mode of Heartbeat.
* The problem happens with the v2 mode in the same way.

We confirmed the problem with the following procedure.

Step 1) Put a special device that drops Heartbeat communication packets into the network.
Step 2) Between nodes, the retransmission of messages is carried out repeatedly.
Step 3) Then the memory of the master process increases little by little.

Results of the ps command for the master process:

* node1
(start)            32126 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process
(One hour later)   32126 ? SLs 0:03 0 182 54729 7868 0.0 heartbeat: master control process
(Two hours later)  32126 ? SLs 0:08 0 182 55317 8456 0.0 heartbeat: master control process
(Four hours later) 32126 ? SLs 0:24 0 182 56673 9812 0.0 heartbeat: master control process

* node2
(start)            31928 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process
(One hour later)   31928 ? SLs 0:02 0 182 54481 7620 0.0 heartbeat: master control process
(Two hours later)  31928 ? SLs 0:08 0 182 55353 8492 0.0 heartbeat: master control process
(Four hours later) 31928 ? SLs 0:23 0 182 56689 9828 0.0 heartbeat: master control process

The amount of memory leaked seems to vary per node with the quantity of retransmissions. The increase of this memory disappears after applying my patch. A similar fix seems to be necessary in send_reqnodes_msg(), but that leak looks small.

Best Regards,
Hideo Yamauchi.

--- On Sat, 2012/4/28, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote:

Hi Lars,

Thank you for your comments.
Have you actually been able to measure that memory leak you observed, and you can confirm this patch will fix it? Because I don't think this patch has any effect. Yes. I really measured leak. I can show a result next week. #Japan is a holiday until Tuesday. send_rexmit_request() is only used as paramter to Gmain_timeout_add_full, and it returns FALSE always, which should cause the respective sourceid to be auto-removed. It seems to be necessary to release gsource somehow or other. The similar liberation seems to be carried out in lrmd. Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/27, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We gave test that assumed remote cluster environment. And we tested packet lost. The retransmission timer of Heartbeat causes memory leak. I donate a patch. Please confirm the contents of the patch. And please reflect a patch in a repository of Heartbeat. Have you actually been able to measure that memory leak you observed, and you can confirm this patch will fix it? Because I don't think this patch has any effect. send_rexmit_request() is only used as paramter to Gmain_timeout_add_full, and it returns FALSE always, which should cause the respective sourceid to be auto-removed. 
diff -r 106ca984041b heartbeat/hb_rexmit.c
--- a/heartbeat/hb_rexmit.c Thu Apr 26 19:28:26 2012 +0900
+++ b/heartbeat/hb_rexmit.c Thu Apr 26 19:31:44 2012 +0900
@@ -164,6 +164,8 @@
     seqno_t seq = (seqno_t) ri->seq;
     struct node_info* node = ri->node;
     struct ha_msg* hmsg;
+    unsigned long sourceid;
+    gpointer value;

     if (STRNCMP_CONST(node->status, UPSTATUS) != 0
         && STRNCMP_CONST(node->status, ACTIVESTATUS) != 0) {
@@ -196,11 +198,17 @@

     node->track.last_rexmit_req = time_longclock();

-    if (!g_hash_table_remove(rexmit_hash_table, ri)){
-        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
-               "for seq/node(%ld %s)",
-               __FUNCTION__, ri->seq, ri->node->nodename);
-        return FALSE;
+    value = g_hash_table_lookup(rexmit_hash_table, ri);
+    if (value != NULL) {
+        sourceid = (unsigned long) value;
+        Gmain_timeout_remove(sourceid);
+
+        if (!g_hash_table_remove(rexmit_hash_table, ri)){
+            cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
+                   "for seq/node(%ld %s)",
+                   __FUNCTION__, ri->seq, ri->node->nodename);
+            return FALSE;
+        }
     }
Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.
Hi Lars,

We confirmed that this problem occurred with v1 mode of Heartbeat.
* The problem happens with the v2 mode in the same way.

We confirmed the problem with the following procedure.

Step 1) Put a special device that drops Heartbeat communication packets into the network.
Step 2) Between nodes, the retransmission of messages is carried out repeatedly.
Step 3) Then the memory of the master process increases little by little.

Results of the ps command for the master process:

* node1
(start)            32126 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process
(One hour later)   32126 ? SLs 0:03 0 182 54729 7868 0.0 heartbeat: master control process
(Two hours later)  32126 ? SLs 0:08 0 182 55317 8456 0.0 heartbeat: master control process
(Four hours later) 32126 ? SLs 0:24 0 182 56673 9812 0.0 heartbeat: master control process

* node2
(start)            31928 ? SLs 0:00 0 182 53989 7128 0.0 heartbeat: master control process
(One hour later)   31928 ? SLs 0:02 0 182 54481 7620 0.0 heartbeat: master control process
(Two hours later)  31928 ? SLs 0:08 0 182 55353 8492 0.0 heartbeat: master control process
(Four hours later) 31928 ? SLs 0:23 0 182 56689 9828 0.0 heartbeat: master control process

The amount of memory leaked seems to vary per node with the quantity of retransmissions. The increase of this memory disappears after applying my patch. A similar fix seems to be necessary in send_reqnodes_msg(), but that leak looks small.

Best Regards,
Hideo Yamauchi.

--- On Sat, 2012/4/28, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote:

Hi Lars,

Thank you for your comments.

Have you actually been able to measure that memory leak you observed, and can you confirm this patch will fix it? Because I don't think this patch has any effect.

Yes. I really measured the leak. I can show a result next week.
#Japan is on holiday until Tuesday.
send_rexmit_request() is only used as paramter to Gmain_timeout_add_full, and it returns FALSE always, which should cause the respective sourceid to be auto-removed. It seems to be necessary to release gsource somehow or other. The similar liberation seems to be carried out in lrmd. Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/27, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We gave test that assumed remote cluster environment. And we tested packet lost. The retransmission timer of Heartbeat causes memory leak. I donate a patch. Please confirm the contents of the patch. And please reflect a patch in a repository of Heartbeat. Have you actually been able to measure that memory leak you observed, and you can confirm this patch will fix it? Because I don't think this patch has any effect. send_rexmit_request() is only used as paramter to Gmain_timeout_add_full, and it returns FALSE always, which should cause the respective sourceid to be auto-removed. 
diff -r 106ca984041b heartbeat/hb_rexmit.c
--- a/heartbeat/hb_rexmit.c Thu Apr 26 19:28:26 2012 +0900
+++ b/heartbeat/hb_rexmit.c Thu Apr 26 19:31:44 2012 +0900
@@ -164,6 +164,8 @@
     seqno_t seq = (seqno_t) ri->seq;
     struct node_info* node = ri->node;
     struct ha_msg* hmsg;
+    unsigned long sourceid;
+    gpointer value;

     if (STRNCMP_CONST(node->status, UPSTATUS) != 0
         && STRNCMP_CONST(node->status, ACTIVESTATUS) != 0) {
@@ -196,11 +198,17 @@

     node->track.last_rexmit_req = time_longclock();

-    if (!g_hash_table_remove(rexmit_hash_table, ri)){
-        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
-               "for seq/node(%ld %s)",
-               __FUNCTION__, ri->seq, ri->node->nodename);
-        return FALSE;
+    value = g_hash_table_lookup(rexmit_hash_table, ri);
+    if (value != NULL) {
+        sourceid = (unsigned long) value;
+        Gmain_timeout_remove(sourceid);
+
+        if (!g_hash_table_remove(rexmit_hash_table, ri)){
+            cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
+                   "for seq/node(%ld %s)",
+                   __FUNCTION__, ri->seq, ri->node->nodename);
+            return FALSE;
+        }
     }

     schedule_rexmit_request(node, seq, max_rexmit_delay);

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
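The ps figures quoted in this thread were apparently collected by hand. As an aside, the same measurement can be automated; this is a small hypothetical helper (not part of any patch here) that samples the resident set size of a process at fixed intervals, so a slow leak shows up as a growing column:

```shell
#!/bin/sh
# Hypothetical helper for the measurement described above: sample the
# RSS (in kB) of a process at fixed intervals. A slowly leaking process
# shows a monotonically growing rss_kb column over time.
sample_rss() {
    pid=$1; count=$2; interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # "ps -o rss= -p PID" prints only the RSS value, with no header.
        rss=`ps -o rss= -p "$pid"` || return 1
        echo "`date '+%H:%M:%S'` pid=$pid rss_kb=$rss"
        i=$((i + 1))
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}

# Example: three samples of this shell itself, one second apart.
# For the heartbeat case one would pass the master control process PID
# and a much longer interval (e.g. 3600 for hourly samples).
sample_rss $$ 3 1
```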
Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.
Hi Lars,

Thank you for your comments.

Have you actually been able to measure that memory leak you observed, and can you confirm this patch will fix it? Because I don't think this patch has any effect.

Yes. I really measured the leak. I can show a result next week.
#Japan is on holiday until Tuesday.

send_rexmit_request() is only used as a parameter to Gmain_timeout_add_full, and it always returns FALSE, which should cause the respective sourceid to be auto-removed.

It seems to be necessary to release the gsource somehow or other. A similar release seems to be carried out in lrmd.

Best Regards,
Hideo Yamauchi.

--- On Fri, 2012/4/27, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp wrote:

Hi All,

We ran a test that assumed a remote cluster environment, and we tested packet loss. The retransmission timer of Heartbeat causes a memory leak. I contribute a patch.

Please confirm the contents of the patch, and please apply the patch to the Heartbeat repository.

Have you actually been able to measure that memory leak you observed, and can you confirm this patch will fix it? Because I don't think this patch has any effect.

send_rexmit_request() is only used as a parameter to Gmain_timeout_add_full, and it always returns FALSE, which should cause the respective sourceid to be auto-removed.
diff -r 106ca984041b heartbeat/hb_rexmit.c
--- a/heartbeat/hb_rexmit.c Thu Apr 26 19:28:26 2012 +0900
+++ b/heartbeat/hb_rexmit.c Thu Apr 26 19:31:44 2012 +0900
@@ -164,6 +164,8 @@
     seqno_t seq = (seqno_t) ri->seq;
     struct node_info* node = ri->node;
     struct ha_msg* hmsg;
+    unsigned long sourceid;
+    gpointer value;

     if (STRNCMP_CONST(node->status, UPSTATUS) != 0
         && STRNCMP_CONST(node->status, ACTIVESTATUS) != 0) {
@@ -196,11 +198,17 @@

     node->track.last_rexmit_req = time_longclock();

-    if (!g_hash_table_remove(rexmit_hash_table, ri)){
-        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
-               "for seq/node(%ld %s)",
-               __FUNCTION__, ri->seq, ri->node->nodename);
-        return FALSE;
+    value = g_hash_table_lookup(rexmit_hash_table, ri);
+    if (value != NULL) {
+        sourceid = (unsigned long) value;
+        Gmain_timeout_remove(sourceid);
+
+        if (!g_hash_table_remove(rexmit_hash_table, ri)){
+            cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table "
+                   "for seq/node(%ld %s)",
+                   __FUNCTION__, ri->seq, ri->node->nodename);
+            return FALSE;
+        }
     }

     schedule_rexmit_request(node, seq, max_rexmit_delay);

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
[Linux-ha-dev] [Patch] The patch which revises memory leak.
Hi All, We gave test that assumed remote cluster environment. And we tested packet lost. The retransmission timer of Heartbeat causes memory leak. I donate a patch. Please confirm the contents of the patch. And please reflect a patch in a repository of Heartbeat. Best Regards, Hideo Yamauchi. rexmit_leak.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] LVM monitor change
Hi Dejan, Thank you for comments. That's not a good reason. Testing if binaries exist on every monitor operation really doesn't make much sense. Why would you expect programs to start disappearing? And if they do, we may have a much more serious problem to deal with. All right. We withdraw this patch. And let me discuss it when we review overall RA next again. Many Thanks, Hideo Yamauchi. --- On Tue, 2012/4/10, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Tue, Apr 10, 2012 at 12:43:00PM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comments. Hi Hideo-san, On Mon, Apr 09, 2012 at 09:18:07AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comments. I change validate-all and want to change it to always carry out validate-all. I abolish vgck/vgdisplay carried out in validate-all and intend to make only the check of the parameter simply. How do you think? Isn't it that validate-all may be really necessary only in the start action? The repeating monitor is scheduled only after a successful start. It may be surely necessary as you say. However, I think validate-all to unify it so that it is always carried out. But why? There is the resource to carry out validate-all every time a lot. We wish it becomes LVM in the same way. That's not a good reason. Testing if binaries exist on every monitor operation really doesn't make much sense. Why would you expect programs to start disappearing? And if they do, we may have a much more serious problem to deal with. Cheers, Dejan How about what the check of vgck/vgdisplay chooses it in a parameter and can carry out? Again, why? It doesn't make any difference for a running resource? We may do this before the start operation, of course. My correction is different from original LVM in big validate-all. There were many mistakes to my patch. And I think about a patch again and send it. Best Regards, Hideo Yamauchi. Cheers, Dejan Best Regards, Hideo Yamauchi. 
--- On Fri, 2012/4/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Fri, Apr 06, 2012 at 10:50:39AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I change validate-all and want to change it to always carry out validate-all. I abolish vgck/vgdisplay carried out in validate-all and intend to make only the check of the parameter simply. How do you think? Isn't it that validate-all may be really necessary only in the start action? The repeating monitor is scheduled only after a successful start. Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Thu, Apr 05, 2012 at 11:32:05AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I agree to your patch. Thank you for the reply. BTW, the monitor was shamelessly stolen from Vladislav. Applied. ocft test passed (after some struggle and eventually fixing the ocft source). Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Thu, 2012/4/5, Dejan Muhamedagic de...@suse.de wrote: Hi all, This is a proposed set of two patches which would eliminate use of LVM commands in the monitor path. We already discussed the issue elsewhere and I don't see any point in keeping vgck/vgdisplay given that they don't result in better monitoring under normal circumstances. And if the circumstances are such that the new monitoring fails, I think that there'll be many more problems on the node than a failed volume group. Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
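Dejan's point in this thread — that testing whether binaries exist on every monitor operation is wasted work — can be illustrated with a sketch. This is a hypothetical, stand-alone fragment: have_binary stands in for the check_binary helper from ocf-shellfuncs, the binary list is illustrative, and ocf_log is simulated with echo. It shows the expensive checks done once in the start path, with validate-all reduced to cheap parameter validation that is safe to run from any action:

```shell
#!/bin/sh
# Hypothetical sketch, not the real LVM RA: binaries are verified once
# at start time; the repeating monitor is only scheduled after a
# successful start, so re-testing them every interval buys nothing.
OCF_ERR_INSTALLED=5

have_binary() {
    # Stand-in for the resource-agents check_binary helper.
    command -v "$1" >/dev/null 2>&1
}

lvm_validate_all() {
    # Parameter-only validation: cheap, safe to call from every action.
    if [ -z "$OCF_RESKEY_volgrpname" ]; then
        echo "err: volgrpname parameter not set"
        return 1
    fi
    return 0
}

lvm_start() {
    # Binary checks belong here, not in the monitor path.
    for bin in vgchange vgck; do   # illustrative binary list
        if ! have_binary "$bin"; then
            echo "err: required binary '$bin' not found"
            return $OCF_ERR_INSTALLED
        fi
    done
    lvm_validate_all || return $?
    echo "started"
}
```

The monitor action then only inspects the running resource (e.g. device state), never the installed programs.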
Re: [Linux-ha-dev] LVM monitor change
Hi Dejan, Thank you for comments. Hi Hideo-san, On Mon, Apr 09, 2012 at 09:18:07AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comments. I change validate-all and want to change it to always carry out validate-all. I abolish vgck/vgdisplay carried out in validate-all and intend to make only the check of the parameter simply. How do you think? Isn't it that validate-all may be really necessary only in the start action? The repeating monitor is scheduled only after a successful start. It may be surely necessary as you say. However, I think validate-all to unify it so that it is always carried out. But why? There is the resource to carry out validate-all every time a lot. We wish it becomes LVM in the same way. How about what the check of vgck/vgdisplay chooses it in a parameter and can carry out? Again, why? It doesn't make any difference for a running resource? We may do this before the start operation, of course. My correction is different from original LVM in big validate-all. There were many mistakes to my patch. And I think about a patch again and send it. Best Regards, Hideo Yamauchi. Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Fri, Apr 06, 2012 at 10:50:39AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I change validate-all and want to change it to always carry out validate-all. I abolish vgck/vgdisplay carried out in validate-all and intend to make only the check of the parameter simply. How do you think? Isn't it that validate-all may be really necessary only in the start action? The repeating monitor is scheduled only after a successful start. Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Thu, Apr 05, 2012 at 11:32:05AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I agree to your patch. Thank you for the reply. BTW, the monitor was shamelessly stolen from Vladislav. 
Applied. ocft test passed (after some struggle and eventually fixing the ocft source). Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Thu, 2012/4/5, Dejan Muhamedagic de...@suse.de wrote: Hi all, This is a proposed set of two patches which would eliminate use of LVM commands in the monitor path. We already discussed the issue elsewhere and I don't see any point in keeping vgck/vgdisplay given that they don't result in better monitoring under normal circumstances. And if the circumstances are such that the new monitoring fails, I think that there'll be many more problems on the node than a failed volume group. Cheers, Dejan
Re: [Linux-ha-dev] LVM monitor change
Hi Dejan, Thank you for comments. I change validate-all and want to change it to always carry out validate-all. I abolish vgck/vgdisplay carried out in validate-all and intend to make only the check of the parameter simply. How do you think? Isn't it that validate-all may be really necessary only in the start action? The repeating monitor is scheduled only after a successful start. It may be surely necessary as you say. However, I think validate-all to unify it so that it is always carried out. How about what the check of vgck/vgdisplay chooses it in a parameter and can carry out? Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Fri, Apr 06, 2012 at 10:50:39AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I change validate-all and want to change it to always carry out validate-all. I abolish vgck/vgdisplay carried out in validate-all and intend to make only the check of the parameter simply. How do you think? Isn't it that validate-all may be really necessary only in the start action? The repeating monitor is scheduled only after a successful start. Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Thu, Apr 05, 2012 at 11:32:05AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I agree to your patch. Thank you for the reply. BTW, the monitor was shamelessly stolen from Vladislav. Applied. ocft test passed (after some struggle and eventually fixing the ocft source). Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Thu, 2012/4/5, Dejan Muhamedagic de...@suse.de wrote: Hi all, This is a proposed set of two patches which would eliminate use of LVM commands in the monitor path. We already discussed the issue elsewhere and I don't see any point in keeping vgck/vgdisplay given that they don't result in better monitoring under normal circumstances. 
And if the circumstances are such that the new monitoring fails, I think that there'll be many more problems on the node than a failed volume group. Cheers, Dejan
Re: [Linux-ha-dev] LVM monitor change
Hi Dejan, I would like to change validate-all so that it is always carried out. I intend to drop the vgck/vgdisplay checks from validate-all and have it perform only a simple parameter check. What do you think? Best Regards, Hideo Yamauchi. --- On Fri, 2012/4/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Thu, Apr 05, 2012 at 11:32:05AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, I agree with your patch. Thank you for the reply. BTW, the monitor was shamelessly stolen from Vladislav. Applied. ocft test passed (after some struggle and eventually fixing the ocft source). Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Thu, 2012/4/5, Dejan Muhamedagic de...@suse.de wrote: Hi all, This is a proposed set of two patches which would eliminate use of LVM commands in the monitor path. We already discussed the issue elsewhere and I don't see any point in keeping vgck/vgdisplay given that they don't result in better monitoring under normal circumstances. And if the circumstances are such that the new monitoring fails, I think that there'll be many more problems on the node than a failed volume group. Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] LVM monitor change
Hi Dejan, I agree with your patch. Best Regards, Hideo Yamauchi. --- On Thu, 2012/4/5, Dejan Muhamedagic de...@suse.de wrote: Hi all, This is a proposed set of two patches which would eliminate use of LVM commands in the monitor path. We already discussed the issue elsewhere and I don't see any point in keeping vgck/vgdisplay given that they don't result in better monitoring under normal circumstances. And if the circumstances are such that the new monitoring fails, I think that there'll be many more problems on the node than a failed volume group. Cheers, Dejan ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Patch] Patch for external/vcenter.
Hi All, We used external/vcenter in a vSphere5 environment, testing it against both a vCenter server and ESXi servers, and we found some problems. Problem 1) external/vcenter does not support VMs added later: start fails when HOSTLIST contains a VM that has not yet been created. Problem 2) Because of the above problem, start also fails for a STONITH resource that goes through an ESXi server, which we add as a fallback in case the vCenter server stops. In the current external/vcenter, a STONITH resource configured against an ESXi server fails to start when one of the listed VMs does not exist on that server. However, a VM may move to another ESXi server via vMotion or DRS, so when the vCenter server is down, STONITH must be possible from whichever ESXi server the VM has moved to. - In consideration of a vCenter server failure, we configure STONITH resources for the ESXi servers as well. (server) vCenter (192.168.133.40) db1 on ESXi server 1(192.168.133.1) db2 on ESXi server 2(192.168.133.2) (snip)
### Group Configuration ###
group grpStonith1 \
    prmStonith1-1 \    --- for vCenter
    prmStonith1-2 \    --- for ESXi server 1
    prmStonith1-3      --- for ESXi server 2
(snip)
primitive prmStonith1-2 stonith:external/vcenter \
    params \
    priority=3 \
    stonith-timeout=60s \
    VI_SERVER=192.168.133.1 \
    VI_CREDSTORE=/etc/vicredentials.xml \
    HOSTLIST="db1;db2" \    --- Because a listed VM does not exist on this ESXi server, external/vcenter fails to start.
primitive prmStonith1-3 stonith:external/vcenter \
    params \
    priority=4 \
    stonith-timeout=60s \
    VI_SERVER=192.168.133.2 \
    VI_CREDSTORE=/etc/vicredentials.xml \
    HOSTLIST="db1;db2" \    --- Because a listed VM does not exist on this ESXi server, external/vcenter fails to start.
    RESETPOWERON=0 \
    op start interval=0s timeout=60s \
    op monitor interval=3600s timeout=60s \
    op stop interval=0s timeout=60s
-- I think that the check performed in the gethosts processing is unnecessary; it obstructs start processing. When a real STONITH operation is performed, I think it is enough for external/vcenter to check the VM against HOSTLIST at that point. I made a sample patch. This patch simply returns HOSTLIST, like other STONITH modules do. Please take in this patch. Best Regards, Hideo Yamauchi. vcenter.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
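The idea of the sample patch — gethosts simply reporting the configured HOSTLIST instead of querying the vCenter/ESXi inventory — can be sketched as follows. The ';' splitting assumes the plain "db1;db2" form shown in the configuration above; the plugin's real HOSTLIST syntax may be richer:

```shell
# Sketch: emit HOSTLIST ("db1;db2") one host per line, as other
# STONITH plugins do, instead of asking the server during start.
vcenter_gethosts() {
    echo "$HOSTLIST" | tr ';' '\n'
}

HOSTLIST="db1;db2"
vcenter_gethosts    # prints db1 and db2 on separate lines
```

Because this never talks to the server, start no longer fails for VMs that do not (yet) exist on a given ESXi host; existence is checked only when a real fencing action runs.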
Re: [Linux-ha-dev] New RA: IPredirect
Hi David, Here are my requests. 1) Please implement parameter checks in ipredirect_validate (for example, it is necessary to check the form of the addresses and that the ports are numeric values). 2) Please actually call ipredirect_validate (I think it should be called for every action other than meta-data). 3) Please handle the error codes returned when running iptables. 4) Please log a message when such an error occurs. 5) The script should check that the iptables command is usable (check_binary $IPTABLES). Best Regards, Hideo Yamauchi. --- On Wed, 2012/2/1, David Gersic dger...@niu.edu wrote: Somewhat based on Dummy, and somewhat based on IPaddr2, here's an RA I put together to do port redirection via iptables. I have an application (Shibboleth Identity Provider) that runs under Tomcat. Because Tomcat runs as a non-root user, the application server can only listen on ports over 1024. But this particular app must be on ports 80 and 443. The only way to do that is to use iptables and redirect traffic to the external ip address to an internal (10.0.0.1) address, changing the port used along the way. In order to manage this from Linux/HA, I needed a way to add and remove the necessary iptables rules as part of my resource group. Setting up the resource group, I have this in it:
<primitive class="ocf" type="IPredirect" provider="heartbeat" is_managed="true" id="IPR_8_2">
  <instance_attributes id="IPR_8_2_instance_attrs">
    <attributes>
      <nvpair name="interface" value="eth3"/>
      <nvpair name="external_ip" value="131.156.21.44"/>
      <nvpair name="external_port" value="443"/>
      <nvpair name="internal_ip" value="10.0.0.1"/>
      <nvpair name="internal_port" value="8443"/>
    </attributes>
  </instance_attributes>
  <operations>
    <op name="monitor" interval="10" timeout="10" start_delay="10"/>
    <op name="start" timeout="10"/>
    <op name="stop" timeout="10"/>
  </operations>
</primitive>
to redirect external port 443 traffic to internal port 8443 where the application is actually listening. 
I'm using two IPaddr2 primitives to bind the external (131.156.21.44) and internal (10.0.0.1) to eth3. This group will have Filesystem and Tomcat primitives as well, to manage the shared storage and application server. Tested here and seems to work. Comments or changes appreciated.
#!/bin/sh
#
# Description:  Manages iptables port redirection firewall rules
#               needed for a resource group under Heartbeat/LinuxHA
#               control.
#
# Copyright 2012 Northern Illinois University, David Gersic
# All Rights Reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.
#
#
# OCF parameters:
#   OCF_RESKEY_interface     - Which interface to apply the rules to (ie: eth0, eth1, etc.)
#   OCF_RESKEY_external_ip   - External IP address to redirect from
#   OCF_RESKEY_external_port - External IP port to redirect from
#   OCF_RESKEY_internal_ip   - Internal IP address to redirect to
#   OCF_RESKEY_internal_port - Internal IP port to redirect to
#
###
# Initialization:
. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
###

meta_data() {
	cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="IPredirect" version="0.9">
<version>1.0</version>

<longdesc lang="en">
This resource agent enables port redirection from an external IP
address to an internal IP address.
This is useful for applications that must be reachable on a port
below 1024, but that must also run as non-root.
</longdesc>
<shortdesc lang="en">IPredirect resource agent</shortdesc>

<parameters>
<parameter name="interface" unique="1">
<longdesc lang="en">
Which interface to apply the rules to (ie: eth0, eth1, etc.)
</longdesc>
<shortdesc lang="en">Network interface</shortdesc>
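For readers following along, the single rule such an agent manages boils down to one nat-table PREROUTING entry. The sketch below only composes the command string, so it can be exercised without root; whether the RA above uses DNAT or REDIRECT is an assumption of this sketch:

```shell
# Sketch: build the iptables command an IPredirect-style agent would
# run on start (-A) and stop (-D). The DNAT form is an assumption;
# this function only prints the command, it does not run it.
build_redirect_rule() {
    action=$1   # -A to add the rule, -D to delete it
    echo "iptables -t nat $action PREROUTING -i $interface" \
         "-d $external_ip -p tcp --dport $external_port" \
         "-j DNAT --to-destination $internal_ip:$internal_port"
}

# Parameter values from the primitive shown above.
interface=eth3 external_ip=131.156.21.44 external_port=443
internal_ip=10.0.0.1 internal_port=8443
build_redirect_rule -A
```

Using the same builder for add and delete keeps start and stop symmetric, which matters because a stop that leaves the rule behind would keep redirecting traffic to a node that no longer runs the application.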
Re: [Linux-ha-dev] [Patch] Patch for IPsrcaddr.(2/2)
Hi Dejan, Thank you for your comments. OK. Applied that too. The ocft test passes, but cannot work without specifying the existing address. I'm not sure, but I think that ocft cannot ask for user input, so the test is going to be semi-automatic. All right! I confirmed the following commits. * https://github.com/ClusterLabs/resource-agents/commit/9cd054d15112bd7053763c7655059a07e07f4e69 * https://github.com/ClusterLabs/resource-agents/commit/7bfd0597a1d2efcd4cd2f579675510cff725ec17 Many thanks!! Hideo Yamauchi. --- On Sat, 2012/1/28, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Wed, Jan 25, 2012 at 10:09:26AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for your comments. Now the ocft test fails: 2012/01/23_21:39:40 ERROR: IP address [127.0.0.3] is a loopback address, thus can not be preferred source address Any idea how to update the ocft test case? I tried this problem, too. I ran ocf-tester with three cases. Case 1) I run it after bringing up an address with the ifconfig command.
[root@rh57-3 ClusterLabs-resource-agents-7edbe1d]# ifconfig eth0:1 192.168.40.7 up
[root@rh57-3 ClusterLabs-resource-agents-7edbe1d]# ocf-tester -v -n IPsrcaddr -o ipaddress=192.168.40.7 /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr...
Testing permissions with uid nobody
Testing: meta-data
[...] [Note to myself: drop the meta-data output]
ERROR: Setup problem: couldn't find command: gawk
Install gawk perhaps? Strangely, gawk was already installed, but this error was still reported. Sorry, it was my mistake. ocf-tester does this on purpose. The OCF_TESTER_FAIL_HAVE_BINARY environment variable in ocf-tester seems to be involved: (snip) OCF_TESTER_FAIL_HAVE_BINARY=1 export OCF_TESTER_FAIL_HAVE_BINARY OCF_RESKEY_CRM_meta_interval=0 test_command monitor (snip) A similar error occurs with IPaddr2. 
[root@rh57-3 heartbeat]# ocf-tester -v -n IPaddr2 -o ip=192.168.40.8 /usr/lib/ocf/resource.d/heartbeat/IPaddr2 Beginning tests for /usr/lib/ocf/resource.d/heartbeat/IPaddr2... Testing permissions with uid nobody (snip) Checking current state Testing: monitor Testing: monitor ERROR: Setup problem: couldn't find command: ip Testing: start (snip) Is not a correction of ocf-tester necessary? [...] INFO: The ip route has been already set.(192.168.40.0/24, eth0, default via 192.168.40.1 dev eth0 ) Hmm, I saw different stuff: ERROR: command 'ip route replace 10.2.13.0/24 169.254.0.0/16 dev eth0 src 10.2.13.154' failed Debugging: + ip route replace 10.2.13.0/24 169.254.0.0/16 dev eth0 src 10.2.13.154 Error: either to is duplicate, or 169.254.0.0/16 is a garbage. The route list: xen-d:~ # ip route list default via 10.2.13.1 dev eth0 10.2.13.0/24 dev eth0 proto kernel scope link src 10.2.13.54 127.0.0.0/8 dev lo scope link 169.254.0.0/16 dev eth0 scope link It seems like the last entry confuses the new calculation code. In my environment, I set it in NOZEROCONF=yes. Therefore, the last entry does not exist. Right. But it's still better that the RA can handle this situation too. It turns out that the problem is here (nothing to do with your patch): NETWORK=`ip route list dev $INTERFACE scope link|grep -o '^[^ ]*'` Perhaps we should do: NETWORK=`ip route list dev $INTERFACE match $ipaddress scope link|grep -o '^[^ ]*'` Opinions? I think that the method that you showed is more right. OK. Applied that too. The ocft test passes, but cannot work without specifying the existing address. I'm not sure, but I think that ocft cannot ask for user input, so the test is going to be semi-automatic. Cheers, Dejan Best Regards, Hideo Yamauchi. 
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
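The NETWORK miscalculation debated in this thread can be reproduced without a live system. The sketch below feeds in the route table quoted above as canned text; the `match $ipaddress` behaviour of `ip route list` is emulated here by a plain prefix filter:

```shell
# Canned output of `ip route list dev eth0 scope link`, taken from
# the route table quoted in the thread (no live `ip` calls).
routes="10.2.13.0/24 proto kernel scope link src 10.2.13.54
169.254.0.0/16 scope link"

# Old extraction: first field of *every* scope-link route. With the
# zeroconf route present this yields two prefixes, so the later
# `ip route replace $NETWORK ...` receives garbage arguments.
NETWORK_OLD=$(printf '%s\n' "$routes" | grep -o '^[^ ]*')

# Proposed fix: `ip route list ... match $ipaddress` lets the kernel
# return only the covering route; emulated here by filtering the
# canned data for the prefix containing 10.2.13.154.
NETWORK=$(printf '%s\n' "$routes" | grep '^10\.2\.13\.' | grep -o '^[^ ]*')
echo "$NETWORK"
```

With NOZEROCONF=yes the 169.254.0.0/16 route never appears, which is why the problem did not show up in Hideo's environment.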
Re: [Linux-ha-dev] [Patch] Patch for IPsrcaddr.(1/2)
Hi Dejan, Thank you for your comments. This patch revises the next point. * When the route has already been assigned, the RA skips setting it again. Is this just a performance improvement? Or did you see anything wrong happen when running the current code? No actual problem is taking place; we found this wasted work in a review, and I think it may affect performance. * Added an error log for FINDIF. * Deleted unused statements. It would be good to have at least two patches, because we should always try to have patches with single self-contained modification. Sorry. Because the change was small, I did not split this patch up. Best Regards, Hideo Yamauchi. --- On Sat, 2012/1/14, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, Sorry for picking up this so late. On Tue, Nov 29, 2011 at 02:48:52PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We made a patch for IPsrcaddr. This patch revises the next point. * When the route has already been assigned, the RA skips setting it again. Is this just a performance improvement? Or did you see anything wrong happen when running the current code? * Added an error log for FINDIF. * Deleted unused statements. It would be good to have at least two patches, because we should always try to have patches with single self-contained modification. Cheers, Dejan Please confirm my correction. And please commit it. Best Regards, Hideo Yamauchi
diff -r 2107bc4f5c8b heartbeat/IPsrcaddr
--- a/heartbeat/IPsrcaddr	Thu Nov 24 14:13:11 2011 +0900
+++ b/heartbeat/IPsrcaddr	Thu Nov 24 14:13:53 2011 +0900
@@ -167,13 +167,20 @@
 srca_start() {
 	srca_read $1
 
-	ip route replace $NETWORK dev $INTERFACE src $1 || \
-		errorexit "command 'ip route replace $NETWORK dev $INTERFACE src $1' failed"
+	rc=$?
+	if [ $rc = 0 ]; then
+		rc=$OCF_SUCCESS
+		ocf_log info "The ip route has been already set.($NETWORK, $INTERFACE, $ROUTE_WO_SRC)"
+	else
+		ip route replace $NETWORK dev $INTERFACE src $1 || \
+			errorexit "command 'ip route replace $NETWORK dev $INTERFACE src $1' failed"
 
-	$CMDCHANGE $ROUTE_WO_SRC src $1 || \
-		errorexit "command '$CMDCHANGE $ROUTE_WO_SRC src $1' failed"
+		$CMDCHANGE $ROUTE_WO_SRC src $1 || \
+			errorexit "command '$CMDCHANGE $ROUTE_WO_SRC src $1' failed"
+		rc=$?
+	fi
 
-	return $?
+	return $rc
 }
 
 #
@@ -252,7 +259,6 @@
 	else
 		true
 	fi
-#	return $OCF_SUCCESS
 	;;
 *)	#less than three decimal dots
 	false;;
@@ -377,7 +383,6 @@
 Linux|SunOS)
 	IF=`find_interface $BASEIP`
-#	echo $IF
 	if [ -z "$IF" ]; then
 		return $OCF_NOT_RUNNING
 	fi
@@ -455,7 +460,11 @@
 findif_out=`$FINDIF -C`
 rc=$?
-[ $rc -ne 0 ] && exit $rc
+[ $rc -ne 0 ] && {
+	ocf_log err "[$FINDIF -C] failed"
+	exit $rc
+}
+
 INTERFACE=`echo $findif_out | awk '{print $1}'`
 NETWORK=`ip route list dev $INTERFACE scope link|grep -o '^[^ ]*'`
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] Patch for IPsrcaddr.(2/2)
Hi Dejan, Thank you for comments. On Tue, Nov 29, 2011 at 02:49:24PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We made a patch to IPsrcaddr. This patch revises the next point. * Made modifications to carry out validate_all processing. I'm not necessarily against it, but I wonder why. This would make monitor validate the environment every time. Is that really necessary? What was your motivation for this change? I think that the handling of validate-all should be carried out in the same way as other RA. Therefore we suggested this correction. * All RA is not same, but give readability and conservatism if it is similar constitution. * Undefined and deleted the unused IPROUTE variable OK. * The find_interface_generic processing revised it to search it by ip command. Good. However, we cannot test environment except Linux. Therefore, we limited a condition to carry out processing to environment of Linux. That's fine too. Many Thanks! Hideo Yamauchi. Cheers, Dejan (snip) @@ -458,6 +440,10 @@ ipaddress=$OCF_RESKEY_ipaddress +if [ x$SYSTYPE = xLinux ]; then + srca_validate_all +fi + (snip) Please please confirm my correction. And please commit a correction. 
Best Regards, Hideo Yamauchi diff -r e4d9d86a9577 IPsrcaddr --- a/IPsrcaddr Mon Nov 28 20:02:26 2011 +0900 +++ b/IPsrcaddr Mon Nov 28 20:03:07 2011 +0900 @@ -307,35 +307,14 @@ # find_interface_generic() { - $IFCONFIG $IFCONFIG_A_OPT | - while read ifname linkstuff - do - : Read gave us ifname = $ifname - - read inet addr junk - : Read gave us inet = $inet addr = $addr - - while - read line [ X$line != X ] - do - : Nothing - done - - case $SYSTYPE in - *BSD) - $IFCONFIG | grep $BASEIP -B`$IFCONFIG | grep -c inet` | grep UP, | cut -d : -f 1 - return 0;; - *) - : comparing $BASEIP to $addr (from ifconfig) - case $addr in - addr:$BASEIP) echo $ifname; return $OCF_SUCCESS;; - $BASEIP) echo $ifname; return $OCF_SUCCESS;; - esac - continue;; - esac - - done - return $OCF_ERR_GENERIC + local iface=`$IP2UTIL -o -f inet addr show | grep \ $BASEIP \ + | cut -d ' ' -f2 | grep -v '^ipsec[0-9][0-9]*$'` + if [ -z $iface ]; then + return $OCF_ERR_GENERIC + else + echo $iface + return $OCF_SUCCESS + fi } @@ -409,7 +388,6 @@ srca_validate_all() { check_binary $AWK - check_binary $IPROUTE check_binary $IFCONFIG # The IP address should be in good shape @@ -420,6 +398,10 @@ exit $OCF_ERR_CONFIGURED fi + if ocf_is_probe; then + return $OCF_SUCCESS + fi + # We should serve this IP address of course if ip_status $ipaddress; then : @@ -458,6 +440,10 @@ ipaddress=$OCF_RESKEY_ipaddress +if [ x$SYSTYPE = xLinux ]; then + srca_validate_all +fi + findif_out=`$FINDIF -C` rc=$? [ $rc -ne 0 ] { ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
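The replacement lookup in this patch parses `ip -o -f inet addr show` output instead of ifconfig. A self-contained sketch of that parsing, run against fabricated sample output (single spaces only; real `ip -o` output has more columns and padding):

```shell
# Fabricated sample of `ip -o -f inet addr show` output, kept minimal
# so the parsing can be demonstrated without a live `ip` binary.
addr_show="1: lo inet 127.0.0.1/8 scope host lo
2: eth0 inet 10.2.13.54/24 brd 10.2.13.255 scope global eth0"

# Field 2 of the line carrying the wanted address is the interface
# name; ipsecN pseudo-interfaces are filtered out as in the patch.
find_interface_generic() {
    iface=$(printf '%s\n' "$addr_show" | grep " $1/" \
            | cut -d ' ' -f2 | grep -v '^ipsec[0-9][0-9]*$')
    if [ -z "$iface" ]; then
        return 1      # stands in for OCF_ERR_GENERIC
    fi
    echo "$iface"
    return 0          # stands in for OCF_SUCCESS
}

find_interface_generic 10.2.13.54   # prints: eth0
```

This is why the patch can drop the whole ifconfig read-loop: one pipeline replaces the per-line state machine, at the cost of being Linux-only, which matches the guard added around srca_validate_all.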
[Linux-ha-dev] [Patch] OCF_RESKEY_CRM_meta_timeout not matching monitor timeout meta-data.
Hi All, I made a patch that fixes the following old problem. * http://www.gossamer-threads.com/lists/linuxha/users/70262 In consideration of the impact when a parameter is changed, I replace only the value of timeout. Please confirm my patch. And please commit it. Best Regards, Hideo Yamauchi. trac1467.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] OCF_RESKEY_CRM_meta_timeout not matching monitor timeout meta-data.
Hi Dejan, Thank you for your comment. It looks like a wrong place for a fix. Shouldn't crmd send all environment? It is only by chance that we have the timeout value available in this function. In the case of stop, crmd does not ask lrmd to substitute the parameters:
(snip)
/* reset the resource's parameters? */
if(op->interval == 0) {
    if(safe_str_eq(CRMD_ACTION_START, operation)
       || safe_str_eq(CRMD_ACTION_STATUS, operation)) {
        op->copyparams = 1;
    }
}
(snip)
I think this is because a changed parameter would affect the stop of the resource in lrmd; the changed parameters must not be copied. My patch is one example of handling it in lrmd. Is there a better patch? * For example, it may be good to give copyparams a different value. Best Regards, Hideo Yamauchi. --- On Thu, 2011/12/15, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Thu, Dec 15, 2011 at 06:21:00PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, I made a patch that fixes the following old problem. * http://www.gossamer-threads.com/lists/linuxha/users/70262 In consideration of the impact when a parameter is changed, I replace only the value of timeout. Please confirm my patch. And please commit it. It looks like a wrong place for a fix. Shouldn't crmd send all environment? It is only by chance that we have the timeout value available in this function. Cheers, Dejan Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
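As background to the timeout discussion: an RA sees the operation timeout via OCF_RESKEY_CRM_meta_timeout (in milliseconds), and the whole point of the report is that this value must match the operation actually running, not a stale copy. A sketch of how an agent typically consumes it — the 20000 ms fallback is an assumption of this sketch, not a documented default:

```shell
# Sketch: derive an internal wait budget from the operation timeout
# the cluster passes in. OCF_RESKEY_CRM_meta_timeout is milliseconds;
# the 20000 ms fallback is an assumption for this sketch.
stop_deadline() {
    op_timeout_ms=${OCF_RESKEY_CRM_meta_timeout:-20000}
    # Keep a 5 s margin so the RA gives up before lrmd kills it.
    deadline=$(( op_timeout_ms / 1000 - 5 ))
    if [ "$deadline" -lt 1 ]; then
        deadline=1
    fi
    echo "$deadline"
}

OCF_RESKEY_CRM_meta_timeout=60000
stop_deadline    # prints: 55
```

If lrmd replays the parameters copied at start/monitor time, an agent computing a deadline like this during stop would wait against the wrong budget, which is the mismatch the patch addresses.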
Re: [Linux-ha-dev] [Patch] OCF_RESKEY_CRM_meta_timeout not matching monitor timeout meta-data.
Hi Andrew, Thank you for comment. When stopping, you always want to use the old parameters (think of someone changing 'ip' for an IPaddr resource). Options that are interpreted by the crmd or lrmd are a different matter which resulted in: https://github.com/ClusterLabs/pacemaker/commit/fcfe6fe522138343e4138248829926700fac213e All right. Will you apply this correction to 1.0 of Pacemaker? Best Regards, Hideo Yamauchi. --- On Fri, 2011/12/16, Andrew Beekhof and...@beekhof.net wrote: On Thu, Dec 15, 2011 at 8:45 PM, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comment. It looks like a wrong place for a fix. Shouldn't crmd send all environment? It is only by chance that we have the timeout value available in this function. In the case of stop, crmd does not ask lrmd for the substitution of the parameter. . (snip) /* reset the resource's parameters? */ if(op-interval == 0) { if(safe_str_eq(CRMD_ACTION_START, operation) || safe_str_eq(CRMD_ACTION_STATUS, operation)) { op-copyparams = 1; } } (snip) When the parameter of the resource is changed, I think this to be because I influence the stop of the resource of lrmd. It is necessary for the changed parameter not to copy it. When stopping, you always want to use the old parameters (think of someone changing 'ip' for an IPaddr resource). Options that are interpreted by the crmd or lrmd are a different matter which resulted in: https://github.com/ClusterLabs/pacemaker/commit/fcfe6fe522138343e4138248829926700fac213e My patch is an example when I handle it in lrmd. Is there a better patch? * For example, it may be good to give copyparams a different value. Best Regards, Hideo Yamauchi. --- On Thu, 2011/12/15, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Thu, Dec 15, 2011 at 06:21:00PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, I made the patch which revised the old next problem. 
* http://www.gossamer-threads.com/lists/linuxha/users/70262 In consideration of influence when a parameter was changed, I replace only a value of timeout. Please confirm my patch. And please commit a patch. It looks like a wrong place for a fix. Shouldn't crmd send all environment? It is only by chance that we have the timeout value available in this function. Cheers, Dejan Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch] OCF_RESKEY_CRM_meta_timeout not matching monitor timeout meta-data.
Hi Andrew, All right. Will you apply this correction to 1.0 of Pacemaker? Sure. We'll pick it up for .13 Many Thanks!! Hideo Yamauchi. --- On Fri, 2011/12/16, Andrew Beekhof and...@beekhof.net wrote: On Fri, Dec 16, 2011 at 1:21 PM, renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank you for comment. When stopping, you always want to use the old parameters (think of someone changing 'ip' for an IPaddr resource). Options that are interpreted by the crmd or lrmd are a different matter which resulted in: https://github.com/ClusterLabs/pacemaker/commit/fcfe6fe522138343e4138248829926700fac213e All right. Will you apply this correction to 1.0 of Pacemaker? Sure. We'll pick it up for .13 Best Regards, Hideo Yamauchi. --- On Fri, 2011/12/16, Andrew Beekhof and...@beekhof.net wrote: On Thu, Dec 15, 2011 at 8:45 PM, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comment. It looks like a wrong place for a fix. Shouldn't crmd send all environment? It is only by chance that we have the timeout value available in this function. In the case of stop, crmd does not ask lrmd for the substitution of the parameter. . (snip) /* reset the resource's parameters? */ if(op-interval == 0) { if(safe_str_eq(CRMD_ACTION_START, operation) || safe_str_eq(CRMD_ACTION_STATUS, operation)) { op-copyparams = 1; } } (snip) When the parameter of the resource is changed, I think this to be because I influence the stop of the resource of lrmd. It is necessary for the changed parameter not to copy it. When stopping, you always want to use the old parameters (think of someone changing 'ip' for an IPaddr resource). Options that are interpreted by the crmd or lrmd are a different matter which resulted in: https://github.com/ClusterLabs/pacemaker/commit/fcfe6fe522138343e4138248829926700fac213e My patch is an example when I handle it in lrmd. Is there a better patch? * For example, it may be good to give copyparams a different value. Best Regards, Hideo Yamauchi. 
--- On Thu, 2011/12/15, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Thu, Dec 15, 2011 at 06:21:00PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, I made the patch which revised the old next problem. * http://www.gossamer-threads.com/lists/linuxha/users/70262 In consideration of influence when a parameter was changed, I replace only a value of timeout. Please confirm my patch. And please commit a patch. It looks like a wrong place for a fix. Shouldn't crmd send all environment? It is only by chance that we have the timeout value available in this function. Cheers, Dejan Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Patch] Patch for LVM.(2/3)
Hi All, This patch revises the next point. * Corrects a wrong log message during the status operation. Please confirm my patch. And please commit it. Best Regards, Hideo Yamauchi
diff -r 46f87af89d20 heartbeat/LVM
--- a/heartbeat/LVM	Mon Dec 05 19:21:11 2011 +0900
+++ b/heartbeat/LVM	Mon Dec 05 19:21:44 2011 +0900
@@ -162,12 +162,14 @@
 	fi
 
 	# Report on LVM volume status to stdout...
-	if
-		echo $VGOUT | grep -i 'Access.*read/write' >/dev/null
-	then
-		ocf_log debug "Volume $1 is available read/write (running)"
-	else
-		ocf_log debug "Volume $1 is available read-only (running)"
+	if [ $rc -eq 0 ]; then
+		if
+			echo $VGOUT | grep -i 'Access.*read/write' >/dev/null
+		then
+			ocf_log debug "Volume $1 is available read/write (running)"
+		else
+			ocf_log debug "Volume $1 is available read-only (running)"
+		fi
 	fi
 
 	return $OCF_SUCCESS
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Patch] Patch for LVM.(3/3)
Hi All, This patch revises the next point. * Deletion of unused statements. Please confirm my patch. And please commit it. Best Regards, Hideo Yamauchi
diff -r a85a5ba1712f heartbeat/LVM
--- a/heartbeat/LVM	Mon Dec 05 22:43:03 2011 +0900
+++ b/heartbeat/LVM	Mon Dec 05 22:44:59 2011 +0900
@@ -325,9 +325,7 @@
 
 	if [ -z "$OCF_RESKEY_volgrpname" ]
 	then
-#		echo "You must identify the volume group name!"
 		ocf_log err "You must identify the volume group name!"
-#		usage
 		exit $OCF_ERR_CONFIGURED
 	fi
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Patch] Patch for IPsrcaddr.(2/2)
Hi All, We made a patch for IPsrcaddr. This patch revises the next point. * Made modifications to carry out validate_all processing. * Removed the definition and uses of the unused IPROUTE variable. * Revised find_interface_generic to search using the ip command. However, we cannot test on environments other than Linux, so we limited this processing to Linux environments:
(snip)
@@ -458,6 +440,10 @@
 ipaddress=$OCF_RESKEY_ipaddress
 
+if [ x$SYSTYPE = xLinux ]; then
+	srca_validate_all
+fi
+
(snip)
Please confirm my correction. And please commit it. Best Regards, Hideo Yamauchi
diff -r e4d9d86a9577 IPsrcaddr
--- a/IPsrcaddr	Mon Nov 28 20:02:26 2011 +0900
+++ b/IPsrcaddr	Mon Nov 28 20:03:07 2011 +0900
@@ -307,35 +307,14 @@
 #
 find_interface_generic() {
-	$IFCONFIG $IFCONFIG_A_OPT |
-	while read ifname linkstuff
-	do
-		: Read gave us ifname = $ifname
-
-		read inet addr junk
-		: Read gave us inet = $inet addr = $addr
-
-		while
-			read line && [ "X$line" != "X" ]
-		do
-			: Nothing
-		done
-
-		case $SYSTYPE in
-		*BSD)
-			$IFCONFIG | grep $BASEIP -B`$IFCONFIG | grep -c inet` | grep UP, | cut -d ':' -f 1
-			return 0;;
-		*)
-			: comparing $BASEIP to $addr (from ifconfig)
-			case $addr in
-			addr:$BASEIP)	echo $ifname; return $OCF_SUCCESS;;
-			$BASEIP)	echo $ifname; return $OCF_SUCCESS;;
-			esac
-			continue;;
-		esac
-
-	done
-	return $OCF_ERR_GENERIC
+	local iface=`$IP2UTIL -o -f inet addr show | grep "\ $BASEIP\ " \
+		| cut -d ' ' -f2 | grep -v '^ipsec[0-9][0-9]*$'`
+	if [ -z "$iface" ]; then
+		return $OCF_ERR_GENERIC
+	else
+		echo $iface
+		return $OCF_SUCCESS
+	fi
 }
 
@@ -409,7 +388,6 @@
 srca_validate_all() {
 	check_binary $AWK
-	check_binary $IPROUTE
 	check_binary $IFCONFIG
 
 	# The IP address should be in good shape
@@ -420,6 +398,10 @@
 		exit $OCF_ERR_CONFIGURED
 	fi
 
+	if ocf_is_probe; then
+		return $OCF_SUCCESS
+	fi
+
 	# We should serve this IP address of course
 	if ip_status $ipaddress; then
 		:
@@ -458,6 +440,10 @@
 ipaddress=$OCF_RESKEY_ipaddress
 
+if [ x$SYSTYPE = xLinux ]; then
+	srca_validate_all
+fi
+
 findif_out=`$FINDIF -C`
 rc=$?
 [ $rc -ne 0 ] && {
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch]Remove unnecessary loop handling of data_directory for postfix.
Hi Raoul, What do you think about the second patch I contributed? Best Regards, Hideo Yamauchi. --- On Mon, 2011/11/21, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for your comment. Because postfix check does not report details of the result back to the RA, I recognized that detailed log messages were necessary. I changed the check of data_directory, and, as suggested earlier, I abolished the loop. This is because, with the check added, multiple settings are no longer accepted. Please confirm my correction. And please commit it. Best Regards, Hideo Yamauchi. --- On Sat, 2011/11/19, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 2011-11-16 11:36, renayama19661...@ybb.ne.jp wrote: I think that the same check has already been carried out in the resource agent:
(snip)
# run Postfix internal check, if not probing
if ! ocf_is_probe; then
    $binary $OPTIONS check >/dev/null 2>&1
    ret=$?
    if [ $ret -ne 0 ]; then
        ocf_log err "Postfix 'check' failed. $ret"
        return $OCF_ERR_GENERIC
    fi
fi
(snip)
That means, after all, isn't the loop check of data_directory unnecessary? postfix check is called after all other checks have passed and, you're right, it also checks the required directories. i think i had some issues though: # check spool/queue and data directories (if applicable) # this is required because postfix check does not catch all errors but i cannot remember the exact problems anymore. anyways, postfix check will return an OCF_ERR_GENERIC which is regarded as a soft error (!) [1] and will a. not hint the user or a gui application to the exact problem and b. will lead to a restart of the failed resource on the same node the more in-depth check will fail with OCF_ERR_INSTALLED [2] or OCF_ERR_PERM [3] and will c. give more information in this regard and d. migrates the resource to a different node, which makes sense if i.e. the shared queue directory (nfs, etc.) isn't available. 
i think that this behavior is good and checking the most commonly modified directories separately has been very helpful in my setups. but of course, i'm open for comments. #Sorry...Because English is weak, I may understand your opinion by mistake. no worries. english isn't my first language either and until now we managed to work things out, right? :) cheers, raoul [1] http://www.linux-ha.org/doc/dev-guides/_literal_ocf_err_generic_literal_1.html [2] http://www.linux-ha.org/doc/dev-guides/_literal_ocf_err_installed_literal_5.html [3] http://www.linux-ha.org/doc/dev-guides/_literal_ocf_err_perm_literal_4.html -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
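The soft-versus-hard distinction Raoul describes above can be sketched as a standalone shell function. This is an illustration only, not the postfix RA itself: `check_data_dir` is a hypothetical helper, the numeric values are the OCF exit codes from the dev-guide pages [1][2][3], and the real agent additionally runs its writability test as the mail_owner user via su.

```shell
# OCF exit codes as documented in the resource-agent dev guide.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1      # soft error: resource is restarted on the same node
OCF_ERR_PERM=4         # hard error: resource is moved to another node
OCF_ERR_INSTALLED=5    # hard error: component missing/broken on this node

# Hypothetical helper showing why the in-depth check is useful: it can
# return a hard error that tells the cluster to try another node, while
# a bare "postfix check" failure only yields the soft OCF_ERR_GENERIC.
check_data_dir() {
    dir=$1
    # Missing directory: treat as an installation problem on this node.
    [ -d "$dir" ] || return $OCF_ERR_INSTALLED
    # Present but not writable: a permission problem. (The real RA runs
    # this test as the mail_owner user via su; plain test -w keeps the
    # sketch self-contained.)
    [ -w "$dir" ] || return $OCF_ERR_PERM
    return $OCF_SUCCESS
}
```

A monitor or validate action would propagate this return code directly, so the CRM sees 4 or 5 instead of a generic 1.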
Re: [Linux-ha-dev] ocf:heartbeat:postfix postfix_running (was: Re: [Patch]The patch which revises log and an unnecessary loop for postfix resource agent.)
Hi Raoul, Thank you for the comment. https://github.com/raoulbhatia/resource-agents/commit/4a5afaa217 All right! I confirmed your modified contents. Cheers, Hideo Yamauchi.

--- On Tue, 2011/11/22, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:

Hi Hideo-san! On 2011-11-21 03:07, renayama19661...@ybb.ne.jp wrote:
> It is judged that postfix works correctly and stops. The version that I confirmed is 2.6.6 on RHEL6.1. There seems to be a mistake in one patch. The postfix status command does not seem to return a detailed result; this is the same as the postfix check command. I think the following is more correct. * I abolished the output capture and omitted the output from the log.
(snip)
postfix_running() {
    local loglevel
    loglevel=${1:-err}

    # run Postfix status if available
    if ocf_is_true $status_support; then
        $binary $OPTION_CONFIG_DIR status 2>&1
        ret=$?
        if [ $ret -ne 0 ]; then
            ocf_log $loglevel "Postfix status: $ret"
        fi
        return $ret
    fi
(snip)

i applied this change and also updated the other ocf_log lines a little bit: https://github.com/raoulbhatia/resource-agents/commit/4a5afaa217

i would like to resolve another issue though: we expect to log an error when, e.g.: 1. postfix stop 2. we call postfix_running to see if postfix actually stopped. so there is an expected error from postfix_running which will get logged and will possibly trouble the admin, right? thinking about how to solve this for the postfix ra (e.g. using a -q parameter) i thought about using the ocf_run function. but the ocf_run function will log an error too... so i'll leave this issue until my other email is answered ;) cheers, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15
Re: [Linux-ha-dev] add an option to ocf_run to surpress *all* output
Hi Raoul, I think that your proposed option is good. Surely it will be useful in the future. Best Regards, Hideo Yamauchi.

--- On Tue, 2011/11/22, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:

hi all! i'm using the following logic in my postfix ra: 1. stop postfix 2. check if postfix is actually stopped by checking its status. if so, everything is working as intended! i now wanted to switch to using ocf_run in my postfix ra but there is no parameter to completely suppress the entire output of a command. what about adding a special option, e.g. -qq, to not log *anything* even if the command to run returns an error? thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15
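The "-qq" behaviour Raoul proposes can be sketched as a small wrapper. This is purely illustrative: `run_quiet` and the `-qq` flag are hypothetical here (the proposal was for ocf_run in resource-agents, which this sketch does not reproduce), and plain `echo` to stderr stands in for `ocf_log err`.

```shell
# Hypothetical sketch of a "fully quiet" run helper: execute a command,
# capture all of its output, and log nothing at all when -qq is given,
# even if the command fails.
run_quiet() {
    fully_quiet=false
    if [ "$1" = "-qq" ]; then
        fully_quiet=true
        shift
    fi
    # Capture stdout and stderr so nothing leaks to the terminal/log.
    output=$("$@" 2>&1)
    rc=$?
    if [ $rc -ne 0 ] && [ "$fully_quiet" = false ]; then
        # Stand-in for: ocf_log err "..."
        echo "ERROR: $*: rc=$rc: $output" >&2
    fi
    return $rc
}
```

With such a helper, the expected-failure case in the stop path (`run_quiet -qq postfix status`) would stay silent, while genuine failures elsewhere still get logged.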
Re: [Linux-ha-dev] [Patch]Remove unnecessary loop handling of data_directory for postfix.
Hi Raoul, Thank you for the comment. Because the postfix check command does not report details in its result, I felt that the RA needed to log the details itself. I changed the data_directory check. And, as you suggested before, I removed the loop: the added check no longer allows a plural setting. Please confirm my correction, and please commit it. Best Regards, Hideo Yamauchi.

--- On Sat, 2011/11/19, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:

Hi Hideo-san! On 2011-11-16 11:36, renayama19661...@ybb.ne.jp wrote:
> I think that the same check is already carried out in the resource agent.
(snip)
# run Postfix internal check, if not probing
if ! ocf_is_probe; then
    $binary $OPTIONS check >/dev/null 2>&1
    ret=$?
    if [ $ret -ne 0 ]; then
        ocf_log err "Postfix 'check' failed. $ret"
        return $OCF_ERR_GENERIC
    fi
fi
(snip)
> That means, after all, isn't the loop check of data_directory unnecessary?

postfix check is called after all other checks have passed and, you're right, it also checks the required directories. i think i had some issues though:

# check spool/queue and data directories (if applicable)
# this is required because postfix check does not catch all errors

but i cannot remember the exact problems anymore. anyways, postfix check will return OCF_ERR_GENERIC, which is regarded as a soft error (!) [1] and will a. not hint the user or a gui application to the exact problem and b. lead to a restart of the failed resource on the same node. the more in-depth check will fail with OCF_ERR_INSTALLED [2] or OCF_ERR_PERM [3] and will c. give more information in this regard and d. migrate the resource to a different node, which makes sense if e.g. the shared queue directory (nfs, etc.) isn't available. i think that this behavior is good and checking the most commonly modified directories separately has been very helpful in my setups. but of course, i'm open for comments.
#Sorry...Because English is weak, I may understand your opinion by mistake.

no worries. english isn't my first language either and until now we managed to work things out, right? :) cheers, raoul

[1] http://www.linux-ha.org/doc/dev-guides/_literal_ocf_err_generic_literal_1.html [2] http://www.linux-ha.org/doc/dev-guides/_literal_ocf_err_installed_literal_5.html [3] http://www.linux-ha.org/doc/dev-guides/_literal_ocf_err_perm_literal_4.html

-- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15

diff -r aaf72a017c98 postfix
--- a/postfix Mon Nov 21 10:32:33 2011 +0900
+++ b/postfix Mon Nov 21 10:34:08 2011 +0900
@@ -264,7 +264,13 @@
 fi
 if ocf_is_true $status_support; then
-data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2>/dev/null`
+orig_data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2>/dev/null`
+data_dir=`echo $orig_data_dir | tr ',' ' '`
+dcount=`echo $data_dir | wc -w`
+if [ $dcount -gt 1 ]; then
+ocf_log err "Postfix data directory '$orig_data_dir' cannot set plural parameters."
+return $OCF_ERR_PERM
+fi
 if [ ! -d $data_dir ]; then
 if ocf_is_probe; then
 ocf_log info "Postfix data directory '$data_dir' not readable during probe."
@@ -278,16 +284,14 @@
 # check directory permissions
 if ocf_is_true $status_support; then
 user=`postconf $OPTION_CONFIG_DIR -h mail_owner 2>/dev/null`
-for dir in $data_dir; do
-if ! su -s /bin/sh - $user -c "test -w $dir"; then
-if ocf_is_probe; then
-ocf_log info "Directory '$dir' is not writable by user '$user' during probe."
-else
-ocf_log err "Directory '$dir' is not writable by user '$user'."
-return $OCF_ERR_PERM;
-fi
+if ! su -s /bin/sh - $user -c "test -w $data_dir"; then
+if ocf_is_probe; then
+ocf_log info "Directory '$data_dir' is not writable by user '$user' during probe."
+else
+ocf_log err "Directory '$data_dir' is not writable by user '$user'."
+return $OCF_ERR_PERM;
 fi
-done
+fi
 fi
 fi
Re: [Linux-ha-dev] ocf:heartbeat:postfix postfix_running (was: Re: [Patch]The patch which revises log and an unnecessary loop for postfix resource agent.)
Hi Raoul, 2. we log an error (rc 1) which actually is expected and good (postfix is not running; we're eligible to start it) the same happens upon stopping postfix: Nov 18 15:01:07 m01 crmd: [2063]: info: do_lrm_rsc_op: Performing key=116:55885:0:9582c8d2-c69a-4d79-91f6-04ea7bbe1853 op=m-mail-postfix_stop_0 ) Nov 18 15:01:07 m01 lrmd: [2060]: info: rsc:m-mail-postfix stop[175] (pid 10420) Nov 18 15:01:08 m01 postfix/postfix-script[10632]: the Postfix mail system is not running Nov 18 15:01:08 m01 postfix[10420]: INFO: Postfix status: ''. 1 Nov 18 15:01:08 m01 postfix/postfix-script[10652]: the Postfix mail system is not running Nov 18 15:01:08 m01 postfix[10420]: INFO: Postfix status: ''. 1 Nov 18 15:01:08 m01 postfix[10420]: INFO: Postfix stopped. Nov 18 15:01:08 m01 lrmd: [2060]: info: operation stop[175] on m-mail-postfix for client 2063: pid 10420 exited with return code 0 this is still a pending issue. In my environment, the same log does not appear. Nov 21 10:48:22 rhel6-1 attrd: [5964]: info: attrd_ha_callback: Update relayed from rhel6-2 Nov 21 10:48:22 rhel6-1 attrd: [5964]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1321840102) Nov 21 10:48:22 rhel6-1 attrd: [5964]: info: attrd_perform_update: Sent update 8: shutdown=1321840102 Nov 21 10:48:23 rhel6-1 lrmd: [5962]: info: cancel_op: operation monitor[4] on prmDummy1 for client 5965, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.1] CRM_meta_on_fail=[restart] CRM_meta_interval=[1] CRM_meta_timeout=[2] cancelled Nov 21 10:48:23 rhel6-1 crmd: [5965]: info: do_lrm_rsc_op: Performing key=6:2:0:bf49e695-7079-40fd-803b-f732619084f4 op=prmDummy1_stop_0 ) Nov 21 10:48:23 rhel6-1 lrmd: [5962]: info: rsc:prmDummy1 stop[5] (pid 6488) Nov 21 10:48:23 rhel6-1 crmd: [5965]: info: process_lrm_event: LRM operation prmDummy1_monitor_1 (call=4, status=1, cib-update=0, confirmed=true) Cancelled Nov 21 10:48:26 rhel6-1 postfix(prmDummy1)[6488]: [6637]: INFO: Postfix status: ''. 
1 Nov 21 10:48:26 rhel6-1 postfix(prmDummy1)[6488]: [6639]: INFO: Postfix stopped. Nov 21 10:48:26 rhel6-1 lrmd: [5962]: info: operation stop[5] on prmDummy1 for client 5965: pid 6488 exited with return code 0 Nov 21 10:48:26 rhel6-1 crmd: [5965]: info: process_lrm_event: LRM operation prmDummy1_stop_0 (call=5, rc=0, cib-update=14, confirmed=true) ok Nov 21 10:48:27 rhel6-1 crmd: [5965]: info: handle_request: Shutting down It is judged that postfix works definitely and stops. The version that I confirmed is 2.6.6 on RHEL6.1. There seems to be a mistake with one patch. The postfix status command does not seem to return a detailed result. This is the same as postfix check command. I think that next is more right. * I abolished output and omitted output from log. (snip) postfix_running() { local loglevel loglevel=${1:-err} # run Postfix status if available if ocf_is_true $status_support; then $binary $OPTION_CONFIG_DIR status 21 ret=$? if [ $ret -ne 0 ]; then ocf_log $loglevel Postfix status: $ret fi return $ret fi (snip) Best Regards, Hideo Yamauchi. --- On Sat, 2011/11/19, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 2011-11-18 15:16, Raoul Bhatia [IPAX] wrote: 1. we do not capture the the Postfix mail system is not running output. maybe this is a result from running in an interactive shell? i can answer this myself. postfix, at least on debian, only displays the output to stdout if there is a connected terminal. e.g. # postfix blar; echo $? postfix/postfix-script: fatal: usage: postfix start (or stop, reload, abort, flush, check, status, set-permissions, upgrade-configuration) 1 # ssh localhost postfix blar; echo $? 1 so i do not know whether there is any sense in logging the postfix_running output. 2. 
we log an error (rc 1) which actually is expected and good (postfix is not running; we're eligible to start it) the same happens upon stopping postfix: Nov 18 15:01:07 m01 crmd: [2063]: info: do_lrm_rsc_op: Performing key=116:55885:0:9582c8d2-c69a-4d79-91f6-04ea7bbe1853 op=m-mail-postfix_stop_0 ) Nov 18 15:01:07 m01 lrmd: [2060]: info: rsc:m-mail-postfix stop[175] (pid 10420) Nov 18 15:01:08 m01 postfix/postfix-script[10632]: the Postfix mail system is not running Nov 18 15:01:08 m01 postfix[10420]: INFO: Postfix status: ''. 1 Nov 18 15:01:08 m01 postfix/postfix-script[10652]: the Postfix mail system is not running Nov 18 15:01:08 m01 postfix[10420]: INFO: Postfix status: ''. 1 Nov 18 15:01:08 m01 postfix[10420]: INFO: Postfix stopped. Nov 18 15:01:08 m01 lrmd: [2060]: info: operation stop[175] on m-mail-postfix for client 2063: pid 10420 exited with return code 0 this is still a pending issue. thanks, raoul -- DI (FH) Raoul
Re: [Linux-ha-dev] [Patch]Remove unnecessary loop handling of data_directory for postfix.
Hi Raoul, Thank you for the comment.

On 2011-11-16 01:16, renayama19661...@ybb.ne.jp wrote:
> I judged from these results that plural data_directory parameters cannot be set, and contributed a patch. Is my judgment wrong?

to my knowledge, you're correct. multiple data_directories are not possible (and imho make no sense)

All right! Thanks!

> (Example) It is postfix 2.6.6 on RHEL6 that I confirmed.
> * Step 1: I set two directories in main.cf.
(snip)
# The data_directory parameter specifies the location of Postfix-writable
# data files (caches, random numbers). This directory must be owned
# by the mail_owner account (see below).
#
data_directory = /var/lib/postfix,/var/lib/postfix2
(snip)

so you set the directory to the single value "/var/lib/postfix,/var/lib/postfix2" which is not tokenized/split into an array.

> * Step 2: I make a directory and give access permission.
[root@rhel6-1 ~]# mkdir /var/lib/postfix2
[root@rhel6-1 ~]# chown postfix:postfix /var/lib/postfix2
> * Step 3: I execute the postfix check command. (ERROR)
[root@rhel6-1 ~]# postfix check
mkdir: cannot create directory `/var/lib/postfix,/var/lib/postfix2': No such file or directory
postfix/postfix-script: fatal: unable to create missing queue directories
[root@rhel6-1 ~]# echo $?
1

that is expected and the resource agent should check the same.

I think that the same check is already carried out in the resource agent.
(snip)
# run Postfix internal check, if not probing
if ! ocf_is_probe; then
$binary $OPTIONS check >/dev/null 2>&1
ret=$?
if [ $ret -ne 0 ]; then
ocf_log err "Postfix 'check' failed. $ret"
return $OCF_ERR_GENERIC
fi
fi
(snip)
That means, after all, isn't the loop check of data_directory unnecessary?

#Sorry...Because English is weak, I may understand your opinion by mistake. Best Regards, Hideo Yamauchi.

-- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15
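The comma-splitting detection that the patch under discussion adds can be sketched in isolation. The helper names `count_dirs` and `validate_data_dir` are hypothetical (the real patch inlines this logic with ocf_log and OCF_ERR_PERM), but the `tr`/`wc -w` mechanics match the patch.

```shell
# Count how many directories a Postfix data_directory value names.
# postconf prints multiple values joined by commas, so split on ','
# and count the resulting words.
count_dirs() {
    echo "$1" | tr ',' ' ' | wc -w
}

# Reject a plural setting before any directory test runs, as the
# patch does; 4 is OCF_ERR_PERM in the OCF exit-code convention.
validate_data_dir() {
    if [ "$(count_dirs "$1")" -gt 1 ]; then
        echo "plural data_directory values are not supported" >&2
        return 4
    fi
    return 0
}
```

This catches the `/var/lib/postfix,/var/lib/postfix2` case from the example above before `test -d` or `test -w` ever sees the comma-joined string.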
Re: [Linux-ha-dev] [Patch]The patch which revises log and an unnecessary loop for postfix resource agent.
Hi Raoul, Thank you for the comment.
> 1. simply break the loop when postfix isn't running anymore.
> 2. ocf_log info "Postfix stopped." will be called at the end of the postfix_stop() method.
> any objections?
All right. I think that the correction you suggested is right. I approve of it. Thanks, Hideo Yamauchi.

--- On Tue, 2011/11/15, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:

hi Hideo-san! On 2011-11-15 01:14, renayama19661...@ybb.ne.jp wrote:
>> why do you want to return here and not simply break and let postfix_stop() continue its work?
ok, so i would change the patch to:

--- a/heartbeat/postfix
+++ b/heartbeat/postfix
@@ -173,6 +173,8 @@ postfix_stop()
 for i in 1 2 3 4 5; do
 if postfix_running info; then
 sleep 1
+ else
+ break
 fi
 done

1. simply break the loop when postfix isn't running anymore. 2. ocf_log info "Postfix stopped." will be called at the end of the postfix_stop() method. any objections? thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15
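The stop loop agreed on above can be sketched as a self-contained function. This is a sketch, not the actual postfix_stop(): `wait_for_stop` and `is_running` are stand-in names for the RA's stop path and postfix_running, and `echo` stands in for ocf_log.

```shell
# Poll up to five times, break out as soon as the service is down,
# and log "stopped" once at the end of the stop path (point 2 above).
wait_for_stop() {
    for i in 1 2 3 4 5; do
        if is_running; then
            sleep 1
        else
            break    # already down: no need to keep sleeping (point 1)
        fi
    done
    if is_running; then
        return 1     # still up after the grace period: caller escalates
    fi
    echo "stopped"   # stand-in for: ocf_log info "Postfix stopped."
    return 0
}
```

Breaking instead of returning keeps the single success log (and any escalation logic) in one place at the bottom of the stop function.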
Re: [Linux-ha-dev] [Patch]Remove unnecessary loop handling of data_directory for postfix.
Hi Raoul, Thank you for the comment. Because I do not know a lot about configuring postfix, my opinion may be wrong. The data_directory in postfix's main.cf can be set to more than one directory. However, with such a setting, the postfix check command returns an error. Because the postfix resource agent executes the postfix check command in the same way, the validate processing returns an error. I judged from these results that plural data_directory parameters cannot be set, and contributed a patch. Is my judgment wrong?

(Example) It is postfix 2.6.6 on RHEL6 that I confirmed.
* Step 1: I set two directories in main.cf.
(snip)
# The data_directory parameter specifies the location of Postfix-writable
# data files (caches, random numbers). This directory must be owned
# by the mail_owner account (see below).
#
data_directory = /var/lib/postfix,/var/lib/postfix2
(snip)
* Step 2: I make a directory and give access permission.
[root@rhel6-1 ~]# mkdir /var/lib/postfix2
[root@rhel6-1 ~]# chown postfix:postfix /var/lib/postfix2
* Step 3: I execute the postfix check command. (ERROR)
[root@rhel6-1 ~]# postfix check
mkdir: cannot create directory `/var/lib/postfix,/var/lib/postfix2': No such file or directory
postfix/postfix-script: fatal: unable to create missing queue directories
[root@rhel6-1 ~]# echo $?
1

Best Regards, Hideo Yamauchi.

--- On Tue, 2011/11/15, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:

Hi Hideo-san! On 2011-11-15 03:09, renayama19661...@ybb.ne.jp wrote:
> Hi Raoul, Hi All, I removed unnecessary loop handling of data_directory. This patch applies cleanly after the following patch is applied: * http://www.gossamer-threads.com/lists/linuxha/dev/76354 Please confirm my correction, and please commit it.

the reason i kept this loop is that if we need to check another directory for write permissions in the future we only need to add this directory to the loop. i used to have two loops in the ra: - one for checking if important directories exist and - one for checking if important directories are writable. see https://github.com/raoulbhatia/resource-agents/commit/136dd79 after your status_support patches in mid 2011, the first loop got unfolded. i kept the second loop on purpose. what are your thoughts? did you simply remove the loop because it is unnecessary or did you have anything else in mind? if it is not too much of a problem, i'd like to keep the write check loop intact just in case we need it for another directory. cheers, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15
Re: [Linux-ha-dev] [Patch]The patch which revises log and an unnecessary loop for postfix resource agent.
Hi Raoul,
> why do you want to return here and not simply break and let postfix_stop() continue its work?
No, I do not have any problem with using a break statement; using return was just my preference. Cheers, Hideo Yamauchi.

--- On Mon, 2011/11/14, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:

hi! thanks for your contribution! On 2011-11-14 07:04, renayama19661...@ybb.ne.jp wrote:

diff -r 52dcb4318e21 heartbeat/postfix
--- a/heartbeat/postfix Mon Nov 14 14:46:36 2011 +0900
+++ b/heartbeat/postfix Mon Nov 14 14:47:03 2011 +0900
...
@@ -168,14 +171,17 @@
 # grant some time for shutdown and recheck 5 times
 for i in 1 2 3 4 5; do
- if postfix_running; then
+ if postfix_running info; then
 sleep 1
+ else
+ ocf_log info "Postfix stopped."
+ return $OCF_SUCCESS
 fi
 done

why do you want to return here and not simply break and let postfix_stop() continue its work? besides that, your patch looks fine upon the first check. cheers, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15
[Linux-ha-dev] [Patch]Remove unnecessary loop handling of data_directory for postfix.
Hi Raoul, Hi All, I removed unnecessary loop handling of data_directory. This patch applies cleanly after the following patch is applied: * http://www.gossamer-threads.com/lists/linuxha/dev/76354 Please confirm my correction, and please commit it. Best Regards, Hideo Yamauchi.

diff -r b2a771cba975 heartbeat/postfix
--- a/heartbeat/postfix Tue Nov 15 10:53:38 2011 +0900
+++ b/heartbeat/postfix Tue Nov 15 10:57:10 2011 +0900
@@ -278,16 +278,14 @@
 # check directory permissions
 if ocf_is_true $status_support; then
 user=`postconf $OPTION_CONFIG_DIR -h mail_owner 2>/dev/null`
-for dir in $data_dir; do
-if ! su -s /bin/sh - $user -c "test -w $dir"; then
-if ocf_is_probe; then
-ocf_log info "Directory '$dir' is not writable by user '$user' during probe."
-else
-ocf_log err "Directory '$dir' is not writable by user '$user'."
-return $OCF_ERR_PERM;
-fi
+if ! su -s /bin/sh - $user -c "test -w $data_dir"; then
+if ocf_is_probe; then
+ocf_log info "Directory '$data_dir' is not writable by user '$user' during probe."
+else
+ocf_log err "Directory '$data_dir' is not writable by user '$user'."
+return $OCF_ERR_PERM;
 fi
-done
+fi
 fi
 fi
[Linux-ha-dev] [Patch]The patch which revises log and an unnecessary loop for postfix resource agent.
Hi Raoul, Hi All, I am sending a modified patch for the postfix resource agent. The correction has two points: * Change of the log level in conjunction with the monitor processing. * Deletion of an unnecessary loop in the stop processing. Please confirm my correction, and please commit it. Best Regards, Hideo Yamauchi.

diff -r 52dcb4318e21 heartbeat/postfix
--- a/heartbeat/postfix Mon Nov 14 14:46:36 2011 +0900
+++ b/heartbeat/postfix Mon Nov 14 14:47:03 2011 +0900
@@ -96,12 +96,15 @@
 }

 postfix_running() {
+local loglevel
+loglevel=${1:-err}
+
 # run Postfix status if available
 if ocf_is_true $status_support; then
 output=`$binary $OPTION_CONFIG_DIR status 2>&1`
 ret=$?
 if [ $ret -ne 0 ]; then
-ocf_log err "Postfix status: '$output'. $ret"
+ocf_log $loglevel "Postfix status: '$output'. $ret"
 fi
 return $ret
 fi
@@ -121,7 +124,7 @@
 postfix_start() {
 # if Postfix is running return success
-if postfix_running; then
+if postfix_running info; then
 ocf_log info "Postfix already running."
 return $OCF_SUCCESS
 fi
@@ -140,7 +143,7 @@
 while true; do
 sleep 1
 # break if postfix is up and running; log failure otherwise
-postfix_running && break
+postfix_running info && break
 ocf_log info "Postfix failed initial monitor action. $ret"
 done
@@ -152,7 +155,7 @@
 postfix_stop() {
 # if Postfix is not running return success
-if ! postfix_running; then
+if ! postfix_running info; then
 ocf_log info "Postfix already stopped."
 return $OCF_SUCCESS
 fi
@@ -168,14 +171,17 @@
 # grant some time for shutdown and recheck 5 times
 for i in 1 2 3 4 5; do
-if postfix_running; then
+if postfix_running info; then
 sleep 1
+else
+ocf_log info "Postfix stopped."
+#return $OCF_SUCCESS
 fi
 done

 # escalate to abort if we did not stop by now
 # @TODO shall we loop here too?
-if postfix_running; then
+if postfix_running info; then
 ocf_log err "Postfix failed to stop. Escalating to 'abort'."
 $binary $OPTIONS abort >/dev/null 2>&1; ret=$?
@@ -202,7 +208,14 @@
 postfix_monitor() {
-if postfix_running; then
+local status_loglevel=err
+
+# Set loglevel to info during probe
+if ocf_is_probe; then
+status_loglevel=info
+fi
+
+if postfix_running $status_loglevel; then
 return $OCF_SUCCESS
 fi
Re: [Linux-ha-dev] [Patch]The patch which revises log and an unnecessary loop for postfix resource agent.
Hi All, Sorry, because there was an error in the patch, I am sending it again. Best Regards, Hideo Yamauchi.

--- On Mon, 2011/11/14, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote:

Hi Raoul, Hi All, I am sending a modified patch for the postfix resource agent. The correction has two points: * Change of the log level in conjunction with the monitor processing. * Deletion of an unnecessary loop in the stop processing. Please confirm my correction, and please commit it. Best Regards, Hideo Yamauchi.

diff -r 52dcb4318e21 heartbeat/postfix
--- a/heartbeat/postfix Mon Nov 14 14:46:36 2011 +0900
+++ b/heartbeat/postfix Mon Nov 14 14:47:03 2011 +0900
@@ -96,12 +96,15 @@
 }

 postfix_running() {
+local loglevel
+loglevel=${1:-err}
+
 # run Postfix status if available
 if ocf_is_true $status_support; then
 output=`$binary $OPTION_CONFIG_DIR status 2>&1`
 ret=$?
 if [ $ret -ne 0 ]; then
-ocf_log err "Postfix status: '$output'. $ret"
+ocf_log $loglevel "Postfix status: '$output'. $ret"
 fi
 return $ret
 fi
@@ -121,7 +124,7 @@
 postfix_start() {
 # if Postfix is running return success
-if postfix_running; then
+if postfix_running info; then
 ocf_log info "Postfix already running."
 return $OCF_SUCCESS
 fi
@@ -140,7 +143,7 @@
 while true; do
 sleep 1
 # break if postfix is up and running; log failure otherwise
-postfix_running && break
+postfix_running info && break
 ocf_log info "Postfix failed initial monitor action. $ret"
 done
@@ -152,7 +155,7 @@
 postfix_stop() {
 # if Postfix is not running return success
-if ! postfix_running; then
+if ! postfix_running info; then
 ocf_log info "Postfix already stopped."
 return $OCF_SUCCESS
 fi
@@ -168,14 +171,17 @@
 # grant some time for shutdown and recheck 5 times
 for i in 1 2 3 4 5; do
-if postfix_running; then
+if postfix_running info; then
 sleep 1
+else
+ocf_log info "Postfix stopped."
+return $OCF_SUCCESS
 fi
 done

 # escalate to abort if we did not stop by now
 # @TODO shall we loop here too?
-if postfix_running; then
+if postfix_running info; then
 ocf_log err "Postfix failed to stop. Escalating to 'abort'."
 $binary $OPTIONS abort >/dev/null 2>&1; ret=$?
@@ -202,7 +208,14 @@
 postfix_monitor() {
-if postfix_running; then
+local status_loglevel=err
+
+# Set loglevel to info during probe
+if ocf_is_probe; then
+status_loglevel=info
+fi
+
+if postfix_running $status_loglevel; then
 return $OCF_SUCCESS
 fi
Re: [Linux-ha-dev] [Patch]Patch for LVM resource agents.
Hi Dejan, Many Thanks!! Hideo Yamauchi. --- On Tue, 2011/10/4, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Fri, Sep 30, 2011 at 11:17:19AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Sorry I sent the main body which was not a patch. I send it again. Patch applied. Cheers, Dejan Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Sorry There was still a mistake to the patch which I sent a while ago. With the patch which I sent a while ago, precious detailed log is canceled. Furthermore, I send the patch which I revised. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, ocft test reports this: 'LVM' case 7: FAILED. Agent returns unexpected value: 'OCF_NOT_RUNNING'. See details below: 2011/09/29_17:00:49 WARNING: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg --- Volume group --- VG Name ocft-vg System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 4.00 MiB PE Size 4.00 KiB Total PE 1024 Alloc PE / Size 150 / 600.00 KiB Free PE / Size 874 / 3.41 MiB VG UUID csVKm6-Bzdp-s40E-9O2S-uttx-PrcW-fq6Wtz --- Logical volume --- LV Name /dev/ocft-vg/ocft-lv VG Name ocft-vg LV UUID XjMtXj-DLzy-J8Rb-6Bfb-HNoM-7o6x-VOPnMG LV Write Access read/write LV Status NOT available LV Size 600.00 KiB Current LE 150 Segments 1 Allocation inherit Read ahead sectors auto --- Physical volumes --- PV Name /dev/loop0 PV UUID z6deWo-42uN-HPrZ-nLC4-wrba-34IZ-N98cmL PV Status allocatable Total PE / Free PE 1024 / 874 2011/09/29_17:00:49 INFO: LVM Volume ocft-vg is offline That's for double stop, I think. OTOH, ocf-tester says that it passed all tests. Somebody's lying :) I do not know a lot about ocft. I carried out ocft with -v option. 
* It is LVM which applied the patch which I attached to this email to have carried out. [root@bl460g1a heartbeat]# /usr/sbin/ocft test -v LVM Initialing LVM...done (snip) Starting 'LVM' case 7 'monitor when running': Setting agent environment: export OCFT_pv=/var/run/resource-agents/ocft-LVM-pv Setting agent environment: export OCFT_vg=ocft-vg Setting agent environment: export OCFT_lv=ocft-lv Setting agent environment: export OCFT_loop=/dev/loop0 Setting agent environment: export OCF_RESKEY_volgrpname=ocft-vg Running agent: ./LVM stop ? Running agent: ./LVM monitor Checking return value: FAILED. The return value 'OCF_NOT_RUNNING' != 'OCF_SUCCESS'. See details below: 2011/09/30_10:16:49 INFO: LVM Volume ocft-vg is offline (snip) After stop of LVM was carried out on 'monitor when running' test, monitor seems to be carried out. Is not it a problem of ocft? When I tried by hand to stop a running VG: # OCF_RESKEY_volgrpname=$OCFT_vg /usr/lib/ocf/resource.d/heartbeat/LVM stop INFO: Deactivating volume group ocft-vg INFO: 0 logical volume(s) in volume group ocft-vg now active ERROR: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg ... # echo $? 0 The exit code is OK, but there's an error message. Further stops produced the same. Can you please verify this. Hence, there seems to be a problem with the ocft test case. This was a mistake of my patch. I attached the patch which I revised. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comment. I confirm your information and revise a patch again. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Mon,
Re: [Linux-ha-dev] [Patch]Patch for LVM resource agents.
Hi Dejan, Thank you for comment. I confirm your information and revise a patch again. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Mon, Sep 12, 2011 at 02:44:22PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We made the patch of the LVM resource agent at the next point of view. Point 1) The LVM resource agent outputs the details of the log at the time of the error for a system administrator. Point 2) The LVM resource agent uses OCF variable for a return code. Point 3) With a patch, the LVM resource agent merge status processing and report_status processing. * We did not revise it about TODO of vgimport/vgexport in the LVM resource agent. Please examine this patch. ocft test reports this: 'LVM' case 7: FAILED. Agent returns unexpected value: 'OCF_NOT_RUNNING'. See details below: 2011/09/29_17:00:49 WARNING: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg --- Volume group --- VG Name ocft-vg System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 4.00 MiB PE Size 4.00 KiB Total PE 1024 Alloc PE / Size 150 / 600.00 KiB Free PE / Size 874 / 3.41 MiB VG UUID csVKm6-Bzdp-s40E-9O2S-uttx-PrcW-fq6Wtz --- Logical volume --- LV Name /dev/ocft-vg/ocft-lv VG Name ocft-vg LV UUID XjMtXj-DLzy-J8Rb-6Bfb-HNoM-7o6x-VOPnMG LV Write Access read/write LV Status NOT available LV Size 600.00 KiB Current LE 150 Segments 1 Allocation inherit Read ahead sectors auto --- Physical volumes --- PV Name /dev/loop0 PV UUID z6deWo-42uN-HPrZ-nLC4-wrba-34IZ-N98cmL PV Status allocatable Total PE / Free PE 1024 / 874 2011/09/29_17:00:49 INFO: LVM Volume ocft-vg is offline That's for double stop, I think. OTOH, ocf-tester says that it passed all tests. 
Somebody's lying :) When I tried by hand to stop a running VG: # OCF_RESKEY_volgrpname=$OCFT_vg /usr/lib/ocf/resource.d/heartbeat/LVM stop INFO: Deactivating volume group ocft-vg INFO: 0 logical volume(s) in volume group ocft-vg now active ERROR: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg ... # echo $? 0 The exit code is OK, but there's an error message. Further stops produced the same. Can you please verify this. Hence, there seems to be a problem with the ocft test case. Cheers, Dejan Best Regards, Hideo Yamauchi. diff -r fc1e82852f7a heartbeat/LVM --- a/heartbeat/LVM Wed Aug 31 01:39:02 2011 +0900 +++ b/heartbeat/LVM Mon Sep 12 14:29:36 2011 +0900 @@ -123,22 +123,17 @@ # Return LVM status (silently) # LVM_status() { - if - [ $LVM_MAJOR -eq 1 ] - then - vgdisplay $1 21 | grep -i 'Status.*available' 21 /dev/null - return $? - else - vgdisplay -v $1 21 | grep -i 'Status[ \t]*available' 21 /dev/null - return $? + local rc + loglevel=debug + + # Set the log level of the error message + if [ X${2} == X ]; then + loglevel=err + if ocf_is_probe; then + loglevel=warn + fi fi -} - -# -# Report on LVM volume status to stdout... -# -LVM_report_status() { - + if [ $LVM_MAJOR -eq 1 ] then @@ -150,16 +145,16 @@ echo $VGOUT | grep -i 'Status[ \t]*available' /dev/null rc=$? fi - - if - [ $rc -eq 0 ] - then - : Volume $1 is available - else - ocf_log debug LVM Volume $1 is not available (stopped) - return $OCF_NOT_RUNNING + if [ $rc -ne 0 ]; then + ocf_log $loglevel LVM Volume $1 is not available (stopped). ${VGOUT} + fi + + if [ X${2} == X ]; then + # status call return + return $rc fi + # Report on LVM volume status to stdout... 
if echo $VGOUT | grep -i 'Access.*read/write' /dev/null then @@ -167,8 +162,9 @@ else ocf_log debug Volume $1 is available read-only (running) fi - + return $OCF_SUCCESS + } # @@ -176,6 +172,7 @@ # # LVM_monitor() { + local rc if LVM_status $1 then @@ -185,9 +182,14 @@ return $OCF_NOT_RUNNING fi - vgck $1 /dev/null 21 + VGOUT=`vgck $1 21` + rc=$? + if [ $rc -ne 0 ]; then + ocf_log err LVM Volume $1 is not found. ${VGOUT}:${rc} + return $OCF_ERR_GENERIC + fi - return $? +
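The key idea of the LVM patch discussed above, choosing the log level of the "volume not available" message according to how the status function is called (ERROR normally, WARN for a probe), can be sketched in isolation. This is an illustrative stand-in, not the actual heartbeat/LVM code: `ocf_log` and `ocf_is_probe` are stubbed, and `MOCK_VG_ACTIVE` replaces the real `vgdisplay` check.

```shell
#!/bin/sh
# Sketch: one status function whose error log level depends on context.

# Stand-ins for the OCF shell functions the real agent sources.
ocf_log() { level=$1; shift; echo "$level: $*"; }
ocf_is_probe() { [ "$__OCF_ACTION" = "probe" ]; }

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# $1 = volume group name
vg_status_sketch() {
    loglevel=err
    if ocf_is_probe; then
        # A probe is expected to find stopped resources, so do not alarm
        # the administrator with an ERROR-level message.
        loglevel=warn
    fi
    if [ "$MOCK_VG_ACTIVE" = "yes" ]; then
        return $OCF_SUCCESS
    fi
    ocf_log $loglevel "LVM Volume $1 is not available (stopped)"
    return $OCF_NOT_RUNNING
}
```

The same stopped state is thus reported at WARN during a probe but at ERR during a regular monitor, which is the behaviour the thread argues avoids confusing the operator.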
Re: [Linux-ha-dev] [Patch] Patch for LVM resource agents.
Hi Dejan, ocft test reports this: 'LVM' case 7: FAILED. Agent returns unexpected value: 'OCF_NOT_RUNNING'. See details below: 2011/09/29_17:00:49 WARNING: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg --- Volume group --- VG Name ocft-vg System ID Formatlvm2 Metadata Areas1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV0 Cur LV1 Open LV 0 Max PV0 Cur PV1 Act PV1 VG Size 4.00 MiB PE Size 4.00 KiB Total PE 1024 Alloc PE / Size 150 / 600.00 KiB Free PE / Size 874 / 3.41 MiB VG UUID csVKm6-Bzdp-s40E-9O2S-uttx-PrcW-fq6Wtz --- Logical volume --- LV Name/dev/ocft-vg/ocft-lv VG Nameocft-vg LV UUIDXjMtXj-DLzy-J8Rb-6Bfb-HNoM-7o6x-VOPnMG LV Write Accessread/write LV Status NOT available LV Size600.00 KiB Current LE 150 Segments 1 Allocation inherit Read ahead sectors auto --- Physical volumes --- PV Name /dev/loop0 PV UUID z6deWo-42uN-HPrZ-nLC4-wrba-34IZ-N98cmL PV Status allocatable Total PE / Free PE1024 / 874 2011/09/29_17:00:49 INFO: LVM Volume ocft-vg is offline That's for double stop, I think. OTOH, ocf-tester says that it passed all tests. Somebody's lying :) I do not know a lot about ocft. I carried out ocft with -v option. * It is LVM which applied the patch which I attached to this email to have carried out. [root@bl460g1a heartbeat]# /usr/sbin/ocft test -v LVM Initialing LVM...done (snip) Starting 'LVM' case 7 'monitor when running': Setting agent environment:export OCFT_pv=/var/run/resource-agents/ocft-LVM-pv Setting agent environment:export OCFT_vg=ocft-vg Setting agent environment:export OCFT_lv=ocft-lv Setting agent environment:export OCFT_loop=/dev/loop0 Setting agent environment:export OCF_RESKEY_volgrpname=ocft-vg Running agent:./LVM stop ? Running agent:./LVM monitor Checking return value:FAILED. The return value 'OCF_NOT_RUNNING' != 'OCF_SUCCESS'. 
See details below: 2011/09/30_10:16:49 INFO: LVM Volume ocft-vg is offline (snip) After stop of LVM was carried out on 'monitor when running' test, monitor seems to be carried out. Is not it a problem of ocft? When I tried by hand to stop a running VG: # OCF_RESKEY_volgrpname=$OCFT_vg /usr/lib/ocf/resource.d/heartbeat/LVM stop INFO: Deactivating volume group ocft-vg INFO: 0 logical volume(s) in volume group ocft-vg now active ERROR: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg ... # echo $? 0 The exit code is OK, but there's an error message. Further stops produced the same. Can you please verify this. Hence, there seems to be a problem with the ocft test case. This was a mistake of my patch. I attached the patch which I revised. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comment. I confirm your information and revise a patch again. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Mon, Sep 12, 2011 at 02:44:22PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We made the patch of the LVM resource agent at the next point of view. Point 1) The LVM resource agent outputs the details of the log at the time of the error for a system administrator. Point 2) The LVM resource agent uses OCF variable for a return code. Point 3) With a patch, the LVM resource agent merge status processing and report_status processing. * We did not revise it about TODO of vgimport/vgexport in the LVM resource agent. Please examine this patch. ocft test reports this: 'LVM' case 7: FAILED. Agent returns unexpected value: 'OCF_NOT_RUNNING'. See details below: 2011/09/29_17:00:49 WARNING: LVM Volume ocft-vg is not available (stopped). 
Using volume group(s) on command line Finding volume group ocft-vg --- Volume group --- VG Name ocft-vg System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 4.00 MiB PE Size 4.00 KiB Total PE
Re: [Linux-ha-dev] [Patch] Patch for LVM resource agents.
Hi Dejan, Sorry There was still a mistake to the patch which I sent a while ago. With the patch which I sent a while ago, precious detailed log is canceled. Furthermore, I send the patch which I revised. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, ocft test reports this: 'LVM' case 7: FAILED. Agent returns unexpected value: 'OCF_NOT_RUNNING'. See details below: 2011/09/29_17:00:49 WARNING: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg --- Volume group --- VG Name ocft-vg System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 4.00 MiB PE Size 4.00 KiB Total PE 1024 Alloc PE / Size 150 / 600.00 KiB Free PE / Size 874 / 3.41 MiB VG UUID csVKm6-Bzdp-s40E-9O2S-uttx-PrcW-fq6Wtz --- Logical volume --- LV Name /dev/ocft-vg/ocft-lv VG Name ocft-vg LV UUID XjMtXj-DLzy-J8Rb-6Bfb-HNoM-7o6x-VOPnMG LV Write Access read/write LV Status NOT available LV Size 600.00 KiB Current LE 150 Segments 1 Allocation inherit Read ahead sectors auto --- Physical volumes --- PV Name /dev/loop0 PV UUID z6deWo-42uN-HPrZ-nLC4-wrba-34IZ-N98cmL PV Status allocatable Total PE / Free PE 1024 / 874 2011/09/29_17:00:49 INFO: LVM Volume ocft-vg is offline That's for double stop, I think. OTOH, ocf-tester says that it passed all tests. Somebody's lying :) I do not know a lot about ocft. I carried out ocft with -v option. * It is LVM which applied the patch which I attached to this email to have carried out. 
[root@bl460g1a heartbeat]# /usr/sbin/ocft test -v LVM Initialing LVM...done (snip) Starting 'LVM' case 7 'monitor when running': Setting agent environment: export OCFT_pv=/var/run/resource-agents/ocft-LVM-pv Setting agent environment: export OCFT_vg=ocft-vg Setting agent environment: export OCFT_lv=ocft-lv Setting agent environment: export OCFT_loop=/dev/loop0 Setting agent environment: export OCF_RESKEY_volgrpname=ocft-vg Running agent: ./LVM stop ? Running agent: ./LVM monitor Checking return value: FAILED. The return value 'OCF_NOT_RUNNING' != 'OCF_SUCCESS'. See details below: 2011/09/30_10:16:49 INFO: LVM Volume ocft-vg is offline (snip) After stop of LVM was carried out on 'monitor when running' test, monitor seems to be carried out. Is not it a problem of ocft? When I tried by hand to stop a running VG: # OCF_RESKEY_volgrpname=$OCFT_vg /usr/lib/ocf/resource.d/heartbeat/LVM stop INFO: Deactivating volume group ocft-vg INFO: 0 logical volume(s) in volume group ocft-vg now active ERROR: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg ... # echo $? 0 The exit code is OK, but there's an error message. Further stops produced the same. Can you please verify this. Hence, there seems to be a problem with the ocft test case. This was a mistake of my patch. I attached the patch which I revised. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comment. I confirm your information and revise a patch again. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Mon, Sep 12, 2011 at 02:44:22PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We made the patch of the LVM resource agent at the next point of view. Point 1) The LVM resource agent outputs the details of the log at the time of the error for a system administrator. 
Point 2) The LVM resource agent uses OCF variable for a return code. Point 3) With a patch, the LVM resource agent merge status processing and report_status processing. * We did not revise it about TODO of vgimport/vgexport in the LVM resource agent. Please examine this patch. ocft test reports this: 'LVM' case 7: FAILED. Agent returns unexpected value: 'OCF_NOT_RUNNING'. See details below: 2011/09/29_17:00:49 WARNING: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line
Re: [Linux-ha-dev] [Patch] Patch for LVM resource agents.
Hi Dejan, Sorry I sent the main body which was not a patch. I send it again. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Sorry There was still a mistake to the patch which I sent a while ago. With the patch which I sent a while ago, precious detailed log is canceled. Furthermore, I send the patch which I revised. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, ocft test reports this: 'LVM' case 7: FAILED. Agent returns unexpected value: 'OCF_NOT_RUNNING'. See details below: 2011/09/29_17:00:49 WARNING: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg --- Volume group --- VG Name ocft-vg System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 2 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 0 Max PV 0 Cur PV 1 Act PV 1 VG Size 4.00 MiB PE Size 4.00 KiB Total PE 1024 Alloc PE / Size 150 / 600.00 KiB Free PE / Size 874 / 3.41 MiB VG UUID csVKm6-Bzdp-s40E-9O2S-uttx-PrcW-fq6Wtz --- Logical volume --- LV Name /dev/ocft-vg/ocft-lv VG Name ocft-vg LV UUID XjMtXj-DLzy-J8Rb-6Bfb-HNoM-7o6x-VOPnMG LV Write Access read/write LV Status NOT available LV Size 600.00 KiB Current LE 150 Segments 1 Allocation inherit Read ahead sectors auto --- Physical volumes --- PV Name /dev/loop0 PV UUID z6deWo-42uN-HPrZ-nLC4-wrba-34IZ-N98cmL PV Status allocatable Total PE / Free PE 1024 / 874 2011/09/29_17:00:49 INFO: LVM Volume ocft-vg is offline That's for double stop, I think. OTOH, ocf-tester says that it passed all tests. Somebody's lying :) I do not know a lot about ocft. I carried out ocft with -v option. * It is LVM which applied the patch which I attached to this email to have carried out. 
[root@bl460g1a heartbeat]# /usr/sbin/ocft test -v LVM Initialing LVM...done (snip) Starting 'LVM' case 7 'monitor when running': Setting agent environment: export OCFT_pv=/var/run/resource-agents/ocft-LVM-pv Setting agent environment: export OCFT_vg=ocft-vg Setting agent environment: export OCFT_lv=ocft-lv Setting agent environment: export OCFT_loop=/dev/loop0 Setting agent environment: export OCF_RESKEY_volgrpname=ocft-vg Running agent: ./LVM stop ? Running agent: ./LVM monitor Checking return value: FAILED. The return value 'OCF_NOT_RUNNING' != 'OCF_SUCCESS'. See details below: 2011/09/30_10:16:49 INFO: LVM Volume ocft-vg is offline (snip) After stop of LVM was carried out on 'monitor when running' test, monitor seems to be carried out. Is not it a problem of ocft? When I tried by hand to stop a running VG: # OCF_RESKEY_volgrpname=$OCFT_vg /usr/lib/ocf/resource.d/heartbeat/LVM stop INFO: Deactivating volume group ocft-vg INFO: 0 logical volume(s) in volume group ocft-vg now active ERROR: LVM Volume ocft-vg is not available (stopped). Using volume group(s) on command line Finding volume group ocft-vg ... # echo $? 0 The exit code is OK, but there's an error message. Further stops produced the same. Can you please verify this. Hence, there seems to be a problem with the ocft test case. This was a mistake of my patch. I attached the patch which I revised. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Thank you for comment. I confirm your information and revise a patch again. Best Regards, Hideo Yamauchi. --- On Fri, 2011/9/30, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Mon, Sep 12, 2011 at 02:44:22PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, We made the patch of the LVM resource agent at the next point of view. Point 1) The LVM resource agent outputs the details of the log at the time of the error for a system administrator. 
Point 2) The LVM resource agent uses OCF variable for a return code. Point 3) With a patch, the LVM resource agent merge status processing and report_status processing. * We did not revise it about TODO of
Re: [Linux-ha-dev] [Patch 3] Change avoiding the stop error of the mysql resource agent.
Hi Raoul, thanks for clearing this for me! i've committed your change: https://github.com/raoulbhatia/resource-agents/commit/d828b7f91abff87e930b11097e6543e2bdc87023 thank you for your contribution! Many thanks!! Best Regards, Hideo Yamauchi. --- On Wed, 2011/9/21, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: hello hideo-san! On 09/21/2011 02:28 AM, renayama19661...@ybb.ne.jp wrote: No. Because it duplicates the status processing, the RA's pid-file check should be deleted. - if [ ! -f $OCF_RESKEY_pid ]; then - ocf_log info MySQL is not running - return $OCF_SUCCESS + mysql_status info + rc=$? + if [ $rc = $OCF_NOT_RUNNING ]; then + return $OCF_SUCCESS fi thanks for clearing this for me! i've committed your change: https://github.com/raoulbhatia/resource-agents/commit/d828b7f91abff87e930b11097e6543e2bdc87023 thank you for your contribution! raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch 2] Change of the output level of the log of the resource agent of mysql.
Hi Raoul, I agree with the modified contents. Just to be safe, I will verify the behaviour of this resource agent tomorrow. Please wait until tomorrow. Best Regards, Hideo Yamauchi. --- On Tue, 2011/9/20, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 09/20/2011 02:11 AM, renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for the comment. The log levels in my patch were a suggestion, not a firm decision. We think it is fine to unify them on either WARN or INFO, anything other than ERROR. As long as the status log produced during start and stop does not appear at ERROR level, the system administrator will not be confused. i've committed your patch with a slight modification in log levels. please verify: https://github.com/raoulbhatia/resource-agents/commit/65b7b4202549bc087d3759dc9636b4966e2dafd2 thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch 3] Change avoiding the stop error of the mysql resource agent.
Hi Raoul, Thank you for the comment. ok, but why do you omit the check if the pidfile exists and then cat this very file. if you cat a non existing file, you'll get errors. basically, the ra does the following: kill `cat /tmp/mysql.pid 2>/dev/null` >/dev/null; echo $? kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec] 1 so the ra will exit with: ocf_log err MySQL couldn't be stopped return $OCF_ERR_GENERIC However, the stop does not need to become an error just because mysqld has already died; a stop error can block failover in some cases. * Similar processing for this problem is already carried out in pgsql. shouldn't the following patch be enough (just adding the mysql_status check but not removing the pid check?) diff --git a/heartbeat/mysql b/heartbeat/mysql index e449de4..474f62e 100755 --- a/heartbeat/mysql +++ b/heartbeat/mysql @@ -898,6 +898,12 @@ mysql_stop() { $CRM_MASTER -D fi + mysql_status info + rc=$? + if [ $rc = $OCF_NOT_RUNNING ]; then + return $OCF_SUCCESS + fi + if [ ! -f $OCF_RESKEY_pid ]; then ocf_log info MySQL is not running return $OCF_SUCCESS No. Because it duplicates the status processing, the RA's pid-file check should be deleted. - if [ ! -f $OCF_RESKEY_pid ]; then - ocf_log info MySQL is not running - return $OCF_SUCCESS + mysql_status info + rc=$? + if [ $rc = $OCF_NOT_RUNNING ]; then + return $OCF_SUCCESS fi Sorry... my understanding of your comment may be wrong. Best Regards, Hideo Yamauchi. --- On Tue, 2011/9/20, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: hello hideo-san! On 09/20/2011 02:19 AM, renayama19661...@ybb.ne.jp wrote: ... - if [ ! -f $OCF_RESKEY_pid ]; then - ocf_log info MySQL is not running - return $OCF_SUCCESS + mysql_status info + rc=$? + if [ $rc = $OCF_NOT_RUNNING ]; then + return $OCF_SUCCESS fi ... i'm sorry but i do not understand the problem you're addressing. can you please describe with different words? 
Sorry. For example, mysql fails in stop processing and raises an error when the following sequence of trouble happens. Step 1) Stop handling of MySQL begins, e.g. for a switchover. Step 2) However, mysqld dies from process trouble just after that, and the pid file is left behind. Step 3) Because the pid file is left, the current stop processing reports an error. * With this patch, the stop processing finishes normally because it properly checks both the pid file and the existence of the process. * And the switchover then succeeds. ok, but why do you omit the check if the pidfile exists and then cat this very file. if you cat a non existing file, you'll get errors. basically, the ra does the following: kill `cat /tmp/mysql.pid 2>/dev/null` >/dev/null; echo $? kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec] 1 so the ra will exit with: ocf_log err MySQL couldn't be stopped return $OCF_ERR_GENERIC shouldn't the following patch be enough (just adding the mysql_status check but not removing the pid check?) diff --git a/heartbeat/mysql b/heartbeat/mysql index e449de4..474f62e 100755 --- a/heartbeat/mysql +++ b/heartbeat/mysql @@ -898,6 +898,12 @@ mysql_stop() { $CRM_MASTER -D fi + mysql_status info + rc=$? + if [ $rc = $OCF_NOT_RUNNING ]; then + return $OCF_SUCCESS + fi + if [ ! -f $OCF_RESKEY_pid ]; then ocf_log info MySQL is not running return $OCF_SUCCESS cheers, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
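The failure sequence described above (stop begins, mysqld dies, a stale pid file remains, stop then errors out and blocks failover) can be reproduced in isolation. The following is a minimal sketch of the behaviour the patch aims for, not the actual heartbeat/mysql code: `mock_status`, `sketch_stop` and the pid-file path are illustrative stand-ins for `mysql_status` and `$OCF_RESKEY_pid`.

```shell
#!/bin/sh
# Sketch: "stop" must succeed when the daemon is already dead, even if a
# stale pid file survived the crash. All names here are illustrative.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_ERR_GENERIC=1

PIDFILE=/tmp/sketch-mysql.pid

mock_status() {
    # "Running" only if the pid file names a live process.
    pid=$(cat "$PIDFILE" 2>/dev/null || true)
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

sketch_stop() {
    if ! mock_status; then
        # Already stopped (possibly with a stale pid file): report success
        # so that a spurious stop error does not block failover.
        rm -f "$PIDFILE"
        return $OCF_SUCCESS
    fi
    pid=$(cat "$PIDFILE" 2>/dev/null || true)
    kill "$pid" 2>/dev/null || return $OCF_ERR_GENERIC
    rm -f "$PIDFILE"
    return $OCF_SUCCESS
}
```

With a stale pid file (a pid that no longer exists), `sketch_stop` cleans up and returns success instead of failing, which is exactly the Step 3 case above.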
Re: [Linux-ha-dev] [Patch 2] Change of the output level of the log of the resource agent of mysql.
Hi Raoul, i've committed your patch with a slight modification in log levels. please verify: https://github.com/raoulbhatia/resource-agents/commit/65b7b4202549bc087d3759dc9636b4966e2dafd2 I verified the behaviour of the RA and checked its log. There is no problem. Thanks, Hideo Yamauchi. --- On Tue, 2011/9/20, renayama19661...@ybb.ne.jp wrote: Hi Raoul, I agree with the modified contents. Just to be safe, I will verify the behaviour of this resource agent tomorrow. Please wait until tomorrow. Best Regards, Hideo Yamauchi. --- On Tue, 2011/9/20, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 09/20/2011 02:11 AM, renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for the comment. The log levels in my patch were a suggestion, not a firm decision. We think it is fine to unify them on either WARN or INFO, anything other than ERROR. As long as the status log produced during start and stop does not appear at ERROR level, the system administrator will not be confused. i've committed your patch with a slight modification in log levels. please verify: https://github.com/raoulbhatia/resource-agents/commit/65b7b4202549bc087d3759dc9636b4966e2dafd2 thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch 2] Change of the output level of the log of the resource agent of mysql.
Hi Raoul, Thank you for the comment. The log levels in my patch were a suggestion, not a firm decision. We think it is fine to unify them on either WARN or INFO, anything other than ERROR. As long as the status log produced during start and stop does not appear at ERROR level, the system administrator will not be confused. Best Regards, Hideo Yamauchi. --- On Tue, 2011/9/20, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: hello hideo-san! i just got to review your patch and got one or two questions (see below) On 08/30/2011 10:20 AM, renayama19661...@ybb.ne.jp wrote: mysql.1651-2.patch diff -r a2d0d723bc62 heartbeat/mysql --- a/heartbeat/mysql Wed Aug 31 01:32:38 2011 +0900 +++ b/heartbeat/mysql Wed Aug 31 01:38:08 2011 +0900 @@ -807,7 +817,7 @@ # Let the CRM/LRM time us out if required start_wait=1 while [ $start_wait = 1 ]; do - mysql_status + mysql_status warn rc=$? if [ $rc = $OCF_SUCCESS ]; then start_wait=0 @@ -908,7 +918,7 @@ count=0 while [ $count -lt $shutdown_timeout ] do - mysql_status + mysql_status info rc=$? if [ $rc = $OCF_NOT_RUNNING ]; then break in mysql_start() you use the warn level in mysql_stop() you use the info level. shouldn't these two be the same levels? (e.g. both warn?) @@ -918,7 +928,7 @@ ocf_log debug MySQL still hasn't stopped yet. Waiting... done - mysql_status + mysql_status info if [ $? != $OCF_NOT_RUNNING ]; then ocf_log info MySQL failed to stop after ${shutdown_timeout}s using SIGTERM. Trying SIGKILL... /bin/kill -KILL $pid >/dev/null while reviewing the log levels, should we set this last ocf_log line to warn? (sorry for mixing this in here - but maybe you can comment on that too :) ) cheers, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch 3] Change avoiding the stop error of the mysql resource agent.
Hi Raoul, Thank you for comment. diff -r cb5f9b84cc5f heartbeat/mysql --- a/heartbeat/mysqlWed Aug 31 01:38:15 2011 +0900 +++ b/heartbeat/mysqlWed Aug 31 01:38:55 2011 +0900 @@ -897,9 +897,10 @@ $CRM_MASTER -D fi -if [ ! -f $OCF_RESKEY_pid ]; then -ocf_log info MySQL is not running -return $OCF_SUCCESS +mysql_status info +rc=$? +if [ $rc = $OCF_NOT_RUNNING ]; then + return $OCF_SUCCESS fi pid=`cat $OCF_RESKEY_pid 2 /dev/null ` i'm sorry but i do not understand the problem you're addressing. can you please describe with different words? Sorry For example, mysql fails in stop processing and causes an error when the next trouble happens. Step1 ) For example, for switch over, stop handling of Mysql begins. Step2 ) However, Mysql fell by process trouble just after that. The pid file is left. Step3 ) When pid file is left by the current stop processing, an error happens. * The stop processing is finished normally by checking pid file and the existence of the process by this patch definitely. * And the switch over excess succeeds. Best Regards, Hideo Yamauchi. --- On Tue, 2011/9/20, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: hello hideo-san! On 08/30/2011 10:20 AM, renayama19661...@ybb.ne.jp wrote: Hi, When a process of mysql falls just after the check of the pid file of monitor, the mysql resource agent causes an error by a stop. This is caused by the fact that a resource agent checks pid of the mysql process that fell at the time of a stop. The resource agent should check the effectiveness of the pid file before a check again. ... diff -r cb5f9b84cc5f heartbeat/mysql --- a/heartbeat/mysql Wed Aug 31 01:38:15 2011 +0900 +++ b/heartbeat/mysql Wed Aug 31 01:38:55 2011 +0900 @@ -897,9 +897,10 @@ $CRM_MASTER -D fi - if [ ! -f $OCF_RESKEY_pid ]; then - ocf_log info MySQL is not running - return $OCF_SUCCESS + mysql_status info + rc=$? 
+ if [ $rc = $OCF_NOT_RUNNING ]; then + return $OCF_SUCCESS fi pid=`cat $OCF_RESKEY_pid 2 /dev/null ` i'm sorry but i do not understand the problem you're addressing. can you please describe with different words? thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch 2] Change of the output level of the log of the resource agent of mysql.
Hi Raoul, How is the revised patch for this issue coming along? Best Regards, Hideo Yamauchi. --- On Tue, 2011/8/30, renayama19661...@ybb.ne.jp wrote: Hi, The mysql resource agent outputs an error log on every probe, start, and stop: Aug 11 13:38:31 ib01 mysql[15764]: ERROR: MySQL is not running When a resource is simply not started, the resource agent should emit this log at a lower level; otherwise the operator is confused by a spurious error. I modelled the change on other resource agents and adjusted the log level. I am sending a patch. Please examine it. Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch 3] Change avoiding the stop error of the mysql resource agent.
Hi Raoul, How is the revised patch for this issue coming along? Best Regards, Hideo Yamauchi. --- On Tue, 2011/8/30, renayama19661...@ybb.ne.jp wrote: Hi, When the mysql process dies just after monitor has checked the pid file, the mysql resource agent raises an error on the subsequent stop. The cause is that at stop time the agent checks the pid of the mysql process that has already died. The agent should re-verify that the pid file is still valid before that check. I am sending a patch. Please examine it. Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Patch] Patch for LVM resource agents.
Hi All, We made the patch of the LVM resource agent at the next point of view. Point 1) The LVM resource agent outputs the details of the log at the time of the error for a system administrator. Point 2) The LVM resource agent uses OCF variable for a return code. Point 3) With a patch, the LVM resource agent merge status processing and report_status processing. * We did not revise it about TODO of vgimport/vgexport in the LVM resource agent. Please examine this patch. Best Regards, Hideo Yamauchi.diff -r fc1e82852f7a heartbeat/LVM --- a/heartbeat/LVM Wed Aug 31 01:39:02 2011 +0900 +++ b/heartbeat/LVM Mon Sep 12 14:29:36 2011 +0900 @@ -123,22 +123,17 @@ # Return LVM status (silently) # LVM_status() { - if -[ $LVM_MAJOR -eq 1 ] - then - vgdisplay $1 21 | grep -i 'Status.*available' 21 /dev/null - return $? - else - vgdisplay -v $1 21 | grep -i 'Status[ \t]*available' 21 /dev/null - return $? + local rc + loglevel=debug + + # Set the log level of the error message + if [ X${2} == X ]; then + loglevel=err + if ocf_is_probe; then + loglevel=warn + fi fi -} - -# -# Report on LVM volume status to stdout... -# -LVM_report_status() { - + if [ $LVM_MAJOR -eq 1 ] then @@ -150,16 +145,16 @@ echo $VGOUT | grep -i 'Status[ \t]*available' /dev/null rc=$? fi - - if -[ $rc -eq 0 ] - then -: Volume $1 is available - else -ocf_log debug LVM Volume $1 is not available (stopped) -return $OCF_NOT_RUNNING + if [ $rc -ne 0 ]; then + ocf_log $loglevel LVM Volume $1 is not available (stopped). ${VGOUT} + fi + + if [ X${2} == X ]; then + # status call return + return $rc fi + # Report on LVM volume status to stdout... if echo $VGOUT | grep -i 'Access.*read/write' /dev/null then @@ -167,8 +162,9 @@ else ocf_log debug Volume $1 is available read-only (running) fi - + return $OCF_SUCCESS + } # @@ -176,6 +172,7 @@ # # LVM_monitor() { + local rc if LVM_status $1 then @@ -185,9 +182,14 @@ return $OCF_NOT_RUNNING fi - vgck $1 /dev/null 21 + VGOUT=`vgck $1 21` + rc=$? 
+ if [ $rc -ne 0 ]; then +ocf_log err LVM Volume $1 is not found. ${VGOUT}:${rc} +return $OCF_ERR_GENERIC + fi - return $? + return $OCF_SUCCESS } # @@ -232,10 +234,10 @@ vgdisplay $1 21 | grep 'Volume group .* not found' /dev/null { ocf_log info Volume group $1 not found -return 0 +return $OCF_SUCCESS } ocf_log info Deactivating volume group $1 - ocf_run vgchange -a ln $1 || return 1 + ocf_run vgchange -a ln $1 || return $OCF_ERR_GENERIC if LVM_status $1 @@ -256,10 +258,10 @@ check_binary $AWK # Off-the-shelf tests... - vgck $VOLUME /dev/null 21 + VGOUT=`vgck ${VOLUME} 21` if [ $? -ne 0 ]; then - ocf_log err Volume group [$VOLUME] does not exist or contains error! + ocf_log err Volume group [$VOLUME] does not exist or contains error! ${VGOUT} exit $OCF_ERR_GENERIC fi @@ -267,13 +269,13 @@ if [ $LVM_MAJOR -eq 1 ] then - vgdisplay $VOLUME /dev/null 21 + VGOUT=`vgdisplay ${VOLUME} 21` else - vgdisplay -v $VOLUME /dev/null 21 + VGOUT=`vgdisplay -v ${VOLUME} 21` fi if [ $? -ne 0 ]; then - ocf_log err Volume group [$VOLUME] does not exist or contains error! + ocf_log err Volume group [$VOLUME] does not exist or contains error! ${VGOUT} exit $OCF_ERR_GENERIC fi @@ -350,7 +352,7 @@ stop)LVM_stop $VOLUME exit $?;; - status) LVM_report_status $VOLUME + status) LVM_status $VOLUME $1 exit $?;; monitor) LVM_monitor $VOLUME ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
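The recurring pattern in the patch above, replacing `vgck $1 >/dev/null 2>&1` style calls with `VGOUT=\`vgck $1 2>&1\`` so that a failure can be logged together with the tool's own message, can be factored into a small helper. The helper name `run_checked` is illustrative; the real agent calls the LVM tools directly.

```shell
#!/bin/sh
# Sketch: capture a command's combined stdout/stderr so that a failure is
# logged with detail instead of being discarded to /dev/null.

ocf_log() { level=$1; shift; echo "$level: $*"; }
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# $1 = description for the log, rest = command to run.
run_checked() {
    desc=$1; shift
    out=$("$@" 2>&1) && rc=0 || rc=$?
    if [ "$rc" -ne 0 ]; then
        # Mirrors the patch's ocf_log err "... ${VGOUT}:${rc}" style:
        # the administrator sees what the tool said, not just that it failed.
        ocf_log err "$desc failed. ${out}:${rc}"
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```

For example, `run_checked "vgck ocft-vg" vgck ocft-vg` would log vgck's "Volume group ... not found" text on failure, which is the detail the patch wants to surface to the system administrator.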
Re: [Linux-ha-dev] Postfix status
Hi Raoul, Hi Florian, Thank you for updating the repository. Best Regards, Hideo Yamauchi. --- On Thu, 2011/9/8, Florian Haas f.g.h...@gmx.net wrote: On 09/08/11 10:34, Raoul Bhatia [IPAX] wrote: On 09/08/2011 04:49 AM, renayama19661...@ybb.ne.jp wrote: Whether or not this patch is applied, there is no big problem. My explanation was lacking, and I'm sorry. ok. i just updated my pull request. https://github.com/ClusterLabs/resource-agents/pull/20 dejan, can you please review and apply our patches? Taking the liberty to step in for Dejan, I've merged and pushed your changes. Thanks for your contribution! Cheers, Florian ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status
Hi Raoul, does it hurt if we leave this patch in? i do not see any problem with that code. Whether or not this patch is applied, there is no big problem. My explanation was lacking, and I'm sorry. Best Regards, Hideo Yamauchi. --- On Wed, 2011/9/7, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: hi Hideo-san! On 09/07/2011 01:50 AM, renayama19661...@ybb.ne.jp wrote: However, my patch contained a mistake. I do not seem to get the result of postfix status; in the end it is necessary to look at Postfix's own log for the details of the problem. Therefore, I withdraw the postfix-status part of the patch. diff -r 19c97e0021f0 postfix --- a/postfix Thu Jun 16 21:45:53 2011 +0900 +++ b/postfix Thu Jun 16 21:46:01 2011 +0900 @@ -98,12 +98,8 @@ postfix_running() { # run Postfix status if available if ocf_is_true $status_support; then - output=`$binary $OPTION_CONFIG_DIR status 2>&1` - ret=$? - if [ $ret -ne 0 ]; then - ocf_log err Postfix status: '$output'. $ret - fi - return $ret + $binary $OPTION_CONFIG_DIR status 2>&1 + return $? fi # manually check Postfix's pid [...] I thought the former patch could capture the details of a postfix status problem in its output, and that those details would be useful for an operator. However, in the environment I tried, the details of the problem were only reflected in Postfix's own log. Therefore I want to withdraw this part of the patch. does it hurt if we leave this patch in? i do not see any problem with that code. thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
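The `postfix_running()` logic under discussion, prefer the daemon's own `status` subcommand when it is supported (`$status_support`), otherwise fall back to a manual pid check, can be sketched as below. `STATUS_SUPPORT`, `STATUS_CMD` and `PIDFILE` are illustrative stand-ins for the RA's variables, not the actual postfix agent code.

```shell
#!/bin/sh
# Sketch: daemon status via its own "status" subcommand, with a pid-file
# fallback when the installed binary does not support that subcommand.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

daemon_running() {
    if [ "$STATUS_SUPPORT" = "true" ]; then
        # The RA only needs the exit code; as the thread concludes, the
        # details of a failure end up in the daemon's own log anyway.
        if $STATUS_CMD >/dev/null 2>&1; then
            return $OCF_SUCCESS
        fi
        return $OCF_NOT_RUNNING
    fi
    # Fallback: pid file plus a signal-0 liveness probe.
    pid=$(cat "$PIDFILE" 2>/dev/null || true)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```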
Re: [Linux-ha-dev] [Patch 1]Change of the monitor log of the resource agent of mysql.
Hi Raoul, All right. Thanks!! Hideo Yamauchi. --- On Wed, 2011/9/7, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 08/30/2011 10:19 AM, renayama19661...@ybb.ne.jp wrote: Hi, The log that a resource agent of mysql outputs with a monitor is a noise very much. Aug 11 13:40:01 ib01 mysql[18164]: INFO: COUNT(*) 4 Aug 11 13:40:01 ib01 mysql[18164]: INFO: MySQL monitor succeeded - repeat monitor log. I suggest the next patch. * The addition of the -q option to ocf_run. * Change the log of the monitor completion to debug. I send a patch. Please examine this patch. ack. i applied it to my mysql branch: https://github.com/raoulbhatia/resource-agents/commits/mysql thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Patch]Mistake of the table name variable.
Hi Raoul, All right. Thanks!! Hideo Yamauchi. --- On Wed, 2011/9/7, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 08/30/2011 08:58 AM, renayama19661...@ybb.ne.jp wrote: Hi, I contribute a patch revising the mistake of the variable of the resource agent of mysql. ack. i applied it to my mysql branch: https://github.com/raoulbhatia/resource-agents/commits/mysql thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status
Hi Raoul, thanks for testing my ra. i'll check the ra and will then issue a pull request. Okay. We hope that a correction is included in the next release of the resource agent. However, my patch made a mistake. I do not seem to get the result of postfix status. It is necessary to watch log of postfix in the details of the problem after all. Therefore, I withdraw the patch of the part of postfix status. diff -r 19c97e0021f0 postfix --- a/postfix Thu Jun 16 21:45:53 2011 +0900 +++ b/postfix Thu Jun 16 21:46:01 2011 +0900 @@ -98,12 +98,8 @@ postfix_running() { # run Postfix status if available if ocf_is_true $status_support; then -output=`$binary $OPTION_CONFIG_DIR status 21` -ret=$? -if [ $ret -ne 0 ]; then -ocf_log err Postfix status: '$output'. $ret -fi -return $ret +$binary $OPTION_CONFIG_DIR status 21 +return $? fi # manually check Postfix's pid it's been a while since i looked into the code. why do you want to issue postfix status if /usr/sbin/postfix does not support this command? I thought that output could acquire the details of the problem of postfix status with a former patch. And I thought the output of the details of the problem to be useful for an operator. However, the details of the problem only were really reflected on log of postfix in the environment that I tried. Therefore I want to withdraw the suggestion of the patch of this part. Best Regards, Hideo Yamauchi. --- On Wed, 2011/9/7, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 06/16/2011 05:48 AM, renayama19661...@ybb.ne.jp wrote: The postfix ra worked well. thanks for testing my ra. i'll check the ra and will then issue a pull request. However, my patch made a mistake. I do not seem to get the result of postfix status. It is necessary to watch log of postfix in the details of the problem after all. Therefore, I withdraw the patch of the part of postfix status. 
diff -r 19c97e0021f0 postfix
--- a/postfix	Thu Jun 16 21:45:53 2011 +0900
+++ b/postfix	Thu Jun 16 21:46:01 2011 +0900
@@ -98,12 +98,8 @@
 postfix_running()
 {
     # run Postfix status if available
     if ocf_is_true $status_support; then
-        output=`$binary $OPTION_CONFIG_DIR status 2>&1`
-        ret=$?
-        if [ $ret -ne 0 ]; then
-            ocf_log err "Postfix status: '$output'. $ret"
-        fi
-        return $ret
+        $binary $OPTION_CONFIG_DIR status 2>&1
+        return $?
     fi

     # manually check Postfix's pid

it's been a while since i looked into the code. why do you want to issue postfix status if /usr/sbin/postfix does not support this command? thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
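The PID-file fallback that both versions of the patch keep — read the PID from `${queue_dir}/pid/master.pid` and probe the process with a null signal — can be sketched in isolation. This is a simplified illustration, not the agent's code: `pidfile_running` is a hypothetical helper name, and the real agent additionally greps `ps` output for `master`.

```shell
# Minimal sketch of a PID-file liveness check (hypothetical helper name).
pidfile_running() {
    pidfile="$1"
    # No PID file means the daemon was never started (or already stopped).
    [ -f "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile")
    # kill -0 checks for process existence without delivering a signal.
    kill -0 "$pid" 2>/dev/null
}
```

The `kill -0` probe only proves *some* process has that PID, which is why the real agent also verifies the process name via `ps`.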
[Linux-ha-dev] [Patch]Mistake of the table name variable.
Hi, I contribute a patch revising the mistake of the variable of the resource agent of mysql. Best Regards, Hideo Yamauchi. mysql.1662.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Patch 1]Change of the monitor log of the resource agent of mysql.
Hi, The log that a resource agent of mysql outputs with a monitor is a noise very much. Aug 11 13:40:01 ib01 mysql[18164]: INFO: COUNT(*) 4 Aug 11 13:40:01 ib01 mysql[18164]: INFO: MySQL monitor succeeded - repeat monitor log. I suggest the next patch. * The addition of the -q option to ocf_run. * Change the log of the monitor completion to debug. I send a patch. Please examine this patch. Best Regards, Hideo Yamauchi. mysql.1651-1.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
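The noise reduction described above — demote routine monitor-success messages to debug level so only real problems reach the operator — can be illustrated with a tiny stand-in. The real `ocf_log` lives in ocf-shellfuncs; this simplified `log_msg` and its `HA_debug` switch only sketch the idea and are not the actual implementation.

```shell
# Simplified stand-in for leveled OCF logging (not the real ocf_log).
HA_debug=0   # 0 = debug messages suppressed, mirroring the usual default

log_msg() {
    level="$1"; shift
    # Drop debug-level chatter unless debugging was explicitly enabled.
    if [ "$level" = debug ] && [ "$HA_debug" -eq 0 ]; then
        return 0
    fi
    echo "$level: $*"
}
```

With this, `log_msg debug "MySQL monitor succeeded"` prints nothing under the default configuration, which is exactly the effect the patch aims for on each repeated monitor cycle.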
[Linux-ha-dev] [Patch 2]Change of the output level of the log of the resource agent of mysql.
Hi, The resource agent of mysql outputs error log every time in probe,start,stop. Aug 11 13:38:31 ib01 mysql[15764]: ERROR: MySQL is not running When a resource does not start, the resource agent changes the level of the log and should output it. Otherwise the operator is confused by an error. I modelled it on other resource agents and changed a level of the log. I send a patch. Please examine this patch. Best Regards, Hideo Yamauchi. mysql.1651-2.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Patch 3]Change avoiding the stop error of the mysql resource agent.
Hi, When a process of mysql falls just after the check of the pid file of monitor, the mysql resource agent causes an error by a stop. This is caused by the fact that a resource agent checks pid of the mysql process that fell at the time of a stop. The resource agent should check the effectiveness of the pid file before a check again. I send a patch. Please examine this patch. Best Regards, Hideo Yamauchi. mysql.1651-3.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
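The stop-path race described above can be sketched as a guard that re-validates the PID file before declaring a failure: if the recorded process is already gone, the PID file is merely stale and the stop should be treated as successful. `mysql_stop_check` is a hypothetical helper for illustration, not the actual mysql RA code.

```shell
# Sketch of a stop-path guard against stale PID files (hypothetical helper).
mysql_stop_check() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 0          # no PID file: nothing to stop
    pid=$(head -n 1 "$pidfile")
    if ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"                   # stale PID file: already stopped
        return 0
    fi
    return 1                               # process is still running
}
```

Returning success for the stale-file case is what prevents the spurious stop error when the process dies between the monitor's PID check and the stop action.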
[Linux-ha-dev] [Repost][Patch] The revision of the noise log of the iSCSITarget resource.
Hi, I have remade my former patch for resource-agents 3.9.2 and contribute it again. The patch makes the check of the portals parameter happen before a default value is set. With the patch, when users use tgt and do not set the portals parameter, the iSCSITarget RA no longer outputs a warning. The following log noise is no longer output:

Jul 12 19:15:54 srv01 iSCSITarget[13839]: WARNING: Configuration parameter portals is not supported by the iSCSI implementation and will be ignored.
Jul 12 19:16:04 srv01 iSCSITarget[13957]: WARNING: Configuration parameter portals is not supported by the iSCSI implementation and will be ignored.
Jul 12 19:16:14 srv01 iSCSITarget[14060]: WARNING: Configuration parameter portals is not supported by the iSCSI implementation and will be ignored.
Jul 12 19:16:25 srv01 iSCSITarget[14189]: WARNING: Configuration parameter portals is not supported by the iSCSI implementation and will be ignored.

* The link to the old email is: http://www.gossamer-threads.com/lists/linuxha/dev/70274

Please confirm the contents of the patch.
Best Regards, Hideo Yamauchi.

diff -r f4df06073f4d iSCSITarget
--- a/iSCSITarget	Tue Jul 12 19:37:14 2011 +0900
+++ b/iSCSITarget	Tue Jul 12 19:37:40 2011 +0900
@@ -42,9 +42,6 @@
 fi
 : ${OCF_RESKEY_implementation=${OCF_RESKEY_implementation_default}}

-# Listen on 0.0.0.0:3260 by default
-OCF_RESKEY_portals_default=0.0.0.0:3260
-: ${OCF_RESKEY_portals=${OCF_RESKEY_portals_default}}

 # Lockfile, used for selecting a target ID
 LOCKFILE=${HA_RSCTMP}/iSCSITarget-${OCF_RESKEY_implementation}.lock
@@ -552,6 +549,7 @@
 case $1 in
   meta-data)
+	OCF_RESKEY_portals_default=0.0.0.0:3260
 	meta_data
 	exit $OCF_SUCCESS
 	;;
@@ -564,6 +562,10 @@
 # Everything except usage and meta-data must pass the validate test
 iSCSITarget_validate

+# Listen on 0.0.0.0:3260 by default
+OCF_RESKEY_portals_default=0.0.0.0:3260
+: ${OCF_RESKEY_portals=${OCF_RESKEY_portals_default}}
+
 case $__OCF_ACTION in
 start)		iSCSITarget_start;;
 stop)		iSCSITarget_stop;;

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
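The patch relies on the POSIX default-assignment idiom used throughout the OCF agents: `: ${VAR=default}` assigns only when the variable is unset, so relocating that line after `iSCSITarget_validate` changes *when* the default applies, not its value. A throwaway sketch of the idiom (`DEMO_PORTALS` and `demo_portals` are illustrative names, not actual `OCF_RESKEY_*` variables):

```shell
# Demonstrates ": ${VAR=default}" — assignment happens only if VAR is unset.
demo_portals() {
    : ${DEMO_PORTALS=0.0.0.0:3260}   # assign only if DEMO_PORTALS is unset
    echo "$DEMO_PORTALS"
}
```

If the user already set the variable, the `:` null command leaves it alone; only an unset variable picks up `0.0.0.0:3260`.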
[Linux-ha-dev] A prototype declaration is missing.
Hi all, Because a prototype declaration is missing at the top of the source in glue, I cannot compile it.

diff -r 7d9a54d5da6c main.c
--- a/main.c	Fri Jun 17 18:34:21 2011 +0900
+++ b/main.c	Fri Jun 17 18:34:55 2011 +0900
@@ -78,6 +78,7 @@
 void log_buf(int severity, char *buf);
 void log_msg(int severity, const char * fmt, ...)G_GNUC_PRINTF(2,3);
 void trans_log(int priority, const char * fmt, ...)G_GNUC_PRINTF(2,3);
+void setup_cl_log(void);

 static int pil_loglevel_to_syslog_severity[] = {
 	/* Indices: none=0, PIL_FATAL=1, PIL_CRIT=2, PIL_WARN=3,

Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status
Hi Raoul, I'm sorry. I was weak in English, and it confused you. please refetch one last time from https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix i think i got the probing issue fixed! I confirm movement and will inform it of a result tomorrow. Best Regards, Hideo Yamauchi --- On Wed, 2011/6/15, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 06/15/2011 10:53 AM, renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for comment. please test the postfix ra from my repository: https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix there is a minor issue regarding probes and a resulting double start, which is left to be resolved. no other issues in my production environment so far. so i'd be glad if you could give it a shot! All right. I confirm movement in postfix which you showed. i'm sorry but i do not understand what you mean by that. can you please rephrase that? Because our environment is RHEL, I report a test result on RHEL5 and RHEL6. perfect! please refetch one last time from https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix i think i got the probing issue fixed! thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status
Hi Raoul, I confirmed movement of postfix in the next environment. * RHEL5 - postfix 2.3.3 * RHEL6 - postfix 2.6.6 The postfix ra worked well. However, my patch made a mistake. I do not seem to get the result of postfix status. It is necessary to watch log of postfix in the details of the problem after all. Therefore, I withdraw the patch of the part of postfix status. diff -r 19c97e0021f0 postfix --- a/postfix Thu Jun 16 21:45:53 2011 +0900 +++ b/postfix Thu Jun 16 21:46:01 2011 +0900 @@ -98,12 +98,8 @@ postfix_running() { # run Postfix status if available if ocf_is_true $status_support; then -output=`$binary $OPTION_CONFIG_DIR status 21` -ret=$? -if [ $ret -ne 0 ]; then -ocf_log err Postfix status: '$output'. $ret -fi -return $ret +$binary $OPTION_CONFIG_DIR status 21 +return $? fi # manually check Postfix's pid Best Regards, Hideo Yamauchi. --- On Wed, 2011/6/15, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, I'm sorry. I was weak in English, and it confused you. please refetch one last time from https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix i think i got the probing issue fixed! I confirm movement and will inform it of a result tomorrow. Best Regards, Hideo Yamauchi --- On Wed, 2011/6/15, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 06/15/2011 10:53 AM, renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for comment. please test the postfix ra from my repository: https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix there is a minor issue regarding probes and a resulting double start, which is left to be resolved. no other issues in my production environment so far. so i'd be glad if you could give it a shot! All right. I confirm movement in postfix which you showed. i'm sorry but i do not understand what you mean by that. can you please rephrase that? Because our environment is RHEL, I report a test result on RHEL5 and RHEL6. perfect! 
please refetch one last time from https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix i think i got the probing issue fixed! thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status
Hi Raoul, to my knowledge, the ra's output is logged by pacemaker. moreover, postfix logs to the mail facility itself. what are the reasons for separately capturing and logging all output? When a problem occurred, the output of detailed log helps an operator. In addition, pacemaker can give only the log that ra output in std. My the third patch was wrong. And log of postfix helps a manager enough. Please abandon my the third patch to a trash box. Best Regards, Hideo Yamauchi. --- On Thu, 2011/6/9, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for the merge of the patch. to my knowledge, the ra's output is logged by pacemaker. moreover, postfix logs to the mail facility itself. what are the reasons for separately capturing and logging all output? When a problem occurred, the output of detailed log helps an operator. In addition, pacemaker can give only the log that ra output in std. Best Regards, Hideo Yamauchi. --- On Thu, 2011/6/9, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 07.06.2011 04:40, renayama19661...@ybb.ne.jp wrote: Hi All, I contribute my last patch.(patch3) This is a patch for the sources which applied patch 1. It is the patch which output the details of the error in log. hi! to my knowledge, the ra's output is logged by pacemaker. moreover, postfix logs to the mail facility itself. what are the reasons for separately capturing and logging all output? (mainly patch3) thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status
Hi Raoul, Thank you for the merge of the patch. to my knowledge, the ra's output is logged by pacemaker. moreover, postfix logs to the mail facility itself. what are the reasons for separately capturing and logging all output? When a problem occurred, the output of detailed log helps an operator. In addition, pacemaker can give only the log that ra output in std. Best Regards, Hideo Yamauchi. --- On Thu, 2011/6/9, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: On 07.06.2011 04:40, renayama19661...@ybb.ne.jp wrote: Hi All, I contribute my last patch.(patch3) This is a patch for the sources which applied patch 1. It is the patch which output the details of the error in log. hi! to my knowledge, the ra's output is logged by pacemaker. moreover, postfix logs to the mail facility itself. what are the reasons for separately capturing and logging all output? (mainly patch3) thanks, raoul ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
Hi Raoul, Hideo-san, i updated your postfix.patch2 the way i would improve it. any objections? No. Thanks! Best Regards, Hideo Yamauchi. --- On Mon, 2011/6/6, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hideo-san, i updated your postfix.patch2 the way i would improve it. any objections? cheers, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
Hi Dejan, Thank you for comment. In the latest version of ocf-shellfuncs there is some support for version checks. I did not know that there was the check handling of version in new ocf-shellfuncs. I renew a patch to use the processing. Thanks. Hideo Yamauchi. --- On Mon, 2011/6/6, Dejan Muhamedagic de...@suse.de wrote: Hi Hideo-san, On Mon, Jun 06, 2011 at 01:36:01PM +0900, renayama19661...@ybb.ne.jp wrote: Hi All, Sorry + if [ ${ver_str[0]} -le 2 -a ${ver_str[1]} -le 5 ]; then I missed. + if [ ${ver_str[0]} -lt 2 -o ${ver_str[0]} -eq 2 -a ${ver_str[1]} -lt 5 ]; then In the latest version of ocf-shellfuncs there is some support for version checks. Cheers, Dejan Thanks. Hideo Yamauchi. --- On Mon, 2011/6/6, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi All, I send a patch in conjunction with the status processing. It is made the following modifications. * Carry out status processing in a version judgment * Change of the parameter check * Error log when status processing failed * Value set of the ret variable I send the patch of other corrections later. Please comment on all of you for the patch. Best Regards, Hideo Yamauchi. --- On Fri, 2011/6/3, Dejan Muhamedagic de...@suse.de wrote: On Fri, Jun 03, 2011 at 12:03:20PM +0200, Raoul Bhatia [IPAX] wrote: On 06/03/2011 11:45 AM, Dejan Muhamedagic wrote: Regressions are bad. You have to keep in mind that not everybody runs the latest release of postfix. This really needs to be fixed before the release. it's no regression but has been like that since the initial release. see commit e7af463d or https://github.com/ClusterLabs/resource-agents/blame/master/heartbeat/postfix#LID100 i didn't know this until Noah brought this to my/our attention: http://www.gossamer-threads.com/lists/linuxha/pacemaker/72379#72379 OK. I misunderstood the post, it seemed to me as if status had been introduced in the latest set of patches. This is another matter then. Cheers, Dejan thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. 
email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
Hi Raoul, Thank you for comment. i think we could safely do the kill -s 0 for *any* version and call postfix status only if available. I think so. However, I do not know a lot about postfix so. I want the opinion of the detailed person. btw. quickly looking at your patch, i spotted 1 typo: status_suuport instead of status_support (douple u/p) Sorry... It is my typo. for the version check, i think we should try using the ocf internal function. Ok. * Change of the parameter check the checks are basically fine. i would slightly update the logging information. (i can do this when i apply your patches) Thanks! * Error log when status processing failed * Value set of the ret variable i don't think that the use of $ret is correct. I made modifications to set unsettled ret variable in an original resource agent. But I am unsettled, the ret variable may not have to output it in log. Best Regards, Hideo Yamauchi. --- On Mon, 2011/6/6, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 06/06/2011 04:51 AM, renayama19661...@ybb.ne.jp wrote: Hi All, I send a patch in conjunction with the status processing. It is made the following modifications. * Carry out status processing in a version judgment i think we could safely do the kill -s 0 for *any* version and call postfix status only if available. btw. quickly looking at your patch, i spotted 1 typo: status_suuport instead of status_support (douple u/p) for the version check, i think we should try using the ocf internal function. * Change of the parameter check the checks are basically fine. i would slightly update the logging information. (i can do this when i apply your patches) * Error log when status processing failed * Value set of the ret variable i don't think that the use of $ret is correct. please comment on my suggestions and/or update the ra in this regard. thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. 
http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
Hi All, I revised the first patch. Please confirm contents. Best Regards, Hideo Yamauchi. --- On Tue, 2011/6/7, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for comment. i think we could safely do the kill -s 0 for *any* version and call postfix status only if available. I think so. However, I do not know a lot about postfix so. I want the opinion of the detailed person. btw. quickly looking at your patch, i spotted 1 typo: status_suuport instead of status_support (douple u/p) Sorry... It is my typo. for the version check, i think we should try using the ocf internal function. Ok. * Change of the parameter check the checks are basically fine. i would slightly update the logging information. (i can do this when i apply your patches) Thanks! * Error log when status processing failed * Value set of the ret variable i don't think that the use of $ret is correct. I made modifications to set unsettled ret variable in an original resource agent. But I am unsettled, the ret variable may not have to output it in log. Best Regards, Hideo Yamauchi. --- On Mon, 2011/6/6, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 06/06/2011 04:51 AM, renayama19661...@ybb.ne.jp wrote: Hi All, I send a patch in conjunction with the status processing. It is made the following modifications. * Carry out status processing in a version judgment i think we could safely do the kill -s 0 for *any* version and call postfix status only if available. btw. quickly looking at your patch, i spotted 1 typo: status_suuport instead of status_support (douple u/p) for the version check, i think we should try using the ocf internal function. * Change of the parameter check the checks are basically fine. i would slightly update the logging information. (i can do this when i apply your patches) * Error log when status processing failed * Value set of the ret variable i don't think that the use of $ret is correct. 
please comment on my suggestions and/or update the ra in this regard. thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/

diff -r a18d987956c7 postfix
--- a/postfix	Tue Jun 07 11:07:11 2011 +0900
+++ b/postfix	Tue Jun 07 11:13:16 2011 +0900
@@ -97,10 +97,25 @@
 running()
 {
     # run Postfix status
-    $binary $OPTION_CONFIG_DIR status >/dev/null 2>&1
+    local rcode
+    if ocf_is_true $status_support; then
+        output=`$binary $OPTION_CONFIG_DIR status`
+        rcode=$?
+        if [ $rcode -ne 0 ]; then
+            ocf_log err "Postfix status: $output"
+        fi
+        return $rcode
+    else
+        PIDFILE=${queue_dir}/pid/master.pid
+        if [ -f $PIDFILE ]; then
+            PID=`head -n 1 $PIDFILE`
+            kill -s 0 $PID >/dev/null 2>&1 && [ `ps -p $PID | grep master | wc -l` -eq 1 ]
+            return $?
+        fi
+        false
+    fi
 }
-
 postfix_status()
 {
     running
@@ -219,25 +234,42 @@
     fi
 fi

+# check postfix version
+status_support=false
+output=`postconf $OPTION_CONFIG_DIR -h mail_version`
+if [ $? -ne 0 ]; then
+    ocf_log err "Postfix config mail_version does not exist. $output"
+fi
+ocf_version_cmp $output 2.5.0
+if [ $? -ne 0 ]; then
+    status_support=true
+fi
+
 # check spool/queue and data directories
 # this is required because postfix check does not catch all errors
 queue_dir=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null`
-data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2>/dev/null`
-for dir in $queue_dir $data_dir; do
-    if [ ! -d $dir ]; then
-        ocf_log err "Postfix directory '$queue_dir' does not exist. $ret"
+if [ ! -d $queue_dir ]; then
+    ocf_log err "Postfix directory '$queue_dir' does not exist."
+    return $OCF_ERR_INSTALLED
+fi
+if ocf_is_true $status_support; then
+    data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2>/dev/null`
+    if [ ! -d $data_dir ]; then
+        ocf_log err "Postfix directory '$data_dir' does not exist."
         return $OCF_ERR_INSTALLED
     fi
-done
+fi

 # check permissions
-
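The revised patch gates the `postfix status` call on comparing `mail_version` against 2.5.0. For readers without ocf-shellfuncs at hand, the comparison can be approximated in plain shell. `version_ge` is a hypothetical stand-in, not the real `ocf_version_cmp` (which returns 0/1/2 for less/equal/greater); it relies on GNU `sort -V` and handles only plain dotted numeric versions.

```shell
# version_ge: succeed when version $1 >= version $2 (stand-in, not ocf_version_cmp).
version_ge() {
    # The smaller version sorts first under sort -V; if that is $2,
    # then $1 must be greater than or equal to $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Mirrors the patch's decision: postfix >= 2.5.0 supports "postfix status".
mail_version=2.6.6
if version_ge "$mail_version" 2.5.0; then
    status_support=true
else
    status_support=false
fi
echo "$status_support"   # prints "true"
```

This matches the environments reported later in the thread: RHEL5's postfix 2.3.3 would take the PID-file fallback, while RHEL6's postfix 2.6.6 would use `postfix status`.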
Re: [Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
Hi All, I contribute my last patch.(patch3) This is a patch for the sources which applied patch 1. It is the patch which output the details of the error in log. Best Regards, Hideo Yamauchi. --- On Tue, 2011/6/7, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi All, I revised the first patch. Please confirm contents. Best Regards, Hideo Yamauchi. --- On Tue, 2011/6/7, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Raoul, Thank you for comment. i think we could safely do the kill -s 0 for *any* version and call postfix status only if available. I think so. However, I do not know a lot about postfix so. I want the opinion of the detailed person. btw. quickly looking at your patch, i spotted 1 typo: status_suuport instead of status_support (douple u/p) Sorry... It is my typo. for the version check, i think we should try using the ocf internal function. Ok. * Change of the parameter check the checks are basically fine. i would slightly update the logging information. (i can do this when i apply your patches) Thanks! * Error log when status processing failed * Value set of the ret variable i don't think that the use of $ret is correct. I made modifications to set unsettled ret variable in an original resource agent. But I am unsettled, the ret variable may not have to output it in log. Best Regards, Hideo Yamauchi. --- On Mon, 2011/6/6, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: Hi Hideo-san! On 06/06/2011 04:51 AM, renayama19661...@ybb.ne.jp wrote: Hi All, I send a patch in conjunction with the status processing. It is made the following modifications. * Carry out status processing in a version judgment i think we could safely do the kill -s 0 for *any* version and call postfix status only if available. btw. quickly looking at your patch, i spotted 1 typo: status_suuport instead of status_support (douple u/p) for the version check, i think we should try using the ocf internal function. 
* Change of the parameter check the checks are basically fine. i would slightly update the logging information. (i can do this when i apply your patches) * Error log when status processing failed * Value set of the ret variable i don't think that the use of $ret is correct. please comment on my suggestions and/or update the ra in this regard. thanks, raoul -- DI (FH) Raoul Bhatia M.Sc. email. r.bha...@ipax.at Technischer Leiter IPAX - Aloy Bhatia Hava OG web. http://www.ipax.at Barawitzkagasse 10/2/2/11 email. off...@ipax.at 1190 Wien tel. +43 1 3670030 FN 277995t HG Wien fax. +43 1 3670030 15 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ diff -r 303d9d19eb61 postfix --- a/postfix Tue Jun 07 11:22:15 2011 +0900 +++ b/postfix Tue Jun 07 11:35:52 2011 +0900 @@ -102,7 +102,7 @@ output=`$binary $OPTION_CONFIG_DIR status` rcode=$? if [ $rcode -ne 0 ]; then -ocf_log err Postfix status: $output +ocf_log err Postfix status: $rcode : $output fi return $rcode else @@ -130,11 +130,11 @@ fi # start Postfix -$binary $OPTIONS start /dev/null 21 +output=`$binary $OPTIONS start /dev/null 21` ret=$? if [ $ret -ne 0 ]; then -ocf_log err Postfix returned error. $ret +ocf_log err Postfix returned error. $ret : $output return $OCF_ERR_GENERIC fi @@ -163,11 +163,11 @@ fi # stop Postfix -$binary $OPTIONS stop /dev/null 21 +output=`$binary $OPTIONS stop /dev/null 21` ret=$? if [ $ret -ne 0 ]; then -ocf_log err Postfix returned an error while stopping. $ret +ocf_log err Postfix returned an error while stopping. $ret : $output return $OCF_ERR_GENERIC fi @@ -201,7 +201,12 @@ { if postfix_status; then ocf_log info Reloading Postfix. -$binary $OPTIONS reload +output=`$binary $OPTIONS reload` +ret=$? +if [ $ret -ne 0 ]; then +ocf_log err Postfix reload error. 
$ret : $output +fi +return $ret fi } @@ -237,8 +242,9 @@ # check postfix version status_support=false output=`postconf $OPTION_CONFIG_DIR -h mail_version` -if [ $? -ne 0 ]; then -ocf_log err Postfix config mail_version does not exist.
Re: [Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
Hi All,

I send a patch for the status processing. It makes the following modifications:

* Carry out the status processing behind a version check
* Change the parameter check
* Log an error when the status processing fails
* Set the ret variable

I will send the patches for the other corrections later. Please comment on the patch, everyone.

Best Regards,
Hideo Yamauchi.

--- On Fri, 2011/6/3, Dejan Muhamedagic de...@suse.de wrote:

> On Fri, Jun 03, 2011 at 12:03:20PM +0200, Raoul Bhatia [IPAX] wrote:
> > On 06/03/2011 11:45 AM, Dejan Muhamedagic wrote:
> > > Regressions are bad. You have to keep in mind that not everybody
> > > runs the latest release of postfix. This really needs to be fixed
> > > before the release.
> >
> > it's no regression but has been like that since the initial release.
> > see commit e7af463d or
> > https://github.com/ClusterLabs/resource-agents/blame/master/heartbeat/postfix#LID100
> >
> > i didn't know this until Noah brought this to my/our attention:
> > http://www.gossamer-threads.com/lists/linuxha/pacemaker/72379#72379
>
> OK. I misunderstood the post, it seemed to me as if status had been
> introduced in the latest set of patches. This is another matter then.
>
> Cheers,
>
> Dejan
>
> > thanks,
> > raoul
> > --
> > DI (FH) Raoul Bhatia M.Sc.      email. r.bha...@ipax.at
> > Technischer Leiter
> > IPAX - Aloy Bhatia Hava OG      web.   http://www.ipax.at
> > Barawitzkagasse 10/2/2/11       email. off...@ipax.at
> > 1190 Wien                       tel.   +43 1 3670030
> > FN 277995t HG Wien              fax.   +43 1 3670030 15

diff -r fd372ca4d647 postfix
--- a/postfix	Mon Jun 06 11:45:51 2011 +0900
+++ b/postfix	Mon Jun 06 11:46:32 2011 +0900
@@ -97,10 +97,22 @@
 running()
 {
     # run Postfix status
-    $binary $OPTION_CONFIG_DIR status >/dev/null 2>&1
+    if [ "$status_support" = "true" ]; then
+        output=`$binary $OPTION_CONFIG_DIR status 2>&1`
+        if [ $? -ne 0 ]; then
+            ocf_log err "Postfix status failed. $output"
+        fi
+    else
+        PIDFILE="${queue_dir}/pid/master.pid"
+        if [ -f "$PIDFILE" ]; then
+            PID=`head -n 1 "$PIDFILE"`
+            kill -s 0 "$PID" >/dev/null 2>&1 &&
+                [ `ps -p "$PID" | grep master | wc -l` -eq 1 ]
+            return $?
+        fi
+        false
+    fi
 }
-
 postfix_status()
 {
     running
@@ -219,25 +231,44 @@
     fi
 fi
 
+# check postfix version
+status_support=true
+output=`postconf $OPTION_CONFIG_DIR -h mail_version`
+if [ $? -ne 0 ]; then
+    ocf_log err "Postfix config mail_version does not exist. $output"
+fi
+ver_str=(`echo $output | tr '.' ' '`)
+if [ ${ver_str[0]} -le 2 -a ${ver_str[1]} -le 5 ]; then
+    status_support=false
+fi
+
 # check spool/queue and data directories
 # this is required because postfix check does not catch all errors
 queue_dir=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null`
-data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2>/dev/null`
-for dir in "$queue_dir" "$data_dir"; do
-    if [ ! -d "$dir" ]; then
-        ocf_log err "Postfix directory '$dir' does not exist."
+ret=$?
+if [ ! -d "$queue_dir" ]; then
+    ocf_log err "Postfix directory '$queue_dir' does not exist. $ret"
+    return $OCF_ERR_INSTALLED
+fi
+if [ "$status_support" != "true" ]; then
+    data_dir=`postconf $OPTION_CONFIG_DIR -h data_directory 2>/dev/null`
+    ret=$?
+    if [ ! -d "$data_dir" ]; then
+        ocf_log err "Postfix directory '$data_dir' does not exist. $ret"
         return $OCF_ERR_INSTALLED
     fi
-done
+fi
 
 # check permissions
-user=`postconf $OPTION_CONFIG_DIR -h mail_owner 2>/dev/null`
-for dir in "$data_dir"; do
-    if ! su -s /bin/sh - $user -c "test -w $dir"; then
-        ocf_log err "Directory '$dir' is not writable by user '$user'."
-        exit $OCF_ERR_PERM;
-    fi
-done
+if [ "$status_support" != "true" ]; then
+    user=`postconf $OPTION_CONFIG_DIR -h mail_owner 2>/dev/null`
+    for dir in "$data_dir"; do
+        if ! su -s /bin/sh - $user -c "test -w $dir"; then
+            ocf_log err "Directory '$dir' is not writable by user '$user'."
+            exit $OCF_ERR_PERM;
+        fi
+    done
+fi
 
 # run Postfix internal check
 $binary $OPTIONS check >/dev/null 2>&1
@@ -355,3 +386,4 @@
     exit $OCF_ERR_UNIMPLEMENTED
     ;;
 esac
+

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
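The PID-file fallback in the patch can be sketched on its own. A minimal, self-contained sketch: `queue_dir` is stubbed with a temporary directory and the recorded PID is this shell's own, so the check succeeds; the RA's additional `ps ... | grep master` confirmation is omitted here:

```shell
#!/bin/sh
# Sketch of the PID-file fallback used when "postfix status" is
# unavailable: read the master PID and probe it with signal 0.
queue_dir=$(mktemp -d)
mkdir -p "$queue_dir/pid"
echo $$ > "$queue_dir/pid/master.pid"   # stand-in for the master PID

running_by_pidfile() {
    PIDFILE="${queue_dir}/pid/master.pid"
    if [ -f "$PIDFILE" ]; then
        PID=$(head -n 1 "$PIDFILE")
        # Signal 0 delivers nothing; it only tests that the PID exists.
        kill -s 0 "$PID" >/dev/null 2>&1
        return $?
    fi
    return 1
}

if running_by_pidfile; then ok=yes; else ok=no; fi
echo "master running: $ok"   # → master running: yes
rm -rf "$queue_dir"
```

Signal 0 is the conventional existence probe: it performs permission and PID checks without sending a signal, which is why the patch can use `kill` as a cheap liveness test.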
Re: [Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
Hi All,

The next patch adds a retry loop that waits for the start processing to complete. The start processing is revised to wait for startup, as other resource agents do.

Best Regards,
Hideo Yamauchi.

--- On Mon, 2011/6/6, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote:

> Hi All,
>
> I send a patch for the status processing. It makes the following
> modifications:
>
> * Carry out the status processing behind a version check
> * Change the parameter check
> * Log an error when the status processing fails
> * Set the ret variable
>
> I will send the patches for the other corrections later. Please comment
> on the patch, everyone.
>
> Best Regards,
> Hideo Yamauchi.

diff -r 6f405d0b697b postfix
--- a/postfix	Mon Jun 06 11:56:47 2011 +0900
+++ b/postfix	Mon Jun 06 12:04:58 2011 +0900
@@ -139,12 +139,16 @@
     sleep 2
 
     # initial monitoring action
-    running
-    ret=$?
-    if [ $ret -ne $OCF_SUCCESS ]; then
-        ocf_log err "Postfix failed initial monitor action. $ret"
-        return $OCF_ERR_GENERIC
-    fi
+    while :
+    do
+        running
+        ret=$?
+        if [ $ret -eq $OCF_SUCCESS ]; then
+            break
+        fi
+        sleep 1
+        ocf_log debug "Postfix failed initial monitor action. $ret"
+    done
 
     ocf_log info "Postfix started."
     return $OCF_SUCCESS

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
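The retry loop above can be exercised in isolation. A minimal sketch: `running` is stubbed to fail twice and then succeed, standing in for a Postfix master that takes a moment to come up, and `ocf_log` is dropped; in the real agent, Pacemaker's start operation timeout bounds the loop:

```shell
#!/bin/sh
# Sketch of the patch's start loop: poll running() until it succeeds.
OCF_SUCCESS=0
attempts=0

# Stub for the RA's running() check: fails twice, then succeeds.
running() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

while :
do
    running
    ret=$?
    if [ $ret -eq $OCF_SUCCESS ]; then
        break
    fi
    # sleep 1   # as in the patch; omitted to keep the sketch instant
done

echo "started after $attempts checks"   # → started after 3 checks
```

An unbounded `while :` loop is safe here only because the cluster manager kills the start operation when its timeout expires; without that external bound, the loop would need its own retry limit.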