FW: MQ Channel Production Issue

Woodcox, Janice Engle (DIS) Mon, 15 Mar 2004 10:15:56 -0800


Excellent! Thank you again :-)

=========== Janice (Engle) Woodcox ===================
Department of Information Services / Computer Services Division
s/390 Customer Technical Support
Help Desk Phone: 360-753-2454
Direct Line: 360-902-3102
[EMAIL PROTECTED]
Group Web Site: http://sww.wa.gov/dis/csd_ctss/
=====================================================

-----Original Message-----
From: Potkay, Peter M (PLC, IT) [mailto:[EMAIL PROTECTED]
Sent: Monday, March 15, 2004 9:52 AM
To: [EMAIL PROTECTED]
Subject: Re: MQ Channel Production Issue

No, these are different animals completely. They are not going to affect your original problem at all.

Please take a look at the Intercommunication Manual:

http://publibfp.boulder.ibm.com/epubs/html/csqzae08/csqzae08tfrm.htm

Chapter 6.

Rather than rehashing the manual myself, take a look there and let me know if you have more specific questions about these 2 parameters.

Basically, the higher you make BATCHINT, the longer your channel will hold messages back before committing them at the receiving side (but it saves CPU cycles on a per-message basis, since you are not constantly committing). I use 0 for my BATCHINT, since I value overall messaging speed more.
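A minimal Python sketch of that trade-off, assuming a simplified version of the documented batch-ending rule: a batch is committed once BATCHSZ messages have been sent, or once the transmission queue is empty and BATCHINT has elapsed since the batch started. Sending is treated as instantaneous, and the function name and arrival times are made up for illustration; this is a toy model, not MQ's channel code. Note that BATCHINT is given in milliseconds, while DISCINT and HBINT are given in seconds.

```python
def batch_commit_times(arrivals_ms, batchsz, batchint_ms):
    """Toy model: return the time (ms) at which each batch is committed.

    arrivals_ms -- times at which messages land on the transmission queue
    batchsz     -- maximum messages per batch (BATCHSZ), at least 1
    batchint_ms -- minimum time a batch is held open (BATCHINT)
    """
    if batchsz < 1:
        raise ValueError("BATCHSZ must be at least 1")
    arrivals = sorted(arrivals_ms)
    commits = []
    i = 0
    while i < len(arrivals):
        start = arrivals[i]              # batch opens with the first message
        deadline = start + batchint_ms   # earliest commit if the queue runs dry
        count = 0
        last = start
        # Messages arriving before the deadline join the open batch.
        while i < len(arrivals) and count < batchsz and arrivals[i] <= deadline:
            last = arrivals[i]
            count += 1
            i += 1
        if count == batchsz:
            commits.append(last)         # batch full: commit right away
        else:
            commits.append(deadline)     # queue empty and BATCHINT expired
    return commits


arrivals = [0, 10, 100]
print(batch_commit_times(arrivals, batchsz=50, batchint_ms=0))   # [0, 10, 100]
print(batch_commit_times(arrivals, batchsz=50, batchint_ms=50))  # [50, 150]
```

With BATCHINT at 0, every message is committed as soon as it is sent (three commits); at 50 ms the first two messages share one commit, but the first message waits an extra 50 ms before it is committed.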

The higher the BATCHINT value, the more important BATCHHB is. I have mine at zero as well, since I don't have batches open for long periods of time.
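As a rough sketch of why BATCHHB matters more as BATCHINT grows: batch heartbeats let the sending end check that the receiver is still there just before it commits, and the check is only needed when nothing has been heard from the receiver for longer than BATCHHB milliseconds. The function below is a hypothetical illustration of that decision, not an MQ API, and the exact comparison MQ uses is an assumption.

```python
def needs_batch_heartbeat(commit_at_ms, last_heard_ms, batchhb_ms):
    """Return True if the sender should probe the receiver before committing.

    batchhb_ms == 0 means batch heartbeats are switched off.
    """
    if batchhb_ms == 0:
        return False
    silence_ms = commit_at_ms - last_heard_ms
    return silence_ms > batchhb_ms


# Receiver last heard from at the start of the batch (t = 0),
# batch committed when BATCHINT expires.
for batchint_ms in (0, 50, 5000):
    print(batchint_ms, needs_batch_heartbeat(batchint_ms, 0, batchhb_ms=500))
# 0 False / 50 False / 5000 True
```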

-----Original Message-----
From: Woodcox, Janice Engle (DIS) [mailto:[EMAIL PROTECTED]
Sent: Monday, March 15, 2004 12:20 PM
To: [EMAIL PROTECTED]
Subject: MQ Channel Production Issue

Hi Peter. Thank you very *much* for your explanation and suggested settings for the "disconnect and heartbeat interval". We are changing and testing this week. Would it also be advisable to change the "batch interval and batch heartbeat interval" to the same values?

meaning:
Disconnect interval . . . . : 300
Heartbeat interval . . . . : 30

Batch interval . . . . . . : 300
Batch heartbeat interval . : 30

-----Original Message-----
From: Yeske, Judy [mailto:[EMAIL PROTECTED]
Sent: Friday, March 12, 2004 9:13 AM
To: [EMAIL PROTECTED]
Subject: Re: MQ Channel Production Issue

Thanks again, Peter! In our development system, we were able to recreate the problem. I then changed the ADOPTMCA parm from NO to YES and reran our test: way too cool! The NT Server sender channel went from retrying to running without the manual intervention of dropping the socket. The channel was adopted, the original socket was gone, and a new socket was created. This is perfect, thank you again!!

Judy

-----Original Message-----
From: MQSeries List [mailto:[EMAIL PROTECTED] On Behalf Of Potkay, Peter M (PLC, IT)
Sent: Friday, March 12, 2004 11:08 AM
To: [EMAIL PROTECTED]
Subject: Re: MQ Channel Production Issue

I have never dealt with AIX, but I have with NT and the mainframe.

From a channel perspective (AdoptNewMCA, KeepAlive, heartbeats, and DISCINT), I would consider AIX and NT identical, as they are both considered "distributed" platforms.

Maybe someone with AIX experience will contradict this, but I don't think so.

-----Original Message-----
From: Yeske, Judy [mailto:[EMAIL PROTECTED]
Sent: Friday, March 12, 2004 10:03 AM
To: [EMAIL PROTECTED]
Subject: Re: MQ Channel Production Issue

Peter,

Thank you very much, this is extremely helpful. We are attempting to recreate our problem on development nodes. We're using an AIX box, connecting to my mainframe MQ. We've attempted several disconnects (restarting MQ on the AIX box, breaking the network connection). For every test between the AIX box and the mainframe, the mainframe channel ends abnormally and then restarts, with no hung sockets. In production, when this occurs between the NT server and the mainframe, the mainframe channel remains active and we have a hung socket. We're in the process of getting an NT development server. In the meantime, I have a question.

In Janice's note below, she mentions problems with the NT server. I'm a total mainframe person: what is the difference between an AIX box and an NT server, and how does MQ differ between the two?

Thanks,

Judy

-----Original Message-----
From: MQSeries List [mailto:[EMAIL PROTECTED] On Behalf Of Potkay, Peter M (PLC, IT)
Sent: Wednesday, March 10, 2004 8:27 PM
To: [EMAIL PROTECTED]
Subject: Re: MQ Channel Production Issue

When the SNDR goes away, the RCVR is left twiddling its thumbs, waiting for the SNDR channel to send it something, anything. If heartbeats are turned on, HBs will flow back and forth at regular intervals whenever no real MQ application messages are flowing. Both ends of the channel know when to expect the next heartbeat, and if they don't get it, they assume a network outage, gracefully end the channel, and put it into an INACTIVE state. For RCVRs, this INACTIVE state means the channel is ready to accept a new connection.
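A simplified Python model of the receiving end's heartbeat watchdog. The `grace_s` allowance for network delay is an assumption: MQ adds its own margin on top of HBINT, and its exact size is not modelled here. The class and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReceiverEnd:
    """Toy model of a RCVR channel waiting on a quiet connection."""
    hbint_s: int            # negotiated heartbeat interval (HBINT), seconds
    grace_s: int            # assumed extra allowance before giving up
    last_heard_s: float = 0.0
    state: str = "RUNNING"

    def on_traffic(self, now_s: float) -> None:
        # Any application message or heartbeat from the sender resets the clock.
        self.last_heard_s = now_s

    def tick(self, now_s: float) -> str:
        # Too long without a heartbeat: assume the network is gone, end the
        # channel, and leave it INACTIVE, ready for a new connection.
        overdue = now_s - self.last_heard_s > self.hbint_s + self.grace_s
        if self.state == "RUNNING" and overdue:
            self.state = "INACTIVE"
        return self.state


rcvr = ReceiverEnd(hbint_s=30, grace_s=10)
rcvr.on_traffic(0)        # last heartbeat before the firewall reboot
print(rcvr.tick(35))      # RUNNING
print(rcvr.tick(41))      # INACTIVE
```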

However, if the connection is broken and the SNDR channel goes into RETRYING, the SNDR will keep trying to connect. If the network comes back up before the RCVR has had a chance to put itself into INACTIVE state, the channel will fail to start, because the SNDR finds the RCVR still hung up on the old socket.
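The race can be sketched as a toy timeline, assuming a hypothetical `grace_s` margin on top of HBINT: the sender's retry is refused whenever it reaches the receiver before the receiver's heartbeat timeout has fired. The function names and numbers are made up to illustrate the timing only.

```python
def receiver_gives_up_at(last_heard_s, hbint_s, grace_s):
    """Time at which the RCVR declares the old connection dead."""
    return last_heard_s + hbint_s + grace_s


def retry_accepted(last_heard_s, retry_s, hbint_s, grace_s):
    """A retry succeeds only if the RCVR has already gone INACTIVE."""
    return retry_s > receiver_gives_up_at(last_heard_s, hbint_s, grace_s)


# Last traffic at t = 0, firewall back up and SNDR retrying at t = 120 s.
for hbint in (300, 60, 30):
    print(hbint, retry_accepted(0, 120, hbint, grace_s=60))
# 300 False / 60 False / 30 True
```

With these assumed numbers, shortening HBINT narrows the window but cannot close it, because a retry can always arrive first.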

There are 3 things to look at here:

1. DISCINT: Make it smaller, so when the channel does not have any work to do, it will go INACTIVE. An INACTIVE channel is not susceptible to the network going down and causing this type of problem.

2. Heartbeats: Make the value smaller, maybe 15 or 30. The more often HBs flow, the greater the likelihood that the HBs (or rather the lack of them) will catch a broken connection and put the RCVR in INACTIVE state, ready for a new connection.

3. ADOPTMCA: Even if both of the above parameters are on and correctly set, sometimes the connection will be broken and the SNDR channel's retry will get through to the remote side, only to find the RCVR still hung up on the old socket. If AdoptMCA and its related parameters are turned on, the QM will know to ignore the old RCVR channel and start up (adopt) a new instance of the RCVR channel. The connection is allowed and the channel successfully starts up again. (It's called AdoptNewMCA on non-z/OS platforms.)
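A minimal sketch of what adoption changes, modelled as a hypothetical registry of running RCVR instances keyed by channel name. Without adoption, a start request for a channel whose old, orphaned instance is still running is refused; with adoption, the old instance is ended and the new one takes over. The real feature also lets you choose which attributes of the new connection must match the old one before adopting; that check is left out here, and the channel name is made up.

```python
class ReceiverRegistry:
    """Toy model of the RCVR channel instances a queue manager is running."""

    def __init__(self, adopt_mca: bool):
        self.adopt_mca = adopt_mca
        self.running = {}  # channel name -> connection id of the live instance

    def connect(self, channel: str, conn_id: int) -> bool:
        """Handle an inbound start request; return True if it is accepted."""
        if channel in self.running and not self.adopt_mca:
            # Old instance still hung on its socket: the SNDR's retry fails.
            return False
        # Either no instance exists, or adoption ends the orphan and replaces it.
        self.running[channel] = conn_id
        return True


for adopt in (False, True):
    qmgr = ReceiverRegistry(adopt_mca=adopt)
    qmgr.connect("NT.TO.MVS", conn_id=1)                # original connection
    print(adopt, qmgr.connect("NT.TO.MVS", conn_id=2))  # retry after the reboot
# False False / True True
```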

All three parameters, used properly together, go a long way toward making your channels bulletproof. (A fourth, KeepAlive, is somewhat related, but it is really more useful for SVRCONN channels outside of a Get with Wait call, or, for all other channels, when one QM does not support HBs.)
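KeepAlive works at the TCP level rather than in MQ's own protocol: it turns on the operating system's keepalive option for the channel's socket, and the OS then probes idle connections on its own schedule (often two hours by default), which is why heartbeats usually detect a dead partner much sooner. The Python snippet below only shows what enabling that socket option looks like; it is an illustration, not how MQ itself is configured.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with the OS keepalive probe switched on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the TCP stack to probe the peer when the connection sits idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


s = make_keepalive_socket()
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
s.close()
```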

Details are in the OS/390 System Setup Guide and the Distributed System Admin Guide.

-----Original Message-----
From: Woodcox, Janice Engle (DIS) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 10, 2004 7:38 PM
To: [EMAIL PROTECTED]
Subject: Re: MQ Channel Production Issue

We have experienced the same problem but didn't realize it was due to a dropped TCP socket. Thank you! We are running WMQ v5.3.1 on z/OS 1.4, connecting to a customer's NT server. Because of your diagnosis (of the cause of the SEND channel going into RETRY status), I contacted our customer and confirmed they had rebooted their firewall node at about the same time their (and our) SEND channel went into RETRY status.

On this SEND channel the subsystem's channel events log reported:

+CSQX500I channel started (when the subsystem came up about 3AM)

+CSQX545I channel closed because disconnect interval expired (this is set to 600 seconds)

- about 6 hours later -

+CSQX500I channel started

+CSQX526E message sequence error

+CSQX506E message receipt confirmation not received

+CSQX599E channel ended abnormally

I'm still learning WMQ, on the mainframe side only. Does anyone know: when the NT server is being rebooted, isn't there a shutdown/startup script of some kind that can include a WMQ command to stop/start all channels, respectively? Is there a WMQ command that will stop/start all channels? If there is, would this be the way to resolve this problem?

Janice Woodcox

State of Washington

Department of Information Services

-----Original Message-----
From: Yeske, Judy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 10, 2004 7:36 AM
To: [EMAIL PROTECTED]
Subject: MQ Channel Production Issue

Hello,

We are running MQ Version 2.1 on the mainframe (z/OS Version 1.4). We have channels that run between an NT production server and our mainframe MQ. The NT server channels are dropping due to a reboot of a firewall node; this is occurring several times during the day. The NT server sender channel goes into a retry state, and to correct this we need to drop the hung TCP socket on our mainframe MQ system. We have dealt with this in the past and determined that channels between midrange/NT nodes and the mainframe do not clean up. The midrange people state this is a 'bug' in mainframe MQ. I disagree. Has anyone experienced this? By the way, the heartbeat interval on our receiver channel is set at 300. I'm revising this to 60, although I'm not sure how well this will work if it doesn't clean up the hung socket upon disconnects.

I would appreciate your thoughts / advice.

Thank you,
Judy Yeske
