Is this also a problem for the 3.1.8 (ADSM) client? I recently set up the
ADSM client for a NT SQL cluster, but the owners for the server are still
gearing up (I suspect that they haven't really taxed the server yet or
finished their app load).
Damon Burkhart
Operations Analyst
Server Operations
Kmart Corporation
(248)614-0629
-----Original Message-----
From: Del Hoobler [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 31, 2000 9:47 AM
To: [EMAIL PROTECTED]
Subject: Re: MS Hotfix to problem of TSM memory leak in MSCS cluster
Neil,
We can give you some insight...since we (IBM/Tivoli) were the
ones who drove this problem with Microsoft.
TDP for SQL has been working on this problem with Microsoft since June.
It was finally resolved to Q244509; however, the symptoms described in
Q244509 do not adequately reflect the symptoms seen with TDP for SQL.
We have been, and still are, waiting for the publication of Q268835 to
reflect the symptoms seen by TDP for SQL. Q268835 will reference Q244509.
Unfortunately, the TDP code that exposed this problem is common to most, if
not all, Windows clients. It is executed as part of our file subsystem
initialization.
We have only seen the problem on the failover (non-quorum) server, and
only for disk resources.
The root of the problem for TDP is its calls to ClusterResourceControl()
to enumerate cluster disk resources. ClusterResourceControl() causes
an 8k leak of 'pool nonpaged bytes'.
I hope this helps. For more details, you may want to call Microsoft.
Thanks,
Del
----------------------------------------------------
Del Hoobler
IBM Corporation
[EMAIL PROTECTED]
Neil Schofield <[EMAIL PROTECTED]>@VM.MARIST.EDU> on
10/31/2000 05:58:00 AM
Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc:
Subject: MS Hotfix to problem of TSM memory leak in MSCS cluster
We have recently suffered a problem with the TSM client for Windows running
in an MSCS cluster, caused by a memory leak in the non-paged pool.
The depletion of the non-paged memory pool only occurred when disk resources
were distributed in certain ways between the two nodes of the cluster. By
enabling pool tagging and using the Microsoft PoolMon.EXE utility, we were
able to determine that the depletion occurred in the pool tag 'None', and
only when a TSM process (e.g. DSMC.EXE, SQLDSMC.EXE, scheduled incremental
backup) started.
Since we perform hourly TDP transaction log backups of a virtual SQL Server
in this cluster, the non-paged pool was quickly exhausted and the cluster
nodes would fail after a period of about 2 weeks.
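As a rough back-of-envelope check on these figures (a sketch only, not part
of the measurements above): hourly runs over about 2 weeks come to a few
hundred process starts, so the leak per start can be estimated from however
much non-paged pool was free to begin with. The headroom value below is a
hypothetical placeholder, not a measured number; substitute the figure
PoolMon or Performance Monitor reports on your own nodes.

```python
def runs_in_period(runs_per_day, days):
    """Number of TSM process starts over the given period."""
    return runs_per_day * days


def implied_leak_per_run(headroom_bytes, runs_per_day, days):
    """Average bytes leaked per process start if the pool is exhausted
    after `days` days of `runs_per_day` starts per day."""
    return headroom_bytes / runs_in_period(runs_per_day, days)


RUNS_PER_DAY = 24      # hourly transaction log backups (from the post)
DAYS_TO_FAILURE = 14   # "about 2 weeks" (from the post)

# ASSUMPTION: free non-paged pool at boot. Hypothetical value chosen only
# for illustration; measure the real one on the affected node.
ASSUMED_HEADROOM_BYTES = 64 * 1024 * 1024

runs = runs_in_period(RUNS_PER_DAY, DAYS_TO_FAILURE)
per_run = implied_leak_per_run(ASSUMED_HEADROOM_BYTES, RUNS_PER_DAY,
                               DAYS_TO_FAILURE)
print(f"{runs} process starts before failure")
print(f"about {per_run / 1024:.0f} KiB leaked per start (under the assumption)")
```

Any additional DSMC or SQLDSMC invocations (manual runs, scheduled
incrementals) add to the run count and lower the per-start estimate.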
The problem was caused by a bug in NT that has not yet been fixed in any
service pack. It relates specifically to memory that is not released if an
attempt to read a partition table returns an error. The MS Knowledge Base
article relating to the problem is Q244509, and a hotfix is available from
Microsoft.
After we applied the hotfix, the memory leak stopped. However, there are a
few things I don't understand about the problem. Firstly, the TSM process
didn't have to perform a backup to cause a depletion of the non-paged pool.
Indeed, simply running DSMC QUIT would leak memory. Secondly, running a TSM
process on one node of the cluster would cause the memory leak to occur on
both nodes.
I would be interested in any comments that any of the developers may have
about these experiences.
For info, the MS KB article relating to diagnosing memory leaks using
PoolMon is Q177415.
Neil Schofield