This thread was:
Re: [ND] ND4 License & server stability problems
I've renamed it as I narrow down the problem.
Here's some new info:
1. The license error I saw in the "core" dump file was referencing a Visigenic
license:
"WARNING: Your licensed period for the use of this EVALUATION version of the
Software has expired. Further use of the Software is prohibited without the
purchase of a permanent license. You are bound by the terms of the Visigenic
Software, Inc. license agreement that you consented to when you ordered or
installed this Software."
Our ND SysAdmin support person here says we can disregard this harmless message.
Is that true? I'm inclined to believe him because our server was stable for a
few consecutive days with no crashes.
2. Even though we bounce the ND4 app server every night, we still have the
problem where one particular ND4 app gives us the generic "Could not connect to
the app server..." error; while the other ND4 apps on the same box that talk to
the very same database, work just fine.
I believe I have narrowed the problem down to 2 particular pages in the
problematic app. As long as I don't make any changes on those 2 pages everything
is fine. But if I then modify one of those pages on my devel NT box and FTP it
over to the prod HP-UX box, 90% of the time it will immediately give the "could
not connect..." message page after I bounce the server. However, I notice that
sometimes it won't give me this error immediately; instead it will let me FTP them
over and work fine, then about 2 days later it starts giving the "could not
connect..." message page.
Could this have something to do with the # of workers, and how the classes are
loaded?
CURT wrote:
You might want to try project pre-loading. That way, the app server builds
all the objects before it takes on session processing.
-- Curt
Could someone tell me if preloading might help me? If so, how do I do it?
3. One other thing I notice on these 2 problematic pages is that they both
contain CSpTransaction code. Although it compiles fine and runs with no errors,
maybe there is something bad lurking around that only comes out after both pages
are accessed.
Thanks ahead for any comments or advice.
Janet
"Frank Staheli" <[EMAIL PROTECTED]>
09/28/99 02:42 PM
To: [EMAIL PROTECTED], Janet Traub/IS/SSC/THD
cc:
Subject: Re: [ND] ND4 License & server stability problems
Janet,
Although I'm not sure we are experiencing the same problem, we are
seeing some of the same symptoms you describe with some of our ND apps
accessing an Informix database. This problem seems to crop up about
every 3-5 days, however. Also, we are not in the habit of bouncing the
app server each night as you do.
What means do you use to connect to Informix? We use OpenLink ODBC,
and were able to clear up the problems we were seeing significantly by
changing some OpenLink configuration parameters, but it still crops up
every few days. The ND log indicates that the "Connection [was]
rejected by datasource." At that point, restarting just the RDBMS
service usually solves the problem.
Interestingly, we have two other ND apps on the same box, one which
accesses SQL Server and one which has native access to Oracle (although
the Oracle and SQL Server databases are on different boxes than the
Informix (Oracle and Informix are on HP-UX)). Almost always when the
apps connecting to Informix have lost their marbles, the SQL-Server and
Oracle apps still respond normally.
As a comparison, when the machine where Oracle resides is rolled, ND
reconnects fine using dbRecoveryPlan when Oracle is back up. This is
not the case with the dbRecoveryPlan for Informix when the machine
where Informix resides is rolled.
Another difference in our environment is that our production box runs
NT. Is this advisable?
Also, is restarting the app server each night a good habit that we
should be getting into?
Frank Staheli
Brigham Young University
>>> <[EMAIL PROTECTED]> 9/28/1999 11:55:56 AM >>>
Hi folks, (please excuse this long note. I had the detail already typed up for
an internal email.)
Last week, there was a thread about that generic "could not connect to app
server" msg. It seemed to be related to the ND4 app server not creating a .ser
file. I had that problem, and was finally able to get a "stable" version of the
app out to production by playing around with the order in which I made my
changes (2 pages were trouble; the other pages and DOs were all fine to tweak,
confident that the crazy error would not pop up). I know that sounds like
bogus superstition, but that's how it went.
Now the app is in Pilot (with only 15 users) running on our production HP-UX
box talking to Informix7, which also lives on the same box. (Perhaps that's a
problem in itself, as the box isn't a supercruncher by any means.)
Although we have been bouncing the app server every night, we still encounter a
consistent problem every morning around 10:15AM. The app server does not crash;
however, our application loses its ability to communicate with the database
server.
NOTE: When this db connectivity prob occurs, oddly, it doesn't affect our other
3 ND4 apps that run on the same box using the exact same db driver to talk to
the exact same database. These other apps continue working just fine, although
they currently have little to no traffic. This puzzles me. And it also scares me
since this is that "temperamental" app that was giving me problems in
development with the .sid business.
Below are my findings to date about it. I'm hoping the expired Visigenic license
is the sole problem, but it still seems odd that only the one ND4 app is
affected when the problem occurs.
If nothing else, I'd like to know:
1. Is it unusual that the ND4 app server's database connection to only ONE app
would die, while the other apps work fine?
2. Below you'll see I get the warning 'truncating cursor rows fetched from 9999
to 5000' in the log. The 9999 is the default I set the MaxDisplayRows property
to, in both the .sdo file and the corresponding .spg file. Is that a bad thing
to do? Could it cause problems if ALL my "unlimited length" Repeateds (which
never return more than 300 rows, and have a reasonable # of columns) have this
default? Should I be more careful, and use a lower # like 999 when appropriate?
Thank you kindly for any help. As you can tell, I'm new to this ND4 server admin
stuff.
Janet
The tome continues below...!
Reading thru it you'll learn there appear to be 2 potential problem areas: an
expired license (see #1 below; I'm waiting for someone here to investigate it)
and configuration problems with the # of workers, clients, etc. (see #5 below).
1. When the app server was bounced last night using ndappsrv stop/start, a core
dump file was written.
The core was also "touched" when the database connection was lost this AM.
The core dump mentions an expiring Visigenic license (see the screen shot pasted
below).
2. I found a reference in the beginning of our ND4 log from way back in Feb '99
(for a different ND4 app) that also mentioned an expiring license. The dump
mentions a 6-month license period--and this is approx 6/7 months later than
Feb.
3. The log indicates that the crash on Friday 10:40AM involved a generic OS
network error:
System error code is 27 ([CANTOPEN] Unable to open : Exec format error).
Vendor error#1 code is 0 ()
Vendor error#2 code is 0 ()
Here's the Informix error message for 27:
-27 - Operating system error
An operating-system error code with the meaning shown was unexpectedly
returned to the database server. Check the documentation for your
operating system to find out what too large might mean in the context
of the current operation.
Prior to getting that generic error, this database warning was written to the
log:
'truncating cursor rows fetched from 9999 to 5000'
4. The log indicates that the crash on Monday, shortly after 10:06AM, was
preceded by this warning:
"no free workers "
then these errors: "Database server is currently processing SQL task."
"-NetdynPartition DefaultPartitionObjectFactoryImpl has terminated
abnormally java.lang.ThreadDeath"
then at 11:38AM it was giving the same generic error #27 (shown above from
Friday).
Before today's crash, the same warning message about the truncating cursor rows
appeared.
5. Here's a thread about traffic between NetDynamics and the database.
NALINI: In the past I have found that in Java applications, when a large number
of rows are being processed, somehow, the client loses the cursor. Could that
have been the reason in this case?
JANET: In our app the largest number of rows we are returning is 250. But then
we loop thru each one of those rows, and make another call to the database. Thus
lots of db traffic. (This could be partly what's killing us. I'll look into it.)
NALINI: In the error document you had enclosed, Janet, there is a point where
the error message mentions something about 'truncating cursor rows fetched from
9999 to 5000'.
JANET: The 9999 value is a NetDynamics common default for the "max # rows" that
a query returns. But good catch. I'll look into where we are using it, and
change it to a smaller number where possible.
NALINI: Claudine, is there not a variable for this on the SQLNK server side?
Would it be worth looking at?
CLAUDINE: Yes, there is a CURSORHOLD variable which can be set in the Sequelink
configuration files. The 'hold=yes' retains cursors across transactions, while a
no hold option allows the databases to close the cursors after a commit or
rollback.
NALINI: Is there a hard limit on the Net Dynamics Server as to the maximum
number of 'empid' connections? I have looked at the pris08 configuration and
that is set to 100. The Informix server does not have any entries on exceeding
no. of user connections.
JANET: The NetDynamics (ND) server also has a number of parameters related to
# of database connections (# workers, # clients per worker, Maximum database
connections per process, etc). ND uses database connection pooling and shares
those connections between users. I'll look into how we have it configured. We
need to evaluate what these params should be set to, once we estimate # of
concurrent users, db traffic volume, etc. I think there's a Statistics Viewer
utility we can run to help us get a handle on the client to worker ratio.
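[Editor's sketch, not part of the original thread.] The pattern JANET describes in #5 (fetch up to 250 rows, then make another database call for each row) costs one round trip per row. One common way to cut that traffic is to collect the keys and fetch the related rows in a few batched IN (...) queries. The plain-Java sketch below only shows the chunking and the SQL text; it does not use any NetDynamics API, and the table name order_detail, the column order_id, and the chunk size of 100 are hypothetical, chosen purely for illustration. It uses modern Java syntax, so it would need adapting for a 1999-era JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchedLookup {

    /** Splits the key list into consecutive chunks of at most chunkSize keys. */
    static List<List<Integer>> chunk(List<Integer> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Integer>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(i, end)));
        }
        return chunks;
    }

    /** Builds a parameterized query with one '?' placeholder per key. */
    static String buildInQuery(String table, String keyColumn, int keyCount) {
        if (keyCount <= 0) {
            throw new IllegalArgumentException("keyCount must be positive");
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT * FROM " + table + " WHERE " + keyColumn
                + " IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // 250 keys, matching the largest result set mentioned in the thread.
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 250; i++) {
            ids.add(i);
        }
        List<List<Integer>> chunks = chunk(ids, 100); // hypothetical chunk size

        System.out.println("per-row approach: " + ids.size() + " queries");
        System.out.println("batched approach: " + chunks.size() + " queries");
        // Hypothetical table and column names, for illustration only.
        System.out.println(buildInQuery("order_detail", "order_id", 3));
    }
}
```

With 250 keys and chunks of 100, the per-row loop issues 250 queries while the batched version issues 3. Whether this helps in a given app depends on the driver and on how the data objects are bound, so treat it as something to measure, not a guaranteed fix.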
Thanks for your help! I'll be out of the office until later tonight.
Janet
(Embedded image moved to file: pic13208.pcx)