[FW-1] Delay with traffic going through non-pivot member of an LS Unicast cluster.

Sergio Alvarez Tue, 23 Aug 2011 14:56:52 -0700

Hello. I have a very weird issue on my hands that I have never seen before,
and was hoping some of you might have suggestions about it.


Scenario: Two-member Load Sharing Unicast cluster running R75.10 on open
servers running SPLAT.

The cluster has worked like this for months without any problems, but today I
received a report about problems with a new application whose traffic has
to go through the cluster. This new application is running on a
DMZ interface, and the following info was provided about it:

Web servers in the DMZ: IBM HTTP Server version 7.0.0.11 (build cf111021.10)
on AIX 6.1-02
Portal servers on the intranet: IBM WebSphere Application Server – ND
version 7.0.0.11 (build cf111021.10) on AIX 6.1-02

Traffic comes from web services located in other network segments, through
the firewall, and to the DMZ in question. After this application was
deployed, I noticed significant delays with traffic through the cluster, but
they appeared only intermittently. I decided to run some tests, among them
"fw monitor" captures on both cluster members, and found that when traffic
goes through the Pivot member of the Unicast cluster everything works
perfectly, but when it is handled by the other cluster member, the delays
occur.

Here are multiple pieces of info that might be of help:

- Traffic goes over TCP ports 10039, 10040, 10050.
- Only affecting this new app.
- No drops are shown in the logs.
- Cluster "advanced" configuration is set to handle load sharing by "IPs"
only and "use sticky decision function" is selected.
- If the cluster is changed to HA instead of LS Unicast operation mode,
everything works perfectly.
- Checked the cluster status with multiple commands, but one in particular
caught my attention: "cphaprob syncstat".

The SK34475 document says the following:

Lost sync connection (num of events)... SHOULD be 0 - positive value
indicates connectivity problems
Not held due to no members............. SHOULD be 0 - positive value
indicates connectivity problem between the members

Running the command on both cluster members did in fact show positive values
for both counters (for example: 2144 on the primary member for "lost sync").
I noticed that changing from LS Unicast to HA increases these values, so I am
currently unsure whether the positive values are normal given the multiple
changes in the cluster operation mode.
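
One way to tell whether these counters are still climbing, rather than left
over from the mode changes, would be to save the command output twice and
compare the two snapshots. Below is a minimal Python sketch of that idea. It
assumes the output contains "label.... value" lines shaped like the SK34475
excerpt above, which may not match the real layout exactly. The function
names are hypothetical, and every sample value except 2144 is made up for
illustration.

```python
import re

# Counters that SK34475 (as quoted above) says should stay at 0.
WATCHED = ("Lost sync connection", "Not held due to no members")

# Matches "<label><two or more dots> <integer>" lines. The exact layout of
# real "cphaprob syncstat" output is an assumption here.
LINE_RE = re.compile(r"^\s*(?P<label>.+?)\s*\.{2,}\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Return {label: int} for every 'label.... N' line in the text."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counters[match.group("label")] = int(match.group("value"))
    return counters


def watched_deltas(before, after):
    """Return how much each watched counter grew between two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    deltas = {}
    for label, value in new.items():
        if label.startswith(WATCHED):
            # A counter missing from the first snapshot is treated as 0.
            deltas[label] = value - old.get(label, 0)
    return deltas


# Hypothetical snapshots taken some time apart on the same member.
BEFORE = """\
Lost sync connection (num of events)... 2144
Not held due to no members............. 12
"""
AFTER = """\
Lost sync connection (num of events)... 2150
Not held due to no members............. 12
"""

print(watched_deltas(BEFORE, AFTER))
```

A delta of zero over a period with no mode changes would suggest the positive
values are historical. A delta that keeps growing would point at an ongoing
sync problem between the members.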

Given that the issue affects only one application, it appears to me that it
might not be related to a general cluster problem, but I thought this might
be useful info.

Any ideas on how to get this one resolved would be much appreciated.

Regards

-- 
Sergio Alvarez
CISSP | CCSE+
