We are running an IPsec, ESP VPN through two Check Point firewalls,
"Check Point VPN-1(TM) & FireWall-1(R) NG with Application Intelligence
(R55) - Build 121." Note that we are running the VPN _through_ the
firewalls; they are not endpoints. The firewalls shouldn't care what
this traffic is; it should be generic IP traffic that happens to be
IP protocol 50 as far as the firewalls are concerned.

The VPN works, but we are getting a lot of "virtual defragmentation
errors" which we believe may be adversely affecting performance.
Here's a diagram of the situation:

               [VPN endpoint, aaa.bbb.107.50]
                              |
                   [CP FW-1, gibraltar]
                              |
                       {            }
                       {  Internet  }
                       {            }
                              |
                    [CP FW-1, jebelmusa]
                              |
               [VPN endpoint, ccc.ddd.103.193]

What seems to happen is that the firewall at the "far" end gets the
errors. That is, we see errors at the "bottom" firewall when the traffic
is from aaa.bbb.107.50 to ccc.ddd.103.193, "down" the diagram, and errors
at the "top" firewall for traffic flowing "up" the diagram.

The firewalls are Solaris 8, and we have collected traffic using "snoop"
to see if fragments are being lost, but we generally see all fragments
arrive at the interface (some detailed data presented below). I have
tried using "fw monitor" to see what is going on, but it appears that
there is no visibility into the reassembly engine using fw monitor
(which agrees with what I've been able to find in the documentation).
The packets that produce errors never show up in the monitor output at
all, even though snoop will catch them arriving at the interface.

For example, here is a log entry for a fragmentation error:

Number:       248256
Date:         26May2004
Time:         11:04:22
Product:      VPN-1 & FireWall-1
Interface:    lo0
Origin:       jebelmusa (ccc.ddd.103.130)
Type:         Log
Action:       Drop
Source:       MILP-VPN (aaa.bbb.107.50)
Destination:  EDH-VPN (ccc.ddd.103.193)
Protocol:     esp
Information:  message: Virtual defragmentation error: Timeout
              ip_id: 1258
              ip_len: 0
              ip_offset: 0
              fragments_dropped: 4
              during_sec: 60

The error is from the bottom firewall in the diagram. Here is the
snoop output from that firewall:

  jebelmusa# snoop -i /tmp/esp.snp -V -t a 'ip[4:2] = 1258'
________________________________
  1 11:04:21.99893 aaa.bbb.107.50 -> ccc.ddd.103.193 ETHER Type=0800 (IP), size = 74 bytes
  1 11:04:21.99893 aaa.bbb.107.50 -> ccc.ddd.103.193 ESP IP fragment ID=1258 Offset=1480 MF=0
________________________________
  2 11:04:22.00744 aaa.bbb.107.50 -> ccc.ddd.103.193 ETHER Type=0800 (IP), size = 1514 bytes
  2 11:04:22.00744 aaa.bbb.107.50 -> ccc.ddd.103.193 ESP IP fragment ID=1258 Offset=0 MF=1

And for completeness, from the top firewall in the diagram:

  gibraltar# snoop -i /tmp/esp.snp -V -t a 'ip[4:2] = 1258'
________________________________
  1 11:04:21.89189 aaa.bbb.107.50 -> ccc.ddd.103.193 ETHER Type=0800 (IP), size = 1514 bytes
  1 11:04:21.89189 aaa.bbb.107.50 -> ccc.ddd.103.193 ESP IP fragment ID=1258 Offset=0 MF=1
________________________________
  2 11:04:21.89191 aaa.bbb.107.50 -> ccc.ddd.103.193 ETHER Type=0800 (IP), size = 74 bytes
  2 11:04:21.89191 aaa.bbb.107.50 -> ccc.ddd.103.193 ESP IP fragment ID=1258 Offset=1480 MF=0
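
As a sanity check on the snoop output above, the frame sizes can be
turned back into IP payload lengths to confirm that the two fragments
line up. This is only a rough sketch: it assumes the "size" snoop
reports is the Ethernet frame without the FCS (14-byte header) and that
the IP header carries no options (20 bytes). Neither is confirmed by
the capture itself.

```python
# Values copied from the snoop output for ip_id 1258.
ETHER_HDR = 14  # assumption: Ethernet II header, FCS not counted by snoop
IP_HDR = 20     # assumption: IP header without options

fragments = [
    {"frame_size": 1514, "offset": 0, "mf": 1},
    {"frame_size": 74, "offset": 1480, "mf": 0},
]


def payload_len(frame_size):
    """IP payload bytes carried in one Ethernet frame."""
    return frame_size - ETHER_HDR - IP_HDR


first, last = fragments
first_payload = payload_len(first["frame_size"])
last_payload = payload_len(last["frame_size"])

# The second fragment should start exactly where the first one ends.
contiguous = first["offset"] + first_payload == last["offset"]

# Total IP payload of the original datagram, and its length with header.
total_payload = last["offset"] + last_payload
datagram_len = total_payload + IP_HDR

print("first payload:", first_payload)
print("last payload:", last_payload)
print("contiguous:", contiguous)
print("original datagram length:", datagram_len)
```

Under those assumptions the first fragment carries 1480 bytes, which
matches the Offset=1480 of the second fragment, and the original ESP
datagram works out to 1540 bytes including the IP header, i.e. 40 bytes
over a 1500-byte MTU.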

Note that the order of the packets gets reversed as they traverse the
Internet, but FW-1 should be able to handle that. And looking at the
snoop output, that happens to most of the fragments, and the majority
of these are successfully reassembled. The drops are a few percent
(again, the VPN works, but we are concerned the virtual defragmentation
drops may be causing performance issues).
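
To make "FW-1 should be able to handle that" concrete, here is a
minimal sketch of order-independent reassembly bookkeeping. It is a
generic illustration of how a reassembler can decide that a datagram is
complete regardless of arrival order; it is NOT Check Point's algorithm,
which we cannot see. The function name and tuple layout are made up for
the example.

```python
def is_complete(fragments):
    """Return True if the fragments cover one whole datagram.

    fragments: iterable of (offset, length, more_fragments) tuples,
    in any arrival order. Offsets and lengths are in bytes.
    """
    frags = sorted(set(fragments))  # ignore exact duplicates, order by offset
    # The fragment with MF=0 marks the end of the datagram.
    ends = {off + length for off, length, mf in frags if not mf}
    if len(ends) != 1:
        return False  # no final fragment seen, or conflicting finals
    total = ends.pop()

    covered = 0
    for off, length, _mf in frags:
        if off > covered:
            return False  # hole before this fragment
        covered = max(covered, off + length)
    return covered == total


# Arrival order as seen at jebelmusa: last fragment first.
print(is_complete([(1480, 40, False), (0, 1480, True)]))
# Only one of the two fragments present.
print(is_complete([(0, 1480, True)]))
```

With this kind of bookkeeping the reversed order seen at jebelmusa makes
no difference, which is consistent with most reordered datagrams being
reassembled successfully.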

Before anyone asks, there is plenty of CPU on these systems (typically
85-95% idle) for the amount of traffic. We don't seem to be under any
sort of fragmentation attack. Anyway, if it were an attack or flood of
fragments, I would expect (hope) the firewall would log that fragments
are being dropped due to buffers running out rather than saying the
60-second timeout expired.
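
For a sense of scale, the gap between the two fragments of ip_id 1258
can be computed from the snoop timestamps above and compared with the
60 seconds in the log entry. This is a small sketch; it only compares
timestamps taken on the same firewall, since the clocks on gibraltar
and jebelmusa are not known to be synchronized.

```python
from datetime import datetime

TIMEOUT_SEC = 60  # "during_sec: 60" from the log entry

# Arrival times copied from the snoop output on each firewall.
arrivals = {
    "gibraltar": ["11:04:21.89189", "11:04:21.89191"],
    "jebelmusa": ["11:04:21.99893", "11:04:22.00744"],
}


def gap_seconds(stamps):
    """Seconds between the earliest and latest timestamp in the list."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return (max(times) - min(times)).total_seconds()


gaps = {host: gap_seconds(stamps) for host, stamps in arrivals.items()}
for host, gap in gaps.items():
    print(f"{host}: {gap * 1000:.2f} ms between fragments "
          f"(timeout is {TIMEOUT_SEC} s)")
```

Both gaps come out well under ten milliseconds, several orders of
magnitude short of the timeout the firewall reports.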

Given that it looks like the fragments all arrive at the firewall well
within the timeout, that the virtual defragmentation works for most
packets, and that we do not know how to look inside FW-1 to see the
defragmentation processing, we need some help in figuring out what the
problem might be and how we can fix it. What is the problem FW-1 is
having with these fragmented datagrams and how can we fix the problem
(short of stopping the fragmentation at the VPN endpoints)?
--
Crist J. Clark                               [EMAIL PROTECTED]
Globalstar Communications                                (408) 933-4387
