[ 
https://bro-tracker.atlassian.net/browse/BIT-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303#comment-16303
 ] 

Jon Siwek commented on BIT-348:
-------------------------------

{quote}
I did some more testing on a large trace, and I am seeing differences in the 
duration of a few connections. Have you seen that as well / do you have an idea 
where that's coming from? 

Here's an example:

{code}
before: 1359400833.646398       CHrUQS2wXshH59NEb6    XXXX 35752 YYYY    80  
tcp     http    3.497443        380     428 SF   429     ShADaFfRR       8      
 712     4       172     (empty)
after:  1359400833.646398       C9yL7F1JnxhPtfoLMi    XXXX 35752 YYYY    80  
tcp     http    0.240429        380     428 SF   429     ShADaFfRR       8      
 712     4       172     (empty)
{code}
{quote}

This is tricky to explain, hang in there... here's basically what the lead up 
to the FIN exchange looks like:

responder: Seq=1, ACK=381 (everything looks fine up to and including this 
packet)
originator: Seq=381, ACK=430 (ack'd an unseen segment... still "fine", Bro can 
deal with that)
originator: Seq=381, ACK=430, FIN (still "fine")
responder: Seq=1, ACK=382, FIN (what?  It either backed down the sequence 
number or the peer's last ACK was wrong)

The sequence/ack number tracking in both the old code versus the new code 
behaves similarly: the responder's TCP reassembler will think that everything 
before 430 is "old" (in this case it's been reported to analyzers as 
Undelivered) because of the weird ACK, and that the last sequence seen is 1 
(which is true, we only know about the SYN, but have not seen a packet carrying 
data for this endpoint).

The difference in the old code versus the new code that matters is in 
TCP_Reassembler::DataPending: 
https://github.com/bro/bro/compare/topic;jsiwek;bit-348#diff-a06b37d4ebafffdc8f9f39cca155b753R649

There's a new check that if the the last sequence seen is less than the last 
sequence that the reassembler is to treat as "old", then consider that case as 
"no data pending".  The pcap you sent me matches this case in the new code on 
certain calls and ends up returning a different value than the old code.  This 
matters in determining the connection duration because the connection will stop 
updating it's "last time" if both endpoints are treated as "closed with no 
pending data".  In the old code, the responder endpoint is prevented from 
satisfying the "no pending data" requirement and so always updates the "last 
time" on every packet.  In the new code, as soon as the FIN exchange is 
complete, both endpoints meet the "closed with no pending data" requirement and 
"last time" of the connection no longer updates with subsequent packets.

My justification for adding the additional check in 
TCP_Reassembler::DataPending was/is that the intention of that method is to 
tell whether there's any holes in the sequence space that could possibly be 
filled by a later TCP packet delivered out of order.  I didn't anticipate this 
particular scenario, but the logic still holds: it's never possibly to 
"deliver" any such data because the reassembler has moved on... if it gets 
stuff from the sequence space it considers "old", it will be dropped.  In that 
sense, there's "no data pending" and I think the new code is more "correct".

And the way in which duration is measured seems a bit finicky/arbitrary to 
begin with, not sure the difference for cases like this are important?

Also, just to note: the same connection but w/ sane seq/ack numbers would be 
handled the same way between code versions, and that way would produce results 
that are the same as what the new code produces.

> Reassembler integer overflow issues. Data not delivered after 2GB
> -----------------------------------------------------------------
>
>                 Key: BIT-348
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-348
>             Project: Bro Issue Tracker
>          Issue Type: Problem
>          Components: Bro
>    Affects Versions: git/master
>            Reporter: gregor
>            Assignee: Robin Sommer
>            Priority: High
>              Labels: inttypes
>             Fix For: 2.3
>
>
> {noformat}
> #!rst
> The TCP Reassembler does not deliver any data to analyzers after the first 
> 2GB due to signed integer overflow (Actually it will deliver again between 
> 4--6GB, etc.) This happens silently, i.e., without content_gap events or 
> Undelivered calls. 
> This report superseded BIT-315, BIT-137
> The TCP Reassembler (and Reassem) base class use ``int`` to keep track of 
> sequence numbers and ``seq_delta`` to check for differences. If a connection 
> exceeds 2GB, the relative sequence numbers (int) used by the Reassembler 
> become negative. While many parts of the Reassembler still work (because 
> seq_delta still reports the correct difference) some parts do not. In 
> particular ``seq_to_skip`` is broken (and fails silently). There might well 
> be other parts of the Reassembler that fail 
> silently as well, that I haven't found yet. 
> See Comments in TCP_Reassembler.cc for more details. 
> The Reassembler should use int64. However this will require deep changes to 
> the Reassembler and the TCP Analyzer and TCP_Endpoint classes (since we also 
> store sequence numbers there). Also, the analyzer framework will need tweaks 
> as well (e.g., Undelivered uses ``int`` for sequence numbers, also has to go 
> to 64 bit)
> As a hotfix that seems to work I disabled the ``seq_to_skip`` features. It 
> wasn't used by any analyzer or policy script (Note, that seq_to_skip is 
> different from skip_deliveries). Hotfix is in 
> topic/gregor/reassembler-hotfix
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3-OD-03-012#6321)
_______________________________________________
bro-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

Reply via email to