[ 
https://bro-tracker.atlassian.net/browse/BIT-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20309#comment-20309
 ] 

Jon Siwek commented on BIT-1376:
--------------------------------

Maybe consider the patch in topic/jsiwek/bit-1376

Here is what I think happened:

child->parent sends MSG_LOG "selects=X canwrites=Y"
- the first chunk, announcing that a MSG_LOG follows, gets accepted
- the second chunk, carrying "selects=X canwrites=Y", gets rejected because
  the hard cap is reached

The queues drain a bit and then a similar message gets sent:

child->parent sends MSG_LOG "selects=X canwrites=Y"
- the first chunk, announcing that a MSG_LOG follows, gets accepted
- the second chunk, carrying "selects=X canwrites=Y", gets accepted because
  we're under the hard cap again

Now on the parent side, it reads a chunk that says MSG_LOG, then reads another 
chunk that says MSG_LOG and misinterprets it as the data that goes with the 
first MSG_LOG (which ends up being something like \x13\x00\x00\x00...).  It 
then reads the chunk with contents "selects=X canwrites=Y" and interprets that 
as the chunk containing message type information.  The message type is taken 
from the first byte, which is 's' (ASCII 115), and that is not a valid type.
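
To make the sequence above concrete, here is a minimal Python simulation of
it.  This is only a sketch, not Bro's actual ChunkedIO code: the class and
function names, the tiny cap of 3, the value 0x13 for MSG_LOG (guessed from
the \x13\x00\x00\x00 bytes above) and the set of valid types are all
assumptions made for illustration.

```python
from collections import deque

MSG_LOG = 0x13            # assumed value, guessed from the \x13\x00\x00\x00 bytes
VALID_TYPES = {MSG_LOG}   # hypothetical; the real set of message types is larger
HARD_CAP = 3              # hypothetical tiny cap on pending chunks


class CappedQueue:
    """Chunk queue that rejects individual chunks once the cap is reached."""

    def __init__(self, cap):
        self.cap = cap
        self.chunks = deque()

    def write(self, chunk):
        if len(self.chunks) >= self.cap:
            return False              # chunk is dropped, caller carries on
        self.chunks.append(chunk)
        return True


def send_log(queue, text):
    """Child side: one message goes out as two independent chunks."""
    queue.write(bytes([MSG_LOG, 0, 0, 0]))   # chunk 1: message type
    queue.write(text.encode())               # chunk 2: message data


def poll(chunks):
    """Parent side: consume chunks strictly as (type, data) pairs."""
    it = iter(chunks)
    for header in it:
        msg_type = header[0]                 # type lives in the first byte
        if msg_type not in VALID_TYPES:
            return f"unknown msg type {msg_type}"
        next(it, None)                       # whatever follows is taken as data
    return "ok"


def reproduce(cap=HARD_CAP):
    queue = CappedQueue(cap)
    send_log(queue, "earlier message")         # fills the queue to 2 chunks
    send_log(queue, "selects=X canwrites=Y")   # type chunk fits, data may not
    queue.chunks.popleft()                     # the queue drains a bit:
    queue.chunks.popleft()                     # the earlier message is consumed
    send_log(queue, "selects=X canwrites=Y")   # both chunks fit again
    return poll(queue.chunks)


print(reproduce())           # prints "unknown msg type 115"
print(reproduce(cap=100))    # prints "ok"
```

With the small cap the parent pairs the two type chunks with each other and
then reads 's' as a message type.  With a cap large enough that nothing is
rejected, the pairs stay aligned.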

Rejecting arbitrary chunks on the child-parent path seems like asking for 
things to get into a weird state, so the patch now relies only on shutting 
down child-child (remote peer) connections to deal with overload.  In the 
test case, memory usage looked stable, but the peers end up thrashing.  So 
the user would probably still need to intervene and put a higher-level 
solution in place (e.g. more proxies).  The difference is that the signal for 
overload problems is no longer a crash, just messages in communication.log 
(maybe something better, like a notice, can be done).

> method to reproduce "internal error: unknown msg type 115 in Poll()"
> --------------------------------------------------------------------
>
>                 Key: BIT-1376
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1376
>             Project: Bro Issue Tracker
>          Issue Type: Problem
>          Components: Bro
>            Reporter: Jon Siwek
>
> Justin found a modification to Bro and a script that together trigger the 
> "unknown msg type 115" bug.  This method seems to reproduce the problem 
> fairly reliably between two bro processes started from the command line.
> Patch:
> {code}
> diff --git a/src/ChunkedIO.h b/src/ChunkedIO.h
> index b590453..39af9b1 100644
> --- a/src/ChunkedIO.h
> +++ b/src/ChunkedIO.h
> @@ -223,10 +223,10 @@ private:
>  
>         // We report that we're filling up when there are more than this number
>         // of pending chunks.
> -       static const uint32 MAX_BUFFERED_CHUNKS_SOFT = 400000;
> +       static const uint32 MAX_BUFFERED_CHUNKS_SOFT = 40;
>  
>         // Maximum number of chunks we store in memory before rejecting writes.
> -       static const uint32 MAX_BUFFERED_CHUNKS = 500000;
> +       static const uint32 MAX_BUFFERED_CHUNKS = 50;
>  
>         char* read_buffer;
>         uint32 read_len;
> {code}
> Start a bro process like this:
> {code}
> $ cat test.bro 
> @load frameworks/communication/listen
> redef Communication::nodes += {
>     ["foo"] = [$host = 127.0.0.1, $sync=T]
> };
> global counters: table[string] of count &synchronized &default=0;
> event do_some (n:count)
> {
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters["thecounter"];
>     ++counters[peer_description];
>     if(counters["thecounter"] % 10000 == 0 ) {
>         Reporter::warning(fmt("I am %s and The counter is %d. my counter is %d",
>                               peer_description, counters["thecounter"],
>                               counters[peer_description]));
>     }
>     if(n != 0) {
>         schedule 1msec { do_some(n-1) };
>     } else {
>         Reporter::warning(fmt("The counter is %d", counters["thecounter"]));
>     }
> }
> event bro_init()
> {
>     schedule 1sec { do_some(1000000) };
>     schedule 2sec { do_some(1000000) };
>     schedule 3sec { do_some(1000000) };
> }
> $ bro -b ./test.bro
> {code}
> Then start another like this:
> {code}
> $ cat test.bro 
> @load base/frameworks/communication
> redef Communication::nodes += {
>     ["foo"] = [$host = 127.0.0.1, $events = /.*/, $connect=T, $sync=T,
>                $retry=5sec]
> };
> global counters: table[string] of count &synchronized &default=0;
> event check ()
>       {
>       print counters["thecounter"];
>         schedule 5sec { check() };
>       }
> event bro_init()
>       {
>         schedule 5sec { check() };
>       }
> $ bro -b ./test.bro 
> processing suspended
> processing continued
> 55069
> 58963
> 62831
> 66636
> internal error: unknown msg type 115 in Poll()
> Abort trap: 6
> {code}



