Here is a snapshot of 6213 collapsing:

outputBytes/hour

Time    Events  Value
2:00    2       3572
3:00    3960    875207
4:00    10879   6104331
5:00    926     265150
6:00    9814    3087231
7:00    9818    4884084
8:00    8088    2428783
9:00    9315    2787015
10:00   7463    1606426
11:00   10363   4189291
12:00   8813    2653230
1:00    8851    2771283
2:00    10739   3918438
3:00    10582   4374534
4:00    10564   3641628
5:00    6121    1278197
6:00    629     129938
7:00    14      2431

localQueryTraffic/hour

Time    Tries   Succ    Ratio
3:00    283     249     0.8798586572438163
4:00    428     420     0.9813084112149533
5:00    1       1       1.0
6:00    481     456     0.9480249480249481
7:00    634     607     0.9574132492113565
8:00    607     594     0.9785831960461285
9:00    639     616     0.9640062597809077
10:00   752     724     0.9627659574468085
11:00   883     855     0.9682899207248018
12:00   856     826     0.9649532710280374
1:00    1045    1022    0.9779904306220095
2:00    1347    1280    0.9502598366740905
3:00    1427    1152    0.8072880168185004
4:00    794     491     0.6183879093198993
5:00    1060    193     0.1820754716981132
6:00    225     0       0.0
7:00    54      0       0.0

Now, when we go find out why it died (from env):

Class                                                           Threads used
Checkpoint: Connection opener                                   52
freenet.interfaces.LocalNIOInterface$ConnectionShell            2
freenet.interfaces.PublicNIOInterface$ConnectionShell           5
freenet.node.states.data.DataStateInitiator                     1
freenet.node.states.data.TrailerWriteCallbackMessage:true:true  1

:-( I don't have the memory to burn on thousands of threads (unless Y
is significantly better than Q).  And the effect it has (from general):

Pooled threads running jobs             60 (133.3%)
Reason for refusing connections: activeThreads(60) >= maximumThreads (45)


:-(  By running itself out of threads, my node effectively shuts down.
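The 133.3% figure above is just running jobs divided by the thread cap.
A minimal sketch of that arithmetic and the refusal check (method and
field names are taken from the log line, everything else is assumed):

```java
// Hypothetical sketch of the overload check implied by the log excerpt;
// activeThreads/maximumThreads come from the log, the rest is assumed.
public class ThreadLoadCheck {
    static double loadPercent(int activeThreads, int maximumThreads) {
        return 100.0 * activeThreads / maximumThreads;
    }

    static boolean refuseConnections(int activeThreads, int maximumThreads) {
        return activeThreads >= maximumThreads;
    }

    public static void main(String[] args) {
        // The figures from the snapshot: 60 running jobs against a 45-thread cap.
        System.out.printf("%.1f%%%n", loadPercent(60, 45));   // prints 133.3%
        System.out.println(refuseConnections(60, 45));        // prints true
    }
}
```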

I get the feeling that chewing up threads on operations that block
indefinitely is a bad idea.  Either connections are not timing out, or
we are trying to contact a class of node that cannot be contacted
(behind firewalls, NATs), or we are timing out too slowly, or...
Using threads to serve content from my store is more important than
opening a random connection.
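One way to keep a blocked open from pinning a thread indefinitely is an
explicit connect timeout, so a peer behind a firewall or NAT fails fast.
A minimal sketch using the standard blocking socket API (the timeout
value is illustrative; this is not Freenet's connection code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch: bound a connection open with an explicit timeout so an
// unreachable peer releases the thread instead of blocking forever.
public class BoundedOpen {
    static Socket openWithTimeout(String host, int port, int timeoutMillis)
            throws IOException {
        Socket s = new Socket();
        try {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return s;
        } catch (SocketTimeoutException e) {
            s.close();  // give the thread back instead of hanging on the open
            throw e;
        }
    }
}
```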

The cute thing is that my node looks like it is trying to announce to
get things moving again, but at 133% load that just isn't gonna
happen.  Looking again just now, we are up to 144% load.

I'm sure it can recover, but it would be better to handle the
situation gracefully.  Reserve 2 threads to serve content from the
local datastore, and accept those Qs that can be served out of the
local store; then at least we can still saturate the upstream serving
content when this sort of thing happens.  Consider killing stalled
opens.  Figure out whether there is a class that just cannot be
opened, and find ways to avoid trying to open it.  And last, NIOize
opens.
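The "leave 2 threads for the local datastore" idea could be sketched as
a pool that refuses connection opens once it is within a reserved margin
of the cap, while still accepting local-store queries up to the hard
limit.  All names here are hypothetical; this is not Freenet's pool:

```java
// Sketch of a thread pool with a reserved margin for local-store work.
// Remote opens are refused at (maximumThreads - reserved); local-store
// queries are accepted until the hard limit.
public class ReservedPool {
    private final int maximumThreads;
    private final int reserved;
    private int activeThreads = 0;

    ReservedPool(int maximumThreads, int reserved) {
        this.maximumThreads = maximumThreads;
        this.reserved = reserved;
    }

    synchronized boolean tryStart(boolean localStoreQuery) {
        int limit = localStoreQuery ? maximumThreads
                                    : maximumThreads - reserved;
        if (activeThreads >= limit) return false;  // refuse the job
        activeThreads++;
        return true;
    }

    synchronized void finished() {
        activeThreads--;
    }
}
```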

Another strategy would be to figure out how much stack a thread
actually needs and trim the stacks down so they don't chew up memory
as fast; that might allow me to allocate more threads.
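Java does expose a per-thread stack size hint via the four-argument
Thread constructor (the VM is free to ignore it).  A sketch, with an
illustrative 128 KB figure rather than a measured one:

```java
// Sketch: spawn a thread with a reduced stack-size hint. With smaller
// stacks than the VM default, the same amount of memory covers more
// threads. The 128 KB value is illustrative, not tuned.
public class SmallStackThreads {
    public static Thread spawn(Runnable job, String name) {
        long stackSize = 128 * 1024;  // hint in bytes; 0 means "VM default"
        Thread t = new Thread(null, job, name, stackSize);
        t.start();
        return t;
    }
}
```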
_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
