Re: Re[4]: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-17 Thread Joel Reymont


On Dec 16, 2005, at 1:41 PM, Bulat Ziganshin wrote:

>> JR> I do not have several fixed waiting periods, they are determined by
>> JR> the user.
>
> by the user of the library? by the poker player? what exactly do you mean?


By the user of the library. Timers are used imprecisely, to send a  
timeout event if the server did not respond in X seconds or to send  
something after Y seconds.
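
A minimal sketch of the first use case (a timeout event when the server does not respond in time), assuming the wait can be written as a blocking IO action. It relies on `System.Timeout.timeout` from base, which is not what the code in this thread uses; `Event`, `awaitReply` and `slowReply` are hypothetical names introduced only for illustration.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- Hypothetical event type: either the server's reply or a timeout notice.
data Event a = Reply a | TimedOut
  deriving (Eq, Show)

-- Run a blocking receive action, giving up after the given number of
-- microseconds (the same unit threadDelay uses).
awaitReply :: Int -> IO a -> IO (Event a)
awaitReply micros recv = fmap (maybe TimedOut Reply) (timeout micros recv)

-- Stand-in for a server that needs 0.2 seconds to answer (demo only).
slowReply :: IO String
slowReply = threadDelay 200000 >> return "pong"
```

Calling `awaitReply 10000 slowReply` gives up before the reply arrives, while a one-second limit lets the reply through. Since the timers here only need to be imprecise, no finer control is attempted.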


Joel

--
http://wagerlabs.com/





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-17 Thread Tomasz Zielonka
On Fri, Dec 16, 2005 at 04:41:05PM +0300, Bulat Ziganshin wrote:
> Hello Joel,
>
> Friday, December 16, 2005, 3:22:46 AM, you wrote:
>
> >> TZ> You don't have to check every few seconds. You can determine
> >> TZ> exactly how much you have to sleep - just check the timeout/event
> >> TZ> with the lowest ClockTime.
>
> JR> The scenario above does account for the situation that you are
> JR> describing.
>
> to be exact - Tomasz's variant doesn't work properly in this situation,
> but your code (which does not use this technique) is ok

Well, what I said was just a sketch. Of course you have to somehow
handle timeout requests coming during the sleep.
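
One way to picture the idea, as a sketch under stated assumptions rather than anyone's actual implementation: compute the sleep from the earliest deadline, and make the sleep interruptible so that a newly registered timer can wake the thread early. Deadlines are plain `Integer` microseconds here instead of `ClockTime`, and `sleepFor`, `sleepOrWake` and the `wake` MVar are hypothetical names.

```haskell
import Control.Concurrent.MVar (MVar, takeMVar)
import Data.Maybe (isJust)
import qualified Data.Map as M
import System.Timeout (timeout)

-- Assumption of this sketch: deadlines are microseconds since some epoch.
type Deadline = Integer

-- How long the timer thread may sleep: Nothing when no timers are pending,
-- otherwise the non-negative gap between now and the earliest deadline.
sleepFor :: Deadline -> M.Map Deadline a -> Maybe Integer
sleepFor now pending =
  fmap (\(earliest, _) -> max 0 (earliest - now)) (M.lookupMin pending)

-- Sleep for at most the given number of microseconds, but return early
-- (with True) if another thread fills the MVar to announce a new timer.
sleepOrWake :: MVar () -> Int -> IO Bool
sleepOrWake wake micros = fmap isJust (timeout micros (takeMVar wake))
```

The thread that registers a timer would put `()` into `wake` after inserting; the timer thread then recomputes `sleepFor` and goes back to sleep.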

Best regards
Tomasz

-- 
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-16 Thread Einar Karttunen
On 16.12 07:03, Tomasz Zielonka wrote:
> On 12/16/05, Einar Karttunen <ekarttun@cs.helsinki.fi> wrote:
> > To make matters nontrivial, all the *nix variants use a different,
> > more efficient replacement for poll.
>
> So we should find a library that offers a unified
> interface for all of them, or implement one ourselves.
>
> I am pretty sure such a library exists. It should fall back to select()
> or poll() on platforms that don't have better alternatives.

network-alt has select(2), epoll, blocking and very experimental kqueue
(the last one is not yet committed, but I can supply patches
if someone is interested).

- Einar


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-16 Thread Lennart Augustsson

John Meacham wrote:
> On Thu, Dec 15, 2005 at 02:02:02PM -0000, Simon Marlow wrote:
>
>> With 2k connections the overhead of select() is going to start to be a
>> problem.  You would notice the system time going up.  -threaded may help
>> with this, because it calls select() less often.
>
> we should be using /dev/poll on systems that support it.

And kqueue for systems that support that.  Much, much more efficient
than select.

-- Lennart



RE: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-16 Thread Simon Marlow
On 16 December 2005 15:19, Lennart Augustsson wrote:

> John Meacham wrote:
>> On Thu, Dec 15, 2005 at 02:02:02PM -0000, Simon Marlow wrote:
>>
>>> With 2k connections the overhead of select() is going to start to
>>> be a problem.  You would notice the system time going up.
>>> -threaded may help with this, because it calls select() less often.
>>
>> we should be using /dev/poll on systems that support it.
>
> And kqueue for systems that support that.  Much, much more efficient
> than select.

Yeah, yeah.  We know.  We just haven't got around to doing anything
about it :-(  It's actually quite fiddly to hook this up to Handles -
see Einar's implementation in Network.Alt for instance.

Cheers,
Simon (who wished he hadn't mentioned select() again)


Re[2]: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-16 Thread Bulat Ziganshin
Hello Simon,

Thursday, December 15, 2005, 4:53:27 PM, you wrote:

SM> The 3k threads are still GC'd, but they are not actually *copied* during
SM> GC.

SM> It'll increase the memory overhead per thread from 2k (1k * 2 for
SM> copying) to 4k (4k block, no overhead for copying).

Simon, why not include this in the base package? either change
something so that 1k threads are not copied during GC, or at
least increase the default stack size? this would improve the performance of
other heavily multi-threaded programs. the memory cost does not seem that great

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re[4]: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-16 Thread Bulat Ziganshin
Hello Joel,

Friday, December 16, 2005, 3:22:46 AM, you wrote:

>> TZ> You don't have to check every few seconds. You can determine
>> TZ> exactly how much you have to sleep - just check the timeout/event
>> TZ> with the lowest ClockTime.

JR> The scenario above does account for the situation that you are
JR> describing.

to be exact - Tomasz's variant doesn't work properly in this situation,
but your code (which does not use this technique) is ok

>> i repeat my thought - if you have one or several fixed waiting periods
>> (say, 1 sec, 3 sec and 1 minute), then you don't even need to sort
>> requests - just use one waking thread for each waiting period and
>> requests will arrive already sorted. in this way, you can really
>> sleep as Tomasz suggests

JR> I do not have several fixed waiting periods, they are determined by
JR> the user.

by the user of the library? by the poker player? what exactly do you mean?





-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Joel Reymont
Well, my understanding is that once I do a takeMVar I must do a  
putMVar under any circumstances. This is why I was blocking checkTimers.


On Dec 15, 2005, at 12:08 AM, Einar Karttunen wrote:

> Is there a reason you need block for checkTimers?
> What you certainly want to do is ignore exceptions
> from the timer actions.


--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Tomasz Zielonka
On Thu, Dec 15, 2005 at 09:32:38AM +0000, Joel Reymont wrote:
> Well, my understanding is that once I do a takeMVar I must do a
> putMVar under any circumstances. This is why I was blocking checkTimers.

Perhaps you could use modifyMVar:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent-MVar.html#v%3AmodifyMVar

  modifyMVar_ :: MVar a -> (a -> IO a) -> IO ()

  A safe wrapper for modifying the contents of an MVar. Like withMVar,
  modifyMVar will replace the original contents of the MVar if an
  exception is raised during the operation.

  modifyMVar :: MVar a -> (a -> IO (a, b)) -> IO b

  A slight variation on modifyMVar_ that allows a value to be returned (b)
  in addition to the modified value of the MVar.
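
To make the exception behaviour concrete, here is a small hedged sketch. `demo` is a hypothetical name, and the failing update is simulated with `throwIO`; the point is only that the MVar is full again, with its old contents, after the exception.

```haskell
import Control.Concurrent.MVar
import Control.Exception (SomeException, throwIO, try)

-- Hypothetical demonstration: a failing update leaves the MVar full and
-- unchanged, so the next update still sees the old value.
demo :: IO (Int, Int)
demo = do
  mv <- newMVar (1 :: Int)
  -- This update throws; modifyMVar_ puts the original contents (1) back.
  _ <- try (modifyMVar_ mv (\_ -> throwIO (userError "boom")))
         :: IO (Either SomeException ())
  -- modifyMVar returns a result (the old value) alongside the new contents.
  old <- modifyMVar mv (\n -> return (n + 1, n))
  new <- readMVar mv
  return (old, new)
```

With a bare takeMVar/putMVar pair, the same exception would leave the MVar empty and block every later reader, which is the situation the `block` calls were trying to avoid.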

Best regards
Tomasz

-- 
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Joel Reymont


On Dec 15, 2005, at 12:08 AM, Einar Karttunen wrote:


>> timeout = 500 -- 1 second
>
> Is that correct?

I think so. threadDelay takes microseconds.

> Here is a nice trick for you:

Thanks!

>> --- The filter expression is kind of long...
>> stopTimer :: String -> IO ()
>> stopTimer name =
>>     block $ do t <- takeMVar timers
>>                putMVar timers $
>>                    M.filterWithKey (\(_, k) _ -> k /= name) t
>
> And slow. This is O(size_of_map)

Any way to optimize it? I need timer ids so that I can remove a timer
before it expires. And I need ClockTime as key so that I don't
have to wake up every second, etc.
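
One possible answer, sketched under assumptions and not taken from the thread: keep a second map from timer id to deadline, so that cancelling is a lookup plus two deletes, each O(log n), while the deadline-ordered map still yields the earliest timer cheaply. All names below (`Timers`, `insertTimer`, `cancelTimer`, `nextTimer`) are hypothetical, and deadlines are plain `Integer` values instead of `ClockTime`.

```haskell
import qualified Data.Map as M

type TimerId  = Int
type Deadline = Integer   -- assumption: e.g. microseconds since an epoch

data Timers a = Timers
  { byDeadline :: M.Map (Deadline, TimerId) a  -- ordered by expiry time
  , byId       :: M.Map TimerId Deadline       -- reverse index for cancel
  , nextId     :: TimerId                      -- next unused id
  }

emptyTimers :: Timers a
emptyTimers = Timers M.empty M.empty 0

-- Register an action; returns the id to cancel it with. O(log n).
insertTimer :: Deadline -> a -> Timers a -> (TimerId, Timers a)
insertTimer d x (Timers q ix n) =
  (n, Timers (M.insert (d, n) x q) (M.insert n d ix) (n + 1))

-- Cancel by id: one lookup in the index, one delete in each map. O(log n).
cancelTimer :: TimerId -> Timers a -> Timers a
cancelTimer tid t@(Timers q ix n) =
  case M.lookup tid ix of
    Nothing -> t
    Just d  -> Timers (M.delete (d, tid) q) (M.delete tid ix) n

-- Earliest pending timer, if any. O(log n).
nextTimer :: Timers a -> Maybe ((Deadline, TimerId), a)
nextTimer = M.lookupMin . byDeadline
```

The whole `Timers` value would live in the existing MVar and be updated with `modifyMVar`, which can hand the new id back to the caller.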


Joel

--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Joel Reymont
Here are statistics that I gathered. I'm almost done modifying the  
program to use 1 timer thread instead of 1 per bot as well as writing  
to the socket from the writer thread. This should reduce the number  
of threads from 6k (2k x 3) to 2k plus change.


It appears that +RTS -k3k does make a difference. As per Simon, 2-4k  
avoids the thread being garbage collected because each thread gets  
its own block in the storage manager. Simon, did I get that right?


BTW, how does garbage-collecting a thread work in this scenario? My
threads are very long-running.


The total is the number of bots launched, lobby is how many bots  
connected to the lobby. Failed is mostly due to connection reset by  
peer errors. The Windows C++ server uses IOCP and running a firewall  
was apparently interfering with that somehow. I hate Windows :-(.


--- Test#1 +RTS -k3k as per Simon. Keep-alive timeout of 9 minutes.

Total:   1961, Lobby:   1961, Failed:  0
Total:   2000, Lobby:   2000, Failed:  1

This test went smoothly and got to 2k connections very quickly. Maybe  
within 30 minutes or so. I did not gather CPU usage, etc. statistics.


--- Test #2, No thread stack increase, 1 minute keep-alive timeout,  
more network traffic


With a 1 minute timeout things run veeery slow. 86 physical and 158Mb  
of VM with 1k bots, CPU 50-60%. Data sent/received is 60-70 packets  
and 6-7kb/sec. Killed after a while.


The statistics are phys/VM, CPU usage in % and #packets/transfer speed

Total:   1345, Lobby:   1326, Failed:  0, 102/184, 50%, 90/8kb
Total:   1395, Lobby:   1367, Failed:  2
Total:   1421, Lobby:   1394, Failed:  4
Total:   1490, Lobby:   1463, Failed:  4, 108/194, 50%, 110/11Kb
Total:   1574, Lobby:   1546, Failed:  4, 113/202, 50%, 116/11kb

--- Test #3, Rebuilding app with basic logging only (level 10). Still
veeery slow. Started ~6pm


Total:   121, Lobby:   118, Failed:  1
Total:   521, Lobby:   509, Failed:  13, 46/104, 20-30%, 35/3kb
Total:   1055, Lobby:   1044, Failed:  13, 94/168, 50%
Total:   1325, Lobby:   1313, Failed:  13
Total:   1566, Lobby:   1553, Failed:  13, 126/215, 70-80%,
Total:   1692, Lobby:   1680, Failed:  13, 136/228, 80%
Total:   1728, Lobby:   1715, Failed:  13, 140/234, 85%
Total:   1746, Lobby:   1733, Failed:  13, 140/235, 50-85%, 6:39pm
Total:   1818, Lobby:   1805, Failed:  13, 145/240, 60-85%,
Total:   1896, Lobby:   1883, Failed:  13, 153/250, 60-85%, 7:01pm
Total:   1933, Lobby:   1919, Failed:  13, 155/255, 70-85%, 7:12pm

System has 216Mb of spare physical memory at this point but the app  
seems to spend most of the time collecting garbage.


Total:   1999, Lobby:   1986, Failed:  13, 162/262, 65-86%, 7:41pm

--
http://wagerlabs.com/







Timers (was Re: [Haskell-cafe] Optimizing a high-traffic network architecture)

2005-12-15 Thread Joel Reymont
After a chat with Einar on #haskell I realized that I would have,  
say, 4k expiring timers and maybe 12k timers that are started and  
then killed. That would make a 16k element map on which 3/4 of the  
operations are O(n=16k) (Einar).


I need a better abstraction I guess. I also need to be able to find
timers by id instead of by name like now, since each bot will use the
same timer name for the same operation. I should have startTimer
return X and then kill the timer using the same X.


I'm looking for suggestions. Here's the improved code:

---
{-# OPTIONS_GHC -fglasgow-exts -fno-cse #-}
module Timer
(
startTimer,
stopTimer
)
where

import qualified Data.Map as M
import System.Time
import System.IO.Unsafe
import Control.Exception
import Control.Concurrent

--- Map timer name and kick-off time to action
type Timers = M.Map (ClockTime, String) (IO ())

timeout :: Int
timeout = 500 -- 1 second

{-# NOINLINE timers #-}
timers :: MVar Timers
timers = unsafePerformIO $ do mv <- newMVar M.empty
                              forkIO $ checkTimers
                              return mv

--- Not sure if this is the most efficient way to do it
startTimer :: String -> Int -> (IO ()) -> IO ()
startTimer name delay io =
    do stopTimer name
       now <- getClockTime
       let plus = TimeDiff 0 0 0 0 0 delay 0
           future = addToClockTime plus now
       block $ do t <- takeMVar timers
                  putMVar timers $ M.insert (future, name) io t

--- The filter expression is kind of long...
stopTimer :: String -> IO ()
stopTimer name =
    block $ do t <- takeMVar timers
               putMVar timers $
                   M.filterWithKey (\(_, k) _ -> k /= name) t

--- Now runs unblocked
checkTimers :: IO ()
checkTimers =
    do t <- readMVar timers -- takes it and puts it back
       case M.size t of
         -- no timers
         0 -> threadDelay timeout
         -- some timers
         _ -> do let (key@(time, _), io) = M.findMin t
                 now <- getClockTime
                 if (time <= now)
                    then do modifyMVar_ timers $ \a ->
                                return $! M.delete key a
                            try $ io -- don't think we care
                            return ()
                    else threadDelay timeout
       checkTimers



--
http://wagerlabs.com/







Re: Timers (was Re: [Haskell-cafe] Optimizing a high-traffic network architecture)

2005-12-15 Thread Joel Reymont
One idea would be to index the timer on ThreadId and name and stick  
Nothing into the timer action once the timer has been fired/stopped.  
Since timers are restarted with the same name quite often this would  
just keep one relatively big map in memory. The additional ThreadId  
would help distinguish the timers and avoid clashes.


On Dec 15, 2005, at 10:41 AM, Joel Reymont wrote:

> After a chat with Einar on #haskell I realized that I would have,
> say, 4k expiring timers and maybe 12k timers that are started and
> then killed. That would make a 16k element map on which 3/4 of the
> operations are O(n=16k) (Einar).
>
> I need a better abstraction I guess. I also need to be able to find
> timers by id instead of by name like now, since each bot will use
> the same timer name for the same operation. I should have startTimer
> return X and then kill the timer using the same X.


--
http://wagerlabs.com/







Re: Timers (was Re: [Haskell-cafe] Optimizing a high-traffic network architecture)

2005-12-15 Thread Tomasz Zielonka
On Thu, Dec 15, 2005 at 10:46:55AM +0000, Joel Reymont wrote:
> One idea would be to index the timer on ThreadId and name and stick
> Nothing into the timer action once the timer has been fired/stopped.
> Since timers are restarted with the same name quite often this would
> just keep one relatively big map in memory. The additional ThreadId
> would help distinguish the timers and avoid clashes.

I don't know how you use your timers, but perhaps startTimer could
return a cancel action? Its type would be

    startTimer :: Int -> (IO ()) -> IO (IO ())

and you would use it like this

    cancel <- startTimer delay action
    ...
    cancel

How cancelling was implemented would be entirely startTimer's business.
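
As an illustration only, here is about the simplest implementation with that type. It assumes one lightweight thread per timer and a delay in microseconds, which is not the single-timer-thread design discussed elsewhere in this thread; the `cancelled` flag is a hypothetical detail hidden behind the returned action.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Start a timer that runs the action after the given delay (microseconds).
-- The result is the cancel action for this particular timer.
startTimer :: Int -> IO () -> IO (IO ())
startTimer delay action = do
  cancelled <- newIORef False
  _ <- forkIO $ do
    threadDelay delay
    stop <- readIORef cancelled
    -- Fire only if nobody cancelled while we were asleep.
    unless stop action
  return (writeIORef cancelled True)
```

A cancel that arrives after the flag has been read cannot stop the action, which is acceptable for imprecise timers. Because callers only ever see `IO ()`, the body could later be replaced by a shared map and a single timer thread without touching them.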

Best regards
Tomasz

-- 
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland


RE: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Simon Marlow
On 15 December 2005 10:21, Joel Reymont wrote:

> Here are statistics that I gathered. I'm almost done modifying the
> program to use 1 timer thread instead of 1 per bot as well as writing
> to the socket from the writer thread. This should reduce the number
> of threads from 6k (2k x 3) to 2k plus change.
>
> It appears that +RTS -k3k does make a difference. As per Simon, 2-4k
> avoids the thread being garbage collected because each thread gets
> its own block in the storage manager. Simon, did I get that right?

The 3k threads are still GC'd, but they are not actually *copied* during
GC.

It'll increase the memory overhead per thread from 2k (1k * 2 for
copying) to 4k (4k block, no overhead for copying).

Cheers,
Simon



RE: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Simon Marlow
On 15 December 2005 10:21, Joel Reymont wrote:

> Here are statistics that I gathered. I'm almost done modifying the
> program to use 1 timer thread instead of 1 per bot as well as writing
> to the socket from the writer thread. This should reduce the number
> of threads from 6k (2k x 3) to 2k plus change.
>
> It appears that +RTS -k3k does make a difference. As per Simon, 2-4k
> avoids the thread being garbage collected because each thread gets
> its own block in the storage manager. Simon, did I get that right?
>
> BTW, how does garbage-collecting a thread work in this scenario? My
> threads are very long-running.
>
> The total is the number of bots launched, lobby is how many bots
> connected to the lobby. Failed is mostly due to connection reset by
> peer errors. The Windows C++ server uses IOCP and running a firewall
> was apparently interfering with that somehow. I hate Windows :-(.
>
> --- Test#1 +RTS -k3k as per Simon. Keep-alive timeout of 9 minutes.
>
> Total:   1961, Lobby:   1961, Failed:  0
> Total:   2000, Lobby:   2000, Failed:  1
>
> This test went smoothly and got to 2k connections very quickly. Maybe
> within 30 minutes or so. I did not gather CPU usage, etc. statistics.
>
> --- Test #2, No thread stack increase, 1 minute keep-alive timeout,
> more network traffic
>
> With a 1 minute timeout things run veeery slow. 86 physical and 158Mb
> of VM with 1k bots, CPU 50-60%. Data sent/received is 60-70 packets
> and 6-7kb/sec. Killed after a while.
>
> The statistics are phys/VM, CPU usage in % and #packets/transfer speed
>
> Total:   1345, Lobby:   1326, Failed:  0, 102/184, 50%, 90/8kb
> Total:   1395, Lobby:   1367, Failed:  2
> Total:   1421, Lobby:   1394, Failed:  4
> Total:   1490, Lobby:   1463, Failed:  4, 108/194, 50%, 110/11Kb
> Total:   1574, Lobby:   1546, Failed:  4, 113/202, 50%, 116/11kb

Hmm, your machine is spending 50% of its time doing nothing, and the
network traffic is very low.  I wouldn't expect 2k connections to pose
any problem at all, so further investigation is definitely required.

With 2k connections the overhead of select() is going to start to be a
problem.  You would notice the system time going up.  -threaded may help
with this, because it calls select() less often.

If that's not the cause, we should find out what your app is doing while
it's idle.  If there are runnable threads (e.g. the launcher), then the
app should not be spending any of its time idle.

Cheers,
Simon


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Joel Reymont


On Dec 15, 2005, at 2:02 PM, Simon Marlow wrote:

>> The statistics are phys/VM, CPU usage in % and #packets/transfer speed
>>
>> Total:   1345, Lobby:   1326, Failed:  0, 102/184, 50%, 90/8kb
>> Total:   1395, Lobby:   1367, Failed:  2
>> Total:   1421, Lobby:   1394, Failed:  4
>> Total:   1490, Lobby:   1463, Failed:  4, 108/194, 50%, 110/11Kb
>> Total:   1574, Lobby:   1546, Failed:  4, 113/202, 50%, 116/11kb
>
> Hmm, your machine is spending 50% of its time doing nothing, and the
> network traffic is very low.  I wouldn't expect 2k connections to pose
> any problem at all, so further investigation is definitely required.


That's CPU utilization by the program. My laptop is actually running  
a lot of other stuff as well, although the other stuff is not  
consuming much CPU.



> With 2k connections the overhead of select() is going to start to be a
> problem.  You would notice the system time going up.  -threaded may help
> with this, because it calls select() less often.


I'm testing 4k connections now but I think the app is spending most  
of the time collecting garbage :-). Well, running handlers on those  
keep-alive packets as well to update internal state.


I think I would need to profile next. I would love to see a report of  
data in drag/void state but it's impossible since I'm using STM.  
Unless I can hack support for STM into profiling myself (unlikely?  
any pointers?) I think I'll have to move away from STM just to  
profile the program.


Joel


--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Joel Reymont


On Dec 15, 2005, at 2:02 PM, Simon Marlow wrote:

> Hmm, your machine is spending 50% of its time doing nothing, and the
> network traffic is very low.  I wouldn't expect 2k connections to pose
> any problem at all, so further investigation is definitely required.
>
> With 2k connections the overhead of select() is going to start to be a
> problem.  You would notice the system time going up.  -threaded may help
> with this, because it calls select() less often.


I ran two more tests today after making a few changes. The end result  
is that increasing the thread stack space makes the program run  
significantly faster as it was able to launch 1,000 more bots within  
the same hour.


Looking at the end of the 2nd test, 267Mb of physical memory and  
423Mb of VM are something that I will need to really look into. 80%  
CPU utilization by the app is probably a combination of select on 4k  
sockets


The 89 failures are all connections reset by peer, probable cause is  
my wireless LAN.


I'm now using the threaded runtime. Worker threads write to the  
socket. There's one thread monitoring all the timers. Started about  
12:30pm with no thread stack increase and full (very verbose) logging.


It's running 5 OS threads pretty consistently.

Total:  399, Lobby:  398, Failed: 0, 26/81, 10-20%,
Total:  819, Lobby:  810, Failed: 0, 52/119, 20-30%
Total: 1051, Lobby: 1048, Failed: 0, 63/136, 30-50%
Total: 1229, Lobby: 1219, Failed: 0, 74/153, 30-50%
Total: 1318, Lobby: 1299, Failed: 0, 76/157, 30-50%
Total: 1448, Lobby: 1433, Failed: 0, 82/167, 40-60%, 13:06
Total: 1544, Lobby: 1526, Failed: 0, 86/174, 50-60%, 13:13
Total: 1672, Lobby: 1648, Failed: 0, 90/182, 50-60%, 13:23
Total: 1754, Lobby: 1727, Failed: 0, 91/186, 40-60%, 13:31
Total: 1824, Lobby: 1796, Failed: 0, 93/189, 50-60%, 13:40

With reduced logging and +RTS -k3k. Started at 13:42, 4 OS threads.

Total:  367, Lobby:  363, Failed: 0,  24/76, 10%, 13:49
Total:  516, Lobby:  510, Failed: 14, 34/91, 10-20%, 13:52
Total:  841, Lobby:  836, Failed: 17, 49/116, 20% , 13:56
Total: 1450, Lobby: 1434, Failed: 34, 97/181, 20-50-80%, 14:08
Total: 2008, Lobby: 1999, Failed: 35, 133/234, 70-80%, 14:20
Total: 2318, Lobby: 2308, Failed: 35, 154/263, 70-85%, 14:29
Total: 2623, Lobby: 2613, Failed: 35, 174/293, 70-80%, 14:39
Total: 2862, Lobby: 2854, Failed: 35, 191/316, 70-80%, 14:47
Total: 3151, Lobby: 3142, Failed: 40, 214/347, 60-80%, 14:56
Total: 3364, Lobby: 3355, Failed: 40, 219/359, 60-80%, 15:03
Total: 3808, Lobby: 3744, Failed: 89, 247/398, 70-85%, 15:19
Total: 4000, Lobby: 3938, Failed: 89, 267/423, 80%, 15:27

The system has 120+Mb of free physical memory around 3pm but is not  
swapping heavily as the number of page outs is not increasing.  
There's a total of 1Gb of physical memory. 4 OS threads became 5 at  
some point.


--
http://wagerlabs.com/







Re[2]: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Bulat Ziganshin
Hello Joel,

Thursday, December 15, 2005, 5:13:17 PM, you wrote:
>>> The statistics are phys/VM, CPU usage in % and #packets/transfer
>>> speed
>>>
>>> Total:   1345, Lobby:   1326, Failed:  0, 102/184, 50%, 90/8kb
>>> Total:   1395, Lobby:   1367, Failed:  2
>>> Total:   1421, Lobby:   1394, Failed:  4
>>> Total:   1490, Lobby:   1463, Failed:  4, 108/194, 50%, 110/11Kb
>>> Total:   1574, Lobby:   1546, Failed:  4, 113/202, 50%, 116/11kb

>> Hmm, your machine is spending 50% of its time doing nothing, and the
>> network traffic is very low.  I wouldn't expect 2k connections to pose
>> any problem at all, so further investigation is definitely required.

JR> That's CPU utilization by the program. My laptop is actually running
JR> a lot of other stuff as well, although the other stuff is not
JR> consuming much CPU.

if your program has something to do, but cpu usage is less than 100%,
this means (at least in windows) that your program is sitting in
system calls that wait for hardware, for example a read from
disk. your program may be waiting for network i/o or logging i/o. try
disabling these parts of the code and see how cpu utilization
changes



-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re[2]: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Bulat Ziganshin
Hello Tomasz,

Wednesday, December 14, 2005, 10:48:43 PM, you wrote:

TZ> You don't have to check every few seconds. You can determine
TZ> exactly how much you have to sleep - just check the timeout/event with
TZ> the lowest ClockTime.

this scenario doesn't take into account that we can receive a new request
while sleeping, and if this thread services different waiting periods, the
new message may require an earlier answer

TZ> On Wed, Dec 14, 2005 at 07:11:15PM +0000, Joel Reymont wrote:
>> I figure I can have a single timer thread and a timer map keyed on
>> ClockTime. I would try to get the min. key from the map every few
>> seconds, compare it to clock time, fire off the event as needed,
>> remove the timer and repeat.

i repeat my thought - if you have one or several fixed waiting periods
(say, 1 sec, 3 sec and 1 minute), then you don't even need to sort
requests - just use one waking thread for each waiting period and
requests will arrive already sorted. in this way, you can really
sleep as Tomasz suggests

Wednesday, December 14, 2005, 11:04:38 PM, you wrote:
JR> Right, thanks for the tip! I would need to wait a predefined amount of
JR> time when the map is empty, though.

no. you just read the next message from the Chan (but don't use MVar here!
;)


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]





Re: Re[2]: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Joel Reymont

Bulat,

On Dec 14, 2005, at 9:00 PM, Bulat Ziganshin wrote:


> TZ> You don't have to check every few seconds. You can determine
> TZ> exactly how much you have to sleep - just check the timeout/event
> TZ> with the lowest ClockTime.
>
> this scenario doesn't take into account that we can receive a new request
> while sleeping, and if this thread services different waiting periods, the
> new message may require an earlier answer



The scenario above does account for the situation that you are  
describing. We will always retrieve the minimum key and will fire the  
timer as long as it has expired. My timers don't need to be precise  
so this works for me.


checkTimers :: IO ()
checkTimers =
    do t <- readMVar timers -- takes it and puts it back
       case M.size t of
         -- no timers
         0 -> threadDelay timeout
         -- some timers
         _ -> do let (key@(Timer time _), io) = M.findMin t
                 TOD now _ <- getClockTime
                 if (time <= now)
                    then do stopTimer key
                            try $ io -- don't think we care
                            return ()
                    else threadDelay timeout
       checkTimers


> i repeat my thought - if you have one or several fixed waiting periods
> (say, 1 sec, 3 sec and 1 minute), then you don't even need to sort
> requests - just use one waking thread for each waiting period and
> requests will arrive already sorted. in this way, you can really
> sleep as Tomasz suggests


I do not have several fixed waiting periods, they are determined by  
the user.


Joel

--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread John Meacham
On Thu, Dec 15, 2005 at 02:02:02PM -0000, Simon Marlow wrote:
> With 2k connections the overhead of select() is going to start to be a
> problem.  You would notice the system time going up.  -threaded may help
> with this, because it calls select() less often.

we should be using /dev/poll on systems that support it. it cuts down on
the overhead a whole lot. 'poll(2)' is also mostly portable and usually
better than select since there is no arbitrary file descriptor limit and
it doesn't have to traverse the whole bitset. a few #ifdefs should let
us choose the optimum one available on any given system.

John

-- 
John Meacham - ⑆repetae.net⑆john⑈ 


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Einar Karttunen
On 15.12 17:14, John Meacham wrote:
> On Thu, Dec 15, 2005 at 02:02:02PM -0000, Simon Marlow wrote:
> > With 2k connections the overhead of select() is going to start to be a
> > problem.  You would notice the system time going up.  -threaded may help
> > with this, because it calls select() less often.
>
> we should be using /dev/poll on systems that support it. it cuts down on
> the overhead a whole lot. 'poll(2)' is also mostly portable and usually
> better than select since there is no arbitrary file descriptor limit and
> it doesn't have to traverse the whole bitset. a few #ifdefs should let
> us choose the optimum one available on any given system.

To make matters nontrivial, all the *nix variants use a different,
more efficient replacement for poll.

Solaris has /dev/poll
*BSD (and OS X) has kqueue
Linux has epoll

Also on linux NPTL+blocking calls can actually be very fast
with a suitable scenario. An additional problem is that
these mechanisms depend on the version of the kernel
running on the machine... Thus e.g. not all linux machines
will have epoll.

- Einar Karttunen


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Ketil Malde
Einar Karttunen <ekarttun@cs.helsinki.fi> writes:

> To make matters nontrivial, all the *nix variants use a different,
> more efficient replacement for poll.
>
> Solaris has /dev/poll
> *BSD (and OS X) has kqueue
> Linux has epoll

Since this is 'cafe, here's a page that has some performance testing of
epoll:

   http://lse.sourceforge.net/epoll/

> Thus e.g. not all linux machines will have epoll.

It is present in 2.6, but not 2.4?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants



Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Tomasz Zielonka
On 12/16/05, Einar Karttunen <ekarttun@cs.helsinki.fi> wrote:
> To make matters nontrivial, all the *nix variants use a different,
> more efficient replacement for poll.

So we should find a library that offers a unified
interface for all of them, or implement one ourselves.

I am pretty sure such a library exists. It should fall back to select()
or poll() on platforms that don't have better alternatives.

Best regards
Tomasz


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-15 Thread Andrew Pimlott
On Fri, Dec 16, 2005 at 07:03:46AM +0100, Tomasz Zielonka wrote:
> On 12/16/05, Einar Karttunen <ekarttun@cs.helsinki.fi> wrote:
> > To make matters nontrivial, all the *nix variants use a different,
> > more efficient replacement for poll.
>
> So we should find a library that offers a unified
> interface for all of them, or implement one ourselves.

http://monkey.org/~provos/libevent/

See also

http://www.kegel.com/c10k.html

Andrew


[Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Joel Reymont

Folks,

In my current architecture I launch two threads per socket, where
the socket reader places results in a TMVar and the socket writer
takes input from a TChan. I also have a worker thread that does the
bulk of packet processing and a timer thread. The timer thread sleeps
for a few minutes and exits after posting a timeout event if it
hasn't been killed before.


My goal is to launch 2,000 poker bots that join the server lobby
and sit there sending small keep-alive packets every few minutes. The
ultimate goal is for 4,000 bots to be playing but I'm taking it one
step at a time.


This is Mac OS X Tiger with a couple of header files modified to allow
FD_SETSIZE of 10240. This is the maximum allowed by 'ulimit -n'. I'm
also running ghc 6.4.1, compiled after FD_SETSIZE was increased.


I can get to 2k bots without any trouble if I use a keep-alive  
timeout of 9 minutes. Memory usage with 2k bots is 161Mb of physical  
memory and 262Mb VM. CPU usage 20-40%. Memory usage is constant once  
all bots have been launched.


With a 1-minute keep-alive timeout the system starts to get stressed
almost right away. There's verbose logging going on and almost every
event/packet sent and received is traced. The extra logging of the
timeout events probably adds to the stress and so, I assume, do the
extra packets. New bots are being launched very slowly even with just
200 bots already running.


Based on the above, would you have any suggestions for an improved  
architecture?


I will try 1) disabling logging altogether and 2) increasing the thread
stack size to 3k (+RTS -k3k) as per Simon Marlow's suggestion. As per
Simon, if a thread's stack space is between 2k and 4k then each thread
gets its own memory block (right, Simon?) and threads are not GCd then.


I'm a bit concerned about tripling my memory use with -k3k, though.
I'm not sure if switching to a continuations-based framework will
help me. Has anyone tried this?


Thanks, Joel

--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Bulat Ziganshin
Hello Joel,

Wednesday, December 14, 2005, 7:55:36 PM, you wrote:

JR> In my current architecture I launch two threads per socket, where
JR> the socket reader places results in a TMVar and the socket writer
JR> takes input from a TChan.

as i already said, you can write to socket directly in your worker
thread

JR> I also have a worker thread that does the
JR> bulk of packet processing and a timer thread. The timer thread sleeps
JR> for a few minutes and exits after posting a timeout event if it
JR> hasn't been killed before.

you can use just one timeouts thread for all your bots. if this
timeout is constant across program run, then this thread will be very
simple - just: 

1) read from Chan (yes, this is a case where using a Chan is appropriate! ;)
2) wait until 9 or so minutes from the time when this message was sent
3) send kill signal to the thread mentioned in message

so, you will have only 2 threads. you can then try to play with
combining socket reading and TMVar reading in one thread (btw, try
to replace TMVar with MVar - maybe it will be better?). or, you can
try to create one socket-reading thread, which will service all sockets.
maybe this can be done somehow with the help of the select() system call?
it is the more correct way, but i don't know how this can be
accomplished
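
The single timeout thread from steps 1-3 above can be sketched roughly as
follows. TimeoutReq, remainingMicros, timeoutThread and requestTimeout are
names made up for this illustration, GHC.Clock is used as a stand-in clock,
and the period is shortened to 0.2 seconds so the demo finishes quickly:

```haskell
import Control.Concurrent
import Control.Monad (forever)
import GHC.Clock (getMonotonicTime)

-- A request records when it was sent and which thread to kill on expiry.
data TimeoutReq = TimeoutReq Double ThreadId

-- Microseconds still to wait for a request sent at time t (in seconds).
remainingMicros :: Double -> Double -> Double -> Int
remainingMicros period t now = max 0 (ceiling ((t + period - now) * 1e6))

-- One thread serves every bot. Because the period is constant, requests
-- expire in the order they arrive, so a plain FIFO Chan is enough.
timeoutThread :: Double -> Chan TimeoutReq -> IO ()
timeoutThread period reqs = forever $ do
  TimeoutReq t tid <- readChan reqs              -- 1) read from the Chan
  now <- getMonotonicTime
  threadDelay (remainingMicros period t now)     -- 2) wait until t + period
  killThread tid                                 -- 3) kill the named thread

requestTimeout :: Chan TimeoutReq -> ThreadId -> IO ()
requestTimeout reqs tid = do
  now <- getMonotonicTime
  writeChan reqs (TimeoutReq now tid)

main :: IO ()
main = do
  reqs <- newChan
  _ <- forkIO (timeoutThread 0.2 reqs)
  done <- newEmptyMVar
  -- A stand-in "bot" that would otherwise sleep for ten seconds.
  bot <- forkFinally (threadDelay 10000000) (\_ -> putMVar done ())
  requestTimeout reqs bot
  takeMVar done
  putStrLn "bot was killed by the timeout thread"
```

The sketch has no way to cancel a pending request, and it relies on the
period being the same for every request; with per-request periods the FIFO
order no longer matches the expiry order.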

-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]





Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Joel Reymont


On Dec 14, 2005, at 6:06 PM, Bulat Ziganshin wrote:


> as i already said, you can write to socket directly in your worker
> thread


True. 1 less thread to deal with... multiplied by 4,000.


> you can use just one timeouts thread for all your bots. if this
> timeout is constant across program run, then this thread will be very
> simple - just:


Well, the bots may take a couple of hours to get on board. I don't  
think using one thread with a constant timeout is appropriate. This  
is also a keep-alive timeout, meaning that the bot sends a ping to  
server whenever the timer is fired.


I figure I can have a single timer thread and a timer map keyed on  
ClockTime. I would try to get the min. key from the map every few  
seconds, compare it to clock time, fire off the event as needed,
remove the timer and repeat. This way I will have a single timer  
thread but as many timers as I need.


Thanks, Joel

--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Tomasz Zielonka
On Wed, Dec 14, 2005 at 07:11:15PM +, Joel Reymont wrote:
> I figure I can have a single timer thread and a timer map keyed on
> ClockTime. I would try to get the min. key from the map every few
> seconds, compare it to clock time, fire off the event as needed,
> remove the timer and repeat.

You don't have to check every few seconds. You can determine
exactly how much you have to sleep - just check the timeout/event with
the lowest ClockTime.
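
A minimal sketch of that calculation, assuming the timers live in a Data.Map
ordered by expiry time. Micros is an Integer stand-in for ClockTime and
sleepFor is a name invented here; neither comes from the code discussed in
this thread:

```haskell
import qualified Data.Map as M

-- Stand-in for ClockTime: microseconds since some epoch (an assumption,
-- chosen so the sketch needs nothing beyond the containers package).
type Micros = Integer

-- Same shape as a timer map keyed on expiry time and timer name.
type Timers a = M.Map (Micros, String) a

-- How long the timer thread should sleep, in microseconds.
-- Nothing means the map is empty and the caller must pick a fallback.
sleepFor :: Micros -> Timers a -> Maybe Micros
sleepFor now timers = do
  ((expiry, _), _) <- M.lookupMin timers
  return (max 0 (expiry - now))

main :: IO ()
main = do
  let timers = M.fromList [((5000000, "ping"), ()), ((9000000, "lobby"), ())]
  print (sleepFor 2000000 timers)                   -- Just 3000000
  print (sleepFor 2000000 (M.empty :: Timers ()))   -- Nothing
```

A timer added during the sleep with an earlier expiry would still have to
wake the thread somehow; the sketch does not address that.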

Best regards
Tomasz

-- 
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Joel Reymont


On Dec 14, 2005, at 7:48 PM, Tomasz Zielonka wrote:


> You don't have to check every few seconds. You can determine
> exactly how much you have to sleep - just check the timeout/event with
> the lowest ClockTime.


Right, thanks for the tip! I would need to wait a predefined amount of
time when the map is empty, though.


--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Joel Reymont


On Dec 14, 2005, at 7:48 PM, Tomasz Zielonka wrote:


> You don't have to check every few seconds. You can determine
> exactly how much you have to sleep - just check the timeout/event with
> the lowest ClockTime.


Something like this? Comments are welcome!

It would be cool to not have to export and call initTimers somehow.

---
{-# OPTIONS_GHC -fglasgow-exts -fno-cse #-}
module Timer
(
 initTimers,
 startTimer,
 stopTimer
)
where

import qualified Data.Map as M
import System.Time
import System.IO.Unsafe
import Control.Exception
import Control.Concurrent

--- Map timer name and kick-off time to action
type Timers = M.Map (ClockTime, String) (IO ())

timeout :: Int
timeout = 500 -- 1 second

{-# NOINLINE timers #-}
timers :: MVar Timers
timers = unsafePerformIO $ newMVar M.empty

--- Call this first
initTimers :: IO ()
initTimers =
    do forkIO $ block checkTimers
       return ()

--- Not sure if this is the most efficient way to do it
startTimer :: String -> Int -> (IO ()) -> IO ()
startTimer name delay io =
    do stopTimer name
       now <- getClockTime
       let plus = TimeDiff 0 0 0 0 0 delay 0
           future = addToClockTime plus now
       block $ do t <- takeMVar timers
                  putMVar timers $ M.insert (future, name) io t

--- The filter expression is kind of long...
stopTimer :: String -> IO ()
stopTimer name =
    block $ do t <- takeMVar timers
               putMVar timers $
                       M.filterWithKey (\(_, k) _ -> k /= name) t

--- Tried to take care of exceptions here
--- but the code looks kind of ugly
checkTimers :: IO ()
checkTimers =
    do t <- takeMVar timers
       case M.size t of
         -- no timers
         0 -> do putMVar timers t
                 unblock $ threadDelay timeout
         -- some timers
         n -> do let (key@(time, name), io) = M.findMin t
                 now <- getClockTime
                 if (time <= now)
                    then do putMVar timers $ M.delete key t
                            unblock io
                    else do putMVar timers t
                            unblock $ threadDelay timeout
       checkTimers


--
http://wagerlabs.com/







Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Einar Karttunen
On 14.12 23:07, Joel Reymont wrote:
> Something like this? Comments are welcome!
>
> timeout :: Int
> timeout = 500 -- 1 second

Is that correct?
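
For reference, threadDelay takes its argument in microseconds, so 500 is
half a millisecond rather than one second. A minimal sketch; the name
`second` is introduced here only for illustration:

```haskell
import Control.Concurrent (threadDelay)

-- threadDelay counts microseconds, so one second is 1,000,000.
second :: Int
second = 1000000

main :: IO ()
main = do
  threadDelay 500            -- the posted value: half a millisecond
  threadDelay (1 * second)   -- what the comment asks for: one second
  putStrLn "done"
```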

> {-# NOINLINE timers #-}
> timers :: MVar Timers
> timers = unsafePerformIO $ newMVar M.empty
>
> --- Call this first
> initTimers :: IO ()
> initTimers =
>     do forkIO $ block checkTimers
>        return ()

Here is a nice trick for you:

{-# NOINLINE timers #-}
timers :: MVar Timers
timers = unsafePerformIO $ do mv <- newMVar M.empty
                              forkIO $ block checkTimers
                              return mv


initTimers thus goes away.

 --- Not sure if this is the most efficient way to do it
 startTimer :: String - Int - (IO ()) - IO ()
 startTimer name delay io =
 do stopTimer name
now - getClockTime
let plus = TimeDiff 0 0 0 0 0 delay 0
future = addToClockTime plus now
block $ do t - takeMVar timers
   putMVar timers $ M.insert (future, name) io t

I had code which used a global IORef containing
the current time. It was updated once a second
by a dedicated thread, but reading it was practically
free. Depends how common getClockTime calls are.
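
A rough sketch of such a cached clock. This is not the code referred to
above: cachedTime, startClock and getCachedTime are invented names, and
GHC.Clock stands in for getClockTime so the example needs only base:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.IORef
import GHC.Clock (getMonotonicTime)
import System.IO.Unsafe (unsafePerformIO)

-- Global cell holding the most recently sampled time, in seconds.
{-# NOINLINE cachedTime #-}
cachedTime :: IORef Double
cachedTime = unsafePerformIO (getMonotonicTime >>= newIORef)

-- Dedicated thread: refresh the cell once per second.
startClock :: IO ()
startClock = do
  _ <- forkIO $ forever $ do
         getMonotonicTime >>= writeIORef cachedTime
         threadDelay 1000000
  return ()

-- Cheap read; the value may be up to about a second stale.
getCachedTime :: IO Double
getCachedTime = readIORef cachedTime

main :: IO ()
main = do
  startClock
  t0 <- getCachedTime
  threadDelay 2500000
  t1 <- getCachedTime
  putStrLn ("cached clock advanced by about " ++ show (t1 - t0) ++ " s")
```

The trade-off is precision: every reader sees a time that can lag the real
clock by up to the refresh interval, which is acceptable for timers that are
used imprecisely.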

> --- The filter expression is kind of long...
> stopTimer :: String -> IO ()
> stopTimer name =
>     block $ do t <- takeMVar timers
>                putMVar timers $
>                        M.filterWithKey (\(_, k) _ -> k /= name) t

And slow. This is O(size_of_map)
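
One possible way to avoid the scan, sketched under the assumption that a
second map from timer name to expiry time is kept alongside the ordered one.
The Timers record, the Micros stand-in for ClockTime and the pure function
signatures are illustrative and differ from the posted module:

```haskell
import qualified Data.Map as M

type Micros = Integer  -- stand-in for ClockTime (assumption)

-- Two maps kept in step: expiries ordered by time, plus a name index
-- so a timer can be found without scanning.
data Timers a = Timers
  { byTime :: M.Map (Micros, String) a
  , byName :: M.Map String Micros
  }

emptyTimers :: Timers a
emptyTimers = Timers M.empty M.empty

-- O(log n): look the expiry up by name, then delete one key from each map.
stopTimer :: String -> Timers a -> Timers a
stopTimer name ts@(Timers bt bn) =
  case M.lookup name bn of
    Nothing     -> ts
    Just expiry -> Timers (M.delete (expiry, name) bt) (M.delete name bn)

-- O(log n): replace any existing timer with the same name.
startTimer :: String -> Micros -> a -> Timers a -> Timers a
startTimer name expiry action ts =
  let Timers bt bn = stopTimer name ts
  in Timers (M.insert (expiry, name) action bt) (M.insert name expiry bn)

main :: IO ()
main = do
  let ts = startTimer "ping" 5 () (startTimer "lobby" 9 () emptyTimers)
  print (M.keys (byTime ts))                     -- [(5,"ping"),(9,"lobby")]
  print (M.keys (byTime (stopTimer "ping" ts)))  -- [(9,"lobby")]
```

The code that fires a timer would have to remove the entry from both maps as
well, otherwise the name index goes stale.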

> --- Tried to take care of exceptions here
> --- but the code looks kind of ugly

Is there a reason you need block for checkTimers?
What you certainly want to do is ignore exceptions
from the timer actions.
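
A minimal sketch of ignoring exceptions from a timer action, written against
the current Control.Exception API rather than the one available with ghc
6.4.1. runTimerAction is a hypothetical helper name:

```haskell
import Control.Exception (ErrorCall (..), SomeException, throwIO, try)
import System.IO (hPutStrLn, stderr)

-- Run a timer action; report and swallow any exception it throws so the
-- timer thread itself keeps running.
runTimerAction :: IO () -> IO ()
runTimerAction io = do
  r <- try io :: IO (Either SomeException ())
  case r of
    Left e   -> hPutStrLn stderr ("timer action failed: " ++ show e)
    Right () -> return ()

main :: IO ()
main = do
  runTimerAction (throwIO (ErrorCall "boom"))   -- logged, not propagated
  runTimerAction (putStrLn "still running")
```

Catching SomeException is deliberately broad here. It also intercepts
asynchronous exceptions delivered while the action runs, so a real timer
thread might want to narrow the handler.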


- Einar Karttunen


Re: [Haskell-cafe] Optimizing a high-traffic network architecture

2005-12-14 Thread Bulat Ziganshin
Hello Joel,

Wednesday, December 14, 2005, 7:55:36 PM, you wrote:

JR> With a 1-minute keep-alive timeout the system starts to get stressed
JR> almost right away. There's verbose logging going on and almost every
JR> event/packet sent and received is traced. The extra logging of the
JR> timeout events probably adds to the stress and so, I assume, do the
JR> extra packets.

oh, yes, i forgot to say that you can speed up logging by using a large
buffer on the logger handle, say use:

hSetBuffering logger (BlockBuffering (Just 4096))

and of course avoid logging to the screen


-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]


