Work this week was pretty skewed: a lot of reading on some days and a lot of coding on others.
I now maintain thread-specific request handles which keep track of the socket, bytes sent, the cached file and other request metadata. I completely changed the old approach of using stages and now take over in stage 30, using read events directly on the sockets. I also maintain a global file cache hash table keyed on inodes. Each entry contains file-related data, including an mmap of the whole file, a custom-sized pipe for storing the initial chunks of the file in kernel memory, and other file statistics. Currently I use pthread rwlocks to maintain consistency across threads.

I started to implement different file serving implementations. The first one was simply a sendfile implementation, which had been there last week. This week I added a basic mmap implementation that takes the mmap from the file cache and just writes it down the socket; it worked, but it was never on par with the sendfile implementation in terms of performance. The third implementation is an improvement over the second one, where I use the Linux zero-copy kernel APIs to push data directly from the mmap to the socket without (hopefully) copying any bytes. It gifts pages over to a kernel buffer (a pipe) using vmsplice and then splices the data directly to the socket. Normally pipes have a max size of 16 pages (64K in total), but I grow them using the Linux-specific F_SETPIPE_SZ fcntl, which in return performs better with large files. This implementation also maintains another (essentially read-only) pipe per file holding the initial file data, to improve performance for small files which fit completely inside a pipe. With this it flushes the initial file data directly from the pipe without even touching the mmaps, and for bigger files it resorts to splicing from the mmaps. The third implementation is the default (although it still has some bugs when used with multiple files, which I think is rather a Monkey bug, but I still have to debug more) and falls back to the first implementation if mmap doesn't work.
In terms of performance, the current implementation is on par with the default Monkey static file server (although I was hoping for more this week), despite the fact that it maintains a global hash table for keeping track of file caches and does a lot more syscalls compared to a single main sendfile syscall in Monkey. Here are some numbers.

Without cache plugin (# ab -c500 -n10000 http://localhost:2001/webui-aria2/index.html):

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001
Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes
Concurrency Level:      500
Time taken for tests:   18.938 seconds
Complete requests:      47132
Failed requests:        0
Write errors:           0
Total transferred:      1265512772 bytes
HTML transferred:       1256433587 bytes
Requests per second:    2488.75 [#/sec] (mean)
Time per request:       200.904 [ms] (mean)
Time per request:       0.402 [ms] (mean, across all concurrent requests)
Transfer rate:          65257.76 [Kbytes/sec] received

With cache plugin (# ab -c500 -n10000 http://localhost:2001/webui-aria2/index.html):

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001
Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes
Concurrency Level:      500
Time taken for tests:   19.710 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      1332800000 bytes
HTML transferred:       1325500000 bytes
Requests per second:    2536.79 [#/sec] (mean)
Time per request:       197.100 [ms] (mean)
Time per request:       0.394 [ms] (mean, across all concurrent requests)
Transfer rate:          66035.77 [Kbytes/sec] received

Basically, without the cache plugin you get 2488 req/sec and with the cache you get 2536 req/sec. Not a big boost, but this is comparing the kernel sendfile implementation (with the file readahead cache) against the plugin's performance.
I already cache small files completely in pipes, which reside in kernel memory. The plan for next week is to get the entire file (given that it fits into memory) into pipes residing in the kernel and try to write them directly to the sockets; basically maintaining the custom cache directly in the kernel, which I guess can't be swapped out, whereas memory mappings can. I also want to cache the http headers and append them to the file pipes for even lower latency, sending everything in one chunk, which should reduce cpu time even further while handling a request. This should be easy inside Monkey itself, but I need to find out how it can be done inside a plugin.

perm link for this post: http://ziahamza.wordpress.com/2013/07/14/weekly-progress-5/
github link: https://github.com/ziahamza/monkey-cache
blog: http://ziahamza.wordpress.com/

regards,
hamza zia
_______________________________________________
Monkey mailing list
[email protected]
http://lists.monkey-project.com/listinfo/monkey
