Work this week was pretty skewed: a lot of reading on some days and a lot
of coding on others.

I now maintain thread-specific request handles which keep track of the
socket, bytes sent, cached file and other request metadata. I completely
changed the old stage-based approach and now take over in stage 30, using
read events directly on the sockets.

I also maintain a global file cache hash table keyed by inode. It contains
file-related data including an mmap of the whole file, a custom-sized pipe
for storing the initial chunks of the file in the kernel, and other file
statistics. Currently I use pthread rwlocks to maintain consistency across
threads.

I started to implement different file serving strategies. The first was
the simple sendfile implementation that was already there last week. This
week I added a basic mmap implementation which takes the mmap from the
file cache and just writes it down the socket; it worked, but it was never
on par with the sendfile implementation in terms of performance.

The third implementation is an improvement over the second, where I use
the Linux zero-copy kernel APIs to push data directly from the mmap to the
socket without (hopefully) copying any bytes. It gifts pages over to a
kernel buffer (a pipe) using vmsplice and then splices the data directly
to the socket. Normally pipes have a maximum size of 16 pages (64K in
total), but I raise it with the Linux-specific F_SETPIPE_SZ fcntl, which
in return performs better with large files. This implementation also
maintains another (kind of read-only) pipe holding the initial file data
for every file, to improve performance for small files which fit
completely inside a pipe. With this it flushes the initial file data
directly from the pipe without even touching the mmaps, and for bigger
files it resorts to splicing from the mmaps. The third implementation is
the default (although it contains some bugs when used with multiple
files, which I think are actually Monkey bugs, but I still have to debug
more) and falls back to the first implementation if mmap doesn't work.

In terms of performance, the current implementation is on par with the
default Monkey static file server (although I was hoping for more this
week), despite the fact that it maintains a global hash table for keeping
track of file caches and makes a lot more syscalls compared to a single
main sendfile syscall in Monkey. Here are some numbers:

Without cache plugin (# ab -c500 -n10000
http://localhost:2001/webui-aria2/index.html)

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001

Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes

Concurrency Level:      500
Time taken for tests:   18.938 seconds
Complete requests:      47132
Failed requests:        0
Write errors:           0
Total transferred:      1265512772 bytes
HTML transferred:       1256433587 bytes
Requests per second:    2488.75 [#/sec] (mean)
Time per request:       200.904 [ms] (mean)
Time per request:       0.402 [ms] (mean, across all concurrent requests)
Transfer rate:          65257.76 [Kbytes/sec] received

With cache plugin (# ab -c500 -n10000
http://localhost:2001/webui-aria2/index.html)

Server Software:        Monkey/1.3.0
Server Hostname:        localhost
Server Port:            2001

Document Path:          /webui-aria2/index.html
Document Length:        26510 bytes

Concurrency Level:      500
Time taken for tests:   19.710 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      1332800000 bytes
HTML transferred:       1325500000 bytes
Requests per second:    2536.79 [#/sec] (mean)
Time per request:       197.100 [ms] (mean)
Time per request:       0.394 [ms] (mean, across all concurrent requests)
Transfer rate:          66035.77 [Kbytes/sec] received

Basically, without the cache plugin you get 2488 req/sec and with the
cache you get 2536 req/sec. Not a big boost, but it's essentially
comparing the kernel sendfile implementation (with its file readahead
cache) against the plugin's performance.

I already cache small files completely in pipes, which reside in kernel
memory. The plan for next week is to keep the entire file (given that it
fits into memory) in pipes residing in the kernel and try to write them
directly to the sockets, basically maintaining the custom cache directly
in the kernel, which I guess can't be swapped out, whereas memory mappings
can. I also want to cache the HTTP headers and append them to the file
pipes for even lower latency, sending everything in one chunk, which
should reduce CPU time even further while handling the request. This
should be easy inside Monkey, but I need to find out how it can be done
inside a plugin.

perm link for this post:
http://ziahamza.wordpress.com/2013/07/14/weekly-progress-5/

github link: https://github.com/ziahamza/monkey-cache

blog: http://ziahamza.wordpress.com/


regards

hamza zia
_______________________________________________
Monkey mailing list
[email protected]
http://lists.monkey-project.com/listinfo/monkey
