[ https://issues.apache.org/jira/browse/HAMA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486833#comment-13486833 ]
Suraj Menon commented on HAMA-559:
----------------------------------
Hi, that was a nice catch. I found that I am doing one more buffer copy than
needed. The spilling buffer gives better performance, but only sometimes; I am
currently investigating why. I also implemented a synchronous disk queue
without a spilling thread. Here are the performance numbers for now. I added a
case for 10 million integers to your benchmark code, and I am posting the
numbers for both scenarios. I am trying to find out what changes the numbers so
drastically between benchmark runs; this affects more than just the spilling
buffer.
{noformat}
    size            type        us linear runtime
 1000000       DISK_LIST 221546.22 ==============
 1000000 SPILLING_BUFFER 118403.87 =======
 1000000     DISK_BUFFER  40151.49 ==
10000000       DISK_LIST 473334.31 ==============================
10000000 SPILLING_BUFFER 360539.53 ======================
10000000     DISK_BUFFER 389689.06 ========================
vm: java
trial: 0
benchmark: Spill
{noformat}
A run with bad performance:
{noformat}
    size            type        us linear runtime
 1000000       DISK_LIST   38550.9 =
 1000000 SPILLING_BUFFER  140961.3 ===
 1000000     DISK_BUFFER   44809.6 =
10000000       DISK_LIST  340909.8 =========
10000000 SPILLING_BUFFER 1116445.2 ==============================
10000000     DISK_BUFFER  374593.0 ==========
{noformat}
> Add a spilling message queue
> ----------------------------
>
> Key: HAMA-559
> URL: https://issues.apache.org/jira/browse/HAMA-559
> Project: Hama
> Issue Type: Sub-task
> Components: bsp core
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
> Assignee: Suraj Menon
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HAMA-559.patch-v1,
> spilling_buffer_cpu_usage_text_write.png,
> SpillingBufferProfile-2012-10-27.snapshot,
> spilling_buffer_profile_cpu_graph_test_write.png,
> spilling_buffer_profile_cpugraph_writeUTF.png,
> spillingbuffer_profile_cpu_writeUTF.png, spilling_buffer_profile_LOCK.JPG,
> spilling_buffer_profile_timesplit_text_write.png,
> spilling_buffer_profile_writeUTF.png
>
>
> After HAMA-521 is done, we can add a spilling queue which just holds the
> messages in RAM that fit into the heap space. The rest can be flushed to disk.
> We may call this a HybridQueue or something like that.
> The benefits should be that we don't have to flush to disk as often and so get
> faster. However, we may have more GC, so it may not always be faster overall.
> The requirements for this queue also include:
> - The message object once written to the queue (after returning from the
> write call) could be modified, but the changes should not be reflected in the
> messages stored in the queue.
> - For now let's implement a queue that does not support concurrent reading
> and writing. This feature is needed when we implement asynchronous
> communication.
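
For illustration only, the requirements quoted above could be sketched roughly
as follows. This is not the code from HAMA-559.patch-v1. The class name
SpillingQueueSketch, the byte-based memory limit, and the use of strings as
messages are all assumptions made for this example. Each message is serialized
at write time, so later changes to the caller's object are not reflected in the
queue, and once the in-memory buffer reaches its limit every further message
goes to a temporary file. Reading and writing are not concurrent: all writes
must finish before the first read.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a spilling queue; not the code from the patch. */
class SpillingQueueSketch {
  private final int memoryLimitBytes;   // assumed tuning knob
  private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
  private final DataOutputStream memoryOut = new DataOutputStream(memory);
  private File spillFile;               // created lazily on the first spill
  private DataOutputStream spillOut;    // non-null once the queue has spilled
  private DataInputStream in;           // non-null once reading has started
  private int remaining;                // messages written but not yet read

  SpillingQueueSketch(int memoryLimitBytes) {
    this.memoryLimitBytes = memoryLimitBytes;
  }

  /** Serializes the message immediately, so later changes by the caller are not seen. */
  void write(CharSequence message) {
    if (in != null) {
      throw new IllegalStateException("no writes after reading has started");
    }
    try {
      if (spillOut == null && memory.size() >= memoryLimitBytes) {
        spillFile = File.createTempFile("spilling-queue", ".bin");
        spillFile.deleteOnExit();
        spillOut = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(spillFile)));
      }
      // After the first spill everything goes to disk, which keeps FIFO order.
      DataOutputStream target = (spillOut != null) ? spillOut : memoryOut;
      target.writeUTF(message.toString());
      remaining++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  boolean hasSpilled() {
    return spillFile != null;
  }

  /** Returns the next message in write order, or null when the queue is empty. */
  String read() {
    try {
      if (in == null) {
        in = openForReading();
      }
      if (remaining == 0) {
        return null;
      }
      remaining--;
      return in.readUTF();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private DataInputStream openForReading() throws IOException {
    // toByteArray() copies the in-memory buffer; acceptable for a sketch.
    InputStream source = new ByteArrayInputStream(memory.toByteArray());
    if (spillOut != null) {
      spillOut.close(); // flushes buffered bytes to the spill file
      source = new SequenceInputStream(
          source, new BufferedInputStream(new FileInputStream(spillFile)));
    }
    return new DataInputStream(source);
  }

  /** Closes any open stream and deletes the spill file, if one was created. */
  void close() {
    try {
      if (in != null) {
        in.close();
      } else if (spillOut != null) {
        spillOut.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      if (spillFile != null) {
        spillFile.delete();
      }
    }
  }
}
```

Sending everything to disk after the first spill is the simplest way to keep
messages in write order. A real implementation would more likely work on
Hama's Writable messages and could refill the in-memory buffer from disk while
reading.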