[
https://issues.apache.org/jira/browse/HAMA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489738#comment-13489738
]
Thomas Jungblut commented on HAMA-559:
--------------------------------------
Hey Suraj,
I managed to improve this further. I have written my own OutputStream that
flushes to disk asynchronously.
Have a look here:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/datastructure/AsyncBufferedOutputStream.java
Looks a bit simpler than your solution and is also faster ;)
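For readers who do not follow the link, here is a minimal sketch of what
asynchronous flushing can look like. It is not the linked
AsyncBufferedOutputStream: the class name AsyncFlushSketch, the single writer
thread and the "at most one buffer in flight" policy are all assumptions made
for illustration, and the capacity is assumed to be positive.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the implementation linked above.
class AsyncFlushSketch extends OutputStream {
  private final OutputStream out;
  private final ExecutorService writer = Executors.newSingleThreadExecutor();
  private final int capacity;
  private byte[] buffer;
  private int count;
  private Future<?> pending; // the write currently in flight, if any

  AsyncFlushSketch(OutputStream out, int capacity) {
    this.out = out;
    this.capacity = capacity;
    this.buffer = new byte[capacity];
  }

  @Override
  public void write(int b) throws IOException {
    if (count == capacity) {
      handOff();
    }
    buffer[count++] = (byte) b;
  }

  // Gives the filled buffer to the writer thread and continues with a fresh one,
  // so the caller can keep writing while the previous buffer goes to disk.
  private void handOff() throws IOException {
    awaitPending(); // keeps one write in flight and surfaces earlier I/O errors
    final byte[] full = buffer;
    final int length = count;
    pending = writer.submit(() -> {
      out.write(full, 0, length);
      return null;
    });
    buffer = new byte[capacity];
    count = 0;
  }

  // Blocks until the in-flight write (if any) has finished.
  private void awaitPending() throws IOException {
    if (pending == null) {
      return;
    }
    try {
      pending.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while waiting for flush", e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    } finally {
      pending = null;
    }
  }

  @Override
  public void flush() throws IOException {
    if (count > 0) {
      handOff();
    }
    awaitPending();
    out.flush();
  }

  @Override
  public void close() throws IOException {
    try {
      flush();
    } finally {
      writer.shutdown();
      out.close();
    }
  }
}
```

Wrapping a FileOutputStream in such a stream lets serialization overlap with
disk writes. Whether that pays off depends on the workload, so it should be
measured as in the benchmark below.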
{noformat}
0% Scenario{vm=java, trial=0, benchmark=Spill, size=1000000, type=SPILLING_BUFFER, memoryMax=-Xmx8g} 22447456,65 ns; σ=2294787,10 ns @ 10 trials
17% Scenario{vm=java, trial=0, benchmark=Spill, size=10000000, type=SPILLING_BUFFER, memoryMax=-Xmx8g} 208675630,29 ns; σ=129832,20 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Spill, size=100000000, type=SPILLING_BUFFER, memoryMax=-Xmx8g} 2179034450,50 ns; σ=159017861,44 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=Spill, size=1000000, type=DISK_LIST, memoryMax=-Xmx8g} 17597941,21 ns; σ=156327,75 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=Spill, size=10000000, type=DISK_LIST, memoryMax=-Xmx8g} 174984110,83 ns; σ=5126997,66 ns @ 10 trials
83% Scenario{vm=java, trial=0, benchmark=Spill, size=100000000, type=DISK_LIST, memoryMax=-Xmx8g} 1731678008,00 ns; σ=5150036,16 ns @ 3 trials
     size            type     ms linear runtime
  1000000 SPILLING_BUFFER   22,4 =
  1000000       DISK_LIST   17,6 =
 10000000 SPILLING_BUFFER  208,7 ==
 10000000       DISK_LIST  175,0 ==
100000000 SPILLING_BUFFER 2179,0 ==============================
100000000       DISK_LIST 1731,7 =======================
vm: java
trial: 0
benchmark: Spill
memoryMax: -Xmx8g
Note: benchmarks printed 11516 characters to System.out and 0 characters to
System.err. Use --debug to see this output.
{noformat}
So the DiskList can be updated to use the stream like this:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/datastructure/DiskList.java
> Add a spilling message queue
> ----------------------------
>
> Key: HAMA-559
> URL: https://issues.apache.org/jira/browse/HAMA-559
> Project: Hama
> Issue Type: Sub-task
> Components: bsp core
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
> Assignee: Suraj Menon
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HAMA-559.patch-v1, spillbench_code.tar.gz,
> spilling_buffer_cpu_usage_text_write.png,
> SpillingBufferProfile-2012-10-27.snapshot,
> spilling_buffer_profile_cpu_graph_test_write.png,
> spilling_buffer_profile_cpugraph_writeUTF.png,
> spillingbuffer_profile_cpu_writeUTF.png, spilling_buffer_profile_LOCK.JPG,
> spilling_buffer_profile_timesplit_text_write.png,
> spilling_buffer_profile_writeUTF.png
>
>
> After HAMA-521 is done, we can add a spilling queue which just holds the
> messages in RAM that fit into the heap space. The rest can be flushed to disk.
> We may call this a HybridQueue or something like that.
> The benefits should be that we don't have to flush to disk so often and get
> faster. However, we may have more GC, so it may not always be faster overall.
> The requirements for this queue also include:
> - The message object once written to the queue (after returning from the
> write call) could be modified, but the changes should not be reflected in the
> messages stored in the queue.
> - For now let's implement a queue that does not support concurrent reading
> and writing. This feature is needed when we implement asynchronous
> communication.
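
The requirements quoted above can be sketched roughly as follows. This is only
an illustration of the idea, not Hama's queue: the class name
SpillingQueueSketch, the byte[] message type, the length-prefixed format and the
temp-file handling are assumptions. Messages are serialized on add (so later
changes by the caller are not reflected), kept in RAM up to a byte limit, and
written to disk after that; reading is only supported once writing is done.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.function.Consumer;

// Hypothetical sketch of a spilling queue; not the HAMA-559 patch.
class SpillingQueueSketch implements Closeable {
  private final int memoryLimitBytes;
  private final ByteArrayOutputStream inMemory = new ByteArrayOutputStream();
  private final DataOutputStream memoryOut = new DataOutputStream(inMemory);
  private File spillFile;            // created lazily on the first spill
  private DataOutputStream spillOut; // null until the memory budget is used up
  private int memoryCount;
  private int spillCount;

  SpillingQueueSketch(int memoryLimitBytes) {
    this.memoryLimitBytes = memoryLimitBytes;
  }

  // Copies the message bytes immediately. Once the first message has spilled,
  // all later messages go to disk as well, which preserves insertion order.
  void add(byte[] message) throws IOException {
    boolean fits = inMemory.size() + 4 + message.length <= memoryLimitBytes;
    if (spillOut == null && fits) {
      memoryOut.writeInt(message.length);
      memoryOut.write(message);
      memoryCount++;
      return;
    }
    if (spillOut == null) {
      spillFile = File.createTempFile("spill", ".bin");
      spillOut = new DataOutputStream(
          new BufferedOutputStream(new FileOutputStream(spillFile)));
    }
    spillOut.writeInt(message.length);
    spillOut.write(message);
    spillCount++;
  }

  // Streams all messages in insertion order; call only after writing is done.
  void readAll(Consumer<byte[]> consumer) throws IOException {
    feed(new DataInputStream(new ByteArrayInputStream(inMemory.toByteArray())),
        memoryCount, consumer);
    if (spillOut != null) {
      spillOut.flush();
      try (DataInputStream in = new DataInputStream(
          new BufferedInputStream(new FileInputStream(spillFile)))) {
        feed(in, spillCount, consumer);
      }
    }
  }

  private static void feed(DataInputStream in, int count,
      Consumer<byte[]> consumer) throws IOException {
    for (int i = 0; i < count; i++) {
      byte[] message = new byte[in.readInt()];
      in.readFully(message);
      consumer.accept(message);
    }
  }

  @Override
  public void close() throws IOException {
    if (spillOut != null) {
      spillOut.close();
      Files.deleteIfExists(spillFile.toPath());
    }
  }
}
```

A real implementation would size the in-memory part from the available heap
rather than a fixed byte count, and would avoid the extra copy made by
toByteArray() when reading.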