[ 
https://issues.apache.org/jira/browse/HAMA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489738#comment-13489738
 ] 

Thomas Jungblut commented on HAMA-559:
--------------------------------------

Hey Suraj,

I managed to improve this further. I have written my own OutputStream, which 
enables asynchronous flushing to disk.

Have a look here:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/datastructure/AsyncBufferedOutputStream.java

Looks a bit simpler than your solution and is also faster ;)
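
For readers who don't want to click through: the following is not the linked class, just a minimal sketch of the general idea, assuming a single background writer thread and a fixed buffer size. The class name AsyncFlushSketch and its constructor parameters are made up for illustration. The caller fills a buffer; when it is full, the buffer is handed to the writer thread and the caller continues with a fresh one.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the AsyncBufferedOutputStream from the link above.
class AsyncFlushSketch extends OutputStream {
  private final OutputStream out;
  private final int bufferSize;
  // One writer thread keeps the writes to the underlying stream in order.
  private final ExecutorService writer = Executors.newSingleThreadExecutor();
  private byte[] buffer;
  private int count;
  private Future<?> pending; // the write currently in flight, if any

  AsyncFlushSketch(OutputStream out, int bufferSize) {
    this.out = out;
    this.bufferSize = bufferSize;
    this.buffer = new byte[bufferSize];
  }

  @Override
  public void write(int b) throws IOException {
    if (count == bufferSize) {
      handOff();
    }
    buffer[count++] = (byte) b;
  }

  // Passes the filled buffer to the writer thread and starts a new one.
  private void handOff() throws IOException {
    if (count == 0) {
      return;
    }
    final byte[] full = buffer;
    final int len = count;
    awaitPending(); // at most one write in flight; also surfaces earlier errors
    pending = writer.submit(() -> {
      out.write(full, 0, len);
      return null;
    });
    buffer = new byte[bufferSize];
    count = 0;
  }

  // Blocks until the in-flight write (if any) has finished.
  private void awaitPending() throws IOException {
    if (pending == null) {
      return;
    }
    try {
      pending.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    } finally {
      pending = null;
    }
  }

  @Override
  public void flush() throws IOException {
    handOff();
    awaitPending();
    out.flush();
  }

  @Override
  public void close() throws IOException {
    try {
      flush();
    } finally {
      writer.shutdown();
      out.close();
    }
  }
}
```

The caller only blocks when the previous buffer has not been written yet, so serialization and disk I/O can overlap. Waiting for the previous write before submitting the next one keeps memory bounded to two buffers.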

{noformat}
 0% Scenario{vm=java, trial=0, benchmark=Spill, size=1000000, type=SPILLING_BUFFER, memoryMax=-Xmx8g} 22447456,65 ns; σ=2294787,10 ns @ 10 trials
17% Scenario{vm=java, trial=0, benchmark=Spill, size=10000000, type=SPILLING_BUFFER, memoryMax=-Xmx8g} 208675630,29 ns; σ=129832,20 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Spill, size=100000000, type=SPILLING_BUFFER, memoryMax=-Xmx8g} 2179034450,50 ns; σ=159017861,44 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=Spill, size=1000000, type=DISK_LIST, memoryMax=-Xmx8g} 17597941,21 ns; σ=156327,75 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=Spill, size=10000000, type=DISK_LIST, memoryMax=-Xmx8g} 174984110,83 ns; σ=5126997,66 ns @ 10 trials
83% Scenario{vm=java, trial=0, benchmark=Spill, size=100000000, type=DISK_LIST, memoryMax=-Xmx8g} 1731678008,00 ns; σ=5150036,16 ns @ 3 trials

     size            type     ms linear runtime
  1000000 SPILLING_BUFFER   22,4 =
  1000000       DISK_LIST   17,6 =
 10000000 SPILLING_BUFFER  208,7 ==
 10000000       DISK_LIST  175,0 ==
100000000 SPILLING_BUFFER 2179,0 ==============================
100000000       DISK_LIST 1731,7 =======================

vm: java
trial: 0
benchmark: Spill
memoryMax: -Xmx8g

Note: benchmarks printed 11516 characters to System.out and 0 characters to System.err. Use --debug to see this output.

{noformat}

So the DiskList can be updated to use the stream like this:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/datastructure/DiskList.java
 
                
> Add a spilling message queue
> ----------------------------
>
>                 Key: HAMA-559
>                 URL: https://issues.apache.org/jira/browse/HAMA-559
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Suraj Menon
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HAMA-559.patch-v1, spillbench_code.tar.gz, 
> spilling_buffer_cpu_usage_text_write.png, 
> SpillingBufferProfile-2012-10-27.snapshot, 
> spilling_buffer_profile_cpu_graph_test_write.png, 
> spilling_buffer_profile_cpugraph_writeUTF.png, 
> spillingbuffer_profile_cpu_writeUTF.png, spilling_buffer_profile_LOCK.JPG, 
> spilling_buffer_profile_timesplit_text_write.png, 
> spilling_buffer_profile_writeUTF.png
>
>
> After HAMA-521 is done, we can add a spilling queue which just holds the 
> messages in RAM that fit into the heap space. The rest can be flushed to disk.
> We may call this a HybridQueue or something like that.
> The benefits should be that we don't have to flush to disk so often and get 
> faster. However, we may have more GC, so it may not always be faster overall.
> The requirements for this queue also include:
> - The message object once written to the queue (after returning from the 
> write call) could be modified, but the changes should not be reflected in the 
> messages stored in the queue.
> - For now let's implement a queue that does not support concurrent reading 
> and writing. This feature is needed when we implement asynchronous 
> communication.
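
The requirements in the issue description above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: messages are taken as already-serialized byte arrays, the memory limit is a plain byte count, and the names SpillingQueueSketch, prepareRead and poll are hypothetical. Messages are copied on write, so later changes to the caller's object are not reflected in the queue, and reading is only allowed after writing has finished.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;

// Hypothetical sketch of a queue that keeps messages in RAM up to a limit
// and writes the rest to a temporary file.
class SpillingQueueSketch implements Closeable {
  private final long memoryLimitBytes;
  private final ArrayDeque<byte[]> inMemory = new ArrayDeque<>();
  private final File spillFile;
  private long memoryUsed;
  private DataOutputStream spillOut; // opened lazily on the first spill
  private DataInputStream spillIn;   // opened by prepareRead()
  private int spilledCount;
  private boolean reading;

  SpillingQueueSketch(long memoryLimitBytes) throws IOException {
    this.memoryLimitBytes = memoryLimitBytes;
    this.spillFile = File.createTempFile("spill", ".bin");
    this.spillFile.deleteOnExit();
  }

  void write(byte[] message) throws IOException {
    if (reading) {
      throw new IllegalStateException("no concurrent reading and writing");
    }
    if (spillOut == null && memoryUsed + message.length <= memoryLimitBytes) {
      // Copy, so the caller may reuse or modify its array afterwards.
      inMemory.add(message.clone());
      memoryUsed += message.length;
      return;
    }
    // Once spilling has started, everything goes to disk to keep FIFO order.
    if (spillOut == null) {
      spillOut = new DataOutputStream(
          new BufferedOutputStream(new FileOutputStream(spillFile)));
    }
    spillOut.writeInt(message.length);
    spillOut.write(message);
    spilledCount++;
  }

  // Switches the queue from write mode to read mode.
  void prepareRead() throws IOException {
    reading = true;
    if (spillOut != null) {
      spillOut.close();
      spillOut = null;
      spillIn = new DataInputStream(
          new BufferedInputStream(new FileInputStream(spillFile)));
    }
  }

  // Returns the next message, or null when the queue is empty.
  byte[] poll() throws IOException {
    if (!reading) {
      throw new IllegalStateException("call prepareRead() first");
    }
    if (!inMemory.isEmpty()) {
      return inMemory.poll();
    }
    if (spilledCount == 0) {
      return null;
    }
    byte[] message = new byte[spillIn.readInt()];
    spillIn.readFully(message);
    spilledCount--;
    return message;
  }

  @Override
  public void close() throws IOException {
    if (spillOut != null) {
      spillOut.close();
    }
    if (spillIn != null) {
      spillIn.close();
    }
    spillFile.delete();
  }
}
```

Usage would be write, write, ..., prepareRead, then poll until it returns null. A real implementation would work on the message objects directly and size the in-memory part against the available heap rather than a fixed byte count.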

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
