[ https://issues.apache.org/jira/browse/HAMA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490776#comment-13490776 ]

Suraj Menon commented on HAMA-559:
----------------------------------

Yes, I think the value passed to the spilling queue should be the size of the
memory it requires. Internally, we can initialize the spillingBuffer with
multiple buffers whose sizes sum up to the total memory specified.
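
A minimal sketch of that allocation, assuming a hypothetical BufferSet class and
a fixed per-buffer size (neither comes from the Hama code base): the memory
budget is carved into equal ByteBuffers, and the last one takes the remainder so
that the capacities add up exactly to the requested total.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: split a total memory budget into several ByteBuffers. */
class BufferSet {
  private final List<ByteBuffer> buffers = new ArrayList<>();

  BufferSet(int totalBytes, int bufferSize) {
    if (totalBytes <= 0 || bufferSize <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    for (int remaining = totalBytes; remaining > 0; ) {
      // The last buffer takes whatever is left, so capacities sum to totalBytes.
      int size = Math.min(bufferSize, remaining);
      buffers.add(ByteBuffer.allocate(size));
      remaining -= size;
    }
  }

  List<ByteBuffer> buffers() {
    return Collections.unmodifiableList(buffers);
  }

  long totalCapacity() {
    long sum = 0;
    for (ByteBuffer b : buffers) {
      sum += b.capacity();
    }
    return sum;
  }
}
```

For example, a 10,000-byte budget with 4,096-byte buffers yields buffers of
4,096, 4,096 and 1,808 bytes. Heap buffers are used here; ByteBuffer.allocateDirect
would be the off-heap alternative.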

I am going to make the following changes:
- Simplifying the implementation of the get*Index functions in SpillIndexStatus.
- Using cached thread pools.
- Moving my inner classes into separate, reusable classes.
- Depending on the numbers, possibly falling back to the old approach of
completely filling the buffer before proceeding.
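
The last two points could be combined roughly as below. This is a hypothetical
sketch (FillThenSpillWriter and its in-memory spill target are illustrative, not
Hama classes): a buffer is handed off only once it is completely full, and the
spill runs on a pool from Executors.newCachedThreadPool(). Two buffers alternate
so the writer can keep filling one while the other is being spilled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: fill a buffer completely, then spill it on a cached thread pool. */
class FillThenSpillWriter {
  private final ExecutorService spillPool = Executors.newCachedThreadPool();
  // In-memory stand-in for the spill file, so the sketch needs no disk access.
  private final ByteArrayOutputStream spillTarget = new ByteArrayOutputStream();
  private ByteBuffer current; // buffer being filled by the writer
  private ByteBuffer spare;   // buffer being spilled, or idle
  private Future<?> inFlight; // at most one spill at a time keeps output in order

  FillThenSpillWriter(int bufferSize) {
    current = ByteBuffer.allocate(bufferSize);
    spare = ByteBuffer.allocate(bufferSize);
  }

  void write(byte[] data) {
    int offset = 0;
    while (offset < data.length) {
      int n = Math.min(current.remaining(), data.length - offset);
      current.put(data, offset, n);
      offset += n;
      if (!current.hasRemaining()) {
        spill(); // only a completely filled buffer is handed off here
      }
    }
  }

  /** Spills any partial buffer, waits for outstanding work, returns all spilled bytes. */
  byte[] close() {
    if (current.position() > 0) {
      spill();
    }
    awaitInFlight();
    spillPool.shutdown();
    return spillTarget.toByteArray();
  }

  private void spill() {
    awaitInFlight(); // the spare buffer is free again once the previous spill is done
    ByteBuffer full = current;
    full.flip();
    current = spare;
    current.clear();
    spare = full;
    inFlight = spillPool.submit(() -> spillTarget.write(full.array(), 0, full.limit()));
  }

  private void awaitInFlight() {
    if (inFlight == null) {
      return;
    }
    try {
      inFlight.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    inFlight = null;
  }
}
```

Allowing only one spill in flight keeps the spilled bytes in write order; letting
several spills run concurrently on the cached pool would need explicit ordering,
for example writes at precomputed file offsets.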

I am doing the above because I think we can extend the current implementation
to handle asynchronous messaging on the sender side.
The sender-side queues have scalability requirements (sending RPCs in batches)
and fault-tolerance requirements (writing messages to a file).

We have the following scenarios on the sender side:

- Synchronous message transfer, with or without message persistence: can be
handled by the current spilling queue, with a read thread that runs after
writing is complete.
- Asynchronous message transfer without message persistence: a spilling queue
whose read thread (sending the RPCs) is started asynchronously, before the
write thread has finished. Here we are writing not to a file but to an RPC
socket.
- Asynchronous message transfer with message persistence: the queue spills the
messages to disk, and an asynchronous read thread sends the data in batched
RPCs.
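
The second scenario (asynchronous transfer without persistence) could look
roughly like this, assuming messages are already serialized to byte[] and that
sendBatch stands in for a batched RPC call; AsyncBatchingQueue is a hypothetical
name. The reader thread is started before any write, a sentinel marks the end of
writing, and each write copies its argument so later changes by the caller are
not visible to the reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: reader forwards messages in batches while writing is still going on. */
class AsyncBatchingQueue {
  private static final byte[] END = new byte[0]; // sentinel: no more writes
  private final BlockingQueue<byte[]> queue;
  private final Thread reader;

  AsyncBatchingQueue(int capacity, int batchSize, Consumer<List<byte[]>> sendBatch) {
    BlockingQueue<byte[]> q = new ArrayBlockingQueue<>(capacity);
    this.queue = q;
    this.reader = new Thread(() -> {
      List<byte[]> batch = new ArrayList<>(batchSize);
      try {
        for (byte[] msg = q.take(); msg != END; msg = q.take()) {
          batch.add(msg);
          if (batch.size() == batchSize) {
            sendBatch.accept(batch); // stands in for one batched RPC
            batch = new ArrayList<>(batchSize);
          }
        }
        if (!batch.isEmpty()) {
          sendBatch.accept(batch); // flush the final partial batch
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "batch-reader");
    this.reader.start(); // started before the writer has finished
  }

  /** Copies msg so that later changes by the caller are not seen by the reader. */
  void write(byte[] msg) {
    try {
      queue.put(msg.clone());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  /** Signals the end of writing and waits until the reader has sent everything. */
  void finish() {
    try {
      queue.put(END);
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

The bounded queue also gives back-pressure: write blocks when the reader falls
behind by more than capacity messages.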

The last scenario could be implemented by adding an extra ByteBuffer to the
current spilling queue that asynchronously reads from the memory buffer or the
spilled file and sends the bytes in batched RPCs. I am looking into how and
where we could implement combining in this. In our BSP code, let's avoid
creating new objects before sending.
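
A rough sketch of that extra buffer, assuming both the memory buffer and the
spilled file are exposed as a java.nio ReadableByteChannel (for example via
Channels.newChannel over a ByteArrayInputStream, or FileChannel.open on the
spill file). BatchReader and sendBatch are hypothetical names. One ByteBuffer is
allocated up front and reused for every batch, in line with avoiding new objects
before sending.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.function.Consumer;

/** Hypothetical sketch: drain a memory or file source into fixed-size batches. */
class BatchReader {
  private final ByteBuffer batch; // allocated once and reused for every batch

  BatchReader(int batchBytes) {
    this.batch = ByteBuffer.allocate(batchBytes);
  }

  /**
   * Reads the source to end-of-stream. Each full batch (and a final partial one)
   * is passed to sendBatch, which must consume or copy the bytes before returning.
   * Returns the number of batches sent.
   */
  int drain(ReadableByteChannel source, Consumer<ByteBuffer> sendBatch) {
    int sent = 0;
    boolean eof = false;
    try {
      while (!eof) {
        batch.clear();
        while (batch.hasRemaining()) {
          if (source.read(batch) < 0) {
            eof = true;
            break;
          }
        }
        batch.flip();
        if (batch.hasRemaining()) {
          sendBatch.accept(batch);
          sent++;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sent;
  }
}
```

Because the same batch buffer is overwritten on the next iteration, the RPC layer
must finish with the bytes (or copy them) before sendBatch returns.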

> Add a spilling message queue
> ----------------------------
>
>                 Key: HAMA-559
>                 URL: https://issues.apache.org/jira/browse/HAMA-559
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Suraj Menon
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HAMA-559.patch-v1, spillbench_code.tar.gz, 
> spilling_buffer_cpu_usage_text_write.png, 
> SpillingBufferProfile-2012-10-27.snapshot, 
> spilling_buffer_profile_cpu_graph_test_write.png, 
> spilling_buffer_profile_cpugraph_writeUTF.png, 
> spillingbuffer_profile_cpu_writeUTF.png, spilling_buffer_profile_LOCK.JPG, 
> spilling_buffer_profile_timesplit_text_write.png, 
> spilling_buffer_profile_writeUTF.png
>
>
> After HAMA-521 is done, we can add a spilling queue which just holds the 
> messages in RAM that fit into the heap space. The rest can be flushed to disk.
> We may call this a HybridQueue or something like that.
> The benefits should be that we don't have to flush to disk so often and
> therefore get faster. However, we may get more GC, so it is not always faster
> overall.
> The requirements for this queue also include:
> - Once a message object has been written to the queue (i.e., after the write
> call returns), it may be modified, but the changes should not be reflected in
> the message stored in the queue.
> - For now, let's implement a queue that does not support concurrent reading
> and writing. That feature will be needed when we implement asynchronous
> communication.
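
A minimal single-threaded sketch of such a queue, assuming a hypothetical Record
interface in place of Hadoop's Writable and a temp file as the spill target
(SpillingQueue, Record and IntMessage are illustrative names, not Hama's API).
Each message is serialized on add, so modifying the object afterwards does not
change the stored copy; messages stay in RAM until memoryLimit bytes are used and
the rest go to disk; reading is only allowed once writing has finished.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for Hadoop's Writable, so the sketch has no dependencies. */
interface Record {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

/** Example mutable message holding one int. */
class IntMessage implements Record {
  int value;
  IntMessage(int value) { this.value = value; }
  public void write(DataOutput out) throws IOException { out.writeInt(value); }
  public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

/** Keeps serialized messages in RAM up to memoryLimit bytes, then spills to a temp file. */
class SpillingQueue<M extends Record> implements AutoCloseable {
  private final int memoryLimit;
  private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
  private final DataOutputStream memOut = new DataOutputStream(memory);
  private Path spillFile;
  private DataOutputStream spillOut;
  private int memCount, spillCount;

  SpillingQueue(int memoryLimit) { this.memoryLimit = memoryLimit; }

  /** Serializes msg immediately, so later changes to msg are not reflected in the queue. */
  void add(M msg) {
    try {
      if (spillOut == null && memory.size() >= memoryLimit) {
        // RAM budget used up: this and every later message goes to disk.
        spillFile = Files.createTempFile("spill", ".bin");
        spillOut = new DataOutputStream(
            new BufferedOutputStream(Files.newOutputStream(spillFile)));
      }
      if (spillOut == null) {
        msg.write(memOut);
        memCount++;
      } else {
        msg.write(spillOut);
        spillCount++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  int spilledCount() { return spillCount; }

  /** Replays all messages in insertion order; call only after writing has finished. */
  void forEach(Supplier<M> factory, Consumer<M> action) {
    try {
      readN(new DataInputStream(new ByteArrayInputStream(memory.toByteArray())),
          memCount, factory, action);
      if (spillOut != null) {
        spillOut.flush();
        try (DataInputStream in = new DataInputStream(
            new BufferedInputStream(Files.newInputStream(spillFile)))) {
          readN(in, spillCount, factory, action);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private void readN(DataInput in, int n, Supplier<M> factory, Consumer<M> action)
      throws IOException {
    for (int i = 0; i < n; i++) {
      M m = factory.get();
      m.readFields(in);
      action.accept(m);
    }
  }

  /** Closes and deletes the spill file, if one was created. */
  @Override
  public void close() {
    try {
      if (spillOut != null) spillOut.close();
      if (spillFile != null) Files.deleteIfExists(spillFile);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The limit is checked before each write, so the in-memory part can exceed
memoryLimit by at most one message.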

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
