[
https://issues.apache.org/jira/browse/HAMA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490776#comment-13490776
]
Suraj Menon commented on HAMA-559:
----------------------------------
Yes, I think the spilling queue should take the amount of memory required as
its size value. Internally, we can initialize the spillingBuffer with multiple
buffers summing up to the total memory specified.
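
As a rough illustration of the "multiple buffers summing up to the total" idea,
here is a minimal sketch. The class name BufferPlan, the method allocate, and
the chunkBytes parameter are hypothetical and not taken from the patch; the
actual spillingBuffer may split its memory differently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the HAMA-559 patch.
final class BufferPlan {

    /**
     * Splits a total memory budget into buffers of at most chunkBytes each.
     * The capacities of the returned buffers sum to exactly totalBytes.
     */
    static List<ByteBuffer> allocate(int totalBytes, int chunkBytes) {
        if (totalBytes <= 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<ByteBuffer> buffers = new ArrayList<>();
        int remaining = totalBytes;
        while (remaining > 0) {
            // The last buffer may be smaller than chunkBytes.
            int size = Math.min(chunkBytes, remaining);
            buffers.add(ByteBuffer.allocate(size));
            remaining -= size;
        }
        return buffers;
    }
}
```

With this shape the caller only states the total memory, and the number of
internal buffers becomes an implementation detail of the queue.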
I am going to make the following changes:
- Simplifying the implementation of the get*Index functions in SpillIndexStatus.
- Using cachedThreadPools.
- Moving my inner classes to separate, reusable classes.
- I might fall back to the old way of completely filling the buffer before we
proceed, depending on the numbers.
I am doing the above because I think we can extend the current implementation
to handle asynchronous message handling on the sender side.
The sender-side queues have scalability requirements (sending batched RPCs) and
fault tolerance requirements (writing messages to file).
We have the following scenarios on the sender side:
- Synchronous message transfer with/without message persistence: This could be
handled by the current spilling queue, with a read thread that starts after
writing is complete.
- Asynchronous message transfer without message persistence: A spilling queue
where the read thread (which sends the RPCs) is started asynchronously, before
the write thread is complete. Here we are not writing to a file but to an RPC
socket.
- Asynchronous message transfer with message persistence: Here, the queue
should be spilling the messages to disk. An asynchronous read thread would be
sending data in batches for the RPCs.
The last scenario could be implemented by having an extra byte buffer in the
current spilling queue that asynchronously reads from the memory buffer or the
spilled file to send bytes in batches for RPC. I am looking into how and where
we could implement combining in this. In our BSP code, let's avoid creating new
objects before sending.
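
To make the asynchronous scenario without persistence more concrete, here is a
minimal sketch of a reader thread that starts before writing is complete and
hands messages on in batches. Everything in it is an assumption for
illustration: AsyncBatchSender, its methods, and the Consumer standing in for
the batched RPC call are hypothetical names, and a bounded BlockingQueue is
used in place of the spilling queue. It does not spill to disk, so it does not
cover the persistence scenario.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch, not the HAMA-559 implementation.
final class AsyncBatchSender {
    // End-of-stream marker, compared by identity so empty messages stay legal.
    private static final byte[] END = new byte[0];

    private final BlockingQueue<byte[]> queue;
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Future<?> reader;

    AsyncBatchSender(int capacity, int batchSize, Consumer<List<byte[]>> sink) {
        queue = new LinkedBlockingQueue<>(capacity);
        // The reader starts immediately, while the writer is still producing.
        reader = pool.submit(() -> {
            List<byte[]> batch = new ArrayList<>(batchSize);
            try {
                while (true) {
                    byte[] msg = queue.take();
                    if (msg == END) {
                        break;
                    }
                    batch.add(msg);
                    if (batch.size() == batchSize) {
                        sink.accept(batch); // stands in for one batched RPC
                        batch = new ArrayList<>(batchSize);
                    }
                }
                if (!batch.isEmpty()) {
                    sink.accept(batch); // flush the final partial batch
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Enqueues an already serialized message; blocks while the queue is full. */
    void write(byte[] serialized) throws InterruptedException {
        queue.put(serialized);
    }

    /** Signals end of input and waits until the reader has sent everything. */
    void close() throws Exception {
        queue.put(END);
        reader.get();
        pool.shutdown();
    }
}
```

Passing serialized bytes rather than message objects also fits the goal of not
creating new objects before sending, since the reader only moves byte arrays
into batches.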
> Add a spilling message queue
> ----------------------------
>
> Key: HAMA-559
> URL: https://issues.apache.org/jira/browse/HAMA-559
> Project: Hama
> Issue Type: Sub-task
> Components: bsp core
> Affects Versions: 0.5.0
> Reporter: Thomas Jungblut
> Assignee: Suraj Menon
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HAMA-559.patch-v1, spillbench_code.tar.gz,
> spilling_buffer_cpu_usage_text_write.png,
> SpillingBufferProfile-2012-10-27.snapshot,
> spilling_buffer_profile_cpu_graph_test_write.png,
> spilling_buffer_profile_cpugraph_writeUTF.png,
> spillingbuffer_profile_cpu_writeUTF.png, spilling_buffer_profile_LOCK.JPG,
> spilling_buffer_profile_timesplit_text_write.png,
> spilling_buffer_profile_writeUTF.png
>
>
> After HAMA-521 is done, we can add a spilling queue which just holds the
> messages in RAM that fit into the heap space. The rest can be flushed to disk.
> We may call this a HybridQueue or something like that.
> The benefits should be that we don't have to flush to disk so often and get
> faster. However we may have more GC so it is always overall faster.
> The requirements for this queue also include:
> - The message object once written to the queue (after returning from the
> write call) could be modified, but the changes should not be reflected in the
> messages stored in the queue.
> - For now let's implement a queue that does not support concurrent reading
> and writing. This feature is needed when we implement asynchronous
> communication.