[
https://issues.apache.org/jira/browse/HAMA-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596297#comment-13596297
]
Suraj Menon commented on HAMA-742:
----------------------------------
First of all > "Also we should stop sending classnames around, I'm not sure if
we should restrict the usage of a single class for messaging or not. But it
would definitely add less complexity and improve messaging performance if we
restrict the usage to a single class. This way you can serialize the messages
after each other without taking care of boundaries and just keep a single
message in memory that gets filled from disk in case of a spill."
We can do it either way with SpillingQueue. I did try to keep a single message
object inside the queue where possible. (There are two versions of poll:
M poll(M msg) and M poll().) The HAMA-706 patch should look into using it to
make the code there less dirty.
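To illustrate the two poll variants, here is a minimal sketch, not the actual
SpillingQueue code. IntMessage, SerializedQueue and sumWithReuse are
hypothetical names made up for this example, and an in-memory byte array stands
in for spilled data. It assumes every message is of one class, so no class
names are written, and poll(M msg) refills the caller's object instead of
allocating a new one per message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; none of these types are Hama's actual classes.
class ReusablePollSketch {

  /** Stand-in for a Writable-style message that can be refilled in place. */
  static final class IntMessage {
    int value;

    void write(DataOutputStream out) throws IOException {
      out.writeInt(value);
    }

    void readFields(DataInputStream in) throws IOException {
      value = in.readInt();
    }
  }

  /** Queue over back-to-back serialized messages of a single class. */
  static final class SerializedQueue {
    private final DataInputStream in;
    private int remaining;

    SerializedQueue(byte[] data, int count) {
      this.in = new DataInputStream(new ByteArrayInputStream(data));
      this.remaining = count;
    }

    /** Refills the caller's object; returns null when the queue is empty. */
    IntMessage poll(IntMessage reuse) {
      if (remaining == 0) {
        return null;
      }
      remaining--;
      try {
        reuse.readFields(in);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return reuse;
    }

    /** Convenience variant that allocates a fresh object per call. */
    IntMessage poll() {
      return poll(new IntMessage());
    }
  }

  /** Writes the values one after another, with no class names or delimiters. */
  static byte[] serialize(int... values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      IntMessage m = new IntMessage();
      for (int v : values) {
        m.value = v;
        m.write(out);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Drains the queue while keeping only one message object in memory. */
  static int sumWithReuse(int... values) {
    SerializedQueue queue = new SerializedQueue(serialize(values), values.length);
    IntMessage holder = new IntMessage();
    int sum = 0;
    while (queue.poll(holder) != null) {
      sum += holder.value;
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumWithReuse(1, 2, 3)); // prints 6
  }
}
```

The caller that wants to keep each message uses poll(); the caller that only
processes and discards messages uses poll(M msg) and avoids the allocations.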
I agree BSPMessageBundle implies an Iterable. It may encapsulate a list of
messages or a byte buffer where the messages are serialized. Even within the
byte buffers, there could be a protocol defined (say, I might be storing the
messages and their offsets in the buffer). Currently, by default, we process
16K of spilled data at a time. The BSPMessageBundle could just be a placeholder
to mark the boundaries of a unit of spilled data. For a spilling queue, these
boundaries don't matter. But for sorting queues, we would need to mark the
start and end of each sorted segment. So the name remains, but I propose to use
it just to represent a bunch of messages and not a map of class name to list of
messages.
The receiver queue would be reading from the BSPMessageBundle. It would be a
part of the protocol. :)
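As a rough sketch of the bundle idea above (an assumption-laden illustration,
not the real BSPMessageBundle): IntBundle, spill and readAll are hypothetical
names, messages are plain ints, and the spill unit is a message count rather
than the 16K default mentioned above. Each bundle is an Iterable over a byte
buffer and marks one unit of spilled data. With sortEach set, each bundle is
one sorted segment.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; IntBundle is not Hama's BSPMessageBundle.
class BundleSketch {

  /** One unit of spilled data: a buffer holding `count` serialized ints. */
  static final class IntBundle implements Iterable<Integer> {
    private final ByteBuffer data;
    private final int count;

    private IntBundle(ByteBuffer data, int count) {
      this.data = data;
      this.count = count;
    }

    static IntBundle of(int... values) {
      ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
      for (int v : values) {
        buf.putInt(v);
      }
      buf.flip();
      return new IntBundle(buf, values.length);
    }

    @Override
    public Iterator<Integer> iterator() {
      // duplicate() gives each iterator its own read position
      final ByteBuffer view = data.duplicate();
      return new Iterator<Integer>() {
        private int read = 0;

        @Override
        public boolean hasNext() {
          return read < count;
        }

        @Override
        public Integer next() {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          read++;
          return view.getInt();
        }
      };
    }
  }

  /** Cuts values into bundles; with sortEach, each bundle is a sorted segment. */
  static List<IntBundle> spill(int[] values, int maxPerBundle, boolean sortEach) {
    List<IntBundle> bundles = new ArrayList<>();
    for (int start = 0; start < values.length; start += maxPerBundle) {
      int end = Math.min(values.length, start + maxPerBundle);
      int[] segment = Arrays.copyOfRange(values, start, end);
      if (sortEach) {
        Arrays.sort(segment);
      }
      bundles.add(IntBundle.of(segment));
    }
    return bundles;
  }

  /** Receiver side: reads bundle by bundle; the boundaries are not needed here. */
  static List<Integer> readAll(List<IntBundle> bundles) {
    List<Integer> out = new ArrayList<>();
    for (IntBundle bundle : bundles) {
      for (int v : bundle) {
        out.add(v);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<IntBundle> bundles = spill(new int[] {5, 1, 4, 2, 3}, 2, true);
    System.out.println(bundles.size() + " bundles: " + readAll(bundles));
  }
}
```

A plain spilling queue can read straight through as readAll does. A sorting
queue would instead merge across the bundles, which is where the segment
boundaries become necessary.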
> Implement of Hama RPC
> ----------------------
>
> Key: HAMA-742
> URL: https://issues.apache.org/jira/browse/HAMA-742
> Project: Hama
> Issue Type: Sub-task
> Reporter: Edward J. Yoon
> Fix For: 0.6.1
>
>
> To solve HDFS 2.0 compatibility issue, we have to change a lot of codes for
> Hadoop 2.0 RPC, moreover, yarn RPC doesn't support asynchronous call directly.
> Ultimately, we can pursue the performance and integrate more easily with
> hadoop multi-versions by having our own RPC.