[ https://issues.apache.org/jira/browse/HAMA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254629#comment-13254629 ]
Edward J. Yoon commented on HAMA-521:
-------------------------------------
Let's schedule this for 0.6.
While running an SSSP job on a 32-node cluster, I received the error messages below:
{code}
12/04/16 19:15:20 DEBUG graph.GraphJobRunner: 459379, 2147483647
12/04/16 19:15:20 DEBUG graph.GraphJobRunner: 274699, 2147483647
12/04/16 19:15:20 DEBUG graph.GraphJobRunner: 488920, 2147483647
12/04/16 19:15:20 ERROR bsp.BSPTask: Error closing BSP Peer.
java.lang.NullPointerException
at org.apache.hama.bsp.message.DiskQueue.add(DiskQueue.java:185)
at org.apache.hama.bsp.message.DiskQueue.addAll(DiskQueue.java:177)
at org.apache.hama.bsp.message.AbstractMessageManager.clearOutgoingQueues(AbstractMessageManager.java:101)
at org.apache.hama.bsp.BSPPeerImpl.clear(BSPPeerImpl.java:378)
at org.apache.hama.bsp.BSPPeerImpl.close(BSPPeerImpl.java:370)
at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:181)
at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1097)
12/04/16 19:15:20 ERROR bsp.BSPTask: Shutting down ping service.
12/04/16 19:15:20 FATAL bsp.GroomServer: Error running child
java.lang.NullPointerException
at org.apache.hama.bsp.message.DiskQueue.add(DiskQueue.java:185)
at org.apache.hama.bsp.message.DiskQueue.addAll(DiskQueue.java:177)
at org.apache.hama.bsp.message.AbstractMessageManager.clearOutgoingQueues(AbstractMessageManager.java:101)
at org.apache.hama.bsp.BSPPeerImpl.sync(BSPPeerImpl.java:337)
at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:65)
at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:167)
at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1097)
java.lang.NullPointerException
at org.apache.hama.bsp.message.DiskQueue.add(DiskQueue.java:185)
at org.apache.hama.bsp.message.DiskQueue.addAll(DiskQueue.java:177)
at org.apache.hama.bsp.message.AbstractMessageManager.clearOutgoingQueues(AbstractMessageManager.java:101)
at org.apache.hama.bsp.BSPPeerImpl.sync(BSPPeerImpl.java:337)
at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:65)
at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:167)
at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1097)
{code}
> Improve message buffering to save memory
> ----------------------------------------
>
> Key: HAMA-521
> URL: https://issues.apache.org/jira/browse/HAMA-521
> Project: Hama
> Issue Type: Sub-task
> Reporter: Thomas Jungblut
> Assignee: Thomas Jungblut
> Attachments: HAMA-521.patch, HAMA-521_1.patch, HAMA-521_2.patch
>
>
> Suraj and I had a discussion about incoming and outgoing message
> buffering and scalability.
> Currently everything lives on the heap, causing heavy GC activity and wasted
> memory. We can do better.
> Therefore we need to extract an abstract Messenger class that sits directly
> under the interface but above the compressor class.
> It should abstract the use of the underlying queues (currently a lot of
> duplicated code) and it should be backed by a sequence file on local disk.
> Once sync() starts, it should return a message iterator for combining; the
> combined messages are then put into a message bundle, which is sent over RPC.
> On the receiving side we get a bundle and loop over it, putting everything
> onto the heap and making it much larger than it needs to be. Here we can also
> flush to disk, because we only expose a queue-like interface to the user side.
> Plus points:
> If we have enough heap (see our new metric system), we can also
> implement a buffering strategy that does not flush everything to disk.
> Open questions:
> I don't know how much slower the whole system gets, but it would save a lot of
> memory. Maybe we should first evaluate whether it is really needed.
> In any case, refactoring the duplicated code in the messengers is
> needed.
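
For illustration only, the buffering idea in the description above (keep messages in memory while there is room, spill the rest to local disk, then stream them back in order at sync time) could be sketched roughly as follows. This is not Hama's DiskQueue. The class name SpillingMessageQueue, its methods, and the use of String messages are hypothetical, and a plain DataOutputStream temp file stands in for the sequence file mentioned in the issue so that the sketch has no Hadoop dependency.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.function.Consumer;

/** Hypothetical sketch of a message queue that spills to local disk. */
class SpillingMessageQueue implements AutoCloseable {
  private final int maxInMemory;                 // assumed tuning knob
  private final ArrayDeque<String> memory = new ArrayDeque<>();
  private final File spillFile;
  private DataOutputStream spillOut;             // opened lazily on first spill
  private int spilled;                           // number of messages on disk

  SpillingMessageQueue(int maxInMemory) throws IOException {
    this.maxInMemory = maxInMemory;
    this.spillFile = File.createTempFile("messages", ".spill");
    this.spillFile.deleteOnExit();
  }

  /** Adds a message; rejects null up front instead of failing later. */
  void add(String message) throws IOException {
    if (message == null) {
      throw new IllegalArgumentException("message must not be null");
    }
    if (memory.size() < maxInMemory) {
      memory.add(message);
      return;
    }
    if (spillOut == null) {
      // Opening without append truncates any data left from a previous round.
      spillOut = new DataOutputStream(
          new BufferedOutputStream(new FileOutputStream(spillFile)));
    }
    spillOut.writeUTF(message);                  // writeUTF caps a message at 65535 encoded bytes
    spilled++;
  }

  int size() {
    return memory.size() + spilled;
  }

  /** Streams all messages to the sink in insertion order, then resets the queue. */
  void drainTo(Consumer<String> sink) throws IOException {
    // In-memory messages were added first, so they are emitted first.
    while (!memory.isEmpty()) {
      sink.accept(memory.poll());
    }
    if (spillOut != null) {
      spillOut.close();
      spillOut = null;
      try (DataInputStream in = new DataInputStream(
          new BufferedInputStream(new FileInputStream(spillFile)))) {
        for (int i = 0; i < spilled; i++) {
          sink.accept(in.readUTF());             // one message on the heap at a time
        }
      }
    }
    spilled = 0;
  }

  @Override
  public void close() throws IOException {
    if (spillOut != null) {
      spillOut.close();
    }
    spillFile.delete();
  }
}
```

Draining through a callback rather than returning a list keeps only one spilled message on the heap at a time, which is the point of the proposal. Whether the extra disk I/O is acceptable is exactly the open question raised in the description.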
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira