I will make the code stable first before merging it back.
On 18 August 2014 17:40, Edward J. Yoon <edwardy...@apache.org> wrote:
> Do you have any plan for merging them?
>
> This is a side opinion: if we want to use Git, I'm now +1.
>
> On Sat, Aug 16, 2014 at 12:00 AM, Chia-Hung Lin <cli...@googlemail.com> wrote:
>> Code right now is at https://github.com/chlin501/hama.git
>>
>> Maven and a JDK are required to build the project.
>>
>> Command to have a clean build:
>> mvn clean install -DskipTests=true -Dmaven.javadoc.skip=true
>>
>> To test a specific test case:
>> mvn -DskipTests=false -Dtest=<TestCaseName> test
>>
>>
>> On 15 August 2014 18:21, Suraj Menon <menonsur...@gmail.com> wrote:
>>> Hi Edward, sorry to enter the discussion so late.
>>>
>>> Bundling and unbundling of the message queue is not Spilling Queue's
>>> responsibility; it ended up there to be compatible with the existing
>>> implementation of BSP Peer communication. Remember, the Spilling Queue
>>> implementation was done to immediately remove some OutOfMemory issues on
>>> the sender side first. Spilling Queue gives you a byte array (ByteBuffer) with
>>> a batch of serialized messages. This is effectively bundling the messages
>>> in a byte array (hence the ByteArrayMessageBundle) and sending them for
>>> processing. The SpilledDataProcessors are implemented as a pipeline of
>>> processing done using inheritance, something like what we may use traits for
>>> in Scala. So if we have a SpilledDataProcessor that sends this bundled
>>> message via RPC to the peer, there is no need to write them to file and
>>> read them back. As I previously mentioned, this was done to be compatible
>>> with the existing implementation of peer.send.
>>>
>>> Also, the async checkpoint recovery code was written before the spilling queue.
>>> Today we can remove the single message write and do this in the "before peer
>>> sync" phase to just write the whole file to HDFS.
>>>
>>> I would say performance numbers and maintainability come first, and if you
>>> think removing the spilling queue is a solution, go for it. As far as async
>>> checkpointing is concerned, that was a first proof of concept we did
>>> and it is high time we move forward from there.
>>>
>>> Chiahung, do you have some instructions on where and how I can build the
>>> Scala version of your code?
>>>
>>> I am really finding it hard to dedicate time for Hama these days.
>>>
>>> - Suraj
>>>
>>>
>>> On Tue, Aug 12, 2014 at 7:15 AM, Edward J. Yoon <edwardy...@apache.org> wrote:
>>>
>>>> ChiaHung,
>>>>
>>>> Yes, I'm thinking similar things.
>>>>
>>>> On Tue, Aug 12, 2014 at 4:11 PM, Chia-Hung Lin <cli...@googlemail.com> wrote:
>>>> > I am currently working on this part based on the superstep API,
>>>> > similar to the Superstep.java in the trunk.
>>>> >
>>>> > The checkpointer[1] saves bundled messages instead of single messages.
>>>> > Not very sure if this is what you are looking for?
>>>> >
>>>> > [1]. https://github.com/chlin501/hama/blob/peer-comm-mech-changed/core/src/main/scala/org/apache/hama/monitor/Checkpointer.scala
>>>> >
>>>> >
>>>> > On 12 August 2014 15:04, Edward J. Yoon <edwardy...@apache.org> wrote:
>>>> >> I think that transferring single messages one at a time is not a wise way.
>>>> >> Bundle is used to avoid network overheads and contentions. So, if we
>>>> >> use Bundle, each processor always sends/receives bundles.
>>>> >>
>>>> >> BSPMessageBundle is Writable (and Iterable). And it manages the
>>>> >> serialized messages as a byte array. If we write bundles when
>>>> >> checkpointing or using Disk-queue, it'll be simpler and faster.
>>>> >>
>>>> >> In the Spilling Queue case, it always requires the process of unbundling
>>>> >> and putting messages into the queue.
>>>> >>
>>>> >>
>>>> >> On Tue, Aug 12, 2014 at 2:41 PM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
>>>> >>> -1, can't we first discuss? Also it'd be helpful to be more specific on the
>>>> >>> problems.
>>>> >>> Tommaso
>>>> >>>
>>>> >>>
>>>> >>> 2014-08-12 4:25 GMT+02:00 Edward J. Yoon <edwardy...@apache.org>:
>>>> >>>
>>>> >>>> All,
>>>> >>>>
>>>> >>>> I'll delete Spilling queue, and rewrite the checkpoint/recovery
>>>> >>>> implementation (checkpointing bundles is better than checkpointing all
>>>> >>>> messages). The current implementation is quite a mess :/ there are huge
>>>> >>>> deserialization/serialization overheads.
>>>> >>>>
>>>> >>>> --
>>>> >>>> Best Regards, Edward J. Yoon
>>>> >>>> CEO at DataSayer Co., Ltd.
>>>> >>>>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Best Regards, Edward J. Yoon
>>>> >> CEO at DataSayer Co., Ltd.
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>> CEO at DataSayer Co., Ltd.
>>>>
>
>
> --
> Best Regards, Edward J. Yoon
> CEO at DataSayer Co., Ltd.
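
For readers following the bundle-versus-single-message argument in the thread above, here is a minimal Java sketch of the idea: serialize a batch of messages into one byte array, then write or send that array as a unit. MessageBundleSketch and its methods are hypothetical names chosen for illustration. This is not Hama's BSPMessageBundle or ByteArrayMessageBundle; it uses plain java.io instead of Hadoop's Writable, and it assumes String messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a class from the Hama code base.
final class MessageBundleSketch {
    // Messages are serialized once, on add, into a shared buffer.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buffer);
    private int count = 0;

    // Append one message to the bundle in serialized form.
    void add(String message) throws IOException {
        out.writeUTF(message);
        count++;
    }

    int size() {
        return count;
    }

    // Produce a single payload: [int count][message 1][message 2]...
    // The same bytes could be sent to a peer or written to a checkpoint file.
    byte[] toBytes() throws IOException {
        out.flush();
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(framed);
        data.writeInt(count);
        buffer.writeTo(data);
        data.flush();
        return framed.toByteArray();
    }

    // Read every message back from a payload produced by toBytes().
    static List<String> fromBytes(byte[] payload) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
        int n = in.readInt();
        List<String> messages = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            messages.add(in.readUTF());
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        MessageBundleSketch bundle = new MessageBundleSketch();
        bundle.add("vertex-1:0.25");
        bundle.add("vertex-2:0.75");
        byte[] payload = bundle.toBytes();
        List<String> restored = fromBytes(payload);
        System.out.println(restored.size() + " messages in " + payload.length + " bytes");
    }
}
```

In this sketch the receiver (or a recovery step) deserializes only when it iterates, so a checkpoint can store the payload bytes as they are, with no per-message write.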