Hi all,

Curtis, thank you for listing some of the differences. I was waiting for the completed multi-gem5 patch before sending my review. Please see my inline responses below, where I address the concerns you raised. I've also added a few more points to the comparison.
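Several of the points below rest on two timing rules described in this thread. First, pd-gem5 timestamps every packet, and the destination delivers it according to its timestamp rather than its arrival order. Second, a checkpoint request is honored at the end of the current simulation quantum. The following Python sketch illustrates both rules under stated assumptions. It is not pd-gem5's code: `TimestampedReceiver`, `ckpt_dump_tick`, and the 1,000,000-tick `QUANTUM` are hypothetical names and values chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import List

# Hypothetical quantum length in ticks; a real run chooses its own value.
QUANTUM = 1_000_000


@dataclass(order=True)
class Packet:
    tick: int                              # timestamp: tick at which the packet is due
    seq: int                               # arrival counter, breaks ties between equal ticks
    src: str = field(compare=False)        # sending node (not used for ordering)
    payload: bytes = field(compare=False)


class TimestampedReceiver:
    """Buffers packets that arrive early and releases them in timestamp order."""

    def __init__(self) -> None:
        self._heap: List[Packet] = []
        self._seq = 0

    def enqueue(self, tick: int, src: str, payload: bytes) -> None:
        # Arrival order on the socket does not matter; only the timestamp does.
        heapq.heappush(self._heap, Packet(tick, self._seq, src, payload))
        self._seq += 1

    def deliver_until(self, now: int) -> List[Packet]:
        """Return every buffered packet whose timestamp is <= the current tick."""
        ready = []
        while self._heap and self._heap[0].tick <= now:
            ready.append(heapq.heappop(self._heap))
        return ready


def ckpt_dump_tick(ckpt_tick: int, quantum: int = QUANTUM) -> int:
    """Tick at which a checkpoint requested at ckpt_tick is dumped, assuming the
    barrier broadcasts take-ckpt at the end of the quantum containing ckpt_tick."""
    return (ckpt_tick // quantum + 1) * quantum


if __name__ == "__main__":
    rx = TimestampedReceiver()
    # Packets can arrive out of order, e.g. ahead of a barrier message
    # that travels on a separate socket.
    rx.enqueue(1500, "node1", b"b")
    rx.enqueue(1200, "node2", b"a")
    rx.enqueue(2500, "node1", b"c")
    print([p.payload for p in rx.deliver_until(2000)])  # [b'a', b'b']
    print(ckpt_dump_tick(1_234_567))                    # 2000000
```

The same timestamp-ordered merge is one way a synchronized switch could reorder packets arriving from different sources before forwarding them, which is the property the "Broken network timing" point relies on. In this sketch, the dump tick lies at most one quantum after the m5_ckpt tick, which is the discrepancy discussed under "Checkpoint accuracy".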
> * Synchronization.
>
> pd-gem5 implements this in Python (not a problem in itself; aesthetically this is nice, but...). The issue is that pd-gem5's data packets and barrier messages travel over different sockets. Since pd-gem5 could see data packets passing synchronization barriers, it could create an inconsistent checkpoint.
>
> multi-gem5's synchronization is implemented in C++ using sync events, but more importantly, the messages queue up in the same stream and so cannot have the issue just described. (Event ordering is often crucial in snapshot protocols.) Therefore we feel that multi-gem5 is a more robust solution in this respect.

Each packet in pd-gem5 carries a timestamp. So even if data packets pass synchronization barriers (in other words, arrive early at the destination node), the destination node processes them according to their timestamps. In fact, allowing data packets to pass sync barriers is a useful feature, since it can reduce the likelihood of late packet reception. The ordering of data messages flowing between pd-gem5 nodes is also preserved in the pd-gem5 implementation.

What you describe as an advantage of multi-gem5 is actually a key disadvantage: queueing sync messages behind data packets adds to the synchronization overhead and can slow down simulation significantly. In addition, multi-gem5 sends large messages (multiHeaderPkt) over the network at each synchronization point, which increases the synchronization overhead further. In pd-gem5, we chose to send a single character as the sync message over a separate socket to reduce synchronization overhead.

> * Packet handling.
>
> pd-gem5 uses EtherTap for data packets but changed the polling mechanism to go through the main event queue. Since this rate is actually linked with simulator progress, it cannot guarantee that the packets are serviced at regular intervals of real time. This can lead to packets queueing up which would contribute to the synchronization issues mentioned above.
>
> multi-gem5 uses plain sockets with separate receive threads and so does not have this issue.

I think this again refers to your first concern, which I addressed above. Packets that have queued up in the EtherTap socket are processed and delivered to the simulated environment at the beginning of the next simulation quantum. Also note that multi-gem5 introduces new SimObjects to interface the simulated environment with the real world, which is redundant: EtherTap already provides this functionality.

> * Checkpoint accuracy.
>
> A user would like to have a checkpoint at precisely the time the 'm5 checkpoint' operation is executed so as to not miss any of the area of interest in his application.
>
> pd-gem5 requires that simulation finish the current quantum before checkpointing, so it cannot provide this.
>
> (Shortening the quantum can help, but usually the snapshot is being taken while 'fast-forwarding', i.e. simulating as fast as possible, which would motivate a longer quantum.)
>
> multi-gem5 can enter the drain cycle immediately upon receiving a checkpoint request. We find this accuracy highly desirable.

It's true that with a large quantum there will be some discrepancy between the tick at which the m5_ckpt instruction executes and the tick at which the checkpoint is actually dumped. Based on the multi-gem5 code, my understanding is that you send an asynchronous checkpoint message as soon as one of the gem5 processes encounters an m5_ckpt instruction. However, I'm not sure how that fixes the issue, because all gem5 processes must be synchronized before the checkpoint is dumped, which necessitates a global synchronization beforehand. By the way, we have a fix for this issue that introduces a new m5 pseudo instruction.

> * Implementation of network topology.
>
> pd-gem5 uses a separate gem5 process to act as a switch whereas multi-gem5 uses a standalone packet relay process.
>
> We haven't measured the overhead of pd-gem5's simulated switch yet, but we're confident that our approach is at least as fast and more scalable.
pd-gem5 also has the flexibility to simulate the switch box alongside one of the other gem5 processes, although this might make that gem5 process the simulation bottleneck. One advantage of pd-gem5 over multi-gem5 is that we use gem5 itself to simulate the switch box, which lets us model any network topology by instantiating several Switch SimObjects and interconnecting them with EtherLinks in an arbitrary fashion. A standalone TCP server can only provide switch functionality (forwarding packets to their destinations) and model a star topology. Furthermore, it cannot model network timing effects such as queueing delay, congestion, and routing latency. It also has some accuracy issues, which I point out next.

* Broken network timing: Forwarding packets between gem5 processes through a standalone TCP server can reorder packets that have different sources but the same destination. This causes inaccurate network timing and, worse, non-deterministic simulation. pd-gem5 resolves this by reordering packets in the switch process before sending them to their destinations (this is possible because the switch is synchronized with the rest of the nodes).

* Amount of changes: pd-gem5 introduces different modes in EtherLink to provide accurate timing for each component of the network subsystem (NIC, link, switch) and to support modeling different network topologies (mesh, ring, fat tree, etc.). To enable only simple functionality like what multi-gem5 provides, the changes to gem5 could be limited to timestamping packets and providing synchronization through Python scripts. multi-gem5, on the other hand, re-implements functionality that already exists in gem5.

* Integration with mainstream gem5: The pd-gem5 launch script is written in Python, which is well suited for integration with gem5's Python scripts, whereas multi-gem5 uses a bash script. Also, all source files in pd-gem5 are already part of mainstream gem5.
However, multi-gem5 has tcp_server.cc/hh, a standalone process that cannot be part of gem5.

Thank you,
Mohammad

On Fri, Jun 26, 2015 at 8:40 PM, Curtis Dunham <curtis.dun...@arm.com> wrote:
> Hello everyone,
> We have taken a look at how pd-gem5 compares with multi-gem5. While intending
> to deliver the same functionality, there are some crucial differences:
>
> [...]
>
> Thanks,
> Curtis
> ________________________________________
> From: gem5-dev [gem5-dev-boun...@gem5.org] On Behalf Of Mohammad Alian [al...@wisc.edu]
> Sent: Friday, June 26, 2015 7:37 PM
> To: gem5 Developer List
> Subject: Re: [gem5-dev] pd-gem5: simulating a parallel/distributed system on multiple physical hosts
>
> Hi Anthony,
>
> I think that would be a good option; then I can add pd-gem5 functionality
> on top of that. Right now I've simplified your implementation. Also, I
> think I had found some bugs in your patch that I cannot remember now. If
> you decide to ship the EtherSwitch patch, let me know so I can review it.
>
> Thanks,
> Mohammad
>
> On Thu, Jun 25, 2015 at 8:36 PM, Gutierrez, Anthony <anthony.gutier...@amd.com> wrote:
>
> > Would it make sense for me to ship the EtherSwitch patch first, since it
> > has utility on its own, and then we can decide which of the "multi-gem5"
> > approaches is best, or if it's some combination of both?
> >
> > The only reason I never shipped it was because Steve raised an issue that
> > I didn't have a good alternative for, and didn't have the time to look into
> > one at that time.
> > ________________________________________ > > From: gem5-dev [gem5-dev-boun...@gem5.org] on behalf of Mohammad Alian [ > > al...@wisc.edu] > > Sent: Wednesday, June 24, 2015 12:43 PM > > To: gem5 Developer List > > Subject: Re: [gem5-dev] pd-gem5: simulating a parallel/distributed system > > on multiple physical hosts > > > > Hi Andreas, > > > > Thanks for the comment. > > I think the checkpointing support in both works is the same. Here is how > > checkpointing support is implemented in pd-gem5: > > > > Whenever one of gem5 processes encounter an m5-checkpoint pseudo > > instruction, it will send a “recv-ckpt” signal to the > > “barrier” process. Then the “barrier” process sends a “take-ckpt” signal > to > > all the simulated nodes > > (including the node that encountered m5-checkpoint) at the end of the > > current simulation quantum. On the reception of > > “take-ckpt” signal, gem5 processes start dumping check-points. This makes > > each simulated node dump a checkpoint > > at the same simulated time point while ensuring there is no in-flight > > packets. > > > > I believe this is the same as multi-gem5 patch approach for checkpoint > > support (based on the commit message of http://reviews.gem5.org/r/2865/ > ). > > Also, we have tested our mechanism with several benchmarks and it works. > As > > Steve suggested, I'll look into Curtis's patch and try to review it as > > well. > > But as Nilay also mentioned earlier, there are some codes missing in > > Curtis's patch. I prefer to first run multi-gem5 before starting to > review > > it. > > > > Thank you, > > Mohammad > > > > On Wed, Jun 24, 2015 at 7:25 AM, Andreas Hansson < > andreas.hans...@arm.com> > > wrote: > > > > > Hi Steve, > > > > > > Apologies for the confusion. We are on the same page. My point is that > we > > > cannot simply take a little bit of patch A and a little bit of patch B. > > > This change involves a lot of code, and we need to approach this in a > > > structured fashion. 
> > > My proposal is to do it bottom up, and start by
> > > getting the basic support in place. Since http://reviews.gem5.org/r/2826/
> > > has already been on the review board for a few months, I am merely
> > > suggesting that it would be a good start to relate the newly posted
> > > patches to what is already there.
> > >
> > > Andreas
> > >
> > > On 24/06/2015 13:11, "gem5-dev on behalf of Steve Reinhardt"
> > > <gem5-dev-boun...@gem5.org on behalf of ste...@gmail.com> wrote:
> > >
> > > >Hi Andreas,
> > > >
> > > >I'm a little confused by your email---you say you're fundamentally opposed
> > > >to looking at both patches and picking the best features, then you point
> > > >out that the patches Curtis posted have the feature of better checkpointing
> > > >support so we should pick that :).
> > > >
> > > >Obviously we can't just pick patch A from Mohammad's set and patch B from
> > > >Curtis's set and expect them to work together, but I think that having both
> > > >sets of patches available and comparing and contrasting the two
> > > >implementations should enable us to get to a single implementation that's
> > > >the best of both. Someone will have to make the effort of integrating the
> > > >better ideas from one set into the other set to create a new unified set of
> > > >patches (or maybe we commit one set and then integrate the best of the
> > > >other set as patches on top of that), but the first step is to identify
> > > >what "the best of both" is. Having Mohammad look at Curtis's patches, and
> > > >Curtis (or someone else from ARM) closely examine Mohammad's patches would
> > > >be a great start. I intend to review them both, though unfortunately my
> > > >time has been scarce lately---I'm hoping to squeeze that in later this
> > > >week.
> > > >
> > > >Once we've had a few people look at both, we can discuss the pros and cons
> > > >of each, then discuss the strategy for getting the best features in. So
> > > >far I've heard that Mohammad's patches have a better network model but the
> > > >ARM patches have better checkpointing support; that seems like a good
> > > >start.
> > > >
> > > >Steve
> > > >
> > > >On Wed, Jun 24, 2015 at 12:26 AM Andreas Hansson <andreas.hans...@arm.com>
> > > >wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Great work. However, I fundamentally do not believe in the approach of
> > > >> ‘letting reviewers pick the best features’. There is no way we would ever
> > > >> get something working out of it. We need to get _one_ working solution
> > > >> here, and figure out how to best get there. I would propose to do it
> > > >> bottom up, starting with the basic multi-simulator instance support,
> > > >> checkpointing support, and then move on to the network between the
> > > >> simulator instances.
> > > >>
> > > >> Thus, I propose we go with the low-level plumbing and checkpoint support
> > > >> from what Curtis has posted. I believe proper checkpointing support to be
> > > >> the most challenging, and from what I can tell this is far more limited in
> > > >> what you just posted Mohammad. Could you perhaps review Curtis's patches
> > > >> based on your insights, and we can try and get these patches in shape and
> > > >> committed asap.
> > > >>
> > > >> Once we have the baseline functionality in place, then we can start
> > > >> looking at the more elaborate network models.
> > > >>
> > > >> Does this sound reasonable?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Andreas
> > > >>
> > > >> On 24/06/2015 05:05, "gem5-dev on behalf of Mohammad Alian"
> > > >> <gem5-dev-boun...@gem5.org on behalf of al...@wisc.edu> wrote:
> > > >>
> > > >> >Hello All,
> > > >> >
> > > >> >I have submitted a chain of patches which enables gem5 to simulate a
> > > >> >cluster on multiple physical hosts:
> > > >> >
> > > >> >http://reviews.gem5.org/r/2909/
> > > >> >http://reviews.gem5.org/r/2910/
> > > >> >http://reviews.gem5.org/r/2912/
> > > >> >http://reviews.gem5.org/r/2913/
> > > >> >http://reviews.gem5.org/r/2914/
> > > >> >
> > > >> >and a patch that contains run scripts for a simple experiment:
> > > >> >http://reviews.gem5.org/r/2915/
> > > >> >
> > > >> >We have run several benchmarks using this infrastructure, including NAS
> > > >> >parallel benchmarks (MPI) and DCBench-hadoop
> > > >> >(http://prof.ict.ac.cn/DCBench/),
> > > >> >and would be happy to share scripts/diskimages.
> > > >> >
> > > >> >We call this *pd-gem5*. *pd-gem5* functionality is more or less the same
> > > >> >as Curtis's patch for *multi-gem5*. However, I feel the *pd-gem5* network
> > > >> >model is more thorough; it also enables modeling different network
> > > >> >topologies. Having both sets of changes together lets reviewers pick the
> > > >> >best features from both works.
> > > >> >
> > > >> >Thank you,
> > > >> >Mohammad Alian
> > > >> >_______________________________________________
> > > >> >gem5-dev mailing list
> > > >> >gem5-dev@gem5.org
> > > >> >http://m5sim.org/mailman/listinfo/gem5-dev
> > > >>
> > > >>
> > > >> -- IMPORTANT NOTICE: The contents of this email and any attachments are
> > > >> confidential and may also be privileged. If you are not the intended
> > > >> recipient, please notify the sender immediately and do not disclose the
> > > >> contents to any other person, use it for any purpose, or store or copy the
> > > >> information in any medium. Thank you.
> > > >>
> > > >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
> > > >> Registered in England & Wales, Company No: 2557590
> > > >> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
> > > >> Registered in England & Wales, Company No: 2548782