Hi Nilay,
Seems like the problem is deadlock in the Torus topology.
The code enforces XY routing in the Torus (just as in a Mesh), but since every row and column is a ring, this creates a cyclic dependency within a dimension at high loads. Each response message is 5 flits long, which increases the load on that vnet, and that is why this artefact shows up on the responses. I had never tested the Torus at high loads before, so I missed this. I notice that the problem does not show up when the injection rate is decreased to 0.1.
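
To illustrate why dimension-order routing alone is not enough on a torus, here is a minimal sketch. It is not Garnet code: the function names and the single-dimension model are hypothetical, and it assumes one virtual channel per link direction within a vnet. It builds the channel dependency graph for the "+" direction of one 8-router row and checks it for a cycle. With the wraparound link the dependencies close into a loop (so a deadlock is possible under load); without it, as in a mesh row, they do not.

```python
from typing import Dict, Set, Tuple

Channel = Tuple[int, int]  # directed link (src, dst) inside one row or column


def channel_dependencies(k: int, wraparound: bool, max_hops: int) -> Dict[Channel, Set[Channel]]:
    """Hypothetical model: dependency graph for the '+' direction of one dimension.

    A packet that holds channel c while requesting the next channel d on its
    path creates the edge c -> d. Packets travel 2..max_hops hops.
    """
    deps: Dict[Channel, Set[Channel]] = {}
    for src in range(k):
        for hops in range(2, max_hops + 1):
            if not wraparound and src + hops >= k:
                continue  # mesh row: the destination would fall off the edge
            path = [((src + i) % k, (src + i + 1) % k) for i in range(hops)]
            for held, wanted in zip(path, path[1:]):
                deps.setdefault(held, set()).add(wanted)
    return deps


def has_cycle(graph: Dict[Channel, Set[Channel]]) -> bool:
    """Depth-first search; a back edge to a node on the current path means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[Channel, int] = {}

    def visit(u: Channel) -> bool:
        color[u] = GREY
        for v in graph.get(u, ()):
            state = color.get(v, WHITE)
            if state == GREY or (state == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


if __name__ == "__main__":
    k = 8
    # Minimal routing in an 8-node ring goes at most k // 2 hops in one direction.
    print("mesh row  :", has_cycle(channel_dependencies(k, wraparound=False, max_hops=k - 1)))  # False
    print("torus ring:", has_cycle(channel_dependencies(k, wraparound=True, max_hops=k // 2)))  # True
```

The edge (7, 0) -> (0, 1) created by the wraparound link is what closes the chain; removing it (the mesh case) leaves a graph whose edges only go "forward", which cannot contain a cycle.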

If you change the topology to Mesh, you will notice that all messages are received. Even in a mesh, though, you will have to decrease the injection rate, since 0.3 is past saturation. [With requests, forwards and responses of 1, 1 and 5 flits in a 1:1:1 mix, 0.3 packets/node/cycle => 0.3 x (1+1+5)/3 flits/node/cycle = 0.7 flits/node/cycle. An 8x8 mesh theoretically saturates before 0.5 flits/node/cycle.]
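
The arithmetic above can be reproduced with a short sketch. The flit sizes (1, 1, 5) and the 1:1:1 mix come from this thread; the function names are hypothetical. The 0.5 figure is taken here to be the standard bisection bound for a k x k mesh, which assumes uniform random destinations and links carrying 1 flit/cycle, and ignores router pipeline and arbitration effects, so real saturation comes earlier, consistent with "saturates before 0.5".

```python
from typing import Sequence


def offered_flit_load(packets_per_node_per_cycle: float, flits_per_class: Sequence[int]) -> float:
    """Average flits injected per node per cycle for an equal mix of message classes."""
    mean_flits = sum(flits_per_class) / len(flits_per_class)
    return packets_per_node_per_cycle * mean_flits


def mesh_bisection_bound(k: int) -> float:
    """Idealized upper bound on accepted load (flits/node/cycle) for a k x k mesh.

    Assumes uniform random destinations and 1 flit/cycle links. Half the nodes
    sit on each side of the bisection and half of their flits cross it, so
    k*k*lam/4 flits/cycle must fit through k links each way: lam <= 4/k.
    """
    return 4.0 / k


if __name__ == "__main__":
    flits = [1, 1, 5]  # request, forward, response sizes used in the estimate
    for rate in (0.3, 0.1):
        print(f"{rate} pkts/node/cycle -> {offered_flit_load(rate, flits):.2f} flits/node/cycle")
    print(f"8x8 mesh bisection bound: {mesh_bisection_bound(8):.2f} flits/node/cycle")
```

At 0.3 packets/node/cycle the offered load (0.70) exceeds even the idealized 0.50 bound, while 0.1 packets/node/cycle gives about 0.23 flits/node/cycle, well below it.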

There are a number of deadlock-free routing algorithms for tori; I guess one of them will have to be implemented for the Torus topology to be robust.
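
One well-known deadlock-free option for tori is the dateline technique: each ring gets two virtual channels, packets start on VC 0 and switch to VC 1 once they cross a chosen "dateline" link, so the dependency chain can never close around the ring. The sketch below is a hypothetical model (one ring, the wraparound link k-1 -> 0 chosen as the dateline, invented function names), not Garnet code and not necessarily the algorithm the Torus topology should adopt. It only checks that adding the dateline removes the cycle; in a router this needs at least two VCs in each vnet.

```python
from typing import Dict, Set, Tuple

VcChannel = Tuple[int, int, int]  # (src, dst, vc) for a link in the '+' direction


def ring_dependencies(k: int, max_hops: int, use_dateline: bool) -> Dict[VcChannel, Set[VcChannel]]:
    """Hypothetical model: (link, VC) dependency graph for one ring."""
    deps: Dict[VcChannel, Set[VcChannel]] = {}
    for src in range(k):
        for hops in range(2, max_hops + 1):
            path = []
            vc = 0  # every packet starts on VC 0
            for i in range(hops):
                a, b = (src + i) % k, (src + i + 1) % k
                if use_dateline and a == k - 1:
                    vc = 1  # crossing the dateline (wraparound link): stay on VC 1 from here on
                path.append((a, b, vc))
            for held, wanted in zip(path, path[1:]):
                deps.setdefault(held, set()).add(wanted)
    return deps


def has_cycle(graph: Dict[VcChannel, Set[VcChannel]]) -> bool:
    """Depth-first search; a back edge to a node on the current path means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[VcChannel, int] = {}

    def visit(u: VcChannel) -> bool:
        color[u] = GREY
        for v in graph.get(u, ()):
            state = color.get(v, WHITE)
            if state == GREY or (state == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


if __name__ == "__main__":
    k = 8
    print("single VC:", has_cycle(ring_dependencies(k, k // 2, use_dateline=False)))  # True
    print("dateline :", has_cycle(ring_dependencies(k, k // 2, use_dateline=True)))   # False
```

The design choice is that the VC index only ever increases along a path, and a minimally routed packet crosses the dateline at most once, so no chain of dependencies can return to the channel it started from.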
Should we add a warning in that topology file about this?

Thanks,
Tushar
PS: I am pushing a patch for review that (1) breaks down the stats on a per-vnet basis, which is useful for debugging problems like this one, and (2) cleans up redundant code between the flexible and fixed networks.


On 03/18/2012 06:41 PM, Nilay Vaish wrote:
Tushar

I am using the network tester in gem5 with GARNET. I am observing a peculiar behavior when I run the following command --

./build/X86/gem5.fast ./configs/example/ruby_network_test.py --num-cpus=64 --num-dirs=64 --topology=Torus --garnet-network=flexible --mesh-rows=8 --sim-cycles=10000 -i 0.30 --random_seed=1

As you might recall, there are three different types of messages generated by the network tester (request, forward and response), and they are generated in the ratio 1:1:1. In the file ruby.stats, the number of responses that reach the directory controllers is far smaller than the number of requests and forwards, the ratio being 10 (request) : 10 (forward) : 1 (response). It seems that the network interface at the cache controller is not able to issue most of the response messages, and hence these keep waiting in the queue at the input port.

Can you look into why this is happening?

Thanks
Nilay
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
