Hi Nilay,
It seems the problem is deadlock in the Torus topology.
The code enforces XY routing in the Torus (similar to a Mesh), but since
every row and column forms a ring, this creates a cyclic channel
dependency within a dimension, which deadlocks at high loads.
Each response message is 5 flits, which increases the load on that vnet;
that is why this artefact shows up. I had never tested the Torus at
high loads before, so I missed this.
I notice that decreasing the injection rate to 0.1 makes the problem go
away.
If you change the topology to Mesh, you will notice that all messages
are received.
Even in a mesh, though, you will have to decrease the injection rate,
since 0.3 is post-saturation.
[Requests and forwards are 1 flit and responses are 5 flits, so
0.3 packets/node/cycle => 0.3 x (1+1+5)/3 flits/node/cycle = 0.7
flits/node/cycle. An 8x8 mesh theoretically saturates below 0.5
flits/node/cycle.]
There are a number of deadlock-free routing algorithms for tori; I guess
one of them will have to be implemented for the Torus topology to be
robust.
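To make the deadlock argument concrete, here is a minimal sketch (a hypothetical model, not GARNET code) that builds the channel dependency graph for the clockwise links of one k-node ring under minimal routing and checks it for a cycle. Because XY routing only ever turns from X to Y, cycles can only form inside a single row or column ring, so one ring is enough to show the problem. As one example of a deadlock-free option for tori, it also applies the classic dateline scheme with two virtual channels; the function names and the choice of link k-1 -> 0 as the dateline are my assumptions.

```python
from itertools import product


def ring_route(src, dst, k):
    """Clockwise links used from src to dst on a k-node ring.
    Link i is the unidirectional channel from node i to node (i + 1) % k."""
    hops = (dst - src) % k
    return [(src + h) % k for h in range(hops)]


def channel_deps(k, dateline):
    """Edges (a, b) meaning a packet holding channel a waits for channel b.
    A channel is (link, vc). Hypothetical model: minimal routing, and only
    packets that go clockwise (ties at k/2 hops are sent clockwise)."""
    deps = set()
    for src, dst in product(range(k), repeat=2):
        if not 0 < (dst - src) % k <= k // 2:
            continue  # not a clockwise minimal route
        vc, path = 0, []
        for link in ring_route(src, dst, k):
            if dateline and link == k - 1:
                vc = 1  # crossing the wraparound (dateline) link: move to VC 1
            path.append((link, vc))
        deps.update(zip(path, path[1:]))
    return deps


def has_cycle(edges):
    """Depth-first search for a back edge in the dependency graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    state = {}  # node -> "visiting" or "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph.get(u, []):
            if state.get(v) == "visiting":
                return True
            if v not in state and visit(v):
                return True
        state[u] = "done"
        return False

    return any(u not in state and visit(u) for u in list(graph))


if __name__ == "__main__":
    print(has_cycle(channel_deps(8, dateline=False)))  # True: cyclic dependency
    print(has_cycle(channel_deps(8, dateline=True)))   # False: cycle is broken
```

Without the dateline every link waits on the next one around the ring, closing a cycle; with it, VC 0 never extends past link k-1 and the VC 1 chain ends after at most k/2 hops, so the graph is acyclic. The counter-clockwise direction is symmetric.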
Should we add a warning in that topology file about this?
Thanks,
Tushar
PS: I am pushing a patch for review that (1) breaks down the stats on a
per-vnet basis, which is useful for debugging problems like this, and
(2) cleans up redundant code between the flexible and fixed networks.
On 03/18/2012 06:41 PM, Nilay Vaish wrote:
Tushar
I am using the network tester in gem5 with GARNET. I am observing a
peculiar behavior when I run the following command --
./build/X86/gem5.fast ./configs/example/ruby_network_test.py \
    --num-cpus=64 --num-dirs=64 --topology=Torus --garnet-network=flexible \
    --mesh-rows=8 --sim-cycles=10000 -i 0.30 --random_seed=1
As you might recall, there are three different types of messages
generated by the network tester -- request, forward and response -- and
these are generated in the ratio 1:1:1. In the file ruby.stats, the
number of responses that reach the directory controllers is far lower
than the number of requests and forwards, the ratio being
10 (request) : 10 (forward) : 1 (response). It seems that the network
interface at the cache controller is not able to issue most of the
response messages, and hence these keep waiting in the queue at the
input port.
Can you look into why this is happening?
Thanks
Nilay
_______________________________________________
gem5-users mailing list
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users