Tushar, thanks for the help. The stats I found useful were the number
of packets waiting at the network interfaces to be routed, and the
average time for which these packets have been waiting. I think this
waiting time should be part of the queueing delay stat that is already present.
What do you think?
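For concreteness, the bookkeeping I have in mind would be something like the
following standalone Python sketch (made-up names, not the actual GARNET
code): timestamp each packet on enqueue, then both the number of waiting
packets and the average wait fall out directly.

from collections import deque

class NIQueueStats:
    # Toy model of the stats I am proposing for a network interface.

    def __init__(self):
        self.queue = deque()        # (packet, enqueue_cycle) pairs
        self.total_wait_cycles = 0  # summed wait of all dequeued packets
        self.dequeued = 0

    def enqueue(self, packet, cur_cycle):
        self.queue.append((packet, cur_cycle))

    def dequeue(self, cur_cycle):
        packet, enq_cycle = self.queue.popleft()
        self.total_wait_cycles += cur_cycle - enq_cycle
        self.dequeued += 1
        return packet

    def waiting(self):
        # packets currently waiting at the interface to be routed
        return len(self.queue)

    def avg_wait(self):
        # the piece I think belongs in the existing queueing delay stat
        return self.total_wait_cycles / self.dequeued if self.dequeued else 0.0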
--
Nilay
On Mon, 19 Mar 2012, Tushar Krishna wrote:
Hi Nilay,
Seems like the problem is deadlock in the Torus topology.
The code enforces XY routing in the Torus (similar to a Mesh), but since there
are rings in every row/column, this creates a cyclic channel dependency within
a dimension at high loads.
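To make the cycle concrete, here is a toy illustration (plain Python, not
gem5 code): record, for each channel, which channel a packet holding it may
wait on next. A Mesh row drains at its last node, so the dependency chain
ends; the Torus wraparound closes it into a cycle.

def row_deps(n, wraparound):
    # Map each channel (i -> nxt) to the channels it may wait on next.
    deps = {}
    last = n if wraparound else n - 1
    for i in range(last):
        nxt = (i + 1) % n
        after = (nxt + 1) % n
        # Worst case, a packet holding i -> nxt next requests nxt -> after;
        # at the end of a Mesh row it can only drain.
        if wraparound or nxt < n - 1:
            deps[(i, nxt)] = [(nxt, after)]
        else:
            deps[(i, nxt)] = []
    return deps

def has_cycle(deps):
    # Depth-first search for a cycle in the dependency graph.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}
    def visit(c):
        color[c] = GRAY
        for d in deps.get(c, []):
            if color.get(d) == GRAY or (color.get(d) == WHITE and visit(d)):
                return True
        color[c] = BLACK
        return False
    return any(color[c] == WHITE and visit(c) for c in deps)

print(has_cycle(row_deps(8, wraparound=False)))  # False: Mesh row drains
print(has_cycle(row_deps(8, wraparound=True)))   # True: Torus ring can deadlock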
Each response message becomes 5 flits, increasing the load in that vnet, which
is why this artefact is being seen. I never tested the Torus at high loads
before, so I missed this.
I notice that decreasing the injection rate to 0.1 makes the problem go
away.
If you change the topology to Mesh, you will notice that all messages are
received.
Even in a Mesh, though, you will have to decrease the injection rate, since
0.3 is past saturation.
[0.3 packets/node/cycle => 0.3 x (1+1+5)/3 flits/node/cycle = 0.7
flits/node/cycle. An 8x8 mesh theoretically saturates below 0.5
flits/node/cycle.]
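Spelling that out as a quick sanity check (plain Python, numbers straight
from above):

inj_rate = 0.30                # packets/node/cycle (the -i 0.30 setting)
avg_flits = (1 + 1 + 5) / 3.0  # 1:1:1 mix of request, forward, response
flit_rate = inj_rate * avg_flits
print(round(flit_rate, 2))     # 0.7 flits/node/cycle, well past the ~0.5
                               # flits/node/cycle where an 8x8 mesh saturates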
There are a bunch of deadlock-free routing algorithms for tori; one of them
will have to be implemented, I guess, for the Torus topology to be robust.
Should we add a warning about this in that topology file?
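For reference, one standard fix is the dateline scheme: within each ring,
packets start on VC 0 and switch to VC 1 once they cross a designated
wraparound link, so the channel dependencies can never close into a cycle.
A rough illustration of the VC selection (plain Python, hypothetical names,
not what GARNET implements today):

def ring_route_vcs(src, dst, num_nodes):
    # Hops (cur, nxt, vc) on a unidirectional ring with a dateline on the
    # wraparound link (num_nodes-1 -> 0): crossing it moves the packet to VC 1.
    vc, cur, hops = 0, src, []
    while cur != dst:
        nxt = (cur + 1) % num_nodes
        hops.append((cur, nxt, vc))
        if nxt == 0:  # crossed the dateline; later hops use VC 1
            vc = 1
        cur = nxt
    return hops

# Node 2 -> node 1 on a 4-node ring crosses the wraparound, so the final
# hop travels on VC 1 and the dependency cycle is broken.
print(ring_route_vcs(2, 1, 4))  # [(2, 3, 0), (3, 0, 0), (0, 1, 1)]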
Thanks,
Tushar
PS: I am pushing a patch for review that (1) breaks out the stats on a
per-vnet basis, which is useful for this kind of debugging, and (2) cleans up
redundant code between the flexible and fixed networks.
On 03/18/2012 06:41 PM, Nilay Vaish wrote:
Tushar
I am using the network tester in gem5 with GARNET. I am observing a
peculiar behavior when I run the following command --
./build/X86/gem5.fast ./configs/example/ruby_network_test.py --num-cpus=64
--num-dirs=64 --topology=Torus --garnet-network=flexible --mesh-rows=8
--sim-cycles=10000 -i 0.30 --random_seed=1
As you might recall, there are three different types of messages generated by
the network tester -- request, forward, and response -- and these are
generated in a 1:1:1 ratio. In the file ruby.stats, the number of
responses that reach the directory controllers is far lower than the
number of requests and forwards, the ratio being
10 (request) : 10 (forward) : 1 (response). It seems that the network interface
at the cache controller is not able to issue most of the response messages,
and hence these keep waiting in the queue at the input port.
Can you look into why this is happening?
Thanks
Nilay