Hello,

I used gprof to see where the time is spent. My OE function accounts for 38% of the total time, and std::__introsort_loop accounts for 45% of the running time; I guess the latter comes from the two sort calls in my OE function. The adaptive routing itself doesn't take much time. However, when I ran adaptive routing alone, there seemed to be no improvement in sim_insts (it even decreased a little). So I'm facing two problems: 1. The OE routing in the routeCompute function is too complex. 2. Adaptive routing doesn't improve the performance. Do you have any suggestions?
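If the two sorts are only there to find the best candidate, a single linear scan would avoid most of the std::__introsort_loop cost. The following is a minimal sketch under that assumption; pickBest and its scores argument are hypothetical names, not part of garnet or of the code discussed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not a garnet function): returns the index of the
// largest entry in `scores`. One linear pass, O(n), stands in for sorting
// the whole vector, O(n log n), when only the best entry is needed.
// On ties, the first largest entry wins.
std::size_t pickBest(const std::vector<int>& scores)
{
    assert(!scores.empty());
    auto best = std::max_element(scores.begin(), scores.end());
    return static_cast<std::size_t>(std::distance(scores.begin(), best));
}
```

If the full ordering is really needed, this does not apply, and the cost has to be reduced by sorting less often instead.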

Yuhang

On 09/12/2013 07:31 PM, Andreas Hansson wrote:
Hi Yuhang,

I suspect you call your routing function every cycle for every packet, causing the massive slowdown. You can always do a profiling run to figure out where the time is spent. Build gem5.perf and use google perftools (pprof) to analyse the output, or build gem5.prof and analyse it with gprof.
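One way to avoid recomputing the route every cycle is to compute it once per packet and reuse the stored result. The sketch below only illustrates that idea; RouteCache, outportFor and release are hypothetical names and do not correspond to garnet's actual classes or interfaces.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical per-router cache (not garnet code): the expensive route
// function runs once per packet, and later lookups for the same packet
// reuse the stored outport instead of recomputing it.
class RouteCache
{
  public:
    explicit RouteCache(std::function<int(int)> compute)
        : compute_(std::move(compute))
    {
        assert(compute_);  // a callable route function is required
    }

    // Returns the outport for this packet, computing it only on first use.
    int outportFor(int packetId, int destRouter)
    {
        auto it = cache_.find(packetId);
        if (it != cache_.end())
            return it->second;
        int outport = compute_(destRouter);
        ++computeCalls_;
        cache_.emplace(packetId, outport);
        return outport;
    }

    // Forget the packet once its last flit has left the router.
    void release(int packetId) { cache_.erase(packetId); }

    int computeCalls() const { return computeCalls_; }

  private:
    std::function<int(int)> compute_;
    std::unordered_map<int, int> cache_;
    int computeCalls_ = 0;
};
```

With an adaptive scheme this fixes the decision at the time the packet is first seen at a router, which matches wormhole switching, where the remaining flits of a packet follow the head anyway.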

Good luck.

Andreas

From: Yuhang <[email protected]>
Reply-To: gem5 users mailing list <[email protected]>
Date: Thursday, 12 September 2013 18:25
To: gem5 users mailing list <[email protected]>
Subject: [gem5-users] Odd even and adaptive routing didn't improve the performance

Hello all,

I implemented the odd-even scheme and adaptive routing in garnet. For odd-even, I use the algorithm from the paper /The odd-even turn model for adaptive routing/ (Ge-Ming Chiu, 2000). For adaptive routing, I use get_credit_cnt(vcs) to sum up all the credits of each output, and choose the output with the most credits. I traced the flit flow, and both work fine. However, the performance didn't improve after the modification. I ran the FFT and LU kernels from splash2 with the ALPHA MESI protocol, detailed CPU type, a 4x4 mesh, and 1000000000 max ticks.
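The two mechanisms described above can be sketched roughly as follows. This is an illustration only, not the actual garnet modification: Dir, turnAllowed, pickOutport and the creditsPerVc layout are hypothetical, and columns are assumed to be numbered from 0. The turn check encodes the odd-even rules (no EN or ES turn in an even column, no NW or SW turn in an odd column); the selection step sums the per-VC credits of each candidate output and keeps the largest total.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

enum class Dir { East, West, North, South };

// Odd-even turn rules for 90-degree turns only. `heading` is the direction
// the packet is travelling, `next` is the direction it wants to turn to.
bool turnAllowed(int column, Dir heading, Dir next)
{
    assert(column >= 0);
    const bool evenColumn = (column % 2 == 0);
    const bool headingVertical =
        (heading == Dir::North || heading == Dir::South);
    const bool nextVertical = (next == Dir::North || next == Dir::South);

    if (evenColumn && heading == Dir::East && nextVertical)
        return false;  // EN and ES turns are prohibited in even columns
    if (!evenColumn && headingVertical && next == Dir::West)
        return false;  // NW and SW turns are prohibited in odd columns
    return true;
}

// Picks, among the permitted candidate outports, the one whose VCs hold
// the most credits in total. creditsPerVc[o] holds one credit count per
// VC of outport o (hypothetical layout). On ties, the first candidate wins.
std::size_t pickOutport(const std::vector<std::size_t>& candidates,
                        const std::vector<std::vector<int>>& creditsPerVc)
{
    assert(!candidates.empty());
    std::size_t best = candidates.front();
    int bestCredits = -1;
    for (std::size_t o : candidates) {
        const std::vector<int>& vcs = creditsPerVc.at(o);
        const int total = std::accumulate(vcs.begin(), vcs.end(), 0);
        if (total > bestCredits) {
            bestCredits = total;
            best = o;
        }
    }
    return best;
}
```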


                        FFT with OE    FFT without OE   RADIX with OE   RADIX without OE
                        and adaptive   and adaptive     and adaptive    and adaptive
                        routing
host_inst_rate          1006           11708            1945            15865
sim_insts               15008035       15016804         19748978        19752713
total flits injected    1315661        1309101          1131643         1130144
average latency         20.4676        20.4485          19.9921         19.968


Notice that host_inst_rate is extremely low with my implementation, and sim_insts even decreased a little. Is that because my modification is too complex, so that each routing decision takes too many instructions? Or did I just write the code wrong? I tried reducing both the L1 and L2 cache sizes to create more contention, but only got less than 1% improvement in sim_insts. In addition, the benchmark runs very slowly (usually taking one day) with my modification and the reduced cache sizes. Could anyone give me some help with this issue?

Sincerely,
Yuhang



_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
