Hello all,

I have recently proposed adding a git hook 
(https://gem5-review.googlesource.com/c/public/gem5/+/21739) to make sure that 
commit messages follow the guidelines in CONTRIBUTING, with the ultimate goal 
being to make sure that the gem5 tags used are valid. I have also checked 
whether previous commits would have passed this hook, and only a tiny 
percentage of them would. Although typos were frequent, the main issue was 
that the tags in MAINTAINERS do not correspond to reality, and many other tags 
were being used instead.
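
To illustrate the kind of check I mean, here is a minimal Python sketch. It is 
not the code in the change linked above. It assumes the header line has the 
shape "tag1,tag2: Short description", and VALID_TAGS is a hypothetical subset 
that a real hook would load from MAINTAINERS instead.

```python
import re

# Hypothetical subset of valid tags; a real hook would read MAINTAINERS.
VALID_TAGS = {"arch-arm", "cpu", "mem", "mem-cache", "sim"}

# Assumed header shape: "tag1,tag2: Short description"
HEADER_RE = re.compile(r"^(?P<tags>[^:]+): \S.*$")


def invalid_tags(header, valid_tags=VALID_TAGS):
    """Return the tags in the header that are not in valid_tags.

    Returns None when the header does not match the assumed shape at all.
    """
    match = HEADER_RE.match(header)
    if match is None:
        return None
    tags = [tag.strip() for tag in match.group("tags").split(",")]
    return [tag for tag in tags if tag not in valid_tags]
```

A hook built on this would reject the commit whenever the function returns 
None or a non-empty list.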

I have done a quick check of all tags used since around 2013 (the last 5000 
commits) to see which tags have been used and how many times each. I have 
manually removed some noise (fixed simple tag typos and ignored extreme 
outliers, which were less than 0.1% of the tags), and all tags were treated as 
lowercase. This is the result:

    {'mem': 575, 'systemc': 466, 'cpu': 368, 'arm': 347, 'ruby': 327,
     'arch-arm': 251, 'sim': 214, 'stats': 194, 'mem-cache': 193,
     'config': 193, 'dev': 192, 'base': 187, 'x86': 172, 'tests': 144,
     'scons': 142, 'dev-arm': 127, 'misc': 113, 'arch': 93, 'kvm': 79,
     'python': 73, 'util': 70, 'configs': 67, 'mem-ruby': 63,
     'syscall_emul': 53, 'sim-se': 52, 'style': 48, 'ext': 48,
     'gpu-compute': 46, 'arch-riscv': 43, 'riscv': 35, 'sparc': 32,
     'power': 26, 'cpu-o3': 23, 'alpha': 21, 'arch-x86': 19, 'mips': 18,
     'slicc': 17, 'hsail': 16, 'regressions': 13, 'system-arm': 12,
     'o3': 11, 'learning_gem5': 11, 'syscall-emul': 10, 'fastmodel': 10,
     'isa': 8, '': 7, 'ps2': 7, 'kern': 7, 'test': 6, 'dist': 5, 'tlm': 5,
     'pwr': 5, 'cache': 4, 'energy': 4, 'gpu': 4, 'probe': 3, 'virtio': 3,
     'arch-mips': 3, 'cpu/o3': 3, 'net': 3, 'cpu-minor': 3,
     'syscall emulation': 3, 'revert "cpu': 3, 'arch-power': 3, 'null': 3,
     'build': 3, 'copyright': 2, 'swig': 2, 'arch-sparc': 2,
     'x86 regressions': 2, 'cpu-tester': 2, 'loader': 2, 'arch-hsail': 2,
     'cpuid': 2, 'minor': 2, 'o3 cpu': 2, 'syscall': 2, 'proto': 2,
     'vnc': 2, 'hsail-x86': 2, 'arch-alpha': 1, 'base simple cpu': 1,
     'testlib': 1, 'gdb': 1, 'pci': 1, 'docs': 1, 'tarch': 1,
     'x86 cpuid': 1, 'testers': 1, 'debug': 1, 'util/regress': 1,
     'branch predictor': 1, 'devices': 1, 'arch/x86': 1, 'kmi': 1,
     'x86 isa': 1, 'cpu. arch': 1, 'sym': 1, 'rcs scripts': 1,
     'regress': 1, 'ide': 1, 'cache recorder': 1, 'o3 iew': 1,
     'unittest': 1, 'pseudo inst': 1, 'build opts': 1, 'pl011': 1,
     'hmc': 1, 'decoder': 1, 'learning-gem5': 1, 'o3cpu': 1,
     'sconstruct': 1, 'cpu-kvm': 1, 'ruby sequencer': 1, 'ext lib': 1,
     'mem-garnet': 1, 'arch-generic': 1, 'options': 1}
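
For reference, a count like the one above could be gathered roughly as in the 
sketch below. This is an illustration rather than the exact script I ran. It 
assumes that everything before the first ':' of a subject line is a 
comma-separated tag list, and that the subjects come from something like 
`git log -n 5000 --format=%s`.

```python
from collections import Counter


def count_tags(subjects):
    """Count lowercase tags across an iterable of commit subject lines."""
    counts = Counter()
    for subject in subjects:
        prefix, sep, _ = subject.partition(":")
        if not sep:
            continue  # no ':' at all, so no tag prefix to count
        for tag in prefix.split(","):
            counts[tag.strip().lower()] += 1
    return counts
```

The manual noise removal (typo fixes, dropping outliers) is not part of this 
sketch and would still have to be done on the resulting counts.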

Matching against the tags currently listed in MAINTAINERS (not including 
dev-arm, which will be added soon), this is the list:

    {'mem': 575, 'cpu': 367, 'arch-arm': 251, 'sim': 214, 'stats': 194,
     'mem-cache': 193, 'dev': 192, 'base': 187, 'tests': 144, 'scons': 142,
     'misc': 113, 'arch': 93, 'python': 73, 'util': 70, 'configs': 67,
     'mem-ruby': 63, 'sim-se': 51, 'ext': 48, 'gpu-compute': 46,
     'arch-riscv': 43, 'cpu-o3': 23, 'arch-x86': 19, 'system-arm': 12,
     'arch-mips': 3, 'arch-power': 3, 'cpu-minor': 3, 'arch-hsail': 2,
     'arch-sparc': 2, 'arch-alpha': 1, 'cpu-kvm': 1, 'mem-garnet': 1,
     'learning-gem5': 1}

Therefore these MAINTAINERS tags have not been used at all in that period:

    {'dev-virtio', 'system-alpha', 'sim-power', 'system', 'cpu-simple'}

'dev-virtio' has been used as 'virtio' (3)
'system-alpha' has been used as 'alpha' (21) and 'arch-alpha' (1)

'power' (26) has been used for both 'sim-power' and 'arch-power', which are 
completely different

'system' and 'cpu-simple' seem to have been substituted by other tags
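
The matching step is just a dictionary filter plus a set difference. A small 
sketch, using a handful of the counts from this email and a hypothetical 
subset of the MAINTAINERS tags (the function name is made up for 
illustration):

```python
def split_by_maintainers(used_counts, maintainer_tags):
    """Split usage counts into MAINTAINERS matches and never-used tags."""
    matched = {
        tag: n for tag, n in used_counts.items() if tag in maintainer_tags
    }
    unused = set(maintainer_tags) - set(used_counts)
    return matched, unused


# A few of the counts reported in this email.
used = {"mem": 575, "virtio": 3, "alpha": 21, "arch-alpha": 1, "power": 26}
# Hypothetical subset of the tags listed in MAINTAINERS.
maintainers = {"mem", "dev-virtio", "system-alpha", "arch-alpha", "sim-power"}

matched, unused = split_by_maintainers(used, maintainers)
```

With this input, 'virtio', 'alpha' and 'power' fall out of the matched list 
because they are not in MAINTAINERS, while 'dev-virtio', 'system-alpha' and 
'sim-power' end up in the unused set.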

=============================================


Further analysis:

Except for arm and riscv, which had tied results, there seems to be a 
preference for omitting "arch-" in ISA-specific commits, so I think they 
should all be replaced, i.e., 'arch-x86' becomes 'x86', 'arch-hsail' becomes 
'hsail' and so on. That said, I also see the counterargument that these tags 
reflect the path to the directory, and that 'arch-' should therefore be kept.

The same applies to 'mem-ruby', which is constantly used as 'ruby'.
Maybe 'learning-gem5' should be moved to 'misc' due to its low frequency. 
Also, move 'mem-garnet' to 'misc', since although it belongs to 'mem' I do not 
know if the maintainer is familiar with it.
The usage of 'misc' has been fair overall, although sometimes a new tag, 
'style', has been used instead, and at other times a more specific tag should 
have been applied.
'dev-virtio' is very rare, so it could be incorporated into 'dev'.
'cpu-minor' is very rare, so it could be incorporated into 'cpu'.

There is no maintainer for 'cpu-o3'. Does it really need to be separate from 
'cpu'?

I do not know what to think of the 'system', 'system-alpha', and 'system-arm' 
tags.
=============================================


Given all that information, I propose remodeling the list of tags. This is what 
I am thinking:

[alpha, arch, arm, base, configs, cpu, cpu-o3, dev, dev-arm, ext, gpu-compute, 
hsail, kvm, learning-gem5, mem, mem-cache, mips, misc, power, python, riscv, 
ruby, scons, sim, sim-se, sim-power, slicc, sparc, stats, system, system-arm, 
systemc, tests, util, x86]
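
To give an idea of what merging would mean for the numbers, here is a small 
sketch that folds some of the old tags into the proposed ones. The ALIASES 
mapping is only an example built from the suggestions in this email, not an 
agreed list, and the sample counts are the ones reported earlier.

```python
from collections import Counter

# Example mapping from current tags to proposed ones (illustrative only).
ALIASES = {
    "arch-x86": "x86",
    "arch-hsail": "hsail",
    "mem-ruby": "ruby",
    "cpu-minor": "cpu",
}


def merge_aliases(counts, aliases=ALIASES):
    """Fold the counts of aliased tags into their proposed replacement."""
    merged = Counter()
    for tag, n in counts.items():
        merged[aliases.get(tag, tag)] += n
    return merged


sample = {"x86": 172, "arch-x86": 19, "ruby": 327, "mem-ruby": 63}
merged = merge_aliases(sample)
# merged["x86"] is 172 + 19 = 191, merged["ruby"] is 327 + 63 = 390
```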

Of course, this is mostly from a statistical point of view, so it may not 
reflect the knowledge or wishes of the maintainers. So I ask you: what do you 
think?


Regards,
Daniel
