Attached are a few supplemental tidbits to go along with this change.
The first is a file called trietimes.txt which shows the before and
after of this change performance wise using the atomic CPU to boot linux
and to run the twolf regression. I chose the atomic CPU because it would
emphasize the impact of improving address translation. For booting,
simulator performance improved by a little more than 16.5%, and for
twolf a little more than 22.7%. I'm guessing twolf was a little better
because SE doesn't muck with devices and translation is an even bigger
part of what it does.
I also tried a couple variations, one where I used FastAlloc for the
internal Node struct used in the trie, and one where I cached the last
successful lookup in the trie. Whenever the structure of the trie
changed, I threw away the cache. In both cases, performance was similar
but slightly worse. I was surprised especially that FastAlloc didn't
help, but maybe glibc's malloc does a really good job with small objects
now? The addrtrie.hh from both of these are attached for reference.
I don't know if this class will give a similar performance boost to
other ISAs or if x86's translation was just particularly stinky before.
I expect it probably will help a little bit since a trie is such a well
suited data structure for this sort of thing, but it's hard to say.
Gabe
On 04/08/12 01:02, Gabe Black wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1143/
> -----------------------------------------------------------
>
> Review request for Default.
>
>
> Description
> -------
>
> Changeset 8945:f40e80105a03
> ---------------------------
> X86: Use the AddrTrie class to implement the TLB.
>
> This change also adjusts the TlbEntry class so that it stores the number of
> address bits wide a page is rather than its size in bytes. In other words,
> instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K,
> but it's a little harder going the other way.
>
>
> Diffs
> -----
>
> src/arch/x86/pagetable.hh a47fd7c2d44e
> src/arch/x86/pagetable.cc a47fd7c2d44e
> src/arch/x86/pagetable_walker.hh a47fd7c2d44e
> src/arch/x86/pagetable_walker.cc a47fd7c2d44e
> src/arch/x86/tlb.hh a47fd7c2d44e
> src/arch/x86/tlb.cc a47fd7c2d44e
> src/arch/x86/vtophys.cc a47fd7c2d44e
>
> Diff: http://reviews.gem5.org/r/1143/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Gabe Black
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
/*
* Copyright (c) 2012 Google
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met: redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer;
* redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution;
* neither the name of the copyright holders nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* Authors: Gabe Black
*/
#ifndef __SIM_ADDRTRIE_HH__
#define __SIM_ADDRTRIE_HH__
#include "base/cprintf.hh"
#include "base/fast_alloc.hh"
#include "base/misc.hh"
#include "base/types.hh"
template <class T>
class AddrTrie
{
public:
struct Node : public FastAlloc
{
Addr key;
Addr mask;
bool
matches(Addr addr)
{
return (addr & mask) == key;
}
T *value;
Node *parent;
Node *kids[2];
Node(Addr _key, Addr _mask, T *_val) :
key(_key & _mask), mask(_mask), value(_val),
parent(NULL)
{
kids[0] = NULL;
kids[1] = NULL;
}
void
clear()
{
if (kids[1]) {
kids[1]->clear();
delete kids[1];
kids[1] = NULL;
}
if (kids[0]) {
kids[0]->clear();
delete kids[0];
kids[0] = NULL;
}
}
void
dump(int level)
{
for (int i = 1; i < level; i++) {
cprintf("|");
}
if (level == 0)
cprintf("Root ");
else
cprintf("+ ");
cprintf("(%p, %p, %#X, %#X, %p)\n", parent, this, key, mask, value);
if (kids[0])
kids[0]->dump(level + 1);
if (kids[1])
kids[1]->dump(level + 1);
}
};
protected:
Node head;
public:
AddrTrie() : head(0, 0, NULL)
{}
static const Addr MaxBits = sizeof(Addr) * 8;
private:
bool
goesAfter(Node **parent, Node *kid, Addr addr,
Addr new_mask)
{
if (kid && kid->matches(addr) && (kid->mask & new_mask) == kid->mask) {
*parent = kid;
return true;
} else {
return false;
}
}
Addr
extendMask(Addr orig)
{
// Just in case orig was 0.
const Addr msb = ULL(1) << (MaxBits - 1);
return orig | (orig >> 1) | msb;
}
public:
Node *
insert(Addr addr, Addr width, T *val)
{
// Build a mask which masks off all the bits we don't care about.
Addr new_mask = ~(Addr)0;
if (width < MaxBits)
new_mask <<= (MaxBits - width);
// Use it to tidy up the address.
addr &= new_mask;
// Walk past all the nodes this new node will be inserted after. They
// can be ignored for the purposes of this function.
Node *node = &head;
while (goesAfter(&node, node->kids[0], addr, new_mask) ||
goesAfter(&node, node->kids[1], addr, new_mask))
{}
assert(node);
Addr cur_mask = node->mask;
// If we're already where the value needs to be...
if (cur_mask == new_mask) {
assert(!node->value);
node->value = val;
return node;
}
for (unsigned int i = 0; i < 2; i++) {
Node *&kid = node->kids[i];
Node *new_node;
if (!kid) {
// No kid. Add a new one.
new_node = new Node(addr, new_mask, val);
new_node->parent = node;
kid = new_node;
return new_node;
}
// Walk down the leg until something doesn't match or we run out
// of bits.
Addr last_mask;
bool done;
do {
last_mask = cur_mask;
cur_mask = extendMask(cur_mask);
done = ((addr & cur_mask) != (kid->key & cur_mask)) ||
last_mask == new_mask;
} while (!done);
cur_mask = last_mask;
// If this isn't the right leg to go down at all, skip it.
if (cur_mask == node->mask)
continue;
// At the point we walked to above, add a new node.
new_node = new Node(addr, cur_mask, NULL);
new_node->parent = node;
kid->parent = new_node;
new_node->kids[0] = kid;
kid = new_node;
// If we ran out of bits, the value goes right here.
if (cur_mask == new_mask) {
new_node->value = val;
return new_node;
}
// Still more bits to deal with, so add a new node for that path.
new_node = new Node(addr, new_mask, val);
new_node->parent = kid;
kid->kids[1] = new_node;
return new_node;
}
panic("Reached the end of the AddrTrie insert function!\n");
return NULL;
}
Node *
lookupNode(Addr addr)
{
Node *node = &head;
while (node) {
if (node->value)
return node;
if (node->kids[0] && node->kids[0]->matches(addr))
node = node->kids[0];
else if (node->kids[1] && node->kids[1]->matches(addr))
node = node->kids[1];
else
node = NULL;
}
return NULL;
}
T *
lookup(Addr addr)
{
Node *node = lookupNode(addr);
if (node)
return node->value;
else
return NULL;
}
void
remove(Node *node)
{
if (node->kids[1]) {
assert(node->value);
node->value = NULL;
return;
}
if (!node->parent)
panic("AddrTrie: Can't remove root node.\n");
Node *parent = node->parent;
// If there's a kid, fix up it's parent pointer.
if (node->kids[0])
node->kids[0]->parent = parent;
// Figure out which kid we are, and update our parent's pointers.
if (parent->kids[0] == node)
parent->kids[0] = node->kids[0];
else if (parent->kids[1] == node)
parent->kids[1] = node->kids[0];
else
panic("AddrTrie: Inconsistent parent/kid relationship.\n");
// Make sure if the parent only has one kid, it's kid[0].
if (parent->kids[1] && !parent->kids[0]) {
parent->kids[0] = parent->kids[1];
parent->kids[1] = NULL;
}
// If the parent has less than two kids and no cargo and isn't the
// root, delete it too.
if (!parent->kids[1] && !parent->value && parent->parent)
remove(parent);
delete node;
}
void
clear()
{
head.clear();
}
void
dump(const char *title)
{
cprintf("**************************************************\n");
cprintf("*** Start of Trie: %s\n", title);
cprintf("*** (parent, me, address, mask, value pointer)\n");
cprintf("**************************************************\n");
head.dump(0);
}
};
#endif
/*
* Copyright (c) 2012 Google
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met: redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer;
* redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution;
* neither the name of the copyright holders nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* Authors: Gabe Black
*/
#ifndef __SIM_ADDRTRIE_HH__
#define __SIM_ADDRTRIE_HH__
#include "base/cprintf.hh"
#include "base/misc.hh"
#include "base/types.hh"
template <class T>
class AddrTrie
{
public:
struct Node
{
Addr key;
Addr mask;
bool
matches(Addr addr)
{
return (addr & mask) == key;
}
T *value;
Node *parent;
Node *kids[2];
Node(Addr _key, Addr _mask, T *_val) :
key(_key & _mask), mask(_mask), value(_val),
parent(NULL)
{
kids[0] = NULL;
kids[1] = NULL;
}
void
clear()
{
if (kids[1]) {
kids[1]->clear();
delete kids[1];
kids[1] = NULL;
}
if (kids[0]) {
kids[0]->clear();
delete kids[0];
kids[0] = NULL;
}
}
void
dump(int level)
{
for (int i = 1; i < level; i++) {
cprintf("|");
}
if (level == 0)
cprintf("Root ");
else
cprintf("+ ");
cprintf("(%p, %p, %#X, %#X, %p)\n", parent, this, key, mask, value);
if (kids[0])
kids[0]->dump(level + 1);
if (kids[1])
kids[1]->dump(level + 1);
}
};
protected:
Node head;
Node *lastMatch;
public:
AddrTrie() : head(0, 0, NULL), lastMatch(NULL)
{}
static const Addr MaxBits = sizeof(Addr) * 8;
private:
bool
goesAfter(Node **parent, Node *kid, Addr addr,
Addr new_mask)
{
if (kid && kid->matches(addr) && (kid->mask & new_mask) == kid->mask) {
*parent = kid;
return true;
} else {
return false;
}
}
Addr
extendMask(Addr orig)
{
// Just in case orig was 0.
const Addr msb = ULL(1) << (MaxBits - 1);
return orig | (orig >> 1) | msb;
}
public:
Node *
insert(Addr addr, Addr width, T *val)
{
lastMatch = NULL;
// Build a mask which masks off all the bits we don't care about.
Addr new_mask = ~(Addr)0;
if (width < MaxBits)
new_mask <<= (MaxBits - width);
// Use it to tidy up the address.
addr &= new_mask;
// Walk past all the nodes this new node will be inserted after. They
// can be ignored for the purposes of this function.
Node *node = &head;
while (goesAfter(&node, node->kids[0], addr, new_mask) ||
goesAfter(&node, node->kids[1], addr, new_mask))
{}
assert(node);
Addr cur_mask = node->mask;
// If we're already where the value needs to be...
if (cur_mask == new_mask) {
assert(!node->value);
node->value = val;
return node;
}
for (unsigned int i = 0; i < 2; i++) {
Node *&kid = node->kids[i];
Node *new_node;
if (!kid) {
// No kid. Add a new one.
new_node = new Node(addr, new_mask, val);
new_node->parent = node;
kid = new_node;
return new_node;
}
// Walk down the leg until something doesn't match or we run out
// of bits.
Addr last_mask;
bool done;
do {
last_mask = cur_mask;
cur_mask = extendMask(cur_mask);
done = ((addr & cur_mask) != (kid->key & cur_mask)) ||
last_mask == new_mask;
} while (!done);
cur_mask = last_mask;
// If this isn't the right leg to go down at all, skip it.
if (cur_mask == node->mask)
continue;
// At the point we walked to above, add a new node.
new_node = new Node(addr, cur_mask, NULL);
new_node->parent = node;
kid->parent = new_node;
new_node->kids[0] = kid;
kid = new_node;
// If we ran out of bits, the value goes right here.
if (cur_mask == new_mask) {
new_node->value = val;
return new_node;
}
// Still more bits to deal with, so add a new node for that path.
new_node = new Node(addr, new_mask, val);
new_node->parent = kid;
kid->kids[1] = new_node;
return new_node;
}
panic("Reached the end of the AddrTrie insert function!\n");
return NULL;
}
Node *
lookupNode(Addr addr)
{
if (lastMatch && lastMatch->matches(addr))
return lastMatch;
Node *node = &head;
while (node) {
if (node->value) {
lastMatch = node;
return node;
}
if (node->kids[0] && node->kids[0]->matches(addr))
node = node->kids[0];
else if (node->kids[1] && node->kids[1]->matches(addr))
node = node->kids[1];
else
node = NULL;
}
return NULL;
}
T *
lookup(Addr addr)
{
Node *node = lookupNode(addr);
if (node)
return node->value;
else
return NULL;
}
void
remove(Node *node)
{
lastMatch = NULL;
if (node->kids[1]) {
assert(node->value);
node->value = NULL;
return;
}
if (!node->parent)
panic("AddrTrie: Can't remove root node.\n");
Node *parent = node->parent;
// If there's a kid, fix up it's parent pointer.
if (node->kids[0])
node->kids[0]->parent = parent;
// Figure out which kid we are, and update our parent's pointers.
if (parent->kids[0] == node)
parent->kids[0] = node->kids[0];
else if (parent->kids[1] == node)
parent->kids[1] = node->kids[0];
else
panic("AddrTrie: Inconsistent parent/kid relationship.\n");
// Make sure if the parent only has one kid, it's kid[0].
if (parent->kids[1] && !parent->kids[0]) {
parent->kids[0] = parent->kids[1];
parent->kids[1] = NULL;
}
// If the parent has less than two kids and no cargo and isn't the
// root, delete it too.
if (!parent->kids[1] && !parent->value && parent->parent)
remove(parent);
delete node;
}
void
clear()
{
lastMatch = NULL;
head.clear();
}
void
dump(const char *title)
{
cprintf("**************************************************\n");
cprintf("*** Start of Trie: %s\n", title);
cprintf("*** (parent, me, address, mask, value pointer)\n");
cprintf("**************************************************\n");
head.dump(0);
}
};
#endif
original:
atomic booting:
host_seconds 148.37 #
Real time elapsed on the host
host_seconds 153.58 #
Real time elapsed on the host
host_seconds 159.36 #
Real time elapsed on the host
twolf:
host_seconds 106.17 #
Real time elapsed on the host
host_seconds 104.43 #
Real time elapsed on the host
host_seconds 110.26 #
Real time elapsed on the host
With change version 1:
atomic booting:
host_seconds 128.62 #
Real time elapsed on the host
host_seconds 128.46 #
Real time elapsed on the host
host_seconds 128.05 #
Real time elapsed on the host
twolf:
host_seconds 82.41 #
Real time elapsed on the host
host_seconds 82.55 #
Real time elapsed on the host
host_seconds 83.04 #
Real time elapsed on the host
With change version 2 (FastAlloc):
atomic booting:
host_seconds 129.19 #
Real time elapsed on the host
host_seconds 129.12 #
Real time elapsed on the host
host_seconds 129.65 #
Real time elapsed on the host
twolf:
host_seconds 86.37 #
Real time elapsed on the host
host_seconds 85.99 #
Real time elapsed on the host
host_seconds 85.91 #
Real time elapsed on the host
With change version 3 (tiny cache):
atomic booting:
host_seconds 129.68 #
Real time elapsed on the host
host_seconds 127.70 #
Real time elapsed on the host
host_seconds 127.49 #
Real time elapsed on the host
twolf:
host_seconds 87.10 #
Real time elapsed on the host
host_seconds 87.31 #
Real time elapsed on the host
host_seconds 86.75 #
Real time elapsed on the host
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev