Attached are a few supplemental tidbits to go along with this change.
The first is a file called trietimes.txt which shows the before and
after of this change performance wise using the atomic CPU to boot linux
and to run the twolf regression. I chose the atomic CPU because it would
emphasize the impact of improving address translation. For booting,
simulator performance improved by a little more than 16.5%, and for
twolf a little more than 22.7%. I'm guessing twolf was a little better
because SE doesn't muck with devices and translation is an even bigger
part of what it does.

I also tried a couple variations, one where I used FastAlloc for the
internal Node struct used in the trie, and one where I cached the last
successful lookup in the trie. Whenever the structure of the trie
changed, I threw away the cache. In both cases, performance was similar
but slightly worse. I was surprised especially that FastAlloc didn't
help, but maybe glibc's malloc does a really good job with small objects
now? The addrtrie.hh from both of these are attached for reference.

I don't know if this class will give a similar performance boost to
other ISAs or if x86's translation was just particularly stinky before.
I expect it probably will help a little bit since a trie is such a well
suited data structure for this sort of thing, but it's hard to say.

Gabe

On 04/08/12 01:02, Gabe Black wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1143/
> -----------------------------------------------------------
>
> Review request for Default.
>
>
> Description
> -------
>
> Changeset 8945:f40e80105a03
> ---------------------------
> X86: Use the AddrTrie class to implement the TLB.
>
> This change also adjusts the TlbEntry class so that it stores the number of
> address bits wide a page is rather than its size in bytes. In other words,
> instead of storing 4K for a 4K page, it stores 12. 12 is easy to turn into 4K,
> but it's a little harder going the other way.
>
>
> Diffs
> -----
>
>   src/arch/x86/pagetable.hh a47fd7c2d44e 
>   src/arch/x86/pagetable.cc a47fd7c2d44e 
>   src/arch/x86/pagetable_walker.hh a47fd7c2d44e 
>   src/arch/x86/pagetable_walker.cc a47fd7c2d44e 
>   src/arch/x86/tlb.hh a47fd7c2d44e 
>   src/arch/x86/tlb.cc a47fd7c2d44e 
>   src/arch/x86/vtophys.cc a47fd7c2d44e 
>
> Diff: http://reviews.gem5.org/r/1143/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Gabe Black
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev

/*
 * Copyright (c) 2012 Google
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Authors: Gabe Black
 */

#ifndef __SIM_ADDRTRIE_HH__
#define __SIM_ADDRTRIE_HH__

#include "base/cprintf.hh"
#include "base/fast_alloc.hh"
#include "base/misc.hh"
#include "base/types.hh"

template <class T>
class AddrTrie
{
  public:
    struct Node : public FastAlloc
    {
        Addr key;
        Addr mask;

        bool
        matches(Addr addr)
        {
            return (addr & mask) == key;
        }

        T *value;

        Node *parent;
        Node *kids[2];

        Node(Addr _key, Addr _mask, T *_val) :
            key(_key & _mask), mask(_mask), value(_val),
            parent(NULL)
        {
            kids[0] = NULL;
            kids[1] = NULL;
        }

        void
        clear()
        {
            if (kids[1]) {
                kids[1]->clear();
                delete kids[1];
                kids[1] = NULL;
            }
            if (kids[0]) {
                kids[0]->clear();
                delete kids[0];
                kids[0] = NULL;
            }
        }

        void
        dump(int level)
        {
            for (int i = 1; i < level; i++) {
                cprintf("|");
            }
            if (level == 0)
                cprintf("Root ");
            else
                cprintf("+ ");
            cprintf("(%p, %p, %#X, %#X, %p)\n", parent, this, key, mask, value);
            if (kids[0])
                kids[0]->dump(level + 1);
            if (kids[1])
                kids[1]->dump(level + 1);
        }
    };

  protected:
    Node head;

  public:
    AddrTrie() : head(0, 0, NULL)
    {}

    static const Addr MaxBits = sizeof(Addr) * 8;

  private:
    bool
    goesAfter(Node **parent, Node *kid, Addr addr,
            Addr new_mask)
    {
        if (kid && kid->matches(addr) && (kid->mask & new_mask) == kid->mask) {
            *parent = kid;
            return true;
        } else {
            return false;
        }
    }

    Addr
    extendMask(Addr orig)
    {
        // Just in case orig was 0.
        const Addr msb = ULL(1) << (MaxBits - 1);
        return orig | (orig >> 1) | msb;
    }

  public:
    Node *
    insert(Addr addr, Addr width, T *val)
    {
        // Build a mask which masks off all the bits we don't care about.
        Addr new_mask = ~(Addr)0;
        if (width < MaxBits)
            new_mask <<= (MaxBits - width);
        // Use it to tidy up the address.
        addr &= new_mask;

        // Walk past all the nodes this new node will be inserted after. They
        // can be ignored for the purposes of this function.
        Node *node = &head;
        while (goesAfter(&node, node->kids[0], addr, new_mask) ||
                goesAfter(&node, node->kids[1], addr, new_mask))
        {}
        assert(node);

        Addr cur_mask = node->mask;
        // If we're already where the value needs to be...
        if (cur_mask == new_mask) {
            assert(!node->value);
            node->value = val;
            return node;
        }

        for (unsigned int i = 0; i < 2; i++) {
            Node *&kid = node->kids[i];
            Node *new_node;
            if (!kid) {
                // No kid. Add a new one.
                new_node = new Node(addr, new_mask, val);
                new_node->parent = node;
                kid = new_node;
                return new_node;
            }

            // Walk down the leg until something doesn't match or we run out
            // of bits.
            Addr last_mask;
            bool done;
            do {
                last_mask = cur_mask;
                cur_mask = extendMask(cur_mask);
                done = ((addr & cur_mask) != (kid->key & cur_mask)) ||
                    last_mask == new_mask;
            } while (!done);
            cur_mask = last_mask;

            // If this isn't the right leg to go down at all, skip it.
            if (cur_mask == node->mask)
                continue;

            // At the point we walked to above, add a new node.
            new_node = new Node(addr, cur_mask, NULL);
            new_node->parent = node;
            kid->parent = new_node;
            new_node->kids[0] = kid;
            kid = new_node;

            // If we ran out of bits, the value goes right here.
            if (cur_mask == new_mask) {
                new_node->value = val;
                return new_node;
            }

            // Still more bits to deal with, so add a new node for that path.
            new_node = new Node(addr, new_mask, val);
            new_node->parent = kid;
            kid->kids[1] = new_node;
            return new_node;
        }

        panic("Reached the end of the AddrTrie insert function!\n");
        return NULL;
    }

    Node *
    lookupNode(Addr addr)
    {
        Node *node = &head;
        while (node) {
            if (node->value)
                return node;

            if (node->kids[0] && node->kids[0]->matches(addr))
                node = node->kids[0];
            else if (node->kids[1] && node->kids[1]->matches(addr))
                node = node->kids[1];
            else
                node = NULL;
        }

        return NULL;
    }

    T *
    lookup(Addr addr)
    {
        Node *node = lookupNode(addr);
        if (node)
            return node->value;
        else
            return NULL;
    }

    void
    remove(Node *node)
    {
        if (node->kids[1]) {
            assert(node->value);
            node->value = NULL;
            return;
        }
        if (!node->parent)
            panic("AddrTrie: Can't remove root node.\n");

        Node *parent = node->parent;

        // If there's a kid, fix up it's parent pointer.
        if (node->kids[0])
            node->kids[0]->parent = parent;
        // Figure out which kid we are, and update our parent's pointers.
        if (parent->kids[0] == node)
            parent->kids[0] = node->kids[0];
        else if (parent->kids[1] == node)
            parent->kids[1] = node->kids[0];
        else
            panic("AddrTrie: Inconsistent parent/kid relationship.\n");
        // Make sure if the parent only has one kid, it's kid[0].
        if (parent->kids[1] && !parent->kids[0]) {
            parent->kids[0] = parent->kids[1];
            parent->kids[1] = NULL;
        }

        // If the parent has less than two kids and no cargo and isn't the
        // root, delete it too.
        if (!parent->kids[1] && !parent->value && parent->parent)
            remove(parent);
        delete node;
    }

    void
    clear()
    {
        head.clear();
    }

    void
    dump(const char *title)
    {
        cprintf("**************************************************\n");
        cprintf("*** Start of Trie: %s\n", title);
        cprintf("*** (parent, me, address, mask, value pointer)\n");
        cprintf("**************************************************\n");
        head.dump(0);
    }
};

#endif
/*
 * Copyright (c) 2012 Google
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Authors: Gabe Black
 */

#ifndef __SIM_ADDRTRIE_HH__
#define __SIM_ADDRTRIE_HH__

#include "base/cprintf.hh"
#include "base/misc.hh"
#include "base/types.hh"

template <class T>
class AddrTrie
{
  public:
    struct Node
    {
        Addr key;
        Addr mask;

        bool
        matches(Addr addr)
        {
            return (addr & mask) == key;
        }

        T *value;

        Node *parent;
        Node *kids[2];

        Node(Addr _key, Addr _mask, T *_val) :
            key(_key & _mask), mask(_mask), value(_val),
            parent(NULL)
        {
            kids[0] = NULL;
            kids[1] = NULL;
        }

        void
        clear()
        {
            if (kids[1]) {
                kids[1]->clear();
                delete kids[1];
                kids[1] = NULL;
            }
            if (kids[0]) {
                kids[0]->clear();
                delete kids[0];
                kids[0] = NULL;
            }
        }

        void
        dump(int level)
        {
            for (int i = 1; i < level; i++) {
                cprintf("|");
            }
            if (level == 0)
                cprintf("Root ");
            else
                cprintf("+ ");
            cprintf("(%p, %p, %#X, %#X, %p)\n", parent, this, key, mask, value);
            if (kids[0])
                kids[0]->dump(level + 1);
            if (kids[1])
                kids[1]->dump(level + 1);
        }
    };

  protected:
    Node head;
    Node *lastMatch;

  public:
    AddrTrie() : head(0, 0, NULL), lastMatch(NULL)
    {}

    static const Addr MaxBits = sizeof(Addr) * 8;

  private:
    bool
    goesAfter(Node **parent, Node *kid, Addr addr,
            Addr new_mask)
    {
        if (kid && kid->matches(addr) && (kid->mask & new_mask) == kid->mask) {
            *parent = kid;
            return true;
        } else {
            return false;
        }
    }

    Addr
    extendMask(Addr orig)
    {
        // Just in case orig was 0.
        const Addr msb = ULL(1) << (MaxBits - 1);
        return orig | (orig >> 1) | msb;
    }

  public:
    Node *
    insert(Addr addr, Addr width, T *val)
    {
        lastMatch = NULL;

        // Build a mask which masks off all the bits we don't care about.
        Addr new_mask = ~(Addr)0;
        if (width < MaxBits)
            new_mask <<= (MaxBits - width);
        // Use it to tidy up the address.
        addr &= new_mask;

        // Walk past all the nodes this new node will be inserted after. They
        // can be ignored for the purposes of this function.
        Node *node = &head;
        while (goesAfter(&node, node->kids[0], addr, new_mask) ||
                goesAfter(&node, node->kids[1], addr, new_mask))
        {}
        assert(node);

        Addr cur_mask = node->mask;
        // If we're already where the value needs to be...
        if (cur_mask == new_mask) {
            assert(!node->value);
            node->value = val;
            return node;
        }

        for (unsigned int i = 0; i < 2; i++) {
            Node *&kid = node->kids[i];
            Node *new_node;
            if (!kid) {
                // No kid. Add a new one.
                new_node = new Node(addr, new_mask, val);
                new_node->parent = node;
                kid = new_node;
                return new_node;
            }

            // Walk down the leg until something doesn't match or we run out
            // of bits.
            Addr last_mask;
            bool done;
            do {
                last_mask = cur_mask;
                cur_mask = extendMask(cur_mask);
                done = ((addr & cur_mask) != (kid->key & cur_mask)) ||
                    last_mask == new_mask;
            } while (!done);
            cur_mask = last_mask;

            // If this isn't the right leg to go down at all, skip it.
            if (cur_mask == node->mask)
                continue;

            // At the point we walked to above, add a new node.
            new_node = new Node(addr, cur_mask, NULL);
            new_node->parent = node;
            kid->parent = new_node;
            new_node->kids[0] = kid;
            kid = new_node;

            // If we ran out of bits, the value goes right here.
            if (cur_mask == new_mask) {
                new_node->value = val;
                return new_node;
            }

            // Still more bits to deal with, so add a new node for that path.
            new_node = new Node(addr, new_mask, val);
            new_node->parent = kid;
            kid->kids[1] = new_node;
            return new_node;
        }

        panic("Reached the end of the AddrTrie insert function!\n");
        return NULL;
    }

    Node *
    lookupNode(Addr addr)
    {
        if (lastMatch && lastMatch->matches(addr))
            return lastMatch;

        Node *node = &head;
        while (node) {
            if (node->value) {
                lastMatch = node;
                return node;
            }

            if (node->kids[0] && node->kids[0]->matches(addr))
                node = node->kids[0];
            else if (node->kids[1] && node->kids[1]->matches(addr))
                node = node->kids[1];
            else
                node = NULL;
        }

        return NULL;
    }

    T *
    lookup(Addr addr)
    {
        Node *node = lookupNode(addr);
        if (node)
            return node->value;
        else
            return NULL;
    }

    void
    remove(Node *node)
    {
        lastMatch = NULL;

        if (node->kids[1]) {
            assert(node->value);
            node->value = NULL;
            return;
        }
        if (!node->parent)
            panic("AddrTrie: Can't remove root node.\n");

        Node *parent = node->parent;

        // If there's a kid, fix up it's parent pointer.
        if (node->kids[0])
            node->kids[0]->parent = parent;
        // Figure out which kid we are, and update our parent's pointers.
        if (parent->kids[0] == node)
            parent->kids[0] = node->kids[0];
        else if (parent->kids[1] == node)
            parent->kids[1] = node->kids[0];
        else
            panic("AddrTrie: Inconsistent parent/kid relationship.\n");
        // Make sure if the parent only has one kid, it's kid[0].
        if (parent->kids[1] && !parent->kids[0]) {
            parent->kids[0] = parent->kids[1];
            parent->kids[1] = NULL;
        }

        // If the parent has less than two kids and no cargo and isn't the
        // root, delete it too.
        if (!parent->kids[1] && !parent->value && parent->parent)
            remove(parent);
        delete node;
    }

    void
    clear()
    {
        lastMatch = NULL;
        head.clear();
    }

    void
    dump(const char *title)
    {
        cprintf("**************************************************\n");
        cprintf("*** Start of Trie: %s\n", title);
        cprintf("*** (parent, me, address, mask, value pointer)\n");
        cprintf("**************************************************\n");
        head.dump(0);
    }
};

#endif
original:

atomic booting:
host_seconds                                   148.37                       # 
Real time elapsed on the host
host_seconds                                   153.58                       # 
Real time elapsed on the host
host_seconds                                   159.36                       # 
Real time elapsed on the host

twolf:
host_seconds                                   106.17                       # 
Real time elapsed on the host
host_seconds                                   104.43                       # 
Real time elapsed on the host
host_seconds                                   110.26                       # 
Real time elapsed on the host



With change version 1:

atomic booting:
host_seconds                                   128.62                       # 
Real time elapsed on the host
host_seconds                                   128.46                       # 
Real time elapsed on the host
host_seconds                                   128.05                       # 
Real time elapsed on the host

twolf:
host_seconds                                    82.41                       # 
Real time elapsed on the host
host_seconds                                    82.55                       # 
Real time elapsed on the host
host_seconds                                    83.04                       # 
Real time elapsed on the host



With change version 2 (FastAlloc):

atomic booting:
host_seconds                                   129.19                       # 
Real time elapsed on the host
host_seconds                                   129.12                       # 
Real time elapsed on the host
host_seconds                                   129.65                       # 
Real time elapsed on the host

twolf:
host_seconds                                    86.37                       # 
Real time elapsed on the host
host_seconds                                    85.99                       # 
Real time elapsed on the host
host_seconds                                    85.91                       # 
Real time elapsed on the host



With change version 3 (tiny cache):

atomic booting:
host_seconds                                   129.68                       # 
Real time elapsed on the host
host_seconds                                   127.70                       # 
Real time elapsed on the host
host_seconds                                   127.49                       # 
Real time elapsed on the host

twolf:
host_seconds                                    87.10                       # 
Real time elapsed on the host
host_seconds                                    87.31                       # 
Real time elapsed on the host
host_seconds                                    86.75                       # 
Real time elapsed on the host
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

Reply via email to