[boost] Re: Interest in multiindex_set?(again)

Jeremy Maitin-Shepard Thu, 10 Jul 2003 16:29:57 -0700

Joaquín Mª López Muñoz wrote:

The semantics of std::map (or any other STL associative container) are
not overwriting on insertion: instead, if an equivalent element is found, no
insertion occurs. This is what my library does, too. Maybe you're confusing
here map::operator[] with map::insert.

Yes, I was confused. Your semantics are correct.

There are various advantages to separating key extraction from key
comparison:

1) It allows the use of more generic key extraction and key comparison
functors.  Specifically, for an ordered index, the key comparison
function in many cases would be std::less<KeyType>, and can default to
that.  For "unordered" (hash-table-like) indices, it is necessary, at
least internally, to separate key comparison and key extraction because
the hash function must also operate on the "key" value; otherwise, the
hash function must also be very specialized, when in many cases a
default library-provided hash function can be used (and thus not specified).

A small utility class named less_by is used for precisely this purpose. You can
see it in action in the example accompanying the library. As for hashing, an
analogous device can be written (when hashes get included).


2) It is necessary in order to support certain convenient interfaces;
specifically, I think it is very convenient to allow the syntax:
table[key], find(key), and equal_range(key).  In order for the front-end
to deteremine which index corresponds to the key type, it must "know"
about keys.

find(key) and related search memfuns are already supported in the way
you suggest! Look for "special lookup operations" in the documentation.
Matter of fact, the operations provided are even more powerful in that
they allow for alternative arguments and comparison predicates.

It does not appear, however, that they support automatically detecting the index based on the type of the key. This functionality seems useful, although in general it is only a syntactic convenience.. Specifically, if a table has two indices, one with a key type of int, the other with key type of std::string, with this functionality, the following syntax could be used: (note that the operator [] I show here could not have the semantics of std::map; rather it would need to throw an exception or have undefined behavior if an existing element is not found)

table["somekey"].whatever;

The above statement would use the std::string index, because the type of the argument (char const *) is convertible to std::string.

table[3].whatever

The above statement would use the int index, because the type of the argument is int.

A similar syntax could be used for find, equal_range, count, erase. Additionally, something like the following could be supported, where table_type is the type of the table:

table_type::index_with_key<std::string>::iterator it = table.find("something");

Note that you provide a version of equal_range that is not provided by std::map or set, namely, a two-argument version, with the semantics of returning the range between the two arguments. It is not possible to implement this for a hash table index. Under a policy-based interface, this could be supported by the "view" of an index that is a binary search tree, while not being supported by the table class itself.

Similarly, finding using a weaker ordering predicate is only supported by binary-search-tree-based indices, and thus it seems this should be supported on a per-index basis only.

Even if it is decided that automatically detecting the index based on the key type is not useful, I think it is advantageous to decouple key extraction from key comparison.

Couldn't a `composite index' be supported by a key which is a tuple of
some number of references?

I guess you're taking inspiration here from your own library's design. This
implementation of composite indices simply doesn't fit here. You can consider
things are "the other way around" with respect to indices and dependent keys:
every index is composite. Those that happen to depend on a single member
of the element can be conveniently built by resorting to less_by. I'm
not sure I'm being clear enough. If not, please tell me so.

On a side note, taking composite indices as indices over a tuple of
members does not cover the full spectrum of what an index can do. Imagine an
index whose associated comparison predicate depend on the result of an
external function, say

int f(const element& e);

f() won't accept a tuple of members of element, but the actual element. To make
it fit one would be forced to construct a new element on the fly out of the
provided parts.

Although you have not specified the semantics of f(), it seems that f() is effectively a key extractor, and the key type would be int. In order to use this with your current system of integrating key extractor with key comparison, an additional functor would be needed that effectively composed this function f() with some comparison predicate. In order to allow the user to use an int value (rather than an element value) as a key, this composition functor would also have to provide a number of overloads some of which bypass this key extraction functor for one of the arguments. For a hash-table-based index, it becomes even more complicated, because a second "composition" functor must be provided that deals with hash codes.

In constrast, if key extractors are supported by the table class itself, in most cases only the function f must be specified, and default key comparison and hash functions (if it is a hash-table-based index) can be used. Decoupling the key extractor from key comparison (and hash functions) simply seems more logical, because I think in the far majority of cases they are decoupled.

It is hard for me to imagine a case where there is no underlying key type _but_ the comparison predicate of the index supports comparison between the element type and some other type.

There's a difference. With update(), if a collision happens the changed
element is kept with its original value. With modify(), it is erased.
BTW I like the names "update" and "modify" in that they suggest the
way the different memfuns work: Somehow, "modify" seems more
intrusive, which precisely modify() is. What do you think about it?

Okay, this seems reasonable.

---
Jeremy Maitin-Shepard

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

[boost] Re: Interest in multiindex_set?(again)

Reply via email to