----- Original Message ----- From: "William E. Kempf" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, December 12, 2002 7:41 PM Subject: Re: [boost] Re: Formal review: Optional library
> [snipped] > > To be honest, I'm wondering if the pointer semantics should just be > dropped. Here's the rationale you provide for pointer semantics: > > [snipped] > > template <typename T> > class optional > { > public: > optional(); > explicit optional(T const& value); // Does it need to be explicit? > optional(optional const& other); > template <typename U> > optional(optional<U> const& other); > > operator safe-bool(); > T& ref(); > T const& ref() const; > }; > > template <typename T> > bool operator==(optional<T> const& opt1, optional<T> const& opt2); > > template <typename T> > bool operator!=(optional<T> const& opt1, optional<T> const& opt2); > > William E. Kempf > OK. Let's consider the whole thing all over again from scratch. I really prefer to have the library rejected if it is not really well thought out; we can work out the standing isssues and review it again later. But we're half way on the review period so there is a chance that we can come out with a sound design at this point. But if we don't, feel free to reject it. (And I don't mind ending up with a totally different beast either) So, after having carefully considered all the arguments raised so far, here is my re-conceptualization of optional<>. optional<>'s purpose is to wrap a possibly non existent value. >From this definition, it is clear that optional is a value-based container. It has clearly deep-copy semantics, which are clearly _not_ pointer semantics. So, the first important concept is that optional is a container. This implies that at least the following part of the interface is IMO clearly correct: optional<T>::optional( T const& v ) Contains a _copy_ of the value 'v'. Uses T::T( T const& ) optional<T>::~optional(); Destroys its _copy_ of the contained value. Uses T::~T(); optional<T>::reset ( T const& v ) Replaced the contained value, if any, with a _copy_ of the new value. Possibly uses T::~T() And uses T::T( T const& ) optional<T>::reset(); Removes the contained value. Uses T::~T(); Being a container, and being its purpose to model the possible nonexistence of the contained value, there are two options: One is to model the nonexisting case as noncontainment. That is, optional, as a container, can have either one element (the value) or none. Let's call this the 1-Element-Sequence model. Here, optional<T> is just like a (stack based) vector<T> with a fixed capacity of 1 and a varying size of at most 1, so: optional<T>::optional(); Contains no copy of any value. Size = 0. For the other possible model the nonexisting case can be given by the containment of an object of a type other than T. This is the model followed by variant<T,nil_t> and a Hashkell's-like may_be<T>. With this model, the size of the container is always one, just as its capacity, and the nonexistence of the value is given explicitely by a special type nil_t. Let's call this the Union model. Here, optional<T> is a discriminated union of T | nil_t, just like variant<T,nil_t>, so: optional<T>::optional(); Contains an instance of the special type nil_t. Size = 1. Hre comes the first design choice: Should we model optional<> as a 1-Element-Sequence or as a Union of T | nil_t. I find more convenient for optional purposes the Sequence model because it doesn't require a explicit special type to indicate a nonexisting value. Notice that any stack_based_vector<T> perfectly fits the Sequence model as used for optional<T>, therefore, it has _exactly_ the required semantics. (optional<T> would use just a subset, though) Whatever the model we choose, the semantics of the relational operations will be well defined, so we would have: bool operator == ( optional<T>, optional<T> ) ; bool operator != ( optional<T>, optional<T> ) ; Sequence: Compare sizes(), and if they match, compare contained values. Union: Compare values. In both cases the effects are the same, so both models are equivalent on this respect. --CHECKPOINT-- Do we agree this far? Now, let's consider value access on the presence of possibly nonexisting values. Under the 1-Element-Sequence model: 1) The sequence might have 0 elements, so it would be undefined to access its containee. 2) The sequence might have 1 elements, so it is perfectly defined to access its containee, and since the container is value-based a reference to the value can be obtained. vector's interface has size() and front(), but since this sequence is restricted to at most 1 element, we can choose another more convenient spelling, such as: bool optional<T>::empty() T const& optional<T>::ref() const ; T& optional<T>::ref() ; Under the Union model: The union always has an instantiated value, so we either (1) let the user to know about this value explicitely: // no empty() T const& / nil_t optional<T>::value() const; T& / nil_t optional<T>::value() ; (2) hide the fact the the contained value might be nil_t and provide a 1-Element-Sequence interface. I find significantly better for optional<> purposes the 1-Element-Sequence interface, which means that I prefer the Sequence model to the Union model (which doesn't mean that a union couldn't be used for the implementation) Suppose we go on with a 1-Element-Sequence model: Since the sequence contains at most one element, and since ref() is already undefined for the case the sequence is empty(), one possibility would to replace ref() by operator T&() provided the initializing constructor is explicit and empty() is not replaced by a conversion to bool (safe or not). bool optional<T>::empty() ; operator T const&() const ; operator T& () ; This would look on the user code as: optional<int> foo() { return optional<int>(3) ; } void bar() { optional<int> f = foo(); if ( !f.empty() ) { bar(f); f = 2 ; ... if ( f == 2 ) gee(f); } } This interface gives optional<> an almost strict value semantics. It would almost look like an ordinary T variable, with the only exception being the explicit empty() call. AFAICT, this interface is completely consistent and provides completely sound semantics. But there is a problem... And the only way for me to show you the problem is to ask you to trust the fact that I've been using different versions of optional<> for more than two years on dozens of different projects and I had been caught already by most of the traps that some interfaces hide. {I've just counted 436 appearances of the word 'optional<' in my code base} I originally used _a lot_ the interface shown above. The problem with this interface is precisely the fact that it shows ordinary value semantics. IOWs, is the fact that it really looks like an ordinary variable were the trap is hidden. It is too easy for a dumb programmer like me to forget about testing the initialization state. IOWs, when you see code like this: opt = 2 ; int n = opt ; if ( opt == 3 ) foo(opt) etc... it is hard, or at least it was hard for me, to keep in mind that unlike ordinary values, 'opt' might have no value at all. That is, I used to loose track of the fact that all the expressions above can have undefined behavior; they look too familiar for me to remember that; and the _declaration_ of opt is usually not near enough to account for the lack of context. As a side note, notice that that empty() must remain explicit if operator T&() is used instead of ::ref(), otherwise, expressions like if ( opt ) or ( opt == 0 ) would be ambiguous. I believe that is is always important when syntax alone comunicate meaning. This is were the 'operator T&()' interface fails. And I _had_ been caugth by this so much as to go through the trouble of changing it. Fortunately, a possible remedy to the problem of lack of context teeling that expressions like: (opt = 2 ), or ( int x = opt), or (opt == 2 ) can be undefined, is to keep the explicit ref() member function as William suggested. This way, the expressions would look like: (opt.ref() = 2 ), or ( int x = opt.ref()), or (opt.ref() == 2 ) and at least won't look decievenly familar. Therefore, I can see that an interface with a explicit member function used to access the possibly uninitialized value is pretty sound. Suppose we go on with an explicit member function as the value access method. How about safe_bool() as a replacement for empty()? Safe_bool will allow the familiar idioms: if ( opt ) if ( !opt ) if ( opt == 0 ) if ( opt != 0 ) The first form, which is probably the most used, comunicates exactly what it means. But the second form, using operator ==, appears to me confusing for value-semantic objects because a comparison against 0 as indication of uninitialization is typical of pointers but not of containers. No container-like object that I know of uses that idiom to indicate emptyness() so I believe it would we wrong to have it in this sort of interface As a conclusion, I agree that it is possible to model optional<> entirely as a 1-element-sequence provided that its interface uses explicit member funcions for both value access and emptyness testing. I notice in particular that this interface allows for a consistent definition of relational operators. --CHECKPOINT-- Do we agree this far? The quest for a pointer-like interface: Let's get back to the point I realized that: if ( opt == 3 ) int x = opt ; opt = 1 ; opt.foo(); had a syntax too familiar with non-optional values, and hidden the fact the any of the expressions above would be undefined had 'opt' been uninitialized. William's interface do not show this problem because the syntax is not familiar. Back then, I wondered which familiar syntax conveys the precise notion that the expressions above can be undefined unless the 'initialized' or 'value-exist' condition is met. I've noticed that pointers do just that. When you see: if ( *opt == 3 ) int x = *opt ; *opt = 1 ; opt->foo(); it is clear by syntax alone that 'opt' _must_ refer to an existing value and that this might not be the case so you know that a test for this condition must be present somewhere before the expressions are reached. If we think about this per see, we should agree that this interface is even more comunicative than the .ref() interface: if ( opt.ref() == 3 ) int x = opt.ref() ; opt.ref() = 1 ; opt.ref().foo(); The above indicates that there is something 'peculiar' with 'opt', but it doesn't comunicate what. OTOH, the pointer-like interface is self-evident. I concluded then, and I still think, that for the possible unexisting value access operation, operator*() and ->() allows for a very familiar and comunicative syntax which expresses _exactly_ the semantic of the operation. It is not just a minor convenience IMO. And to be honest, the fact that this interface allows optional<> to be replaced by a pointer is not the reason why I adopted it, and having though about droping it, I realized that it is not even the reason why the pointer interface is really good. It is really good because it comunicates _clearly_ and _each time_ you access an optional value that the operation will be undefined unless the initialized condition is met. For dumb programmers like me, this is a very important feature of the interface; and I know this based on experience, since I stop being caught often by uninitialized optionals<> once I adopted it. Anyway, I concede that the pointer interface is really just a visual aid. And of course, smart programmers don't need this aid, and can safely deal with whatever consistent interface you gave them. But I'd like to stress out that this particular aid is significantly important. Anyway, I notice that _only_ operator *() and ->(), which are used to access the value, are the part of the pointer-like interface that I consider really important. operator safe_bool() is just handy and it is consistent with optional<> (a value-based container) even though it allows expressions of the form: if ( opt == 0 ) just because it goes along with the pointer-like interface. Because this expression (with the intended meaning) is used with pointers. I think that if the pointer-like interface is dropped, this should be dropped too. ------- The problem with the pointer-like interface. Mostly thanks to William Kempf, we've came to realize that the pointer-like interface makes optional<> appear like a pointer, while it is not, and that this false appearance evidences itself when we face the fact that the relational operators that are perfectly defined for optional<> per see, since it is a value-based container, conflicts with its pointer-like interface. Thinking about this problem we realize that operators *() and ->() shape optional<> into a pointer-like thing even though it isn't. It is clear to me after much discussion that optional<> is not a pointer at all but a container. OTOH, it is true that pointers can either be null or point to a valid object, so it is true that there is some analogy between a pointer and an optional<>. This analogy is what makes operators *() and ->() viable options. However, optional<> should not be considered a pointer just like an iterator should not be considered a pointer either. The semantics of value access for optional<> is clearly defined, thus the following are just three different syntactic variants of the same: (1) (opt = 2), ( 3 == opt), ( x = opt ) (2) (opt.ref() = 2), ( 3 == opt.ref()), ( x = opt.ref()) (3) (*opt = 2), ( 3 == *opt), ( x = *opt) I argued that (1) is problematic (2) it only lacks self-evident syntax. (3) is ideal (in itself). However, operator*() should be seen on optional as a visual aid and not as implying that optional<> is a pointer. OTOH, I came to figure that resembling a pointer w.r.t to value access is a good idea because it is true that pointers do model perfectly well the notion of optional pointees. IOWs, there exist the well defined concept of OptionalValue as Augustus shown. This model is well defined by a part of pointers interface: operator*, operator-> and conversion to bool. And pointers have historically been used to deal with optional objects precisely because they model this concept. Therefore, it make sense to me to have optional<>, a container with value semantics, models the OtionalValue concept, provided that it's clearly understood (and documented) that it is a container with value semantics and not a pointer. Of course, having a value-semantic container model a concept which is defined via a pointer-like interface has the drawback that it makes the appearance that it is a pointer. This is an important drawback which definitely justifies reconsidering to which extent it is convenient to model the OptionalValue concept in optional<>. A pro of modeling this concept is that allows optionals<> to be interchanged with true pointers; but I concede that this pro is not really that much important. A con is that it could make the programmer believe it is a pointer. Since it is fundamental that optional<> is not mistaken to be a pointer, but rather a value-based container modeling the OptionalValue concept, proper documentation is not enough. It is required that the interface of optional<> that looks like a pointer follow _exactly_ pointer semantics. This way, if it comes to ocurr that a programmer did think that optional<> is a pointer, the code he wrote won't behave abnormally. A way to measure this potential problem is to consider what would happen if optional<T> were actually be replaced by T* With the current definition of the OptionalValue concept, the code will have _exactly_ the same effect, so the potential danger of mistakenly consider optional<T> as a pointer is conceptual but not practical. However, and very unfortunately, this _requires_ the properly well defined relational operators to be disallowed, because they can effectively create practical problems if optional is mistaken for a pointer and used, for example, to test for aliased equivalence as you do when you compare pointers. As a general conclusion, the final decision boils down, as Augustus said, to weight the convenience of this syntax: if ( opt ) // or if ( opt != 0 ) { ... ... if ( opt2 && *opt == *opt2 ) foo(*opt) } else { ... ... opt.reset(3); } against the convenience of the more conceptually consistent: if ( !opt.empty() ) { ... ... if ( opt == opt2 ) foo(opt.ref()) } else { ... ... opt.reset(3); } My experience shows that comparions of optionals is _extemely_ less frequent than testing for initalization and accessing the value; so I really prefer the pointer-like interface with poisonned relops. Provided the documentation is reworked to clearly state that optional<> is a value-based container and not a pointer. Anyway, the viable choices are, AFAICT: (A) Strict container-like interface: optional () ; explicit optional ( T const& v ) ; optional ( optional const& rhs ) ; template<class U> optional ( optional<U> const& rhs ) ; optional& operator = ( optional const& rhs ) ; template<class U> optional& operator = ( optional<U> const& rhs ) ; bool empty() const ; T const& ref() const ; T& ref() ; friend void swap ( optional<T>& x, optional<T>& y ) ; bool operator == ( optional<T> const& x, optional<T> const& y ) ; bool operator != ( optional<T> const& x, optional<T> const& y ) ; (A) Pointer-like interface to model OptionalValue concept: optional () ; explicit optional ( T const& v ) ; optional ( optional const& rhs ) ; template<class U> optional ( optional<U> const& rhs ) ; optional& operator = ( optional const& rhs ) ; template<class U> optional& operator = ( optional<U> const& rhs ) ; T const* operator -> () const ; T* operator -> () ; T const& operator* () const ; T& operator* () ; T const* get() const ; T* get() ; operator safe_bool() const ; friend void swap ( optional<T>& x, optional<T>& y ) ; Fernando Cacciola _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost