Re: should pure functions accept/deal with shared data?

Artur Skawina Thu, 07 Jun 2012 17:22:13 -0700

On 06/08/12 00:42, Steven Schveighoffer wrote:
> On Thu, 07 Jun 2012 17:36:45 -0400, Artur Skawina wrote:
> 
>> On 06/07/12 21:55, Steven Schveighoffer wrote:
>>> On Thu, 07 Jun 2012 15:16:20 -0400, Artur Skawina wrote:
>>>
>>>> On 06/07/12 20:29, Steven Schveighoffer wrote:
>>>
>>>>> I'm not proposing disallowing mutable references, just shared references.
>>>>
>>>> I know, but if a D function marked as "pure" takes a mutable ref (which a
>>>> shared one has to be assumed to be), it won't be treated as really pure
>>>> for optimization purposes (yes, i'm deliberately trying to avoid "strong"
>>>> and "weak").
>>>
>>> However, a mutable pure function can be *inside* an optimizable pure 
>>> function, and the optimizable function can still be optimized.
>>>
>>> A PAS function (pure accepting shared), however, devolves to a mutable pure 
>>> function.  That is, there is zero advantage of having a pure function take 
>>> shared vs. simply mutable TLS.
>>>
>>> There is only one reason to mark a function that does not take all 
>>> immutable or value type arguments as pure -- so it can be called inside a 
>>> strong-pure function.  Otherwise, it's just a normal function, and even 
>>> marked as pure will not be optimized.  You gain nothing else by marking it 
>>> pure.
>>>
>>> So let's look at two cases.  I'll re-state my example, in terms of two 
>>> overloads, one which takes shared int and one which takes just int (both of 
>>> which do the right thing):
>>>
>>> void inc(ref int t) pure
>>> {
>>>   ++t;
>>> }
>>>
>>> void inc(ref shared(int) t) pure
>>> {
>>>   atomicOp!"+="(t, 1); // core.atomic.atomicOp takes a binary op and an operand
>>> }
>>>
>>> Now, let's define a strong-pure function that uses inc:
>>>
>>> int slowAdd(int x, int y) pure
>>> {
>>>    while(y--) inc(x);
>>>    return x;
>>> }
>>>
>>> I think we can both agree that inc *cannot* be optimized away, and that we 
>>> agree slowAdd is *fully pure*.  That is, slowAdd *can* be optimized away, 
>>> even though its call to inc cannot.
>>>
>>> Now, what about a strong-pure function using the second (shared) form?  A 
>>> strong pure function has to have all parameters (and return types) that are 
>>> immutable or implicitly convertible to immutable.
>>>
>>> I'll re-define slowAdd:
>>>
>>> int slowAddShared(int x, int y) pure
>>> {
>>>    shared int sx = x;
>>>    while(y--) inc(sx);
>>>    return sx;
>>> }
>>>
>>> We can agree for the same reason the original slowAdd is strong-pure, 
>>> slowAddShared is strong-pure.
>>>
>>> But what do we gain by being able to declare sx shared?  We can't return it 
>>> as shared, or slowAddShared becomes weak-pure.
>>
>> Actually, *value* return types shouldn't prevent the function from being
>> pure. But there is not much point in returning them as shared, other than
>> to avoid explicit casts, something that would be better solved with some
>> kind of 'unique' class.
> 
> Right, what I meant was, returning a shared reference.  For example, if a 
> pure function allocated memory and returned it as a shared pointer, that 
> would make it non-optimizable pure (weak pure).
> 
>>> We can't share it while inside slowAddShared, because we have no outlet for 
>>> it, and we cannot access global variables.  In essence, marking sx as 
>>> shared does *nothing*.  In fact, it does worse than nothing -- we now have 
>>> to contend with shared for data that actually is *provably* unshared.  In 
>>> other words, we are wasting cycles doing atomic operations instead of 
>>> straight ops on a shared type.  Not only that, but because there are no 
>>> outlets, declaring *any* data as shared while inside a strong-pure function 
>>> is useless, no matter how we define any PAS functions.
>>>
>>> So if shared is useless inside a strong-pure function, and the only point 
>>> in marking a non-pure-optimizable function as pure is so it can be called 
>>> within a strong-pure function, then pure is useless as an attribute on a 
>>> function that accepts or returns shared data.  *Every case* where you use 
>>> such a function inside a strong-pure function is incorrect.
>>
>> We clearly agree completely; this is exactly what I'm saying in the
>> paragraph you quoted below. What i'm *also* saying is that the
>> 'incorrectness' of it is harmless in practice - so I'm not sure that it
>> should be forbidden, and handled specially (which would be necessary in
>> the inferred-purity cases).
> 
> I have given you an example of where it is harmful.  There is benefit in 
> being able to say "since I marked this function pure, I know I don't have to 
> deal with threading."  It allows you to eliminate possible multi-threading 
> mistakes from whole swaths of code, especially generic code which accepts a 
> myriad of types.
> 
> You know there are a ton of generic functions in phobos that don't check *at 
> all* whether shared data is being given to them?  Simply marking them pure 
> (which should be viable for most functions) would eliminate that worry.


I can see certain generic functions being useful when working with shared
data too. Yes, they can be used incorrectly, but I'd expect anybody working
with shared to know what they're doing. I see your point, but I'm not
convinced that this reason alone is enough to also disallow legal uses (and,
no, I don't think I've ever used 'shared' in this way with phobos, nor can I
think of "legal" examples right now).


>>>> AFAICT you're proposing to forbid something which currently is a NOOP.
>>>
>>> It's not a NOOP, marking something as shared means you need special 
>>> handling.  You can't call most functions or methods with shared data.  And 
>>> if you do handle shared data, it's not just "the same" as unshared data -- 
>>> you need to contend with data races, memory barriers, etc.  Just because 
>>> it's marked shared doesn't mean everything about it is handled.
>>
>> Exactly, see above. That's why you never access "raw" shared data - you 
>> always wrap it.
>> ("access" meaning read and/or write, passing refs around is fine)
>> Problem solved.
> 
> Let's not forget the main benefit of pure -- to allow optimization.  Marking 
> something as optimizable that *can never be* optimized or be a part of *any* 
> optimizable function serves no purpose.

That's why I'm saying it's a NOOP. Forbidding it can avoid certain types of
bugs, yes, but I'd argue that the real cause of these bugs is accessing
shared data in unsafe ways at all - disallowing this /just/ in pure functions
does not seem like much of an improvement.
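
To illustrate what i mean by wrapping shared data instead of touching it
raw, here is a minimal sketch. It assumes core.atomic's atomicOp and
atomicLoad; the Counter type and its method names are made up for this
example and are not taken from phobos or druntime.

```d
import core.atomic : atomicLoad, atomicOp;

// Hypothetical wrapper: the raw shared value is only reachable through
// methods that use atomic operations.
struct Counter
{
    private shared int value;

    // Atomically add one and return the new value.
    int increment() shared
    {
        return atomicOp!"+="(value, 1);
    }

    // Atomically read the current value.
    int get() shared
    {
        return atomicLoad(value);
    }
}

void main()
{
    shared Counter c;
    c.increment();
    c.increment();
    assert(c.get() == 2);
}
```

With something like this, callers pass the wrapper around by reference and
never read or write the underlying int directly.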


> Let's not forget a secondary benefit of pure -- dispatchability (probably a 
> better term for this).  If I know there's no shared data involved, I can 
> dispatch a pure function to another worker thread without worry of races, 
> especially a strong-pure function, but it's quite easy to prove validity for 
> a weak-pure function.
> 
> If shared is involved, the second aspect goes out the window.

Hmm. Not necessarily, if shared is done right. But I don't think these types
of optimizations really work in practice, except when explicitly requested
(by annotating the code in some way, so that the compiler knows where
applying them makes sense).


>>>> And the change
>>>> could have consequences for templated functions or lambdas, where "pure" 
>>>> is inferred.
>>>
>>> I would label those as *helpful* and *positive* consequences ;)
>>
>> Are you saying that
>>
>>    auto f(T)(T v) { return v+v; }
>>
>> should be inferred as impure when used with a shared(T), but (weakly) pure
>> otherwise?
> 
> You are saying two different things here...

I meant T to be a reference type, should have been more explicit about that,
sorry.

> f's purity depends on the expression (v + v)'s purity.  And the level of 
> purity (weak or strong) depends on the level of (v + v)'s purity.  IF v + v 
> is strong-pure (such as int + int), then f is strong-pure.  If v + v is 
> weak-pure, f is weak pure.  If v + v is not pure, then f is not pure.  That 
> is how it works today.
> 
> What I'm saying is, shared just shouldn't be allowed to be any part of pure.  
> So if T is defined as shared int, even though it actually makes no sense 
> whatsoever for your example, f will be unpure.

I agree, that is the sane approach. 

Well, at least with the current 'shared' definition, which implies C-style
'volatile' (which i think should be a separate attribute, there are cases 
where it's not necessary).
And a function with no refs in the signature should not be prevented from
being pure, except when it accesses global state, but that's obvious.


> That's another aspect of shared that needs to be addressed -- type inference 
> for shared expressions.
> 
> for instance:
> 
> shared int x, y;
> 
> auto z = x + y;
> 
> What type should z be?  Right now it's shared, but that makes *no* sense, 
> because z is not shared until you share it.  Why should auto opt-in to 
> something it doesn't have to?

Yep, but there are no good solutions to that right now. I know, "polysemous",
but that would *not* solve the problem, at least not w/o the type of 'z'
remaining in that state until it is actually used.
But now we're reinventing 'uniq' again. :)
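
For anyone who wants to check what their compiler infers for 'z', a small
sketch along these lines should do. It deliberately does not claim what the
printed type is, since that depends on the compiler version; it assumes the
default compiler settings, under which plain reads of shared variables are
accepted.

```d
void main()
{
    shared int x = 1, y = 2;

    // Plain reads of shared ints; no atomics, which is exactly the kind
    // of "raw" access discussed above.
    auto z = x + y;

    // Prints the inferred type of z at compile time.
    pragma(msg, typeof(z));

    // Whatever the qualifier, the value converts to a plain int.
    static assert(is(typeof(z) : int));
    int plain = z;
    assert(plain == 3);
}
```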


> Likewise with IFTI, f(x) should probably equate to f!int(x) (in which case it 
> *would* be pure)

Hmm, shared(T) should never implicitly convert to (T), it *is* safe when
T==int, but the loss of type info could cause problems. If you meant f(z),
then yes that would work, but 'z' remaining in that polysemous state would
be even better.


Hmm, the remaining question seems to be whether

   auto f(T)(T v) pure { return v+v; }

should accept a "shared" T or not.

And I actually think that it shouldn't, for any reasonable interpretation
of function purity. Except D's "weak" purity combined with the ill-defined
shared semantics complicates things and makes the answer less obvious.
I'm starting to feel like I'm playing the devil's advocate here. :)
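
If one wanted f to reject shared arguments today, without any language
change, a template constraint is one way to sketch it. This is only an
illustration, not a proposal: the constraint checks just the top-level
qualifier of T, so something like a pointer to shared data would still get
through.

```d
// Sketch only: refuse instantiation when T itself is shared-qualified.
// is(T == shared) looks at the top-level qualifier only.
auto f(T)(T v) pure
    if (!is(T == shared))
{
    return v + v;
}

void main()
{
    // Unshared argument: T is deduced as int.
    assert(f(21) == 42);

    // Explicit instantiation with a shared type fails the constraint.
    static assert(!__traits(compiles, f!(shared int)(1)));
    static assert( __traits(compiles, f!int(1)));
}
```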


artur
