To me the appeal of annotating arguments as read-only (in) or write-only
(out) is to catch subtle programmer errors. The classic one is that you are
writing a function that is intended not to modify some argument, but you
unwittingly call a function that mutates it. If the compiler can also
leverage that kind of annotation and analysis to improve its ability to
optimize code, great, but I don't think that's the main reason for such a
feature. As Jameson pointed out, the compiler can reason about that sort of
thing pretty well for itself.
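
A contrived sketch of that failure mode (scale_to_unit! and angle_to_xaxis
are hypothetical names, just to illustrate):

function scale_to_unit!(v)          # the ! warns that v is mutated
    s = norm(v)
    for i = 1:length(v)
        v[i] /= s                   # in-place write to the caller's array
    end
    return v
end

## meant to treat v as read-only, but unwittingly calls a mutating helper
function angle_to_xaxis(v)
    u = scale_to_unit!(v)           # bug: the caller's v is scaled in place
    acos(u[1])
end

An in-annotation on angle_to_xaxis's argument could flag the call to
scale_to_unit! at the point of the mistake, instead of leaving you to notice
the corrupted vector much later.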

I'm not against this sort of check – I've made those mistakes myself and
one particular one was among the trickiest bugs I've ever tracked down. But
it doesn't fit terribly well with Julia's approach to types and type
checking. We don't do static checking, which is ideally how you'd want this
to work. But adding immutable wrappers for all the mutable types in the
system and then adding the corresponding methods for both just strikes me
as way too much duplication and complexity.

Maybe we'll come up with a cleaner way to do it. Sometimes these things
just need a good long time to think about and then voila! some bright idea
emerges.


On Fri, Aug 22, 2014 at 12:32 AM, <[email protected]> wrote:

> Jameson,
>
> I think it is easy to propose an example in which no-aliasing information
> could help the compiler.  Consider the classic dense matrix-multiply C=A*B
> implemented as matmul(A,B,C).  HPC experts have studied this example to
> death and have come up with all kinds of
> loop-reordering/blocking/data-distribution/parallelization strategies that
> improve performance by orders of magnitude.  However, most of these
> optimizations are invalid (i.e., they change the answer) if the user
> invokes the routine as matmul(A,B,A).
>
> Now you could argue correctly that matmul(A,B,A) is probably not going to
> work, i.e., it will not correspond to the mathematical operation A:=A*B,
> even if the exact original loop ordering is preserved, and that the
> programmer must be a fool for making such an invocation.
>
> However, the compiler does not know that the routine is not meant to be
> invoked as matmul(A,B,A).  If it cannot make a no-aliasing assumption, then
> it must allow for matmul(A,B,A), and so it cannot carry out all the fancy
> loop reordering even for the correct case when C is not identical to A or B.
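>
> To make the hazard concrete, here is a naive (decidedly non-HPC) sketch;
> called as matmul(A,B,A), the writes into C clobber entries of A that later
> iterations still need to read:
>
> function matmul(A, B, C)
>     n = size(A, 1)              # assumes square matrices, for brevity
>     for i = 1:n, j = 1:n
>         s = 0.0
>         for k = 1:n
>             s += A[i,k] * B[k,j]
>         end
>         C[i,j] = s              # if C === A, row i of A is corrupted here
>     end
>     return C
> end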
>
> I cannot provide timing tests on this example because I don't have a
> compiler that can do all the loop reordering and blocking to change the
> performance.
>
> -- Steve
>
>
> On Thursday, August 21, 2014 11:11:35 PM UTC-4, Jameson wrote:
>
>> It's perhaps less a question of whether it is attainable, and more
>> whether it is worth the cost. Additional compile-time optimizations may
>> reduce run-time, but they generally increase compile-time. Entire-program
>> optimizations are generally the most expensive, so the julia inference step
>> has several knobs that try to detect excessively expensive optimizations
>> that are unlikely to affect runtime performance.
>>
>> > poor programmer style
>> I would prefer to say simply poor style, and not make mention of the
>> programmer. I only intend to observe that many cases that inference has
>> trouble with can be resolved by the programmer through judicious use of
>> helper functions, rather than by waiting for the inference step to improve.
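>>
>> For instance, a sketch of the kind of helper function I mean (the
>> "function barrier" idiom; process and kernel are made-up names):
>>
>> function process(data)
>>     x = data["value"]       ## inference may only know x::Any here
>>     return kernel(x)        ## dispatch hands kernel a concrete type
>> end
>>
>> kernel(x) = x * (x + 1)     ## specialized per concrete type of x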
>>
>> Otherwise, I think you summarized most of my points adequately. I would
>> mention one more again though:
>>
>> While you hypothesize that aliasing information would aid the compiler,
>> I'm skeptical that it would have much benefit beyond contrived benchmarks*.
>> It is best to back up your claims with data: as you showed in your original
>> post to this thread, your assumption that immutable types would have a
>> significant performance impact was easily refuted by a quick test.
>>
>>
>> *Note: some functions (esp. those with a ! that mutate an argument) tend
>> to assume that some other code will not mutate their arguments while they
>> are running. Therefore, being able to take ownership of the object, and
>> thus prevent others from modifying it, would be great, but completely
>> tangential to being able to mark an argument as const and prevent yourself
>> from modifying it. But as I mentioned previously, this hypothetical
>> read-only/owner flag would somehow need to apply to everyone else's copy of
>> the object, except yours. Some languages make all data immutable to enforce
>> this, others (like Julia) just tell you not to modify data that you are
>> simultaneously accessing elsewhere. It would be nice to have a mechanism to
>> tell the programmer when they have violated this expectation, but unclear
>> how to do it without a high runtime cost.
>>
>>
>> On Thu, Aug 21, 2014 at 10:30 PM, <[email protected]> wrote:
>>
>>> Jameson,
>>>
>>> Do you mind if I try to summarize your point of view in my own words?
>>>
>>> First, you are saying that in most cases the compiler can tell when a
>>> function's mutable arguments are read-only, and the cases when it cannot
>>> tell arise mainly from poor programmer style.  Therefore, there is little
>>> need for programmer declaration of readonly arguments.
>>>
>>> Second, you are saying that knowledge of which arguments are readonly,
>>> regardless of whether this information comes in the form of compiler
>>> analysis or programmer declaration, is not helpful because it could be
>>> defeated by aliasing between arguments, and aliasing will not go away even
>>> if the Julia manual forbids it.
>>>
>>> Third, you are saying (maybe I'm putting words in your mouth here; I'm
>>> reading between the lines of your several postings) that the compiler could
>>> carry out better analysis by analyzing the caller rather than the function.
>>>  Perhaps it could deduce from analyzing the caller whether aliasing takes
>>> place, thus overcoming the second point.  It could perhaps even instantiate
>>> multiple versions of the same method to handle the cases of aliased and
>>> nonaliased arguments with different implementations.
>>>
>>> So the conclusion is that Julia could in the future effectively analyze
>>> all the functions composing an entire program at once, spot most of the
>>> instances of functions called with readonly arguments, make optimizations
>>> and parallelizations based on this analysis, and perhaps pass the
>>> information back to the programmer to be used as a sanity check.  That
>>> sounds fantastic!  Is it attainable?
>>>
>>> -- Steve
>>>
>>>
>>>
>>> On Thursday, August 21, 2014 9:03:57 PM UTC-4, Jameson wrote:
>>>
>>>> > The point is that the compiler, when applied to f3, cannot tell
>>>> which of the two methods named gsub is called.  (At least, this is what
>>>> I guess from looking at the code_native printout
>>>>
>>>> There is a difference between what the compiler currently infers and
>>>> what it could infer. Since I can see that there is only one possible gsub
>>>> call, the compiler could too (in general, this optimization pass has not
>>>> yet been implemented mostly because it is easy to avoid tricking the
>>>> compiler like this, so the benefit to effort ratio is rather poor). But
>>>> this is still a compile-time check. I even had a branch at one point that
>>>> did more of this work at compile-time (it wasn't merged for unrelated
>>>> reasons, and I haven't bothered to revive it since it wasn't that useful of
>>>> an optimization in benchmarks).
>>>>
>>>> "Might modify" is the same as "does modify" in that it prohibits any of
>>>> the theoretic optimizations.
>>>>
>>>> In either case, code_typed is much more useful than code_native for
>>>> interpreting such situations.
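>>>>
>>>> For example, code_typed(f3, (IntSet,)) prints the lowered code with
>>>> inferred types attached, so you can see directly whether the gsub call
>>>> site resolved to a single method or stayed generic.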
>>>>
>>>> However, the type-instability of your example is also much more
>>>> impactful to performance than the possible (but in your example,
>>>> non-existent) benefit of assuming no-aliasing.
>>>>
>>>> The compiler could do a lot with the information that the second gsub
>>>> method is pure, especially for doing constant propagation. But, while the
>>>> compiler already knows that it is pure, it doesn't currently bother to
>>>> propagate that information to make full use of it.
>>>>
>>>> If you want to see this option added, you need to show that it can
>>>> actually benefit real code. While C's memory model remains a good
>>>> representation of the capabilities of hardware, that does not automatically
>>>> mean that any other model is inherently unusable. Gains in programmer
>>>> productivity are sometimes worth much more than gains in computer
>>>> efficiency – otherwise why would anyone have tried to improve upon 
>>>> assembly?
>>>>
>>>>
>>>> On Thu, Aug 21, 2014 at 6:12 PM, <[email protected]> wrote:
>>>>
>>>>> Jameson,
>>>>>
>>>>> Earlier I said that if function f has a mutable argument x that it
>>>>> passes to g, the compiler cannot necessarily tell whether f modifies x
>>>>> since the compiler may not be able to tell at f's compile time what g
>>>>> does.  One way to get this circumstance, as you point out, is when f
>>>>> calls g via eval.  However, there are simpler cases when the compiler
>>>>> cannot tell whether g modifies its argument when it compiles f.  Here is
>>>>> a contrived example:
>>>>>
>>>>> module test_whichg
>>>>>
>>>>> ## s is modified in this gsub
>>>>> function gsub(s::IntSet, c::Int)
>>>>>     pop!(s)
>>>>>     cos(convert(Float64, c))
>>>>> end
>>>>>
>>>>> ## s is not modified in this gsub
>>>>> function gsub(s::IntSet, c::Char)
>>>>>     h = maxabs(s)
>>>>>     sin(convert(Float64, h))
>>>>> end
>>>>>
>>>>> function f3(s::IntSet)
>>>>>     y = 2
>>>>>     ## y is rebound from an Int to a Char inside the loop, so inference
>>>>>     ## cannot pin down which gsub method the call below dispatches to
>>>>>     for i = 1 : 1
>>>>>         y = 'c'
>>>>>     end
>>>>>     gsub(s,y)
>>>>> end
>>>>>
>>>>> end
>>>>>
>>>>> The point is that the compiler, when applied to f3, cannot tell which
>>>>> of the two methods named gsub is called.  (At least, this is what I
>>>>> guess from looking at the code_native printout.  Unfortunately, I cannot
>>>>> read assembler, but I can make conjectures by comparing assemblies of
>>>>> different but similar functions.)  It is clear to us that
>>>>> gsub(IntSet,Char) is called, which does not modify its argument, but
>>>>> apparently not to the compiler.  I misspoke earlier when I said that g
>>>>> might not even be parsed at the time f is compiled.
>>>>>
>>>>> With regard to aliasing, I agree that it is not possible in general
>>>>> for the compiler or even the run-time system to check for aliasing.  My
>>>>> proposal is that there simply be a rule against it in the manual, and that
>>>>> the compiler and run-time system try to catch the easy cases when the rule
>>>>> is violated.  It would be similar to the situation with a
>>>>> subscript-out-of-bounds error when the @inbounds macro is used, namely, if
>>>>> the rule is violated, then anything could happen.  There could be a
>>>>> compiler flag called --possible-aliasing in which the user warns that
>>>>> aliasing might be present and therefore instructs the compiler to disable
>>>>> all code transformations that assume no-aliasing.  (In other languages,
>>>>> the situation is the opposite: the compiler accepts a flag in which the
>>>>> user promises no aliasing.)
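>>>>>
>>>>> The easy cases could be caught cheaply at run time. A hypothetical
>>>>> sketch (checked_matmul! is my own name; only identity aliasing is
>>>>> detected, not overlapping views):
>>>>>
>>>>> function checked_matmul!(A, B, C)
>>>>>     if C === A || C === B
>>>>>         error("matmul: the output C must not alias an input")
>>>>>     end
>>>>>     A_mul_B!(C, A, B)   ## Base's in-place multiply does the real work
>>>>> end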
>>>>>
>>>>>
>>>>> -- Steve Vavasis
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thursday, August 21, 2014 12:46:30 AM UTC-4, Jameson wrote:
>>>>>
>>>>>> If you called f(x) which calls g(x), but g(x) does not exist, you are
>>>>>> going to get a no-method error. Or if you are using run-time eval to
>>>>>> insert code into the compile-time environment, you are forcing Julia to
>>>>>> re-evaluate a lot of assumptions anyways, so any performance benefits of
>>>>>> assuming const would be swamped by the additional cost imposed by having
>>>>>> a changing idea of what g(x) does. In short, if g(x) isn't defined by the
>>>>>> time you call it, hoping that the compiler noticed one of your arguments
>>>>>> was const should be the least of your concerns.
>>>>>>
>>>>>> The memcpy function spec explicitly forbids aliasing – I don't
>>>>>> understand how that is relevant. I thought I heard that Fortran did
>>>>>> forbid aliasing, and could actually achieve some code optimization from
>>>>>> this fact?
>>>>>>
>>>>>> Preventing aliasing in general is impossible if one looks only at the
>>>>>> function arguments. For example, all of the following sometimes alias,
>>>>>> although there are cases for each that the compiler could not possibly
>>>>>> detect, at compile or runtime:
>>>>>>
>>>>>> dict[1] = dict.keys
>>>>>>
>>>>>> something(a) = (a[x] = 1)
>>>>>> something(x)
>>>>>>
>>>>>> call_f(a, T(a))
>>>>>>
>>>>>> @everywhere compute(g)
>>>>>>
>>>>>> That is why I've considered adding a "frozen" or "const" flag to the
>>>>>> object instance itself, rather than to the variable binding. I have a
>>>>>> suspicion that this is the property that is actually desired. It might
>>>>>> even be possible to implement this for all types by stealing a bit from
>>>>>> the type tag field.
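>>>>>>
>>>>>> In the meantime, a wrapper type can approximate the binding-level half
>>>>>> of this in user code. A rough sketch (ReadOnlyVector is a hypothetical
>>>>>> name): it guards your reference only; anyone still holding the raw
>>>>>> array can mutate it, which is exactly why a per-object flag would be
>>>>>> more powerful.
>>>>>>
>>>>>> immutable ReadOnlyVector{T} <: AbstractVector{T}
>>>>>>     data::Vector{T}
>>>>>> end
>>>>>> Base.size(v::ReadOnlyVector) = size(v.data)
>>>>>> Base.getindex(v::ReadOnlyVector, i::Int) = v.data[i]
>>>>>> Base.setindex!(v::ReadOnlyVector, x, i::Int) =
>>>>>>     error("attempt to write through a read-only view")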
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 20, 2014 at 11:36 PM, <[email protected]> wrote:
>>>>>>
>>>>>>> Jameson,
>>>>>>>
>>>>>>> You wrote that the compiler can already tell whether or not a
>>>>>>> function modifies one of its mutable arguments.  Say that the function
>>>>>>> is f, the mutable argument is x, and that f contains a call like g(x),
>>>>>>> where g is another function.  Then apparently in order to analyze f
>>>>>>> the compiler would have to know whether or not g modifies its
>>>>>>> argument.  But how can it tell, since in Julia the function g might
>>>>>>> not even have been parsed until that statement is encountered?
>>>>>>>
>>>>>>> With regard to your other point, I agree with you that aliasing is a
>>>>>>> significant loophole.  Although this is getting off topic, it seems
>>>>>>> to me that the Julia community should simply declare that aliasing
>>>>>>> between read/write function arguments (or write/write) is not allowed
>>>>>>> in Julia.  The C community did not have that luxury because of legacy
>>>>>>> tricks with the memcpy function, and neither did the Fortran community
>>>>>>> because of the legacy trick of equivalencing many smaller arrays on
>>>>>>> top of one big one.  Since Julia gets to start with a clean slate, why
>>>>>>> not forbid aliasing?
>>>>>>>
>>>>>>> -- Steve
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tuesday, August 5, 2014 5:38:17 PM UTC-4, [email protected]
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Dear Julia users,
>>>>>>>>
>>>>>>>> It seems to me that Julia's distinction between a 'type' and an
>>>>>>>> 'immutable' conflates two independent properties; the consequence of
>>>>>>>> this conflation is a needless loss of performance.  In more detail,
>>>>>>>> the differences between a 'type' struct and an 'immutable' struct in
>>>>>>>> Julia are:
>>>>>>>>
>>>>>>>> 1. Assignment of 'type' struct copies a pointer; assignment of an
>>>>>>>> 'immutable' struct copies the data.
>>>>>>>>
>>>>>>>> 2. An array of type structs is an array of pointers, while an array
>>>>>>>> of immutables is an array of data.
>>>>>>>>
>>>>>>>> 3. Type structs are heap-allocated and tracked by the garbage
>>>>>>>> collector, whereas immutables need not be.  (This is not documented;
>>>>>>>> it is my conjecture.)
>>>>>>>>
>>>>>>>> 4. Fields in type structs can be modified, but fields in immutables
>>>>>>>> cannot.
>>>>>>>>
>>>>>>>> Clearly #1-#3 are related concepts.  As far as I can see, #4 is
>>>>>>>> completely independent from #1-#3, and there is no obvious reason why
>>>>>>>> it is forbidden to modify fields in immutables.  There is no analogous
>>>>>>>> restriction in C/C++.
>>>>>>>>
>>>>>>>> This conflation causes a performance hit.  Consider:
>>>>>>>>
>>>>>>>> type floatbool
>>>>>>>>   a::Float64
>>>>>>>>   b::Bool
>>>>>>>> end
>>>>>>>>
>>>>>>>> If t is of type Array{floatbool,1} and I want to update the flag b
>>>>>>>> in t[10] to 'true', I say 't[10].b = true' (call this 'fast' update).
>>>>>>>> But if instead of 'type floatbool' I had said 'immutable floatbool',
>>>>>>>> then to set flag b in t[10] I need the more complex code t[10] =
>>>>>>>> floatbool(t[10].a, true) (call this 'slow' update).
>>>>>>>>
>>>>>>>> To document the performance hit, I wrote five functions below.  The
>>>>>>>> first three use 'type' and either no update, fast update, or slow
>>>>>>>> update; the last two use 'immutable' and either no update or slow
>>>>>>>> update.  You can see a HUGE hit on performance between slow and fast
>>>>>>>> update for 'type'; for immutable there would presumably also be a
>>>>>>>> difference, although apparently smaller.  (Obviously, I can't test
>>>>>>>> fast update for immutable; this is the point of my message!)
>>>>>>>>
>>>>>>>> So why does Julia impose this apparently needless restriction on
>>>>>>>> immutable?
>>>>>>>>
>>>>>>>> -- Steve Vavasis
>>>>>>>>
>>>>>>>>
>>>>>>>> julia> @time testimmut.type_upd_none()
>>>>>>>> elapsed time: 0.141462422 seconds (48445152 bytes allocated)
>>>>>>>>
>>>>>>>> julia> @time testimmut.type_upd_fast()
>>>>>>>> elapsed time: 0.618769232 seconds (48247072 bytes allocated)
>>>>>>>>
>>>>>>>> julia> @time testimmut.type_upd_slow()
>>>>>>>> elapsed time: 4.511306586 seconds (4048268640 bytes allocated)
>>>>>>>>
>>>>>>>> julia> @time testimmut.immut_upd_none()
>>>>>>>> elapsed time: 0.04480173 seconds (32229468 bytes allocated)
>>>>>>>>
>>>>>>>> julia> @time testimmut.immut_upd_slow()
>>>>>>>> elapsed time: 0.351634871 seconds (32000096 bytes allocated)
>>>>>>>>
>>>>>>>> module testimmut
>>>>>>>>
>>>>>>>> type xytype
>>>>>>>>     x::Int
>>>>>>>>     y::Float64
>>>>>>>>     z::Float64
>>>>>>>>     summed::Bool
>>>>>>>> end
>>>>>>>>
>>>>>>>> immutable xyimmut
>>>>>>>>     x::Int
>>>>>>>>     y::Float64
>>>>>>>>     z::Float64
>>>>>>>>     summed::Bool
>>>>>>>> end
>>>>>>>>
>>>>>>>> myfun(x) = x * (x + 1) * (x + 2)
>>>>>>>>
>>>>>>>> function type_upd_none()
>>>>>>>>     n = 1000000
>>>>>>>>     a = Array(xytype, n)
>>>>>>>>     for i = 1 : n
>>>>>>>>         a[i] = xytype(div(i,2), 0.0, 0.0, false)
>>>>>>>>     end
>>>>>>>>     numtri = 100
>>>>>>>>     for tri = 1 : numtri
>>>>>>>>         sum = 0
>>>>>>>>         for i = 1 : n
>>>>>>>>             @inbounds x = a[i].x
>>>>>>>>             sum += myfun(x)
>>>>>>>>         end
>>>>>>>>     end
>>>>>>>> end
>>>>>>>>
>>>>>>>>
>>>>>>>> function type_upd_fast()
>>>>>>>>     n = 1000000
>>>>>>>>     a = Array(xytype, n)
>>>>>>>>     for i = 1 : n
>>>>>>>>         a[i] = xytype(div(i,2),  0.0, 0.0, false)
>>>>>>>>     end
>>>>>>>>     numtri = 100
>>>>>>>>     for tri = 1 : numtri
>>>>>>>>         sum = 0
>>>>>>>>         for i = 1 : n
>>>>>>>>             @inbounds x = a[i].x
>>>>>>>>             sum += myfun(x)
>>>>>>>>             @inbounds a[i].summed = true
>>>>>>>>         end
>>>>>>>>     end
>>>>>>>> end
>>>>>>>>
>>>>>>>> function type_upd_slow()
>>>>>>>>     n = 1000000
>>>>>>>>     a = Array(xytype, n)
>>>>>>>>     for i = 1 : n
>>>>>>>>         a[i] = xytype(div(i,2),  0.0, 0.0, false)
>>>>>>>>     end
>>>>>>>>     numtri = 100
>>>>>>>>     for tri = 1 : numtri
>>>>>>>>         sum = 0
>>>>>>>>         for i = 1 : n
>>>>>>>>             @inbounds x = a[i].x
>>>>>>>>             sum += myfun(x)
>>>>>>>>             @inbounds a[i] = xytype(a[i].x, a[i].y, a[i].z, true)
>>>>>>>>         end
>>>>>>>>     end
>>>>>>>> end
>>>>>>>>
>>>>>>>>
>>>>>>>> function immut_upd_none()
>>>>>>>>     n = 1000000
>>>>>>>>     a = Array(xyimmut, n)
>>>>>>>>     for i = 1 : n
>>>>>>>>         a[i] = xyimmut(div(i,2),  0.0, 0.0, false)
>>>>>>>>     end
>>>>>>>>     numtri = 100
>>>>>>>>     for tri = 1 : numtri
>>>>>>>>         sum = 0
>>>>>>>>         for i = 1 : n
>>>>>>>>             @inbounds x = a[i].x
>>>>>>>>             sum += myfun(x)
>>>>>>>>         end
>>>>>>>>     end
>>>>>>>> end
>>>>>>>>
>>>>>>>> function immut_upd_slow()
>>>>>>>>     n = 1000000
>>>>>>>>     a = Array(xyimmut, n)
>>>>>>>>     for i = 1 : n
>>>>>>>>         a[i] = xyimmut(div(i,2),  0.0, 0.0, false)
>>>>>>>>     end
>>>>>>>>     numtri = 100
>>>>>>>>     for tri = 1 : numtri
>>>>>>>>         sum = 0
>>>>>>>>         for i = 1 : n
>>>>>>>>             @inbounds x = a[i].x
>>>>>>>>             sum += myfun(x)
>>>>>>>>             @inbounds a[i] = xyimmut(a[i].x, a[i].y, a[i].z, true)
>>>>>>>>         end
>>>>>>>>     end
>>>>>>>> end
>>>>>>>>
>>>>>>>> end
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
