I'm wondering if there are any general rules out there in Julia land
concerning when it is better to *parametrize* a function on a type versus
making a more generic function that uses typeof, eltype, as needed. I've
done some initial cursory examination, which you can see below if you're
interested. Based on this little bit of investigation, it does seem like
the compiler resolves calls such as typeof at compile time when the types
are fully known, which means that in many cases, the code generated by
parameterized functions will be identical to their generic siblings.
Nevertheless, I did find some small differences, so I'm wondering if
there are cases where those differences are worth considering when making
implementation choices.
Any thoughts, whether derived from experience, knowledge of Julia
internals, or wild-ass guessing, are appreciated.
Regards,
Michael Grant
-------
One very simple example is the definition of eltype in number.jl:
eltype(x::Number)=typeof(x)
One could also implement the same functionality as follows:
eltype{T<:Number}(x::T)=T
Which one is better? Well, if I'm understanding this little test
properly---neither!
eltype2{T<:Number}(x::T) = T
eltype3(x::Number) = typeof(x)
code_llvm(eltype2,(Complex{Float32},))
code_llvm(eltype3,(Complex{Float32},))
The output of code_llvm looks to be identical in both cases:
define %jl_value_t* @"julia_eltype3;39061"(%Complex.4) {
top:
ret %jl_value_t* inttoptr (i64 140332346166688 to %jl_value_t*), !dbg !2208
}
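For anyone repeating this on a current Julia release, here is a minimal sketch of the same comparison. It assumes Julia 1.x, where the f{T}(x::T) form is written with a where clause and code_llvm comes from InteractiveUtils; the debuginfo keyword is only there to cut noise and is assumed to be available in the release being used. The addresses and symbol names in the output will differ from the listing above.

```julia
using InteractiveUtils  # provides code_llvm when running outside the REPL

# Julia 1.x spelling of the two variants from the post (assumed translation).
eltype2(x::T) where {T<:Number} = T
eltype3(x::Number) = typeof(x)

z = Complex{Float32}(1, 2)

# Both variants return the concrete type of the argument.
println(eltype2(z), " ", eltype3(z))

# Print the IR for each specialization so the two can be compared by eye.
# debuginfo=:none drops the line-number comments from the listing.
code_llvm(eltype2, (Complex{Float32},); debuginfo=:none)
code_llvm(eltype3, (Complex{Float32},); debuginfo=:none)
```

The printed IR is meant to be compared by eye, as in the listing above; the sketch does not check it programmatically.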
Great! Now, let's try a slightly more complex example, inspired by the sign
function:
sign2{T<:Real}(x::T) = ifelse(x < 0, oftype(T,-1), ifelse(x > 0, one(T), x))
sign3(x::Real) = ifelse(x < 0, oftype(x,-1), ifelse(x > 0, one(x), x))
code_llvm(sign2,(Float64,))
code_llvm(sign3,(Float64,))
Once again, identical code:
define double @"julia_sign3;39079"(double) {
top:
%1 = fcmp uge double %0, 0.000000e+00, !dbg !2208
%2 = fcmp oge double %0, 0x43E0000000000000, !dbg !2208
%3 = fcmp ogt double %0, 0.000000e+00, !dbg !2208
%4 = or i1 %2, %3, !dbg !2208
%5 = select i1 %4, double 1.000000e+00, double %0, !dbg !2208
%6 = select i1 %1, double %5, double -1.000000e+00, !dbg !2208
ret double %6, !dbg !2208
}
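A sketch of the same pair in Julia 1.x syntax, for readers who want to rerun it. One assumption to flag: oftype(T,-1) with a type as the first argument is replaced here by convert(T, -1), since in current Julia oftype expects a value rather than a type. Because ifelse evaluates all of its arguments, this sketch is only meant for signed and floating-point inputs.

```julia
# Parametric form: the type is named explicitly and used for the constants.
sign2(x::T) where {T<:Real} =
    ifelse(x < 0, convert(T, -1), ifelse(x > 0, one(T), x))

# Generic form: the type is recovered from the value via oftype and one.
sign3(x::Real) =
    ifelse(x < 0, oftype(x, -1), ifelse(x > 0, one(x), x))

# Spot-check that both forms agree in value and in type on a few inputs.
for x in (-2.5, 0.0, 3.0, -7, 0, 4, Int8(-7), 1.5f0)
    println(x, " => ", sign2(x), " ", sign3(x))
end
```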
Win! And yet, both of these examples rely on the fact that types are fully
resolved. What if they aren't? I can look at a generic version of eltype3 by
typing
code_llvm(eltype3,(Number,))
but I can't do
code_llvm(eltype2,(Number,))
because no such definition exists. So suppose I create a new wrapper
function:
eltype4(x::Number) = eltype2(x)
so that I can see what kind of generic dispatch code will be generated for
eltype2 when necessary. Now, different code is indeed generated. Here's the
resulting code for eltype3:
define %jl_value_t* @"julia_eltype3;39082"(%jl_value_t*, %jl_value_t**, i32) {
top:
%3 = alloca [3 x %jl_value_t*], align 8
%.sub = getelementptr inbounds [3 x %jl_value_t*]* %3, i64 0, i64 0
%4 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !2208
store %jl_value_t* inttoptr (i64 2 to %jl_value_t*), %jl_value_t** %.sub, align 8
%5 = load %jl_value_t*** @jl_pgcstack, align 8, !dbg !2208
%6 = getelementptr [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !2208
%.c = bitcast %jl_value_t** %5 to %jl_value_t*, !dbg !2208
store %jl_value_t* %.c, %jl_value_t** %6, align 8, !dbg !2208
store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !2208
store %jl_value_t* null, %jl_value_t** %4, align 8
%7 = icmp eq i32 %2, 1, !dbg !2208
br i1 %7, label %ifcont, label %else, !dbg !2208
else: ; preds = %top
call void @jl_error(i8* getelementptr inbounds ([26 x i8]* @_j_str0, i64 0, i64 0)), !dbg !2208
unreachable, !dbg !2208
ifcont: ; preds = %top
%8 = load %jl_value_t** %1, align 8, !dbg !2208
%9 = load %jl_value_t** inttoptr (i64 140332374695360 to %jl_value_t**), align 64, !dbg !2209
%10 = getelementptr inbounds %jl_value_t* %9, i64 1, i32 0, !dbg !2209
%11 = load %jl_value_t** %10, align 8, !dbg !2209
%12 = bitcast %jl_value_t* %11 to %jl_value_t* (%jl_value_t*, %jl_value_t**, i32)*, !dbg !2209
store %jl_value_t* %8, %jl_value_t** %4, align 8, !dbg !2209
%13 = call %jl_value_t* %12(%jl_value_t* %9, %jl_value_t** %4, i32 1), !dbg !2209
%14 = load %jl_value_t** %6, align 8, !dbg !2209
%15 = getelementptr inbounds %jl_value_t* %14, i64 0, i32 0, !dbg !2209
store %jl_value_t** %15, %jl_value_t*** @jl_pgcstack, align 8, !dbg !2209
ret %jl_value_t* %13, !dbg !2209
}
The code for eltype4 is identical until we get to ifcont:
ifcont: ; preds = %top
%8 = load %jl_value_t** %1, align 8, !dbg !2208
store %jl_value_t* %8, %jl_value_t** %4, align 8, !dbg !2209
%9 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 140332531469120 to %jl_value_t*), %jl_value_t** %4, i32 1), !dbg !2209
%10 = load %jl_value_t** %6, align 8, !dbg !2209
%11 = getelementptr inbounds %jl_value_t* %10, i64 0, i32 0, !dbg !2209
store %jl_value_t** %11, %jl_value_t*** @jl_pgcstack, align 8, !dbg !2209
ret %jl_value_t* %9, !dbg !2209
}
Now we see the invocation of @jl_apply_generic to handle the dispatching of
eltype2, I assume. Looking at the native code, the differences are also
minor, though of course there are a couple of calls I don't understand.
From looks alone, I see no reason to strongly prefer one form over the
other.
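
For completeness, a minimal sketch of how the abstractly typed case could be exercised on a current release. It assumes Julia 1.x syntax and uses a vector with element type Number (a hypothetical test input, not something from the experiments above) so that the callee only knows its argument is some Number. It checks that the two paths return the same answers and prints the IR for inspection; it makes no claim about what that IR looks like on any particular version.

```julia
using InteractiveUtils  # provides code_llvm when running outside the REPL

eltype2(x::T) where {T<:Number} = T   # parametric form (Julia 1.x syntax)
eltype3(x::Number) = typeof(x)        # generic form
eltype4(x::Number) = eltype2(x)       # wrapper around the parametric form

# A container with an abstract element type: the concrete type of each
# element is only known at run time.
xs = Number[1, 2.0f0, 3 + 4im]

via_parametric = map(eltype4, xs)
via_generic = map(eltype3, xs)
println(via_parametric)
println(via_generic)

# Print the code generated when only `Number` is known about the argument.
code_llvm(eltype3, (Number,))
code_llvm(eltype4, (Number,))
```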