All, I've been toying with the SSE code generation in GHC 7.7 and Geoffrey Mainland's work to integrate this into the 'vector' library in order to generate SIMD code from high-level Haskell code.
While working with this, I wrote some simple code for testing purposes, then compiled it into LLVM IR and x86_64 assembly in order to figure out how 'good' the resulting code would be. First and foremost: I'm really impressed. Whilst there's most certainly room for improvement (one of these is touched upon in this mail; I also noticed unnecessary constant memory reads inside a tight loop), the initial results look very promising, especially taking into account how high-level the source code is. This is pretty amazing!

As an example, here's 'test.hs':

    {-# OPTIONS_GHC -fllvm -O3 -optlo-O3 -optlc-O=3 -funbox-strict-fields #-}
    module Test (sum) where

    import Prelude hiding (sum)
    import Data.Int (Int32)

    import Data.Vector.Unboxed (Vector)
    import qualified Data.Vector.Unboxed as U

    sum :: Vector Int32 -> Int32
    sum v = U.mfold' (+) (+) 0 v

When compiling this into assembly (compiler/library version details at the end of this message), the 'sum' function yields (among other things) this inner loop:

    .LBB2_3:                      # %c1C0
                                  # =>This Inner Loop Header: Depth=1
        prefetcht0 (%rsi)
        movdqu -1536(%rsi), %xmm1
        paddd %xmm1, %xmm0
        addq $16, %rsi
        addq $4, %rcx
        cmpq %rdx, %rcx
        jl .LBB2_3

The full LLVM IR and assembler output are attached to this message.

Whilst this is a nice and tight loop, I noticed the use of 'movdqu', the SSE instruction for memory access that is not known to be 128-bit aligned. For aligned memory, 'movdqa' can be used instead, and this can have a major performance impact. (At the IR level the cause is visible in the attached test.ll: the vector load in the inner loop, '%ln1Jc = load <4 x i32>* %ln1Jb, align 1, !tbaa !5', only promises 1-byte alignment, so the backend has to select the unaligned instruction; see the illustration below.)

Whilst I understand why this code is currently generated as-is (also for other sample inputs), I wondered whether there are plans or approaches to tackle this. In some cases (e.g. in 'sum') this could be done by performing the scalar calculation at the beginning of the vector up until an aligned boundary, then using aligned access, and finally handling the tail using scalars again (a rough sketch follows below). But I assume, on the other hand, that this is not trivial when multiple 'source' vectors are used in the calculation, and it becomes even more complex with AVX code, which needs 256-bit alignment.

Whilst I can't propose an out-of-the-box solution, I'd like to point at the 'vector-simd' code [1] I wrote some months ago, which might offer some ideas (a condensed sketch of the idea is also included below). In this package, I created an unboxed vector-like type whose alignment is tracked at the type level, and functions which consume a vector declare the minimal alignment they require. As such, vectors can be allocated at the minimal alignment required of them, throughout all code using them.

As an example, if I'd use this code (off the top of my head):

    sseFoo :: (Storable a, AlignedToAtLeast A16 o1, AlignedToAtLeast A16 o2)
           => Vector o1 a -> Vector o2 a
    sseFoo = undefined

    avxFoo :: (Storable a, AlignedToAtLeast A32 o1, AlignedToAtLeast A32 o2,
               AlignedToAtLeast A32 o3)
           => Vector o1 a -> Vector o2 a -> Vector o3 a
    avxFoo = undefined

the type of

    combinedFoo v = avxFoo sv sv
      where
        sv = sseFoo v

would automagically be

    combinedFoo :: (Storable a, AlignedToAtLeast A16 o1, AlignedToAtLeast A32 o2)
                => Vector o1 a -> Vector o2 a

and when using this

    v1 = combinedFoo (Vector.fromList [1 :: Int32, 2, 3, 4, 5, 6, 7, 8])

the allocated argument vector (the result of Vector.fromList) will be 16-byte aligned, as expected/required for the SSE function to use aligned loads internally (assuming no unaligned slices are supported, etc.), whilst the intermediate result 'sv' of 'sseFoo' will be 32-byte aligned, as required by 'avxFoo'.
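To make the movdqu/movdqa connection concrete, here is a minimal hand-written illustration (not taken from the attached files) of how the 'align' attribute on an IR load drives instruction selection; whether 'movdqa' is actually emitted of course still depends on the target and optimisation settings:

    ; Sketch: the same <4 x i32> load, differing only in the promised
    ; alignment (typed-pointer syntax of the LLVM 3.1 era, as in test.ll).
    define <4 x i32> @load_unaligned(<4 x i32>* %p) {
      %v = load <4 x i32>* %p, align 1    ; must assume unaligned -> movdqu
      ret <4 x i32> %v
    }

    define <4 x i32> @load_aligned(<4 x i32>* %p) {
      %v = load <4 x i32>* %p, align 16   ; 16-byte promise -> movdqa possible
      ret <4 x i32> %v
    }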
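For reference, here is a rough sketch of the head/body/tail strategy mentioned above, written in plain Haskell over a raw pointer. This is purely illustrative: 'sumAligned' is hypothetical and not part of 'vector', the "body" loop is a scalar stand-in for where the movdqa-style aligned vector loads would go, and the pointer is assumed to be at least element (4-byte) aligned, as it is for an Int32 buffer:

    {-# LANGUAGE BangPatterns #-}
    module AlignedSum (sumAligned) where

    import Data.Bits ((.&.))
    import Data.Int (Int32)
    import Foreign.Ptr (Ptr, ptrToIntPtr)
    import Foreign.Storable (peekElemOff)

    -- Sum 'n' Int32s at 'p': a scalar head up to the first 16-byte
    -- boundary, a multiple-of-4-elements "SIMD" body (scalar stand-in
    -- here), and a scalar tail.
    sumAligned :: Ptr Int32 -> Int -> IO Int32
    sumAligned p n = do
        let addr    = fromIntegral (ptrToIntPtr p) :: Int
            -- elements before the first 16-byte boundary; (-addr) .&. 15
            -- is the number of bytes up to that boundary
            headLen = min n (((negate addr) .&. 15) `div` 4)
            bodyLen = ((n - headLen) `div` 4) * 4
        h <- scalarSum 0 headLen
        b <- scalarSum headLen (headLen + bodyLen)  -- aligned vector loads here
        t <- scalarSum (headLen + bodyLen) n
        return (h + b + t)
      where
        scalarSum from to = go from 0
          where
            go !i !acc
                | i >= to   = return acc
                | otherwise = do
                    x <- peekElemOff p i
                    go (i + 1) (acc + x)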
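And a condensed, hedged reconstruction of the type-level machinery behind 'vector-simd'; the actual package may structure its classes and instances differently, and 'Vector' below is a payload-free placeholder:

    {-# LANGUAGE EmptyDataDecls, MultiParamTypeClasses, FlexibleInstances,
                 ScopedTypeVariables #-}
    module AlignSketch where

    import Foreign.Storable (Storable)

    -- Phantom alignment tags
    data A16  -- 16-byte (SSE)
    data A32  -- 32-byte (AVX)

    -- 'AlignedToAtLeast n o' witnesses that alignment 'o' satisfies
    -- requirement 'n'; stricter alignments satisfy weaker requirements.
    class AlignedToAtLeast n o
    instance AlignedToAtLeast A16 A16
    instance AlignedToAtLeast A16 A32
    instance AlignedToAtLeast A32 A32

    -- Placeholder for a vector whose buffer alignment 'o' is tracked in
    -- the type; the real type wraps a suitably aligned buffer.
    data Vector o a = Vector

    sseFoo :: (Storable a, AlignedToAtLeast A16 o1, AlignedToAtLeast A16 o2)
           => Vector o1 a -> Vector o2 a
    sseFoo _ = Vector

    avxFoo :: (Storable a, AlignedToAtLeast A32 o1, AlignedToAtLeast A32 o2,
               AlignedToAtLeast A32 o3)
           => Vector o1 a -> Vector o2 a -> Vector o3 a
    avxFoo _ _ = Vector

    -- The intermediate vector is pinned at A32, the strictest requirement
    -- among its consumers, so only A16 is demanded of the argument.
    combinedFoo :: forall a o1 o2.
                   (Storable a, AlignedToAtLeast A16 o1, AlignedToAtLeast A32 o2)
                => Vector o1 a -> Vector o2 a
    combinedFoo v = avxFoo sv sv
      where
        sv = sseFoo v :: Vector A32 a

Note that in this sketch 'sv' has to be pinned to A32 by hand; the real package presumably lets inference pick the strictest alignment, which is exactly the 'automagic' behaviour described above.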
Attached: test.ll and test.s, the compilation results of test.hs using

    $ ghc-7.7.20130302 -keep-llvm-files \
        -package-db=cabal-dev/packages-7.7.20130302.conf \
        -fforce-recomp -S test.hs

GHC is from HEAD/master, compiled on my Fedora 18 system against the system LLVM (3.1), with 'primitive' at commit 8aef578fa5e7fb9fac3eac17336b722cbae2f921 from git://github.com/mainland/primitive.git and 'vector' at commit e1a6c403bcca07b4c8121753daf120d30dedb1b0 from git://github.com/mainland/vector.git.

Nicolas

[1] https://github.com/NicolasT/vector-simd
---------- test.ll ----------

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"

declare ccc i8* @memcpy(i8*, i8*, i64)
declare ccc i8* @memmove(i8*, i8*, i64)
declare ccc i8* @memset(i8*, i64, i64)
declare ccc i64 @newSpark(i8*, i8*)

!0 = metadata !{metadata !"top"}
!1 = metadata !{metadata !"stack",metadata !0}
!2 = metadata !{metadata !"heap",metadata !0}
!3 = metadata !{metadata !"rx",metadata !2}
!4 = metadata !{metadata !"base",metadata !0}
!5 = metadata !{metadata !"other",metadata !0}

%__stginit_Test_struct = type <{}>
@__stginit_Test = global %__stginit_Test_struct<{}>

%Test_zdwa_closure_struct = type <{i64}>
@Test_zdwa_closure = global %Test_zdwa_closure_struct<{i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @Test_zdwa_info to i64)}>

%Test_sum1_closure_struct = type <{i64}>
@Test_sum1_closure = global %Test_sum1_closure_struct<{i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @Test_sum1_info to i64)}>

%Test_sum_closure_struct = type <{i64}>
@Test_sum_closure = global %Test_sum_closure_struct<{i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @Test_sum_info to i64)}>

%S1DM_srt_struct = type <{}>
@S1DM_srt = internal constant %S1DM_srt_struct<{}>

%s1xB_entry_struct = type <{i64, i64, i64}>
@s1xB_info_itable = internal constant %s1xB_entry_struct<{i64 8589934602, i64 8589934593, i64 9}>, section "X98A__STRIP,__me1", align 8

define internal cc 10 void @s1xB_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me2" {
c1AJ:
  %Base_Var = alloca i64*, i32 1
  store i64* %Base_Arg, i64** %Base_Var
  %Sp_Var = alloca i64*, i32 1
  store i64* %Sp_Arg, i64** %Sp_Var
  %Hp_Var = alloca i64*, i32 1
  store i64* %Hp_Arg, i64** %Hp_Var
  %R1_Var = alloca i64, i32 1
  store i64 %R1_Arg, i64* %R1_Var
  %R2_Var = alloca i64, i32 1
  store i64 %R2_Arg, i64* %R2_Var
  %R3_Var = alloca i64, i32 1
  store i64 %R3_Arg, i64* %R3_Var
  %R4_Var = alloca i64, i32 1
  store i64 undef, i64* %R4_Var
  %R5_Var = alloca i64, i32 1
  store i64 undef, i64* %R5_Var
  %R6_Var = alloca i64, i32 1
  store i64 undef, i64* %R6_Var
  %SpLim_Var = alloca i64, i32 1
  store i64 %SpLim_Arg, i64* %SpLim_Var
  %F1_Var = alloca float, i32 1
  store float undef, float* %F1_Var
  %D1_Var = alloca double, i32 1
  store double undef, double* %D1_Var
  %XMM1_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
  %F2_Var = alloca float, i32 1
  store float undef, float* %F2_Var
  %D2_Var = alloca double, i32 1
  store double undef, double* %D2_Var
  %XMM2_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
  %F3_Var = alloca float, i32 1
  store float undef, float* %F3_Var
  %D3_Var = alloca double, i32 1
  store double undef, double* %D3_Var
  %XMM3_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
  %F4_Var = alloca float, i32 1
  store float undef, float* %F4_Var
  %D4_Var = alloca double, i32 1
  store double undef, double* %D4_Var
  %XMM4_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
  %F5_Var = alloca float, i32 1
  store float undef, float* %F5_Var
  %D5_Var = alloca double, i32 1
  store double undef, double* %D5_Var
  %XMM5_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
  %F6_Var = alloca float, i32 1
  store float undef, float* %F6_Var
  %D6_Var = alloca double, i32 1
  store double undef, double* %D6_Var
  %XMM6_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
  %ls1xr = alloca i64, i32 1
  %ls1xy = alloca i64, i32 1
  %ls1xB = alloca i64, i32 1
  %ln1EB = load i64* %R3_Var
  store i64 %ln1EB, i64* %ls1xr
  %ln1EC = load i64* %R2_Var
  store i64 %ln1EC, i64* %ls1xy
  %ln1ED = load i64* %R1_Var
  store i64 %ln1ED, i64* %ls1xB
  %ln1EE = load i64* %ls1xr
  %ln1EF = load i64* %ls1xB
  %ln1EG = add i64 %ln1EF, 14
  %ln1EH = inttoptr i64 %ln1EG to i64*
  %ln1EI = load i64* %ln1EH, !tbaa !5
  %ln1EJ = icmp sge i64 %ln1EE, %ln1EI
  br i1 %ln1EJ, label %c1AN, label %c1AM
c1AM:
  %ln1EK = load i64* %ls1xr
  %ln1EL = add i64 %ln1EK, 1
  store i64 %ln1EL, i64* %R3_Var
  %ln1EM = load i64* %ls1xy
  %ln1EN = load i64* %ls1xB
  %ln1EO = add i64 %ln1EN, 6
  %ln1EP = inttoptr i64 %ln1EO to i64*
  %ln1EQ = load i64* %ln1EP, !tbaa !5
  %ln1ER = load i64* %ls1xB
  %ln1ES = add i64 %ln1ER, 22
  %ln1ET = inttoptr i64 %ln1ES to i64*
  %ln1EU = load i64* %ln1ET, !tbaa !5
  %ln1EV = load i64* %ls1xr
  %ln1EW = add i64 %ln1EU, %ln1EV
  %ln1EX = shl i64 %ln1EW, 2
  %ln1EY = add i64 %ln1EX, 16
  %ln1EZ = add i64 %ln1EQ, %ln1EY
  %ln1F0 = inttoptr i64 %ln1EZ to i32*
  %ln1F1 = load i32* %ln1F0, !tbaa !5
  %ln1F2 = sext i32 %ln1F1 to i64
  %ln1F3 = add i64 %ln1EM, %ln1F2
  %ln1F4 = trunc i64 %ln1F3 to i32
  %ln1F5 = sext i32 %ln1F4 to i64
  store i64 %ln1F5, i64* %R2_Var
  %ln1F6 = load i64* %ls1xB
  store i64 %ln1F6, i64* %R1_Var
  %ln1F7 = load i64** %Base_Var
  %ln1F8 = load i64** %Sp_Var
  %ln1F9 = load i64** %Hp_Var
  %ln1Fa = load i64* %R1_Var
  %ln1Fb = load i64* %R2_Var
  %ln1Fc = load i64* %R3_Var
  %ln1Fd = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @s1xB_info( i64* %ln1F7, i64* %ln1F8, i64* %ln1F9, i64 %ln1Fa, i64 %ln1Fb, i64 %ln1Fc, i64 undef, i64 undef, i64 undef, i64 %ln1Fd ) nounwind
  ret void
c1AN:
  %ln1Fe = load i64* %ls1xy
  store i64 %ln1Fe, i64* %R1_Var
  %ln1Ff = load i64** %Sp_Var
  %ln1Fg = getelementptr inbounds i64* %ln1Ff, i32 0
  %ln1Fh = bitcast i64* %ln1Fg to i64*
  %ln1Fi = load i64* %ln1Fh, !tbaa !1
  %ln1Fj = inttoptr i64 %ln1Fi to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
  %ln1Fk = load i64** %Base_Var
  %ln1Fl = load i64** %Sp_Var
  %ln1Fm = load i64** %Hp_Var
  %ln1Fn = load i64* %R1_Var
  %ln1Fo = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1Fj( i64* %ln1Fk, i64* %ln1Fl, i64* %ln1Fm, i64 %ln1Fn, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Fo ) nounwind
  ret void
}

%Test_zdwa_entry_struct = type <{i64, i64, i64}>
@Test_zdwa_info_itable = constant %Test_zdwa_entry_struct<{i64 4294967301, i64 0, i64 15}>, section "X98A__STRIP,__me3", align 8

define cc 10 void @Test_zdwa_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me4" {
c1Bf:
  %Base_Var = alloca i64*, i32 1
  store i64* %Base_Arg, i64** %Base_Var
  %Sp_Var = alloca i64*, i32 1
  store i64* %Sp_Arg, i64** %Sp_Var
  %Hp_Var = alloca i64*, i32 1
  store i64* %Hp_Arg, i64** %Hp_Var
  %R1_Var = alloca i64, i32 1
  store i64 %R1_Arg, i64* %R1_Var
  %R2_Var = alloca i64, i32 1
  store i64 %R2_Arg, i64* %R2_Var
  %R3_Var = alloca i64, i32 1
  store i64 undef, i64* %R3_Var
  %R4_Var = alloca i64, i32 1
  store i64 undef, i64* %R4_Var
  %R5_Var = alloca i64, i32 1
  store i64 undef, i64* %R5_Var
  %R6_Var = alloca i64, i32 1
  store i64 undef, i64* %R6_Var
  %SpLim_Var = alloca i64, i32 1
  store i64 %SpLim_Arg, i64* %SpLim_Var
  %F1_Var = alloca float, i32 1
  store float undef, float* %F1_Var
  %D1_Var = alloca double, i32 1
  store double undef, double* %D1_Var
  %XMM1_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
  %F2_Var = alloca float, i32 1
  store float undef, float* %F2_Var
  %D2_Var = alloca double, i32 1
  store double undef, double* %D2_Var
  %XMM2_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
  %F3_Var = alloca float, i32 1
  store float undef, float* %F3_Var
  %D3_Var = alloca double, i32 1
  store double undef, double* %D3_Var
  %XMM3_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
  %F4_Var = alloca float, i32 1
  store float undef, float* %F4_Var
  %D4_Var = alloca double, i32 1
  store double undef, double* %D4_Var
  %XMM4_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
  %F5_Var = alloca float, i32 1
  store float undef, float* %F5_Var
  %D5_Var = alloca double, i32 1
  store double undef, double* %D5_Var
  %XMM5_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
  %F6_Var = alloca float, i32 1
  store float undef, float* %F6_Var
  %D6_Var = alloca double, i32 1
  store double undef, double* %D6_Var
  %XMM6_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
  %ls1xj = alloca i64, i32 1
  %ln1FV = load i64* %R2_Var
  store i64 %ln1FV, i64* %ls1xj
  %ln1FW = load i64** %Sp_Var
  %ln1FX = getelementptr inbounds i64* %ln1FW, i32 -4
  %ln1FY = ptrtoint i64* %ln1FX to i64
  %ln1FZ = load i64* %SpLim_Var
  %ln1G0 = icmp ult i64 %ln1FY, %ln1FZ
  br i1 %ln1G0, label %c1Cf, label %c1Ce
c1Ce:
  %ln1G1 = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c1Bg_info to i64
  %ln1G2 = load i64** %Sp_Var
  %ln1G3 = getelementptr inbounds i64* %ln1G2, i32 -1
  store i64 %ln1G1, i64* %ln1G3, !tbaa !1
  %ln1G4 = load i64* %ls1xj
  store i64 %ln1G4, i64* %R1_Var
  %ln1G5 = load i64** %Sp_Var
  %ln1G6 = getelementptr inbounds i64* %ln1G5, i32 -1
  %ln1G7 = ptrtoint i64* %ln1G6 to i64
  %ln1G8 = inttoptr i64 %ln1G7 to i64*
  store i64* %ln1G8, i64** %Sp_Var
  %ln1G9 = load i64** %Base_Var
  %ln1Ga = load i64** %Sp_Var
  %ln1Gb = load i64** %Hp_Var
  %ln1Gc = load i64* %R1_Var
  %ln1Gd = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @stg_ap_0_fast( i64* %ln1G9, i64* %ln1Ga, i64* %ln1Gb, i64 %ln1Gc, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Gd ) nounwind
  ret void
c1Cf:
  %ln1Ge = load i64* %ls1xj
  store i64 %ln1Ge, i64* %R2_Var
  %ln1Gf = ptrtoint %Test_zdwa_closure_struct* @Test_zdwa_closure to i64
  store i64 %ln1Gf, i64* %R1_Var
  %ln1Gg = load i64** %Base_Var
  %ln1Gh = getelementptr inbounds i64* %ln1Gg, i32 -1
  %ln1Gi = bitcast i64* %ln1Gh to i64*
  %ln1Gj = load i64* %ln1Gi, !tbaa !4
  %ln1Gk = inttoptr i64 %ln1Gj to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
  %ln1Gl = load i64** %Base_Var
  %ln1Gm = load i64** %Sp_Var
  %ln1Gn = load i64** %Hp_Var
  %ln1Go = load i64* %R1_Var
  %ln1Gp = load i64* %R2_Var
  %ln1Gq = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1Gk( i64* %ln1Gl, i64* %ln1Gm, i64* %ln1Gn, i64 %ln1Go, i64 %ln1Gp, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Gq ) nounwind
  ret void
}

declare cc 10 void @stg_ap_0_fast(i64* noalias nocapture, i64* noalias nocapture, i64* noalias nocapture, i64, i64, i64, i64, i64, i64, i64) align 8

%c1Bg_entry_struct = type <{i64, i64}>
@c1Bg_info_itable = internal constant %c1Bg_entry_struct<{i64 0, i64 32}>, section "X98A__STRIP,__me5", align 8

define internal cc 10 void @c1Bg_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me6" {
c1Bg:
  %Base_Var = alloca i64*, i32 1
  store i64* %Base_Arg, i64** %Base_Var
  %Sp_Var = alloca i64*, i32 1
  store i64* %Sp_Arg, i64** %Sp_Var
  %Hp_Var = alloca i64*, i32 1
  store i64* %Hp_Arg, i64** %Hp_Var
  %R1_Var = alloca i64, i32 1
  store i64 %R1_Arg, i64* %R1_Var
  %R2_Var = alloca i64, i32 1
  store i64 undef, i64* %R2_Var
  %R3_Var = alloca i64, i32 1
  store i64 undef, i64* %R3_Var
  %R4_Var = alloca i64, i32 1
  store i64 undef, i64* %R4_Var
  %R5_Var = alloca i64, i32 1
  store i64 undef, i64* %R5_Var
  %R6_Var = alloca i64, i32 1
  store i64 undef, i64* %R6_Var
  %SpLim_Var = alloca i64, i32 1
  store i64 %SpLim_Arg, i64* %SpLim_Var
  %F1_Var = alloca float, i32 1
  store float undef, float* %F1_Var
  %D1_Var = alloca double, i32 1
  store double undef, double* %D1_Var
  %XMM1_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
  %F2_Var = alloca float, i32 1
  store float undef, float* %F2_Var
  %D2_Var = alloca double, i32 1
  store double undef, double* %D2_Var
  %XMM2_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
  %F3_Var = alloca float, i32 1
  store float undef, float* %F3_Var
  %D3_Var = alloca double, i32 1
  store double undef, double* %D3_Var
  %XMM3_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
  %F4_Var = alloca float, i32 1
  store float undef, float* %F4_Var
  %D4_Var = alloca double, i32 1
  store double undef, double* %D4_Var
  %XMM4_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
  %F5_Var = alloca float, i32 1
  store float undef, float* %F5_Var
  %D5_Var = alloca double, i32 1
  store double undef, double* %D5_Var
  %XMM5_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
  %F6_Var = alloca float, i32 1
  store float undef, float* %F6_Var
  %D6_Var = alloca double, i32 1
  store double undef, double* %D6_Var
  %XMM6_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
  %ls1yF = alloca i64, i32 1
  %ls1xu = alloca i64, i32 1
  %ls1xv = alloca i64, i32 1
  %ls1xs = alloca i64, i32 1
  %lc1Bn = alloca i64, i32 1
  %ls1xH = alloca i64, i32 1
  %ls1xX = alloca <4 x i32>, i32 1
  %ls1xL = alloca i64, i32 1
  %ln1I5 = load i64** %Hp_Var
  %ln1I6 = getelementptr inbounds i64* %ln1I5, i32 4
  %ln1I7 = ptrtoint i64* %ln1I6 to i64
  %ln1I8 = inttoptr i64 %ln1I7 to i64*
  store i64* %ln1I8, i64** %Hp_Var
  %ln1I9 = load i64* %R1_Var
  store i64 %ln1I9, i64* %ls1yF
  %ln1Ia = load i64** %Hp_Var
  %ln1Ib = ptrtoint i64* %ln1Ia to i64
  %ln1Ic = load i64** %Base_Var
  %ln1Id = getelementptr inbounds i64* %ln1Ic, i32 35
  %ln1Ie = bitcast i64* %ln1Id to i64*
  %ln1If = load i64* %ln1Ie, !tbaa !4
  %ln1Ig = icmp ugt i64 %ln1Ib, %ln1If
  br i1 %ln1Ig, label %c1Cb, label %c1BR
c1BR:
  %ln1Ih = load i64* %ls1yF
  %ln1Ii = add i64 %ln1Ih, 7
  %ln1Ij = inttoptr i64 %ln1Ii to i64*
  %ln1Ik = load i64* %ln1Ij, !tbaa !5
  store i64 %ln1Ik, i64* %ls1xu
  %ln1Il = load i64* %ls1yF
  %ln1Im = add i64 %ln1Il, 15
  %ln1In = inttoptr i64 %ln1Im to i64*
  %ln1Io = load i64* %ln1In, !tbaa !5
  store i64 %ln1Io, i64* %ls1xv
  %ln1Ip = load i64* %ls1yF
  %ln1Iq = add i64 %ln1Ip, 23
  %ln1Ir = inttoptr i64 %ln1Iq to i64*
  %ln1Is = load i64* %ln1Ir, !tbaa !5
  store i64 %ln1Is, i64* %ls1xs
  %ln1It = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @s1xB_info to i64
  %ln1Iu = load i64** %Hp_Var
  %ln1Iv = getelementptr inbounds i64* %ln1Iu, i32 -3
  store i64 %ln1It, i64* %ln1Iv, !tbaa !2
  %ln1Iw = load i64* %ls1xu
  %ln1Ix = load i64** %Hp_Var
  %ln1Iy = getelementptr inbounds i64* %ln1Ix, i32 -2
  store i64 %ln1Iw, i64* %ln1Iy, !tbaa !2
  %ln1Iz = load i64* %ls1xs
  %ln1IA = load i64** %Hp_Var
  %ln1IB = getelementptr inbounds i64* %ln1IA, i32 -1
  store i64 %ln1Iz, i64* %ln1IB, !tbaa !2
  %ln1IC = load i64* %ls1xv
  %ln1ID = load i64** %Hp_Var
  %ln1IE = getelementptr inbounds i64* %ln1ID, i32 0
  store i64 %ln1IC, i64* %ln1IE, !tbaa !2
  %ln1IF = load i64** %Hp_Var
  %ln1IG = ptrtoint i64* %ln1IF to i64
  %ln1IH = add i64 %ln1IG, -22
  store i64 %ln1IH, i64* %lc1Bn
  %ln1II = load i64* %ls1xs
  %ln1IJ = load i64* %ls1xs
  %ln1IK = srem i64 %ln1IJ, 4
  %ln1IL = sub i64 %ln1II, %ln1IK
  store i64 %ln1IL, i64* %ls1xH
  %ln1IM = insertelement <4 x i32> < i32 0, i32 0, i32 0, i32 0 >, i32 0, i32 0
  %ln1IN = insertelement <4 x i32> %ln1IM, i32 0, i32 1
  %ln1IO = insertelement <4 x i32> %ln1IN, i32 0, i32 2
  %ln1IP = insertelement <4 x i32> %ln1IO, i32 0, i32 3
  %ln1IQ = bitcast <4 x i32> %ln1IP to <4 x i32>
  store <4 x i32> %ln1IQ, <4 x i32>* %ls1xX, align 1
  store i64 0, i64* %ls1xL
  br label %s1xV
s1xV:
  %ln1IR = load i64* %ls1xL
  %ln1IS = load i64* %ls1xH
  %ln1IT = icmp sge i64 %ln1IR, %ln1IS
  br i1 %ln1IT, label %c1C1, label %c1C0
c1C0:
  %ln1IU = load i64* %ls1xu
  %ln1IV = add i64 %ln1IU, 16
  %ln1IW = load i64* %ls1xv
  %ln1IX = load i64* %ls1xL
  %ln1IY = add i64 %ln1IW, %ln1IX
  %ln1IZ = shl i64 %ln1IY, 2
  %ln1J0 = add i64 %ln1IZ, 1536
  %ln1J1 = add i64 %ln1IV, %ln1J0
  %ln1J2 = inttoptr i64 %ln1J1 to i8*
  store i64 undef, i64* %R3_Var
  store i64 undef, i64* %R4_Var
  store i64 undef, i64* %R5_Var
  store i64 undef, i64* %R6_Var
  store float undef, float* %F1_Var
  store double undef, double* %D1_Var
  store float undef, float* %F2_Var
  store double undef, double* %D2_Var
  store float undef, float* %F3_Var
  store double undef, double* %D3_Var
  store float undef, float* %F4_Var
  store double undef, double* %D4_Var
  store float undef, float* %F5_Var
  store double undef, double* %D5_Var
  store float undef, float* %F6_Var
  store double undef, double* %D6_Var
  call ccc void (i8*,i32,i32,i32)* @llvm.prefetch( i8* %ln1J2, i32 0, i32 3, i32 1 )
  %ln1J3 = load <4 x i32>* %ls1xX, align 1
  %ln1J4 = load i64* %ls1xu
  %ln1J5 = add i64 %ln1J4, 16
  %ln1J6 = load i64* %ls1xv
  %ln1J7 = load i64* %ls1xL
  %ln1J8 = add i64 %ln1J6, %ln1J7
  %ln1J9 = shl i64 %ln1J8, 2
  %ln1Ja = add i64 %ln1J5, %ln1J9
  %ln1Jb = inttoptr i64 %ln1Ja to <4 x i32>*
  %ln1Jc = load <4 x i32>* %ln1Jb, align 1, !tbaa !5
  %ln1Jd = add <4 x i32> %ln1J3, %ln1Jc
  %ln1Je = bitcast <4 x i32> %ln1Jd to <4 x i32>
  store <4 x i32> %ln1Je, <4 x i32>* %ls1xX, align 1
  %ln1Jf = load i64* %ls1xL
  %ln1Jg = add i64 %ln1Jf, 4
  store i64 %ln1Jg, i64* %ls1xL
  br label %s1xV
c1C1:
  %ln1Jh = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c1Bm_info to i64
  %ln1Ji = load i64** %Sp_Var
  %ln1Jj = getelementptr inbounds i64* %ln1Ji, i32 -3
  store i64 %ln1Jh, i64* %ln1Jj, !tbaa !1
  %ln1Jk = load i64* %ls1xL
  store i64 %ln1Jk, i64* %R3_Var
  store i64 0, i64* %R2_Var
  %ln1Jl = load i64* %lc1Bn
  store i64 %ln1Jl, i64* %R1_Var
  %ln1Jm = load <4 x i32>* %ls1xX, align 1
  %ln1Jn = load i64** %Sp_Var
  %ln1Jo = getelementptr inbounds i64* %ln1Jn, i32 -2
  %ln1Jp = bitcast i64* %ln1Jo to <4 x i32>*
  store <4 x i32> %ln1Jm, <4 x i32>* %ln1Jp, align 1, !tbaa !1
  %ln1Jq = load i64** %Sp_Var
  %ln1Jr = getelementptr inbounds i64* %ln1Jq, i32 -3
  %ln1Js = ptrtoint i64* %ln1Jr to i64
  %ln1Jt = inttoptr i64 %ln1Js to i64*
  store i64* %ln1Jt, i64** %Sp_Var
  %ln1Ju = load i64** %Base_Var
  %ln1Jv = load i64** %Sp_Var
  %ln1Jw = load i64** %Hp_Var
  %ln1Jx = load i64* %R1_Var
  %ln1Jy = load i64* %R2_Var
  %ln1Jz = load i64* %R3_Var
  %ln1JA = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @s1xB_info( i64* %ln1Ju, i64* %ln1Jv, i64* %ln1Jw, i64 %ln1Jx, i64 %ln1Jy, i64 %ln1Jz, i64 undef, i64 undef, i64 undef, i64 %ln1JA ) nounwind
  ret void
c1Cb:
  %ln1JB = load i64** %Base_Var
  %ln1JC = getelementptr inbounds i64* %ln1JB, i32 41
  store i64 32, i64* %ln1JC, !tbaa !4
  %ln1JD = load i64* %ls1yF
  store i64 %ln1JD, i64* %R1_Var
  %ln1JE = load i64** %Base_Var
  %ln1JF = load i64** %Sp_Var
  %ln1JG = load i64** %Hp_Var
  %ln1JH = load i64* %R1_Var
  %ln1JI = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @stg_gc_unpt_r1( i64* %ln1JE, i64* %ln1JF, i64* %ln1JG, i64 %ln1JH, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1JI ) nounwind
  ret void
}

declare ccc void @llvm.prefetch(i8*, i32, i32, i32)
declare cc 10 void @stg_gc_unpt_r1(i64* noalias nocapture, i64* noalias nocapture, i64* noalias nocapture, i64, i64, i64, i64, i64, i64, i64) align 8

%c1Bm_entry_struct = type <{i64, i64}>
@c1Bm_info_itable = internal constant %c1Bm_entry_struct<{i64 451, i64 32}>, section "X98A__STRIP,__me7", align 8

define internal cc 10 void @c1Bm_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me8" {
c1Bm:
  %Base_Var = alloca i64*, i32 1
  store i64* %Base_Arg, i64** %Base_Var
  %Sp_Var = alloca i64*, i32 1
  store i64* %Sp_Arg, i64** %Sp_Var
  %Hp_Var = alloca i64*, i32 1
  store i64* %Hp_Arg, i64** %Hp_Var
  %R1_Var = alloca i64, i32 1
  store i64 %R1_Arg, i64* %R1_Var
  %R2_Var = alloca i64, i32 1
  store i64 undef, i64* %R2_Var
  %R3_Var = alloca i64, i32 1
  store i64 undef, i64* %R3_Var
  %R4_Var = alloca i64, i32 1
  store i64 undef, i64* %R4_Var
  %R5_Var = alloca i64, i32 1
  store i64 undef, i64* %R5_Var
  %R6_Var = alloca i64, i32 1
  store i64 undef, i64* %R6_Var
  %SpLim_Var = alloca i64, i32 1
  store i64 %SpLim_Arg, i64* %SpLim_Var
  %F1_Var = alloca float, i32 1
  store float undef, float* %F1_Var
  %D1_Var = alloca double, i32 1
  store double undef, double* %D1_Var
  %XMM1_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
  %F2_Var = alloca float, i32 1
  store float undef, float* %F2_Var
  %D2_Var = alloca double, i32 1
  store double undef, double* %D2_Var
  %XMM2_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
  %F3_Var = alloca float, i32 1
  store float undef, float* %F3_Var
  %D3_Var = alloca double, i32 1
  store double undef, double* %D3_Var
  %XMM3_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
  %F4_Var = alloca float, i32 1
  store float undef, float* %F4_Var
  %D4_Var = alloca double, i32 1
  store double undef, double* %D4_Var
  %XMM4_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
  %F5_Var = alloca float, i32 1
  store float undef, float* %F5_Var
  %D5_Var = alloca double, i32 1
  store double undef, double* %D5_Var
  %XMM5_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
  %F6_Var = alloca float, i32 1
  store float undef, float* %F6_Var
  %D6_Var = alloca double, i32 1
  store double undef, double* %D6_Var
  %XMM6_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
  %ls1xX = alloca <4 x i32>, i32 1
  %ln1Kr = load i64** %Sp_Var
  %ln1Ks = getelementptr inbounds i64* %ln1Kr, i32 1
  %ln1Kt = bitcast i64* %ln1Ks to <4 x i32>*
  %ln1Ku = load <4 x i32>* %ln1Kt, align 1, !tbaa !1
  %ln1Kv = bitcast <4 x i32> %ln1Ku to <4 x i32>
  store <4 x i32> %ln1Kv, <4 x i32>* %ls1xX, align 1
  %ln1Kw = load i64* %R1_Var
  %ln1Kx = load <4 x i32>* %ls1xX, align 1
  %ln1Ky = extractelement <4 x i32> %ln1Kx, i32 0
  %ln1Kz = sext i32 %ln1Ky to i64
  %ln1KA = add i64 %ln1Kw, %ln1Kz
  %ln1KB = trunc i64 %ln1KA to i32
  %ln1KC = sext i32 %ln1KB to i64
  %ln1KD = load <4 x i32>* %ls1xX, align 1
  %ln1KE = extractelement <4 x i32> %ln1KD, i32 1
  %ln1KF = sext i32 %ln1KE to i64
  %ln1KG = add i64 %ln1KC, %ln1KF
  %ln1KH = trunc i64 %ln1KG to i32
  %ln1KI = sext i32 %ln1KH to i64
  %ln1KJ = load <4 x i32>* %ls1xX, align 1
  %ln1KK = extractelement <4 x i32> %ln1KJ, i32 2
  %ln1KL = sext i32 %ln1KK to i64
  %ln1KM = add i64 %ln1KI, %ln1KL
  %ln1KN = trunc i64 %ln1KM to i32
  %ln1KO = sext i32 %ln1KN to i64
  %ln1KP = load <4 x i32>* %ls1xX, align 1
  %ln1KQ = extractelement <4 x i32> %ln1KP, i32 3
  %ln1KR = sext i32 %ln1KQ to i64
  %ln1KS = add i64 %ln1KO, %ln1KR
  %ln1KT = trunc i64 %ln1KS to i32
  %ln1KU = sext i32 %ln1KT to i64
  store i64 %ln1KU, i64* %R1_Var
  %ln1KV = load i64** %Sp_Var
  %ln1KW = getelementptr inbounds i64* %ln1KV, i32 4
  %ln1KX = ptrtoint i64* %ln1KW to i64
  %ln1KY = inttoptr i64 %ln1KX to i64*
  store i64* %ln1KY, i64** %Sp_Var
  %ln1KZ = load i64** %Sp_Var
  %ln1L0 = getelementptr inbounds i64* %ln1KZ, i32 0
  %ln1L1 = bitcast i64* %ln1L0 to i64*
  %ln1L2 = load i64* %ln1L1, !tbaa !1
  %ln1L3 = inttoptr i64 %ln1L2 to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
  %ln1L4 = load i64** %Base_Var
  %ln1L5 = load i64** %Sp_Var
  %ln1L6 = load i64** %Hp_Var
  %ln1L7 = load i64* %R1_Var
  %ln1L8 = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1L3( i64* %ln1L4, i64* %ln1L5, i64* %ln1L6, i64 %ln1L7, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1L8 ) nounwind
  ret void
}

%Test_sum1_entry_struct = type <{i64, i64, i64}>
@Test_sum1_info_itable = constant %Test_sum1_entry_struct<{i64 4294967301, i64 0, i64 15}>, section "X98A__STRIP,__me9", align 8

define cc 10 void @Test_sum1_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me10" {
c1Dh:
  %Base_Var = alloca i64*, i32 1
  store i64* %Base_Arg, i64** %Base_Var
  %Sp_Var = alloca i64*, i32 1
  store i64* %Sp_Arg, i64** %Sp_Var
  %Hp_Var = alloca i64*, i32 1
  store i64* %Hp_Arg, i64** %Hp_Var
  %R1_Var = alloca i64, i32 1
  store i64 %R1_Arg, i64* %R1_Var
  %R2_Var = alloca i64, i32 1
  store i64 %R2_Arg, i64* %R2_Var
  %R3_Var = alloca i64, i32 1
  store i64 undef, i64* %R3_Var
  %R4_Var = alloca i64, i32 1
  store i64 undef, i64* %R4_Var
  %R5_Var = alloca i64, i32 1
  store i64 undef, i64* %R5_Var
  %R6_Var = alloca i64, i32 1
  store i64 undef, i64* %R6_Var
  %SpLim_Var = alloca i64, i32 1
  store i64 %SpLim_Arg, i64* %SpLim_Var
  %F1_Var = alloca float, i32 1
  store float undef, float* %F1_Var
  %D1_Var = alloca double, i32 1
  store double undef, double* %D1_Var
  %XMM1_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
  %F2_Var = alloca float, i32 1
  store float undef, float* %F2_Var
  %D2_Var = alloca double, i32 1
  store double undef, double* %D2_Var
  %XMM2_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
  %F3_Var = alloca float, i32 1
  store float undef, float* %F3_Var
  %D3_Var = alloca double, i32 1
  store double undef, double* %D3_Var
  %XMM3_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
  %F4_Var = alloca float, i32 1
  store float undef, float* %F4_Var
  %D4_Var = alloca double, i32 1
  store double undef, double* %D4_Var
  %XMM4_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
  %F5_Var = alloca float, i32 1
  store float undef, float* %F5_Var
  %D5_Var = alloca double, i32 1
  store double undef, double* %D5_Var
  %XMM5_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
  %F6_Var = alloca float, i32 1
  store float undef, float* %F6_Var
  %D6_Var = alloca double, i32 1
  store double undef, double* %D6_Var
  %XMM6_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
  %ls1yn = alloca i64, i32 1
  %ln1LG = load i64* %R2_Var
  store i64 %ln1LG, i64* %ls1yn
  %ln1LH = load i64** %Sp_Var
  %ln1LI = getelementptr inbounds i64* %ln1LH, i32 -1
  %ln1LJ = ptrtoint i64* %ln1LI to i64
  %ln1LK = load i64* %SpLim_Var
  %ln1LL = icmp ult i64 %ln1LJ, %ln1LK
  br i1 %ln1LL, label %c1Dw, label %c1Dv
c1Dv:
  %ln1LM = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c1Di_info to i64
  %ln1LN = load i64** %Sp_Var
  %ln1LO = getelementptr inbounds i64* %ln1LN, i32 -1
  store i64 %ln1LM, i64* %ln1LO, !tbaa !1
  %ln1LP = load i64* %ls1yn
  store i64 %ln1LP, i64* %R2_Var
  %ln1LQ = load i64** %Sp_Var
  %ln1LR = getelementptr inbounds i64* %ln1LQ, i32 -1
  %ln1LS = ptrtoint i64* %ln1LR to i64
  %ln1LT = inttoptr i64 %ln1LS to i64*
  store i64* %ln1LT, i64** %Sp_Var
  %ln1LU = load i64** %Base_Var
  %ln1LV = load i64** %Sp_Var
  %ln1LW = load i64** %Hp_Var
  %ln1LX = load i64* %R1_Var
  %ln1LY = load i64* %R2_Var
  %ln1LZ = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @Test_zdwa_info( i64* %ln1LU, i64* %ln1LV, i64* %ln1LW, i64 %ln1LX, i64 %ln1LY, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1LZ ) nounwind
  ret void
c1Dw:
  %ln1M0 = load i64* %ls1yn
  store i64 %ln1M0, i64* %R2_Var
  %ln1M1 = ptrtoint %Test_sum1_closure_struct* @Test_sum1_closure to i64
  store i64 %ln1M1, i64* %R1_Var
  %ln1M2 = load i64** %Base_Var
  %ln1M3 = getelementptr inbounds i64* %ln1M2, i32 -1
  %ln1M4 = bitcast i64* %ln1M3 to i64*
  %ln1M5 = load i64* %ln1M4, !tbaa !4
  %ln1M6 = inttoptr i64 %ln1M5 to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
  %ln1M7 = load i64** %Base_Var
  %ln1M8 = load i64** %Sp_Var
  %ln1M9 = load i64** %Hp_Var
  %ln1Ma = load i64* %R1_Var
  %ln1Mb = load i64* %R2_Var
  %ln1Mc = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1M6( i64* %ln1M7, i64* %ln1M8, i64* %ln1M9, i64 %ln1Ma, i64 %ln1Mb, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Mc ) nounwind
  ret void
}

%c1Di_entry_struct = type <{i64, i64}>
@c1Di_info_itable = internal constant %c1Di_entry_struct<{i64 0, i64 32}>, section "X98A__STRIP,__me11", align 8

define internal cc 10 void @c1Di_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me12" {
c1Di:
  %Base_Var = alloca i64*, i32 1
  store i64* %Base_Arg, i64** %Base_Var
  %Sp_Var = alloca i64*, i32 1
  store i64* %Sp_Arg, i64** %Sp_Var
  %Hp_Var = alloca i64*, i32 1
  store i64* %Hp_Arg, i64** %Hp_Var
  %R1_Var = alloca i64, i32 1
  store i64 %R1_Arg, i64* %R1_Var
  %R2_Var = alloca i64, i32 1
  store i64 undef, i64* %R2_Var
  %R3_Var = alloca i64, i32 1
  store i64 undef, i64* %R3_Var
  %R4_Var = alloca i64, i32 1
  store i64 undef, i64* %R4_Var
  %R5_Var = alloca i64, i32 1
  store i64 undef, i64* %R5_Var
  %R6_Var = alloca i64, i32 1
  store i64 undef, i64* %R6_Var
  %SpLim_Var = alloca i64, i32 1
  store i64 %SpLim_Arg, i64* %SpLim_Var
  %F1_Var = alloca float, i32 1
  store float undef, float* %F1_Var
  %D1_Var = alloca double, i32 1
  store double undef, double* %D1_Var
  %XMM1_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
  %F2_Var = alloca float, i32 1
  store float undef, float* %F2_Var
  %D2_Var = alloca double, i32 1
  store double undef, double* %D2_Var
  %XMM2_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
  %F3_Var = alloca float, i32 1
  store float undef, float* %F3_Var
  %D3_Var = alloca double, i32 1
  store double undef, double* %D3_Var
  %XMM3_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
  %F4_Var = alloca float, i32 1
  store float undef, float* %F4_Var
  %D4_Var = alloca double, i32 1
  store double undef, double* %D4_Var
  %XMM4_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
  %F5_Var = alloca float, i32 1
  store float undef, float* %F5_Var
  %D5_Var = alloca double, i32 1
  store double undef, double* %D5_Var
  %XMM5_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
  %F6_Var = alloca float, i32 1
  store float undef, float* %F6_Var
  %D6_Var = alloca double, i32 1
  store double undef, double* %D6_Var
  %XMM6_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
  %ls1yp = alloca i64, i32 1
  %ln1MU = load i64** %Hp_Var
  %ln1MV = getelementptr inbounds i64* %ln1MU, i32 2
  %ln1MW = ptrtoint i64* %ln1MV to i64
  %ln1MX = inttoptr i64 %ln1MW to i64*
  store i64* %ln1MX, i64** %Hp_Var
  %ln1MY = load i64* %R1_Var
  store i64 %ln1MY, i64* %ls1yp
  %ln1MZ = load i64** %Hp_Var
  %ln1N0 = ptrtoint i64* %ln1MZ to i64
  %ln1N1 = load i64** %Base_Var
  %ln1N2 = getelementptr inbounds i64* %ln1N1, i32 35
  %ln1N3 = bitcast i64* %ln1N2 to i64*
  %ln1N4 = load i64* %ln1N3, !tbaa !4
  %ln1N5 = icmp ugt i64 %ln1N0, %ln1N4
  br i1 %ln1N5, label %c1Ds, label %c1Dp
c1Dp:
  %ln1N6 = ptrtoint [0 x i64]* @base_GHCziInt_I32zh_con_info to i64
  %ln1N7 = load i64** %Hp_Var
  %ln1N8 = getelementptr inbounds i64* %ln1N7, i32 -1
  store i64 %ln1N6, i64* %ln1N8, !tbaa !2
  %ln1N9 = load i64* %ls1yp
  %ln1Na = load i64** %Hp_Var
  %ln1Nb = getelementptr inbounds i64* %ln1Na, i32 0
  store i64 %ln1N9, i64* %ln1Nb, !tbaa !2
  %ln1Nc = load i64** %Hp_Var
  %ln1Nd = ptrtoint i64* %ln1Nc to i64
  %ln1Ne = add i64 %ln1Nd, -7
  store i64 %ln1Ne, i64* %R1_Var
  %ln1Nf = load i64** %Sp_Var
  %ln1Ng = getelementptr inbounds i64* %ln1Nf, i32 1
  %ln1Nh = ptrtoint i64* %ln1Ng to i64
  %ln1Ni = inttoptr i64 %ln1Nh to i64*
  store i64* %ln1Ni, i64** %Sp_Var
  %ln1Nj = load i64** %Sp_Var
  %ln1Nk = getelementptr inbounds i64* %ln1Nj, i32 0
  %ln1Nl = bitcast i64* %ln1Nk to i64*
  %ln1Nm = load i64* %ln1Nl, !tbaa !1
  %ln1Nn = inttoptr i64 %ln1Nm to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
  %ln1No = load i64** %Base_Var
  %ln1Np = load i64** %Sp_Var
  %ln1Nq = load i64** %Hp_Var
  %ln1Nr = load i64* %R1_Var
  %ln1Ns = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1Nn( i64* %ln1No, i64* %ln1Np, i64* %ln1Nq, i64 %ln1Nr, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Ns ) nounwind
  ret void
c1Ds:
  %ln1Nt = load i64** %Base_Var
  %ln1Nu = getelementptr inbounds i64* %ln1Nt, i32 41
  store i64 16, i64* %ln1Nu, !tbaa !4
  %ln1Nv = load i64* %ls1yp
  store i64 %ln1Nv, i64* %R1_Var
  %ln1Nw = load i64** %Base_Var
  %ln1Nx = load i64** %Sp_Var
  %ln1Ny = load i64** %Hp_Var
  %ln1Nz = load i64* %R1_Var
  %ln1NA = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @stg_gc_unbx_r1( i64* %ln1Nw, i64* %ln1Nx, i64* %ln1Ny, i64 %ln1Nz, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1NA ) nounwind
  ret void
}

@base_GHCziInt_I32zh_con_info = external global [0 x i64]
declare cc 10 void @stg_gc_unbx_r1(i64* noalias nocapture, i64* noalias nocapture, i64* noalias nocapture, i64, i64, i64, i64, i64, i64, i64) align 8

%Test_sum_entry_struct = type <{i64, i64, i64}>
@Test_sum_info_itable = constant %Test_sum_entry_struct<{i64 4294967301, i64 0, i64 15}>, section "X98A__STRIP,__me13", align 8

define cc 10 void @Test_sum_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me14" {
c1DE:
  %Base_Var = alloca i64*, i32 1
  store i64* %Base_Arg, i64** %Base_Var
  %Sp_Var = alloca i64*, i32 1
  store i64* %Sp_Arg, i64** %Sp_Var
  %Hp_Var = alloca i64*, i32 1
  store i64* %Hp_Arg, i64** %Hp_Var
  %R1_Var = alloca i64, i32 1
  store i64 %R1_Arg, i64* %R1_Var
  %R2_Var = alloca i64, i32 1
  store i64 %R2_Arg, i64* %R2_Var
  %R3_Var = alloca i64, i32 1
  store i64 undef, i64* %R3_Var
  %R4_Var = alloca i64, i32 1
  store i64 undef, i64* %R4_Var
  %R5_Var = alloca i64, i32 1
  store i64 undef, i64* %R5_Var
  %R6_Var = alloca i64, i32 1
  store i64 undef, i64* %R6_Var
  %SpLim_Var = alloca i64, i32 1
  store i64 %SpLim_Arg, i64* %SpLim_Var
  %F1_Var = alloca float, i32 1
  store float undef, float* %F1_Var
  %D1_Var = alloca double, i32 1
  store double undef, double* %D1_Var
  %XMM1_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
  %F2_Var = alloca float, i32 1
  store float undef, float* %F2_Var
  %D2_Var = alloca double, i32 1
  store double undef, double* %D2_Var
  %XMM2_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
  %F3_Var = alloca float, i32 1
  store float undef, float* %F3_Var
  %D3_Var = alloca double, i32 1
  store double undef, double* %D3_Var
  %XMM3_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
  %F4_Var = alloca float, i32 1
  store float undef, float* %F4_Var
  %D4_Var = alloca double, i32 1
  store double undef, double* %D4_Var
  %XMM4_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
  %F5_Var = alloca float, i32 1
  store float undef, float* %F5_Var
  %D5_Var = alloca double, i32 1
  store double undef, double* %D5_Var
  %XMM5_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
  %F6_Var = alloca float, i32 1
  store float undef, float* %F6_Var
  %D6_Var = alloca double, i32 1
  store double undef, double* %D6_Var
  %XMM6_Var = alloca <4 x i32>, i32 1
  store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
  %ln1NI = load i64* %R2_Var
  store i64 %ln1NI, i64* %R2_Var
  %ln1NJ = load i64** %Base_Var
  %ln1NK = load i64** %Sp_Var
  %ln1NL = load i64** %Hp_Var
  %ln1NM = load i64* %R1_Var
  %ln1NN = load i64* %R2_Var
  %ln1NO = load i64* %SpLim_Var
  tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @Test_sum1_info( i64* %ln1NJ, i64* %ln1NK, i64* %ln1NL, i64 %ln1NM, i64 %ln1NN, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1NO ) nounwind
  ret void
}

@llvm.used = appending global [4 x i8*] [i8* bitcast (%c1Di_entry_struct* @c1Di_info_itable to i8*), i8* bitcast (%c1Bm_entry_struct* @c1Bm_info_itable to i8*), i8* bitcast (%c1Bg_entry_struct* @c1Bg_info_itable to i8*), i8* bitcast (%s1xB_entry_struct* @s1xB_info_itable to i8*)], section "llvm.metadata"
.file "/tmp/ghc19964_0/ghc19964_0.bc" .data .type Test_zdwa_closure,@object # @Test_zdwa_closure .globl Test_zdwa_closure .align 8 Test_zdwa_closure: .quad Test_zdwa_info .size Test_zdwa_closure, 8 .type Test_sum1_closure,@object # @Test_sum1_closure .globl Test_sum1_closure .align 8 Test_sum1_closure: .quad Test_sum1_info .size Test_sum1_closure, 8 .type Test_sum_closure,@object # @Test_sum_closure .globl Test_sum_closure .align 8 Test_sum_closure: .quad Test_sum_info .size Test_sum_closure, 8 .section ".note.GNU-stack","",@progbits .text .type s1xB_info_itable,@object # @s1xB_info_itable .align 8 s1xB_info_itable: .quad 8589934602 # 0x20000000a .quad 8589934593 # 0x200000001 .quad 9 # 0x9 .size s1xB_info_itable, 24 .text .align 8, 0x90 .type s1xB_info,@function s1xB_info: # @s1xB_info # BB#0: # %c1AJ movq %r14, %rax movq 14(%rbx), %rcx cmpq %rsi, %rcx jle .LBB0_3 # BB#1: # %c1AM.lr.ph movq 22(%rbx), %rdx addq %rsi, %rdx movq 6(%rbx), %rdi leaq 16(%rdi,%rdx,4), %rdx .align 16, 0x90 .LBB0_2: # %c1AM # =>This Inner Loop Header: Depth=1 addl (%rdx), %eax movslq %eax, %rax addq $4, %rdx incq %rsi cmpq %rsi, %rcx jg .LBB0_2 .LBB0_3: # %c1AN movq (%rbp), %rcx movq %rax, %rbx jmpq *%rcx # TAILCALL .Ltmp0: .size s1xB_info, .Ltmp0-s1xB_info .text .type Test_zdwa_info_itable,@object # @Test_zdwa_info_itable .globl Test_zdwa_info_itable .align 8 Test_zdwa_info_itable: .quad 4294967301 # 0x100000005 .quad 0 # 0x0 .quad 15 # 0xf .size Test_zdwa_info_itable, 24 .text .globl Test_zdwa_info .align 8, 0x90 .type Test_zdwa_info,@function Test_zdwa_info: # @Test_zdwa_info # BB#0: # %c1Bf leaq -32(%rbp), %rax cmpq %r15, %rax jae .LBB1_1 # BB#2: # %c1Cf movq -8(%r13), %rax movl $Test_zdwa_closure, %ebx jmpq *%rax # TAILCALL .LBB1_1: # %c1Ce movq $c1Bg_info, -8(%rbp) addq $-8, %rbp movq %r14, %rbx jmp stg_ap_0_fast # TAILCALL .Ltmp1: .size Test_zdwa_info, .Ltmp1-Test_zdwa_info .text .type c1Bg_info_itable,@object # @c1Bg_info_itable .align 8 c1Bg_info_itable: .quad 0 # 0x0 .quad 32 # 0x20 .size c1Bg_info_itable, 16 .text .align 8, 0x90 .type c1Bg_info,@function c1Bg_info: # @c1Bg_info # BB#0: # %c1Bg movq %r12, %rax leaq 32(%rax), %r12 cmpq 280(%r13), %r12 jbe .LBB2_1 # BB#8: # %c1Cb movq $32, 328(%r13) jmp stg_gc_unpt_r1 # TAILCALL .LBB2_1: # %c1BR movq 23(%rbx), %rcx movq 7(%rbx), %rsi movq 15(%rbx), %rdi movq $s1xB_info, 8(%rax) movq %rsi, 16(%rax) movq %rcx, 24(%rax) movq %rcx, %rdx sarq $63, %rdx shrq $62, %rdx addq %rcx, %rdx movq %rdi, (%r12) andq $-4, %rdx pxor %xmm0, %xmm0 xorl %eax, %eax testq %rdx, %rdx movq %rax, %rcx jle .LBB2_4 # BB#2: # %c1C0.lr.ph leaq 1552(%rsi,%rdi,4), %rsi pxor %xmm0, %xmm0 xorl %ecx, %ecx .align 16, 0x90 .LBB2_3: # %c1C0 # =>This Inner Loop Header: Depth=1 prefetcht0 (%rsi) movdqu -1536(%rsi), %xmm1 paddd %xmm1, %xmm0 addq $16, %rsi addq $4, %rcx cmpq %rdx, %rcx jl .LBB2_3 .LBB2_4: # %c1C1 movq $c1Bm_info, -24(%rbp) movdqu %xmm0, -16(%rbp) movq -8(%r12), %rdx cmpq %rcx, %rdx jle .LBB2_7 # BB#5: # %c1AM.lr.ph.i subq %rcx, %rdx addq (%r12), %rcx movq -16(%r12), %rax leaq 16(%rax,%rcx,4), %rcx xorl %eax, %eax .align 16, 0x90 .LBB2_6: # %c1AM.i # =>This Inner Loop Header: Depth=1 addl (%rcx), %eax movslq %eax, %rax addq $4, %rcx decq %rdx jne .LBB2_6 .LBB2_7: # %s1xB_info.exit pextrd $3, %xmm0, %ecx addl %eax, %ecx pextrd $2, %xmm0, %eax addl %ecx, %eax pextrd $1, %xmm0, %ecx addl %eax, %ecx movd %xmm0, %eax addl %ecx, %eax movslq %eax, %rbx movq 8(%rbp), %rax addq $8, %rbp jmpq *%rax # TAILCALL .Ltmp2: .size c1Bg_info, .Ltmp2-c1Bg_info .text .type c1Bm_info_itable,@object # 
@c1Bm_info_itable .align 8 c1Bm_info_itable: .quad 451 # 0x1c3 .quad 32 # 0x20 .size c1Bm_info_itable, 16 .text .align 8, 0x90 .type c1Bm_info,@function c1Bm_info: # @c1Bm_info # BB#0: # %c1Bm movdqu 8(%rbp), %xmm0 pextrd $3, %xmm0, %eax addl %ebx, %eax pextrd $2, %xmm0, %ecx addl %eax, %ecx pextrd $1, %xmm0, %eax addl %ecx, %eax movd %xmm0, %ecx addl %eax, %ecx movslq %ecx, %rbx movq 32(%rbp), %rax addq $32, %rbp jmpq *%rax # TAILCALL .Ltmp3: .size c1Bm_info, .Ltmp3-c1Bm_info .text .type Test_sum1_info_itable,@object # @Test_sum1_info_itable .globl Test_sum1_info_itable .align 8 Test_sum1_info_itable: .quad 4294967301 # 0x100000005 .quad 0 # 0x0 .quad 15 # 0xf .size Test_sum1_info_itable, 24 .text .globl Test_sum1_info .align 8, 0x90 .type Test_sum1_info,@function Test_sum1_info: # @Test_sum1_info # BB#0: # %c1Dh leaq -8(%rbp), %rax cmpq %r15, %rax jae .LBB4_1 # BB#3: # %c1Dw movq -8(%r13), %rax movl $Test_sum1_closure, %ebx jmpq *%rax # TAILCALL .LBB4_1: # %c1Dv movq $c1Di_info, -8(%rbp) leaq -40(%rbp), %rcx cmpq %r15, %rcx jae .LBB4_4 # BB#2: # %c1Cf.i movq -8(%r13), %rcx movq %rax, %rbp movl $Test_zdwa_closure, %ebx jmpq *%rcx # TAILCALL .LBB4_4: # %c1Ce.i movq $c1Bg_info, -16(%rbp) addq $-16, %rbp movq %r14, %rbx jmp stg_ap_0_fast # TAILCALL .Ltmp4: .size Test_sum1_info, .Ltmp4-Test_sum1_info .text .type c1Di_info_itable,@object # @c1Di_info_itable .align 8 c1Di_info_itable: .quad 0 # 0x0 .quad 32 # 0x20 .size c1Di_info_itable, 16 .text .align 8, 0x90 .type c1Di_info,@function c1Di_info: # @c1Di_info # BB#0: # %c1Di movq %r12, %rax leaq 16(%rax), %r12 cmpq 280(%r13), %r12 jbe .LBB5_1 # BB#2: # %c1Ds movq $16, 328(%r13) jmp stg_gc_unbx_r1 # TAILCALL .LBB5_1: # %c1Dp movq $base_GHCziInt_I32zh_con_info, 8(%rax) movq %rbx, 16(%rax) movq 8(%rbp), %rax addq $8, %rbp leaq -7(%r12), %rbx jmpq *%rax # TAILCALL .Ltmp5: .size c1Di_info, .Ltmp5-c1Di_info .text .type Test_sum_info_itable,@object # @Test_sum_info_itable .globl Test_sum_info_itable .align 8 Test_sum_info_itable: .quad 4294967301 # 0x100000005 .quad 0 # 0x0 .quad 15 # 0xf .size Test_sum_info_itable, 24 .text .globl Test_sum_info .align 8, 0x90 .type Test_sum_info,@function Test_sum_info: # @Test_sum_info # BB#0: # %c1DE leaq -8(%rbp), %rax cmpq %r15, %rax jae .LBB6_1 # BB#3: # %c1Dw.i movq -8(%r13), %rax movl $Test_sum1_closure, %ebx jmpq *%rax # TAILCALL .LBB6_1: # %c1Dv.i movq $c1Di_info, -8(%rbp) leaq -40(%rbp), %rcx cmpq %r15, %rcx jae .LBB6_4 # BB#2: # %c1Cf.i.i movq -8(%r13), %rcx movq %rax, %rbp movl $Test_zdwa_closure, %ebx jmpq *%rcx # TAILCALL .LBB6_4: # %c1Ce.i.i movq $c1Bg_info, -16(%rbp) addq $-16, %rbp movq %r14, %rbx jmp stg_ap_0_fast # TAILCALL .Ltmp6: .size Test_sum_info, .Ltmp6-Test_sum_info .type __stginit_Test,@object # @__stginit_Test .bss .globl __stginit_Test .align 8 __stginit_Test: .size __stginit_Test, 0