The bug in llvm has been fixed. Can you confirm that lp_test_format passes again?
Roland

On 06.12.2016 at 19:00, Roland Scheidegger wrote:
> Ok, here is the bug:
> https://llvm.org/bugs/show_bug.cgi?id=31296
>
> Roland
>
> On 06.12.2016 at 18:47, Roland Scheidegger wrote:
>> Actually I've verified this quickly with llc.
>> With -mattr=xop, it produces
>>
>> fetch_r32_float_float:                  # @fetch_r32_float_float
>>         .cfi_startproc
>> # BB#0:                                 # %entry
>>         vpermilps $65, .LCPI0_0(%rip), %xmm0 # xmm0 = mem[1,0,0,1]
>>         vmovaps %xmm0, (%rdi)
>>         retq
>>
>> which is very obviously garbage (it even managed to optimize out the
>> actual load, just the constants are left...). So this is an llvm bug with
>> xop indeed. I'm going to file a bug, but in the interim I don't know
>> what mesa should do - this is one reason why we didn't want to enable
>> features which we didn't actually test previously (that said, if we
>> don't enable them, the llvm bugs we hit will probably never get
>> fixed...). We could of course force-disable xop (albeit in theory it's
>> nice - we really can make use of that damn missing vector shift which
>> otherwise requires avx2).
>>
>> Roland
>>
>>
>> On 06.12.2016 at 17:34, Roland Scheidegger wrote:
>>> Interesting. Can you show the IR / assembly? I don't get any failures here.
>>> I'm wondering if it's trying to use XOP and there's some bug there (or
>>> we're relying on undefined behavior which doesn't happen to work with
>>> it). Albeit since there's not actually any conversion involved in this
>>> case (float 1 channel -> float 4 channel) the assembly here looks
>>> trivial and I can't see how it could go wrong.
>>>
>>> I get (with a couple days old llvm):
>>> define void @fetch_r32_float_float(<4 x float>*, i8*, i32, i32, { [2048 x i32], [128 x i64] }*) {
>>> entry:
>>>   %5 = getelementptr i8, i8* %1, i32 0
>>>   %6 = bitcast i8* %5 to i32*
>>>   %7 = load i32, i32* %6
>>>   %8 = zext i32 %7 to i128
>>>   %9 = bitcast i128 %8 to <4 x float>
>>>   %10 = shufflevector <4 x float> %9, <4 x float> <float 0.000000e+00, float 1.000000e+00, float undef, float undef>, <4 x i32> <i32 0, i32 4, i32 4, i32 5>
>>>   store <4 x float> %10, <4 x float>* %0
>>>   ret void
>>> }
>>>
>>> fetch_r32_float_float:
>>>    0:   pushq %rbp
>>>    1:   movq %rsp, %rbp
>>>    4:   movl (%rsi), %eax
>>>    6:   vmovq %rax, %xmm0
>>>   11:   movabsq $140375561531392, %rax
>>>   21:   vmovaps (%rax), %xmm1
>>>   25:   vshufps $0, %xmm1, %xmm0, %xmm0
>>>   30:   vshufps $72, %xmm1, %xmm0, %xmm0
>>>   35:   vmovaps %xmm0, (%rdi)
>>>   39:   popq %rbp
>>>   40:   retq
>>>
>>> The only thing I can think of is maybe the load/zext in combination with
>>> the shuffle going wrong - the shuffle combiner in llvm has a couple xop
>>> cases.
>>>
>>> fwiw printing of the values is a bit suboptimal, the "packed" 00 00 80
>>> bf value really is a float 0xbf800000 and you don't see the other
>>> channels at all albeit in this case there aren't any...
>>>
>>> Roland
>>>
>>> On 06.12.2016 at 07:27, Michel Dänzer wrote:
>>>> On 06/12/16 02:39 AM, Tim Rowley wrote:
>>>>> Use llvm provided API based on cpuid rather than our own
>>>>> manually maintained list of mattr enabling/disabling.
>>>>
>>>> This change broke the llvmpipe unit test lp_test_format for me:
>>>>
>>>> Testing PIPE_FORMAT_R32_FLOAT (float) ...
>>>> FAILED
>>>> Packed: 00 00 00 00
>>>> Unpacked (0,0): 1 0 0 1 obtained
>>>>                 0 0 0 1 expected
>>>> FAILED
>>>> Packed: 00 00 80 bf
>>>> Unpacked (0,0): 1 0 0 1 obtained
>>>>                 -1 0 0 1 expected
>>>>
>>>>
>>>> This is on:
>>>>
>>>> processor       : 0
>>>> vendor_id       : AuthenticAMD
>>>> cpu family      : 21
>>>> model           : 48
>>>> model name      : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
>>>> stepping        : 1
>>>> microcode       : 0x6003106
>>>> cpu MHz         : 4100.000
>>>> cache size      : 2048 KB
>>>> physical id     : 0
>>>> siblings        : 4
>>>> core id         : 0
>>>> cpu cores       : 2
>>>> apicid          : 16
>>>> initial apicid  : 0
>>>> fpu             : yes
>>>> fpu_exception   : yes
>>>> cpuid level     : 13
>>>> wp              : yes
>>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>>> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>>>> pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid
>>>> aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2
>>>> popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm
>>>> sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce
>>>> nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate
>>>> vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale
>>>> vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
>>>> bugs            : fxsave_leak sysret_ss_attrs null_seg
>>>> bogomips        : 8200.42
>>>> TLB size        : 1536 4K pages
>>>> clflush size    : 64
>>>> cache_alignment : 64
>>>> address sizes   : 48 bits physical, 48 bits virtual
>>>> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
>>>>
>>>>
>>>>
>>>
>>
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
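
For reference, the numbers in the thread can be checked by hand. The following is only a rough Python sketch of the lane semantics, not code from mesa or llvm: the function names and the UNDEF placeholder are made up for illustration, and the "miscompiled" model is just a reading of the "xmm0 = mem[1,0,0,1]" comment in the llc output, assuming the constant pool entry holds the <0.0, 1.0, undef, undef> vector from the IR.

```python
import struct

UNDEF = None  # hypothetical stand-in for an LLVM "undef" lane

def shufflevector(a, b, mask):
    """Pick lanes from the concatenation of a and b, like LLVM's shufflevector."""
    lanes = list(a) + list(b)
    return [lanes[i] for i in mask]

def fetch_r32_float(packed):
    """Model of the quoted IR: load 32 bits, zext to 128 bits, shuffle in 0 and 1."""
    (x,) = struct.unpack("<f", packed)       # e.g. 00 00 80 bf is 0xbf800000
    loaded = [x, 0.0, 0.0, 0.0]              # zext i32 -> i128, bitcast to <4 x float>
    consts = [0.0, 1.0, UNDEF, UNDEF]
    return shufflevector(loaded, consts, [0, 4, 4, 5])

def miscompiled_fetch(packed):
    """Model of the XOP output: vpermilps $65 applied to the constants only."""
    consts = [0.0, 1.0, UNDEF, UNDEF]        # assumed contents of .LCPI0_0
    imm = 65                                 # 0b01000001, two bits per result lane
    return [consts[(imm >> (2 * i)) & 3] for i in range(4)]

print(fetch_r32_float(bytes.fromhex("000080bf")))    # what the test expects
print(miscompiled_fetch(bytes.fromhex("000080bf")))  # the loaded value is never used
```

Under these assumptions the first call gives -1 0 0 1 and the second gives 1 0 0 1 regardless of the input, which lines up with the "expected" and "obtained" rows reported by lp_test_format.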