AndrewZhaoLuo opened a new pull request #9223: URL: https://github.com/apache/tvm/pull/9223
I still need to benchmark this, so it is WIP until then.

Test code: see https://github.com/AndrewZhaoLuo/TVM-Sandbox/blob/fd08f88c12c9562a0e0f72dd7ff60f398452de35/codegen/test_export_to_ll.py#L8

Setting the environment flag changes the generated LLVM IR.

Without fastmath:

```
; Function Attrs: nofree noinline norecurse nounwind
define internal fastcc void @test_add_compute_(i8* noalias nocapture align 128 %0, i8* noalias nocapture readonly align 128 %1, i8* noalias nocapture readonly align 128 %2) unnamed_addr #1 {
entry:
  %3 = bitcast i8* %1 to <2 x float>*
  %4 = load <2 x float>, <2 x float>* %3, align 128, !tbaa !114
  %5 = bitcast i8* %2 to <2 x float>*
  %6 = load <2 x float>, <2 x float>* %5, align 128, !tbaa !117
  %7 = fadd <2 x float> %4, %6
  %8 = bitcast i8* %0 to <2 x float>*
  store <2 x float> %7, <2 x float>* %8, align 128, !tbaa !120
  ret void
}
```

With fastmath:

```
; Function Attrs: nofree noinline norecurse nounwind
define internal fastcc void @test_add_compute_(i8* noalias nocapture align 128 %0, i8* noalias nocapture readonly align 128 %1, i8* noalias nocapture readonly align 128 %2) unnamed_addr #1 {
entry:
  %3 = bitcast i8* %1 to <2 x float>*
  %4 = load <2 x float>, <2 x float>* %3, align 128, !tbaa !114
  %5 = bitcast i8* %2 to <2 x float>*
  %6 = load <2 x float>, <2 x float>* %5, align 128, !tbaa !117
  %7 = fadd fast <2 x float> %6, %4
  %8 = bitcast i8* %0 to <2 x float>*
  store <2 x float> %7, <2 x float>* %8, align 128, !tbaa !120
  ret void
}
```

Note that the `fadd` instruction now carries the `fast` flag.
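To check for the `fast` flag without comparing the two dumps by eye, a small text scan over the exported IR might help. The following is only a sketch: the helper name `fp_ops_with_flags` is hypothetical and not part of TVM or LLVM, and it assumes one instruction per line in the `%dst = opcode [flags] type ...` shape seen in the dumps.

```python
# Fast-math flags that LLVM IR allows on floating-point instructions.
FAST_MATH_FLAGS = {"fast", "nnan", "ninf", "nsz", "arcp", "contract", "afn", "reassoc"}

# Floating-point arithmetic opcodes this sketch inspects.
FP_OPCODES = {"fadd", "fsub", "fmul", "fdiv", "frem"}


def fp_ops_with_flags(ir_text):
    """Return (opcode, [flags]) for each floating-point op found in the IR text.

    Hypothetical helper: a plain whitespace-token scan, not a real IR parser.
    """
    results = []
    for line in ir_text.splitlines():
        tokens = line.split()
        # Only consider lines of the form "%dst = <fp opcode> ...".
        if len(tokens) < 3 or tokens[1] != "=" or tokens[2] not in FP_OPCODES:
            continue
        flags = []
        # Flags sit between the opcode and the type; stop at the first non-flag.
        for tok in tokens[3:]:
            if tok not in FAST_MATH_FLAGS:
                break
            flags.append(tok)
        results.append((tokens[2], flags))
    return results


if __name__ == "__main__":
    # The two fadd lines are copied from the dumps in this PR description.
    without_fastmath = "  %7 = fadd <2 x float> %4, %6"
    with_fastmath = "  %7 = fadd fast <2 x float> %6, %4"
    print(fp_ops_with_flags(without_fastmath))
    print(fp_ops_with_flags(with_fastmath))
```

A check like this could be turned into an assertion in the test script once the benchmark is in place, for example by requiring that every reported op has a non-empty flag list when fastmath is enabled.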
