Hi, cool release! I've updated beignet to 1.0.0 for Fedora 20, 21 and Rawhide (22) today. Let's use it!
https://bugzilla.redhat.com/show_bug.cgi?id=1142892

On Fri, Nov 14, 2014 at 9:09 AM, Zhigang Gong <[email protected]> wrote:
> Beignet 1.0.0 (2014-11-14)
> =========================
>
> The Beignet development team is proud to announce that Beignet 1.0.0
> has been released. This is an important milestone after about two
> years of development. Thanks to everyone who helped us bring it to a
> relatively mature state.
>
> Beignet now supports 3rd to 5th Generation Intel Core Processors.
> Besides Broadwell support, this release also brings major performance
> improvements for many workloads and fixes some bugs. We observed gains
> of 10% to more than 4x for some OpenCV 3.0 benchmarks.
>
> The highlights are as follows:
>
> 1. Added 5th generation Intel Core Processors (BDW) support.
> 2. Optimized constant buffer load.
> 3. Implemented a basic transformation from unstructured control flow to
>    structured control flow to improve performance.
> 4. Fixed some memory leak bugs.
> 5. Implemented missing constant expression handling.
> 6. Added Clang/ICC compiler support for Beignet build.
> 7. Optimized unaligned char/short vector load.
> 8. Sped up kernel compilation by moving built-in function support
>    from the header file into a linked library.
> 9. Implemented some missing LLVM intrinsics.
> 10. Optimized loop unrolling pass, boosted some OpenCV benchmarks.
> 11. Several other bug fixes since the last release. For the OpenCV 3.0,
>     OpenCV 2.4 and piglit test suites, Beignet's pass rates are all
>     above 99%.
>
> Git tag: Release_v1.0.0
> Gitweb URL: http://cgit.freedesktop.org/beignet
> https://01.org/sites/default/files/beignet-1.0.0-source.tar.gz
>
> md5sum: bfd755904c332cdd285d6058f5f3de8c Beignet-1.0.0-Source.tar.gz
> sha1sum: a2b0eb53e5f9a6055cd656531532a4c6ae03fbb0 Beignet-1.0.0-Source.tar.gz
> sha256sum: e30c4d0f4c8917fa0df2467b2d70a4ee524f28d54c42c582262d5f08928ea543 Beignet-1.0.0-Source.tar.gz
>
> -----------------------------------------------------------------
>
> Changes since 0.9.3:
>
> Andreas Beckmann (2):
>       fix some typos
>       use env to set environment variables for GBE_BIN_GENERATER
>
> Chuanbo Weng (1):
>       utest: add new test that trigger an assignment operation bug in if.
>
> Guo Yejun (18):
>       remove requirment as drm master in non-x environment
>       remove requirment as drm master in non-x environment
>       free build_log when the cl program is released
>       free build_log when the cl program is released
>       fix three memory leaks
>       clean llvm resource in compiler (libgbe.so)
>       fix three memory leaks
>       clean llvm resource in compiler (libgbe.so)
>       delete GEPInst when it is no longer used
>       delete GEPInst when it is no longer used
>       remove dependency for non-X runtime environment
>       remove dependency for non-X runtime environment
>       support CL_MEM_USE_HOST_PTR with userptr for cl buffer
>       enable CL_DEVICE_HOST_UNIFIED_MEMORY when userptr is supported
>       add test for cl buffer created with CL_MEM_USE_HOST_PTR
>       fix issue to create cl image from libva with non-zero offset
>       add test for clCreateImageFromLibvaIntel
>       use posix_memalign instead of aligned_alloc to be more compatible
>
> Junyan He (54):
>       Fix the global string bug for printf.
>       Fix a bug for runtime_barrier_list.cpp, event array out of bound
>       Fix a bug for runtime_barrier_list.cpp, event array out of bound
>       Fix the global string bug for printf.
>       Add common define header files to initialize the libocl
>       Add the async module into the libocl
>       Add the atomic module into the libocl
>       Add the geometric module into the libocl
>       Add the image module into the libocl
>       Add the misc module into the libocl
>       Add the sync module into the libocl
>       Add printf module into libocl
>       Add vload module into the libocl
>       Add thw workitem module into the libocl
>       Add the convert and as modules into the libocl
>       Add the gen_vector script into the libocl
>       Add the common module into the libocl as template
>       Add the integer module into libocl as template
>       Add the math function into libocl as template
>       Add the relational module into libocl as template
>       Add the ocl_defines header file into libocl
>       Add memcpy, memset and barrier bitcode files into libocl
>       Add the bit code linker into the module pass.
>       Enable libocl and disable the usage of the old huge header.
>       Use the PCH to accelerate the parsing speed of the ocl.h
>       Delete all the unused files of old huge header.
>       Add the missing function prototypes of any() and atom_add()
>       Add uncompatible PCH Options to avoid compiling failure.
>       Fix the global string bug for printf.
>       Add copyright header for all libocl files.
>       Fix the issue of -cl-std=CLX.X option.
>       Fix the issue of -cl-std=CLX.X option.
>       Add the switch logic for math conformance fast path
>       Modify the CMakeList to use the internal PCH first.
>       Fix the bug of LLVM_LFLAGS fail to set
>       Add long support for printf
>       BDW: Add gen8 surface state struct.
>       BDW: refine the gen8_surface_state_t.
>       BDW: Add function intel_gpgpu_setup_bti for gen8.
>       BDW: Correct surface base address set in setup bti.
>       BDW: Add function intel_gpgpu_bind_buf for gen8.
>       Add sampler state and tile define for gen8.
>       Modify the bind sampler logic for gen8
>       BDW: Add gen8 into intel_driver_init
>       Refine the shared function ID define.
>       Add the libdrm version check.
>       Let the failure of intel_drm lib's check as a FATAL_ERROR
>       Fit the printf bug in loop
>       Fix the bug of 1D array slice pitch
>       Add the test case for image 1d array fill
>       Add the test case for image 2d array fill
>       Add the disasm support for Gen8
>       Fix the compare_image_2d_and_1d_array test case bug
>       Fix the bug of multi-thread crash
>
> Luo (5):
>       remove lspci, gbe_bin_genenrater would generator llvm binary by default.
>       remove lspci, gbe_bin_genenrater would generator llvm binary by default.
>       fix piglit get kernel info FUNCTION ATTRIBUTE fail.
>       fix piglit get kernel info FUNCTION ATTRIBUTE fail.
>       add opencl-1.2 builtin function popcount.
>
> Luo Xionghu (28):
>       fix the relational built-in vector function regression.
>       fix opencv_test_imgproc subcase OCL_ImgProc/Accumulate.Mask regression.
>       fix piglit cl-api-get-program-info fail.
>       fix piglit cl-api-get-program-info fail.
>       fix clGetKernelWorkGroupInfo built-in kernel fail.
>       fix piglit cl-api-set-kernel-arg fail.
>       fix clGetKernelWorkGroupInfo built-in kernel fail.
>       fix piglit cl-api-set-kernel-arg fail.
>       fix bin/cl-program-tester tests/cl/program/execute/attributes.cl regression.
>       fix bin/cl-program-tester tests/cl/program/execute/attributes.cl regression.
>       remove the LinkOnceAnyLinkage since the libocl is introduced.
>       improve the build performance of vector type built-in function.
>       fix one bug at cl_get_kernel_workgroup_info.
>       fix utest memory leak.
>       Add Gen IR WHILE.
>       add handleSelfLoopNode to insert while instruction on Gen IR level.
>       Use instruction WHILE to manipulate structure.
>       add utest popcount for all types.
>       use global flag 0.0 to control unstructured simple block.
>       add llvm Intrinsic call support.
>       add utest compiler_overflow for llvm intrinsic function.
>       enable llvm intrinsic call usub_with_overflow funtion.
>       add utest for llvm intrinsic call usub_with_overflow funtion.
>       enable llvm intrinsic call bswap function.
>       add utest function bswap.
>       fix bswap kernel function type issue.
>       fix piglit clCreateProgramWithBinary fail.
>       fix a bug in clCompileProgram().
>
> LuoXionghu (5):
>       add platform info in the gen binary code.
>       add utest load_program_from_gen_bin.
>       add platform info in the gen binary code.
>       add utest load_program_from_gen_bin.
>       improve the build performance of vector type built-in function.
>
> Lv Meng (6):
>       improve the clEnqueueCopyBufferRect performance in some cases
>       Fix compile error for ICC compiler
>       Fix compile errors for CLANG compiler
>       Fix compile warnings for ICC compiler
>       Fix compile warnings for CLANG compiler
>       Enable ICC and CLANG compiler for beignet
>
> Meng Mengmeng (3):
>       add beignet GIT_HAL1 if there is .git directory
>       create GIT_SHA1 without any dependency
>       add building dependency GIT_SHA1
>
> Rebecca Palmer (7):
>       Fail gracefully on unsupported hardware
>       Fail gracefully on unsupported hardware
>       GBE: fix bug in pow()/pown().
>       GBE: fix bug in erf()/erfc().
>       GBE: fix bug in tgamma().
>       utests: fix bugs in builtin_pow().
>       utests: fix bugs in builtin_tgamma().
>
> Ruiling Song (43):
>       GBE: Fix builtin tanpi.
>       GBE: Fix builtin tanpi.
>       GBE: Use varying register to save one instruction
>       GBE: Optimize constant load with sampler.
>       GBE: align the fields in union ImageInfoKey.
>       utests: Fix a bug in image_1D_buffer.
>       GBE: align the fields in union ImageInfoKey.
>       utests: Fix a bug in image_1D_buffer.
>       runtime: set correct state for constant buffer on hsw.
>       runtime: set correct state for constant buffer on hsw.
>       GBE: Refine bti usage in backend & runtime.
>       GBE: Handle bti allocation for internal buffer used by printf.
>       GBE: remove some useless code for getting printf buffer address.
>       GBE: Fix a warning in getConstantPointerRegister.
>       GBE: Fix type size for vector3
>       GBE: initialize BTI structure to zero.
>       GBE: Fix a bug in gatherBTI.
>       cmake: Fix a license issue.
>       GBE: clear deadprintfs when current function is done.
>       GBE: refine the llvm multi-thread related code.
>       GBE: Fix type size for vector3
>       cmake: Fix a license issue.
>       GBE: clear deadprintfs when current function is done.
>       GBE: refine the llvm multi-thread related code.
>       GBE: Optimize constant load with sampler.
>       GBE: Refine bti usage in backend & runtime.
>       GBE: Handle bti allocation for internal buffer used by printf.
>       GBE: initialize BTI structure to zero.
>       GBE: Fix a bug in gatherBTI.
>       GBE/libocl: Fix sub_sat corner case.
>       GBE: Fix sub_sat corner case.
>       GBE: Output linkModules's error message.
>       GBE/libocl: Add __gen_ocl_get_timestamp() to get timestamp.
>       GBE: Fix a bug when setting flag register
>       GBE: add legalize pass to handle wide integers
>       Re-apply "improve the build performance of vector type built-in function."
>       GBE: workaround register allocation fail caused by custom loop unroll.
>       GBE: Fix live range for temporary register in replaceReg
>       GBE: Fix kernel argument size for vector3
>       utests: add a test to trigger cl_float3 bug in clSetKernelArg.
>       GBE: Fix a bitcast from float vector to wide interger issue in legalize pass.
>       GBE: Do topological sorting of basicblocks.
>       docs: update mixed_buffer_pointer document.
>
> Yang Rong (54):
>       Add some hsw missed pci ids (reserved PCI IDs).
>       Add some hsw missed pci ids (reserved PCI IDs).
>       Fix a utest compiler_async_stride_copy typo.
>       Fix a utest compiler_async_stride_copy typo.
>       Only compiler X11 files and do X11 operations when found X11.
>       Only compiler X11 files and do X11 operations when found X11.
>       Update Beignet.mdwn X11 dependency.
>       Two minor fix.
>       Fix two bugs.
>       Update Beignet.mdwn X11 dependency.
>       Two minor fix.
>       Fix two bugs.
>       Update README for the command parser in drm kernel.
>       Update README for the command parser in drm kernel.
>       Update license disclaimer.
>       Update license disclaimer.
>       Avoid use GenNativeInstruction directly out of GenEncode and gen_insn_compact.
>       BDW: Add BDW pci ids and BDW device struct.
>       BDW: Add BDW instruction define.
>       BDW: Add Gen8Encoder and Gen7Encoder.
>       BDW: Add class Gen8Context.
>       BDW: Pass Jip and Uip when patchJMPI.
>       BDW: Refine intel_gpgpu_setup_bti and add intel_gpgpu_set_base_address for BDW.
>       BDW: add some BDW function.
>       BDW: Fix Pointer argument curbe alloce size.
>       BDW: enable SLM in BDW.
>       BDW: Fix unsample bug.
>       BDW: Refine BDW's int 32*32 multiply.
>       BDW: BDW don't need add slm offset, remove it.
>       BDW: Add BDW Device id to gen binary generater and binary serialize in backend.
>       BDW: Add device's sub slice field, for cl_get_kernel_max_wg_sz.
>       BDW: Correct scratch buffer of BDW.
>       BDW: Forgot to set UIP of else in BDW.
>       BDW: Correct BDW device name.
>       BDW: Fix a scaler int 32*32 bug.
>       BDW: Need not restore SLM setting in BDW.
>       BDW: Correct stack setting in BDW.
>       Fix a segment fault.
>       Fix a HSW regression.
>       Fix memcpy and memset bug.
>       Fix HSW thread_n <= 64 assert.
>       Fix a HSW constant buffer regression.
>       BDW: Change BDW's max work group size to 512.
>       BDW: Fix load/store half error.
>       BDW: Also need set Shader Channel Select for constant buffer in BDW.
>       Fix a upsample regression.
>       Fix a HSW regression.
>       Refine the the error handling in function cl_command_queue_ND_range_gen7.
>       Refine the intel gpgpu delete.
>       Fix a size assert when setup bti.
>       BDW: Fix bwd 32*32 scalar multiplication bug.
>       IVB/HSW/BYT: Revert the Dynamic state Base Addr and relative buffers address setting.
>       BDW: Set the URB/REST size to 384K/384K when SLM disable.
>       BDW: Change the default tiling mode to TILING_Y on BDW.
>
> Yichao Yu (1):
>       Use ${PYTHON_EXECUTABLE} to run python scripts.
>
> Yongjia Zhang (6):
>       Add Gen IR IF, ELSE and ENDIF
>       Add Gen instruction 'else'
>       Add structure identification on ir level
>       Use instruction if else and endif manipulate structures
>       Enable structural analysis
>       GBE: fix empty block disassemble bug.
>
> Zhenyu Wang (5):
>       Make use of write enable flag for mem bo map
>       Clear batch buffer pointer after unmap
>       Use pread/pwrite for buffer enqueue read/write
>       Fix AUX buffer for page alignment
>       Remove intel_gpgpu_check_binded_buf_address()
>
> Zhigang Gong (111):
>       Build: Change versioning policy.
>       runtime/driver: refine error handlings.
>       runtime: fix some subtle event bugs.
>       runtime/driver: refine error handlings.
>       runtime: fix some subtle event bugs.
>       gbe: add the new else instruction to the assert checking.
>       docs: add a NEWS document to point to the release notes pages.
>       docs: add a NEWS document to point to the release notes pages.
>       Bump to 0.9.2.
>       NEWS: update for 0.9.2.
>       GBE: cleanup image base index related code.
>       GBE: refine post register allocation scheduling for global buffers.
>       GBE: refactor the immediate class to support vector data type.
>       GBE: simplify processConstant.
>       GBE: complete constant expression processing.
>       GBE: enable constant expression processing.
>       utest: add new test for constant expression processing.
>       GBE: Reduce random behaviour of the code generation
>       GBE: adjust preferred vector length.
>       GBE: refactor the immediate class to support vector data type.
>       GBE: simplify processConstant.
>       GBE: complete constant expression processing.
>       GBE: enable constant expression processing.
>       utest: add new test for constant expression processing.
>       Revert "GBE: refine post register allocation scheduling for global buffers."
>       utests: fix two utest bugs.
>       GBE: fix error in the rootn fastpath function for some special input.
>       utests: fix two utest bugs.
>       GBE: fix error in the rootn fastpath function for some special input.
>       Add new vload benchmark/test case.
>       GBE: optimize unaligned char and short data vector's load.
>       GBE: relax the batch byte/short load vector size restrication.
>       GBE: refine the unaligned data gathering.
>       GBE: adjust preferred vector length.
>       GBE: fixup/refine a bug for image1D array's extra binding index handling.
>       GBE: remove the user defined macro cl_khr_fp64.
>       GBE: avoid one optimization pass to generate wide integer.
>       GBE: avoid one optimization pass to generate wide integer.
>       GBE: fix a bug with LLVM 3.3.
>       GBE: fallback if we get a wider than i64 constant.
>       GBE: fix a bug with LLVM 3.3.
>       GBE: fallback if we get a wider than i64 constant.
>       GBE: cleanup image base index related code.
>       GBE: fixup/refine a bug for image1D array's extra binding index handling.
>       build: fix a CXXFLAGS override bug in backend directory.
>       GBE: fix some predfeined OCL macros.
>       Runtime: Implement clGetExtensionFunctionAddressForPlatform.
>       Runtime: Implement clGetExtensionFunctionAddressForPlatform.
>       GBE/libocl: fix the wrong prototype of scalar native_powr.
>       GBE: fix bugs when handling -cl-std option.
>       GBE: fix bugs when handling -cl-std option.
>       GBE/libocl: Added one missing prototype fma().
>       GBE: don't return error if we get an empty module.
>       GBE: Fix a potential segfault.
>       GBE: Fix a potential segfault.
>       GBE: fix a potential memory leak bug.
>       GBE: fix a potential memory leak bug.
>       GBE: don't enable double by default.
>       GBE: don't enable double by default.
>       GBE: fix multiple files compilation bugs.
>       runtime: fix program binary type bug.
>       runtime: fix build status handling.
>       runtime: fix program binary type bug.
>       runtime: fix build status handling.
>       GBE: fix multiple files compilation bugs.
>       Update readme.
>       Update readme.
>       Document fixup.
>       Remove out-of-date document.
>       Bump to 0.9.3.
>       Remove out-of-date document.
>       Update NEWS.
>       GBE/libocl: add missing vector builtin definition for fma.
>       GBE/libocl: fix a regression after libocl change.
>       Revert "improve the build performance of vector type built-in function."
>       GBE/libocl: fix build dependency issue.
>       GBE: fix a loop header file including bug.
>       GBE: structurized loop exit need an extra branching instruction when do reordering.
>       GBE: fix a bug in legalize pass.
>       GBE: do intrinsics lowering pass earlier.
>       GBE: fix a legalize pass bug when bitcast wide integer to incompaitble vector.
>       GBE: Add a customized loop unrolling handling mechanism.
>       GBE: disable custom loop unroll for LLVM 3.3/3.4.
>       GBE: add Selection instruction handler at legalize pass.
>       GBE: increase maximum src/dst operands to 32.
>       GBE: add basic PHINode support in legalize pass.
>       GBE: fix regression caused by simple block optimization.
>       GBE: handle dead loop BBs in liveness analysis.
>       GBE: set default address space to -1 to avoid incorrect unroll hint.
>       GBE: fix a wrong type of cl_device_info.
>       utest: change the box_blur_image to be identical to box_blur.
>       utests: replace the nodistriutable picture.
>       GBE: fix disassembly bug.
>       GBE: fix a bool handling bug when SEL on a uniform bool variable.
>       GBE: Support more instructions for constant expression handling.
>       GBE: remove useless debug info.
>       Revert "add test for clCreateImageFromLibvaIntel"
>       Revert "fix issue to create cl image from libva with non-zero offset"
>       utests: remove all shader toy test cases.
>       License: adjust all license version to LGPL v2.1+.
>       GBE: fix relocatable issue for pch file.
>       Revert "BDW: Change the default tiling mode to TILING_Y on BDW."
>       GBE: fix one double related bugs for post register scheduling.
>       update some documents.
>       runtime: fix one bug in BDW image.
>       Update documents.
>       runtime: refine version handling.
>       runtime: fix bug in cl_enqueue_read_buffer.
>       runtime: disable userptr due to random fail.
>       GBE: work around error reporting for unresolved symbols
>       Bump to 1.0.0.
>
>
> --
> Zhigang Gong,
> Thanks.
> _______________________________________________
> Beignet mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/beignet

--
-Igor Gnatenko
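
For anyone building from the source tarball rather than the Fedora packages, the sha256sum quoted in the announcement can be checked before unpacking. What follows is only a minimal sketch in Python: it assumes the tarball is already downloaded, and the helper names `sha256_of` and `matches_announcement` are illustrative, not part of Beignet or any packaging tool.

```python
import hashlib

# SHA-256 digest copied from the 1.0.0 announcement quoted in this thread.
EXPECTED_SHA256 = "e30c4d0f4c8917fa0df2467b2d70a4ee524f28d54c42c582262d5f08928ea543"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large tarball is never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announcement(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_announcement("Beignet-1.0.0-Source.tar.gz")` returns `True` only when the local file hashes to the announced value. The path is an assumption about where the download was saved, since the URL and the checksum lines spell the file name differently.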
