alamb opened a new issue, #9561:
URL: https://github.com/apache/arrow-datafusion/issues/9561

   > Add a section to the documentation explaining that PGO can help 
substantially (25%) and maybe offer some tips for users to use it?
   
   Yes, that would be a great option. It requires almost no resources to 
maintain (write it once and link to this discussion for the results). Users 
who want to optimize `arrow-datafusion` further can then use this information 
as an additional optimization opportunity. Here are several examples of how 
such documentation can be written (they are for applications, but 
documentation for a library should look similar):
   
   * ClickHouse: 
https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
   * Databend: https://databend.rs/doc/contributing/pgo
   * Vector: https://vector.dev/docs/administration/tuning/pgo/
   * Nebula: 
https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
   * GCC: Official [docs](https://gcc.gnu.org/install/build.html), section 
"Building with profile feedback" (even AutoFDO build is supported)
   * Clang:
     - https://llvm.org/docs/HowToBuildWithPGO.html
     - https://llvm.org/docs/AdvancedBuilds.html
   * Rustc: 
https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization
   * tsv-utils: 
https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md
   
   > Provide pre-gathered PGO data somehow, so users could build DataFusion 
with profiles guided from TPCH (or clickbench).
   
   Unfortunately, this approach is trickier in practice. Pre-gathered PGO 
profiles have multiple issues, e.g. incompatibilities between different 
compiler versions and profile skew (a PGO profile gathered for an older 
version of the code becomes less and less effective as the code changes, so 
some kind of regular PGO profile regeneration is required). 
   
   I could suggest a similar alternative: integrate into the build scripts a 
way to build the library with PGO enabled (based on some workload like TPCH, 
Clickbench, any other target workload, or any combination of them; that's up 
for discussion). On the one hand, users will be able to build a PGO-optimized 
version of the library. On the other hand, you won't spend maintenance 
resources on keeping pre-gathered PGO profiles up to date (although that 
process can be simplified with CI).
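
   To make the idea concrete, here is a minimal sketch of what such a build 
script could orchestrate. It only assembles and prints the usual four-step 
rustc PGO sequence (instrumented build, training run, profile merge, optimized 
build) and does not execute anything. The function names, the `/tmp/pgo-data` 
directory, and the `./target/release/tpch-bench` workload binary are 
hypothetical placeholders, not part of DataFusion's build; only the 
`-Cprofile-generate` / `-Cprofile-use` rustc flags and the 
`llvm-profdata merge` command are real.

```python
import shlex


def pgo_build_steps(profile_dir, workload_cmd, merged="merged.profdata"):
    """Return the ordered (env, argv) steps of a rustc PGO build.

    profile_dir and workload_cmd are assumptions supplied by the caller:
    the directory for raw profiles and the command that runs the training
    workload against the instrumented binary.
    """
    merged_path = f"{profile_dir}/{merged}"
    return [
        # 1. Build with instrumentation so the binary emits .profraw files.
        ({"RUSTFLAGS": f"-Cprofile-generate={profile_dir}"},
         ["cargo", "build", "--release"]),
        # 2. Run the training workload (hypothetical benchmark command).
        ({}, list(workload_cmd)),
        # 3. Merge the raw profiles into a single .profdata file.
        ({}, ["llvm-profdata", "merge", "-o", merged_path, profile_dir]),
        # 4. Rebuild using the merged profile.
        ({"RUSTFLAGS": f"-Cprofile-use={merged_path}"},
         ["cargo", "build", "--release"]),
    ]


def render(steps):
    """Render each step as a shell-style line, e.g. for a dry run or CI log."""
    lines = []
    for env, argv in steps:
        prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
        lines.append(f"{prefix} {shlex.join(argv)}".strip())
    return lines


if __name__ == "__main__":
    # Hypothetical workload binary; substitute the real benchmark runner.
    steps = pgo_build_steps("/tmp/pgo-data", ["./target/release/tpch-bench"])
    for line in render(steps):
        print(line)
```

   Keeping the steps as data rather than running them directly makes it easy 
to swap the training workload or to reuse the same sequence in CI.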
   
   Some examples of PGO build integration into the build scripts:
   
   * Rustc: a CI 
[tool](https://github.com/rust-lang/rust/tree/master/src/tools/opt-dist) for 
the multi-stage build
   * GCC:
     - Official [docs](https://gcc.gnu.org/install/build.html), section 
"Building with profile feedback" (even AutoFDO build is supported)
     - A [part](https://github.com/gcc-mirror/gcc/blob/master/configure#L7896) 
in a "wonderful" `configure` script.
   * Clang:
     - [Docs](https://llvm.org/docs/HowToBuildWithPGO.html)
     - [MinGW build 
script](https://github.com/msys2/MINGW-packages/commit/4dd91d1d4dfef17f1f451c3a8f59303be855e4b5)
   * Python:
     - CPython: 
[README](https://github.com/python/cpython#profile-guided-optimization)
     - Pyston: [README](https://github.com/pyston/pyston#building)
   * Go: [Bash 
script](https://github.com/golang/go/blob/master/src/cmd/compile/profile.sh)
   * Swift: [CMake 
script](https://github.com/apple/swift/blob/main/CMakeLists.txt#L364)
   * V8: [GN flag](https://github.com/v8/v8/blob/main/BUILD.gn#L184)
   * ChakraCore: 
[Scripts](https://github.com/chakra-core/ChakraCore/tree/master/Build/scripts/pgo)
   * Chromium: 
[Script](https://chromium.googlesource.com/chromium/src/build/config/+/refs/heads/main/compiler/pgo/BUILD.gn)
   * Firefox: 
[Docs](https://firefox-source-docs.mozilla.org/build/buildsystem/pgo.html)
      - Thunderbird has PGO support too
   * PHP: [Makefile 
command](https://github.com/php/php-src/blob/master/build/Makefile.global#L138) 
and old Centminmod 
[scripts](https://github.com/centminmod/php_pgo_training_scripts)
   * MySQL: [CMake 
script](https://github.com/mysql/mysql-server/blob/8.0/cmake/fprofile.cmake)
   * YugabyteDB: [GitHub 
commit](https://github.com/yugabyte/yugabyte-db/commit/34cb791ed9d3d5f8ae9a9b9e9181a46485e1981d)
   * FoundationDB: 
[Script](https://github.com/apple/foundationdb/blob/1a6114a66f3de508c0cf0a45f72f3687ba05750c/contrib/generate_profile.sh)
   * Zstd: 
[Makefile](https://github.com/facebook/zstd/blob/dev/programs/Makefile#L232)
   * [Foot](https://codeberg.org/dnkl/foot): 
[Scripts](https://codeberg.org/dnkl/foot/src/branch/master/pgo)
   * Windows Terminal: [GitHub 
PR](https://github.com/microsoft/terminal/pull/10071)
   * Pydantic-core: [GitHub 
PR](https://github.com/pydantic/pydantic-core/pull/741)
   * file.d: [GitHub PR](https://github.com/ozontech/file.d/pull/469)
   * OceanBase: [CMake 
flag](https://github.com/oceanbase/oceanbase/blob/master/cmake/Env.cmake#L55)
   * ISPC: [CMake scripts](https://github.com/ispc/ispc/tree/main/superbuild)
   * NodeJS: [Configure 
script](https://github.com/nodejs/node/commit/9be15559cc0bfe506d9cdfba4ad0f4beacf5ce17)
   * Android Open Source Project (AOSP):
     - [Official documentation](https://source.android.com/docs/core/perf/pgo)
     - Committed PGO profiles: 
[repository](https://android.googlesource.com/toolchain/pgo-profiles/+/refs/heads/main)
   * DMD: [Custom build 
rule](https://github.com/dlang/dmd/blob/master/compiler/src/build.d#L553)
   * LDC: [GitHub 
action](https://github.com/ldc-developers/ldc/blob/master/.github/actions/2a-build-pgo/action.yml)
   * tsv-utils: 
[Makefile](https://github.com/eBay/tsv-utils/blob/master/makefile#L56)
   * Erlang OTP: 
[Makefile](https://github.com/erlang/otp/blob/master/Makefile.in#L484)
   * Clingo (PGO enabled only in Spack): [Package 
recipe](https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/clingo-bootstrap/package.py#L96)
   * SWI-Prolog:
     - 
[Script](https://github.com/SWI-Prolog/swipl-devel/blob/master/scripts/pgo-compile.sh)
     - [CMake 
module](https://github.com/SWI-Prolog/swipl-devel/blob/master/cmake/PGO.cmake)
   * hck: [Justfile](https://github.com/sstadick/hck/blob/master/justfile#L27)
   
   If you ship prebuilt versions of the library (e.g. a Python wheel), you 
can consider pre-optimizing these prebuilt binaries with PGO (based on TPCH, 
Clickbench, etc.). For example, Pydantic-core does this: [GitHub 
PR](https://github.com/pydantic/pydantic-core/pull/741).
   
   > In general I don't think many organizations will set up PGO with their 
workload (as they often don't have an easy-to-run / representative benchmark 
available during builds or perhaps don't want to slow down their build process 
by running benchmarks at the same time)
   
   It's a pity, but I agree with you. Current PGO adoption across the industry 
is low (except at companies like Google and Facebook, which use PGO and 
similar optimization technologies like LLVM BOLT). This is the situation that I 
am trying to change by showing the positive effects of PGO on different 
applications.
   
   _Originally posted by @zamazan4ik in 
https://github.com/apache/arrow-datafusion/discussions/9507#discussioncomment-8730858_

