jackylee-ch opened a new pull request, #12100:
URL: https://github.com/apache/gluten/pull/12100

   ## What changes are proposed in this pull request?
   
   When loading `libgluten.dylib` on macOS arm64, the JVM aborts during the
   `System.loadLibrary` call with:
   
   ```
   ERROR: flag 'flagfile' was defined more than once
          (in files '.../gflags.cc' and '.../gflags.cc')
          ... is being linked both statically and dynamically
   ```
   
   The root cause is dyld weak-symbol coalescing across two dylibs that each
   bring their own copy of gflags into the process:
   
   | Dylib             | gflags origin |
   |-------------------|---------------|
   | `libvelox.dylib`  | static `libgflags.a` baked in via Folly (Velox builds Folly with `-DGFLAGS_SHARED=FALSE`) |
   | `libgluten.dylib` | dynamic `libgflags.dylib` pulled transitively through `glog::glog` / `Folly::folly` `INTERFACE_LINK_LIBRARIES` |
   
   On macOS, dyld coalesces the weak C++ function-local-static guard inside
   `FlagRegistry::GlobalRegistry()` between the two dylibs. Both copies then
   register `--flagfile` against the same registry and gflags' duplicate-flag
   check aborts the process before any user code runs.
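   
   The collision can be illustrated in miniature. The sketch below is a toy
   model, not gflags code: `ToyRegistry`, `registerFlag` and
   `loadLibraryCopy` are hypothetical names, and the toy throws where real
   gflags aborts the process. It only shows why two library copies sharing
   one registry collide, while two copies with private registries do not.
   
   ```cpp
   #include <set>
   #include <stdexcept>
   #include <string>
   
   // Hypothetical stand-in for a flag registry (not the gflags API).
   class ToyRegistry {
    public:
     // Records a flag name; a second registration of the same name is an error.
     // Real gflags prints an error and aborts instead of throwing.
     void registerFlag(const std::string& name, const std::string& origin) {
       if (!names_.insert(name).second) {
         throw std::runtime_error("flag '" + name +
                                  "' was defined more than once (second copy from " +
                                  origin + ")");
       }
     }
   
    private:
     std::set<std::string> names_;
   };
   
   // Models what each gflags copy does at load time: it registers its
   // built-in flag in whichever registry it resolves to.
   void loadLibraryCopy(ToyRegistry& registry, const std::string& origin) {
     registry.registerFlag("flagfile", origin);
   }
   ```
   
   Passing the same `ToyRegistry` to two `loadLibraryCopy` calls models the
   coalesced case and throws on the second call; giving each call its own
   registry models the hidden-visibility case and succeeds.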
   
   Linux is unaffected because (a) ELF does not coalesce weak symbols across
   shared objects by default, and (b) Gluten already uses `symbols.map` to
   control the export surface of `libgluten.so`. macOS has no version-script
   equivalent, so this PR uses a different mechanism. All Darwin-specific
   logic is gated on `APPLE` / `CMAKE_SYSTEM_NAME STREQUAL "Darwin"`; Linux
   and Windows build and link semantics are untouched.
   
   The fix has five parts that all need to be in place to fully eliminate the
   abort across the production load path *and* the test executables:
   
   1. **`cpp/CMake/Findglog.cmake`** — On Darwin, prefer the static
      `libglog.a` and force `gflags_component=static`. When both archives
      are available we replace the imported `google::glog` target with an
      `INTERFACE IMPORTED` target whose `INTERFACE_LINK_OPTIONS` carry
      `LINKER:-load_hidden,<libglog.a>` and
      `LINKER:-load_hidden,<libgflags.a>`. `-load_hidden` is the Apple ld64
      flag that gives every symbol pulled from the archive *hidden*
      visibility, which prevents dyld from coalescing them across dylibs.
      We resolve the static gflags archive path by inspecting
      `IMPORTED_LOCATION_RELEASE / _NOCONFIG / *` on
      `gflags::gflags_static`.
   
   2. **`cpp/core/utils/GflagsStubDarwin.cc` (new)** — Exports a no-op
      `google::HandleCommandLineHelpFlags` with default visibility. Velox's
      archive of gflags pulls `gflags.cc.o` but never references
      `gflags_reporting.cc.o`, so once `-load_hidden` makes the real copy
      invisible, the dynamic linker would fail to resolve this symbol at
      dlopen time. The stub resolves it from `libgluten.dylib` instead.
   
   3. **`cpp/core/CMakeLists.txt`** — Conditionally adds the stub to the
      `gluten` target on `APPLE`.
   
   4. **`cpp/velox/CMakeLists.txt`** — On Darwin, links `google::glog` as
      `PUBLIC` on the `velox` target so its `INTERFACE_LINK_OPTIONS`
      propagate through `libvelox.dylib` to test binaries and benchmarks.
      The default PRIVATE linkage on `gluten` is intentional for Linux
      (`symbols.map` handles it), but on Darwin `Folly::folly`'s
      `INTERFACE_LINK_LIBRARIES` pulls `libgflags.a` into `libvelox.dylib`
      and any test executables with default visibility, reviving the same
      dual-registration abort at test startup.
   
   5. **`cpp/velox/compute/VeloxBackend.cc`** — Guards
      `google::InitGoogleLogging` with `IsGoogleLoggingInitialized()` and
      makes `VeloxBackend::create()` idempotent. Multi-suite gtest binaries
      on macOS re-enter `VeloxBackend::init` from each `SetUpTestSuite`;
      without these guards, the second entry trips glog's
      `"You called InitGoogleLogging() twice!"` check and Gluten's
      `Registry` `"Required object already registered"` check.
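   
   The idempotency pattern from part 5 can be sketched as follows. This is
   an illustration under stated assumptions, not the code in
   `VeloxBackend.cc`: `toylog` and `ToyBackend` are hypothetical stand-ins
   so the example compiles without glog or Gluten, with
   `toylog::isInitialized()` playing the role that
   `IsGoogleLoggingInitialized()` plays in the PR.
   
   ```cpp
   #include <memory>
   #include <mutex>
   #include <stdexcept>
   
   // Hypothetical logging library that, like glog, rejects a second init.
   namespace toylog {
   inline bool& initializedFlag() {
     static bool initialized = false;
     return initialized;
   }
   inline bool isInitialized() { return initializedFlag(); }
   inline void init() {
     if (initializedFlag()) {
       throw std::logic_error("init called twice");
     }
     initializedFlag() = true;
   }
   }  // namespace toylog
   
   // Hypothetical backend whose create() is safe to call repeatedly.
   class ToyBackend {
    public:
     static void create() {
       std::lock_guard<std::mutex> lock(mutex());
       if (instance()) {
         return;  // Already created: later calls are no-ops.
       }
       if (!toylog::isInitialized()) {
         toylog::init();  // Initialize logging only if nobody has yet.
       }
       instance() = std::unique_ptr<ToyBackend>(new ToyBackend());
     }
   
     // Unsynchronized read; adequate for this single-threaded sketch.
     static ToyBackend* get() { return instance().get(); }
   
    private:
     ToyBackend() = default;
     static std::unique_ptr<ToyBackend>& instance() {
       static std::unique_ptr<ToyBackend> ptr;
       return ptr;
     }
     static std::mutex& mutex() {
       static std::mutex m;
       return m;
     }
   };
   ```
   
   Calling `ToyBackend::create()` from several test suites in one binary
   then leaves a single instance in place and never reaches the
   double-init error.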
   
   ## How was this patch tested?
   
   Built on macOS 14 arm64 with Apple Clang 17 and the Homebrew toolchain.
   
   **Symbol audit (after the fix):**
   
   ```
   $ nm -g libvelox.dylib | grep "google.*ParseCommandLine"
   (empty)
   
   $ nm libvelox.dylib | awk '/FlagRegistry/ {print $2}' | sort | uniq -c
      3 b
     21 t
   ```
   
   Every `FlagRegistry` symbol has a lowercase `nm` type letter (`t` = local
   text, `b` = local bss), so none is exported across the dylib boundary and
   dyld has nothing to coalesce.
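   
   The same audit could be automated in a test helper. The sketch below is
   a hypothetical helper, not part of this PR: `nmTypeLetter` and
   `countExternal` are invented names, and the parser assumes the usual
   `<address> <type> <name>` layout of `nm` output, where an uppercase type
   letter marks an external symbol.
   
   ```cpp
   #include <cctype>
   #include <sstream>
   #include <string>
   #include <vector>
   
   // Returns the first single-letter field of an nm output line such as
   // "<address> t <mangled name>", or '\0' if the line has none.
   char nmTypeLetter(const std::string& line) {
     std::istringstream fields(line);
     std::string field;
     while (fields >> field) {
       if (field.size() == 1 &&
           std::isalpha(static_cast<unsigned char>(field[0]))) {
         return field[0];
       }
     }
     return '\0';
   }
   
   // Counts lines that mention `needle` and carry an uppercase (external)
   // type letter. A result of 0 means no matching symbol is exported.
   int countExternal(const std::vector<std::string>& nmLines,
                     const std::string& needle) {
     int external = 0;
     for (const std::string& line : nmLines) {
       if (line.find(needle) == std::string::npos) {
         continue;
       }
       const char type = nmTypeLetter(line);
       if (type != '\0' && std::isupper(static_cast<unsigned char>(type))) {
         ++external;
       }
     }
     return external;
   }
   ```
   
   Feeding it the lines of `nm libvelox.dylib` with the needle
   `FlagRegistry` should yield 0 for a build that matches the audit above.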
   
   **Behavioral validation:**
   
   - Before the fix, `dlopen("libgluten.dylib")` aborts before any test
     reaches `main()`.
   - After the fix, `cpp/build/velox/tests/velox_shuffle_writer_test` runs
     5436 / 5436 cases cleanly on macOS 14 arm64.
   - Spark 3.5 + Velox backend Java JUnit canaries (the JNI-only suites
     that exercise native load without query execution) all pass on macOS
     arm64:
       - `org.apache.gluten.utils.VeloxBloomFilterTest`
       - `org.apache.gluten.columnarbatch.ColumnarBatchTest`
       - `org.apache.gluten.backendsapi.VeloxListenerApiTest`
       - `org.apache.gluten.fs.OnHeapFileSystemTest`
       - `org.apache.gluten.vectorized.ArrowColumnVectorTest`
   - Full ctest of `cpp/build` reports 5574 / 5585 pass; the 11 failures
     are unrelated upstream Velox issues exposed by the recent
     `dft-2026_05_13` bump (HYPERLOGLOG cast registration tightening,
     `Type::equivalent()` regression on identically-printed ROW types) —
     not caused by this PR.
   
   **Linux:**
   
   - Linux x86_64 build green; all changes are gated behind `APPLE` /
     Darwin checks, so no behavioral change on Linux is expected. Local
     Ubuntu build verified clean.
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   co-auth: Claude (Sonnet/Opus) via Claude Code 1.x
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

