Copilot commented on code in PR #11712:
URL: https://github.com/apache/gluten/pull/11712#discussion_r3406193703
##########
ep/build-velox/src/socket.h:
##########
@@ -0,0 +1,105 @@
+#ifndef __ASM_GENERIC_SOCKET_H
+#define __ASM_GENERIC_SOCKET_H
+
+#include <asm/sockios.h>
+
Review Comment:
This new header is missing the standard Apache Software Foundation license
header that other source files in this repository include. Additionally, the
contents appear to be a verbatim copy of Linux's asm-generic socket
definitions, which can introduce licensing incompatibilities and long-term
maintenance risk. Prefer relying on the OS-provided kernel headers in the build
image (or generating a minimal local shim with clear provenance) instead of
vendoring/copying this file into the repo.
##########
cpp/velox/utils/ConfigExtractor.cc:
##########
@@ -233,8 +233,6 @@ std::shared_ptr<facebook::velox::config::ConfigBase>
createHiveConnectorSessionC
configs[facebook::velox::connector::hive::HiveConfig::kReadTimestampUnitSession]
= std::string("6");
configs[facebook::velox::connector::hive::HiveConfig::kMaxPartitionsPerWritersSession]
=
conf->get<std::string>(kMaxPartitions, "10000");
- configs[facebook::velox::connector::hive::HiveConfig::kMaxTargetFileSize] =
- conf->get<std::string>(kMaxTargetFileSize, "0B"); // 0 means no limit on
target file size
configs[facebook::velox::connector::hive::HiveConfig::kIgnoreMissingFilesSession]
=
conf->get<bool>(kIgnoreMissingFiles, false) ? "true" : "false";
configs[facebook::velox::connector::hive::HiveConfig::kParquetUseColumnNamesSession]
=
Review Comment:
The Spark config `spark.gluten.sql.columnar.backend.velox.maxTargetFileSize`
is still defined (Scala: VeloxConfig.MAX_TARGET_FILE_SIZE_SESSION; C++:
cpp/velox/config/VeloxConfig.h:kMaxTargetFileSize), but it is no longer
propagated into the Velox Hive session config map here. This silently changes
behavior for file-writing (target file sizing) and can confuse users because
the config appears to be supported but has no effect.
##########
ep/build-velox/src/get-velox.sh:
##########
@@ -17,8 +17,8 @@
set -exu
CURRENT_DIR=$(cd "$(dirname "$BASH_SOURCE")"; pwd)
-VELOX_REPO=https://github.com/IBM/velox.git
-VELOX_BRANCH=dft-2026_06_06
+VELOX_REPO=https://github.com/zhouyuan/velox.git
+VELOX_BRANCH=dft-2026_06_06-fbthrift
Review Comment:
Changing the default Velox repository to a personal fork makes builds
non-reproducible and introduces a supply-chain risk for anyone running the
default scripts. It would be safer to keep the default pointing at the
project’s canonical/upstream repository and require forks to be specified via
--velox_repo.
##########
dev/ci-velox-buildstatic-centos-7.sh:
##########
@@ -17,6 +17,6 @@
set -e
-export NUM_THREADS=4
+cp ep/build-velox/src/socket.h /usr/include/asm-generic/socket.h
Review Comment:
This script overwrites a system header in /usr/include, which is brittle
(requires root), can break other builds on the same runner, and makes CI
behavior depend on global state. If the goal is to supply a missing
asm-generic/socket.h for CentOS 7, prefer providing a local include directory
earlier on the include path (e.g., via CPATH) and keep the header out of system
locations.
##########
ep/build-velox/src/get-velox.sh:
##########
@@ -69,6 +69,7 @@ function process_setup_ubuntu {
}
function process_setup_centos9 {
+ sed -i "s|run_and_time install_arrow||g" scripts/setup-centos9.sh
echo "Using setup script from Velox"
}
Review Comment:
process_setup_centos9 unconditionally removes `run_and_time install_arrow`
from Velox’s setup script. This will also affect builds where Gluten is invoked
with --build_arrow=OFF (i.e., relying on Velox’s setup to install Arrow),
potentially leaving Arrow missing and causing later build failures. Consider
gating this edit on the Gluten build options (or an explicit flag) so Arrow
installation is only skipped when Gluten is going to build/install Arrow itself.
##########
cpp/velox/CMakeLists.txt:
##########
@@ -375,6 +375,51 @@ else()
endif()
target_link_libraries(velox PUBLIC external::geos)
+find_package(veloxthrift CONFIG)
+if(veloxthrift_FOUND AND TARGET veloxthrift::veloxthrift)
+ add_library(external::veloxthrift ALIAS veloxthrift::veloxthrift)
+else()
+ message(STATUS "import Velox bundled veloxthrift")
+ import_library(external::veloxthrift
${VELOX_BUILD_PATH}/velox/dwio/parquet/thrift/libvelox_dwio_parquet_thrift_raw.a)
+endif()
+target_link_libraries(velox PUBLIC external::veloxthrift)
+
+# Link thrift libraries - check vcpkg first, then fall back to system libraries
+# Determine vcpkg triplet directory based on architecture
+if(CMAKE_SYSTEM_PROCESSOR MATCHES "(x86)|(X86)|(amd64)|(AMD64)")
+ set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "(aarch64)|(arm64)")
+ set(VCPKG_TRIPLET_DIR "arm64-linux-neon")
+else()
+ message(WARNING "Unknown processor type for vcpkg:
${CMAKE_SYSTEM_PROCESSOR}, defaulting to x64-linux-avx")
+ set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+endif()
+
+if(DEFINED VCPKG_INSTALLED_DIR AND EXISTS
"${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}/lib/libthriftcpp2.a")
+ message(STATUS "Using vcpkg thrift libraries from
${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}")
+ target_link_libraries(
+ velox
+ PRIVATE
+ ${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}/lib/libthriftcpp2.a
+ ${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}/lib/libthriftprotocol.a
+ )
+else()
+ message(STATUS "Using system thrift libraries from /usr/local/lib")
+ find_library(THRIFTCPP2_LIB NAMES thriftcpp2 PATHS /usr/local/lib
NO_DEFAULT_PATH)
+ find_library(THRIFTPROTOCOL_LIB NAMES thriftprotocol PATHS /usr/local/lib
NO_DEFAULT_PATH)
+
Review Comment:
The system-library fallback is restricted to /usr/local/lib with
NO_DEFAULT_PATH, which will fail on common distros where system packages
install into /usr/lib, /usr/lib64, or multiarch dirs. Since this CMakeLists
already defines SYSTEM_LIB_PATH / SYSTEM_LIB64_PATH /
SYSTEM_LIB_MULTIARCH_PATH, it should search those paths (and /usr/local/lib64)
for thrift as well.
##########
dev/vcpkg/ports/mvfst/vcpkg.json:
##########
@@ -0,0 +1,30 @@
+{
+ "name": "mvfst",
+ "version-string": "2026.02.23.00",
+ "description": "mvfst (Pronounced move fast) is a client and server
implementation of IETF QUIC protocol in C++ by Facebook.",
+ "homepage": "https://github.com/facebook/mvfst",
+ "license": "MIT",
+ "supports": "!windows | static",
+ "dependencies": [
Review Comment:
The vcpkg `supports` expression here uses `|` (OR): `!windows | static`.
That will evaluate to true for *any* static triplet (including Windows static),
which likely isn’t intended for mvfst and can cause vcpkg to try building it on
unsupported platforms. If the intent is “non-Windows and static”, use `&` (AND).
##########
dev/vcpkg/init.sh:
##########
@@ -92,7 +92,7 @@ if [ "${VCPKG_DYNAMIC_OPENSSL:-OFF}" = "ON" ]; then
fi
-$VCPKG install --no-print-usage \
+$VCPKG install --no-print-usage --allow-unsupported \
--triplet="${VCPKG_TRIPLET}" --host-triplet="${VCPKG_TRIPLET}"
${EXTRA_FEATURES}
Review Comment:
`--allow-unsupported` disables vcpkg’s platform/feature support checks and
can mask real incompatibilities (e.g., a port declaring it doesn’t support the
current triplet). It’s safer to keep the default strict and only enable this
flag conditionally (or after tightening `supports` expressions in overlay
ports) so CI fails fast on unsupported dependency graphs.
##########
dev/vcpkg/ports/fbthrift/vcpkg.json:
##########
@@ -0,0 +1,35 @@
+{
+ "name": "fbthrift",
+ "version-string": "2026.02.23.00",
+ "description": "Facebook's branch of Apache Thrift, including a new C++
server.",
+ "homepage": "https://github.com/facebook/fbthrift",
+ "license": "Apache-2.0",
+ "supports": "x64 & static",
+ "dependencies": [
Review Comment:
This port is marked as `supports: "x64 & static"`, but
`dev/vcpkg/vcpkg.json` now pulls `fbthrift` unconditionally as part of the
default `velox` feature (and the repo has ARM CI that uses vcpkg). On non-x64
platforms (e.g., arm64-linux-neon), vcpkg will consider this port unsupported;
the added `--allow-unsupported` in dev/vcpkg/init.sh appears to be compensating
for that rather than addressing platform support. Please either (a)
broaden/adjust the `supports` expression if fbthrift is expected to build on
arm64, or (b) make the dependency conditional so ARM CI doesn’t attempt to
build an unsupported port.
##########
dev/build-arrow.sh:
##########
@@ -53,11 +53,11 @@ function build_arrow_cpp() {
-DARROW_PROTOBUF_USE_SHARED=OFF \
-DARROW_DEPENDENCY_USE_SHARED=OFF \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
- -DARROW_WITH_THRIFT=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_ZLIB=${ARROW_WITH_ZLIB} \
-DARROW_WITH_ZSTD=ON \
+ -DBoost_NO_BOOST_CMAKE=TRUE \
-DARROW_JEMALLOC=OFF \
Review Comment:
The Arrow CMake invocation no longer explicitly enables Thrift (removed
`-DARROW_WITH_THRIFT=ON`), but this script still unconditionally installs
Thrift from `_build/thrift_ep-*` later in `build_arrow_cpp()`. This is
internally inconsistent and can break if Arrow decides not to build the
`thrift_ep` external project when Thrift is disabled/auto-detected. Either keep
Thrift explicitly enabled here, or remove/guard the Thrift install step below.
##########
cpp/velox/CMakeLists.txt:
##########
@@ -375,6 +375,51 @@ else()
endif()
target_link_libraries(velox PUBLIC external::geos)
+find_package(veloxthrift CONFIG)
+if(veloxthrift_FOUND AND TARGET veloxthrift::veloxthrift)
+ add_library(external::veloxthrift ALIAS veloxthrift::veloxthrift)
+else()
+ message(STATUS "import Velox bundled veloxthrift")
+ import_library(external::veloxthrift
${VELOX_BUILD_PATH}/velox/dwio/parquet/thrift/libvelox_dwio_parquet_thrift_raw.a)
+endif()
+target_link_libraries(velox PUBLIC external::veloxthrift)
+
+# Link thrift libraries - check vcpkg first, then fall back to system libraries
+# Determine vcpkg triplet directory based on architecture
+if(CMAKE_SYSTEM_PROCESSOR MATCHES "(x86)|(X86)|(amd64)|(AMD64)")
+ set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "(aarch64)|(arm64)")
+ set(VCPKG_TRIPLET_DIR "arm64-linux-neon")
+else()
+ message(WARNING "Unknown processor type for vcpkg:
${CMAKE_SYSTEM_PROCESSOR}, defaulting to x64-linux-avx")
+ set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+endif()
Review Comment:
This hard-codes the vcpkg triplet based on CPU type (x64-linux-avx /
arm64-linux-neon). That’s brittle because the actual triplet is already known
to CMake when using the vcpkg toolchain (VCPKG_TARGET_TRIPLET), and users/CI
may use different triplet names. Prefer using VCPKG_TARGET_TRIPLET when
available and only fall back to guessing when vcpkg isn’t in use.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]