Copilot commented on code in PR #11712:
URL: https://github.com/apache/gluten/pull/11712#discussion_r3406193703


##########
ep/build-velox/src/socket.h:
##########
@@ -0,0 +1,105 @@
+#ifndef __ASM_GENERIC_SOCKET_H
+#define __ASM_GENERIC_SOCKET_H
+
+#include <asm/sockios.h>
+

Review Comment:
   This new header is missing the standard Apache Software Foundation license 
header that other source files in this repository include. Additionally, the 
contents appear to be a verbatim copy of Linux's asm-generic socket 
definitions, which can introduce licensing incompatibilities and long-term 
maintenance risk. Prefer relying on the OS-provided kernel headers in the build 
image (or generating a minimal local shim with clear provenance) instead of 
vendoring/copying this file into the repo.



##########
cpp/velox/utils/ConfigExtractor.cc:
##########
@@ -233,8 +233,6 @@ std::shared_ptr<facebook::velox::config::ConfigBase> 
createHiveConnectorSessionC
   
configs[facebook::velox::connector::hive::HiveConfig::kReadTimestampUnitSession]
 = std::string("6");
   
configs[facebook::velox::connector::hive::HiveConfig::kMaxPartitionsPerWritersSession]
 =
       conf->get<std::string>(kMaxPartitions, "10000");
-  configs[facebook::velox::connector::hive::HiveConfig::kMaxTargetFileSize] =
-      conf->get<std::string>(kMaxTargetFileSize, "0B"); // 0 means no limit on 
target file size
   
configs[facebook::velox::connector::hive::HiveConfig::kIgnoreMissingFilesSession]
 =
       conf->get<bool>(kIgnoreMissingFiles, false) ? "true" : "false";
   
configs[facebook::velox::connector::hive::HiveConfig::kParquetUseColumnNamesSession]
 =

Review Comment:
   The Spark config `spark.gluten.sql.columnar.backend.velox.maxTargetFileSize` 
is still defined (Scala: VeloxConfig.MAX_TARGET_FILE_SIZE_SESSION; C++: 
cpp/velox/config/VeloxConfig.h:kMaxTargetFileSize), but it is no longer 
propagated into the Velox Hive session config map here. This silently changes 
behavior for file-writing (target file sizing) and can confuse users because 
the config appears to be supported but has no effect.



##########
ep/build-velox/src/get-velox.sh:
##########
@@ -17,8 +17,8 @@
 set -exu
 
 CURRENT_DIR=$(cd "$(dirname "$BASH_SOURCE")"; pwd)
-VELOX_REPO=https://github.com/IBM/velox.git
-VELOX_BRANCH=dft-2026_06_06
+VELOX_REPO=https://github.com/zhouyuan/velox.git
+VELOX_BRANCH=dft-2026_06_06-fbthrift

Review Comment:
   Changing the default Velox repository to a personal fork makes builds 
non-reproducible and introduces a supply-chain risk for anyone running the 
default scripts. It would be safer to keep the default pointing at the 
project’s canonical/upstream repository and require forks to be specified via 
--velox_repo.



##########
dev/ci-velox-buildstatic-centos-7.sh:
##########
@@ -17,6 +17,6 @@
 
 set -e
 
-export NUM_THREADS=4
+cp ep/build-velox/src/socket.h /usr/include/asm-generic/socket.h 

Review Comment:
   This script overwrites a system header in /usr/include, which is brittle 
(requires root), can break other builds on the same runner, and makes CI 
behavior depend on global state. If the goal is to supply a missing 
asm-generic/socket.h for CentOS 7, prefer providing a local include directory 
earlier on the include path (e.g., via CPATH) and keep the header out of system 
locations.



##########
ep/build-velox/src/get-velox.sh:
##########
@@ -69,6 +69,7 @@ function process_setup_ubuntu {
 }
 
 function process_setup_centos9 {
+  sed -i "s|run_and_time install_arrow||g" scripts/setup-centos9.sh
   echo "Using setup script from Velox"
 }

Review Comment:
   process_setup_centos9 unconditionally removes `run_and_time install_arrow` 
from Velox’s setup script. This will also affect builds where Gluten is invoked 
with --build_arrow=OFF (i.e., relying on Velox’s setup to install Arrow), 
potentially leaving Arrow missing and causing later build failures. Consider 
gating this edit on the Gluten build options (or an explicit flag) so Arrow 
installation is only skipped when Gluten is going to build/install Arrow itself.



##########
cpp/velox/CMakeLists.txt:
##########
@@ -375,6 +375,51 @@ else()
 endif()
 target_link_libraries(velox PUBLIC external::geos)
 
+find_package(veloxthrift CONFIG)
+if(veloxthrift_FOUND AND TARGET veloxthrift::veloxthrift)
+  add_library(external::veloxthrift ALIAS veloxthrift::veloxthrift)
+else()
+  message(STATUS "import Velox bundled veloxthrift")
+  import_library(external::veloxthrift 
${VELOX_BUILD_PATH}/velox/dwio/parquet/thrift/libvelox_dwio_parquet_thrift_raw.a)
+endif()
+target_link_libraries(velox PUBLIC external::veloxthrift)
+
+# Link thrift libraries - check vcpkg first, then fall back to system libraries
+# Determine vcpkg triplet directory based on architecture
+if(CMAKE_SYSTEM_PROCESSOR MATCHES "(x86)|(X86)|(amd64)|(AMD64)")
+  set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "(aarch64)|(arm64)")
+  set(VCPKG_TRIPLET_DIR "arm64-linux-neon")
+else()
+  message(WARNING "Unknown processor type for vcpkg: 
${CMAKE_SYSTEM_PROCESSOR}, defaulting to x64-linux-avx")
+  set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+endif()
+
+if(DEFINED VCPKG_INSTALLED_DIR AND EXISTS 
"${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}/lib/libthriftcpp2.a")
+  message(STATUS "Using vcpkg thrift libraries from 
${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}")
+  target_link_libraries(
+    velox
+    PRIVATE
+      ${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}/lib/libthriftcpp2.a
+      ${VCPKG_INSTALLED_DIR}/${VCPKG_TRIPLET_DIR}/lib/libthriftprotocol.a
+  )
+else()
+  message(STATUS "Using system thrift libraries from /usr/local/lib")
+  find_library(THRIFTCPP2_LIB NAMES thriftcpp2 PATHS /usr/local/lib 
NO_DEFAULT_PATH)
+  find_library(THRIFTPROTOCOL_LIB NAMES thriftprotocol PATHS /usr/local/lib 
NO_DEFAULT_PATH)
+  

Review Comment:
   The system-library fallback is restricted to /usr/local/lib with 
NO_DEFAULT_PATH, which will fail on common distros where system packages 
install into /usr/lib, /usr/lib64, or multiarch dirs. Since this CMakeLists 
already defines SYSTEM_LIB_PATH / SYSTEM_LIB64_PATH / 
SYSTEM_LIB_MULTIARCH_PATH, it should search those paths (and /usr/local/lib64) 
for thrift as well.



##########
dev/vcpkg/ports/mvfst/vcpkg.json:
##########
@@ -0,0 +1,30 @@
+{
+  "name": "mvfst",
+  "version-string": "2026.02.23.00",
+  "description": "mvfst (Pronounced move fast) is a client and server 
implementation of IETF QUIC protocol in C++ by Facebook.",
+  "homepage": "https://github.com/facebook/mvfst";,
+  "license": "MIT",
+  "supports": "!windows | static",
+  "dependencies": [

Review Comment:
   The vcpkg `supports` expression here uses `|` (OR): `!windows | static`. 
That will evaluate to true for *any* static triplet (including Windows static), 
which likely isn’t intended for mvfst and can cause vcpkg to try building it on 
unsupported platforms. If the intent is “non-Windows and static”, use `&` (AND).



##########
dev/vcpkg/init.sh:
##########
@@ -92,7 +92,7 @@ if [ "${VCPKG_DYNAMIC_OPENSSL:-OFF}" = "ON" ]; then
 fi
 
 
-$VCPKG install --no-print-usage \
+$VCPKG install --no-print-usage --allow-unsupported \
     --triplet="${VCPKG_TRIPLET}" --host-triplet="${VCPKG_TRIPLET}" 
${EXTRA_FEATURES}

Review Comment:
   `--allow-unsupported` disables vcpkg’s platform/feature support checks and 
can mask real incompatibilities (e.g., a port declaring it doesn’t support the 
current triplet). It’s safer to keep the default strict and only enable this 
flag conditionally (or after tightening `supports` expressions in overlay 
ports) so CI fails fast on unsupported dependency graphs.



##########
dev/vcpkg/ports/fbthrift/vcpkg.json:
##########
@@ -0,0 +1,35 @@
+{
+  "name": "fbthrift",
+  "version-string": "2026.02.23.00",
+  "description": "Facebook's branch of Apache Thrift, including a new C++ 
server.",
+  "homepage": "https://github.com/facebook/fbthrift";,
+  "license": "Apache-2.0",
+  "supports": "x64 & static",
+  "dependencies": [

Review Comment:
   This port is marked as `supports: "x64 & static"`, but 
`dev/vcpkg/vcpkg.json` now pulls `fbthrift` unconditionally as part of the 
default `velox` feature (and the repo has ARM CI that uses vcpkg). On non-x64 
platforms (e.g., arm64-linux-neon), vcpkg will consider this port unsupported; 
the added `--allow-unsupported` in dev/vcpkg/init.sh appears to be compensating 
for that rather than addressing platform support. Please either (a) 
broaden/adjust the `supports` expression if fbthrift is expected to build on 
arm64, or (b) make the dependency conditional so ARM CI doesn’t attempt to 
build an unsupported port.



##########
dev/build-arrow.sh:
##########
@@ -53,11 +53,11 @@ function build_arrow_cpp() {
        -DARROW_PROTOBUF_USE_SHARED=OFF \
        -DARROW_DEPENDENCY_USE_SHARED=OFF \
        -DARROW_DEPENDENCY_SOURCE=BUNDLED \
-       -DARROW_WITH_THRIFT=ON \
        -DARROW_WITH_LZ4=ON \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_WITH_ZLIB=${ARROW_WITH_ZLIB} \
        -DARROW_WITH_ZSTD=ON \
+       -DBoost_NO_BOOST_CMAKE=TRUE \
        -DARROW_JEMALLOC=OFF \

Review Comment:
   The Arrow CMake invocation no longer explicitly enables Thrift (removed 
`-DARROW_WITH_THRIFT=ON`), but this script still unconditionally installs 
Thrift from `_build/thrift_ep-*` later in `build_arrow_cpp()`. This is 
internally inconsistent and can break if Arrow decides not to build the 
`thrift_ep` external project when Thrift is disabled/auto-detected. Either keep 
Thrift explicitly enabled here, or remove/guard the Thrift install step below.



##########
cpp/velox/CMakeLists.txt:
##########
@@ -375,6 +375,51 @@ else()
 endif()
 target_link_libraries(velox PUBLIC external::geos)
 
+find_package(veloxthrift CONFIG)
+if(veloxthrift_FOUND AND TARGET veloxthrift::veloxthrift)
+  add_library(external::veloxthrift ALIAS veloxthrift::veloxthrift)
+else()
+  message(STATUS "import Velox bundled veloxthrift")
+  import_library(external::veloxthrift 
${VELOX_BUILD_PATH}/velox/dwio/parquet/thrift/libvelox_dwio_parquet_thrift_raw.a)
+endif()
+target_link_libraries(velox PUBLIC external::veloxthrift)
+
+# Link thrift libraries - check vcpkg first, then fall back to system libraries
+# Determine vcpkg triplet directory based on architecture
+if(CMAKE_SYSTEM_PROCESSOR MATCHES "(x86)|(X86)|(amd64)|(AMD64)")
+  set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "(aarch64)|(arm64)")
+  set(VCPKG_TRIPLET_DIR "arm64-linux-neon")
+else()
+  message(WARNING "Unknown processor type for vcpkg: 
${CMAKE_SYSTEM_PROCESSOR}, defaulting to x64-linux-avx")
+  set(VCPKG_TRIPLET_DIR "x64-linux-avx")
+endif()

Review Comment:
   This hard-codes the vcpkg triplet based on CPU type (x64-linux-avx / 
arm64-linux-neon). That’s brittle because the actual triplet is already known 
to CMake when using the vcpkg toolchain (VCPKG_TARGET_TRIPLET), and users/CI 
may use different triplet names. Prefer using VCPKG_TARGET_TRIPLET when 
available and only fall back to guessing when vcpkg isn’t in use.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to