kou commented on code in PR #35672:
URL: https://github.com/apache/arrow/pull/35672#discussion_r1203200360
##########
ci/scripts/cpp_build.sh:
##########
@@ -129,6 +129,7 @@ cmake \
-DARROW_WITH_UTF8PROC=${ARROW_WITH_UTF8PROC:-ON} \
-DARROW_WITH_ZLIB=${ARROW_WITH_ZLIB:-OFF} \
-DARROW_WITH_ZSTD=${ARROW_WITH_ZSTD:-OFF} \
+ -DARROW_ENABLE_THREADING=${ARROW_ENABLE_THREADING:-ON} \
Review Comment:
Could you keep this list in alphabetical order?
##########
cpp/cmake_modules/ThirdpartyToolchain.cmake:
##########
@@ -5190,4 +5190,4 @@ message(STATUS "All bundled static libraries:
${ARROW_BUNDLED_STATIC_LIBS}")
configure_file("src/arrow/util/config.h.cmake" "src/arrow/util/config.h"
ESCAPE_QUOTES)
install(FILES "${ARROW_BINARY_DIR}/src/arrow/util/config.h"
- DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/arrow/util")
+ DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/arrow/util")
Review Comment:
Could you revert this needless change?
##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
export PYTHON="${PYTHON:-python3}"
fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they
wont make loads of threads
+ ARROW_CTEST_TIMEOUT=${ARROW_CTEST_TIMEOUT:-900}
+ n_jobs=$((n_jobs*4))
Review Comment:
I don't think that this is a good idea...
If we use more test processes than the number of cores, each test process
competes CPU resource with others...
##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
export PYTHON="${PYTHON:-python3}"
fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they
wont make loads of threads
Review Comment:
```suggestion
# if threading is disabled, some tests take longer to run
# but we can get away with running more tests in parallel because we know
they won't make loads of threads
```
##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -1514,14 +1514,19 @@ class AsofJoinNode : public ExecNode {
}
Status StartProducing() override {
- ARROW_ASSIGN_OR_RAISE(process_task_,
plan_->query_context()->BeginExternalTask(
- "AsofJoinNode::ProcessThread"));
- if (!process_task_.is_valid()) {
- // Plan has already aborted. Do not start process thread
+ #ifndef ARROW_ENABLE_THREADING
+ return Status::NotImplemented("ASOF join requires threading enabled");
+ #else
Review Comment:
Can we use the following approach?
```cpp
Status StartProducing() override {
#ifndef ARROW_ENABLE_THREADING
return Status::NotImplemented("ASOF join requires threading enabled");
#endif
ARROW_ASSIGN_OR_RAISE(process_task_,
plan_->query_context()->BeginExternalTask(
// use existing code as-is.
```
Putting existing code to `#else ... #endif` reduces readability.
##########
cpp/src/arrow/acero/bloom_filter_test.cc:
##########
@@ -507,7 +507,11 @@ TEST(BloomFilter, Scaling) {
num_build.push_back(4000000);
std::vector<BloomFilterBuildStrategy> strategy;
+#ifndef ARROW_ENABLE_THREADING
+ strategy.push_back(BloomFilterBuildStrategy::SINGLE_THREADED);
+#else
strategy.push_back(BloomFilterBuildStrategy::PARALLEL);
+#endif
Review Comment:
Could you use `#ifdef ... #else ... #endif` not `#ifndef ... #else ...
#endif` as much as possible? A `#ifndef` (invert) and `#else` (invert) pair
reduces readability.
```cpp
#ifdef ARROW_ENABLE_THREADING
strategy.push_back(BloomFilterBuildStrategy::PARALLEL);
#else
strategy.push_back(BloomFilterBuildStrategy::SINGLE_THREADED);
#endif
```
##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
export PYTHON="${PYTHON:-python3}"
fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they
wont make loads of threads
+ ARROW_CTEST_TIMEOUT=${ARROW_CTEST_TIMEOUT:-900}
Review Comment:
It seems that max test time is "239.99 sec" in our CI:
https://github.com/ursacomputing/crossbow/actions/runs/5018372445/jobs/8997694994#step:6:3234
> ```text
> 89/103 Test #21: arrow-compute-scalar-if-else-test ......... Passed
239.99 sec
> ```
Why is this needed?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]