[GitHub] [arrow] kou commented on a diff in pull request #35672: GH-35176: [C++] Add support for disabling threading for emscripten (from sub-branch)

via GitHub Tue, 23 May 2023 17:40:05 -0700


kou commented on code in PR #35672:
URL: https://github.com/apache/arrow/pull/35672#discussion_r1203200360



##########
ci/scripts/cpp_build.sh:
##########
@@ -129,6 +129,7 @@ cmake \
   -DARROW_WITH_UTF8PROC=${ARROW_WITH_UTF8PROC:-ON} \
   -DARROW_WITH_ZLIB=${ARROW_WITH_ZLIB:-OFF} \
   -DARROW_WITH_ZSTD=${ARROW_WITH_ZSTD:-OFF} \
+  -DARROW_ENABLE_THREADING=${ARROW_ENABLE_THREADING:-ON} \

Review Comment:
   Could you keep this list in alphabetical order?



##########
cpp/cmake_modules/ThirdpartyToolchain.cmake:
##########
@@ -5190,4 +5190,4 @@ message(STATUS "All bundled static libraries: 
${ARROW_BUNDLED_STATIC_LIBS}")
 
 configure_file("src/arrow/util/config.h.cmake" "src/arrow/util/config.h" 
ESCAPE_QUOTES)
 install(FILES "${ARROW_BINARY_DIR}/src/arrow/util/config.h"
-        DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/arrow/util")
+        DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/arrow/util")

Review Comment:
   Could you revert this needless change?



##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
 if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
   export PYTHON="${PYTHON:-python3}"
 fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they 
wont make loads of threads
+  ARROW_CTEST_TIMEOUT=${ARROW_CTEST_TIMEOUT:-900}
+  n_jobs=$((n_jobs*4))

Review Comment:
   I don't think that this is a good idea...
   If we use more test processes than the number of cores, each test process 
competes CPU resource with others...



##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
 if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
   export PYTHON="${PYTHON:-python3}"
 fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they 
wont make loads of threads

Review Comment:
   ```suggestion
     # if threading is disabled, some tests take longer to run
     # but we can get away with running more tests in parallel because we know 
they won't make loads of threads
   ```



##########
cpp/src/arrow/acero/asof_join_node.cc:
##########
@@ -1514,14 +1514,19 @@ class AsofJoinNode : public ExecNode {
   }
 
   Status StartProducing() override {
-    ARROW_ASSIGN_OR_RAISE(process_task_, 
plan_->query_context()->BeginExternalTask(
-                                             "AsofJoinNode::ProcessThread"));
-    if (!process_task_.is_valid()) {
-      // Plan has already aborted.  Do not start process thread
+    #ifndef ARROW_ENABLE_THREADING
+      return Status::NotImplemented("ASOF join requires threading enabled");
+    #else

Review Comment:
   Can we use the following approach?
   
   ```cpp
   Status StartProducing() override {
   #ifndef ARROW_ENABLE_THREADING
     return Status::NotImplemented("ASOF join requires threading enabled");
   #endif
     ARROW_ASSIGN_OR_RAISE(process_task_, 
plan_->query_context()->BeginExternalTask(
     // use existing code as-is.
   ```
   
   Putting existing code to `#else ... #endif` reduces readability.



##########
cpp/src/arrow/acero/bloom_filter_test.cc:
##########
@@ -507,7 +507,11 @@ TEST(BloomFilter, Scaling) {
   num_build.push_back(4000000);
 
   std::vector<BloomFilterBuildStrategy> strategy;
+#ifndef ARROW_ENABLE_THREADING
+  strategy.push_back(BloomFilterBuildStrategy::SINGLE_THREADED);
+#else
   strategy.push_back(BloomFilterBuildStrategy::PARALLEL);
+#endif

Review Comment:
   Could you use `#ifdef ... #else ... #endif` not `#ifndef ... #else ... 
#endif` as much as possible? A `#ifndef` (invert) and `#else` (invert) pair 
reduces readability.
   
   ```cpp
   #ifdef ARROW_ENABLE_THREADING
     strategy.push_back(BloomFilterBuildStrategy::PARALLEL);
   #else
     strategy.push_back(BloomFilterBuildStrategy::SINGLE_THREADED);
   #endif
   ```



##########
ci/scripts/cpp_test.sh:
##########
@@ -80,6 +80,12 @@ pushd ${build_dir}
 if [ -z "${PYTHON}" ] && ! which python > /dev/null 2>&1; then
   export PYTHON="${PYTHON:-python3}"
 fi
+if [[ "${ARROW_ENABLE_THREADING}" == "OFF" ]]; then
+# if threading is disabled, some tests take longer to run
+# but we can get away with running more tests in parallel because we know they 
wont make loads of threads
+  ARROW_CTEST_TIMEOUT=${ARROW_CTEST_TIMEOUT:-900}

Review Comment:
   It seems that max test time is "239.99 sec" in our CI:
   
   
https://github.com/ursacomputing/crossbow/actions/runs/5018372445/jobs/8997694994#step:6:3234
   
   > ```text
   > 89/103 Test  #21: arrow-compute-scalar-if-else-test .........   Passed  
239.99 sec
   > ```
   
   Why is this needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] kou commented on a diff in pull request #35672: GH-35176: [C++] Add support for disabling threading for emscripten (from sub-branch)

Reply via email to