Copilot commented on code in PR #49996:
URL: https://github.com/apache/arrow/pull/49996#discussion_r3510124820


##########
r/tests/testthat/test-dplyr-arrange.R:
##########
@@ -238,6 +238,7 @@ test_that("Can use across() within arrange()", {
       collect(),
     example_data
   )
+  skip_on_emscripten()
   compare_dplyr_binding(

Review Comment:
   skip_on_emscripten() is placed after the first compare_dplyr_binding() call, 
so the test still executes (and can fail) on Emscripten before it ever reaches 
the skip. Move the skip to the start of the test block so it reliably skips the 
whole test.



##########
ci/scripts/r_wasm_test.cjs:
##########
@@ -0,0 +1,165 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Smoke-test the arrow R package under webR, then run the testthat suite.
+// Called by r_wasm_test.sh. Requires env vars:
+//   ARROW_WASM_REPO_DIR - local CRAN-like repo with the arrow .tgz
+//   ARROW_R_TESTS_DIR   - path to tests/testthat in the source tree
+
+const { WebR } = require("webr");
+const http = require("http");
+const fs = require("fs");
+const path = require("path");
+
+const repoDir = process.env.ARROW_WASM_REPO_DIR;
+if (!repoDir) {
+  console.error("ERROR: ARROW_WASM_REPO_DIR not set");
+  process.exit(1);
+}
+
+const testsDir = process.env.ARROW_R_TESTS_DIR;
+if (!testsDir) {
+  console.error("ERROR: ARROW_R_TESTS_DIR not set");
+  process.exit(1);
+}
+
+function listFilesRecursive(dir) {
+  const results = [];
+  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
+    const full = path.join(dir, entry.name);
+    if (entry.isDirectory()) {
+      results.push(...listFilesRecursive(full));
+    } else {
+      results.push(full);
+    }
+  }
+  return results;
+}
+
+async function main() {
+  // Serve the repo over HTTP (webR can't access the host filesystem directly)
+  const server = http.createServer((req, res) => {
+    const filePath = path.join(repoDir, decodeURIComponent(req.url));
+    if (!filePath.startsWith(path.resolve(repoDir))) {
+      res.writeHead(403);
+      res.end();
+      return;
+    }

Review Comment:
   The HTTP file server joins repoDir with req.url directly. Because req.url 
starts with a leading '/', path.join(repoDir, req.url) becomes an absolute path 
and drops repoDir, so the subsequent startsWith(repoDir) check 403s all 
legitimate requests (breaking installPackages). Strip the leading '/' and 
validate using resolved paths to both serve files correctly and prevent path 
traversal.



##########
r/src/safe-call-into-r-impl.cpp:
##########
@@ -45,7 +45,16 @@ bool SetEnableSignalStopSource(bool enabled) {
 }
 
 // [[arrow::export]]
-bool CanRunWithCapturedR() { return MainRThread::GetInstance().Executor() == 
nullptr; }
+bool CanRunWithCapturedR() {
+#ifdef __EMSCRIPTEN__
+  // Threading is not supported under Emscripten/WASM. Always take the

Review Comment:
   The comment claims threading is not supported under Emscripten/WASM, but 
Emscripten can support pthreads depending on how it is built/served. Consider 
wording this as threading being disabled/unavailable in this build to avoid 
documenting an overly broad limitation.



##########
r/R/arrow-package.R:
##########
@@ -183,19 +189,35 @@ configure_tzdb <- function() {
     tryCatch(
       {
         tzdb::tzdb_initialize()
-        set_timezone_database(tzdb::tzdb_path("text"))
+        tz_path <- tzdb::tzdb_path("text")
+        packageStartupMessage("[configure_tzdb] tzdb path: ", tz_path)
+        packageStartupMessage("[configure_tzdb] path exists: ", 
dir.exists(tz_path))
+        if (dir.exists(tz_path)) {
+          tz_files <- list.files(tz_path, recursive = TRUE)

Review Comment:
   configure_tzdb() now emits multiple debug-level startup messages and even 
lists timezone files on every successful load. This is very noisy for normal 
users (especially during package attach) and can significantly clutter logs; 
consider emitting details only when arrow.debug is enabled, while keeping the 
functional initialization unchanged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to