Copilot commented on code in PR #49996: URL: https://github.com/apache/arrow/pull/49996#discussion_r3509806521
########## ci/scripts/r_wasm_test.sh: ########## @@ -0,0 +1,80 @@ +#!/usr/bin/env bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# Test the arrow R package built for WebAssembly. +# +# This script is intended to run inside the ghcr.io/r-universe-org/build-wasm +# Docker container after rwasm::build() has produced a .tgz binary. It: +# 1. Sets up a CRAN-like repo structure from the built .tgz +# 2. Installs the npm webr package (Node.js webR runtime) +# 3. Boots webR, installs arrow from the local repo, and verifies: +# - The package can be installed and loaded +# - Multithreading is disabled (arrow.use_threads == FALSE) +# - The testthat test suite runs +# +# Tests that require threading are automatically skipped via +# skip_if_not(CanRunWithCapturedR()) since CanRunWithCapturedR() returns +# FALSE under Emscripten. +# +# Usage: +# r_wasm_test.sh <path-to-arrow-r-dir> +# +# Example: +# r_wasm_test.sh /work +# +# The arrow .tgz file(s) should already exist in <path-to-arrow-r-dir>. + +set -euxo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +arrow_r_dir="${1:-.}" + +# Set up a fake CRAN-like repo so we can install the package +tgz_file=$(ls "${arrow_r_dir}"/arrow_*.tgz 2>/dev/null | head -1) +if [ -z "${tgz_file}" ]; then + echo "ERROR: No arrow_*.tgz found in ${arrow_r_dir}" >&2 + exit 1 +fi Review Comment: With `set -euo pipefail`, the `ls ... | head -1` pipeline will exit the script when no `arrow_*.tgz` exists (because `ls` fails and `pipefail` propagates that), so the subsequent explicit empty-check is never reached. It also breaks on filenames with whitespace. Use Bash globbing (nullglob) / an array to detect matches robustly. ########## r/R/arrow-package.R: ########## @@ -182,19 +189,35 @@ configure_tzdb <- function() { tryCatch( { tzdb::tzdb_initialize() - set_timezone_database(tzdb::tzdb_path("text")) + tz_path <- tzdb::tzdb_path("text") + packageStartupMessage("[configure_tzdb] tzdb path: ", tz_path) + packageStartupMessage("[configure_tzdb] path exists: ", dir.exists(tz_path)) + if (dir.exists(tz_path)) { + tz_files <- list.files(tz_path, recursive = TRUE) Review Comment: `configure_tzdb()` now emits multiple debug-style startup messages (including listing tzdb directory contents) on every load when tzdb is available. This is noisy for users and can leak filesystem structure in logs; it also affects non-Emscripten paths (e.g., Windows MinGW) where `configure_tzdb()` is called. Consider removing these debug messages or gating them behind an explicit debug option/env var. ########## r/tests/testthat/test-dplyr-filter.R: ########## @@ -415,6 +415,8 @@ test_that("filter() with namespaced functions", { }) test_that("filter() with across()", { + skip_on_emscripten() # TODO(xxx): need to figure out what warnings this throws Review Comment: Avoid placeholder TODO tags like `TODO(xxx)` in committed tests. Either reference a real issue/PR or remove the TODO so it's clear how to follow up. ########## ci/scripts/r_wasm_test.cjs: ########## @@ -0,0 +1,160 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// Smoke-test the arrow R package under webR, then run the testthat suite. +// Called by r_wasm_test.sh. Requires env vars: +// ARROW_WASM_REPO_DIR - local CRAN-like repo with the arrow .tgz +// ARROW_R_TESTS_DIR - path to tests/testthat in the source tree + +const { WebR } = require("webr"); +const http = require("http"); +const fs = require("fs"); +const path = require("path"); + +const repoDir = process.env.ARROW_WASM_REPO_DIR; +if (!repoDir) { + console.error("ERROR: ARROW_WASM_REPO_DIR not set"); + process.exit(1); +} + +const testsDir = process.env.ARROW_R_TESTS_DIR; +if (!testsDir) { + console.error("ERROR: ARROW_R_TESTS_DIR not set"); + process.exit(1); +} + +function listFilesRecursive(dir) { + const results = []; + for (const entry of fs.readdirSync(dir, { withFileTypes: true })) { + const full = path.join(dir, entry.name); + if (entry.isDirectory()) { + results.push(...listFilesRecursive(full)); + } else { + results.push(full); + } + } + return results; +} + +async function main() { + // Serve the repo over HTTP (webR can't access the host filesystem directly) + const server = http.createServer((req, res) => { + const filePath = path.join(repoDir, decodeURIComponent(req.url)); + fs.readFile(filePath, (err, data) => { + if (err) { + res.writeHead(404); + res.end(); + } else { + res.writeHead(200); + res.end(data); + } + }); + }); Review Comment: The HTTP server path resolution is unsafe/incorrect: `path.join(repoDir, decodeURIComponent(req.url))` will ignore `repoDir` when `req.url` starts with `/` (typical for HTTP requests), and it also allows `..` traversal. This can both break installs (serving wrong path) and expose files outside the intended repo directory. Normalize the URL to a relative path and ensure the resolved path stays under `repoDir`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
