pitrou commented on code in PR #37822: URL: https://github.com/apache/arrow/pull/37822#discussion_r1651056439
########## ci/docker/conda-python-emscripten.dockerfile: ########## @@ -0,0 +1,83 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +ARG repo +ARG arch +ARG python="3.12" +FROM ${repo}:${arch}-conda-python-${python} + +ARG selenium_version="4.15.2" +ARG pyodide_version="0.26.0" +ARG emscripten_version="3.1.58" +ARG chrome_version="latest" +ARG required_python_min="(3,12)" +# fail if python version < 3.12 +RUN echo "check PYTHON>=${required_python_min}" && python -c "import sys;sys.exit(0 if sys.version_info>=${required_python_min} else 1)" + +# install selenium and pyodide-build and recent python + +# needs to be a login shell so ~/.profile is read +SHELL ["/bin/bash","--login","-c"] + +RUN python --version && \ +python -m pip install selenium==${selenium_version} &&\ + python -m pip install --upgrade --pre pyodide-build==${pyodide_version} + +RUN cd ~ && (ls emsdk || git clone https://github.com/emscripten-core/emsdk.git) && \ + cd emsdk && \ + ./emsdk install ${emscripten_version} && \ + ./emsdk activate ${emscripten_version} && \ + echo "Installed emsdk to:" ~/emsdk + +# make sure zlib is cached in the EMSDK folder +RUN source ~/emsdk/emsdk_env.sh && embuilder --pic build zlib + +# install pyodide dist directory to /pyodide +RUN cd / \ + && pyodide_dist_url="https://github.com/pyodide/pyodide/releases/download/${pyodide_version}/pyodide-${pyodide_version}.tar.bz2"\ + && wget ${pyodide_dist_url} -O- |tar -xj + +# install chrome for testing browser based runner +# zip needed for chrome, libpthread stubs installs pthread module to cmake, build-essential makes +# sure that unix make is available in isolated python environments +RUN apt-get update && apt-get install -y -q zip libpthread-stubs0-dev build-essential + +RUN if [ $chrome_version = "latest" ]; \ + then CHROME_VERSION_FULL=$(wget --no-verbose -O - "https://googlechromelabs.github.io/chrome-for-testing/LATEST_RELEASE_STABLE"); \ + else CHROME_VERSION_FULL=$(wget --no-verbose -O - "https://googlechromelabs.github.io/chrome-for-testing/LATEST_RELEASE_${chrome_version}"); \ + fi \ + && CHROME_DOWNLOAD_URL="https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/google-chrome-stable_${CHROME_VERSION_FULL}-1_amd64.deb" \ + && CHROMEDRIVER_DOWNLOAD_URL="https://storage.googleapis.com/chrome-for-testing-public/${CHROME_VERSION_FULL}/linux64/chromedriver-linux64.zip" \ + && wget --no-verbose -O /tmp/google-chrome.deb ${CHROME_DOWNLOAD_URL} \ + && apt-get update \ + && apt install -qqy /tmp/google-chrome.deb \ + && rm -f /tmp/google-chrome.deb \ + && rm -rf /var/lib/apt/lists/* \ + && wget --no-verbose -O /tmp/chromedriver-linux64.zip ${CHROMEDRIVER_DOWNLOAD_URL} \ + && unzip /tmp/chromedriver-linux64.zip -d /opt/ \ + && rm /tmp/chromedriver-linux64.zip \ + && ln -fs /opt/chromedriver-linux64/chromedriver /usr/local/bin/chromedriver \ + && echo "Using Chrome version: $(google-chrome --version)" \ + && echo "Using Chrome Driver version: $(chromedriver --version)" + + + +SHELL ["/bin/bash", "--login", "-i", "-c"] +# install node 18 (needed for async call support) +RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash - +RUN nvm install 18 Review Comment: This should probably be a separate script as well? ########## ci/docker/conda-python-emscripten.dockerfile: ########## @@ -0,0 +1,83 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +ARG repo +ARG arch +ARG python="3.12" +FROM ${repo}:${arch}-conda-python-${python} + +ARG selenium_version="4.15.2" +ARG pyodide_version="0.26.0" +ARG emscripten_version="3.1.58" +ARG chrome_version="latest" +ARG required_python_min="(3,12)" +# fail if python version < 3.12 +RUN echo "check PYTHON>=${required_python_min}" && python -c "import sys;sys.exit(0 if sys.version_info>=${required_python_min} else 1)" + +# install selenium and pyodide-build and recent python + +# needs to be a login shell so ~/.profile is read +SHELL ["/bin/bash","--login","-c"] + +RUN python --version && \ +python -m pip install selenium==${selenium_version} &&\ + python -m pip install --upgrade --pre pyodide-build==${pyodide_version} + +RUN cd ~ && (ls emsdk || git clone https://github.com/emscripten-core/emsdk.git) && \ + cd emsdk && \ + ./emsdk install ${emscripten_version} && \ + ./emsdk activate ${emscripten_version} && \ + echo "Installed emsdk to:" ~/emsdk Review Comment: Should this be a dedicated install script? Also, why the `cd ~` specifically here? ########## ci/docker/conda-python-emscripten.dockerfile: ########## @@ -0,0 +1,83 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +ARG repo +ARG arch +ARG python="3.12" +FROM ${repo}:${arch}-conda-python-${python} + +ARG selenium_version="4.15.2" +ARG pyodide_version="0.26.0" +ARG emscripten_version="3.1.58" +ARG chrome_version="latest" +ARG required_python_min="(3,12)" +# fail if python version < 3.12 +RUN echo "check PYTHON>=${required_python_min}" && python -c "import sys;sys.exit(0 if sys.version_info>=${required_python_min} else 1)" + +# install selenium and pyodide-build and recent python + +# needs to be a login shell so ~/.profile is read +SHELL ["/bin/bash","--login","-c"] + +RUN python --version && \ +python -m pip install selenium==${selenium_version} &&\ + python -m pip install --upgrade --pre pyodide-build==${pyodide_version} + +RUN cd ~ && (ls emsdk || git clone https://github.com/emscripten-core/emsdk.git) && \ + cd emsdk && \ + ./emsdk install ${emscripten_version} && \ + ./emsdk activate ${emscripten_version} && \ + echo "Installed emsdk to:" ~/emsdk + +# make sure zlib is cached in the EMSDK folder +RUN source ~/emsdk/emsdk_env.sh && embuilder --pic build zlib + +# install pyodide dist directory to /pyodide +RUN cd / \ + && pyodide_dist_url="https://github.com/pyodide/pyodide/releases/download/${pyodide_version}/pyodide-${pyodide_version}.tar.bz2"\ + && wget ${pyodide_dist_url} -O- |tar -xj + +# install chrome for testing browser based runner +# zip needed for chrome, libpthread stubs installs pthread module to cmake, build-essential makes +# sure that unix make is available in isolated python environments +RUN apt-get update && apt-get install -y -q zip libpthread-stubs0-dev build-essential + +RUN if [ $chrome_version = "latest" ]; \ Review Comment: Can we create a script for this instead of having this as a gigantic one-liner in the dockerfile? ########## docker-compose.yml: ########## @@ -875,6 +876,46 @@ services: /arrow/ci/scripts/python_build.sh /arrow /build && /arrow/ci/scripts/python_test.sh /arrow"] + conda-python-emscripten: + # Usage: + # docker-compose build conda-python-emscripten + # docker-compose run --rm conda-python-emscripten + # Parameters: + # ARCH: amd64, arm64v8, ... + # UBUNTU: 22.04 + image: ${REPO}:${ARCH}-conda-python-emscripten + build: + context: . + dockerfile: ci/docker/conda-python-emscripten.dockerfile + cache_from: + - ${REPO}:${ARCH}-conda-python-${PYTHON} + args: + repo: ${REPO} + arch: ${ARCH} + clang_tools: ${CLANG_TOOLS} + llvm: ${LLVM} + emscripten_version: "3.1.58" # must use correct emscripten for pyodide version + pyodide_version: "0.26.0" + chrome_version: "122" + selenium_version: "4.15.2" + required_python_min: "(3,12)" + python: ${PYTHON} + shm_size: *shm-size + volumes: *ubuntu-volumes + environment: + <<: [*common, *ccache, *sccache, *cpp] + ARROW_EMSCRIPTEN: "ON" + ARROW_DEPENDENCY_SOURCE: "BUNDLED" + UBUNTU: "22.04" + PYODIDE_VERSION: "0.26.0" + ARROW_BUILD_TYPE: "release" + ARROW_BUILD_TESTS: "off" + command: [" + source ~/.nvm/nvm.sh && Review Comment: Shouldn't the `source` command be part of the script(s) below wherever necessary? ########## python/pyarrow/_dataset_parquet.pyx: ########## @@ -787,6 +789,8 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions): @pre_buffer.setter def pre_buffer(self, bint pre_buffer): + if pre_buffer and not is_threading_enabled(): Review Comment: Same question here ########## ci/scripts/python_test_emscripten.sh: ########## @@ -0,0 +1,35 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# run tests against Chrome and node.js as representative +# WebAssembly platforms (i.e. one browser, one non-browser). + +set -ex +build_dir=${1}/python +cd ${build_dir} + +pyodide_dist_dir=${2} + +# note: this assumes that there is only one wheel built into dist +# (which is true if you build using emscripten_python_build.sh) Review Comment: This may turn out false in developers' environment, as opposed to CI. Can we make this stricter and less error-prone? ########## ci/docker/conda-python-emscripten.dockerfile: ########## @@ -0,0 +1,83 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +ARG repo +ARG arch +ARG python="3.12" +FROM ${repo}:${arch}-conda-python-${python} + +ARG selenium_version="4.15.2" +ARG pyodide_version="0.26.0" +ARG emscripten_version="3.1.58" +ARG chrome_version="latest" +ARG required_python_min="(3,12)" +# fail if python version < 3.12 +RUN echo "check PYTHON>=${required_python_min}" && python -c "import sys;sys.exit(0 if sys.version_info>=${required_python_min} else 1)" + +# install selenium and pyodide-build and recent python + +# needs to be a login shell so ~/.profile is read +SHELL ["/bin/bash","--login","-c"] + +RUN python --version && \ Review Comment: Why the `python --version`? ########## python/pyarrow/_dataset_parquet.pyx: ########## @@ -737,6 +737,8 @@ cdef class ParquetFragmentScanOptions(FragmentScanOptions): new CParquetFragmentScanOptions())) self.use_buffered_stream = use_buffered_stream self.buffer_size = buffer_size + if pre_buffer and not is_threading_enabled(): + pre_buffer = False Review Comment: Shouldn't we instead raise an error if the setting is incompatible with emscripten? cc @jorisvandenbossche -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
