kylebarron commented on code in PR #994: URL: https://github.com/apache/datafusion-python/pull/994#discussion_r1918910519
########## README.md: ########## @@ -164,61 +170,69 @@ You can verify the installation by running: ## How to develop Review Comment: Just a personal nit that I usually put this in a separate `DEVELOP.md` file because it's not directly relevant to end-users, but it's ok to leave it here too ########## README.md: ########## @@ -164,61 +170,69 @@ You can verify the installation by running: ## How to develop -This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin). +This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin). The Maturin tools used in this workflow can be installed either via `uv` or `pip`. Both approaches should offer the same experience. It is recommended to use `uv` since it has significant performance improvements +over `pip`. -The Maturin tools used in this workflow can be installed either via Conda or Pip. Both approaches should offer the same experience. Multiple approaches are only offered to appease developer preference. Bootstrapping for both Conda and Pip are as follows. +Bootstrap (`uv`): -Bootstrap (Conda): +By default `uv` will attempt to build the datafusion python package. For our development we prefer to build manually. This means +that when creating your virtual environment using `uv sync` you need to pass in the additional `--no-install-package datafusion` +and for `uv run` commands the additional parameter `--no-project` ```bash # fetch this repo git clone [email protected]:apache/datafusion-python.git -# create the conda environment for dev -conda env create -f ./conda/environments/datafusion-dev.yaml -n datafusion-dev -# activate the conda environment -conda activate datafusion-dev +# create the virtual enviornment +uv sync --dev --no-install-package datafusion +# activate the environment +source .venv/bin/activate ``` -Or alternatively, if you are on an OS that supports CUDA Toolkit, you can use `-f ./conda/environments/datafusion-cuda-dev.yaml`. - -Bootstrap (Pip): +Bootstrap (`pip`): ```bash # fetch this repo git clone [email protected]:apache/datafusion-python.git # prepare development environment (used to build wheel / install in development) -python3 -m venv venv +python3 -m venv .venv # activate the venv -source venv/bin/activate +source .venv/bin/activate # update pip itself if necessary python -m pip install -U pip -# install dependencies (for Python 3.8+) -python -m pip install -r requirements.in +# install dependencies +python -m pip install -r pyproject.toml Review Comment: You can just `pip install .` too. That will read the pep 517-compliant build backend and install dependencies and build datafusion ########## .github/workflows/build.yml: ########## @@ -97,8 +92,15 @@ jobs: version: "27.4" repo-token: ${{ secrets.GITHUB_TOKEN }} + - name: Install dependencies and build Review Comment: This isn't installing dependencies in this step. ########## README.md: ########## @@ -164,61 +170,69 @@ You can verify the installation by running: ## How to develop -This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin). +This assumes that you have rust and cargo installed. We use the workflow recommended by [pyo3](https://github.com/PyO3/pyo3) and [maturin](https://github.com/PyO3/maturin). The Maturin tools used in this workflow can be installed either via `uv` or `pip`. Both approaches should offer the same experience. It is recommended to use `uv` since it has significant performance improvements +over `pip`. -The Maturin tools used in this workflow can be installed either via Conda or Pip. Both approaches should offer the same experience. Multiple approaches are only offered to appease developer preference. Bootstrapping for both Conda and Pip are as follows. +Bootstrap (`uv`): -Bootstrap (Conda): +By default `uv` will attempt to build the datafusion python package. For our development we prefer to build manually. This means +that when creating your virtual environment using `uv sync` you need to pass in the additional `--no-install-package datafusion` Review Comment: Ah maybe this is enough docs and you don't need to repeat it in the actions.yml ########## .github/workflows/build.yml: ########## @@ -31,28 +31,31 @@ jobs: - name: Install Python uses: actions/setup-python@v5 with: - python-version: "3.11" + python-version: "3.12" + + - uses: astral-sh/setup-uv@v5 + with: + enable-cache: true + - name: Install dependencies - run: | - python -m pip install --upgrade pip - pip install ruff + run: uv sync --dev --no-install-package datafusion Review Comment: Nit: maybe just add a comment here saying that `--no-install-dependencies` is to only install the dependencies but not build the rust library -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
