Thanks for the feedback so far! I have a PR up to fix the segfault issue.

https://github.com/apache/datafusion-ballista/pull/1769

On Sun, May 24, 2026 at 2:47 PM Kevin Liu <[email protected]> wrote:
>
> Found an issue when running the `ballista==53.0.0` wheels from TestPyPI
> on macOS. Tested on Linux successfully; all the examples from
> python/README passed.
>
> macOS issue:
> Constructing the in-process BallistaScheduler or BallistaExecutor
> segfaults. Reproduced on Python 3.10 and 3.12, with both uv and pip/venv.
> Minimal repro on macOS:
> ```
>     uv run --python 3.10 --with "ballista==53.0.0" \
>         --index-url https://test.pypi.org/simple/ \
>         --extra-index-url https://pypi.org/simple/ \
>         --index-strategy unsafe-best-match \
>         python -X faulthandler -u -c \
>         "from ballista import BallistaScheduler; BallistaScheduler()"
> ```
> Output:
> """
> Fatal Python error: Segmentation fault
>
> Current thread 0x00000001effd1e80 (most recent call first):
>   File "<string>", line 1 in <module>
>
> Extension modules: pyarrow.lib (total: 1)
> """
>
> Linux via Docker:
> ```
> docker run --rm --platform linux/amd64 -v "$PWD/python/testdata:/data:ro" \
>   python:3.10-slim bash -c '
>   pip install --quiet --index-url https://test.pypi.org/simple/ \
>       --extra-index-url https://pypi.org/simple/ "ballista==53.0.0" datafusion &&
>   python -u <<PY
> from ballista import BallistaSessionContext, BallistaScheduler, BallistaExecutor
> from datafusion import col, lit
> import time
>
> sched = BallistaScheduler(bind_port=50050); sched.start()
> execu = BallistaExecutor(bind_port=50051, scheduler_port=50050); execu.start()
> time.sleep(3)
>
> ctx = BallistaSessionContext("df://localhost:50050")
> ctx.sql("create external table t stored as parquet location \"/data/test.parquet\"")
> ctx.sql("select * from t limit 5").show()
> ctx.sql("select count(*) as n from t").show()
>
> df = ctx.read_parquet("/data/test.parquet").filter(col("id") > lit(4)).limit(5)
> batches = df.collect()
> print("rows:", sum(b.num_rows for b in batches))
>
> execu.close(); sched.close()
> PY'
> ```
> Outputs:
> """
> DataFrame()
> +----+----------+-------------+--------------+---------+------------+-----------+------------+------------------+------------+---------------------+
> | id | bool_col | tinyint_col | smallint_col | int_col | bigint_col | float_col | double_col | date_string_col  | string_col | timestamp_col       |
> +----+----------+-------------+--------------+---------+------------+-----------+------------+------------------+------------+---------------------+
> | 4  | true     | 0           | 0            | 0       | 0          | 0.0       | 0.0        | 30332f30312f3039 | 30         | 2009-03-01T00:00:00 |
> | 5  | false    | 1           | 1            | 1       | 10         | 1.1       | 10.1       | 30332f30312f3039 | 31         | 2009-03-01T00:01:00 |
> | 6  | true     | 0           | 0            | 0       | 0          | 0.0       | 0.0        | 30342f30312f3039 | 30         | 2009-04-01T00:00:00 |
> | 7  | false    | 1           | 1            | 1       | 10         | 1.1       | 10.1       | 30342f30312f3039 | 31         | 2009-04-01T00:01:00 |
> | 2  | true     | 0           | 0            | 0       | 0          | 0.0       | 0.0        | 30322f30312f3039 | 30         | 2009-02-01T00:00:00 |
> +----+----------+-------------+--------------+---------+------------+-----------+------------+------------------+------------+---------------------+
> DataFrame()
> +---+
> | n |
> +---+
> | 8 |
> +---+
> rows: 3
> """
>
> Best,
> Kevin Liu
>
> On Sun, May 24, 2026 at 7:49 AM Shekhar Rajak <[email protected]> wrote:
>
> > +1 (non-binding): installed from test.pypi.org and ran the smoke import.
> > $ pip show ballista
> > Name: ballista
> > Version: 53.0.0
> > Summary: Python client for Apache Arrow Ballista Distributed SQL Query Engine
> > Home-page: https://datafusion.apache.org/ballista/
> > Author:
> > Author-email:
> > License:
> > Location: /private/tmp/ballista-rc-verify/lib/python3.13/site-packages
> > Requires: datafusion, pyarrow
> > Required-by:
> > $ python -c "import ballista; print(ballista.__version__)"
> > 53.0.0
> > Result: `from ballista import BallistaSessionContext; print('ok')` -> ok
> >
> > Regards,
> > Shekharrajak
> >
> >
> >     On Sunday 24 May 2026 at 06:53:18 am GMT+5:30, Andy Grove <[email protected]> wrote:
> >
> > I have published a test version of Ballista to test.pypi.org [1] and I
> > am looking for help testing this.
> >
> > Instructions for installing Ballista from test.pypi.org can be found
> > in the release verification documentation [2].
> >
> > Please note that this is NOT an official Apache release. This is a
> > test of the new PyPI publishing process.
> >
> > This release was built from GitHub tag 53.0.0-rc1-pypitest-3.
> >
> > I plan on publishing an official 53.x.x release to PyPI soon,
> > once I have feedback from this test.
> >
> > Thanks,
> >
> > Andy.
> >
> > [1] https://test.pypi.org/project/ballista/
> > [2] https://github.com/apache/datafusion-ballista/blob/main/dev/release/README.md#optional-verify-the-python-wheels-from-testpypi
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
