jorisvandenbossche commented on PR #34616:
URL: https://github.com/apache/arrow/pull/34616#issuecomment-1678630935
It does _build_, but with the current changes to the dataset Cython code,
I would still expect it to raise an error at run time.
Testing this locally after fetching the branch for review, I first built it
with my normal setup, which has Parquet enabled but not encryption. The build
indeed passes, but trying to use Parquet through the dataset API raises an
error:
```
In [1]: import pyarrow.dataset as ds
In [2]: ds.dataset("test.parquet")
...
File ~/scipy/repos/arrow/python/pyarrow/dataset.py:298, in _ensure_format(obj)
296 elif obj == "parquet":
297 if not _parquet_available:
--> 298 raise ValueError(_parquet_msg)
299 return ParquetFileFormat()
300 elif obj in {"ipc", "arrow"}:
ValueError: The pyarrow installation is not built with support for the Parquet file format.
```
It thinks that Parquet is not available because it cannot import the
``pyarrow._dataset_parquet`` Cython module. That import fails because the module
now tries to import the encryption module, which isn't available in my
installation:
```
In [3]: import pyarrow._dataset_parquet
...
ModuleNotFoundError: No module named 'pyarrow._parquet_encryption'
```
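To illustrate why the error message is misleading, here is a minimal sketch of an availability probe that also reports which module was actually missing. The `probe_module` helper is hypothetical and not part of pyarrow; it only relies on the standard `ImportError.name` attribute.

```python
import importlib


def probe_module(name):
    """Try to import `name` and return (available, missing_module).

    `probe_module` is a hypothetical helper for illustration only.
    """
    try:
        importlib.import_module(name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError. `exc.name` holds
        # the module that could not be found, which may be a dependency of
        # `name` rather than `name` itself.
        return False, exc.name
    return True, None


if __name__ == "__main__":
    # Result depends on how the local pyarrow was built.
    print(probe_module("pyarrow._dataset_parquet"))
```

A plain `except ImportError` around the import of the dataset Parquet module cannot tell "Parquet is not built" apart from "a module that it imports is not built", which would explain the `ValueError` above. On a build like mine, I would expect the second element of the result to name the encryption module.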
To verify this (and to ensure the case is covered in general), we should check
one of the CI builds that has Parquet encryption disabled (although, given that
all builds are green, we might lack such a CI build).