jorisvandenbossche commented on PR #34616:
URL: https://github.com/apache/arrow/pull/34616#issuecomment-1678630935

   It does _build_, but with the current changes to the dataset Cython 
code, I would still expect it to raise an error at run-time. 
   
   Testing that locally while fetching this branch to review it, I first built 
it with my normal setup, which has Parquet enabled but not encryption. The 
build indeed passes, but trying to use Parquet through the dataset API 
raises an error:
   
   ```
   In [1]: import pyarrow.dataset as ds
   
   In [2]: ds.dataset("test.parquet")
   ...
   
   File ~/scipy/repos/arrow/python/pyarrow/dataset.py:298, in _ensure_format(obj)
       296 elif obj == "parquet":
       297     if not _parquet_available:
   --> 298         raise ValueError(_parquet_msg)
       299     return ParquetFileFormat()
       300 elif obj in {"ipc", "arrow"}:
   
   ValueError: The pyarrow installation is not built with support for the Parquet file format.
   ```
   
   It thinks that Parquet is not available, because it cannot import the 
``pyarrow._dataset_parquet`` Cython module. And this is because that module now 
tries to import the encryption module, which isn't available in my installation:
   
   ```
   In [3]: import pyarrow._dataset_parquet
   ...
   ModuleNotFoundError: No module named 'pyarrow._parquet_encryption'
   ```
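
One common way to avoid this kind of failure is to treat the encryption module as an optional import. The following is only a minimal Python sketch of that pattern, not the actual change needed in the Cython module; `try_import` and `encryption_available` are hypothetical names used for illustration.

```python
import importlib


def try_import(name):
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper, not part of pyarrow.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a build
        # without the optional component ends up here.
        return None


# The encryption bindings are optional: a missing module should only
# disable the encryption-related features, not Parquet dataset support.
_encryption = try_import("pyarrow._parquet_encryption")
encryption_available = _encryption is not None
```

With a guard like this, code that needs encryption can check `encryption_available` and raise a targeted error, while the rest of the module stays importable.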
   
   To verify this (and also to ensure in general that this is covered), we 
should check with one of the CI builds that have Parquet encryption disabled 
(although given all builds are green, we might lack such a CI build).
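
A smoke check along the following lines could be run in such a build. This is a rough sketch under the assumption that the relevant extension modules are `pyarrow._parquet`, `pyarrow._dataset`, `pyarrow._dataset_parquet` and `pyarrow._parquet_encryption`; the function names are hypothetical and not part of the test suite.

```python
import importlib


def importable(name):
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def dataset_parquet_regressed(importable=importable):
    """Detect the failure mode described above.

    Returns True when Parquet and dataset support are built, encryption
    is not, and the dataset Parquet module fails to import.
    """
    has_parquet = importable("pyarrow._parquet")
    has_dataset = importable("pyarrow._dataset")
    has_encryption = importable("pyarrow._parquet_encryption")
    has_dataset_parquet = importable("pyarrow._dataset_parquet")
    return (has_parquet and has_dataset
            and not has_encryption and not has_dataset_parquet)
```

The `importable` parameter is injectable only so that the logic can be exercised without a particular build configuration; in CI one would call `dataset_parquet_regressed()` with the default and fail the job if it returns `True`.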

