jorisvandenbossche edited a comment on pull request #10991:
URL: https://github.com/apache/arrow/pull/10991#issuecomment-915999303


   > Well, it seems it should be as simple as:
   
   Ah, indeed it is. I was assuming that since the class is being used in the 
cdef `wrap` method, the class would have to be cimported (which wouldn't be 
possible in such a conditional import).
   
   Now, there is still a circular import issue.
   For normal usage, everything seems fine. But if you import
   `pyarrow._dataset_orc` directly in Python (before importing
   `pyarrow.dataset`), you get errors. I don't know exactly how the import
   machinery works at this point (or how it interacts with the
   Cython-generated module), but it seems that importing
   `pyarrow._dataset_orc` also (logically) imports `pyarrow._dataset`. At that
   point, `pyarrow._dataset` cannot import `OrcFileFormat`, so it is set to
   None (leading to `FileFormat.wrap(..)` not recognizing an ORC file format).
   But if you first import `pyarrow._dataset` (which will import
   `pyarrow._dataset_orc`), then everything is fine:
   
   ```python
   # works when only pyarrow.dataset is imported
   >>> import pyarrow.dataset as ds
   >>> dataset = ds.dataset("test.orc", format="orc")
   >>> dataset.format
   <pyarrow._dataset_orc.OrcFileFormat at 0x7f203ddc78b0>
   ```
   
   vs
   
   ```python
   # fails when pyarrow._dataset_orc is imported first
   >>> from pyarrow._dataset_orc import OrcFileFormat
   cannot import name OrcFileFormat   # print statement added in _dataset.pyx to print ImportError
   >>> import pyarrow.dataset as ds
   >>> dataset = ds.dataset("test.orc", format="orc")
   >>> dataset.format
   ...
   ~/scipy/repos/arrow/python/pyarrow/_dataset.pyx in pyarrow._dataset.FileFormat.wrap()
   TypeError: orc
   ```
   
   Looking at the generated C++ code, the difference might be that for
   `_dataset_orc`, the import of `_dataset` happens at module initialization,
   while for `_dataset`, the import of `_dataset_orc` is a runtime import.
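
The order dependence described above can be reproduced with two plain Python modules. This is only a rough analogy and assumes nothing about the Cython-generated code: `core_mod` and `orc_mod` are hypothetical stand-ins for `_dataset` and `_dataset_orc`, and the helper `orc_visible_in_core` exists only for this sketch.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical stand-ins: core_mod plays the role of _dataset,
# orc_mod plays the role of _dataset_orc.
CORE_SRC = textwrap.dedent("""
    class FileFormat:
        pass

    try:
        # Optional import, executed while core_mod itself is being imported.
        from orc_mod import OrcFileFormat
    except ImportError:
        OrcFileFormat = None
""")

ORC_SRC = textwrap.dedent("""
    import core_mod  # needed at import time for the base class

    class OrcFileFormat(core_mod.FileFormat):
        pass
""")

# Write both modules to a temporary directory and make it importable.
_tmpdir = tempfile.mkdtemp()
for _name, _src in [("core_mod", CORE_SRC), ("orc_mod", ORC_SRC)]:
    with open(os.path.join(_tmpdir, _name + ".py"), "w") as f:
        f.write(_src)
sys.path.insert(0, _tmpdir)
importlib.invalidate_caches()


def orc_visible_in_core(first):
    """Import `first` from a clean state; report whether core_mod sees the ORC class."""
    for name in ("core_mod", "orc_mod"):
        sys.modules.pop(name, None)
    importlib.import_module(first)
    return sys.modules["core_mod"].OrcFileFormat is not None


print(orc_visible_in_core("core_mod"))  # True: orc_mod is fully loaded during core_mod's import
print(orc_visible_in_core("orc_mod"))   # False: orc_mod is only partially initialized
```

In the second case `core_mod` runs while `orc_mod` has not yet defined its class, so the guarded import fails and the name stays None for the rest of the session.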
   
   Would there be a way to resolve this issue? Or are we fine with accepting
   this limitation? (In practice, a user should never import from the private
   `_dataset_orc` module, but the error message is confusing if you do so for
   some reason.)
   
   UPDATE: solved by importing it only when needed (on first use in
   `FileFormat.wrap`)
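
A minimal pure-Python sketch of the deferred-import idea from the update, not the actual change made in `_dataset.pyx`. The modules `lazy_core` and `lazy_orc`, the module-level `wrap` function, and the helper `wrap_after_importing` are all hypothetical names used only for illustration.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical stand-ins: lazy_core for _dataset, lazy_orc for _dataset_orc.
CORE_SRC = textwrap.dedent("""
    class FileFormat:
        pass

    _orc_cls = None  # resolved on first use, not at import time

    def wrap(type_name):
        global _orc_cls
        if type_name == "orc":
            if _orc_cls is None:
                # Deferred import: by now both modules are fully initialized.
                from lazy_orc import OrcFileFormat
                _orc_cls = OrcFileFormat
            return _orc_cls()
        raise TypeError(type_name)
""")

ORC_SRC = textwrap.dedent("""
    import lazy_core  # needed at import time for the base class

    class OrcFileFormat(lazy_core.FileFormat):
        pass
""")

# Write both modules to a temporary directory and make it importable.
_lazy_dir = tempfile.mkdtemp()
for _name, _src in [("lazy_core", CORE_SRC), ("lazy_orc", ORC_SRC)]:
    with open(os.path.join(_lazy_dir, _name + ".py"), "w") as f:
        f.write(_src)
sys.path.insert(0, _lazy_dir)
importlib.invalidate_caches()


def wrap_after_importing(first):
    """Import `first` from a clean state, then wrap an 'orc' format."""
    for name in ("lazy_core", "lazy_orc"):
        sys.modules.pop(name, None)
    importlib.import_module(first)
    return type(sys.modules["lazy_core"].wrap("orc")).__name__


print(wrap_after_importing("lazy_core"))  # OrcFileFormat
print(wrap_after_importing("lazy_orc"))   # OrcFileFormat
```

Because the import runs inside the function body, the result no longer depends on which module was imported first.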
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


