Hi Joris,

Thanks for replying!
No, but dataset isn’t the only thing missing here. It also complains that pyarrow.fs is not a package, although of course it is actually a module. Moreover, the build process errors out and no docs appear at all, so these can’t simply be ignored.

As for sphinx-build, I can do some testing tonight. If it works I will file the PR. I guess I should also remove the manual installation from the CI, right?

On a separate subject, why do we have ORC support off by default?

Ian

On Monday, March 21, 2022, Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:

> On Sun, 20 Mar 2022 at 07:50, Ian Joiner <iajoiner...@gmail.com> wrote:
>
> > Hi,
> >
> > I’d like to ask about how the documentation is built. I have followed the
> > instructions to build and install the C++ and Python libraries in my
> > virtual environment and then followed the instructions for building the
> > documentation. However unfortunately it failed. Moreover manually adding to
> > sys.path in conf.py to help it find the packages doesn’t help either. In
> > case people wonder, the development version of pyarrow has been installed
> > in the virtual environment.
> >
> > (pyarrow-dev) karlkatzen@chloes docs % make html
> > sphinx-build -b html -d _build/doctrees -j8 source _build/html
> > Running Sphinx v4.4.0
> > [autosummary] generating autosummary for: c_glib/index.rst, cpp/api.rst,
> > cpp/api/array.rst, cpp/api/async.rst, cpp/api/builder.rst,
> > cpp/api/c_abi.rst, cpp/api/compute.rst, cpp/api/cuda.rst,
> > cpp/api/dataset.rst, cpp/api/datatype.rst, ..., python/json.rst,
> > python/memory.rst, python/numpy.rst, python/orc.rst, python/pandas.rst,
> > python/parquet.rst, python/plasma.rst, python/timestamps.rst, r/index.rst,
> > status.rst
> > WARNING: [autosummary] failed to import pyarrow.dataset.CsvFileFormat.
> > Possible hints:
> > * AttributeError: module 'pyarrow' has no attribute 'dataset'
> > * ImportError: no module named pyarrow.dataset
> > * ModuleNotFoundError: No module named 'pyarrow._dataset'
> > WARNING: [autosummary] failed to import pyarrow.dataset.CsvFragmentScanOptions.
> > Possible hints:
> > * AttributeError: module 'pyarrow' has no attribute 'dataset'
> > * ImportError: no module named pyarrow.dataset
> > * ModuleNotFoundError: No module named 'pyarrow._dataset'
> > WARNING: [autosummary] failed to import pyarrow.dataset.Dataset.
> > Possible hints:
> > * AttributeError: module 'pyarrow' has no attribute 'dataset'
> > * ImportError: no module named pyarrow.dataset
> > * ModuleNotFoundError: No module named 'pyarrow._dataset'
> >
>
> Did you enable the dataset module when building Arrow C++ and pyarrow?
> Because the errors above seem to indicate this module is not built. It's
> not really a problem for building the docs, though, as it should normally
> still generate all other pages (only not the docstring pages for those
> methods, and you get a bunch of annoying warnings)
>
> > There is also another issue, namely sphinx-tabs not being installable with
> > conda so using ci/conda_env_sphinx.txt only is not adequate in making docs
> > building possible. This is not exactly documented in
> > developers/documentation.rst. I wonder whether this is a temporary
> > situation or we should actually change the docs to remind people to install
> > it. If we decide to change it I will file a PR to fix that.
> >
>
> Yes, this is a known issue but indeed not really documented. For CI, we are
> currently manually installing sphinx-tabs in the docker image for the doc
> build.
> Now, I think nowadays we can actually install it with conda-forge, so I
> *think* that the conda_env_sphinx.txt file can be updated to include it. A
> PR for that is certainly welcome.
>
> Joris
>
> >
> > Ian
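
P.S. For anyone trying to reproduce this: the autosummary warnings above all come down to whether the optional pyarrow submodules can be imported in the environment that runs sphinx-build. A minimal sketch of a check to run in that same virtual environment before `make html` is below. The module list and the `check_modules` helper are my own illustration (not part of Arrow or Sphinx), and the script only reports import failures; it does not say which build option was missed.

```python
import importlib

# Optional pyarrow submodules mentioned in this thread.
# This list is illustrative, not an official or complete one.
OPTIONAL_MODULES = ["pyarrow.dataset", "pyarrow.fs", "pyarrow.orc"]


def check_modules(names):
    """Map each module name to None if it imports, else to an error string."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # compiled extension such as pyarrow._dataset is caught here too.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in check_modules(OPTIONAL_MODULES).items():
        print(f"{name}: {'ok' if error is None else error}")
```

Any module reported with an error here would presumably also show up as a "failed to import" warning in the docs build, so this narrows down which components were not built.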