I just updated pyarrow from 0.9.0 to 0.10.0 and now get a NameError when trying
to read a directory containing Parquet files:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-d0f3e11a6edb> in <module>()
----> 1 _ = pq.ParquetDataset(path).read(nthreads=8)

~/Programs/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py in read(self, columns, nthreads, use_pandas_metadata)
    896                 partitions=self.partitions,
    897                 open_file_func=open_file,
--> 898                 use_pandas_metadata=use_pandas_metadata)
    899             tables.append(table)
    900

~/Programs/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py in read(self, columns, nthreads, partitions, open_file_func, file, use_pandas_metadata)
    484                 dictionary = partitions.levels[i].dictionary
    485
--> 486                 arr = lib.DictionaryArray.from_arrays(indices, dictionary)
    487                 col = lib.Column.from_array(name, arr)
    488                 table = table.append_column(col)

~/Programs/miniconda3/lib/python3.6/site-packages/pyarrow/array.pxi in pyarrow.lib.DictionaryArray.from_arrays()

~/Programs/miniconda3/lib/python3.6/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

NameError: name 'pdcompat' is not defined
The error only occurs when I run this code in a Jupyter notebook. The same code
works fine in a terminal Python session.
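
Since the behavior differs between the notebook and the terminal, one thing that
may be worth comparing is which interpreter and which copy of each package the
two sessions actually load. The snippet below is a minimal sketch for that
comparison, not a confirmed diagnosis: `describe_module` is a hypothetical
helper (not part of pyarrow), and pandas is included only on the assumption
that the missing `pdcompat` name is pandas-related.

```python
import importlib
import sys


def describe_module(name):
    """Report the running interpreter and which copy of `name` it imports.

    Hypothetical helper for comparing a notebook kernel with a terminal
    session; it is not part of pyarrow.
    """
    info = {"executable": sys.executable, "module": name}
    try:
        mod = importlib.import_module(name)
    except Exception as exc:  # a broken install may fail with more than ImportError
        info["error"] = repr(exc)
        return info
    # Not every module defines __version__ or __file__, so fall back to None.
    info["version"] = getattr(mod, "__version__", None)
    info["file"] = getattr(mod, "__file__", None)
    return info


# Run this in both the notebook and the terminal, then compare the output.
# pandas is checked only as a guess (see the assumption stated above).
for name in ("pyarrow", "pandas"):
    print(describe_module(name))
```

If the two sessions print different `executable` or `file` paths, or one of them
reports an `error` entry, the notebook kernel is probably using a different
environment from the terminal.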
[ Full content available at: https://github.com/apache/arrow/issues/2455 ]