Hi,
PyArrow throws an exception when reading a Parquet file generated by Hive
using version 2.0 of the Parquet writer:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1732, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1610, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 458, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2889, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 141, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 112, in pyarrow.lib.check_status
OSError: Not yet implemented: Unsupported encoding.
I noticed that several encodings are still listed as unsupported in:
https://arrow.apache.org/docs/cpp/parquet.html#encodings
Is there any plan to support these encodings in the near future? If not, I
would like to try implementing them myself. Any advice would be
appreciated!
Best regards,
Shan Huang