[C++] Unsupported encodings in reading Parquet files

Shan Huang Mon, 31 May 2021 03:13:10 -0700

Hi,

PyArrow throws an exception when reading a Parquet file generated by
version 2.0 of the Parquet writer in Hive:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1732, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1610, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 458, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2889, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 141, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 112, in pyarrow.lib.check_status
OSError: Not yet implemented: Unsupported encoding.

I noticed that several encodings are listed as unsupported in:
https://arrow.apache.org/docs/cpp/parquet.html#encodings
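
The traceback does not say which encoding is the unsupported one. A possible
way to narrow it down is to read only the file footer and list the encodings
recorded for each column chunk. The following is a minimal sketch, not a
verified diagnosis for this file: it assumes the footer itself can be read even
though the data pages cannot be decoded, and the helper names
collect_encodings and encodings_in_file are hypothetical. It relies on the
encodings and path_in_schema attributes of the column chunk metadata exposed
by pyarrow.parquet.

```python
from collections import defaultdict


def collect_encodings(metadata):
    """Map each column path to the set of encodings used across all row groups.

    `metadata` is expected to behave like pyarrow.parquet.FileMetaData
    (num_row_groups, row_group(i), and per-chunk path_in_schema / encodings).
    """
    encodings = defaultdict(set)
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            chunk = row_group.column(col_index)
            # chunk.encodings is a tuple of encoding names, e.g. ("PLAIN", "RLE")
            encodings[chunk.path_in_schema].update(chunk.encodings)
    return dict(encodings)


def encodings_in_file(path):
    """Hypothetical helper: inspect the footer of the Parquet file at `path`.

    Assumption: reading the footer does not decode any data pages, so it
    should not hit the "Unsupported encoding" error.
    """
    import pyarrow.parquet as pq  # imported lazily; only needed for real files

    return collect_encodings(pq.ParquetFile(path).metadata)
```

Calling encodings_in_file on the Hive-generated file and comparing the result
with the table on the documentation page linked above should show which
columns use an encoding that the reader does not handle.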

Is there any plan to support these encodings in the near future? If not, I
would like to try implementing them myself. Any advice would be
appreciated!

Best regards,
Shan Huang
