Hi Shan,

These encodings are something we would like to support, but nobody has had the bandwidth to work on them, so contributions would be welcome.
I think there are already some open JIRAs on this, e.g. [1]. Note that these are tracked under the Parquet project in JIRA.

-Micah

[1] https://issues.apache.org/jira/browse/PARQUET-490

On Mon, May 31, 2021 at 3:13 AM Shan Huang <shanhuu...@gmail.com> wrote:

> Hi,
>
> PyArrow throws an exception when reading Parquet file generated from the
> version 2.0 of Parquet writer in Hive:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1732, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line 1610, in read
>     use_threads=use_threads
>   File "pyarrow/_dataset.pyx", line 458, in pyarrow._dataset.Dataset.to_table
>   File "pyarrow/_dataset.pyx", line 2889, in pyarrow._dataset.Scanner.to_table
>   File "pyarrow/error.pxi", line 141, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 112, in pyarrow.lib.check_status
> OSError: Not yet implemented: Unsupported encoding.
>
> I notice that there are several unsupported encodings as described in:
> https://arrow.apache.org/docs/cpp/parquet.html#encodings
>
> Is there any plan to support these encodings in the near future? If not, I
> would like to try to implement it by myself. Any advice would be
> appreciated!
>
> Best regards,
> Shan Huang