Hi Shan,
These encodings are something we would like to support, but nobody has had
the bandwidth to work on them, so contributions would be welcome.

I think there are already some open JIRAs on this, e.g. [1].  Note that these
are tracked under the Parquet project in JIRA.

-Micah

[1] https://issues.apache.org/jira/browse/PARQUET-490

On Mon, May 31, 2021 at 3:13 AM Shan Huang <shanhuu...@gmail.com> wrote:

> Hi,
>
> PyArrow throws an exception when reading a Parquet file generated by
> version 2.0 of the Parquet writer in Hive:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line
> 1732, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "/usr/local/lib/python3.7/site-packages/pyarrow/parquet.py", line
> 1610, in read
>     use_threads=use_threads
>   File "pyarrow/_dataset.pyx", line 458, in
> pyarrow._dataset.Dataset.to_table
>   File "pyarrow/_dataset.pyx", line 2889, in
> pyarrow._dataset.Scanner.to_table
>   File "pyarrow/error.pxi", line 141, in
> pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 112, in pyarrow.lib.check_status
> OSError: Not yet implemented: Unsupported encoding.
>
> I notice that there are several unsupported encodings as described in:
> https://arrow.apache.org/docs/cpp/parquet.html#encodings
>
> Is there any plan to support these encodings in the near future? If not, I
> would like to try to implement them myself. Any advice would be
> appreciated!
>
> Best regards,
> Shan Huang
>
