[ 
https://issues.apache.org/jira/browse/PARQUET-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358776#comment-16358776
 ] 

Luke Higgins commented on PARQUET-1122:
---------------------------------------

In our NiFi config we found how to set 
parquet.avro.write-old-list-structure to false (as Ryan Blue suggested above; 
I hadn't realized we had that option).  I can now get a little further:
{quote}>>> import pyarrow.parquet as pq

>>> pf = pq.ParquetFile(filename)

>>> pf.metadata
<pyarrow._parquet.FileMetaData object at 0x7f6bf3588f98>
 created_by: parquet-mr version 1.8.2 (build c6522788629e590a53eb79874b95f6c3ff11f16c)
 num_columns: 45
 num_rows: 223240
 num_row_groups: 1
 format_version: 1.0
 serialized_size: 8483

>>> table = pq.read_table(filename)

>>> table.to_pandas()
Empty DataFrame
Columns: [field1, ...field45]  # fields are listed in the output, suppressed here
Index: []

[0 rows x 45 columns]


{quote}
 So read_table no longer raises the error, but the converted DataFrame is empty, 
even though ParquetFile.metadata reports 223240 rows.

Any other thoughts on config changes?

 

> [C++] Support 2-level list encoding in Arrow decoding
> -----------------------------------------------------
>
>                 Key: PARQUET-1122
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1122
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>         Environment: centos 7.3, Anaconda 4.4.0 python 3.6.1
>            Reporter: Luke Higgins
>            Priority: Minor
>             Fix For: cpp-1.4.0
>
>
> While trying to read a Parquet file (written by NiFi) I am getting an error.
> code:
> import pyarrow.parquet as pq
> t = pq.read_table('test.parq')
> error:
> Traceback (most recent call last):
>   File "parquet_reader.py", line 2, in <module>
>     t = pq.read_table('test.parq')
>   File "/opt/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 823, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "/opt/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 119, in read
>     nthreads=nthreads)
>   File "pyarrow/_parquet.pyx", line 466, in pyarrow._parquet.ParquetReader.read_all (/arrow/python/build/temp.linux-x86_64-3.6/_parquet.cxx:9181)
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8115)
> pyarrow.lib.ArrowNotImplementedError: No support for reading columns of type list<array: string not null>



