[ 
https://issues.apache.org/jira/browse/ARROW-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193520#comment-16193520
 ] 

Wes McKinney commented on ARROW-1650:
-------------------------------------

What can you tell me about the Parquet implementation that produced this file? 
I see you said NiFi, but can you show me the written-by version string so we 
can see whether it's parquet-mr and, if so, which version? I believe that 
whoever wrote the file is using a deprecated list encoding. 
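
As a rough sketch of what to look for: the writer string in the file footer 
usually has the form "<application> version <version> (build ...)". The helper 
below is hypothetical (it is not part of pyarrow or parquet-cpp), and the 
example string is made up for illustration, not taken from the reporter's file.

```python
import re

# Hypothetical helper, not a pyarrow API: split a Parquet writer string
# into (application, version). Assumes the common
# "<application> version <version> (build ...)" layout.
_CREATED_BY = re.compile(r"^(?P<app>\S+) version (?P<version>\S+)")


def parse_created_by(created_by):
    """Return (application, version), or None if the string does not match."""
    match = _CREATED_BY.match(created_by)
    if match is None:
        return None
    return match.group("app"), match.group("version")


# With a recent pyarrow, the string would typically come from the footer:
#   import pyarrow.parquet as pq
#   created_by = pq.ParquetFile("test.parq").metadata.created_by
# The value below is an illustrative placeholder only.
example = "parquet-mr version 1.8.1 (build abc123)"
print(parse_created_by(example))
```

If the application part comes back as parquet-mr, the version tells us which 
writer behavior to expect.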

See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists

The schema shown here uses the 2-level list structure rather than the 3-level 
list encoding. We will need to implement support for this in parquet-cpp; I 
don't think it should be too hard, but I haven't seen a file like this in the 
wild in a while.
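
To make the difference concrete, here is a minimal sketch that models the two 
layouts and applies the first three backward-compatibility rules from 
LogicalTypes.md. The Node class, the list_encoding function, and the field 
names (my_list, array, list, element) are all assumptions for illustration; 
the repeated field name "array" is only a guess based on the type string in 
the error message, and none of this is parquet-cpp code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy stand-in for a Parquet schema node (hypothetical, not a real API)."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: List["Node"] = field(default_factory=list)  # empty for primitives

    @property
    def is_group(self):
        return bool(self.children)


def list_encoding(list_group):
    """Classify a LIST-annotated group as "2-level" or "3-level"."""
    if len(list_group.children) != 1:
        raise ValueError("a LIST group must have exactly one child")
    repeated = list_group.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the child of a LIST group must be repeated")
    # Rule 1: a repeated primitive is itself the element.
    if not repeated.is_group:
        return "2-level"
    # Rule 2: a repeated group with several fields is itself the element.
    if len(repeated.children) > 1:
        return "2-level"
    # Rule 3: a single-field group named "array" or "<list name>_tuple"
    # is itself the element.
    if repeated.name in ("array", list_group.name + "_tuple"):
        return "2-level"
    # Otherwise the repeated group is the middle layer of the 3-level form.
    return "3-level"


# Legacy layout:   optional group my_list (LIST) { repeated binary array; }
legacy = Node("my_list", "optional", [Node("array", "repeated")])

# Standard layout: optional group my_list (LIST) {
#                    repeated group list { optional binary element; } }
standard = Node(
    "my_list",
    "optional",
    [Node("list", "repeated", [Node("element", "optional")])],
)

print(list_encoding(legacy), list_encoding(standard))
```

In the legacy layout the repeated field is the element itself, so elements 
cannot be null; the 3-level form adds the middle group so that both the list 
and its elements can carry their own nullability.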

> No support for reading columns of type list<array: string not null>
> -------------------------------------------------------------------
>
>                 Key: ARROW-1650
>                 URL: https://issues.apache.org/jira/browse/ARROW-1650
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>         Environment: centos 7.3, Anaconda 4.4.0 python 3.6.1
>            Reporter: Luke Higgins
>            Priority: Minor
>             Fix For: 0.8.0
>
>
> While trying to read a Parquet file (written by NiFi), I am getting an error.
> code:
> import pyarrow.parquet as pq
> t = pq.read_table('test.parq')
> error:
> Traceback (most recent call last):
>   File "parquet_reader.py", line 2, in <module>
>     t = pq.read_table('test.parq')
>   File "/opt/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 823, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "/opt/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 119, in read
>     nthreads=nthreads)
>   File "pyarrow/_parquet.pyx", line 466, in pyarrow._parquet.ParquetReader.read_all (/arrow/python/build/temp.linux-x86_64-3.6/_parquet.cxx:9181)
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8115)
> pyarrow.lib.ArrowNotImplementedError: No support for reading columns of type list<array: string not null>



