[ https://issues.apache.org/jira/browse/ARROW-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193298#comment-16193298 ]
Luke Higgins edited comment on ARROW-1650 at 10/5/17 5:45 PM:
--------------------------------------------------------------
Sorry, not easily. Here is the schema of the file that fails to load, as printed by fastparquet:
{code}
>>> t = fastparquet.ParquetFile('test.parq')
>>> type(t)
<class 'fastparquet.api.ParquetFile'>
>>> print(t.schema)
- avro.mylog:
| - field0: BYTE_ARRAY, UTF8, REQUIRED
| - field1: BYTE_ARRAY, UTF8, REQUIRED
| - field2: INT64, REQUIRED
| - field3: BYTE_ARRAY, UTF8, REQUIRED
| - field4: BYTE_ARRAY, UTF8, REQUIRED
| - field5: INT64, REQUIRED
| - field6: BYTE_ARRAY, UTF8, REQUIRED
| - field7: INT64, REQUIRED
| - field8: BOOLEAN, OPTIONAL
| - field9: BOOLEAN, OPTIONAL
| - field10: INT64, REQUIRED
| - field11: BYTE_ARRAY, UTF8, OPTIONAL
| - field12: BYTE_ARRAY, UTF8, OPTIONAL
| - field13: BYTE_ARRAY, UTF8, OPTIONAL
| - field14: BYTE_ARRAY, UTF8, OPTIONAL
| - field15: BYTE_ARRAY, UTF8, OPTIONAL
| - field16: BYTE_ARRAY, UTF8, OPTIONAL
| - field17: INT64, OPTIONAL
| - field18: INT64, OPTIONAL
| - field19: INT64, OPTIONAL
| - field20: BYTE_ARRAY, UTF8, OPTIONAL
| - field21: INT64, OPTIONAL
| - field22: BYTE_ARRAY, UTF8, OPTIONAL
| - field23: LIST, REQUIRED
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field24: BYTE_ARRAY, UTF8, OPTIONAL
| - field25: BYTE_ARRAY, UTF8, OPTIONAL
| - field26: BOOLEAN, OPTIONAL
| - field27: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field28: BOOLEAN, OPTIONAL
| - field29: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field30: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field31: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field32: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field33: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field34: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field35: BYTE_ARRAY, UTF8, OPTIONAL
| - field36: INT64, OPTIONAL
| - field37: INT64, OPTIONAL
| - field38: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field39: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
| - field40: BOOLEAN, OPTIONAL
| - field41: BYTE_ARRAY, UTF8, OPTIONAL
| - field42: LIST, OPTIONAL
|   - array: BYTE_ARRAY, UTF8, REPEATED
  - field43: LIST, OPTIONAL
    - array: BYTE_ARRAY, UTF8, REPEATED
{code}
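The error message names the offending type, list<array: string not null>. In the schema above, that matches the LIST columns whose only child is a REPEATED UTF8 field called "array", i.e. the legacy two-level list layout (apparently produced by the Avro-based writer here). In that layout the repeated primitive is itself the list element, so an element can never be null. As a rough aid for spotting the affected columns, here is a minimal sketch that scans fastparquet's printed schema text for that pattern. It assumes the print(t.schema) format shown above; the function name find_legacy_string_lists is hypothetical, and the regex only approximates fastparquet's output rather than parsing it formally.

{code}
import re

# One line of fastparquet's schema printout, e.g. "| - field23: LIST, REQUIRED".
# The leading "|", spaces and "-" are tree decoration; group 1 is the field
# name and group 2 its comma-separated attributes.
LINE_RE = re.compile(r"^[|\s]*-\s*([^:]+):\s*(.*)$")


def find_legacy_string_lists(schema_text):
    """Map each LIST column whose child is a REPEATED UTF8 primitive to the
    Arrow-style type string, e.g. 'list<array: string not null>'.
    Hypothetical helper, written only to illustrate the schema pattern."""
    fields = []
    for line in schema_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            attrs = [a.strip() for a in m.group(2).split(",") if a.strip()]
            fields.append((m.group(1).strip(), attrs))

    result = {}
    # Children are printed directly after their parent group, so the entry
    # that follows a LIST column is its repeated child.
    for (name, attrs), (child, child_attrs) in zip(fields, fields[1:]):
        if (attrs[:1] == ["LIST"]
                and child_attrs[-1:] == ["REPEATED"]
                and "UTF8" in child_attrs):
            # Two-level layout: the REPEATED primitive is the element itself,
            # so every element is required ("not null" in Arrow terms).
            result[name] = f"list<{child}: string not null>"
    return result


if __name__ == "__main__":
    sample = "\n".join([
        "- avro.mylog:",
        "| - field22: BYTE_ARRAY, UTF8, OPTIONAL",
        "| - field23: LIST, REQUIRED",
        "|   - array: BYTE_ARRAY, UTF8, REPEATED",
        "  - field43: LIST, OPTIONAL",
        "    - array: BYTE_ARRAY, UTF8, REPEATED",
    ])
    print(find_legacy_string_lists(sample))
    # {'field23': 'list<array: string not null>', 'field43': 'list<array: string not null>'}
{code}

Because the check looks only at the line that follows each LIST column, it does not depend on the tree indentation. Run against the full schema above, it should flag field23, field27, field29 through field34, field38, field39, field42 and field43.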
> No support for reading columns of type list<array: string not null>
> -------------------------------------------------------------------
>
> Key: ARROW-1650
> URL: https://issues.apache.org/jira/browse/ARROW-1650
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Environment: CentOS 7.3, Anaconda 4.4.0, Python 3.6.1
> Reporter: Luke Higgins
> Priority: Minor
> Fix For: 0.8.0
>
>
> While trying to read a Parquet file (written by NiFi), I get the following error.
> Code:
> {code}
> import pyarrow.parquet as pq
> t = pq.read_table('test.parq')
> {code}
> Error:
> {code}
> Traceback (most recent call last):
>   File "parquet_reader.py", line 2, in <module>
>     t = pq.read_table('test.parq')
>   File "/opt/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 823, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "/opt/anaconda3/lib/python3.6/site-packages/pyarrow/parquet.py", line 119, in read
>     nthreads=nthreads)
>   File "pyarrow/_parquet.pyx", line 466, in pyarrow._parquet.ParquetReader.read_all (/arrow/python/build/temp.linux-x86_64-3.6/_parquet.cxx:9181)
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8115)
> pyarrow.lib.ArrowNotImplementedError: No support for reading columns of type list<array: string not null>
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)