[ 
https://issues.apache.org/jira/browse/PARQUET-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941857#comment-15941857
 ] 

Wes McKinney commented on PARQUET-918:
--------------------------------------

Since we want to support nested data reads in libparquet_arrow, it would make 
sense to write some test cases and fix the schema conversion. We won't be able 
to _read_ the data yet, so pruning unsupported columns also seems reasonable. 
Are you interested in putting together a patch? Thanks.

> FromParquetSchema API crashes on nested schemas
> -----------------------------------------------
>
>                 Key: PARQUET-918
>                 URL: https://issues.apache.org/jira/browse/PARQUET-918
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>    Affects Versions: cpp-1.0.0
>            Reporter: Itai Incze
>
> {{FromParquetSchema@src/parquet/arrow/schema.cc:276}} misbehaves: the second 
> version of the function uses its column_indices parameter as indices into the 
> direct fields of the schema root.
> This is a problem for Parquet files with nested schemas: the bug crashes the 
> process by accessing the fields vector out of bounds.
> This bug is masked by another bug in the first version of the 
> {{FromParquetSchema}} function, which constructs a complete indices list the 
> size of the number of schema fields (instead of the number of columns).
> The bug is triggered in many significant use cases, for example when using 
> the {{arrow::ReadTable}} API.
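
The index mismatch described in the report can be illustrated with a small, self-contained sketch. It does not use the parquet-cpp or Arrow classes: Node, CountLeaves, LeafToRootField, RootFieldsForColumns and ExampleSchema are hypothetical names invented for this illustration, and the mapping shown is only one plausible way to translate leaf column indices into root field indices, not the fix that was applied to FromParquetSchema.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy schema node (NOT the parquet-cpp schema classes).
// A node with no children stands for a primitive (leaf) column.
struct Node {
  std::string name;
  std::vector<Node> children;
};

// Number of leaf columns under a node; a primitive node counts as one.
std::size_t CountLeaves(const Node& node) {
  if (node.children.empty()) return 1;
  std::size_t total = 0;
  for (const Node& child : node.children) total += CountLeaves(child);
  return total;
}

// For each leaf column, in depth-first order, the index of the root field
// that contains it.
std::vector<std::size_t> LeafToRootField(const Node& root) {
  std::vector<std::size_t> mapping;
  for (std::size_t i = 0; i < root.children.size(); ++i) {
    mapping.insert(mapping.end(), CountLeaves(root.children[i]), i);
  }
  return mapping;
}

// Translate leaf column indices into root field indices, dropping
// duplicates while preserving first-seen order.
std::vector<std::size_t> RootFieldsForColumns(
    const Node& root, const std::vector<std::size_t>& column_indices) {
  const std::vector<std::size_t> mapping = LeafToRootField(root);
  std::vector<std::size_t> fields;
  for (std::size_t column : column_indices) {
    assert(column < mapping.size());  // a leaf index, not a root field index
    const std::size_t field = mapping[column];
    if (std::find(fields.begin(), fields.end(), field) == fields.end()) {
      fields.push_back(field);
    }
  }
  return fields;
}

// Three root fields (id, point, tag) but four leaf columns (id, x, y, tag).
Node ExampleSchema() {
  return Node{"schema",
              {Node{"id", {}},
               Node{"point", {Node{"x", {}}, Node{"y", {}}}},
               Node{"tag", {}}}};
}
```

In this toy schema the last leaf column has index 3, while the root has only three fields (valid indices 0 to 2). Using the leaf index directly on the root fields vector would therefore read out of bounds, which is the kind of access the report describes.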



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
