[ https://issues.apache.org/jira/browse/ARROW-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597846#comment-17597846 ]
Arthur Passos commented on ARROW-17459:
---------------------------------------

[~willjones127] Thank you for sharing this! While your `GetRecordBatchReader` suggestion works for the use case I shared, it won't work for this one. Are there any docs I could read to understand the internals of the Arrow library well enough to implement it? Any tips would be appreciated. The only thing that comes to mind right now is to somehow build a giant array with all the chunks, but that certainly has a set of implications.

> [C++] Support nested data conversions for chunked array
> --------------------------------------------------------
>
>                 Key: ARROW-17459
>                 URL: https://issues.apache.org/jira/browse/ARROW-17459
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Arthur Passos
>            Priority: Blocker
>
> `FileReaderImpl::ReadRowGroup` fails with "Nested data conversions not implemented for chunked array outputs". It fails on [ChunksToSingle](https://github.com/apache/arrow/blob/7f6b074b84b1ca519b7c5fc7da318e8d47d44278/cpp/src/parquet/arrow/reader.cc#L95).
> The data schema is:
> {code:java}
> optional group fields_map (MAP) = 217 {
>   repeated group key_value {
>     required binary key (STRING) = 218;
>     optional binary value (STRING) = 219;
>   }
> }
> fields_map.key_value.value -> Size In Bytes: 13243589  Size In Ratio: 0.20541047
> fields_map.key_value.key   -> Size In Bytes: 3008860   Size In Ratio: 0.046667963
> {code}
> Is there a way to work around this issue in the C++ library?
> In any case, I am willing to implement this, but I need some guidance. I am very new to Parquet (as in, I started reading about it yesterday).
>
> Probably related to: https://issues.apache.org/jira/browse/ARROW-10958

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
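[Editor's note] For context, below is a minimal sketch of the `GetRecordBatchReader` workaround discussed in the comment above: streaming record batches from the Parquet file instead of materializing a whole row group via `ReadRowGroup`, which is the path that hits `ChunksToSingle`. The file path, the batch-processing loop body, and the `ReadAsBatches` helper name are placeholders, and it assumes an Arrow C++ version around the time of this ticket (~9.x, where the `Status`/out-parameter overloads are still current). As the comment notes, this does not cover every use case.

{code:cpp}
// Sketch: read a Parquet file as a stream of record batches rather than
// one Table per row group, avoiding ChunksToSingle on nested columns.
#include <memory>
#include <numeric>
#include <string>
#include <vector>

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/reader.h>

arrow::Status ReadAsBatches(const std::string& path) {
  // Open the file and wrap it in a parquet::arrow::FileReader.
  ARROW_ASSIGN_OR_RAISE(auto infile, arrow::io::ReadableFile::Open(path));
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(
      parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));

  // Ask for a RecordBatchReader over all row groups.
  std::vector<int> row_groups(reader->num_row_groups());
  std::iota(row_groups.begin(), row_groups.end(), 0);
  std::unique_ptr<arrow::RecordBatchReader> batch_reader;
  ARROW_RETURN_NOT_OK(reader->GetRecordBatchReader(row_groups, &batch_reader));

  // Consume batches one at a time instead of a whole row group at once.
  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(batch_reader->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
    // ... process `batch` here (placeholder) ...
  }
  return arrow::Status::OK();
}
{code}

If smaller batches are needed, the batch size can be tuned through `parquet::ArrowReaderProperties::set_batch_size` when constructing the reader (e.g. via `parquet::arrow::FileReaderBuilder`). The "giant array" idea floated in the comment would presumably mean concatenating the chunks (e.g. with `arrow::Concatenate`), which, if I understand the reader correctly, would run back into the 2 GiB offset limit of 32-bit binary arrays that causes the column to decode into multiple chunks in the first place.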