This should be possible already, at least on git master, and perhaps also
in 2.0.0.  What problem are you encountering?
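
For reference, here is a minimal sketch of what I would expect to work
with the Python Dataset API: pass an explicit schema that uses the large
types, and the scanner should cast each fragment's physical columns to
those types while reading.  The file name and column names ("id",
"payload") are made up for the example, and whether the binary ->
large_binary cast is available depends on your Arrow version:

    import pyarrow as pa
    import pyarrow.dataset as ds

    # Request 64-bit offsets by declaring large types in the schema.
    schema = pa.schema([
        ("id", pa.int64()),
        ("payload", pa.large_binary()),  # instead of pa.binary()
    ])

    # The explicit schema asks the scanner to cast the Parquet
    # columns to these Arrow types as it reads them.
    dataset = ds.dataset("data.parquet", format="parquet", schema=schema)
    table = dataset.to_table()
    print(table.schema)

If the cast is not supported in your version, this should fail with an
explicit error rather than silently producing 32-bit offsets.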


On 09/01/2021 at 05:27, Steve Kim wrote:
> Is it possible to read Parquet columns into an Arrow schema that has
> variable-width types with 64-bit offsets (LargeBinary, LargeList, etc.)?
> 
> For my current use case, I prefer the large types because the data overflow
> 32-bit offsets, and it is easier to waste memory with 8 bytes per offset
> than it is to work with chunked arrays. (I need to access the Arrow buffers
> from Java, and the Java library does not yet provide a convenient
> abstraction for chunked arrays.)
> 
> I would like an option to use large types when reading Parquet files with
> the Dataset API. My feature request could be satisfied more generally by
> enabling users to specify type coercion/promotion when mapping Parquet
> types to Arrow types.
> 
> Are other users interested in this feature? Is anyone opposed?
> 
> Steve Kim
> 
