Re: [I] Report / blog on parquet metadata sizes for "large" (1000+) numbers of columns [arrow-rs]

via GitHub Fri, 17 May 2024 10:56:08 -0700


thinkharderdev commented on issue #5770:
URL: https://github.com/apache/arrow-rs/issues/5770#issuecomment-2118124262


   > > I don't think so because you need the column chunk metadata to actually
   > > read the column.
   >
   > https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L863-L867
   > `ColumnChunk` is necessary, but `ColumnChunkMeta` is `optional`. This might
   > make IO estimation a bit tricky, but could `ColumnChunkMeta` then be left
   > out of the footer? And you're right, because we need the column chunk
   > metadata to actually read the column.
   
   Yeah, you're right, and I think I just misunderstood the original comment (it
gets confusing between the Rust types in parquet and the Thrift structs defined
by the Parquet Thrift IDL :)). But I guess you could reduce the column chunk to
the bare minimum of just an offset and a path, and then do the decoding purely
based on the data page headers.
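
A rough sketch of that idea, to make it concrete. Everything here is hypothetical: `MinimalColumnChunk`, `ToyPageHeader`, `walk_pages` and `encode_page` are made-up names, and the 8-byte header layout is a toy stand-in for the real Thrift-encoded page header, not the parquet crate's API. It also assumes the reader already knows where the chunk ends (the `end` argument), which would have to come from somewhere other than the column chunk metadata.

```rust
// Hypothetical, simplified types for illustration only. These are NOT the
// parquet crate's types and NOT the real Thrift-encoded page header.

/// The "bare minimum" column chunk entry: just a path and an offset.
pub struct MinimalColumnChunk {
    pub path: Vec<String>,
    pub file_offset: usize,
}

/// Toy page header: 4-byte LE payload size, then 4-byte LE value count.
pub struct ToyPageHeader {
    pub compressed_size: usize,
    pub num_values: u32,
}

const HEADER_LEN: usize = 8;

/// Read a little-endian u32 at `pos`, or None if out of bounds.
fn read_u32_le(buf: &[u8], pos: usize) -> Option<u32> {
    let b = buf.get(pos..pos.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Walk the pages of one chunk using only the page headers.
/// `end` is the byte offset where the chunk stops (assumed to be known).
pub fn walk_pages(
    file: &[u8],
    chunk: &MinimalColumnChunk,
    end: usize,
) -> Option<Vec<ToyPageHeader>> {
    let mut pos = chunk.file_offset;
    let mut pages = Vec::new();
    while pos < end {
        let compressed_size = read_u32_le(file, pos)? as usize;
        let num_values = read_u32_le(file, pos.checked_add(4)?)?;
        // Each header says how far to skip to reach the next header.
        let next = pos.checked_add(HEADER_LEN)?.checked_add(compressed_size)?;
        if next > end || next > file.len() {
            return None; // page would run past the end of the chunk
        }
        pages.push(ToyPageHeader { compressed_size, num_values });
        pos = next;
    }
    Some(pages)
}

/// Build one toy page (header + payload), used to construct test input.
pub fn encode_page(num_values: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&num_values.to_le_bytes());
    out.extend_from_slice(payload);
    out
}
```

The point is only that each page header carries enough to find the next one, so the footer entry can shrink to an offset and a path. The cost is that page counts and sizes are not known until the chunk has been walked.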


