Re: [I] Report / blog on parquet metadata sizes for "large" (1000+) numbers of columns [arrow-rs]

via GitHub Mon, 20 May 2024 02:08:16 -0700


marcin-krystianc commented on issue #5770:
URL: https://github.com/apache/arrow-rs/issues/5770#issuecomment-2120014743


   > > that we built an entire separate system similar
   > 
   > My reading of https://github.com/G-Research/PalletJack is it is filling a 
similar role to a catalog like Hive, Deltalake or iceberg. It makes sense, at 
least to me, that applications would want to build additional metadata 
structures over the top of collections of parquet files, that are then 
optimised for their particular read/write workloads?
   
   PalletJack operates at a lower level than catalogs like Hive, Delta Lake or 
Iceberg. It is designed to work with individual parquet files, for the specific 
use case of "To be able to decode/parse only a minimum amount of parquet 
metadata to be able to read the requested sample of data from a parquet file". 
   
   


