marcin-krystianc commented on issue #5770: URL: https://github.com/apache/arrow-rs/issues/5770#issuecomment-2120014743
> > that we built an entire separate system similar > > My reading of https://github.com/G-Research/PalletJack is it is filling a similar role to a catalog like Hive, Deltalake or iceberg. It makes sense, at least to me, that applications would want to build additional metadata structures over the top of collections of parquet files, that are then optimised for their particular read/write workloads? PalletJack is on a lower level than catalogs like Hive, Deltalake or Iceberg. It is designed for use with a individual parquet files for a specific use case of "To be able to decode/parse only a minimum amount of parquet metadata to be able to read the requested sample of data from a parquet file". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
