tustvold commented on issue #5770: URL: https://github.com/apache/arrow-rs/issues/5770#issuecomment-2117079521
> we use parquet files with 100 row groups and 50k columns (and this is after the dataset has been split into many individual parquet files). What is worse, our use case is reading individual row groups and only subset of columns.

How big are these files? At 1MB per page (the recommended size) this would come to 5TB?!

For extremely sparse data, such as feature stores, I can't help wondering if MapArray or something similar would be a better way to encode this. You would lose the ability to do projection pushdown in the traditional sense, but maybe this is ok?
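Not part of the original comment, but to illustrate the idea: a minimal sketch of what a MapArray-based encoding could look like in arrow-rs, assuming string feature names as keys and `f64` values (the column/builder names here are hypothetical). Each row carries only the entries that are actually present, instead of 50k mostly-null columns.

```rust
use arrow::array::{Array, Float64Builder, MapBuilder, StringBuilder};

fn main() {
    // One map column instead of tens of thousands of sparse columns:
    // each row stores only the (feature name, value) pairs it actually has.
    let mut builder = MapBuilder::new(None, StringBuilder::new(), Float64Builder::new());

    // Row 0: two features present.
    builder.keys().append_value("feature_00017");
    builder.values().append_value(1.5);
    builder.keys().append_value("feature_49001");
    builder.values().append_value(-0.25);
    builder.append(true).unwrap();

    // Row 1: no features present (empty map).
    builder.append(true).unwrap();

    let map_array = builder.finish();
    assert_eq!(map_array.len(), 2);
}
```

The trade-off is the one noted above: readers can no longer prune individual features via column projection pushdown and must instead filter map entries after decoding, which may be acceptable when the data is extremely sparse.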
