Hi all,

The Uber Engineering Blog just published two articles about Apache Parquet: Cost Efficiency @ Scale in Big Data File Format <https://eng.uber.com/cost-efficiency-big-data/> and One Stone, Three Birds: Finer-Grained Encryption @ Apache Parquet™ <https://eng.uber.com/one-stone-three-birds-finer-grained-encryption-apache-parquet/>. Please check them out!

The first one is about how to use Parquet ZSTD, the Column Pruning (deletion) tool, Precision Reduction, Multi-Column Ordering, and the fast translation tool in Parquet to reduce storage space and improve cost efficiency. This project alone reduces storage size at the hundred-PB level, which is equivalent to several million dollars in savings per year.

The second one talks about using Apache Parquet's fine-grained encryption feature to solve three challenges: encryption, access control, and data retention! This wraps up the work we have done with the community over the last 3 years around Parquet Modular Encryption. I would like to thank Gidon for his continued collaboration with us!

If you have any questions about the blog posts, feel free to reach out!

Xinli Shang
Tech Lead Manager at Uber Data Infra
VP Apache Parquet, PMC Chair
