Hi all,

The Uber Engineering Blog just published two articles about Apache Parquet:
Cost Efficiency @ Scale in Big Data File Format
<https://eng.uber.com/cost-efficiency-big-data/> and One Stone, Three
Birds: Finer-Grained Encryption @ Apache Parquet™
<https://eng.uber.com/one-stone-three-birds-finer-grained-encryption-apache-parquet/>.
Please check them out!


The first one covers how we use ZSTD compression, a column pruning (deletion)
tool, precision reduction, multi-column ordering, and a fast translation tool
in Parquet to reduce storage space and improve cost efficiency. This project
alone saves storage at the hundred-PB scale, which is equivalent to several
million dollars in savings per year.

The second one describes how we use Apache Parquet's fine-grained encryption
feature to solve three challenges: encryption, access control, and data
retention. It wraps up the work we have done with the community over the
last 3 years on Parquet Modular Encryption. I would like to thank Gidon
for his continuous collaboration with us!

If you have any questions about the blog posts, feel free to reach out!

Xinli Shang

Tech Lead Manager at Uber Data Infra

VP Apache Parquet PMC Chair
