Two blogs about Apache Parquet were just published on the Uber EngBlog site

Xinli Shang Fri, 11 Mar 2022 07:43:59 -0800

Hi all,

The Uber EngBlog site just published two articles about Apache Parquet: Cost
Efficiency @ Scale in Big Data File Format
<https://eng.uber.com/cost-efficiency-big-data/> and One Stone, Three
Birds: Finer-Grained Encryption @ Apache Parquet™
<https://eng.uber.com/one-stone-three-birds-finer-grained-encryption-apache-parquet/>.
Please check them out!



The first one describes how we use Parquet ZSTD compression, a Column Pruning
(deletion) tool, Precision Reduction, Multi-Column Ordering, and a fast
translation tool in Parquet to reduce storage space and improve cost
efficiency. This project alone saves storage at the hundred-PB level, which is
equivalent to several million dollars in savings per year.

The second one describes how we use Apache Parquet's finer-grained encryption
feature to solve three challenges: encryption, access control, and data
retention. This wraps up the work we have done with the community over the
last 3 years on Parquet Modular Encryption. I would like to thank Gidon
for his continued collaboration with us!

If you have any questions about the blogs, feel free to reach out!

Xinli Shang

Tech Lead Manager at Uber Data Infra

VP Apache Parquet PMC Chair
