The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project

Sally Khudairi Thu, 04 Jun 2020 06:06:40 -0700

[this announcement is available online at https://s.apache.org/odtwv ]


Open Source data lake technology for stream processing on top of Apache Hadoop 
in use at Alibaba, Tencent, Uber, and more.

Wakefield, MA --4 June 2020-- The Apache Software Foundation (ASF), the 
all-volunteer developers, stewards, and incubators of more than 350 Open Source 
projects and initiatives, announced today Apache® Hudi™ as a Top-Level Project 
(TLP).

Apache Hudi (Hadoop Upserts Deletes and Incrementals) data lake technology 
enables stream processing on top of Apache Hadoop compatible cloud stores & 
distributed file systems. The project was originally developed at Uber in 2016 
(code-named and pronounced "Hoodie"), open-sourced in 2017, and submitted to 
the Apache Incubator in January 2019.

"Learning and growing the Apache way in the incubator was a rewarding 
experience," said Vinoth Chandar, Vice President of Apache Hudi. "As a 
community, we are humbled by how far we have advanced the project together, 
while at the same time, excited about the challenges ahead."

Apache Hudi is used to manage petabyte-scale data lakes using stream processing 
primitives like upserts and incremental change streams on Apache Hadoop 
Distributed File System (HDFS) or cloud stores. Hudi data lakes provide fresh 
data while being an order of magnitude more efficient than traditional batch 
processing. Features include:

 - Upsert/Delete support with fast, pluggable indexing
 - Transactionally commit/rollback data
 - Change capture from Hudi tables for stream processing
 - Support for Apache Hive, Apache Spark, Apache Impala and Presto query engines
 - Built-in data ingestion tool supporting Apache Kafka, Apache Sqoop and other 
   common data sources
 - Optimize query performance by managing file sizes and storage layout
 - Fast row-based ingestion format with async compaction into columnar format
 - Timeline metadata for audit tracking
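
As a rough illustration of the two primitives named above, upserts and 
incremental change streams, the following is a minimal in-memory sketch in 
Python. It does not use Hudi's API: the ToyTable class, its method names, and 
the integer commit counter are all hypothetical and exist only to show the 
idea of keyed writes plus "give me what changed since commit N" reads.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a keyed table (not Hudi's API)."""

    # Maps record key -> (commit time of last write, value).
    rows: dict = field(default_factory=dict)
    commit_time: int = 0

    def upsert(self, records: dict) -> int:
        """Insert new keys and overwrite existing ones as one commit."""
        self.commit_time += 1
        for key, value in records.items():
            self.rows[key] = (self.commit_time, value)
        return self.commit_time

    def incremental(self, since: int) -> dict:
        """Return only the records written after commit `since`."""
        return {k: v for k, (t, v) in self.rows.items() if t > since}


table = ToyTable()
c1 = table.upsert({"trip-1": "requested", "trip-2": "requested"})
c2 = table.upsert({"trip-2": "completed", "trip-3": "requested"})

# A downstream job that last read at commit c1 sees only what changed since.
changes = table.incremental(since=c1)
```

Here the second commit updates one existing key and adds one new key, and the 
incremental read returns just those two records rather than the whole table, 
which is the property that lets downstream jobs avoid full batch rescans.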

Apache Hudi is in use at organizations such as Alibaba Group, EMIS Health, 
Linknovate, Tathastu.AI, Tencent, and Uber, and is supported as part of Amazon 
EMR by Amazon Web Services. A partial list of those deploying Hudi is available 
at https://hudi.apache.org/docs/powered_by.html

"We are very pleased to see Apache Hudi graduate to an Apache Top-Level 
Project. Apache Hudi is supported in Amazon EMR release 5.28 and higher, and 
enables customers with data in Amazon S3 data lakes to perform record-level 
inserts, updates, and deletes for privacy regulations, change data capture 
(CDC), and simplified data pipeline development," said Rahul Pathak, General 
Manager, Analytics, AWS. "We look forward to working with our customers and the 
Apache Hudi community to help advance the project."

"At Uber, Hudi powers one of the largest transactional data lakes on the planet 
in near real time to provide meaningful experiences to users worldwide," said 
Nishith Agarwal, member of the Apache Hudi Project Management Committee. "With 
over 150 petabytes of data and more than 500 billion records ingested per day, 
Uber’s use cases range from business critical workflows to analytics and 
machine learning."

"Using Apache Hudi, end-users can handle either read-heavy or write-heavy use 
cases, and Hudi will manage the underlying data stored on HDFS/COS/CHDFS using 
Apache Parquet and Apache Avro," said Felix Zheng, Lead of Cloud Real-Time 
Computing Service Technology at Tencent.

"As cloud infrastructure becomes more sophisticated, data analysis and 
computing solutions gradually begin to build data lake platforms based on cloud 
object storage and computing resources," said Li Wei, Technical Lead on Data 
Lake Analytics, at Alibaba Cloud. "Apache Hudi is a very good incremental 
storage engine that helps users manage the data in the data lake in an open way 
and accelerate users' computing and analysis."

"Apache Hudi is a key building block for the Hopsworks Feature Store, providing 
versioned features, incremental and atomic updates to features, and indexed 
time-travel queries for features," said Jim Dowling, CEO/Co-Founder at Logical 
Clocks. "The graduation of Hudi to a top-level Apache project is also the 
graduation of the open-source data lake from its earlier data swamp incarnation 
to a modern ACID-enabled, enterprise-ready data platform."

"Hudi's graduation to a top-level Apache project is a result of the efforts of 
many dedicated contributors in the Hudi community," said Jennifer Anderson, 
Senior Director of Platform Engineering at Uber. "Hudi is critical to the 
performance and scalability of Uber's big data infrastructure. We're excited to 
see it gain traction and achieve this major milestone."

"Thus far, Hudi has started a meaningful discussion in the industry about the 
wide gaps between data warehouses and data lakes. We have also taken strides to 
bridge some of them, with the help of the Apache community," added Chandar. 
"But, we are only getting started with our deeply technical roadmap. We 
certainly look forward to a lot more contributions and collaborations from the 
community to get there. Everyone’s invited!"

Catch Apache Hudi in action at Virtual Berlin Buzzwords 7-12 June 2020, as well 
as at MeetUps, and other events.

Availability and Oversight
Apache Hudi software is released under the Apache License v2.0 and is overseen 
by a self-selected team of active contributors to the project. A Project 
Management Committee (PMC) guides the Project's day-to-day operations, 
including community development and product releases. For downloads, 
documentation, and ways to become involved with Apache Hudi, visit 
http://hudi.apache.org/ and https://twitter.com/apachehudi

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases 
wishing to become part of the efforts at The Apache Software Foundation. All 
code donations from external organizations and existing external projects enter 
the ASF through the Incubator to: 1) ensure all donations are in accordance 
with the ASF legal standards; and 2) develop new communities that adhere to our 
guiding principles. Incubation is required of all newly accepted projects until 
a further review indicates that the infrastructure, communications, and 
decision making process have stabilized in a manner consistent with other 
successful ASF projects. While incubation status is not necessarily a 
reflection of the completeness or stability of the code, it does indicate that 
the project has yet to be fully endorsed by the ASF. For more information, 
visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 
leading Open Source projects, including Apache HTTP Server --the world's most 
popular Web server software. Through the ASF's meritocratic process known as 
"The Apache Way," more than 813 individual Members and 7,800 Committers across 
six continents successfully collaborate to develop freely available 
enterprise-grade software, benefiting millions of users worldwide: thousands of 
software solutions are distributed under the Apache License; and the community 
actively participates in ASF mailing lists, mentoring initiatives, and 
ApacheCon, the Foundation's official user conference, trainings, and expo. The 
ASF is a US 501(c)(3) charitable organization, funded by individual donations 
and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web 
Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, 
Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Inspur, 
Leaseweb, Microsoft, Pineapple Fund, Red Hat, Target, Tencent, Union 
Investment, Verizon Media, and Workday. For more information, visit 
http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hudi", "Apache Hudi", "Hadoop", 
"Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the 
Apache Software Foundation in the United States and/or other countries. All 
other brands and trademarks are the property of their respective owners.

= = =
