>> this announcement is available online at http://s.apache.org/L0H


Open Source storage format for the Apache™ Hadoop® ecosystem in use at 
Cloudera, NASA, Netflix, Stripe, and Twitter, among other organizations 

Forest Hill, MD --27 April 2015-- The Apache Software Foundation (ASF), the 
all-volunteer developers, stewards, and incubators of more than 350 Open Source 
projects and initiatives, announced today that Apache™ Parquet™ has graduated 
from the Apache Incubator to become a Top-Level Project (TLP), signifying that 
the project's community and products have been well-governed under the ASF's 
meritocratic process and principles. 

"The incubation process at Apache has been fantastic and really the last step 
of making Parquet a community-driven standard fully integrated within the 
greater Hadoop ecosystem," said Julien Le Dem, Vice President of Apache 
Parquet. 

Apache Parquet is an Open Source columnar storage format for the Apache™ 
Hadoop® ecosystem, built to work across programming languages as well as with a 
broad range of:

 - processing frameworks (MapReduce, Apache Spark, Scalding, Cascading, Crunch, 
   Kite) 
 - data models (Apache Avro, Apache Thrift, Protocol Buffers, POJOs) 
 - query engines (Apache Hive, Impala, HAWQ, Apache Drill, Apache Tajo, Apache 
   Pig, Presto, Apache Spark SQL) 


"At Twitter, Parquet has helped us scale our big data usage by, in some cases, 
reducing storage requirements on large datasets by one third, as well as scan 
and deserialization time. This translated into hardware savings as well as 
reduced latency for accessing the data. Furthermore, Parquet's integration 
with so many tools creates opportunities and flexibility in the choice of query 
engines," said Chris Aniszczyk, Head of Open Source at Twitter. "Finally, it's 
just fantastic to see it graduate to a top-level project and we look forward to 
further collaborating with the Apache Parquet community to continually improve 
performance." 

"Parquet's integration with other object models, like Avro and Thrift, has been 
a key feature for our customers," said Ryan Blue, Software Engineer at 
Cloudera. "They can take advantage of columnar storage without changing the 
classes they already use in their production applications." 

"At Netflix, Parquet is the primary storage format for data warehousing. More 
than 7 petabytes of our 10+ petabyte warehouse is Parquet-formatted data that 
we query across a wide range of tools including Apache Hive, Apache Pig, Apache 
Spark, PigPen, Presto, and native MapReduce. The performance benefit of 
columnar projection and statistics is a game changer for our big data 
platform," said Daniel Weeks, Software Engineer at Netflix. "We look forward to 
working with the Apache community to advance the state of big data storage with 
Parquet and are excited to see the project graduate to full Apache status." 

"Stripe's data warehouse has been built on Parquet from the beginning," said 
Avi Bryant, Engineering Manager at Stripe. "Every aspect of our pipeline, from 
data import to machine learning to ad hoc SQL analysis, uses Apache Parquet as 
the common interchange format." 

"I was extremely happy to see Parquet arrive as an Incubator project," said 
Chris Mattmann, Apache Parquet Incubator Mentor, and Chief Architect, 
Instrument and Science Data Systems Section at NASA Jet Propulsion Laboratory. 
"After talking with some members of its community, it was clear that this 
columnar data format technology and its community were a real match for the way 
that we do things here at the ASF. Parquet has had an exemplary Incubation, and 
the project 
has big things ahead of it. I am encouraging my Data Science Team at NASA to 
evaluate it for data representation especially as it relates to our science 
holdings in Earth, planetary and space sciences, and astrophysics." 

Catch Apache Parquet in action at the Hadoop Summit, 9-11 June 2015 in San 
Jose, California. The Apache Parquet project welcomes contributions and 
community participation through mailing lists, face-to-face meetups, and user 
events. For more information, visit http://parquet.apache.org/community/ 

Availability and Oversight 
Apache Parquet software is released under the Apache License v2.0 and is 
overseen by a self-selected team of active contributors to the project. A 
Project Management Committee (PMC) guides the Project's day-to-day operations, 
including community development and product releases. For downloads, 
documentation, and ways to become involved with Apache Parquet, visit 
http://parquet.apache.org/ and https://twitter.com/ApacheParquet 

About the Apache Incubator 
The Apache Incubator is the entry path for projects and codebases wishing to 
become part of the efforts at The Apache Software Foundation. All code 
donations from external organizations and existing external projects wishing to 
join the ASF enter through the Incubator to: 1) ensure all donations are in 
accordance with the ASF's legal standards; and 2) develop new communities that 
adhere to our guiding principles. Incubation is required of all newly accepted 
projects until a further review indicates that the infrastructure, 
communications, and decision-making process have stabilized in a manner 
consistent with other successful ASF projects. While incubation status is not 
necessarily a reflection of the completeness or stability of the code, it does 
indicate that the project has yet to be fully endorsed by the ASF. For more 
information, visit http://incubator.apache.org/. 

About The Apache Software Foundation (ASF) 
Established in 1999, the all-volunteer Foundation oversees more than 350 
leading Open Source projects, including Apache HTTP Server --the world's most 
popular Web server software. Through the ASF's meritocratic process known as 
"The Apache Way," more than 500 individual Members and 4,500 Committers 
successfully collaborate to develop freely available enterprise-grade software, 
benefiting millions of users worldwide: thousands of software solutions are 
distributed under the Apache License; and the community actively participates 
in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's 
official user conference, trainings, and expo. The ASF is a US 501(c)(3) 
charitable organization, funded by individual donations and corporate sponsors 
including Bloomberg, Budget Direct, Cerner, Citrix, Cloudera, Comcast, 
Facebook, Google, Hortonworks, HP, IBM, InMotion Hosting, iSigma, Matt 
Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more 
information, visit http://www.apache.org/ or follow @TheASF on Twitter. 

© The Apache Software Foundation. "Apache", "Avro", "Apache Avro", "Drill", 
"Apache Drill", "Hadoop", "Apache Hadoop", "Parquet", "Apache Parquet", "Pig", 
"Apache Pig", "Spark", "Apache Spark", "Thrift", "Apache Thrift", and 
"ApacheCon" are registered trademarks or trademarks of the Apache Software 
Foundation in the United States and/or other countries. All other brands and 
trademarks are the property of their respective owners. 

# # #
