The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project

Sally Khudairi Wed, 17 Feb 2016 11:29:21 -0800

 >> this announcement is available online at https://s.apache.org/5Mc8


Open source Big Data in-memory columnar layer accelerates analytical processing 
and interchange by more than 100x. 

Forest Hill, MD --17 Feb 2016-- The Apache Software Foundation (ASF), the 
all-volunteer developers, stewards, and incubators of more than 350 Open Source 
projects and initiatives, announced today Apache Arrow as a new Top-Level 
Project. 

A high-performance cross-system data layer for columnar in-memory analytics, 
Apache Arrow provides the following benefits for Big Data workloads: 

 - Accelerates the performance of analytical workloads by more than 100x in 
   some cases 
 - Enables multi-system workloads by eliminating cross-system communication 
   overhead 

Initially seeded by code from the Apache Drill project, Apache Arrow was built 
on top of a number of Open Source collaborations, and establishes a de-facto 
standard for columnar in-memory processing and interchange. 

"The Open Source community has joined forces on Apache Arrow," said Jacques 
Nadeau, Vice President of Apache Arrow and Vice President of Apache Drill. 
"Developers from 13 major Open Source Big Data projects are already on board 
--by introducing a new era of columnar in-memory analytics, we anticipate the 
majority of the world's data will be processed through Arrow within the next 
few years." 

Code committers to Apache Arrow include developers from Apache Big Data 
projects Calcite, Cassandra, Drill, Hadoop, HBase, Impala, Kudu (incubating), 
Parquet, Phoenix, Spark, and Storm as well as established and emerging Open 
Source projects such as Pandas and Ibis. 

"Arrow's cross platform and cross system strengths will enable Python and R to 
become first-class languages across the entire Big Data stack," said Wes 
McKinney, creator of Pandas. 

Apache Arrow accelerates analytical processing by providing a high performance 
columnar in-memory representation. A number of processing algorithms benefit 
greatly from this memory design. 
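
As a rough illustration of what "columnar" means here, the sketch below stores the same hypothetical records first row by row and then column by column. It uses only the Python standard library and is not Arrow's API or its actual memory format; the record fields and variable names are invented for the example.

```python
from array import array

# Hypothetical records stored row by row (one dict per record).
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.0},
    {"id": 3, "price": 7.25},
]

# The same data stored column by column: each column is one contiguous,
# fixed-width buffer holding a single type.
columns = {
    "id": array("q", (r["id"] for r in rows)),        # 64-bit signed integers
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}

# An aggregate over one column reads only that column's buffer and never
# touches the "id" values.
total_price = sum(columns["price"])
```

Because every value in a column has the same width and sits next to its neighbours in memory, a scan over one column is the kind of access pattern that vectorized hardware instructions are built for.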

"A columnar in-memory data layer enables systems and applications to process 
data at full hardware speeds," said Todd Lipcon, original Apache Kudu creator 
and member of the Apache Arrow Project Management Committee. "Modern CPUs are 
designed to exploit data-level parallelism via vectorized operations and SIMD 
instructions. Arrow facilitates such processing." 

In many workloads, 70-80% of CPU cycles are spent serializing and deserializing 
data. Arrow solves this problem by enabling data to be shared between systems 
and processes with no serialization, deserialization or memory copies. 
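
The idea of sharing data without serialization or copies can be sketched, by analogy only, with Python's standard buffer protocol. This is an in-process illustration under that assumption, not Arrow's interchange mechanism; the variable names are hypothetical.

```python
from array import array

# A contiguous buffer of 64-bit floats owned by one component.
values = array("d", [1.0, 2.0, 3.0, 4.0])

# A memoryview exposes the existing buffer to another component;
# no bytes are copied or encoded.
view = memoryview(values)
window = view[1:3]  # slicing a memoryview is also copy-free

# A write through the original is visible through the view, which shows
# that both names refer to the same memory.
values[1] = 20.0
shared = window.tolist()
```

A consumer that understands the layout can read the producer's buffer directly, so the cost of handing data over does not grow with the size of the data.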

"An industry-standard columnar in-memory data layer enables users to combine 
multiple systems, applications and programming languages in a single workload 
without the usual overhead," said Ted Dunning, Vice President of the Apache 
Incubator and member of the Apache Arrow Project Management Committee. 

In addition to traditional relational data, Arrow supports complex data with 
dynamic schemas. For example, Arrow can handle JSON data which is commonly used 
in IoT workloads, modern applications and log files. Implementations are also 
available (or underway) for a number of programming languages including Java, 
C++ and Python to allow greater interoperability among a number of Big Data 
solutions. 

"Real world use cases often include complex combinations of structured and 
rapidly growing complex-data. Already tested with Apache Drill, the efficient 
in-memory columnar representation and processing in Arrow will enable users to 
enjoy the performance of columnar processing with the flexibility of JSON," 
said Parth Chandra, member of the Apache Drill and Apache Arrow Project 
Management Committees. 

Catch Apache Arrow in action at Strata + Hadoop World (San Jose: 30 March 2016, 
and London: 1-3 June 2016), as well as at upcoming MeetUps and local events: 
http://arrow.apache.org/events 

Availability and Oversight 
Apache Arrow software is released under the Apache License v2.0 and is overseen 
by a self-selected team of active contributors to the project. A Project 
Management Committee (PMC) guides the Project's day-to-day operations, 
including community development and product releases. For downloads, 
documentation, and ways to become involved with Apache Arrow, visit 
http://arrow.apache.org/ 

About The Apache Software Foundation (ASF) 
Established in 1999, the all-volunteer Foundation oversees more than 350 
leading Open Source projects, including Apache HTTP Server --the world's most 
popular Web server software. Through the ASF's meritocratic process known as 
"The Apache Way," more than 550 individual Members and 5,300 Committers 
successfully collaborate to develop freely available enterprise-grade software, 
benefiting millions of users worldwide: thousands of software solutions are 
distributed under the Apache License; and the community actively participates 
in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's 
official user conference, trainings, and expo. The ASF is a US 501(c)(3) 
charitable organization, funded by individual donations and corporate sponsors 
including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, 
Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, 
InMotion Hosting, iSigma, LeaseWeb, Matt Mullenweg, Microsoft, PhoenixNAP, 
Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, 
WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or 
follow @TheASF on Twitter. 

© The Apache Software Foundation. "Apache", "Apache Arrow", "Arrow", "Apache 
Calcite", "Calcite", "Apache Cassandra", "Cassandra", "Apache Drill", "Drill", 
"Apache Hadoop", "Hadoop", "Apache HBase", "HBase", "Apache Impala", "Impala", 
"Apache Kudu (incubating)", "Kudu (incubating)", "Apache Parquet", "Parquet", 
"Apache Phoenix", "Phoenix", "Apache Spark", "Spark", "Apache Storm", "Storm", 
"ApacheCon", and their logos are registered trademarks or trademarks of The 
Apache Software Foundation in the U.S. and/or other countries. All other brands 
and trademarks are the property of their respective owners. 

# # # 


