wesm commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r445134010
##########
File path: _includes/header.html
##########
@@ -50,22 +33,44 @@
</a>
<div class="dropdown-menu"
aria-labelledby="navbarDropdownDocumentation">
<a class="dropdown-item" href="{{ site.baseurl }}/docs">Project Docs</a>
- <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
+ <a class="dropdown-item" href="{{ site.baseurl }}/docs/format/Columnar.html">Specification</a>
+ <hr/>
+ <a class="dropdown-item" href="{{ site.baseurl }}/docs/c_glib">C GLib</a>
<a class="dropdown-item" href="{{ site.baseurl }}/docs/cpp">C++</a>
+ <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>
+ <a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>
<a class="dropdown-item" href="{{ site.baseurl }}/docs/java">Java</a>
- <a class="dropdown-item" href="{{ site.baseurl }}/docs/c_glib">C GLib</a>
<a class="dropdown-item" href="{{ site.baseurl }}/docs/js">JavaScript</a>
+ <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>
+ <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
<a class="dropdown-item" href="{{ site.baseurl }}/docs/r">R</a>
+ <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>
+ <a class="dropdown-item" href="https://docs.rs/crate/arrow/">Rust</a>
+ </div>
+ </li>
+ <li class="nav-item dropdown">
+ <a class="nav-link dropdown-toggle" href="#"
+ id="navbarDropdownCommunity" role="button" data-toggle="dropdown"
+ aria-haspopup="true" aria-expanded="false">
+ Community
+ </a>
+ <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
+ <a class="dropdown-item" href="{{ site.baseurl }}/community/">Mailing Lists</a>
Review comment:
"Communications"?
##########
File path: _includes/header.html
##########
@@ -50,22 +33,44 @@
</a>
<div class="dropdown-menu"
aria-labelledby="navbarDropdownDocumentation">
<a class="dropdown-item" href="{{ site.baseurl }}/docs">Project Docs</a>
- <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
+ <a class="dropdown-item" href="{{ site.baseurl }}/docs/format/Columnar.html">Specification</a>
Review comment:
"Columnar Format"?
##########
File path: community.md
##########
@@ -0,0 +1,73 @@
+---
+layout: default
+title: Apache Arrow Community
+description: Links and resources for participating in Apache Arrow
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Apache Arrow Community
+
+We welcome participation from everyone and encourage you to join us, ask
questions, and get involved.
+
+All participation in the Apache Arrow project is governed by the Apache
Software Foundation's [code of
conduct](https://www.apache.org/foundation/policies/conduct.html).
+
+## Questions?
+
+### Mailing lists
+
+These arrow.apache.org mailing lists are for project discussion:
+
+<ul>
+ <li> <code>user@</code> is for questions on using Apache Arrow libraries {%
include mailing_list_links.html list="user" %} </li>
+ <li> <code>dev@</code> is for discussions about contributing to the project
development {% include mailing_list_links.html list="dev" %} </li>
+</ul>
+
+When emailing one of the lists, you may want to prefix the subject line with
one or more tags, like `[C++] why did this segfault?`, `[Python] trouble with
wheels`, etc., so that the appropriate people in the community notice the
message.
+
+You may also wish to subscript to these lists, which capture some activity
streams:
Review comment:
subscribe
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
Review comment:
Let's link to the versioning backward/forward compatibility guarantees
in the docs
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
Review comment:
Here's a reframing -- I have been encouraging us to move away from
creating a false equivalence between "Apache Arrow The Project" and the "Arrow
Columnar Format". So anyplace where someone might say "Arrow _is_ the columnar
format" we should correct them to say that "Arrow _contains_ a columnar
format". Please edit / wordsmith as desired
Apache Arrow is a software development platform for building high
performance applications that process and transport large data sets. It is
designed to improve both the performance of analytical algorithms and the
efficiency of moving data from one system (or programming language) to another.

A critical component of Apache Arrow is its **in-memory columnar format**, a
standardized, language-agnostic data structure specification for representing
structured, table-like datasets in memory. This data format has a rich data
type system (including nested and user-defined data types) designed to support
the needs of analytic database systems, data frame libraries, and more. The
project contains many implementations of the Arrow columnar format, along with
utilities for reading and writing it to many common storage formats.

We do not anticipate that many third-party projects will choose to implement
the Arrow columnar format themselves, instead choosing to depend on one of the
official libraries. For projects that want to implement a small subset of the
format, we have created some tools (like a C data interface) to assist with
interoperability with the official Arrow libraries.

The Arrow libraries contain many software components that assist with
systems problems related to getting data in and out of remote storage systems
and moving Arrow-formatted data over network interfaces. Some of these
components can be used even in scenarios where the columnar format is not used
at all.

Lastly, alongside software that helps with data access and IO-related
issues, there are libraries of algorithms for performing analytical operations
or queries against Arrow datasets.
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
Review comment:
I think this para can be removed as of 1.0.0
##########
File path: _layouts/home.html
##########
@@ -0,0 +1,21 @@
+{% include top.html %}
+
+<body class="wrap">
+ <header>
+ {% include header.html %}
+ </header>
+ <div class="big-arrow-bg">
+ <div class="container p-lg-4 centered">
+ <img src="{{ site.baseurl }}/img/arrow-inverse.png" style="max-width:
80%;"/>
Review comment:
Smaller is also better imho
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
Review comment:
I don't think we need to hedge regarding people storing Arrow data on
disk starting with 1.0.0. We should state explicitly here, however, that we
don't intend for Arrow to be a replacement for Parquet (an exceedingly common
question) and, where relevant, that the columnar format makes trade-offs to
support the performance requirements of in-memory analytics over purely
file-storage considerations. Parquet is not a "runtime in-memory format":
file formats almost always have to be deserialized into some in-memory data
structure for processing, and we intend for Arrow to be that in-memory data
structure
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation
matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
-The Arrow in-memory format is considered stable, and we intend to make only
backwards-compatible changes, such as additional data types. We do not yet
recommend the Arrow file format for long-term disk persistence of data; that
said, it is perfectly acceptable to write Arrow memory to disk for purposes of
memory mapping and caching.
+## Getting involved
-We encourage people to start building Arrow-based in-memory computing
applications now, and choose a suitable file format for disk storage if
necessary. The Arrow libraries include adapters for several file formats,
including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl
}}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be
done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue
tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to
[[email protected]](mailto:[email protected]).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques. It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes. Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications. Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
-In short, Parquet files are designed for disk storage, while Arrow is designed
for in-memory use, but you can put it on disk and then memory-map later. Arrow
and Parquet are intended to be compatible with each other and used together in
applications.
+### What about "Arrow files" then?
-Parquet is a columnar file format for data serialization. Reading a Parquet
file requires decompressing and decoding its contents into some kind of
in-memory data structure. It is designed to be space/IO-efficient at the
expensive CPU utilization for decoding. It does not provide any data structures
for in-memory computing. Parquet is a streaming format which must be decoded
from start-to-end; while some "index page" facilities have been added to the
storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file
format".
-Arrow on the other hand is first and foremost a library providing columnar
data structures for *in-memory computing*. When you read a Parquet file, you
can decompress and decode the data *into* Arrow columnar data structures so
that you can then perform analytics in-memory on the decoded data. The Arrow
columnar format has some nice properties: random access is O(1) and each value
cell is next to the previous and following one in memory, so it's efficient to
iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation. Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
-What about "Arrow files" then? Apache Arrow defines a binary "serialization"
protocol for arranging a collection of Arrow columnar arrays (called a "record
batch") that can be used for messaging and interprocess communication. You can
put the protocol anywhere, including on disk, which can later be memory-mapped
or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
-This Arrow protocol is designed so that you can "map" a blob of Arrow data
without doing any deserialization, so performing analytics on Arrow protocol
data on disk can use memory-mapping and pay effectively zero cost. The protocol
is used for many other things as well, such as streaming data between Spark SQL
and Python for running pandas functions against chunks of Spark SQL data (these
are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+ you write a file today, you can expect that any system that says they can
+ "read Parquet" will be able to read the file in 5 years or 10 years.
+ We are not yet making this assertion about long-term stability of the Arrow
+ format.
-In some applications, Parquet and Arrow can be used interchangeably for
on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
Review comment:
"expensive" is in the eye of the beholder. How about "requires
efficient, but relatively complex decoding"
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation
matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
-The Arrow in-memory format is considered stable, and we intend to make only
backwards-compatible changes, such as additional data types. We do not yet
recommend the Arrow file format for long-term disk persistence of data; that
said, it is perfectly acceptable to write Arrow memory to disk for purposes of
memory mapping and caching.
+## Getting involved
-We encourage people to start building Arrow-based in-memory computing
applications now, and choose a suitable file format for disk storage if
necessary. The Arrow libraries include adapters for several file formats,
including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl
}}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be
done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue
tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to
[[email protected]](mailto:[email protected]).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques. It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes. Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications. Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
-In short, Parquet files are designed for disk storage, while Arrow is designed
for in-memory use, but you can put it on disk and then memory-map later. Arrow
and Parquet are intended to be compatible with each other and used together in
applications.
+### What about "Arrow files" then?
-Parquet is a columnar file format for data serialization. Reading a Parquet
file requires decompressing and decoding its contents into some kind of
in-memory data structure. It is designed to be space/IO-efficient at the
expensive CPU utilization for decoding. It does not provide any data structures
for in-memory computing. Parquet is a streaming format which must be decoded
from start-to-end; while some "index page" facilities have been added to the
storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file
format".
-Arrow on the other hand is first and foremost a library providing columnar
data structures for *in-memory computing*. When you read a Parquet file, you
can decompress and decode the data *into* Arrow columnar data structures so
that you can then perform analytics in-memory on the decoded data. The Arrow
columnar format has some nice properties: random access is O(1) and each value
cell is next to the previous and following one in memory, so it's efficient to
iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation. Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
-What about "Arrow files" then? Apache Arrow defines a binary "serialization"
protocol for arranging a collection of Arrow columnar arrays (called a "record
batch") that can be used for messaging and interprocess communication. You can
put the protocol anywhere, including on disk, which can later be memory-mapped
or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
-This Arrow protocol is designed so that you can "map" a blob of Arrow data
without doing any deserialization, so performing analytics on Arrow protocol
data on disk can use memory-mapping and pay effectively zero cost. The protocol
is used for many other things as well, such as streaming data between Spark SQL
and Python for running pandas functions against chunks of Spark SQL data (these
are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+ you write a file today, you can expect that any system that says they can
+ "read Parquet" will be able to read the file in 5 years or 10 years.
+ We are not yet making this assertion about long-term stability of the Arrow
+ format.
Review comment:
"We are not yet making this assertion about long-term stability of the
Arrow format."
--> "While the Arrow on-disk format is stable and will be readable by future
versions of the libraries, it is not intended for long-term archival storage."
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation
matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
-The Arrow in-memory format is considered stable, and we intend to make only
backwards-compatible changes, such as additional data types. We do not yet
recommend the Arrow file format for long-term disk persistence of data; that
said, it is perfectly acceptable to write Arrow memory to disk for purposes of
memory mapping and caching.
+## Getting involved
-We encourage people to start building Arrow-based in-memory computing
applications now, and choose a suitable file format for disk storage if
necessary. The Arrow libraries include adapters for several file formats,
including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl
}}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be
done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue
tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to
[[email protected]](mailto:[email protected]).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques. It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes. Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications. Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
-In short, Parquet files are designed for disk storage, while Arrow is designed
for in-memory use, but you can put it on disk and then memory-map later. Arrow
and Parquet are intended to be compatible with each other and used together in
applications.
+### What about "Arrow files" then?
-Parquet is a columnar file format for data serialization. Reading a Parquet
file requires decompressing and decoding its contents into some kind of
in-memory data structure. It is designed to be space/IO-efficient at the
expensive CPU utilization for decoding. It does not provide any data structures
for in-memory computing. Parquet is a streaming format which must be decoded
from start-to-end; while some "index page" facilities have been added to the
storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file
format".
-Arrow on the other hand is first and foremost a library providing columnar
data structures for *in-memory computing*. When you read a Parquet file, you
can decompress and decode the data *into* Arrow columnar data structures so
that you can then perform analytics in-memory on the decoded data. The Arrow
columnar format has some nice properties: random access is O(1) and each value
cell is next to the previous and following one in memory, so it's efficient to
iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation. Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
-What about "Arrow files" then? Apache Arrow defines a binary "serialization"
protocol for arranging a collection of Arrow columnar arrays (called a "record
batch") that can be used for messaging and interprocess communication. You can
put the protocol anywhere, including on disk, which can later be memory-mapped
or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
-This Arrow protocol is designed so that you can "map" a blob of Arrow data
without doing any deserialization, so performing analytics on Arrow protocol
data on disk can use memory-mapping and pay effectively zero cost. The protocol
is used for many other things as well, such as streaming data between Spark SQL
and Python for running pandas functions against chunks of Spark SQL data (these
are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says it can
+ "read Parquet" will be able to read the file in 5 years or 10 years.
+ We are not yet making this assertion about long-term stability of the Arrow
+ format.
-In some applications, Parquet and Arrow can be used interchangeably for
on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+ Arrow IPC files is just a matter of transferring raw bytes from the storage
+ hardware.
Review comment:
Instead of "just a matter of transferring raw bytes from the storage
hardware." how about the more precise statement "reading Arrow IPC files does
not involve any decoding because the on-disk representation is the same as the
in-memory representation."
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
Review comment:
perhaps merge this with some of the thoughts above
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
-* Parquet is intended for "archival" purposes, meaning if you write a file
today, we expect that any system that says they can "read Parquet" will be able
to read the file in 5 years or 7 years. We are not yet making this assertion
about long-term stability of the Arrow format.
-* Parquet is generally a lot more expensive to read because it must be decoded
into some other data structure. Arrow protocol data can simply be memory-mapped.
-* Parquet files are often much smaller than Arrow-protocol-on-disk because of
the data encoding schemes that Parquet uses. If your disk storage or network is
slow, Parquet may be a better choice.
+* Parquet files are often much smaller than Arrow IPC files because of the
+ elaborate encoding schemes that Parquet uses. If your disk storage or network
+ is slow, Parquet may be a better choice even for short-term storage or
caching.
+
+### What about the "Feather" file format?
+
+The Feather v1 format started as a separate specification, but the Feather v2
+format is just another, easier-to-remember name for the Arrow IPC file format.
### How does Arrow relate to Flatbuffers?
-Flatbuffers is a domain-agnostic low-level building block for binary data
formats. It cannot be used directly for data analysis tasks without a lot of
manual scaffolding. Arrow is a data layer aimed directly at the needs of data
analysis, providing elaborate data types (including extensible logical types),
built-in support for "null" values (a.k.a "N/A"), and an expanding toolbox of
I/O and computing facilities.
+Flatbuffers is a low-level building block for binary data serialization.
+It is not suited to the representation of large, structured, homogeneous
+data, and does not sit at the right abstraction layer for data analysis tasks.
+
+Arrow is a data layer aimed directly at the needs of data analysis, providing
+elaborate data types (including extensible logical types), built-in support
Review comment:
Use a more neutral word than "elaborate". How about, "providing a
comprehensive collection of data types required for analytics" or something
similar
##########
File path: index.html
##########
@@ -1,72 +1,62 @@
---
-layout: default
+layout: home
---
-<div class="jumbotron">
- <h1>Apache Arrow</h1>
- <p class="lead">A cross-language development platform for in-memory
data</p>
- <p>
- <a class="btn btn-lg btn-success" style="white-space: normal;"
href="mailto:[email protected]" role="button">Join Mailing List</a>
- <a class="btn btn-lg btn-primary" style="white-space: normal;"
href="{{ site.baseurl }}/install/" role="button">Install
({{site.data.versions['current'].number}} Release -
{{site.data.versions['current'].date}})</a>
- </p>
-</div>
-<h5>
- Interested in contributing?
- <small class="text-muted">Join the <a
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing
list</strong></a> or check out the <a
href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer
wiki</strong></a>.</small>
-</h5>
-<h5>
- <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
- {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
<div class="row">
<div class="col-lg-4">
- <h2 class="mt-3">Fast</h2>
- <p>Apache Arrow™ enables execution engines to take advantage of
the latest SIMD (Single instruction, multiple data) operations included in
modern processors, for native vectorized optimization of analytical data
processing. Columnar layout is optimized for data locality for better
performance on modern hardware like CPUs and GPUs.</p>
- <p>The Arrow memory format supports <strong>zero-copy reads</strong> for
lightning-fast data access without serialization overhead.</p>
+ <h2 class="mt-3">Format</h2>
+ <p>Apache Arrow defines a language-independent columnar memory format
for flat and hierarchical data, organized for efficient analytic operations on
modern hardware like CPUs and GPUs. The Arrow memory format also supports
<strong>zero-copy reads</strong> for lightning-fast data access without
serialization overhead.</p>
+ <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the
design or
+ <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the
specification</a>.</p>
</div>
<div class="col-lg-4">
- <h2 class="mt-3">Flexible</h2>
- <p>Arrow acts as a new high-performance interface between various
systems. It is also focused on supporting a wide variety of industry-standard
programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R,
Ruby, and Rust implementations are in progress and more languages are welcome.
+ <h2 class="mt-3">Libraries</h2>
+ <p>The Arrow project includes libraries that implement the memory
specification in many languages. They enable you to use the Arrow format as an
efficient means of sharing data across languages and processes. Libraries are
available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{
site.baseurl }}/docs/cpp/">C++</a>, <a
href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a
href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{
site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl
}}/docs/js/">JavaScript</a>, <a
href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>,
<a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl
}}/docs/r/">R</a>, <a
href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and
<a href="https://docs.rs/crate/arrow/">Rust</a>.
Review comment:
Arrow's libraries provide building blocks for creating high performance
analytics applications. The libraries implement the Arrow columnar format and
address a wide spectrum of problems related to data access, in-memory data
management, and analytical query processing.
##########
File path: index.html
##########
@@ -1,72 +1,62 @@
</p>
+ See <a href="{{ site.baseurl }}/install/">how to install</a> and <a
href="{{ site.baseurl }}/getting_started/">get started</a>.
</div>
<div class="col-lg-4">
- <h2 class="mt-3">Standard</h2>
- <p>Apache Arrow is backed by key developers of 13 major open source
projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala,
Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto
standard for columnar in-memory analytics.</p>
- <p>Learn more about projects that are <a href="{{ site.baseurl
}}/powered_by/">Powered By Apache Arrow</a></p>
+ <h2 class="mt-3">Applications</h2>
+ <p>Arrow libraries provide a foundation for developers to build fast
analytics applications. <a href="{{ site.baseurl }}/powered_by/">Many popular
projects</a> use Arrow to ship columnar data efficiently or as the basis for
analytic engines.
+ <p>The libraries also include built-in features for working with data
directly, including Parquet file reading and querying large datasets. See more
Arrow <a href="{{ site.baseurl }}/use_cases/">use cases</a>.</p>
</div>
</div>
-<hr />
+
+<h1>Why Arrow?</h1>
Review comment:
"Why use the Arrow Columnar Format?"
##########
File path: index.html
##########
@@ -1,72 +1,62 @@
- <p>Learn more about projects that are <a href="{{ site.baseurl
}}/powered_by/">Powered By Apache Arrow</a></p>
+ <h2 class="mt-3">Applications</h2>
Review comment:
Ecosystem?
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
-The Arrow in-memory format is considered stable, and we intend to make only
backwards-compatible changes, such as additional data types. We do not yet
recommend the Arrow file format for long-term disk persistence of data; that
said, it is perfectly acceptable to write Arrow memory to disk for purposes of
memory mapping and caching.
+## Getting involved
-We encourage people to start building Arrow-based in-memory computing
applications now, and choose a suitable file format for disk storage if
necessary. The Arrow libraries include adapters for several file formats,
including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl
}}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be
done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue
tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, create one.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to
[[email protected]](mailto:[email protected]).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques. It is ideal when you want to
+minimize disk usage while storing gigabytes of data or more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes. Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications. Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
-In short, Parquet files are designed for disk storage, while Arrow is designed
for in-memory use, but you can put it on disk and then memory-map later. Arrow
and Parquet are intended to be compatible with each other and used together in
applications.
+### What about "Arrow files" then?
-Parquet is a columnar file format for data serialization. Reading a Parquet
file requires decompressing and decoding its contents into some kind of
in-memory data structure. It is designed to be space/IO-efficient at the
expensive CPU utilization for decoding. It does not provide any data structures
for in-memory computing. Parquet is a streaming format which must be decoded
from start-to-end; while some "index page" facilities have been added to the
storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file
format".
-Arrow on the other hand is first and foremost a library providing columnar
data structures for *in-memory computing*. When you read a Parquet file, you
can decompress and decode the data *into* Arrow columnar data structures so
that you can then perform analytics in-memory on the decoded data. The Arrow
columnar format has some nice properties: random access is O(1) and each value
cell is next to the previous and following one in memory, so it's efficient to
iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation. Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
Review comment:
+1
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+### Why create a new standard?
+
+<!-- Fill this in -->
Review comment:
       Traditionally, data processing engine developers have created custom
data structures to represent datasets in memory while they are being processed.
Given the "custom" nature of these data structures, they must also develop
serialization interfaces to convert between these data structures and different
file formats, network wire protocols, database clients, and other data
transport interfaces. The net result is an incredible amount of waste,
both in developer time and in CPU cycles spent serializing data from one format
to another.
Therefore, the rationale for Arrow's in-memory columnar data format is to
provide an out-of-the-box solution to several interrelated problems:
* A general purpose tabular data representation that is highly efficient to
process on modern hardware while also being suitable for a wide spectrum of use
cases. We believe that fewer and fewer systems will create their own data
structures and will instead simply use Arrow.
* Supports both random access and streaming / scan-based workloads.
* A standardized memory format facilitates reuse of libraries of algorithms.
When custom in-memory data formats are used, common algorithms must often be
rewritten to target those custom data formats.
   * Systems that use or support Arrow can transfer data between one another at
little-to-no cost. This results in a radical reduction of serialization
overhead in analytical workloads, which can often represent 80-90%
of computing costs.
* The language-agnostic design of the Arrow format enables systems written
in different programming languages (even running on the JVM) to communicate
datasets without serialization overhead. For example, a Java application can
call a C or C++ algorithm on data that originated in the JVM.
... probably some other stuff can be added here
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
Review comment:
"Apache Arrow"
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
-What about "Arrow files" then? Apache Arrow defines a binary "serialization"
protocol for arranging a collection of Arrow columnar arrays (called a "record
batch") that can be used for messaging and interprocess communication. You can
put the protocol anywhere, including on disk, which can later be memory-mapped
or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
-This Arrow protocol is designed so that you can "map" a blob of Arrow data
without doing any deserialization, so performing analytics on Arrow protocol
data on disk can use memory-mapping and pay effectively zero cost. The protocol
is used for many other things as well, such as streaming data between Spark SQL
and Python for running pandas functions against chunks of Spark SQL data (these
are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says it can
+ "read Parquet" will be able to read the file in 5 years or 10 years.
+ We are not yet making this assertion about long-term stability of the Arrow
+ format.
-In some applications, Parquet and Arrow can be used interchangeably for
on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+ Arrow IPC files is just a matter of transferring raw bytes from the storage
+ hardware.
-* Parquet is intended for "archival" purposes, meaning if you write a file
today, we expect that any system that says they can "read Parquet" will be able
to read the file in 5 years or 7 years. We are not yet making this assertion
about long-term stability of the Arrow format.
-* Parquet is generally a lot more expensive to read because it must be decoded
into some other data structure. Arrow protocol data can simply be memory-mapped.
-* Parquet files are often much smaller than Arrow-protocol-on-disk because of
the data encoding schemes that Parquet uses. If your disk storage or network is
slow, Parquet may be a better choice.
+* Parquet files are often much smaller than Arrow IPC files because of the
+ elaborate encoding schemes that Parquet uses. If your disk storage or network
+ is slow, Parquet may be a better choice even for short-term storage or
caching.
+
+### What about the "Feather" file format?
+
+The Feather v1 format started as a separate specification, but the Feather v2
+format is just another, easier-to-remember name for the Arrow IPC file format.
### How does Arrow relate to Flatbuffers?
-Flatbuffers is a domain-agnostic low-level building block for binary data
formats. It cannot be used directly for data analysis tasks without a lot of
manual scaffolding. Arrow is a data layer aimed directly at the needs of data
analysis, providing elaborate data types (including extensible logical types),
built-in support for "null" values (a.k.a "N/A"), and an expanding toolbox of
I/O and computing facilities.
+Flatbuffers is a low-level building block for binary data serialization.
+It is not adapted to the representation of large, structured, homogeneous
+data, and does not sit at the right abstraction layer for data analysis tasks.
+
+Arrow is a data layer aimed directly at the needs of data analysis, providing
+elaborate data types (including extensible logical types), built-in support
+for "null" values (representing missing data), and an expanding toolbox of I/O
+and computing facilities.
-The Arrow file format does use Flatbuffers under the hood to facilitate
low-level metadata serialization. However, Arrow data has much richer semantics
than Flatbuffers data.
+The Arrow file format does use Flatbuffers under the hood to facilitate
low-level
+metadata serialization, but the Arrow data format uses its own representation
Review comment:
maybe "to serialize schemas and other metadata needed to implement the
Arrow binary IPC protocol"
##########
File path: getting_started.md
##########
@@ -0,0 +1,74 @@
+---
+layout: default
+title: Getting started
+description: Links to user guides to help you start using Arrow
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Getting started
+
+This page collects resources and guides for using Arrow in all of the
project's languages.
+For reference on official release packages, see the
+[install page]({{ site.baseurl }}/install/).
+
+## C
+
+Glib
Review comment:
TODO
##########
File path: index.html
##########
@@ -1,72 +1,62 @@
---
-layout: default
+layout: home
---
-<div class="jumbotron">
- <h1>Apache Arrow</h1>
- <p class="lead">A cross-language development platform for in-memory
data</p>
- <p>
- <a class="btn btn-lg btn-success" style="white-space: normal;"
href="mailto:[email protected]" role="button">Join Mailing List</a>
- <a class="btn btn-lg btn-primary" style="white-space: normal;"
href="{{ site.baseurl }}/install/" role="button">Install
({{site.data.versions['current'].number}} Release -
{{site.data.versions['current'].date}})</a>
- </p>
-</div>
-<h5>
- Interested in contributing?
- <small class="text-muted">Join the <a
href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing
list</strong></a> or check out the <a
href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer
wiki</strong></a>.</small>
-</h5>
-<h5>
- <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
- {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
<div class="row">
<div class="col-lg-4">
- <h2 class="mt-3">Fast</h2>
- <p>Apache Arrow™ enables execution engines to take advantage of
the latest SIMD (Single instruction, multiple data) operations included in
modern processors, for native vectorized optimization of analytical data
processing. Columnar layout is optimized for data locality for better
performance on modern hardware like CPUs and GPUs.</p>
- <p>The Arrow memory format supports <strong>zero-copy reads</strong> for
lightning-fast data access without serialization overhead.</p>
+ <h2 class="mt-3">Format</h2>
+ <p>Apache Arrow defines a language-independent columnar memory format
for flat and hierarchical data, organized for efficient analytic operations on
modern hardware like CPUs and GPUs. The Arrow memory format also supports
<strong>zero-copy reads</strong> for lightning-fast data access without
serialization overhead.</p>
+ <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the
design or
+ <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the
specification</a>.</p>
</div>
<div class="col-lg-4">
- <h2 class="mt-3">Flexible</h2>
- <p>Arrow acts as a new high-performance interface between various
systems. It is also focused on supporting a wide variety of industry-standard
programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R,
Ruby, and Rust implementations are in progress and more languages are welcome.
+ <h2 class="mt-3">Libraries</h2>
+ <p>The Arrow project includes libraries that implement the memory
specification in many languages. They enable you to use the Arrow format as an
efficient means of sharing data across languages and processes. Libraries are
available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{
site.baseurl }}/docs/cpp/">C++</a>, <a
href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a
href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{
site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl
}}/docs/js/">JavaScript</a>, <a
href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>,
<a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl
}}/docs/r/">R</a>, <a
href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and
<a href="https://docs.rs/crate/arrow/">Rust</a>.
</p>
+ See <a href="{{ site.baseurl }}/install/">how to install</a> and <a
href="{{ site.baseurl }}/getting_started/">get started</a>.
</div>
<div class="col-lg-4">
- <h2 class="mt-3">Standard</h2>
- <p>Apache Arrow is backed by key developers of 13 major open source
projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala,
Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto
standard for columnar in-memory analytics.</p>
- <p>Learn more about projects that are <a href="{{ site.baseurl
}}/powered_by/">Powered By Apache Arrow</a></p>
+ <h2 class="mt-3">Applications</h2>
+ <p>Arrow libraries provide a foundation for developers to build fast
analytics applications. <a href="{{ site.baseurl }}/powered_by/">Many popular
projects</a> use Arrow to ship columnar data efficiently or as the basis for
analytic engines.</p>
+ <p>The libraries also include built-in features for working with data
directly, including reading Parquet files and querying large datasets. See more
Arrow <a href="{{ site.baseurl }}/use_cases/">use cases</a>.</p>
Review comment:
I would say to condense the 2nd and 3rd points here and change this 3rd
one to be about the ecosystem/community
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow
format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation
matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
-The Arrow in-memory format is considered stable, and we intend to make only
backwards-compatible changes, such as additional data types. We do not yet
recommend the Arrow file format for long-term disk persistence of data; that
said, it is perfectly acceptable to write Arrow memory to disk for purposes of
memory mapping and caching.
+## Getting involved
-We encourage people to start building Arrow-based in-memory computing
applications now, and choose a suitable file format for disk storage if
necessary. The Arrow libraries include adapters for several file formats,
including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl
}}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be
done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue
tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to
[[email protected]](mailto:[email protected]).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques. It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes. Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications. Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
-In short, Parquet files are designed for disk storage, while Arrow is designed
for in-memory use, but you can put it on disk and then memory-map later. Arrow
and Parquet are intended to be compatible with each other and used together in
applications.
+### What about "Arrow files" then?
-Parquet is a columnar file format for data serialization. Reading a Parquet
file requires decompressing and decoding its contents into some kind of
in-memory data structure. It is designed to be space/IO-efficient at the
expensive CPU utilization for decoding. It does not provide any data structures
for in-memory computing. Parquet is a streaming format which must be decoded
from start-to-end; while some "index page" facilities have been added to the
storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
-Arrow on the other hand is first and foremost a library providing columnar
data structures for *in-memory computing*. When you read a Parquet file, you
can decompress and decode the data *into* Arrow columnar data structures so
that you can then perform analytics in-memory on the decoded data. The Arrow
columnar format has some nice properties: random access is O(1) and each value
cell is next to the previous and following one in memory, so it's efficient to
iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation. Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
-What about "Arrow files" then? Apache Arrow defines a binary "serialization"
protocol for arranging a collection of Arrow columnar arrays (called a "record
batch") that can be used for messaging and interprocess communication. You can
put the protocol anywhere, including on disk, which can later be memory-mapped
or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
-This Arrow protocol is designed so that you can "map" a blob of Arrow data
without doing any deserialization, so performing analytics on Arrow protocol
data on disk can use memory-mapping and pay effectively zero cost. The protocol
is used for many other things as well, such as streaming data between Spark SQL
and Python for running pandas functions against chunks of Spark SQL data (these
are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+ you write a file today, you can expect that any system that says it can
+ "read Parquet" will be able to read the file in 5 years or 10 years.
+ We are not yet making this assertion about long-term stability of the Arrow
+ format.
-In some applications, Parquet and Arrow can be used interchangeably for
on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+ Arrow IPC files is just a matter of transferring raw bytes from the storage
+ hardware.
-* Parquet is intended for "archival" purposes, meaning if you write a file
today, we expect that any system that says they can "read Parquet" will be able
to read the file in 5 years or 7 years. We are not yet making this assertion
about long-term stability of the Arrow format.
-* Parquet is generally a lot more expensive to read because it must be decoded
into some other data structure. Arrow protocol data can simply be memory-mapped.
-* Parquet files are often much smaller than Arrow-protocol-on-disk because of
the data encoding schemes that Parquet uses. If your disk storage or network is
slow, Parquet may be a better choice.
+* Parquet files are often much smaller than Arrow IPC files because of the
+ elaborate encoding schemes that Parquet uses. If your disk storage or network
+ is slow, Parquet may be a better choice even for short-term storage or
+ caching.
+
+### What about the "Feather" file format?
+
+The Feather v1 format started as a separate specification, but the Feather v2
+format is just another, easier to remember name for the Arrow IPC file format.
Review comment:
"started as a separate specification" -> "was a simplified custom
container for writing a subset of the Arrow format to disk prior to the
development of the Arrow IPC file format. "Feather version 2" is now exactly
the Arrow IPC file format and we have retained the "Feather" name and APIs for
backwards compatibility."
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
-The Arrow in-memory format is considered stable, and we intend to make only
backwards-compatible changes, such as additional data types. We do not yet
recommend the Arrow file format for long-term disk persistence of data; that
said, it is perfectly acceptable to write Arrow memory to disk for purposes of
memory mapping and caching.
+## Getting involved
-We encourage people to start building Arrow-based in-memory computing
applications now, and choose a suitable file format for disk storage if
necessary. The Arrow libraries include adapters for several file formats,
including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, create it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [[email protected]](mailto:[email protected]).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques. It is ideal when you want to
+minimize disk usage while storing gigabytes of data or more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes. Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications. Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
-In short, Parquet files are designed for disk storage, while Arrow is designed
for in-memory use, but you can put it on disk and then memory-map later. Arrow
and Parquet are intended to be compatible with each other and used together in
applications.
+### What about "Arrow files" then?
-Parquet is a columnar file format for data serialization. Reading a Parquet
file requires decompressing and decoding its contents into some kind of
in-memory data structure. It is designed to be space/IO-efficient at the
expensive CPU utilization for decoding. It does not provide any data structures
for in-memory computing. Parquet is a streaming format which must be decoded
from start-to-end; while some "index page" facilities have been added to the
storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
-Arrow on the other hand is first and foremost a library providing columnar
data structures for *in-memory computing*. When you read a Parquet file, you
can decompress and decode the data *into* Arrow columnar data structures so
that you can then perform analytics in-memory on the decoded data. The Arrow
columnar format has some nice properties: random access is O(1) and each value
cell is next to the previous and following one in memory, so it's efficient to
iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation. Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
-What about "Arrow files" then? Apache Arrow defines a binary "serialization"
protocol for arranging a collection of Arrow columnar arrays (called a "record
batch") that can be used for messaging and interprocess communication. You can
put the protocol anywhere, including on disk, which can later be memory-mapped
or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
-This Arrow protocol is designed so that you can "map" a blob of Arrow data
without doing any deserialization, so performing analytics on Arrow protocol
data on disk can use memory-mapping and pay effectively zero cost. The protocol
is used for many other things as well, such as streaming data between Spark SQL
and Python for running pandas functions against chunks of Spark SQL data (these
are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+ you write a file today, you can expect that any system that says it can
+ "read Parquet" will be able to read the file in 5 years or 10 years.
+ We are not yet making this assertion about long-term stability of the Arrow
+ format.
-In some applications, Parquet and Arrow can be used interchangeably for
on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+ Arrow IPC files is just a matter of transferring raw bytes from the storage
+ hardware.
-* Parquet is intended for "archival" purposes, meaning if you write a file
today, we expect that any system that says they can "read Parquet" will be able
to read the file in 5 years or 7 years. We are not yet making this assertion
about long-term stability of the Arrow format.
-* Parquet is generally a lot more expensive to read because it must be decoded
into some other data structure. Arrow protocol data can simply be memory-mapped.
-* Parquet files are often much smaller than Arrow-protocol-on-disk because of
the data encoding schemes that Parquet uses. If your disk storage or network is
slow, Parquet may be a better choice.
+* Parquet files are often much smaller than Arrow IPC files because of the
+ elaborate encoding schemes that Parquet uses. If your disk storage or network
Review comment:
"elaborate" seems a bit emotionally charged to me, let's use something
more neutral and precise
"elaborate encoding schemes" -> "columnar data compression strategies"
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
Review comment:
Maybe "columnar format and protocol"
##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
Review comment:
"Why define a standard for columnar in-memory?"
There can't be a new standard if there isn't an old one. There never was
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]