nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438953875
##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
Review comment:
Why not?
##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+Some implementations of Arrow are more complete and more stable than others.
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
+
+## Getting involved
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+### I have some questions. How can I get help?
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [[email protected]](mailto:[email protected]).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+Parquet files are designed for disk storage, while Arrow is designed for in-memory use,
+though you can put it on disk and then memory-map later. Arrow and Parquet are
+intended to be compatible with each other and used together in applications.
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expense of CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques. It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on, and it must be decoded in
+large chunks.
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes. Arrow data is not compressed (or only lightly so,
Review comment:
IPC files do support lz4/zstd compression, at least in the C++
implementation
##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
# Frequently Asked Questions
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard. The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types). The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format. It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework. <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types. It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+Some implementations of Arrow are more complete and more stable than others.
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
Review comment:
This should probably link to the published docs page (which, of course,
doesn't exist today but will when this goes live).