This is an automated email from the ASF dual-hosted git repository.
ianmcook pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-experiments.git
The following commit(s) were added to refs/heads/main by this push:
new dfd45f9 Improve READMEs and organization (#46)
dfd45f9 is described below
commit dfd45f94a37fba0bb02b9bea6094291405f429ea
Author: Ian Cook <[email protected]>
AuthorDate: Fri Jan 24 06:55:48 2025 -0500
Improve READMEs and organization (#46)
---
README.md | 6 ++++++
data/README.md | 12 ++++++------
http/README.md | 15 ++++++++++++++-
http/get_simple/README.md | 2 +-
http/get_simple/matlab/{ => client}/README.md | 0
http/get_simple/matlab/{ => client}/client.m | 0
6 files changed, 27 insertions(+), 8 deletions(-)
diff --git a/README.md b/README.md
index 881a188..3ee88da 100644
--- a/README.md
+++ b/README.md
@@ -20,3 +20,9 @@
# Apache Arrow Experiments
This repository is for collaborative prototyping and research in the Apache
Arrow project.
+
+| Directory | Contents |
+| --------- | -------- |
+| **[data](./data)** | Various datasets that are used by the experiments in this repository or intended to be used in future Arrow experiments |
+| **[dissociated-ipc](./dissociated-ipc)** | Reference example implementation of the experimental [Arrow Dissociated IPC Protocol](https://arrow.apache.org/docs/dev/format/DissociatedIPC.html) |
+| **[http](./http)** | Examples demonstrating ways of sending and receiving data in Arrow IPC stream format (IANA media type `application/vnd.apache.arrow.stream`) over HTTP APIs |
diff --git a/data/README.md b/data/README.md
index f598c4a..3b6229b 100644
--- a/data/README.md
+++ b/data/README.md
@@ -19,13 +19,13 @@
# Apache Arrow Data Experiments
-This subdirectory contains experimental Arrow data whose purpose has not
-yet become clear but may be useful in the future. This currently includes
-data used to generate compelling examples that is more realistic than
-generated data or the testing data found in
+This directory contains various datasets that are used by the experiments
+in this repository or intended to be used in future Arrow experiments.
+This currently includes data used to generate compelling examples that is
+more realistic than generated data or the testing data found in
[apache/arrow-testing](http://github.com/apache/arrow-testing). This
-subdirectory is intended as a semi-temporary staging area: eventually,
-data here should find a permanent home elsewhere or be removed.
+directory is intended as a semi-temporary staging area; eventually, much
+of the data here should find a permanent home elsewhere.
> [!IMPORTANT]
> Please install and use [Git LFS](https://git-lfs.com) when contributing to
> this subdirectory. Add any new large file extensions to
> [`.gitattributes`](https://github.com/apache/arrow-experiments/blob/main/.gitattributes).
diff --git a/http/README.md b/http/README.md
index 164c54e..2e8231b 100644
--- a/http/README.md
+++ b/http/README.md
@@ -19,7 +19,20 @@
# Apache Arrow HTTP Data Transport
-This area of the Apache Arrow Experiments repository is for collaborative prototyping and research on the subject of sending and receiving Arrow-formatted data over HTTP APIs.
+This area of the Apache Arrow Experiments repository is for collaborative prototyping and research on the subject of sending and receiving data in Arrow IPC stream format (IANA media type `application/vnd.apache.arrow.stream`) over HTTP APIs.
+
+The subdirectories beginning with **get** demonstrate clients receiving data from servers (HTTP GET request). Those beginning with **post** demonstrate clients sending data to servers (HTTP POST request).
+
+| Subdirectory | Purpose |
+| ------------ | ------- |
+| **[get_compressed](get_compressed)** | Demonstrates various ways of using compression when sending and receiving Arrow IPC stream data over HTTP |
+| **[get_indirect](get_indirect)** | Demonstrates a two-step sequence for fetching Arrow data from a server, in which a JSON document provides the URIs for the Arrow data |
+| **[get_multipart](get_multipart)** | Demonstrates how to send and receive a multipart HTTP response body (`multipart/mixed`) containing Arrow IPC stream data and other data |
+| **[get_range](get_range)** | Demonstrates how to use HTTP range requests to download Arrow IPC stream data of known length in multiple requests |
+| **[get_simple](get_simple)** | Contains a large set of examples demonstrating the basics of fetching an Arrow IPC stream from a server to a client in 12+ languages |
+| **[post_multipart](post_multipart)** | Demonstrates how to send and receive a multipart HTTP request body (`multipart/form-data`) containing Arrow IPC stream data and other data |
+| **[post_simple](post_simple)** | Demonstrates the basics of sending Arrow IPC stream data from a client to a server |
+
The intent of this work is to:
- Ensure excellent interoperability across languages.
diff --git a/http/get_simple/README.md b/http/get_simple/README.md
index 5f9c552..e6be795 100644
--- a/http/get_simple/README.md
+++ b/http/get_simple/README.md
@@ -25,7 +25,7 @@ This directory contains a set of minimal examples of HTTP clients and servers im
The examples here assume that the server cannot determine the exact length in bytes of the full Arrow IPC stream before sending it, so they cannot set the `Content-Length` header or serve Range requests.
-The client examples here assume that the client needs to hold the full received data in memory in an Arrow data structure for further in-memory processing. (The case in which the client simply writes the result directly to a file is much simpler and can be achieved trivially by using [curl](https://curl.se) or similar.)
+Most of the client examples here assume that the client needs to hold the full received data in memory in an Arrow data structure for further in-memory processing. The case in which the client simply writes the result directly to a file is much simpler and is demonstrated by the [curl client example](curl/client).
To enable performance comparisons to Arrow Flight RPC, the server examples generate the data in exactly the same way as in [`flight_benchmark.cc`](https://github.com/apache/arrow/blob/7346bdffbdca36492089f6160534bfa2b81bad90/cpp/src/arrow/flight/flight_benchmark.cc#L194-L245) as cited in the [original blog post introducing Flight RPC](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/). But note that Flight example sends four concurrent streams.
diff --git a/http/get_simple/matlab/README.md b/http/get_simple/matlab/client/README.md
similarity index 100%
rename from http/get_simple/matlab/README.md
rename to http/get_simple/matlab/client/README.md
diff --git a/http/get_simple/matlab/client.m b/http/get_simple/matlab/client/client.m
similarity index 100%
rename from http/get_simple/matlab/client.m
rename to http/get_simple/matlab/client/client.m
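The `get_range` row added to `http/README.md` describes downloading Arrow IPC stream data of known length in multiple HTTP range requests. The following is a minimal sketch of the client-side bookkeeping only, not the code in `get_range`. It assumes the total length is already known (for example from a `Content-Length` header) and that the server honors single `bytes=start-end` ranges. `range_headers` and `serve_range` are hypothetical names, and `serve_range` is an in-memory stand-in for a server so the sketch runs without a network.

```python
def range_headers(total_length: int, chunk_size: int) -> list[str]:
    """Return Range header values that together cover bytes 0..total_length-1."""
    if total_length < 0 or chunk_size <= 0:
        raise ValueError("total_length must be >= 0 and chunk_size must be > 0")
    headers = []
    for start in range(0, total_length, chunk_size):
        # Byte-range end offsets are inclusive in HTTP (RFC 9110).
        end = min(start + chunk_size, total_length) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


def serve_range(body: bytes, header: str) -> bytes:
    """Hypothetical in-memory stand-in for a server honoring one 'bytes=start-end' range."""
    unit, _, spec = header.partition("=")
    if unit != "bytes":
        raise ValueError("only byte ranges are supported")
    start_text, _, end_text = spec.partition("-")
    start, end = int(start_text), int(end_text)
    return body[start:end + 1]  # +1 because the end offset is inclusive


if __name__ == "__main__":
    # 10240 arbitrary bytes standing in for an Arrow IPC stream of known length.
    payload = bytes(range(256)) * 40
    headers = range_headers(len(payload), 4096)
    print(headers)
    # Concatenating the chunks in order reproduces the original stream bytes.
    reassembled = b"".join(serve_range(payload, h) for h in headers)
    assert reassembled == payload
```

Run as a script, this prints `['bytes=0-4095', 'bytes=4096-8191', 'bytes=8192-10239']`. A real client would send each value as a `Range` request header, expect `206 Partial Content` responses, and could issue the requests concurrently before concatenating the bodies in order and handing the result to an Arrow IPC stream reader.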