This is an automated email from the ASF dual-hosted git repository.
ianmcook pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-experiments.git
The following commit(s) were added to refs/heads/main by this push:
new 267f5a7 Add simple HTTP GET examples (#1)
267f5a7 is described below
commit 267f5a7e2610258bb1ffaef8ff759e4cff692a12
Author: Ian Cook <[email protected]>
AuthorDate: Mon Mar 4 14:26:14 2024 -0500
Add simple HTTP GET examples (#1)
* Add Python server example
Co-authored-by: Dewey Dunnington <[email protected]>
* Add Python client example
* Add Go server example
Co-authored-by: Matt Topol <[email protected]>
* Add Go client example
Co-authored-by: Matt Topol <[email protected]>
* Add Java client example
* Add C++ client example
Co-authored-by: Sutou Kouhei <[email protected]>
* Add JavaScript client example
Co-authored-by: Dominik Moritz <[email protected]>
* Add R client example
* Add READMEs
---------
Co-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Matt Topol <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Dominik Moritz <[email protected]>
---
http/README.md | 32 +++++++
http/get_simple/README.md | 33 +++++++
http/get_simple/cpp/client/README.md | 34 +++++++
http/get_simple/cpp/client/client.cpp | 57 ++++++++++++
http/get_simple/go/client/README.md | 34 +++++++
http/get_simple/go/client/client.go | 52 +++++++++++
http/get_simple/go/server/README.md | 34 +++++++
http/get_simple/go/server/server.go | 103 +++++++++++++++++++++
http/get_simple/java/client/README.md | 33 +++++++
http/get_simple/java/client/pom.xml | 38 ++++++++
.../src/main/java/com/example/ArrowHttpClient.java | 53 +++++++++++
http/get_simple/js/client/README.md | 32 +++++++
http/get_simple/js/client/client.js | 15 +++
http/get_simple/python/client/README.md | 32 +++++++
http/get_simple/python/client/client.py | 30 ++++++
http/get_simple/python/server/README.md | 32 +++++++
http/get_simple/python/server/server.py | 92 ++++++++++++++++++
http/get_simple/r/client/README.md | 32 +++++++
http/get_simple/r/client/client.R | 24 +++++
19 files changed, 792 insertions(+)
diff --git a/http/README.md b/http/README.md
new file mode 100644
index 0000000..63ff4f1
--- /dev/null
+++ b/http/README.md
@@ -0,0 +1,32 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Apache Arrow HTTP Data Transport
+
+This area of the Apache Arrow Experiments repository is for collaborative
prototyping and research on the subject of sending and receiving
Arrow-formatted data over HTTP APIs.
+
+The intent of this work is to:
+- Ensure excellent interoperability across languages.
+- Allow implementation within existing HTTP APIs.
+- Maximize performance.
+- Minimize implementation complexity.
+
+The end goal of this work is to inform and guide the creation of a set of
conventions to be published in the Arrow documentation.
+
+See the [related discussion on the Arrow developer mailing
list](https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf).
diff --git a/http/get_simple/README.md b/http/get_simple/README.md
new file mode 100644
index 0000000..8ae1193
--- /dev/null
+++ b/http/get_simple/README.md
@@ -0,0 +1,33 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Examples
+
+This directory contains a set of minimal examples of HTTP clients and servers
implemented in several languages. These examples demonstrate:
+- How a client can send a GET request to a server and receive a response from
the server containing an Arrow IPC stream of record batches.
+- How a server can respond to a GET request from a client and send the client
a response containing an Arrow IPC stream of record batches.
+
+To enable performance comparisons to Arrow Flight RPC, the server examples
generate the data in exactly the same way as in
[`flight_benchmark.cc`](https://github.com/apache/arrow/blob/7346bdffbdca36492089f6160534bfa2b81bad90/cpp/src/arrow/flight/flight_benchmark.cc#L194-L245)
as cited in the [original blog post introducing Flight
RPC](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/). But
note that the Flight example sends four concurrent streams.
+
+If you are collaborating on the set of examples in this directory, please
follow these guidelines:
+- Each new example must be implemented as minimally as possible. For example,
error handling should be minimized or omitted.
+- Each new client example must be tested to ensure that it works with each
existing server example.
+- Each new server example must be tested to ensure that it works with each
existing client example.
+- To the greatest extent possible, each new server example should be
functionally equivalent to each existing server example (generating equivalent
data with the same schema, size, shape, and distribution of values; sending the
same HTTP headers; and so on).
+- Each new client example must print timing and size information before
exiting. At a minimum this must include the number of seconds elapsed (rounded
to the second decimal place) and the number of record batches received.
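
The exchange this README describes (a GET request answered by a 200 response with a streamed body, followed by the client's timing and size report) can be sketched with only the Python standard library. This is an illustrative sketch, not one of the committed examples: the names `StreamHandler`, `fetch`, and `PLACEHOLDER_CHUNKS` are hypothetical, the payload is placeholder bytes rather than a valid Arrow IPC stream, and an OS-assigned port is used instead of port 8000.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ARROW_STREAM_TYPE = "application/vnd.apache.arrow.stream"

# Assumption: placeholder payload. These are NOT valid Arrow IPC messages,
# only stand-in chunks so the HTTP exchange can run without pyarrow.
PLACEHOLDER_CHUNKS = [b"\x00" * 1024 for _ in range(8)]


class StreamHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", ARROW_STREAM_TYPE)
        self.end_headers()
        # Write and flush one chunk at a time, as a server would do
        # for each serialized record batch.
        for chunk in PLACEHOLDER_CHUNKS:
            self.wfile.write(chunk)
            self.wfile.flush()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url):
    """GET the URL and return (content type, body bytes, seconds elapsed)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type")
        body = response.read()
    return content_type, body, time.perf_counter() - start


# Port 0 asks the OS for any free port (the committed examples use 8000).
server = HTTPServer(("127.0.0.1", 0), StreamHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

content_type, body, elapsed = fetch(f"http://127.0.0.1:{server.server_port}")
server.shutdown()
server.server_close()

print(f"{len(body)} bytes received")
print(f"{elapsed:.2f} seconds elapsed")
```

A real client would pass the response body to an Arrow IPC stream reader and also report the number of record batches received, as the guidelines above require.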
diff --git a/http/get_simple/cpp/client/README.md
b/http/get_simple/cpp/client/README.md
new file mode 100644
index 0000000..a7c993c
--- /dev/null
+++ b/http/get_simple/cpp/client/README.md
@@ -0,0 +1,34 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple C++ Client Example
+
+This directory contains a minimal example of an HTTP client implemented in
C++. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body
containing an Arrow IPC stream of record batches.
+3. Collects the record batches as they are received.
+
+To run this example, first start one of the server examples in the parent
directory. Then install the `arrow` and `libcurl` C++ libraries, compile
`client.cpp`, and run the executable. For example, using `clang++`:
+
+```sh
+clang++ client.cpp -std=c++17 $(pkg-config --cflags --libs arrow libcurl) -o client
+./client
+```
+
+This example requires version 15.0.0 or higher of the Arrow C++ library.
diff --git a/http/get_simple/cpp/client/client.cpp
b/http/get_simple/cpp/client/client.cpp
new file mode 100644
index 0000000..4074b7a
--- /dev/null
+++ b/http/get_simple/cpp/client/client.cpp
@@ -0,0 +1,57 @@
+#include <curl/curl.h>
+#include <arrow/api.h>
+#include <arrow/io/api.h>
+#include <arrow/ipc/api.h>
+#include <chrono>
+
+static size_t
+WriteFunction(void *contents, size_t size, size_t nmemb, void *userp)
+{
+ size_t real_size = size * nmemb;
+ auto decoder = static_cast<arrow::ipc::StreamDecoder*>(userp);
+ if (decoder->Consume(static_cast<const uint8_t*>(contents), real_size).ok()) {
+ return real_size;
+ } else {
+ return 0;
+ }
+}
+
+int main(void)
+{
+ std::string url = "http://localhost:8000";
+
+ CURL *curl_handle;
+ CURLcode res;
+
+ // We use arrow::ipc::CollectListener() here for simplicity,
+ // but another option is to process decoded record batches
+ // as a stream by overriding arrow::ipc::Listener().
+ auto collect_listener = std::make_shared<arrow::ipc::CollectListener>();
+ arrow::ipc::StreamDecoder decoder(collect_listener);
+
+ curl_global_init(CURL_GLOBAL_ALL);
+ curl_handle = curl_easy_init();
+
+ curl_easy_setopt(curl_handle, CURLOPT_URL, url.c_str());
+ curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteFunction);
+ curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &decoder);
+
+ auto start_time = std::chrono::steady_clock::now();
+
+ res = curl_easy_perform(curl_handle);
+
+ printf("%lld record batches received\n", static_cast<long long>(collect_listener->num_record_batches()));
+
+ auto end_time = std::chrono::steady_clock::now();
+
+ auto time_duration = std::chrono::duration_cast<std::chrono::duration<double>>(end_time - start_time);
+ printf("%.2f seconds elapsed\n", time_duration.count());
+
+ curl_easy_cleanup(curl_handle);
+ curl_global_cleanup();
+
+ std::vector<std::shared_ptr<arrow::RecordBatch>> record_batches;
+ record_batches = collect_listener->record_batches();
+
+ return 0;
+}
diff --git a/http/get_simple/go/client/README.md
b/http/get_simple/go/client/README.md
new file mode 100644
index 0000000..ad82567
--- /dev/null
+++ b/http/get_simple/go/client/README.md
@@ -0,0 +1,34 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Go Client Example
+
+This directory contains a minimal example of an HTTP client implemented in Go.
The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body
containing an Arrow IPC stream of record batches.
+3. Adds the record batches to a slice as they are received.
+
+To run this example, first start one of the server examples in the parent
directory, then:
+
+```sh
+go mod init client
+go mod tidy
+go build client.go
+./client
+```
diff --git a/http/get_simple/go/client/client.go
b/http/get_simple/go/client/client.go
new file mode 100644
index 0000000..a123952
--- /dev/null
+++ b/http/get_simple/go/client/client.go
@@ -0,0 +1,52 @@
+package main
+
+import (
+ "fmt"
+ "net/http"
+ "time"
+
+ "github.com/apache/arrow/go/v15/arrow"
+ "github.com/apache/arrow/go/v15/arrow/ipc"
+ "github.com/apache/arrow/go/v15/arrow/memory"
+)
+
+func main() {
+ start := time.Now()
+ resp, err := http.Get("http://localhost:8000")
+ if err != nil {
+ panic(err)
+ }
+
+ if resp.StatusCode != http.StatusOK {
+ panic(fmt.Errorf("got non-200 status: %d", resp.StatusCode))
+ }
+ defer resp.Body.Close()
+
+ rdr, err := ipc.NewReader(resp.Body, ipc.WithAllocator(memory.DefaultAllocator))
+ if err != nil {
+ panic(err)
+ }
+ defer rdr.Release()
+
+ batches := make([]arrow.Record, 0)
+ defer func() {
+ for _, b := range batches {
+ b.Release()
+ }
+ }()
+
+ for rdr.Next() {
+ rec := rdr.Record()
+ rec.Retain()
+ batches = append(batches, rec)
+ }
+
+ if rdr.Err() != nil {
+ panic(rdr.Err())
+ }
+
+ execTime := time.Since(start)
+
+ fmt.Printf("%d record batches received\n", len(batches))
+ fmt.Printf("%.2f seconds elapsed\n", execTime.Seconds())
+}
diff --git a/http/get_simple/go/server/README.md
b/http/get_simple/go/server/README.md
new file mode 100644
index 0000000..cbd43be
--- /dev/null
+++ b/http/get_simple/go/server/README.md
@@ -0,0 +1,34 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Go Server Example
+
+This directory contains a minimal example of an HTTP server implemented in Go.
The server:
+1. Creates a slice of record batches and populates it with synthesized data.
+2. Listens for HTTP GET requests from clients.
+3. Upon receiving a request, sends an HTTP 200 response with the body
containing an Arrow IPC stream of record batches.
+
+To run this example:
+
+```sh
+go mod init server
+go mod tidy
+go build server.go
+./server
+```
diff --git a/http/get_simple/go/server/server.go
b/http/get_simple/go/server/server.go
new file mode 100644
index 0000000..a4e6971
--- /dev/null
+++ b/http/get_simple/go/server/server.go
@@ -0,0 +1,103 @@
+package main
+
+import (
+ "fmt"
+ "log"
+ "math/rand"
+ "net/http"
+
+ "github.com/apache/arrow/go/v15/arrow"
+ "github.com/apache/arrow/go/v15/arrow/array"
+ "github.com/apache/arrow/go/v15/arrow/ipc"
+ "github.com/apache/arrow/go/v15/arrow/memory"
+)
+
+var schema = arrow.NewSchema([]arrow.Field{
+ {Name: "a", Type: arrow.PrimitiveTypes.Int64},
+ {Name: "b", Type: arrow.PrimitiveTypes.Int64},
+ {Name: "c", Type: arrow.PrimitiveTypes.Int64},
+ {Name: "d", Type: arrow.PrimitiveTypes.Int64},
+}, nil)
+
+func GetPutData() []arrow.Record {
+ const (
+ totalRecords = 100000000
+ length = 4096
+ ncolumns = 4
+ seed = 42
+ )
+
+ var (
+ r = rand.New(rand.NewSource(seed))
+ mem = memory.DefaultAllocator
+ arrs = make([]arrow.Array, 0, ncolumns)
+ )
+ for i := 0; i < ncolumns; i++ {
+ buf := memory.NewResizableBuffer(mem)
+ buf.Resize(length * 8)
+ _, err := r.Read(buf.Buf())
+ if err != nil {
+ panic(err)
+ }
+ defer buf.Release()
+
+ data := array.NewData(arrow.PrimitiveTypes.Int64, length, []*memory.Buffer{nil, buf}, nil, 0, 0)
+ defer data.Release()
+ a := array.NewInt64Data(data)
+ defer a.Release()
+ arrs = append(arrs, a)
+ }
+
+ batch := array.NewRecord(schema, arrs, length)
+ defer batch.Release()
+
+ batches := make([]arrow.Record, 0)
+ records := int64(0)
+ for records < totalRecords {
+ if records+length > totalRecords {
+ lastLen := totalRecords - records
+ batches = append(batches, batch.NewSlice(0, lastLen))
+ records += lastLen
+ } else {
+ batch.Retain()
+ batches = append(batches, batch)
+ records += length
+ }
+ }
+
+ return batches
+}
+
+func main() {
+ batches := GetPutData()
+
+ http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodGet {
+ w.WriteHeader(http.StatusBadRequest)
+ return
+ }
+
+ hdrs := w.Header()
+
+ // set these headers if testing with a local browser-based client:
+
+ //hdrs.Add("access-control-allow-origin", "http://localhost:8000")
+ //hdrs.Add("access-control-allow-methods", "GET")
+ //hdrs.Add("access-control-allow-headers", "content-type")
+
+ hdrs.Add("content-type", "application/vnd.apache.arrow.stream")
+ w.WriteHeader(http.StatusOK)
+
+ wr := ipc.NewWriter(w, ipc.WithSchema(batches[0].Schema()))
+ defer wr.Close()
+
+ for _, b := range batches {
+ if err := wr.Write(b); err != nil {
+ panic(err)
+ }
+ }
+ })
+
+ fmt.Println("Serving on localhost:8000...")
+ log.Fatal(http.ListenAndServe(":8000", nil))
+}
diff --git a/http/get_simple/java/client/README.md
b/http/get_simple/java/client/README.md
new file mode 100644
index 0000000..5a4b2d3
--- /dev/null
+++ b/http/get_simple/java/client/README.md
@@ -0,0 +1,33 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Java Client Example
+
+This directory contains a minimal example of an HTTP client implemented in
Java. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body
containing an Arrow IPC stream of record batches.
+3. Adds the record batches to a list as they are received.
+
+To run this example, first start one of the server examples in the parent
directory, then:
+
+```sh
+mvn install
+mvn compile
+_JAVA_OPTIONS="--add-opens=java.base/java.nio=ALL-UNNAMED" mvn exec:java -Dexec.mainClass="ArrowHttpClient"
+```
diff --git a/http/get_simple/java/client/pom.xml
b/http/get_simple/java/client/pom.xml
new file mode 100644
index 0000000..543fd7b
--- /dev/null
+++ b/http/get_simple/java/client/pom.xml
@@ -0,0 +1,38 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+
+ <groupId>com.example</groupId>
+ <artifactId>ArrowHttpClient</artifactId>
+ <version>1.0-SNAPSHOT</version>
+
+ <properties>
+ <arrow.version>14.0.1</arrow.version>
+ <maven.compiler.source>21</maven.compiler.source>
+ <maven.compiler.target>21</maven.compiler.target>
+ </properties>
+
+ <dependencies>
+
+ <dependency>
+ <groupId>org.apache.arrow</groupId>
+ <artifactId>arrow-memory-core</artifactId>
+ <version>${arrow.version}</version>
+ </dependency>
+
+ <dependency>
+ <groupId>org.apache.arrow</groupId>
+ <artifactId>arrow-memory-netty</artifactId>
+ <version>${arrow.version}</version>
+ </dependency>
+
+ <dependency>
+ <groupId>org.apache.arrow</groupId>
+ <artifactId>arrow-vector</artifactId>
+ <version>${arrow.version}</version>
+ </dependency>
+
+ </dependencies>
+</project>
diff --git
a/http/get_simple/java/client/src/main/java/com/example/ArrowHttpClient.java
b/http/get_simple/java/client/src/main/java/com/example/ArrowHttpClient.java
new file mode 100644
index 0000000..2d39e4a
--- /dev/null
+++ b/http/get_simple/java/client/src/main/java/com/example/ArrowHttpClient.java
@@ -0,0 +1,53 @@
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.List;
+import java.util.ArrayList;
+
+public class ArrowHttpClient {
+
+ public static void main(String[] args) {
+ String serverUrl = "http://localhost:8000";
+
+ try {
+ URL url = new URL(serverUrl);
+ HttpURLConnection connection = (HttpURLConnection) url.openConnection();
+ connection.setRequestMethod("GET");
+
+ if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
+ InputStream inputStream = connection.getInputStream();
+
+ BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+ ArrowStreamReader reader = new ArrowStreamReader(inputStream, allocator);
+ List<ArrowRecordBatch> batches = new ArrayList<>();
+
+ int num_rows = 0;
+ while (reader.loadNextBatch()) {
+ VectorSchemaRoot root = reader.getVectorSchemaRoot();
+ num_rows += root.getRowCount();
+ VectorUnloader unloader = new VectorUnloader(root);
+ ArrowRecordBatch arb = unloader.getRecordBatch();
+ batches.add(arb);
+ }
+
+ System.out.println(reader.bytesRead() + " bytes received");
+ System.out.println(num_rows + " records received");
+ System.out.println(batches.size() + " record batches received");
+
+ reader.close();
+ } else {
+ System.err.println("Failed with response code: " + connection.getResponseCode());
+ }
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ }
+}
diff --git a/http/get_simple/js/client/README.md
b/http/get_simple/js/client/README.md
new file mode 100644
index 0000000..861ae64
--- /dev/null
+++ b/http/get_simple/js/client/README.md
@@ -0,0 +1,32 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple JavaScript Client Example
+
+This directory contains a minimal example of an HTTP client implemented in
JavaScript. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body
containing an Arrow IPC stream of record batches.
+3. Creates an Arrow table from the record batches.
+
+To run this example, first start one of the server examples in the parent
directory, then:
+
+```sh
+npm install apache-arrow
+node client.js
+```
diff --git a/http/get_simple/js/client/client.js
b/http/get_simple/js/client/client.js
new file mode 100644
index 0000000..73d3a16
--- /dev/null
+++ b/http/get_simple/js/client/client.js
@@ -0,0 +1,15 @@
+const Arrow = require('apache-arrow');
+
+const url = 'http://localhost:8000';
+
+async function runExample(url) {
+ const startTime = new Date();
+
+ const table = await Arrow.tableFromIPC(fetch(url));
+
+ const duration = (new Date() - startTime) / 1000;
+ console.log(`${table.batches.length} record batches received`);
+ console.log(`${duration.toFixed(2)} seconds elapsed`);
+}
+
+runExample(url);
diff --git a/http/get_simple/python/client/README.md
b/http/get_simple/python/client/README.md
new file mode 100644
index 0000000..d794968
--- /dev/null
+++ b/http/get_simple/python/client/README.md
@@ -0,0 +1,32 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Python Client Example
+
+This directory contains a minimal example of an HTTP client implemented in
Python. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body
containing an Arrow IPC stream of record batches.
+3. Adds the record batches to a list as they are received.
+
+To run this example, first start one of the server examples in the parent
directory, then:
+
+```sh
+pip install pyarrow
+python client.py
+```
diff --git a/http/get_simple/python/client/client.py
b/http/get_simple/python/client/client.py
new file mode 100644
index 0000000..705f10c
--- /dev/null
+++ b/http/get_simple/python/client/client.py
@@ -0,0 +1,30 @@
+import urllib.request
+import pyarrow as pa
+import time
+
+start_time = time.time()
+
+with urllib.request.urlopen('http://localhost:8000') as response:
+    buffer = response.read()
+
+batches = []
+
+with pa.ipc.open_stream(buffer) as reader:
+    schema = reader.schema
+    try:
+        while True:
+            batches.append(reader.read_next_batch())
+    except StopIteration:
+        pass
+
+# or:
+#with pa.ipc.open_stream(buffer) as reader:
+#    schema = reader.schema
+#    batches = [b for b in reader]
+
+end_time = time.time()
+execution_time = end_time - start_time
+
+print(f"{len(buffer)} bytes received")
+print(f"{len(batches)} record batches received")
+print(f"{execution_time:.2f} seconds elapsed")
diff --git a/http/get_simple/python/server/README.md
b/http/get_simple/python/server/README.md
new file mode 100644
index 0000000..18bc738
--- /dev/null
+++ b/http/get_simple/python/server/README.md
@@ -0,0 +1,32 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Python Server Example
+
+This directory contains a minimal example of an HTTP server implemented in
Python. The server:
+1. Creates a list of record batches and populates it with synthesized data.
+2. Listens for HTTP GET requests from clients.
+3. Upon receiving a request, sends an HTTP 200 response with the body
containing an Arrow IPC stream of record batches.
+
+To run this example:
+
+```sh
+pip install pyarrow
+python server.py
+```
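
As a worked check on step 1, the server code in this commit builds its list from 100,000,000 records in batches of 4,096 rows with four int64 columns. The sketch below reproduces only that batch-size arithmetic in plain Python; `batch_lengths` is a hypothetical helper name, and the byte total counts column values only, ignoring IPC metadata and padding.

```python
def batch_lengths(total_records, length):
    """Return the row count of each batch needed to cover total_records."""
    lengths = []
    records = 0
    while records < total_records:
        # Every batch is full-sized except possibly the last one.
        lengths.append(min(length, total_records - records))
        records += lengths[-1]
    return lengths


lengths = batch_lengths(100_000_000, 4096)
print(len(lengths))           # 24415 batches in total
print(lengths[-1])            # 256 rows in the final, shorter batch
print(sum(lengths) * 4 * 8)   # 3200000000 bytes of int64 values
```

So a client that reads the whole stream should report 24,415 record batches and a little over 3.2 GB received.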
diff --git a/http/get_simple/python/server/server.py
b/http/get_simple/python/server/server.py
new file mode 100644
index 0000000..72126c1
--- /dev/null
+++ b/http/get_simple/python/server/server.py
@@ -0,0 +1,92 @@
+import pyarrow as pa
+from random import randbytes
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import io
+
+schema = pa.schema([
+ ('a', pa.int64()),
+ ('b', pa.int64()),
+ ('c', pa.int64()),
+ ('d', pa.int64())
+])
+
+def GetPutData():
+    total_records = 100000000
+    length = 4096
+    ncolumns = 4
+
+    arrays = []
+
+    for x in range(0, ncolumns):
+        buffer = pa.py_buffer(randbytes(length * 8))
+        arrays.append(pa.Int64Array.from_buffers(pa.int64(), length, [None, buffer], null_count=0))
+
+    batch = pa.record_batch(arrays, schema)
+    batches = []
+
+    records_sent = 0
+    while records_sent < total_records:
+        if records_sent + length > total_records:
+            last_length = total_records - records_sent
+            batches.append(batch.slice(0, last_length))
+            records_sent += last_length
+        else:
+            batches.append(batch)
+            records_sent += length
+
+    return batches
+
+def make_reader(schema, batches):
+    return pa.RecordBatchReader.from_batches(schema, batches)
+
+def generate_batches(schema, reader):
+    with io.BytesIO() as sink, pa.ipc.new_stream(sink, schema) as writer:
+        yield sink.getvalue()
+
+        for batch in reader:
+            sink.seek(0)
+            sink.truncate(0)
+            writer.write_batch(batch)
+            yield sink.getvalue()
+
+        sink.seek(0)
+        sink.truncate(0)
+        writer.close()
+        yield sink.getvalue()
+
+class MyServer(BaseHTTPRequestHandler):
+    def do_GET(self):
+        self.send_response(200)
+        self.send_header('Content-Type', 'application/vnd.apache.arrow.stream')
+
+        # set these headers if testing with a local browser-based client:
+
+        #self.send_header('Access-Control-Allow-Origin', 'http://localhost:8000')
+        #self.send_header('Access-Control-Allow-Methods', 'GET')
+        #self.send_header('Access-Control-Allow-Headers', 'Content-Type')
+
+        self.end_headers()
+
+        for buffer in generate_batches(schema, make_reader(schema, batches)):
+            self.wfile.write(buffer)
+            self.wfile.flush()
+
+            # if any record batch could be larger than 2 GB, split it
+            # into chunks before passing to self.wfile.write() by
+            # replacing the two lines above with this:
+
+            #chunk_size = int(2e9)
+            #chunk_splits = len(buffer) // chunk_size
+            #for i in range(chunk_splits):
+            #    self.wfile.write(buffer[i * chunk_size:i * chunk_size + chunk_size])
+            #    self.wfile.flush()
+            #self.wfile.write(buffer[chunk_splits * chunk_size:])
+            #self.wfile.flush()
+
+batches = GetPutData()
+
+server_address = ('localhost', 8000)
+httpd = HTTPServer(server_address, MyServer)
+
+print(f'Serving on {server_address[0]}:{server_address[1]}...')
+httpd.serve_forever()
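
The commented-out chunking fallback in `do_GET` splits an oversized buffer into fixed-size slices before writing. A minimal standalone sketch of the same idea follows, using a tiny chunk size and an in-memory sink so it can be run directly. `write_in_chunks` is a hypothetical helper, not part of the commit, and unlike the commented code it never issues a trailing empty write.

```python
import io


def write_in_chunks(stream, buffer, chunk_size):
    """Write buffer to stream in slices of at most chunk_size bytes."""
    view = memoryview(buffer)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        stream.write(view[start:start + chunk_size])
        stream.flush()


# Assumption: a 4-byte chunk size stands in for the 2e9 used in server.py.
sink = io.BytesIO()
write_in_chunks(sink, b"abcdefghij", chunk_size=4)
print(sink.getvalue())  # b'abcdefghij'
```

In the server, `stream` would be `self.wfile` and `chunk_size` would be `int(2e9)`.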
diff --git a/http/get_simple/r/client/README.md
b/http/get_simple/r/client/README.md
new file mode 100644
index 0000000..e34c45c
--- /dev/null
+++ b/http/get_simple/r/client/README.md
@@ -0,0 +1,32 @@
+<!---
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# HTTP GET Arrow Data: Simple R Client Example
+
+This directory contains a minimal example of an HTTP client implemented in R.
The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body
containing an Arrow IPC stream of record batches.
+3. Creates an Arrow table from the record batches.
+
+To run this example, first start one of the server examples in the parent
directory, then:
+
+```sh
+R -e 'install.packages(c("httr", "arrow", "tictoc"), repos="https://cloud.r-project.org")'
+R -f client.R -s
+```
diff --git a/http/get_simple/r/client/client.R
b/http/get_simple/r/client/client.R
new file mode 100644
index 0000000..73446ea
--- /dev/null
+++ b/http/get_simple/r/client/client.R
@@ -0,0 +1,24 @@
+library(httr)
+library(tictoc)
+suppressPackageStartupMessages(library(arrow))
+
+url <- 'http://localhost:8000'
+
+tic()
+
+response <- GET(url)
+buffer <- content(response, "raw")
+reader <- RecordBatchStreamReader$create(buffer)
+table <- reader$read_table()
+
+# or:
+#batches <- reader$batches()
+# but this is very slow
+
+# or:
+#result <- read_ipc_stream(buffer, as_data_frame = FALSE)
+
+# or:
+#result <- read_ipc_stream('http://localhost:8000', as_data_frame = FALSE)
+
+toc()