This is an automated email from the ASF dual-hosted git repository.

ianmcook pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-experiments.git


The following commit(s) were added to refs/heads/main by this push:
     new 267f5a7  Add simple HTTP GET examples (#1)
267f5a7 is described below

commit 267f5a7e2610258bb1ffaef8ff759e4cff692a12
Author: Ian Cook <[email protected]>
AuthorDate: Mon Mar 4 14:26:14 2024 -0500

    Add simple HTTP GET examples (#1)
    
    * Add Python server example
    
    Co-authored-by: Dewey Dunnington <[email protected]>
    
    * Add Python client example
    
    * Add Go server example
    
    Co-authored-by: Matt Topol <[email protected]>
    
    * Add Go client example
    
    Co-authored-by: Matt Topol <[email protected]>
    
    * Add Java client example
    
    * Add C++ client example
    
    Co-authored-by: Sutou Kouhei <[email protected]>
    
    * Add JavaScript client example
    
    Co-authored-by: Dominik Moritz <[email protected]>
    
    * Add R client example
    
    * Add READMEs
    
    ---------
    
    Co-authored-by: Dewey Dunnington <[email protected]>
    Co-authored-by: Matt Topol <[email protected]>
    Co-authored-by: Sutou Kouhei <[email protected]>
    Co-authored-by: Dominik Moritz <[email protected]>
---
 http/README.md                                     |  32 +++++++
 http/get_simple/README.md                          |  33 +++++++
 http/get_simple/cpp/client/README.md               |  34 +++++++
 http/get_simple/cpp/client/client.cpp              |  57 ++++++++++++
 http/get_simple/go/client/README.md                |  34 +++++++
 http/get_simple/go/client/client.go                |  52 +++++++++++
 http/get_simple/go/server/README.md                |  34 +++++++
 http/get_simple/go/server/server.go                | 103 +++++++++++++++++++++
 http/get_simple/java/client/README.md              |  33 +++++++
 http/get_simple/java/client/pom.xml                |  38 ++++++++
 .../src/main/java/com/example/ArrowHttpClient.java |  53 +++++++++++
 http/get_simple/js/client/README.md                |  32 +++++++
 http/get_simple/js/client/client.js                |  15 +++
 http/get_simple/python/client/README.md            |  32 +++++++
 http/get_simple/python/client/client.py            |  30 ++++++
 http/get_simple/python/server/README.md            |  32 +++++++
 http/get_simple/python/server/server.py            |  92 ++++++++++++++++++
 http/get_simple/r/client/README.md                 |  32 +++++++
 http/get_simple/r/client/client.R                  |  24 +++++
 19 files changed, 792 insertions(+)

diff --git a/http/README.md b/http/README.md
new file mode 100644
index 0000000..63ff4f1
--- /dev/null
+++ b/http/README.md
@@ -0,0 +1,32 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache Arrow HTTP Data Transport
+
+This area of the Apache Arrow Experiments repository is for collaborative prototyping and research on the subject of sending and receiving Arrow-formatted data over HTTP APIs.
+
+The intent of this work is to:
+- Ensure excellent interoperability across languages.
+- Allow implementation within existing HTTP APIs.
+- Maximize performance.
+- Minimize implementation complexity.
+
+The end goal of this work is to inform and guide the creation of a set of conventions to be published in the Arrow documentation.
+
+See the [related discussion on the Arrow developer mailing list](https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf).
diff --git a/http/get_simple/README.md b/http/get_simple/README.md
new file mode 100644
index 0000000..8ae1193
--- /dev/null
+++ b/http/get_simple/README.md
@@ -0,0 +1,33 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Examples
+
+This directory contains a set of minimal examples of HTTP clients and servers implemented in several languages. These examples demonstrate:
+- How a client can send a GET request to a server and receive a response from the server containing an Arrow IPC stream of record batches.
+- How a server can respond to a GET request from a client and send the client a response containing an Arrow IPC stream of record batches.
+
+To enable performance comparisons to Arrow Flight RPC, the server examples generate the data in exactly the same way as in [`flight_benchmark.cc`](https://github.com/apache/arrow/blob/7346bdffbdca36492089f6160534bfa2b81bad90/cpp/src/arrow/flight/flight_benchmark.cc#L194-L245) as cited in the [original blog post introducing Flight RPC](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/). But note that the Flight example sends four concurrent streams.
+
+If you are collaborating on the set of examples in this directory, please follow these guidelines:
+- Each new example must be implemented as minimally as possible. For example, error handling should be minimized or omitted.
+- Each new client example must be tested to ensure that it works with each existing server example.
+- Each new server example must be tested to ensure that it works with each existing client example.
+- To the greatest extent possible, each new server example should be functionally equivalent to each existing server example (generating equivalent data with the same schema, size, shape, and distribution of values; sending the same HTTP headers; and so on).
+- Each new client example must print timing and size information before exiting. At a minimum this must include the number of seconds elapsed (rounded to the second decimal place) and the number of record batches received.
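The reporting guideline above (seconds elapsed rounded to the second decimal place, plus the number of record batches received) can be sketched as follows. This is an illustrative note and not part of the commit: `format_report` is a hypothetical helper name, the wording of the two lines mirrors what the Go and C++ clients in this commit print, and the empty `batches` list stands in for batches a real client would decode.

```python
import time


def format_report(num_batches: int, elapsed_seconds: float) -> list[str]:
    """Return the two report lines the guidelines ask each client to print."""
    return [
        f"{num_batches} record batches received",
        # ".2f" rounds to the second decimal place, as the guideline requires.
        f"{elapsed_seconds:.2f} seconds elapsed",
    ]


if __name__ == "__main__":
    start = time.perf_counter()
    batches = []  # placeholder: a real client would append decoded batches here
    elapsed = time.perf_counter() - start
    for line in format_report(len(batches), elapsed):
        print(line)
```

Keeping the report format in one small function makes it easier to compare output across client implementations.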
diff --git a/http/get_simple/cpp/client/README.md b/http/get_simple/cpp/client/README.md
new file mode 100644
index 0000000..a7c993c
--- /dev/null
+++ b/http/get_simple/cpp/client/README.md
@@ -0,0 +1,34 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple C++ Client Example
+
+This directory contains a minimal example of an HTTP client implemented in C++. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body containing an Arrow IPC stream of record batches.
+3. Collects the record batches as they are received.
+
+To run this example, first start one of the server examples in the parent directory. Then install the `arrow` and `libcurl` C++ libraries, compile `client.cpp`, and run the executable. For example, using `clang++`:
+
+```sh
+clang++ client.cpp -std=c++17 $(pkg-config --cflags --libs arrow libcurl) -o client
+./client
+```
+
+This example requires version 15.0.0 or higher of the Arrow C++ library.
diff --git a/http/get_simple/cpp/client/client.cpp b/http/get_simple/cpp/client/client.cpp
new file mode 100644
index 0000000..4074b7a
--- /dev/null
+++ b/http/get_simple/cpp/client/client.cpp
@@ -0,0 +1,57 @@
+#include <curl/curl.h>
+#include <arrow/api.h>
+#include <arrow/io/api.h>
+#include <arrow/ipc/api.h>
+#include <chrono>
+
+static size_t
+WriteFunction(void *contents, size_t size, size_t nmemb, void *userp)
+{
+  size_t real_size = size * nmemb;
+  auto decoder = static_cast<arrow::ipc::StreamDecoder*>(userp);
+  if (decoder->Consume(static_cast<const uint8_t*>(contents), real_size).ok()) {
+    return real_size;
+  } else {
+    return 0;
+  }
+}
+
+int main(void)
+{
+  std::string url = "http://localhost:8000";
+
+  CURL *curl_handle;
+  CURLcode res;
+
+  // We use arrow::ipc::CollectListener() here for simplicity,
+  // but another option is to process decoded record batches
+  // as a stream by overriding arrow::ipc::Listener().
+  auto collect_listener = std::make_shared<arrow::ipc::CollectListener>();
+  arrow::ipc::StreamDecoder decoder(collect_listener);
+
+  curl_global_init(CURL_GLOBAL_ALL);
+  curl_handle = curl_easy_init();
+
+  curl_easy_setopt(curl_handle, CURLOPT_URL, url.c_str());
+  curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteFunction);
+  curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &decoder);
+ 
+  auto start_time = std::chrono::steady_clock::now();
+
+  res = curl_easy_perform(curl_handle);
+
+  printf("%lld record batches received\n", collect_listener->num_record_batches());
+
+  auto end_time = std::chrono::steady_clock::now();
+
+  auto time_duration = std::chrono::duration_cast<std::chrono::duration<double>>(end_time - start_time);
+  printf("%.2f seconds elapsed\n", time_duration.count());
+
+  curl_easy_cleanup(curl_handle);
+  curl_global_cleanup();
+
+  std::vector<std::shared_ptr<arrow::RecordBatch>> record_batches;
+  record_batches = collect_listener->record_batches();
+ 
+  return 0;
+}
diff --git a/http/get_simple/go/client/README.md b/http/get_simple/go/client/README.md
new file mode 100644
index 0000000..ad82567
--- /dev/null
+++ b/http/get_simple/go/client/README.md
@@ -0,0 +1,34 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Go Client Example
+
+This directory contains a minimal example of an HTTP client implemented in Go. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body containing an Arrow IPC stream of record batches.
+3. Adds the record batches to a slice as they are received.
+
+To run this example, first start one of the server examples in the parent directory, then:
+
+```sh
+go mod init client
+go mod tidy
+go build client.go
+./client
+```
diff --git a/http/get_simple/go/client/client.go b/http/get_simple/go/client/client.go
new file mode 100644
index 0000000..a123952
--- /dev/null
+++ b/http/get_simple/go/client/client.go
@@ -0,0 +1,52 @@
+package main
+
+import (
+       "fmt"
+       "net/http"
+       "time"
+
+       "github.com/apache/arrow/go/v15/arrow"
+       "github.com/apache/arrow/go/v15/arrow/ipc"
+       "github.com/apache/arrow/go/v15/arrow/memory"
+)
+
+func main() {
+       start := time.Now()
+       resp, err := http.Get("http://localhost:8000")
+       if err != nil {
+               panic(err)
+       }
+
+       if resp.StatusCode != http.StatusOK {
+               panic(fmt.Errorf("got non-200 status: %d", resp.StatusCode))
+       }
+       defer resp.Body.Close()
+
+       rdr, err := ipc.NewReader(resp.Body, ipc.WithAllocator(memory.DefaultAllocator))
+       if err != nil {
+               panic(err)
+       }
+       defer rdr.Release()
+
+       batches := make([]arrow.Record, 0)
+       defer func() {
+               for _, b := range batches {
+                       b.Release()
+               }
+       }()
+
+       for rdr.Next() {
+               rec := rdr.Record()
+               rec.Retain()
+               batches = append(batches, rec)
+       }
+
+       if rdr.Err() != nil {
+               panic(rdr.Err())
+       }
+
+       execTime := time.Since(start)
+
+       fmt.Printf("%d record batches received\n", len(batches))
+       fmt.Printf("%.2f seconds elapsed\n", execTime.Seconds())
+}
diff --git a/http/get_simple/go/server/README.md b/http/get_simple/go/server/README.md
new file mode 100644
index 0000000..cbd43be
--- /dev/null
+++ b/http/get_simple/go/server/README.md
@@ -0,0 +1,34 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Go Server Example
+
+This directory contains a minimal example of an HTTP server implemented in Go. The server:
+1. Creates a slice of record batches and populates it with synthesized data.
+2. Listens for HTTP GET requests from clients.
+3. Upon receiving a request, sends an HTTP 200 response with the body containing an Arrow IPC stream of record batches.
+
+To run this example:
+
+```sh
+go mod init server
+go mod tidy
+go build server.go
+./server
+```
diff --git a/http/get_simple/go/server/server.go b/http/get_simple/go/server/server.go
new file mode 100644
index 0000000..a4e6971
--- /dev/null
+++ b/http/get_simple/go/server/server.go
@@ -0,0 +1,103 @@
+package main
+
+import (
+       "fmt"
+       "log"
+       "math/rand"
+       "net/http"
+
+       "github.com/apache/arrow/go/v15/arrow"
+       "github.com/apache/arrow/go/v15/arrow/array"
+       "github.com/apache/arrow/go/v15/arrow/ipc"
+       "github.com/apache/arrow/go/v15/arrow/memory"
+)
+
+var schema = arrow.NewSchema([]arrow.Field{
+       {Name: "a", Type: arrow.PrimitiveTypes.Int64},
+       {Name: "b", Type: arrow.PrimitiveTypes.Int64},
+       {Name: "c", Type: arrow.PrimitiveTypes.Int64},
+       {Name: "d", Type: arrow.PrimitiveTypes.Int64},
+}, nil)
+
+func GetPutData() []arrow.Record {
+       const (
+               totalRecords = 100000000
+               length       = 4096
+               ncolumns     = 4
+               seed         = 42
+       )
+
+       var (
+               r    = rand.New(rand.NewSource(seed))
+               mem  = memory.DefaultAllocator
+               arrs = make([]arrow.Array, 0, ncolumns)
+       )
+       for i := 0; i < ncolumns; i++ {
+               buf := memory.NewResizableBuffer(mem)
+               buf.Resize(length * 8)
+               _, err := r.Read(buf.Buf())
+               if err != nil {
+                       panic(err)
+               }
+               defer buf.Release()
+
+               data := array.NewData(arrow.PrimitiveTypes.Int64, length, []*memory.Buffer{nil, buf}, nil, 0, 0)
+               defer data.Release()
+               a := array.NewInt64Data(data)
+               defer a.Release()
+               arrs = append(arrs, a)
+       }
+
+       batch := array.NewRecord(schema, arrs, length)
+       defer batch.Release()
+
+       batches := make([]arrow.Record, 0)
+       records := int64(0)
+       for records < totalRecords {
+               if records+length > totalRecords {
+                       lastLen := totalRecords - records
+                       batches = append(batches, batch.NewSlice(0, lastLen))
+                       records += lastLen
+               } else {
+                       batch.Retain()
+                       batches = append(batches, batch)
+                       records += length
+               }
+       }
+
+       return batches
+}
+
+func main() {
+       batches := GetPutData()
+
+       http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+               if r.Method != http.MethodGet {
+                       w.WriteHeader(http.StatusBadRequest)
+                       return
+               }
+
+               hdrs := w.Header()
+
+               // set these headers if testing with a local browser-based client:
+
+               //hdrs.Add("access-control-allow-origin", "http://localhost:8000")
+               //hdrs.Add("access-control-allow-methods", "GET")
+               //hdrs.Add("access-control-allow-headers", "content-type")
+
+               hdrs.Add("content-type", "application/vnd.apache.arrow.stream")
+               w.WriteHeader(http.StatusOK)
+
+               wr := ipc.NewWriter(w, ipc.WithSchema(batches[0].Schema()))
+               defer wr.Close()
+
+               for _, b := range batches {
+                       if err := wr.Write(b); err != nil {
+                               panic(err)
+                       }
+               }
+       })
+
+       fmt.Println("Serving on localhost:8000...")
+       log.Fatal(http.ListenAndServe(":8000", nil))
+}
diff --git a/http/get_simple/java/client/README.md b/http/get_simple/java/client/README.md
new file mode 100644
index 0000000..5a4b2d3
--- /dev/null
+++ b/http/get_simple/java/client/README.md
@@ -0,0 +1,33 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Java Client Example
+
+This directory contains a minimal example of an HTTP client implemented in Java. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body containing an Arrow IPC stream of record batches.
+3. Adds the record batches to a list as they are received.
+
+To run this example, first start one of the server examples in the parent directory, then:
+
+```sh
+mvn install
+mvn compile
+_JAVA_OPTIONS="--add-opens=java.base/java.nio=ALL-UNNAMED" mvn exec:java -Dexec.mainClass="ArrowHttpClient"
+```
diff --git a/http/get_simple/java/client/pom.xml b/http/get_simple/java/client/pom.xml
new file mode 100644
index 0000000..543fd7b
--- /dev/null
+++ b/http/get_simple/java/client/pom.xml
@@ -0,0 +1,38 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <groupId>com.example</groupId>
+    <artifactId>ArrowHttpClient</artifactId>
+    <version>1.0-SNAPSHOT</version>
+
+    <properties>
+        <arrow.version>14.0.1</arrow.version>
+        <maven.compiler.source>21</maven.compiler.source>
+        <maven.compiler.target>21</maven.compiler.target>
+    </properties>
+
+    <dependencies>
+
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-memory-core</artifactId>
+            <version>${arrow.version}</version>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-memory-netty</artifactId>
+            <version>${arrow.version}</version>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.arrow</groupId>
+            <artifactId>arrow-vector</artifactId>
+            <version>${arrow.version}</version>
+        </dependency>
+
+    </dependencies>
+</project>
diff --git a/http/get_simple/java/client/src/main/java/com/example/ArrowHttpClient.java b/http/get_simple/java/client/src/main/java/com/example/ArrowHttpClient.java
new file mode 100644
index 0000000..2d39e4a
--- /dev/null
+++ b/http/get_simple/java/client/src/main/java/com/example/ArrowHttpClient.java
@@ -0,0 +1,53 @@
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.VectorUnloader;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.HttpURLConnection;
+import java.net.URL;
+import java.util.List;
+import java.util.ArrayList;
+
+public class ArrowHttpClient {
+
+    public static void main(String[] args) {
+        String serverUrl = "http://localhost:8000";
+
+        try {
+            URL url = new URL(serverUrl);
+            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
+            connection.setRequestMethod("GET");
+
+            if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
+                InputStream inputStream = connection.getInputStream();
+
+                BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+                ArrowStreamReader reader = new ArrowStreamReader(inputStream, allocator);
+                List<ArrowRecordBatch> batches = new ArrayList<>();
+
+                int num_rows = 0;
+                while (reader.loadNextBatch()) { 
+                    VectorSchemaRoot root = reader.getVectorSchemaRoot();
+                    num_rows += root.getRowCount();
+                    VectorUnloader unloader = new VectorUnloader(root);
+                    ArrowRecordBatch arb = unloader.getRecordBatch();
+                    batches.add(arb);
+                }
+                
+                System.out.println(reader.bytesRead() + " bytes received");
+                System.out.println(num_rows + " records received");
+                System.out.println(batches.size() + " record batches received");
+
+                reader.close();
+            } else {
+                System.err.println("Failed with response code: " + connection.getResponseCode());
+            }
+        } catch (IOException e) {
+            e.printStackTrace();
+        }
+    }
+}
diff --git a/http/get_simple/js/client/README.md b/http/get_simple/js/client/README.md
new file mode 100644
index 0000000..861ae64
--- /dev/null
+++ b/http/get_simple/js/client/README.md
@@ -0,0 +1,32 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple JavaScript Client Example
+
+This directory contains a minimal example of an HTTP client implemented in JavaScript. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body containing an Arrow IPC stream of record batches.
+3. Creates an Arrow table from the record batches.
+
+To run this example, first start one of the server examples in the parent directory, then:
+
+```sh
+npm install apache-arrow
+node client.js
+```
diff --git a/http/get_simple/js/client/client.js b/http/get_simple/js/client/client.js
new file mode 100644
index 0000000..73d3a16
--- /dev/null
+++ b/http/get_simple/js/client/client.js
@@ -0,0 +1,15 @@
+const Arrow = require('apache-arrow');
+
+const url = 'http://localhost:8000';
+
+async function runExample(url) {
+  const startTime = new Date();
+  
+  const table = await Arrow.tableFromIPC(fetch(url));
+  
+  const duration = (new Date() - startTime) / 1000;
+  console.log(`${table.batches.length} record batches received`);
+  console.log(`${duration.toFixed(2)} seconds elapsed`);
+}
+
+runExample(url);
diff --git a/http/get_simple/python/client/README.md b/http/get_simple/python/client/README.md
new file mode 100644
index 0000000..d794968
--- /dev/null
+++ b/http/get_simple/python/client/README.md
@@ -0,0 +1,32 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Python Client Example
+
+This directory contains a minimal example of an HTTP client implemented in Python. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body containing an Arrow IPC stream of record batches.
+3. Adds the record batches to a list as they are received.
+
+To run this example, first start one of the server examples in the parent directory, then:
+
+```sh
+pip install pyarrow
+python client.py
+```
diff --git a/http/get_simple/python/client/client.py b/http/get_simple/python/client/client.py
new file mode 100644
index 0000000..705f10c
--- /dev/null
+++ b/http/get_simple/python/client/client.py
@@ -0,0 +1,30 @@
+import urllib.request
+import pyarrow as pa
+import time
+
+start_time = time.time()
+
+with urllib.request.urlopen('http://localhost:8000') as response:
+  buffer = response.read()
+
+batches = []
+
+with pa.ipc.open_stream(buffer) as reader:
+  schema = reader.schema
+  try:
+    while True:
+      batches.append(reader.read_next_batch())
+  except StopIteration:
+      pass
+
+# or:
+#with pa.ipc.open_stream(buffer) as reader:
+#  schema = reader.schema
+#  batches = [b for b in reader]
+
+end_time = time.time()
+execution_time = end_time - start_time
+
+print(f"{len(buffer)} bytes received")
+print(f"{len(batches)} record batches received")
+print(f"{execution_time:.2f} seconds elapsed")
diff --git a/http/get_simple/python/server/README.md b/http/get_simple/python/server/README.md
new file mode 100644
index 0000000..18bc738
--- /dev/null
+++ b/http/get_simple/python/server/README.md
@@ -0,0 +1,32 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple Python Server Example
+
+This directory contains a minimal example of an HTTP server implemented in Python. The server:
+1. Creates a list of record batches and populates it with synthesized data.
+2. Listens for HTTP GET requests from clients.
+3. Upon receiving a request, sends an HTTP 200 response with the body containing an Arrow IPC stream of record batches.
+
+To run this example:
+
+```sh
+pip install pyarrow
+python server.py
+```
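The batch layout produced by step 1 above can be worked out without Arrow. The sketch below is an illustrative note and not part of the commit: `plan_batch_lengths` is a hypothetical helper, and the constants (100,000,000 total records, 4,096 rows per batch) are taken from the server examples in this commit. It only computes row counts; it does not build record batches.

```python
def plan_batch_lengths(total_records: int, batch_length: int) -> list[int]:
    """Return the row count of each batch, in the order they would be sent."""
    lengths = []
    sent = 0
    while sent < total_records:
        # The final batch is shortened so the total row count is exact.
        n = min(batch_length, total_records - sent)
        lengths.append(n)
        sent += n
    return lengths


if __name__ == "__main__":
    lengths = plan_batch_lengths(100_000_000, 4096)
    print(len(lengths), "batches; last batch has", lengths[-1], "rows")
```

With these constants the plan is 24,414 full batches of 4,096 rows followed by one batch of 256 rows, so a client talking to either server example should report 24,415 record batches.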
diff --git a/http/get_simple/python/server/server.py b/http/get_simple/python/server/server.py
new file mode 100644
index 0000000..72126c1
--- /dev/null
+++ b/http/get_simple/python/server/server.py
@@ -0,0 +1,92 @@
+import pyarrow as pa
+from random import randbytes
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import io
+
+schema = pa.schema([
+    ('a', pa.int64()),
+    ('b', pa.int64()),
+    ('c', pa.int64()),
+    ('d', pa.int64())
+])
+
+def GetPutData():
+    total_records = 100000000
+    length = 4096
+    ncolumns = 4
+    
+    arrays = []
+    
+    for x in range(0, ncolumns):
+        buffer = pa.py_buffer(randbytes(length * 8))
+        arrays.append(pa.Int64Array.from_buffers(pa.int64(), length, [None, buffer], null_count=0))
+    
+    batch = pa.record_batch(arrays, schema)
+    batches = []
+    
+    records_sent = 0
+    while records_sent < total_records:
+      if records_sent + length > total_records:
+        last_length = total_records - records_sent
+        batches.append(batch.slice(0, last_length))
+        records_sent += last_length
+      else:
+        batches.append(batch)
+        records_sent += length
+    
+    return batches
+
+def make_reader(schema, batches):
+    return pa.RecordBatchReader.from_batches(schema, batches)
+
+def generate_batches(schema, reader):
+    with io.BytesIO() as sink, pa.ipc.new_stream(sink, schema) as writer:
+        yield sink.getvalue()
+        
+        for batch in reader:
+            sink.seek(0)
+            sink.truncate(0)
+            writer.write_batch(batch)
+            yield sink.getvalue()
+        
+        sink.seek(0)
+        sink.truncate(0)
+        writer.close()
+        yield sink.getvalue()
+ 
+class MyServer(BaseHTTPRequestHandler):
+    def do_GET(self):
+        self.send_response(200)
+        self.send_header('Content-Type', 'application/vnd.apache.arrow.stream')
+        
+        # set these headers if testing with a local browser-based client:
+        
+        #self.send_header('Access-Control-Allow-Origin', 'http://localhost:8000')
+        #self.send_header('Access-Control-Allow-Methods', 'GET')
+        #self.send_header('Access-Control-Allow-Headers', 'Content-Type')
+        
+        self.end_headers()
+        
+        for buffer in generate_batches(schema, make_reader(schema, batches)):
+            self.wfile.write(buffer)
+            self.wfile.flush()
+            
+            # if any record batch could be larger than 2 GB, split it
+            # into chunks before passing to self.wfile.write() by 
+            # replacing the two lines above with this:
+            
+            #chunk_size = int(2e9)
+            #chunk_splits = len(buffer) // chunk_size
+            #for i in range(chunk_splits):
+            #    self.wfile.write(buffer[i * chunk_size:i * chunk_size + chunk_size])
+            #    self.wfile.flush()
+            #self.wfile.write(buffer[chunk_splits * chunk_size:])
+            #self.wfile.flush()
+
+batches = GetPutData()
+
+server_address = ('localhost', 8000)
+httpd = HTTPServer(server_address, MyServer)
+
+print(f'Serving on {server_address[0]}:{server_address[1]}...')
+httpd.serve_forever()
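
Note on the commented-out chunking in `do_GET` above: the same idea can be sketched as a small standalone helper. This is only an illustration, not part of the commit; the name `write_in_chunks` and the tiny chunk size in the usage lines are hypothetical, and an in-memory `io.BytesIO` stands in for `self.wfile`.

```python
import io


def write_in_chunks(out, buffer, chunk_size=int(2e9)):
    """Write `buffer` to `out` in pieces of at most `chunk_size` bytes.

    `write_in_chunks` is a hypothetical helper, not part of server.py.
    """
    view = memoryview(buffer)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        out.write(view[start:start + chunk_size])
        out.flush()


# Usage with an in-memory sink and a deliberately tiny chunk size
sink = io.BytesIO()
write_in_chunks(sink, b"abcdefghij", chunk_size=4)
result = sink.getvalue()
```

Stepping through `range(0, len(view), chunk_size)` avoids a separate final write for the remainder, and an empty buffer results in no writes at all.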
diff --git a/http/get_simple/r/client/README.md b/http/get_simple/r/client/README.md
new file mode 100644
index 0000000..e34c45c
--- /dev/null
+++ b/http/get_simple/r/client/README.md
@@ -0,0 +1,32 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# HTTP GET Arrow Data: Simple R Client Example
+
+This directory contains a minimal example of an HTTP client implemented in R. The client:
+1. Sends an HTTP GET request to a server.
+2. Receives an HTTP 200 response from the server, with the response body containing an Arrow IPC stream of record batches.
+3. Creates an Arrow table from the record batches.
+
+To run this example, first start one of the server examples in the parent directory, then:
+
+```sh
+R -e 'install.packages(c("httr", "arrow", "tictoc"), repos="https://cloud.r-project.org")'
+R -f client.R -s
+```
diff --git a/http/get_simple/r/client/client.R b/http/get_simple/r/client/client.R
new file mode 100644
index 0000000..73446ea
--- /dev/null
+++ b/http/get_simple/r/client/client.R
@@ -0,0 +1,24 @@
+library(httr)
+library(tictoc)
+suppressPackageStartupMessages(library(arrow))
+
+url <- 'http://localhost:8000'
+
+tic()
+
+response <- GET(url)
+buffer <- content(response, "raw")
+reader <- RecordBatchStreamReader$create(buffer)
+table <- reader$read_table()
+
+# or:
+#batches <- reader$batches()
+# but this is very slow
+
+# or:
+#result <- read_ipc_stream(buffer, as_data_frame = FALSE)
+
+# or:
+#result <- read_ipc_stream('http://localhost:8000', as_data_frame = FALSE)
+
+toc()

