This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 916666bb chore(dev/benchmarks): Benchmark IPC reader (#405)
916666bb is described below

commit 916666bb604692fcd5a1586d6cc6569a4787c238
Author: Dewey Dunnington <[email protected]>
AuthorDate: Tue Mar 26 13:17:30 2024 -0300

    chore(dev/benchmarks): Benchmark IPC reader (#405)
    
    This PR adds IPC reader benchmarks via the ArrowArrayStream
    implementation, which is the most common use. It required a little CMake
    shuffling to get this to work plus some code to generate IPC stream
    fixtures. The current benchmarks are:
    
    - Many batches that are very tiny (from disk)
    - Many batches that are very tiny (from a buffer)
    - One very wide batch
    
    Benchmark output in details:
    
    <details>
    
    # Benchmark Report
    
    ## Configurations
    
    These benchmarks were run with the following configurations:
    
    | preset_name | preset_description                                   |
    |:------------|:-----------------------------------------------------|
    | local       | Uses the nanoarrow C sources from this checkout.     |
    | v0.4.0      | Uses the nanoarrow C sources from the 0.4.0 release. |
    
    ## Summary
    
    A quick and dirty summary of benchmark results between this checkout and
    the last released version.
    
    | benchmark_label | v0.4.0 | local | change | pct_change |
    |:--------------------------------------------------------------|---------:|---------:|---------:|-----------:|
    | [ArrayAppendInt16](#arrayappendint16) | 2.67ms | 2.67ms | 3.03µs | 0.1% |
    | [ArrayAppendInt32](#arrayappendint32) | 3.13ms | 3.14ms | 18.1µs | 0.6% |
    | [ArrayAppendInt64](#arrayappendint64) | 3.86ms | 3.44ms | 1ns | -10.7% |
    | [ArrayAppendInt8](#arrayappendint8) | 2.4ms | 2.41ms | 9.04µs | 0.4% |
    | [ArrayAppendNulls](#arrayappendnulls) | 12.14ms | 12.13ms | 1ns | -0.1% |
    | [ArrayAppendString](#arrayappendstring) | 8.44ms | 8.79ms | 345.74µs | 4.1% |
    | [ArrayViewGetInt16](#arrayviewgetint16) | 633.71µs | 631µs | 1ns | -0.4% |
    | [ArrayViewGetInt32](#arrayviewgetint32) | 627.51µs | 633.47µs | 5.96µs | 1% |
    | [ArrayViewGetInt64](#arrayviewgetint64) | 677.86µs | 680.66µs | 2.8µs | 0.4% |
    | [ArrayViewGetInt8](#arrayviewgetint8) | 945.2µs | 943.48µs | 1ns | -0.2% |
    | [ArrayViewGetString](#arrayviewgetstring) | 1.25ms | 1.26ms | 3.48µs | 0.3% |
    | [ArrayViewIsNull](#arrayviewisnull) | 1.2ms | 1.19ms | 1ns | -0.4% |
    | [ArrayViewIsNullNonNullable](#arrayviewisnullnonnullable) | 952.93µs | 941.49µs | 1ns | -1.2% |
    | [IpcReadManyBatchesFromBuffer](#ipcreadmanybatchesfrombuffer) | 6.1ms | 5.47ms | 1ns | -10.4% |
    | [IpcReadManyBatchesFromFile](#ipcreadmanybatchesfromfile) | 6.99ms | 6.28ms | 1ns | -10.2% |
    | [IpcReadManyColumnsFromFile](#ipcreadmanycolumnsfromfile) | 8.55ms | 8.67ms | 117.82µs | 1.4% |
    | [SchemaInitWideStruct](#schemainitwidestruct) | 1.02ms | 1.02ms | 4.79µs | 0.5% |
    | [SchemaViewInitWideStruct](#schemaviewinitwidestruct) | 104.11µs | 104.31µs | 198.06ns | 0.2% |
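
The `change` and `pct_change` columns can be approximated from the two timing columns. The sketch below is not the report's own code (the report is rendered from `benchmark-report.qmd`); `summarize` is a hypothetical helper, and it assumes `change = local - v0.4.0` and `pct_change = change / v0.4.0`. The inputs are the rounded timings shown in the table, so the last digit may differ from the report. In the table, `change` reads `1ns` wherever `pct_change` is negative; this sketch reports the signed difference instead.

```python
def summarize(baseline_ms: float, local_ms: float) -> tuple[float, float]:
    """Return (change in ms, percent change) of local relative to baseline."""
    change_ms = local_ms - baseline_ms
    pct_change = 100.0 * change_ms / baseline_ms
    return change_ms, pct_change


# Rounded timings copied from the summary table: (v0.4.0, local), in milliseconds.
rows = {
    "IpcReadManyBatchesFromFile": (6.99, 6.28),
    "IpcReadManyColumnsFromFile": (8.55, 8.67),
}

for label, (baseline_ms, local_ms) in rows.items():
    change_ms, pct_change = summarize(baseline_ms, local_ms)
    print(f"{label}: {change_ms * 1000:+.0f}µs ({pct_change:+.1f}%)")
```

For these two rows the percentages come out as -10.2% and +1.4%, which agrees with the `pct_change` column.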
    
    ## ArrowArray-related benchmarks
    
    Benchmarks for producing ArrowArrays using the ArrowArrayXXX()
    functions.
    
    ### ArrayAppendString
    
    Use ArrowArrayAppendString() to build a string array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L289-L314)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |         81 |    8.79ms |   8.76ms |      114,185,666 |
    | v0.4.0      |         81 |    8.44ms |   8.41ms |      118,902,224 |
    
    ### ArrayAppendInt8
    
    Use ArrowArrayAppendInt() to build an int8 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L338-L340)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        294 |    2.41ms |    2.4ms |      415,884,526 |
    | v0.4.0      |        291 |     2.4ms |    2.4ms |      417,592,258 |
    
    ### ArrayAppendInt16
    
    Use ArrowArrayAppendInt() to build an int16 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L343-L345)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        262 |    2.67ms |   2.67ms |      375,204,429 |
    | v0.4.0      |        260 |    2.67ms |   2.66ms |      375,796,399 |
    
    ### ArrayAppendInt32
    
    Use ArrowArrayAppendInt() to build an int32 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L348-L350)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        224 |    3.14ms |   3.14ms |      318,552,407 |
    | v0.4.0      |        223 |    3.13ms |   3.12ms |      320,376,521 |
    
    ### ArrayAppendInt64
    
    Use ArrowArrayAppendInt() to build an int64 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L353-L355)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        224 |    3.44ms |   3.43ms |      291,583,520 |
    | v0.4.0      |        194 |    3.86ms |   3.82ms |      261,618,420 |
    
    ### ArrayAppendNulls
    
    Use ArrowArrayAppendNulls() to build an int32 array that contains 80%
    null values.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L378-L400)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |         57 |    12.1ms |   12.1ms |       82,528,559 |
    | v0.4.0      |         57 |    12.1ms |   12.1ms |       82,489,624 |
    
    ## ArrowArrayView-related benchmarks
    
    Benchmarks for consuming ArrowArrays using the ArrowArrayViewXXX()
    functions.
    
    ### ArrayViewGetInt8
    
    Use ArrowArrayViewGet() to consume an int8 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L122-L124)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        742 |     943µs |    943µs |    1,060,960,927 |
    | v0.4.0      |        745 |     945µs |    943µs |    1,059,924,880 |
    
    ### ArrayViewGetInt16
    
    Use ArrowArrayViewGet() to consume an int16 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L127-L129)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |       1115 |     631µs |    630µs |    1,586,526,901 |
    | v0.4.0      |       1115 |     634µs |    633µs |    1,580,582,789 |
    
    ### ArrayViewGetInt32
    
    Use ArrowArrayViewGet() to consume an int32 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L132-L134)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |       1111 |     633µs |    633µs |    1,580,851,070 |
    | v0.4.0      |       1115 |     628µs |    627µs |    1,596,033,249 |
    
    ### ArrayViewGetInt64
    
    Use ArrowArrayViewGet() to consume an int64 array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L137-L139)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |       1029 |     681µs |    680µs |    1,471,227,424 |
    | v0.4.0      |       1034 |     678µs |    677µs |    1,476,773,664 |
    
    ### ArrayViewIsNullNonNullable
    
    Use ArrowArrayViewIsNull() to check for nulls while consuming an int32
    array that does not contain a validity buffer.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L143-L172)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        746 |     941µs |    940µs |    1,063,268,768 |
    | v0.4.0      |        735 |     953µs |    951µs |    1,051,845,237 |
    
    ### ArrayViewIsNull
    
    Use ArrowArrayViewIsNull() to check for nulls while consuming an int32
    array that contains 20% nulls.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L176-L215)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        589 |    1.19ms |   1.19ms |      838,600,093 |
    | v0.4.0      |        579 |     1.2ms |    1.2ms |      835,530,389 |
    
    ### ArrayViewGetString
    
    Use ArrowArrayViewGetStringUnsafe() to consume a string array.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/array_benchmark.cc#L218-L249)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        559 |    1.26ms |   1.25ms |      797,174,096 |
    | v0.4.0      |        558 |    1.25ms |   1.25ms |      799,731,703 |
    
    ## IPC Reader Benchmarks
    
    Benchmarks for the ArrowArrayStream IPC reader.
    
    ### IpcReadManyBatchesFromFile
    
    Use the ArrowArrayStream IPC reader to read 10,000 batches with 5
    elements each from a file.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/ipc_benchmark.cc#L93-L113)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        110 |    6.28ms |   6.26ms |        1,596,234 |
    | v0.4.0      |        101 |    6.99ms |   6.98ms |        1,432,594 |
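
The `items_per_second` column appears consistent with dividing the number of items processed per iteration by `cpu_time`. A minimal sketch of that relationship, assuming each of the 10,000 batches counts as one item (an assumption; the counting code is not shown in this message) and using the rounded `cpu_time` values from the table above. `items_per_second` here is a hypothetical helper, not a function from the benchmark sources.

```python
def items_per_second(items_per_iteration: int, cpu_time_ms: float) -> float:
    """Throughput implied by one iteration's item count and its CPU time."""
    return items_per_iteration / (cpu_time_ms / 1000.0)


# (preset_name, cpu_time in ms) copied from the IpcReadManyBatchesFromFile table.
for preset, cpu_time_ms in [("local", 6.26), ("v0.4.0", 6.98)]:
    rate = items_per_second(10_000, cpu_time_ms)
    print(f"{preset}: {rate:,.0f} items/s")
```

Both results land within about 0.1% of the reported values; the remaining gap is presumably due to `cpu_time` being rounded for display.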
    
    ### IpcReadManyBatchesFromBuffer
    
    Use the ArrowArrayStream IPC reader to read 10,000 batches with 5
    elements each from a buffer.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/ipc_benchmark.cc#L117-L147)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        127 |    5.47ms |   5.46ms |        1,831,415 |
    | v0.4.0      |        114 |     6.1ms |    6.1ms |        1,640,075 |
    
    ### IpcReadManyColumnsFromFile
    
    Use the ArrowArrayStream IPC reader to read one very wide batch from a
    file.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/ipc_benchmark.cc#L151-L171)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |         82 |    8.67ms |   8.66ms |        14,083.85 |
    | v0.4.0      |         83 |    8.55ms |   8.55ms |        14,098.53 |
    
    ## Schema-related benchmarks
    
    Benchmarks for producing and consuming ArrowSchema.
    
    ### SchemaInitWideStruct
    
    Benchmark ArrowSchema creation for very wide tables.
    
    Simulates part of the process of creating a very wide table with a
    simple column type (integer).
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/schema_benchmark.cc#L45-L56)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |        688 |    1.02ms |   1.02ms |        9,786,782 |
    | v0.4.0      |        686 |    1.02ms |   1.02ms |        9,829,446 |
    
    ### SchemaViewInitWideStruct
    
    Benchmark ArrowSchema parsing for very wide tables.
    
    Simulates part of the process of consuming a very wide table. Typically
    the ArrowSchemaViewInit() is done by ArrowArrayViewInit() but uses a
    similar pattern.
    
    [View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmark-ipc/dev/benchmarks/c/schema_benchmark.cc#L78-L91)
    
    | preset_name | iterations | real_time | cpu_time | items_per_second |
    |:------------|-----------:|----------:|---------:|-----------------:|
    | local       |       6666 |     104µs |    104µs |       96,339,782 |
    | v0.4.0      |       6733 |     104µs |    104µs |       96,151,786 |
    
    
    </details>
---
 .github/workflows/benchmarks.yaml       |  11 +++
 CMakeLists.txt                          |   1 +
 dev/benchmarks/.gitignore               |   1 +
 dev/benchmarks/CMakeLists.txt           |  42 ++++++---
 dev/benchmarks/README.md                |   1 +
 dev/benchmarks/benchmark-report.qmd     |   5 +-
 dev/benchmarks/c/ipc_benchmark.cc       | 150 ++++++++++++++++++++++++++++++++
 dev/benchmarks/generate-fixtures.py     |  84 ++++++++++++++++++
 extensions/nanoarrow_ipc/CMakeLists.txt |  20 ++---
 9 files changed, 290 insertions(+), 25 deletions(-)

diff --git a/.github/workflows/benchmarks.yaml b/.github/workflows/benchmarks.yaml
index 1072d2a1..de528200 100644
--- a/.github/workflows/benchmarks.yaml
+++ b/.github/workflows/benchmarks.yaml
@@ -38,6 +38,17 @@ jobs:
 
     steps:
     - uses: actions/checkout@v4
+    - uses: actions/setup-python@v4
+
+    - name: Install pyarrow
+      run: |
+        pip install pyarrow
+
+    - name: Generate fixtures
+      run: |
+        cd dev/benchmarks
+        python generate-fixtures.py
+
     - name: Run benchmarks
       run: |
         cd dev/benchmarks
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 1feec135..6ae3b810 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -284,5 +284,6 @@ if(NANOARROW_BUILD_TESTS)
 endif()
 
 if(NANOARROW_BUILD_BENCHMARKS)
+  add_subdirectory(extensions/nanoarrow_ipc)
   add_subdirectory(dev/benchmarks)
 endif()
diff --git a/dev/benchmarks/.gitignore b/dev/benchmarks/.gitignore
index c06fbb3f..71a6b327 100644
--- a/dev/benchmarks/.gitignore
+++ b/dev/benchmarks/.gitignore
@@ -17,3 +17,4 @@
 
 .Rhistory
 benchmark-report.md
+fixtures/
diff --git a/dev/benchmarks/CMakeLists.txt b/dev/benchmarks/CMakeLists.txt
index 9064903d..bb452c17 100644
--- a/dev/benchmarks/CMakeLists.txt
+++ b/dev/benchmarks/CMakeLists.txt
@@ -16,7 +16,7 @@
 # under the License.
 
 message(STATUS "Building using CMake version: ${CMAKE_VERSION}")
-cmake_minimum_required(VERSION 3.14)
+cmake_minimum_required(VERSION 3.18)
 include(FetchContent)
 
 project(nanoarrow_benchmarks)
@@ -54,29 +54,45 @@ fetchcontent_makeavailable(benchmark)
 
 if(IS_DIRECTORY "${NANOARROW_BENCHMARK_SOURCE_URL}")
   fetchcontent_declare(nanoarrow SOURCE_DIR "${NANOARROW_BENCHMARK_SOURCE_URL}")
+  fetchcontent_declare(nanoarrow_ipc
+                       SOURCE_DIR
+                       "${NANOARROW_BENCHMARK_SOURCE_URL}/extensions/nanoarrow_ipc")
+
   fetchcontent_makeavailable(nanoarrow)
+  fetchcontent_makeavailable(nanoarrow_ipc)
 elseif(NOT "${NANOARROW_BENCHMARK_SOURCE_URL}" STREQUAL "")
   fetchcontent_declare(nanoarrow URL "${NANOARROW_BENCHMARK_SOURCE_URL}")
+  fetchcontent_declare(nanoarrow_ipc URL "${NANOARROW_BENCHMARK_SOURCE_URL}"
+                                         SOURCE_SUBDIR extensions/nanoarrow_ipc)
   fetchcontent_makeavailable(nanoarrow)
+  fetchcontent_makeavailable(nanoarrow_ipc)
 endif()
 
 # Check that either the parent scope or this CMakeLists.txt defines a nanoarrow target
-if(NOT TARGET nanoarrow)
-  message(FATAL_ERROR "nanoarrow target not found (missing -DNANOARROW_BENCHMARK_SOURCE_URL option?)"
+if(NOT TARGET nanoarrow OR NOT TARGET nanoarrow_ipc)
+  message(FATAL_ERROR "nanoarrow or nanoarrow_ipc target not found (missing -DNANOARROW_BENCHMARK_SOURCE_URL option?)"
   )
 endif()
 
-# Add + link tests
-add_executable(schema_benchmark c/schema_benchmark.cc)
-add_executable(array_benchmark c/array_benchmark.cc)
-
-target_link_libraries(schema_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
-target_link_libraries(array_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
+file(MAKE_DIRECTORY "${CMAKE_BINARY_DIR}/fixtures")
+foreach(ITEM float64_basic;float64_long;float64_wide)
+  file(COPY_FILE "${CMAKE_CURRENT_LIST_DIR}/fixtures/${ITEM}.arrows"
+       "${CMAKE_BINARY_DIR}/fixtures/${ITEM}.arrows" ONLY_IF_DIFFERENT)
+endforeach()
 
+# Add executables and register them as tests.
 # This lets all benchmarks run via ctest -VV when this is the top-level project
+# and takes care of setting the relevant test properties such that the benchmarks
+# can find the fixtures.
 include(CTest)
 enable_testing()
-add_test(NAME schema_benchmark COMMAND schema_benchmark
-                                       --benchmark_out=schema_benchmark.json)
-add_test(NAME array_benchmark COMMAND array_benchmark
-                                      --benchmark_out=array_benchmark.json)
+
+foreach(ITEM schema;array;ipc)
+  add_executable(${ITEM}_benchmark "c/${ITEM}_benchmark.cc")
+  target_link_libraries(${ITEM}_benchmark PRIVATE nanoarrow nanoarrow_ipc
+                                                  benchmark::benchmark_main)
+  add_test(NAME ${ITEM}_benchmark COMMAND ${ITEM}_benchmark
+                                          --benchmark_out=${ITEM}_benchmark.json)
+  set_tests_properties(${ITEM}_benchmark PROPERTIES WORKING_DIRECTORY
+                                                    "${CMAKE_BINARY_DIR}")
+endforeach(ITEM)
diff --git a/dev/benchmarks/README.md b/dev/benchmarks/README.md
index e6193c13..70d0f581 100644
--- a/dev/benchmarks/README.md
+++ b/dev/benchmarks/README.md
@@ -45,6 +45,7 @@ and runs `ctest`.
 You can build a full report by running:
 
 ```shell
+python generate-fixtures.py # requires pyarrow
 ./benchmark-run-all.sh
 cd apidoc && doxygen && cd ..
 quarto render benchmark-report.qmd
diff --git a/dev/benchmarks/benchmark-report.qmd b/dev/benchmarks/benchmark-report.qmd
index f33be3cd..1af151b1 100644
--- a/dev/benchmarks/benchmark-report.qmd
+++ b/dev/benchmarks/benchmark-report.qmd
@@ -207,9 +207,10 @@ make_table <- function(benchmark_name) {
       iterations,
       real_time = real_time_pretty,
       cpu_time = cpu_time_pretty,
-      items_per_second = format(items_per_second, big.mark = ",")
+      items_per_second = format(items_per_second, big.mark = ","),
+      bytes_per_second = format(bytes_per_second, big.mark = ",")
     ) |>
-    knitr::kable(align = "lrrrr") |>
+    knitr::kable(align = "lrrrrr") |>
     as.character() |>
     paste(collapse = "\n")
 }
diff --git a/dev/benchmarks/c/ipc_benchmark.cc b/dev/benchmarks/c/ipc_benchmark.cc
new file mode 100644
index 00000000..4eda3d33
--- /dev/null
+++ b/dev/benchmarks/c/ipc_benchmark.cc
@@ -0,0 +1,150 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <stdio.h>
+
+#include <benchmark/benchmark.h>
+
+#include <nanoarrow/nanoarrow.hpp>
+#include <nanoarrow/nanoarrow_ipc.hpp>
+
+static ArrowErrorCode MakeFixtureInputStreamFile(const std::string& fixture_name,
+                                                 ArrowIpcInputStream* out) {
+  const char* fixture_dir = std::getenv("NANOARROW_BENCHMARK_FIXTURE_DIR");
+  if (fixture_dir == NULL) {
+    fixture_dir = "fixtures";
+  }
+
+  std::string fixture_path = std::string(fixture_dir) + std::string("/") + fixture_name;
+  FILE* fixture_file = fopen(fixture_path.c_str(), "rb");
+
+  NANOARROW_RETURN_NOT_OK(
+      ArrowIpcInputStreamInitFile(out, fixture_file, /*close_on_release*/ true));
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode MakeFixtureBuffer(const std::string& fixture_name,
+                                        ArrowBuffer* out) {
+  nanoarrow::ipc::UniqueInputStream input_stream;
+  NANOARROW_RETURN_NOT_OK(MakeFixtureInputStreamFile(fixture_name, input_stream.get()));
+
+  nanoarrow::UniqueBuffer buffer;
+  int64_t size_read_out = 0;
+  int64_t chunk_size = 1024;
+  do {
+    NANOARROW_RETURN_NOT_OK(ArrowBufferReserve(buffer.get(), chunk_size));
+    NANOARROW_RETURN_NOT_OK(input_stream->read(input_stream.get(),
+                                               buffer->data + buffer->size_bytes,
+                                               chunk_size, &size_read_out, nullptr));
+    buffer->size_bytes += size_read_out;
+  } while (size_read_out > 0);
+
+  ArrowBufferMove(buffer.get(), out);
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrayStreamReadAll(ArrowArrayStream* array_stream,
+                                         int64_t* batch_count, int64_t* column_count) {
+  nanoarrow::UniqueSchema schema;
+  NANOARROW_RETURN_NOT_OK(array_stream->get_schema(array_stream, schema.get()));
+  *column_count = schema->n_children;
+  benchmark::DoNotOptimize(schema);
+
+  nanoarrow::UniqueArrayView array_view;
+  NANOARROW_RETURN_NOT_OK(
+      ArrowArrayViewInitFromSchema(array_view.get(), schema.get(), nullptr));
+
+  while (true) {
+    nanoarrow::UniqueArray array;
+    NANOARROW_RETURN_NOT_OK(array_stream->get_next(array_stream, array.get()));
+    if (array->release == nullptr) {
+      break;
+    }
+
+    NANOARROW_RETURN_NOT_OK(
+        ArrowArrayViewSetArray(array_view.get(), array.get(), nullptr));
+
+    *batch_count = *batch_count + 1;
+  }
+
+  return NANOARROW_OK;
+}
+
+/// \defgroup nanoarrow-benchmark-ipc IPC Reader Benchmarks
+///
+/// Benchmarks for the ArrowArrayStream IPC reader.
+///
+/// @{
+
+static void BaseBenchmarIpcFixtureBuffer(const std::string& fixture_name,
+                                         benchmark::State& state) {
+  int64_t batch_count = 0;
+  int64_t column_count = 0;
+
+  nanoarrow::UniqueBuffer buffer;
+  NANOARROW_THROW_NOT_OK(MakeFixtureBuffer(fixture_name, buffer.get()));
+
+  for (auto _ : state) {
+    // Don't copy the buffer within the benchmarking loop
+    nanoarrow::UniqueBuffer buffer_copy;
+    NANOARROW_THROW_NOT_OK(ArrowBufferSetAllocator(
+        buffer_copy.get(),
+        ArrowBufferDeallocator([](ArrowBufferAllocator*, uint8_t*, int64_t) -> void {},
+                               nullptr)));
+    buffer_copy->data = buffer->data;
+    buffer_copy->size_bytes = buffer->size_bytes;
+
+    nanoarrow::ipc::UniqueInputStream input_stream;
+    NANOARROW_THROW_NOT_OK(
+        ArrowIpcInputStreamInitBuffer(input_stream.get(), buffer_copy.get()));
+
+    nanoarrow::UniqueArrayStream array_stream;
+    NANOARROW_THROW_NOT_OK(
+        ArrowIpcArrayStreamReaderInit(array_stream.get(), input_stream.get(), nullptr));
+
+    NANOARROW_THROW_NOT_OK(
+        ArrayStreamReadAll(array_stream.get(), &batch_count, &column_count));
+
+    benchmark::DoNotOptimize(batch_count);
+  }
+
+  state.SetBytesProcessed(state.iterations() * buffer->size_bytes);
+}
+
+/// \brief Use the ArrowArrayStream IPC reader to read a ~10 MB stream with 10
+/// float64 columns.
+static void BenchmarkIpcReadFloat64FromBuffer(benchmark::State& state) {
+  BaseBenchmarIpcFixtureBuffer("float64_basic.arrows", state);
+}
+
+/// \brief Use the ArrowArrayStream IPC reader to read a ~10 MB stream with 1
+/// float64 column.
+static void BenchmarkIpcReadFloat64LongFromBuffer(benchmark::State& state) {
+  BaseBenchmarIpcFixtureBuffer("float64_long.arrows", state);
+}
+
+/// \brief Use the ArrowArrayStream IPC reader to read a ~10 MB stream with 1280
+/// float64 columns.
+static void BenchmarkIpcReadFloat64WideFromBuffer(benchmark::State& state) {
+  BaseBenchmarIpcFixtureBuffer("float64_wide.arrows", state);
+}
+
+BENCHMARK(BenchmarkIpcReadFloat64FromBuffer);
+BENCHMARK(BenchmarkIpcReadFloat64LongFromBuffer);
+BENCHMARK(BenchmarkIpcReadFloat64WideFromBuffer);
+
+/// @}
diff --git a/dev/benchmarks/generate-fixtures.py b/dev/benchmarks/generate-fixtures.py
new file mode 100644
index 00000000..415d56a0
--- /dev/null
+++ b/dev/benchmarks/generate-fixtures.py
@@ -0,0 +1,84 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import os
+
+import numpy as np
+import pyarrow as pa
+from pyarrow import ipc
+
+
+def write_fixture(schema, batch_generator, fixture_name, fixtures_dir=None):
+    if fixtures_dir is None:
+        fixtures_dir = os.getcwd()
+
+    with ipc.new_stream(os.path.join(fixtures_dir, fixture_name), schema) as out:
+        for batch in batch_generator:
+            out.write_batch(batch)
+
+
+def write_fixture_float64(
+    fixture_name,
+    num_cols=10,
+    num_batches=2,
+    batch_size=65536,
+    seed=1938,
+    fixtures_dir=None,
+):
+    """
+    Writes a fixture containing random float64 columns in various configurations.
+    """
+    generator = np.random.default_rng(seed=seed)
+
+    schema = pa.schema({f"col{i}": pa.float64() for i in range(num_cols)})
+
+    def gen_batches():
+        for _ in range(num_batches):
+            arrays = [np.array(generator.random(batch_size)) for _ in range(num_cols)]
+            yield pa.record_batch(arrays, names=[f"col{i}" for i in range(num_cols)])
+
+    write_fixture(schema, gen_batches(), fixture_name, fixtures_dir=fixtures_dir)
+
+
+if __name__ == "__main__":
+    this_dir = os.path.dirname(__file__)
+    fixtures_dir = os.path.join(this_dir, "fixtures")
+
+    if not os.path.isdir(fixtures_dir):
+        os.mkdir(fixtures_dir)
+
+    write_fixture_float64(
+        "float64_basic.arrows",
+        num_cols=10,
+        num_batches=2,
+        batch_size=65536,
+        fixtures_dir=fixtures_dir,
+    )
+    write_fixture_float64(
+        "float64_long.arrows",
+        num_cols=1,
+        num_batches=20,
+        batch_size=65536,
+        fixtures_dir=fixtures_dir,
+    )
+    write_fixture_float64(
+        "float64_wide.arrows",
+        num_cols=1280,
+        num_batches=1,
+        batch_size=1024,
+        fixtures_dir=fixtures_dir,
+    )
diff --git a/extensions/nanoarrow_ipc/CMakeLists.txt b/extensions/nanoarrow_ipc/CMakeLists.txt
index 6d4bf208..0dcf82d4 100644
--- a/extensions/nanoarrow_ipc/CMakeLists.txt
+++ b/extensions/nanoarrow_ipc/CMakeLists.txt
@@ -39,16 +39,16 @@ option(NANOARROW_IPC_CODE_COVERAGE "Enable coverage reporting" OFF)
 add_library(ipc_coverage_config INTERFACE)
 
 if(NANOARROW_IPC_BUILD_TESTS OR NOT NANOARROW_IPC_BUNDLE)
-  # Add the nanoarrow dependency. nanoarrow is not linked into the
-  # nanoarrow_ipc library (the caller must link this themselves);
-  # however, we need nanoarrow.h to build nanoarrow_ipc.c.
-  fetchcontent_declare(nanoarrow SOURCE_DIR ${CMAKE_CURRENT_LIST_DIR}/../..)
-
-  # Don't install nanoarrow because of this configuration
-  fetchcontent_getproperties(nanoarrow)
-  if(NOT nanoarrow_POPULATED)
-    fetchcontent_populate(nanoarrow)
-    add_subdirectory(${nanoarrow_SOURCE_DIR} ${nanoarrow_BINARY_DIR} EXCLUDE_FROM_ALL)
+  # Lazily add the nanoarrow dependency
+  if(NOT TARGET nanoarrow)
+    fetchcontent_declare(nanoarrow SOURCE_DIR ${CMAKE_CURRENT_LIST_DIR}/../..)
+
+    # Don't install nanoarrow because of this configuration
+    fetchcontent_getproperties(nanoarrow)
+    if(NOT nanoarrow_POPULATED)
+      fetchcontent_populate(nanoarrow)
+      add_subdirectory(${nanoarrow_SOURCE_DIR} ${nanoarrow_BINARY_DIR} EXCLUDE_FROM_ALL)
+    endif()
   endif()
 
   # Add the flatcc (runtime) dependency
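
The three fixtures written by `generate-fixtures.py` use different shapes but carry the same amount of float64 data, which matches the "~10 MB stream" wording in the `ipc_benchmark.cc` doc comments. A quick check of that arithmetic, counting only the raw 8-byte values; IPC metadata and padding are ignored, so the files on disk will be somewhat larger. `payload_bytes` is a hypothetical helper and is not part of the commit.

```python
FLOAT64_BYTES = 8


def payload_bytes(num_cols: int, num_batches: int, batch_size: int) -> int:
    """Raw float64 payload: columns x batches x rows per batch x 8 bytes."""
    return num_cols * num_batches * batch_size * FLOAT64_BYTES


# Parameters copied from the write_fixture_float64() calls in the diff above.
fixtures = {
    "float64_basic.arrows": dict(num_cols=10, num_batches=2, batch_size=65536),
    "float64_long.arrows": dict(num_cols=1, num_batches=20, batch_size=65536),
    "float64_wide.arrows": dict(num_cols=1280, num_batches=1, batch_size=1024),
}

for name, params in fixtures.items():
    size = payload_bytes(**params)
    print(f"{name}: {size:,} bytes ({size / 2**20:.0f} MiB)")
```

Each fixture works out to 10,485,760 bytes (10 MiB) of values, so differences between the three benchmarks reflect batch and column layout rather than data volume.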
