This is an automated email from the ASF dual-hosted git repository.

snazy pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris-tools.git
The following commit(s) were added to refs/heads/main by this push:
     new 1fb817c  Benchmarks docs, config and core reorganisation (#9)
1fb817c is described below

commit 1fb817cc1832a14b385501da55452671fb4b03df
Author: Pierre Laporte <pie...@pingtimeout.fr>
AuthorDate: Wed Apr 23 16:18:54 2025 +0200

    Benchmarks docs, config and core reorganisation (#9)

    * Cleanup benchmarks list in README

    * Move mixed simulation parameter under the proper section

    * Remove the updates generation parameters

      This commit changes the way update requests are generated. They used to
      be generated up to a certain number. Now as many updates as the user
      wants can be generated, using infinite streams. This not only simplifies
      the configuration of update-related benchmarks but also reduces the
      amount of boilerplate code. Typically, the `BufferedRandomIterator`
      class was needed to shuffle updates, as otherwise multiple updates
      against the same entity would be sent simultaneously. Now, before an
      entity is updated again, all the other entities of the dataset need to
      receive an update as well. This removes the need for the shuffling and
      buffering operation.

      As a result, no random operations are performed in the simulations
      anymore. The seed parameter is no longer necessary, so it has been
      removed as dead code.

    * Fix misleading throughput parameters

      The throughput of the 100% write and 100% read benchmarks cannot be
      configured; only the number of concurrent users can. Those benchmarks
      are designed with a closed model [1], so that the request injection rate
      depends on the response rate and requests do not queue up on the server
      side. This commit renames the parameters and fixes the documentation
      accordingly, for clarity.
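The infinite-stream scheme described in the commit message can be sketched standalone in plain Scala (hypothetical entity names; the real feeders emit Gatling session maps rather than pairs): every entity receives update N before any entity receives update N+1, so no shuffling, buffering, or seed is required.

```scala
// Sketch of the new update-generation pattern, under assumed names: an
// infinite iterator that gives every entity exactly one update per round
// before any entity is updated again.
object InfiniteUpdateFeeder {
  val entities = List("ns1", "ns2", "ns3") // hypothetical dataset

  // Infinite iterator of (entity, updateId) pairs: entities cycle in order,
  // and the update id only advances once per full round over the dataset.
  def propertyUpdateFeeder(): Iterator[(String, Int)] =
    Iterator.from(0).flatMap(updateId => entities.iterator.map(e => (e, updateId)))
}

object InfiniteUpdateFeederDemo extends App {
  // Prints the first two rounds of updates.
  println(InfiniteUpdateFeeder.propertyUpdateFeeder().take(6).toList)
}
```

Because `Iterator.from(0)` is lazy, the stream never materializes more than the next row, which is what lets the benchmarks run for as long as the user wants.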
---
 benchmarks/README.md                               | 35 ++++----------
 .../src/gatling/resources/benchmark-defaults.conf  | 53 ++++++++--------------
 .../benchmarks/actions/NamespaceActions.scala      | 27 +++++++----
 .../polaris/benchmarks/actions/TableActions.scala  | 14 ++++--
 .../polaris/benchmarks/actions/ViewActions.scala   | 14 ++++--
 .../benchmarks/parameters/BenchmarkConfig.scala    | 14 ++----
 .../ReadUpdateTreeDatasetParameters.scala          |  9 ++++
 .../benchmarks/parameters/WorkloadParameters.scala | 30 +----------
 .../benchmarks/simulations/ReadTreeDataset.scala   |  3 +-
 .../simulations/ReadUpdateTreeDataset.scala        | 13 ++++--
 .../polaris/benchmarks/util/CircularIterator.scala | 22 +-------
 11 files changed, 91 insertions(+), 143 deletions(-)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index 8c443e3..792a400 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -23,30 +23,9 @@ Benchmarks for the Polaris service using Gatling.
 
 ## Available Benchmarks
 
-### Dataset Creation Benchmark
-
-The CreateTreeDataset benchmark creates a test dataset with a specific structure:
-
-- `org.apache.polaris.benchmarks.simulations.CreateTreeDataset`: Creates up to 50 entities simultaneously
-
-This is a write-only workload designed to populate the system for subsequent benchmarks.
-
-### Read/Update Benchmark
-
-The ReadUpdateTreeDataset benchmark tests read and update operations on an existing dataset:
-
-- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDataset`: Performs up to 20 read/update operations simultaneously
-
-This benchmark can only be run after using CreateTreeDataset to populate the system.
-
-### Read-Only Benchmark
-
-The ReadTreeDataset benchmark is a 100% read workload that fetches a tree dataset in Polaris:
-
-- `org.apache.polaris.benchmarks.simulations.ReadTreeDataset`: Performs read-only operations to verify namespaces, tables, and views
-
-This benchmark is intended to be used against a Polaris instance with a pre-existing tree dataset.
-It has no side effects on the dataset and can be executed multiple times without any issues.
-
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDataset`: Creates a test dataset with a specific structure. It is a write-only workload designed to populate the system for subsequent benchmarks.
+- `org.apache.polaris.benchmarks.simulations.ReadTreeDataset`: Performs read-only operations to fetch namespaces, tables, and views. Some attributes of the objects are also fetched. This benchmark is intended to be used against a Polaris instance with a pre-existing tree dataset. It has no side effects on the dataset and can be executed multiple times without any issues.
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDataset`: Performs read and update operations against a Polaris instance populated with a test dataset. It is a read/write workload that can be used to test the system's ability to handle concurrent read and update operations. It is not destructive and does not prevent subsequent executions of `ReadTreeDataset` or `ReadUpdateTreeDataset`.
 
 ## Parameters
 
@@ -95,7 +74,9 @@ Workload settings are configured under `workload`:
 
 ```hocon
 workload {
-  read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+  read-update-tree-dataset {
+    read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+  }
 }
 ```
 
@@ -117,7 +98,9 @@ http {
 }
 
 workload {
-  read-write-ratio = 0.8
+  read-update-tree-dataset {
+    read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+  }
 }
 ```
diff --git a/benchmarks/src/gatling/resources/benchmark-defaults.conf b/benchmarks/src/gatling/resources/benchmark-defaults.conf
index 21b0451..8c326fb 100644
--- a/benchmarks/src/gatling/resources/benchmark-defaults.conf
+++ b/benchmarks/src/gatling/resources/benchmark-defaults.conf
@@ -117,55 +117,42 @@ dataset.tree {
 
 # Workload configuration
 workload {
-  # Ratio of read operations to write operations
-  # Range: 0.0 to 1.0 where:
-  #   - 0.0 means 100% writes
-  #   - 1.0 means 100% reads
-  # Example: 0.8 means 80% reads and 20% writes
-  # Required: Must be provided through environment variable READ_WRITE_RATIO
-  read-write-ratio = 0.5
-
-  # Seed used for random number generation
-  # Default: 1
-  seed = 1
-
-  # Number of property updates to perform per individual namespace
-  # Default: 5
-  updates-per-namespace = 5
-
-  # Number of property updates to perform per individual table
-  # Default: 10
-  updates-per-table = 10
-
-  # Number of property updates to perform per individual view
-  # Default: 10
-  updates-per-view = 10
-
-  # Configuration for the ReadTreeDataset simulation
   read-tree-dataset {
-    # Number of table operations to perform per second
+    # Number of table operations to perform simultaneously
+    # This controls the concurrency level for table operations
     # Default: 20
-    table-throughput = 20
+    table-concurrency = 20
 
-    # Number of view operations to perform per second
+    # Number of view operations to perform simultaneously
+    # This controls the concurrency level for view operations
     # Default: 10
-    view-throughput = 10
+    view-concurrency = 10
   }
 
   # Configuration for the CreateTreeDataset simulation
   create-tree-dataset {
-    # Number of table operations to perform per second
+    # Number of table operations to perform simultaneously
+    # This controls the concurrency level for table operations
     # Default: 20
-    table-throughput = 20
+    table-concurrency = 20
 
-    # Number of view operations to perform per second
+    # Number of view operations to perform simultaneously
+    # This controls the concurrency level for view operations
     # Default: 10
-    view-throughput = 10
+    view-concurrency = 10
   }
 
   # Configuration for the ReadUpdateTreeDataset simulation
   read-update-tree-dataset {
+    # Ratio of read operations to write operations
+    # Range: 0.0 to 1.0 where:
+    #   - 0.0 means 100% writes
+    #   - 1.0 means 100% reads
+    # Example: 0.8 means 80% reads and 20% writes
+    # Default: 0.5
+    read-write-ratio = 0.5
+
     # Number of operations to perform per second
     # Default: 100
     throughput = 100
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala
index 89e0e41..7a5f5f0 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala
@@ -115,15 +115,24 @@ case class NamespaceActions(
     )
   }
 
-  def namespacePropertiesUpdateFeeder(): Feeder[Any] = namespaceIdentityFeeder()
-    .flatMap { row =>
-      (0 until wp.updatesPerNamespace).map { updateId =>
-        val updates = Map(s"UpdatedAttribute_$updateId" -> s"$updateId")
-        row ++ Map(
-          "jsonPropertyUpdates" -> Json.toJson(updates).toString()
-        )
-      }
-    }
+  /**
+   * Creates a Gatling Feeder that generates namespace property updates. Each row contains a single
+   * property update targeting a specific namespace. The feeder is infinite, in that it will
+   * generate a new property update every time.
+   *
+   * @return An iterator providing namespace property update details
+   */
+  def namespacePropertiesUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      namespaceIdentityFeeder()
+        .map { row =>
+          val updates = Map(s"UpdatedAttribute_$updateId" -> s"$updateId")
+          row ++ Map(
+            "jsonPropertyUpdates" -> Json.toJson(updates).toString()
+          )
+        }
+    )
 
   /**
    * Creates a new namespace in a specified catalog. The namespace is created with a full path and
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala
index 7c0901e..1d5b951 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala
@@ -107,14 +107,18 @@ case class TableActions(
 
   /**
    * Creates a Gatling Feeder that generates table property updates. Each row contains a single
-   * property update targeting a specific table.
+   * property update targeting a specific table. The feeder is infinite, in that it will generate a
+   * new property update every time.
    *
    * @return An iterator providing table property update details
    */
-  def propertyUpdateFeeder(): Feeder[Any] = tableIdentityFeeder()
-    .flatMap(row =>
-      Range(0, wp.updatesPerTable)
-        .map(k => row + ("newProperty" -> s"""{"NewAttribute_$k": "NewValue_$k"}"""))
+  def propertyUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      tableIdentityFeeder()
+        .map { row =>
+          row ++ Map("newProperty" -> s"""{"NewAttribute_$updateId": "NewValue_$updateId"}""")
+        }
     )
 
   /**
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala
index 812c161..14bf698 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala
@@ -99,14 +99,18 @@ case class ViewActions(
 
   /**
    * Creates a Gatling Feeder that generates view property updates. Each row contains a single
-   * property update targeting a specific view.
+   * property update targeting a specific view. The feeder is infinite, in that it will generate a
+   * new property update every time.
    *
    * @return An iterator providing view property update details
    */
-  def propertyUpdateFeeder(): Feeder[Any] = viewIdentityFeeder()
-    .flatMap(row =>
-      Range(0, wp.updatesPerView)
-        .map(k => row + ("newProperty" -> s"""{"NewAttribute_$k": "NewValue_$k"}"""))
+  def propertyUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      viewIdentityFeeder()
+        .map { row =>
+          row ++ Map("newProperty" -> s"""{"NewAttribute_$updateId": "NewValue_$updateId"}""")
+        }
     )
 
   /**
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala
index 755a34d..730321c 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala
@@ -44,20 +44,16 @@ object BenchmarkConfig {
     val rutdConfig = workload.getConfig("read-update-tree-dataset")
 
     WorkloadParameters(
-      workload.getDouble("read-write-ratio"),
-      workload.getInt("updates-per-namespace"),
-      workload.getInt("updates-per-table"),
-      workload.getInt("updates-per-view"),
-      workload.getLong("seed"),
       ReadTreeDatasetParameters(
-        rtdConfig.getInt("table-throughput"),
-        rtdConfig.getInt("view-throughput")
+        rtdConfig.getInt("table-concurrency"),
+        rtdConfig.getInt("view-concurrency")
       ),
       CreateTreeDatasetParameters(
-        ctdConfig.getInt("table-throughput"),
-        ctdConfig.getInt("view-throughput")
+        ctdConfig.getInt("table-concurrency"),
+        ctdConfig.getInt("view-concurrency")
      ),
       ReadUpdateTreeDatasetParameters(
+        rutdConfig.getDouble("read-write-ratio"),
         rutdConfig.getInt("throughput"),
         rutdConfig.getInt("duration-in-minutes")
       )
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala
index ada6cac..40d4c7d 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala
@@ -22,13 +22,22 @@ package org.apache.polaris.benchmarks.parameters
 /**
  * Case class to hold the parameters for the ReadUpdateTreeDataset simulation.
  *
+ * @param readWriteRatio The ratio of read operations to write operations (0.0-1.0).
  * @param throughput The number of operations to perform per second.
  * @param durationInMinutes The duration of the simulation in minutes.
  */
 case class ReadUpdateTreeDatasetParameters(
+    readWriteRatio: Double,
     throughput: Int,
     durationInMinutes: Int
 ) {
+  require(
+    readWriteRatio >= 0.0 && readWriteRatio <= 1.0,
+    "Read/write ratio must be between 0.0 and 1.0 inclusive"
+  )
   require(throughput >= 0, "Throughput cannot be negative")
   require(durationInMinutes > 0, "Duration in minutes must be positive")
+
+  val gatlingReadRatio: Double = readWriteRatio * 100
+  val gatlingWriteRatio: Double = (1 - readWriteRatio) * 100
 }
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala
index daad6bc..b6fec3c 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala
@@ -20,35 +20,7 @@ package org.apache.polaris.benchmarks.parameters
 
 case class WorkloadParameters(
-    readWriteRatio: Double,
-    updatesPerNamespace: Int,
-    updatesPerTable: Int,
-    updatesPerView: Int,
-    seed: Long,
     readTreeDataset: ReadTreeDatasetParameters,
     createTreeDataset: CreateTreeDatasetParameters,
     readUpdateTreeDataset: ReadUpdateTreeDatasetParameters
-) {
-  require(
-    readWriteRatio >= 0.0 && readWriteRatio <= 1.0,
-    "Read/write ratio must be between 0.0 and 1.0 inclusive"
-  )
-
-  require(
-    updatesPerNamespace >= 0,
-    "Updates per namespace must be non-negative"
-  )
-
-  require(
-    updatesPerTable >= 0,
-    "Updates per table must be non-negative"
-  )
-
-  require(
-    updatesPerView >= 0,
-    "Updates per view must be non-negative"
-  )
-
-  val gatlingReadRatio: Double = readWriteRatio * 100
-  val gatlingWriteRatio: Double = (1 - readWriteRatio) * 100
-}
+) {}
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala
index 942105f..0bb6244 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala
@@ -33,7 +33,8 @@ import scala.concurrent.duration.DurationInt
 /**
  * This simulation is a 100% read workload that fetches a tree dataset in Polaris. It is intended to
  * be used against a Polaris instance with a pre-existing tree dataset. It has no side effect on the
- * dataset and therefore can be executed multiple times without any issue.
+ * dataset and therefore can be executed multiple times without any issue. It fetches each entity
+ * exactly once.
  */
 class ReadTreeDataset extends Simulation {
   private val logger = LoggerFactory.getLogger(getClass)
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala
index 080152c..9626e83 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala
@@ -37,6 +37,9 @@ import scala.concurrent.duration._
 
 /**
  * This simulation tests read and update operations on an existing dataset.
+ *
+ * The ratio of read operations to write operations is controlled by the readWriteRatio parameter in
+ * the ReadUpdateTreeDatasetParameters.
  */
 class ReadUpdateTreeDataset extends Simulation {
   private val logger = LoggerFactory.getLogger(getClass)
@@ -91,17 +94,17 @@ class ReadUpdateTreeDataset extends Simulation {
   private val nsListFeeder = new CircularIterator(nsActions.namespaceIdentityFeeder)
   private val nsExistsFeeder = new CircularIterator(nsActions.namespaceIdentityFeeder)
   private val nsFetchFeeder = new CircularIterator(nsActions.namespaceFetchFeeder)
-  private val nsUpdateFeeder = new CircularIterator(nsActions.namespacePropertiesUpdateFeeder)
+  private val nsUpdateFeeder = nsActions.namespacePropertiesUpdateFeeder()
 
   private val tblListFeeder = new CircularIterator(tblActions.tableIdentityFeeder)
   private val tblExistsFeeder = new CircularIterator(tblActions.tableIdentityFeeder)
   private val tblFetchFeeder = new CircularIterator(tblActions.tableFetchFeeder)
-  private val tblUpdateFeeder = new CircularIterator(tblActions.propertyUpdateFeeder)
+  private val tblUpdateFeeder = tblActions.propertyUpdateFeeder()
 
   private val viewListFeeder = new CircularIterator(viewActions.viewIdentityFeeder)
   private val viewExistsFeeder = new CircularIterator(viewActions.viewIdentityFeeder)
   private val viewFetchFeeder = new CircularIterator(viewActions.viewFetchFeeder)
-  private val viewUpdateFeeder = new CircularIterator(viewActions.propertyUpdateFeeder)
+  private val viewUpdateFeeder = viewActions.propertyUpdateFeeder()
 
   // --------------------------------------------------------------------------------
   // Workload: Randomly read and write entities
@@ -110,7 +113,7 @@ class ReadUpdateTreeDataset extends Simulation {
     scenario("Read and write entities using the Iceberg REST API")
       .exec(authActions.restoreAccessTokenInSession)
       .randomSwitch(
-        wp.gatlingReadRatio -> group("Read")(
+        wp.readUpdateTreeDataset.gatlingReadRatio -> group("Read")(
           uniformRandomSwitch(
             exec(feed(nsListFeeder).exec(nsActions.fetchAllChildrenNamespaces)),
             exec(feed(nsExistsFeeder).exec(nsActions.checkNamespaceExists)),
@@ -123,7 +126,7 @@ class ReadUpdateTreeDataset extends Simulation {
             exec(feed(viewFetchFeeder).exec(viewActions.fetchView))
           )
         ),
-        wp.gatlingWriteRatio -> group("Write")(
+        wp.readUpdateTreeDataset.gatlingWriteRatio -> group("Write")(
           uniformRandomSwitch(
             exec(feed(nsUpdateFeeder).exec(nsActions.updateNamespaceProperties)),
             exec(feed(tblUpdateFeeder).exec(tblActions.updateTable)),
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala
index 0694f50..a766d26 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala
@@ -19,35 +19,15 @@
 package org.apache.polaris.benchmarks.util
 
-import scala.util.Random
-
 class CircularIterator[T](builder: () => Iterator[T]) extends Iterator[T] {
   private var currentIterator: Iterator[T] = builder()
 
   override def hasNext: Boolean = true
 
-  override def next(): T = synchronized {
+  override def next(): T = {
     if (!currentIterator.hasNext) {
       currentIterator = builder()
     }
     currentIterator.next()
   }
 }
-
-class BufferedRandomIterator[T](underlying: CircularIterator[T], bufferSize: Int, seed: Long)
-    extends Iterator[T] {
-  private val random = new Random(seed)
-  private var buffer: Iterator[T] = populateAndShuffle()
-
-  private def populateAndShuffle(): Iterator[T] =
-    random.shuffle((1 to bufferSize).map(_ => underlying.next()).toList).iterator
-
-  override def hasNext: Boolean = true
-
-  override def next(): T = synchronized {
-    if (!buffer.hasNext) {
-      buffer = populateAndShuffle()
-    }
-    buffer.next()
-  }
-}
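For reference, the retained `CircularIterator` can be exercised on its own. This minimal demo (assumed test data, not from the repository) shows how it rebuilds the underlying iterator once exhausted, producing the endless cycle the read feeders rely on. Note that the patch drops `synchronized` from `next()`, so the sketch mirrors that single-threaded form.

```scala
// Mirror of the CircularIterator kept in the patch above: hasNext is always
// true, and next() rebuilds the underlying iterator whenever it runs dry.
class CircularIterator[T](builder: () => Iterator[T]) extends Iterator[T] {
  private var currentIterator: Iterator[T] = builder()

  override def hasNext: Boolean = true

  override def next(): T = {
    if (!currentIterator.hasNext) {
      currentIterator = builder()
    }
    currentIterator.next()
  }
}

object CircularIteratorDemo extends App {
  val cycle = new CircularIterator(() => List("a", "b", "c").iterator)
  // Wraps around after the third element: a, b, c, a, b, c, a
  println(cycle.take(7).toList)
}
```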