This is an automated email from the ASF dual-hosted git repository.

snazy pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris-tools.git
The following commit(s) were added to refs/heads/main by this push:
     new 1fb817c  Benchmarks docs, config and core reorganisation (#9)
1fb817c is described below

commit 1fb817cc1832a14b385501da55452671fb4b03df
Author: Pierre Laporte <pie...@pingtimeout.fr>
AuthorDate: Wed Apr 23 16:18:54 2025 +0200

    Benchmarks docs, config and core reorganisation (#9)

    * Cleanup benchmarks list in README

    * Move mixed simulation parameter under the proper section

    * Remove the updates generation parameters

      This commit changes the way update requests are generated. They used to
      be generated up to a certain number. Now as many updates as the user
      wants can be generated, using infinite streams. This not only simplifies
      the configuration of update-related benchmarks but also reduces the
      amount of boilerplate code. Typically, the `BufferedRandomIterator`
      class was needed to shuffle updates, as otherwise multiple updates
      against the same entity would be sent simultaneously. Now, before an
      entity is updated again, all the other entities of the dataset need to
      receive an update as well. This removes the need for the shuffling and
      buffering operation.

      As a result, no random operations are performed in the simulations
      anymore. The seed parameter is no longer necessary, so it has been
      removed as dead code.

    * Fix misleading throughput parameters

      The throughput of the 100% write and 100% read benchmarks cannot be
      configured; only the number of concurrent users can. Those benchmarks
      are designed with a closed model [1], so that the request injection rate
      depends on the response rate and requests do not queue up on the server
      side. This commit renames the parameters and fixes the documentation
      accordingly, for clarity.
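The infinite-stream scheme described in the commit message can be sketched standalone in plain Scala (hypothetical entity names; the real feeders emit Gatling session maps rather than pairs): every entity receives update N before any entity receives update N+1, so no shuffling, buffering, or seed is required.

```scala
// Sketch of the new update-generation pattern, under assumed names: an
// infinite iterator that gives every entity exactly one update per round
// before any entity is updated again.
object InfiniteUpdateFeeder {
  val entities = List("ns1", "ns2", "ns3") // hypothetical dataset

  // Infinite iterator of (entity, updateId) pairs: entities cycle in order,
  // and the update id only advances once per full round over the dataset.
  def propertyUpdateFeeder(): Iterator[(String, Int)] =
    Iterator.from(0).flatMap(updateId => entities.iterator.map(e => (e, updateId)))
}

object InfiniteUpdateFeederDemo extends App {
  // Prints the first two rounds of updates.
  println(InfiniteUpdateFeeder.propertyUpdateFeeder().take(6).toList)
}
```

Because `Iterator.from(0)` is lazy, the stream never materializes more than the next row, which is what lets the benchmarks run for as long as the user wants.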
---
 benchmarks/README.md                               | 35 ++++----------
 .../src/gatling/resources/benchmark-defaults.conf  | 53 ++++++++--------------
 .../benchmarks/actions/NamespaceActions.scala      | 27 +++++++----
 .../polaris/benchmarks/actions/TableActions.scala  | 14 ++++--
 .../polaris/benchmarks/actions/ViewActions.scala   | 14 ++++--
 .../benchmarks/parameters/BenchmarkConfig.scala    | 14 ++----
 .../ReadUpdateTreeDatasetParameters.scala          |  9 ++++
 .../benchmarks/parameters/WorkloadParameters.scala | 30 +----------
 .../benchmarks/simulations/ReadTreeDataset.scala   |  3 +-
 .../simulations/ReadUpdateTreeDataset.scala        | 13 ++++--
 .../polaris/benchmarks/util/CircularIterator.scala | 22 +-------
 11 files changed, 91 insertions(+), 143 deletions(-)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index 8c443e3..792a400 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -23,30 +23,9 @@ Benchmarks for the Polaris service using Gatling.
 
 ## Available Benchmarks
 
-### Dataset Creation Benchmark
-
-The CreateTreeDataset benchmark creates a test dataset with a specific structure:
-
-- `org.apache.polaris.benchmarks.simulations.CreateTreeDataset`: Creates up to 50 entities simultaneously
-
-This is a write-only workload designed to populate the system for subsequent benchmarks.
-
-### Read/Update Benchmark
-
-The ReadUpdateTreeDataset benchmark tests read and update operations on an existing dataset:
-
-- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDataset`: Performs up to 20 read/update operations simultaneously
-
-This benchmark can only be run after using CreateTreeDataset to populate the system.
-
-### Read-Only Benchmark
-
-The ReadTreeDataset benchmark is a 100% read workload that fetches a tree dataset in Polaris:
-
-- `org.apache.polaris.benchmarks.simulations.ReadTreeDataset`: Performs read-only operations to verify namespaces, tables, and views
-
-This benchmark is intended to be used against a Polaris instance with a pre-existing tree dataset.
-It has no side effects on the dataset and can be executed multiple times without any issues.
-
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDataset`: Creates a test dataset with a specific structure. It is a write-only workload designed to populate the system for subsequent benchmarks.
+- `org.apache.polaris.benchmarks.simulations.ReadTreeDataset`: Performs read-only operations to fetch namespaces, tables, and views. Some attributes of the objects are also fetched. This benchmark is intended to be used against a Polaris instance with a pre-existing tree dataset. It has no side effects on the dataset and can be executed multiple times without any issues.
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDataset`: Performs read and update operations against a Polaris instance populated with a test dataset. It is a read/write workload that can be used to test the system's ability to handle concurrent read and update operations. It is not destructive and does not prevent subsequent executions of `ReadTreeDataset` or `ReadUpdateTreeDataset`.
 
 ## Parameters
 
@@ -95,7 +74,9 @@ Workload settings are configured under `workload`:
 
 ```hocon
 workload {
-  read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+  read-update-tree-dataset {
+    read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+  }
 }
 ```
 
@@ -117,7 +98,9 @@ http {
 }
 
 workload {
-  read-write-ratio = 0.8
+  read-update-tree-dataset {
+    read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+  }
 }
 ```
diff --git a/benchmarks/src/gatling/resources/benchmark-defaults.conf b/benchmarks/src/gatling/resources/benchmark-defaults.conf
index 21b0451..8c326fb 100644
--- a/benchmarks/src/gatling/resources/benchmark-defaults.conf
+++ b/benchmarks/src/gatling/resources/benchmark-defaults.conf
@@ -117,55 +117,42 @@ dataset.tree {
 
 # Workload configuration
 workload {
-  # Ratio of read operations to write operations
-  # Range: 0.0 to 1.0 where:
-  #   - 0.0 means 100% writes
-  #   - 1.0 means 100% reads
-  # Example: 0.8 means 80% reads and 20% writes
-  # Required: Must be provided through environment variable READ_WRITE_RATIO
-  read-write-ratio = 0.5
-
-  # Seed used for random number generation
-  # Default: 1
-  seed = 1
-
-  # Number of property updates to perform per individual namespace
-  # Default: 5
-  updates-per-namespace = 5
-
-  # Number of property updates to perform per individual table
-  # Default: 10
-  updates-per-table = 10
-
-  # Number of property updates to perform per individual view
-  # Default: 10
-  updates-per-view = 10
-
-  # Configuration for the ReadTreeDataset simulation
   read-tree-dataset {
-    # Number of table operations to perform per second
+    # Number of table operations to perform simultaneously
+    # This controls the concurrency level for table operations
     # Default: 20
-    table-throughput = 20
+    table-concurrency = 20
 
-    # Number of view operations to perform per second
+    # Number of view operations to perform simultaneously
+    # This controls the concurrency level for view operations
     # Default: 10
-    view-throughput = 10
+    view-concurrency = 10
   }
 
   # Configuration for the CreateTreeDataset simulation
   create-tree-dataset {
-    # Number of table operations to perform per second
+    # Number of table operations to perform simultaneously
+    # This controls the concurrency level for table operations
     # Default: 20
-    table-throughput = 20
+    table-concurrency = 20
 
-    # Number of view operations to perform per second
+    # Number of view operations to perform simultaneously
+    # This controls the concurrency level for view operations
     # Default: 10
-    view-throughput = 10
+    view-concurrency = 10
   }
 
   # Configuration for the ReadUpdateTreeDataset simulation
   read-update-tree-dataset {
+    # Ratio of read operations to write operations
+    # Range: 0.0 to 1.0 where:
+    #   - 0.0 means 100% writes
+    #   - 1.0 means 100% reads
+    # Example: 0.8 means 80% reads and 20% writes
+    # Default: 0.5
+    read-write-ratio = 0.5
+
     # Number of operations to perform per second
     # Default: 100
     throughput = 100
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala
index 89e0e41..7a5f5f0 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/NamespaceActions.scala
@@ -115,15 +115,24 @@ case class NamespaceActions(
     )
   }
 
-  def namespacePropertiesUpdateFeeder(): Feeder[Any] = namespaceIdentityFeeder()
-    .flatMap { row =>
-      (0 until wp.updatesPerNamespace).map { updateId =>
-        val updates = Map(s"UpdatedAttribute_$updateId" -> s"$updateId")
-        row ++ Map(
-          "jsonPropertyUpdates" -> Json.toJson(updates).toString()
-        )
-      }
-    }
+  /**
+   * Creates a Gatling Feeder that generates namespace property updates. Each row contains a single
+   * property update targeting a specific namespace. The feeder is infinite, in that it will
+   * generate a new property update every time.
+   *
+   * @return An iterator providing namespace property update details
+   */
+  def namespacePropertiesUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      namespaceIdentityFeeder()
+        .map { row =>
+          val updates = Map(s"UpdatedAttribute_$updateId" -> s"$updateId")
+          row ++ Map(
+            "jsonPropertyUpdates" -> Json.toJson(updates).toString()
+          )
+        }
+    )
 
   /**
    * Creates a new namespace in a specified catalog. The namespace is created with a full path and
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala
index 7c0901e..1d5b951 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/TableActions.scala
@@ -107,14 +107,18 @@ case class TableActions(
 
   /**
    * Creates a Gatling Feeder that generates table property updates. Each row contains a single
-   * property update targeting a specific table.
+   * property update targeting a specific table. The feeder is infinite, in that it will generate a
+   * new property update every time.
    *
    * @return An iterator providing table property update details
    */
-  def propertyUpdateFeeder(): Feeder[Any] = tableIdentityFeeder()
-    .flatMap(row =>
-      Range(0, wp.updatesPerTable)
-        .map(k => row + ("newProperty" -> s"""{"NewAttribute_$k": "NewValue_$k"}"""))
+  def propertyUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      tableIdentityFeeder()
+        .map { row =>
+          row ++ Map("newProperty" -> s"""{"NewAttribute_$updateId": "NewValue_$updateId"}""")
+        }
     )
 
   /**
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala
index 812c161..14bf698 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/ViewActions.scala
@@ -99,14 +99,18 @@ case class ViewActions(
 
   /**
    * Creates a Gatling Feeder that generates view property updates. Each row contains a single
-   * property update targeting a specific view.
+   * property update targeting a specific view. The feeder is infinite, in that it will generate a
+   * new property update every time.
    *
    * @return An iterator providing view property update details
    */
-  def propertyUpdateFeeder(): Feeder[Any] = viewIdentityFeeder()
-    .flatMap(row =>
-      Range(0, wp.updatesPerView)
-        .map(k => row + ("newProperty" -> s"""{"NewAttribute_$k": "NewValue_$k"}"""))
+  def propertyUpdateFeeder(): Feeder[Any] = Iterator
+    .from(0)
+    .flatMap(updateId =>
+      viewIdentityFeeder()
+        .map { row =>
+          row ++ Map("newProperty" -> s"""{"NewAttribute_$updateId": "NewValue_$updateId"}""")
+        }
     )
 
   /**
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala
index 755a34d..730321c 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/BenchmarkConfig.scala
@@ -44,20 +44,16 @@ object BenchmarkConfig {
     val rutdConfig = workload.getConfig("read-update-tree-dataset")
 
     WorkloadParameters(
-      workload.getDouble("read-write-ratio"),
-      workload.getInt("updates-per-namespace"),
-      workload.getInt("updates-per-table"),
-      workload.getInt("updates-per-view"),
-      workload.getLong("seed"),
       ReadTreeDatasetParameters(
-        rtdConfig.getInt("table-throughput"),
-        rtdConfig.getInt("view-throughput")
+        rtdConfig.getInt("table-concurrency"),
+        rtdConfig.getInt("view-concurrency")
       ),
       CreateTreeDatasetParameters(
-        ctdConfig.getInt("table-throughput"),
-        ctdConfig.getInt("view-throughput")
+        ctdConfig.getInt("table-concurrency"),
+        ctdConfig.getInt("view-concurrency")
      ),
       ReadUpdateTreeDatasetParameters(
+        rutdConfig.getDouble("read-write-ratio"),
         rutdConfig.getInt("throughput"),
         rutdConfig.getInt("duration-in-minutes")
       )
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala
index ada6cac..40d4c7d 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/ReadUpdateTreeDatasetParameters.scala
@@ -22,13 +22,22 @@ package org.apache.polaris.benchmarks.parameters
 /**
  * Case class to hold the parameters for the ReadUpdateTreeDataset simulation.
  *
+ * @param readWriteRatio The ratio of read operations to write operations (0.0-1.0).
  * @param throughput The number of operations to perform per second.
  * @param durationInMinutes The duration of the simulation in minutes.
  */
 case class ReadUpdateTreeDatasetParameters(
+    readWriteRatio: Double,
     throughput: Int,
     durationInMinutes: Int
 ) {
+  require(
+    readWriteRatio >= 0.0 && readWriteRatio <= 1.0,
+    "Read/write ratio must be between 0.0 and 1.0 inclusive"
+  )
   require(throughput >= 0, "Throughput cannot be negative")
   require(durationInMinutes > 0, "Duration in minutes must be positive")
+
+  val gatlingReadRatio: Double = readWriteRatio * 100
+  val gatlingWriteRatio: Double = (1 - readWriteRatio) * 100
 }
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala
index daad6bc..b6fec3c 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/parameters/WorkloadParameters.scala
@@ -20,35 +20,7 @@ package org.apache.polaris.benchmarks.parameters
 
 case class WorkloadParameters(
-    readWriteRatio: Double,
-    updatesPerNamespace: Int,
-    updatesPerTable: Int,
-    updatesPerView: Int,
-    seed: Long,
     readTreeDataset: ReadTreeDatasetParameters,
     createTreeDataset: CreateTreeDatasetParameters,
     readUpdateTreeDataset: ReadUpdateTreeDatasetParameters
-) {
-  require(
-    readWriteRatio >= 0.0 && readWriteRatio <= 1.0,
-    "Read/write ratio must be between 0.0 and 1.0 inclusive"
-  )
-
-  require(
-    updatesPerNamespace >= 0,
-    "Updates per namespace must be non-negative"
-  )
-
-  require(
-    updatesPerTable >= 0,
-    "Updates per table must be non-negative"
-  )
-
-  require(
-    updatesPerView >= 0,
-    "Updates per view must be non-negative"
-  )
-
-  val gatlingReadRatio: Double = readWriteRatio * 100
-  val gatlingWriteRatio: Double = (1 - readWriteRatio) * 100
-}
+) {}
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala
index 942105f..0bb6244 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadTreeDataset.scala
@@ -33,7 +33,8 @@ import scala.concurrent.duration.DurationInt
 /**
  * This simulation is a 100% read workload that fetches a tree dataset in Polaris. It is intended to
  * be used against a Polaris instance with a pre-existing tree dataset. It has no side effect on the
- * dataset and therefore can be executed multiple times without any issue.
+ * dataset and therefore can be executed multiple times without any issue. It fetches each entity
+ * exactly once.
  */
 class ReadTreeDataset extends Simulation {
   private val logger = LoggerFactory.getLogger(getClass)
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala
index 080152c..9626e83 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/simulations/ReadUpdateTreeDataset.scala
@@ -37,6 +37,9 @@ import scala.concurrent.duration._
 
 /**
  * This simulation tests read and update operations on an existing dataset.
+ *
+ * The ratio of read operations to write operations is controlled by the readWriteRatio parameter in
+ * the ReadUpdateTreeDatasetParameters.
  */
 class ReadUpdateTreeDataset extends Simulation {
   private val logger = LoggerFactory.getLogger(getClass)
@@ -91,17 +94,17 @@ class ReadUpdateTreeDataset extends Simulation {
   private val nsListFeeder = new CircularIterator(nsActions.namespaceIdentityFeeder)
   private val nsExistsFeeder = new CircularIterator(nsActions.namespaceIdentityFeeder)
   private val nsFetchFeeder = new CircularIterator(nsActions.namespaceFetchFeeder)
-  private val nsUpdateFeeder = new CircularIterator(nsActions.namespacePropertiesUpdateFeeder)
+  private val nsUpdateFeeder = nsActions.namespacePropertiesUpdateFeeder()
 
   private val tblListFeeder = new CircularIterator(tblActions.tableIdentityFeeder)
   private val tblExistsFeeder = new CircularIterator(tblActions.tableIdentityFeeder)
   private val tblFetchFeeder = new CircularIterator(tblActions.tableFetchFeeder)
-  private val tblUpdateFeeder = new CircularIterator(tblActions.propertyUpdateFeeder)
+  private val tblUpdateFeeder = tblActions.propertyUpdateFeeder()
 
   private val viewListFeeder = new CircularIterator(viewActions.viewIdentityFeeder)
   private val viewExistsFeeder = new CircularIterator(viewActions.viewIdentityFeeder)
   private val viewFetchFeeder = new CircularIterator(viewActions.viewFetchFeeder)
-  private val viewUpdateFeeder = new CircularIterator(viewActions.propertyUpdateFeeder)
+  private val viewUpdateFeeder = viewActions.propertyUpdateFeeder()
 
   // --------------------------------------------------------------------------------
   // Workload: Randomly read and write entities
@@ -110,7 +113,7 @@ class ReadUpdateTreeDataset extends Simulation {
     scenario("Read and write entities using the Iceberg REST API")
       .exec(authActions.restoreAccessTokenInSession)
       .randomSwitch(
-        wp.gatlingReadRatio -> group("Read")(
+        wp.readUpdateTreeDataset.gatlingReadRatio -> group("Read")(
           uniformRandomSwitch(
             exec(feed(nsListFeeder).exec(nsActions.fetchAllChildrenNamespaces)),
             exec(feed(nsExistsFeeder).exec(nsActions.checkNamespaceExists)),
@@ -123,7 +126,7 @@ class ReadUpdateTreeDataset extends Simulation {
             exec(feed(viewFetchFeeder).exec(viewActions.fetchView))
           )
         ),
-        wp.gatlingWriteRatio -> group("Write")(
+        wp.readUpdateTreeDataset.gatlingWriteRatio -> group("Write")(
           uniformRandomSwitch(
             exec(feed(nsUpdateFeeder).exec(nsActions.updateNamespaceProperties)),
             exec(feed(tblUpdateFeeder).exec(tblActions.updateTable)),
diff --git a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala
index 0694f50..a766d26 100644
--- a/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala
+++ b/benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/util/CircularIterator.scala
@@ -19,35 +19,15 @@
 package org.apache.polaris.benchmarks.util
 
-import scala.util.Random
-
 class CircularIterator[T](builder: () => Iterator[T]) extends Iterator[T] {
   private var currentIterator: Iterator[T] = builder()
 
   override def hasNext: Boolean = true
 
-  override def next(): T = synchronized {
+  override def next(): T = {
     if (!currentIterator.hasNext) {
       currentIterator = builder()
     }
     currentIterator.next()
   }
 }
-
-class BufferedRandomIterator[T](underlying: CircularIterator[T], bufferSize: Int, seed: Long)
-    extends Iterator[T] {
-  private val random = new Random(seed)
-  private var buffer: Iterator[T] = populateAndShuffle()
-
-  private def populateAndShuffle(): Iterator[T] =
-    random.shuffle((1 to bufferSize).map(_ => underlying.next()).toList).iterator
-
-  override def hasNext: Boolean = true
-
-  override def next(): T = synchronized {
-    if (!buffer.hasNext) {
-      buffer = populateAndShuffle()
-    }
-    buffer.next()
-  }
-}
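For reference, the retained `CircularIterator` can be exercised on its own. This minimal demo (assumed test data, not from the repository) shows how it rebuilds the underlying iterator once exhausted, producing the endless cycle the read feeders rely on. Note that the patch drops `synchronized` from `next()`, so the sketch mirrors that single-threaded form.

```scala
// Mirror of the CircularIterator kept in the patch above: hasNext is always
// true, and next() rebuilds the underlying iterator whenever it runs dry.
class CircularIterator[T](builder: () => Iterator[T]) extends Iterator[T] {
  private var currentIterator: Iterator[T] = builder()

  override def hasNext: Boolean = true

  override def next(): T = {
    if (!currentIterator.hasNext) {
      currentIterator = builder()
    }
    currentIterator.next()
  }
}

object CircularIteratorDemo extends App {
  val cycle = new CircularIterator(() => List("a", "b", "c").iterator)
  // Wraps around after the third element: a, b, c, a, b, c, a
  println(cycle.take(7).toList)
}
```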