This is an automated email from the ASF dual-hosted git repository.
wusheng pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/skywalking-banyandb.git
The following commit(s) were added to refs/heads/main by this push:
new c27d5621 Scalability Test and Cluster Management Doc (#511)
c27d5621 is described below
commit c27d5621fabde183688b5095db5cb68b4e6c8835
Author: Gao Hongtao <[email protected]>
AuthorDate: Wed Aug 14 21:39:25 2024 +0800
Scalability Test and Cluster Management Doc (#511)
---
CHANGES.md | 1 +
docs/menu.yml | 5 +-
docs/operation/cluster.md | 57 ++++++++++
pkg/node/round_robin.go | 22 ++--
pkg/node/round_robin_test.go | 65 +++++++++--
test/e2e-v2/cases/cluster/e2e.yaml | 16 +--
test/scale/Makefile | 33 ++++++
test/scale/README.md | 219 +++++++++++++++++++++++++++++++++++++
test/scale/kind.yaml | 23 ++++
test/scale/measure-default.yaml | 26 +++++
test/scale/oap-pod-xl.yaml | 83 ++++++++++++++
test/scale/oap-pod.yaml | 77 +++++++++++++
test/scale/segment.tpl.json | 111 +++++++++++++++++++
13 files changed, 706 insertions(+), 32 deletions(-)
diff --git a/CHANGES.md b/CHANGES.md
index 255c30dd..6d72c58b 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -45,6 +45,7 @@ Release Notes.
- Add quick-start guide.
- Add web-ui interacting guide.
- Add bydbctl interacting guide.
+- Add cluster management guide.
### Chores
diff --git a/docs/menu.yml b/docs/menu.yml
index ca0fc5aa..cfb0b057 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -14,7 +14,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-
catalog:
- name: "Welcome"
path: "/readme"
@@ -113,7 +112,7 @@ catalog:
- name: "Observability"
path: "/observability"
- name: "Cluster Management"
- path: ""
+ path: "/operation/cluster"
- name: "Security"
catalog:
- name: "TLS Configuration"
@@ -131,4 +130,4 @@ catalog:
- name: "Clustering"
path: "/concept/clustering"
- name: "TSDB"
- path: "/concept/tsdb"
\ No newline at end of file
+ path: "/concept/tsdb"
diff --git a/docs/operation/cluster.md b/docs/operation/cluster.md
new file mode 100644
index 00000000..86d63ffc
--- /dev/null
+++ b/docs/operation/cluster.md
@@ -0,0 +1,57 @@
+# Cluster Maintenance
+
+## Introduction
+Properly maintaining and scaling a cluster is crucial for ensuring its reliable and efficient operation. This document provides guidance on setting up a cluster, planning its capacity, and scaling it to meet evolving requirements.
+
+## Cluster Setup
+Before deploying or maintaining a cluster, it is recommended to familiarize yourself with the basic clustering concepts by reviewing the [clustering documentation](../concept/clustering.md).
+
+To set up a cluster, refer to the [cluster installation guide](../installation/cluster.md), which describes the process in detail. A minimal cluster should consist of the following nodes:
+
+- 3 etcd nodes
+- 2 liaison nodes
+- 2 data nodes
+
+This configuration is recommended for high availability: the cluster can continue operating even if a single node becomes temporarily unavailable, as the remaining nodes can absorb the increased workload.
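+
+As an illustrative sketch of how such a topology might be started (the host names and the `--etcd-endpoints` flag shown here are assumptions; the exact commands and flags are described in the [cluster installation guide](../installation/cluster.md)):
+
+```bash
+# On each of the two liaison hosts, assuming a reachable 3-node etcd cluster:
+banyand liaison --etcd-endpoints=http://etcd-0:2379,http://etcd-1:2379,http://etcd-2:2379
+
+# On each of the two data hosts:
+banyand data --etcd-endpoints=http://etcd-0:2379,http://etcd-1:2379,http://etcd-2:2379
+```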
+
+It is generally preferable to deploy multiple smaller data nodes rather than a few larger ones: when some nodes become temporarily unavailable, the workload increase on the remaining data nodes is smaller.
+
+To balance write and query traffic across the liaison nodes, the use of a gRPC load balancer is recommended. The gRPC port defaults to `17912`; the gRPC host and port can be altered using the `grpc-host` and `grpc-port` configuration options.
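+
+Once a balancer sits in front of the liaison nodes, a quick reachability check might look like this (a sketch; `grpcurl` and the balancer address are assumptions, not part of BanyanDB):
+
+```bash
+# List the gRPC services exposed behind the load balancer:
+grpcurl -plaintext banyandb-lb.example.com:17912 list
+```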
+
+For those seeking to set up a cluster in a Kubernetes environment, a [dedicated guide](../installation/kubernetes.md) is available to assist with the process.
+
+## Capacity Planning
+Each node role can be provisioned with the most suitable hardware resources. The cluster's capacity scales linearly with the available resources. The required amounts of CPU and RAM per node role depend heavily on the workload, such as the number of time series, the query types, and the write/query QPS. It is recommended to set up a test cluster mirroring the production workload and iteratively scale the per-node resources and the number of nodes per role until the cluster becomes stable. Additio [...]
+
+The necessary storage space can be estimated from the disk usage observed during a test run. For example, if the storage usage is 10GB after a day-long test run on a production workload, then the cluster should have at least 10GB*7=70GB of disk space for a group with `ttl=7day`.
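+
+A back-of-the-envelope version of that calculation (the paths are placeholders for your `measure-root-path` and `stream-root-path` values):
+
+```bash
+# Observe on-disk usage after the day-long test run:
+du -sh /data/measure /data/stream
+
+# Required space is at least the daily growth times the ttl:
+daily_gb=10; ttl_days=7
+echo "need at least $((daily_gb * ttl_days))GB for ttl=${ttl_days}day"
+```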
+
+To keep the cluster resilient and responsive, it is recommended to maintain the following spare resource levels:
+
+- 50% of free RAM across all the nodes, to reduce the probability of OOM (out-of-memory) crashes and slowdowns during temporary spikes in workload.
+- 50% of spare CPU across all the nodes, to reduce the probability of slowdowns during temporary spikes in workload.
+- At least 20% of free storage space in the directories pointed to by `measure-root-path` and `stream-root-path` (see the check sketched after this list).
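+
+A minimal free-space check, assuming both root paths sit on the same filesystem (the mount point is a placeholder):
+
+```bash
+df -h /data   # keep usage below 80% on this filesystem
+```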
+
+## Scalability
+The cluster's performance and capacity can be scaled in two ways: vertically and horizontally.
+
+### Vertical Scalability
+Vertical scalability refers to adding more resources (CPU, RAM, disk I/O, disk space, network bandwidth) to existing nodes in the cluster.
+
+Increasing the CPU and RAM of existing liaison nodes can improve performance for heavy queries that process a large number of time series with many data points.
+
+Increasing the CPU and RAM of existing data nodes can raise the number of time series the cluster can handle. However, it is generally preferable to add more data nodes rather than grow existing ones, as a higher number of data nodes increases cluster stability and improves query performance over time series.
+
+Increasing the disk I/O and disk space of existing etcd nodes can improve performance for heavy metadata queries that process a large number of metadata entries.
+
+### Horizontal Scalability
+Horizontal scalability refers to adding more nodes to the cluster.
+
+Increasing the number of liaison nodes raises the maximum possible data ingestion speed, since the ingested data can be split among a larger number of liaison nodes. It also raises the maximum possible query rate, since incoming concurrent requests can be split among a larger number of liaison nodes.
+
+Increasing the number of data nodes raises the number of time series the cluster can handle. It can also improve query performance, since each data node holds fewer time series as the number of data nodes grows.
+
+Newly added data nodes are discovered automatically by the existing liaison nodes. It is recommended to add data nodes one by one to avoid overloading the liaison nodes with the new nodes' metadata; in a Kubernetes deployment this can be done by stepping the replica count, as sketched below.
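+
+A sketch of stepwise scaling, assuming the statefulset name used by the BanyanDB helm chart in this repository's scale test:
+
+```bash
+for replicas in 3 4 5; do
+  kubectl scale statefulset banyandb --replicas=$replicas
+  kubectl rollout status statefulset/banyandb   # wait until the new node is ready
+done
+```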
+
+The cluster's availability also improves with more data nodes, as the active data nodes need to absorb a smaller additional workload when some data nodes become unavailable. For example, if one node out of 2 is unavailable, 50% of the load is re-distributed across the remaining node, resulting in a 100% per-node workload increase; if one node out of 10 is unavailable, 10% of the load is re-distributed across the 9 remaining nodes, resulting in only an approximately 11% per-node workload increase.
+
+Increasing the number of etcd nodes raises the cluster's metadata capacity and can improve metadata query performance. It also improves metadata availability, as the metadata is replicated across all the etcd nodes. However, the number of etcd nodes should be odd to avoid split-brain situations.
\ No newline at end of file
diff --git a/pkg/node/round_robin.go b/pkg/node/round_robin.go
index d0bb37e1..94fcc130 100644
--- a/pkg/node/round_robin.go
+++ b/pkg/node/round_robin.go
@@ -97,6 +97,7 @@ func (r *roundRobinSelector) OnAddOrUpdate(schemaMetadata schema.Metadata) {
}
r.mu.Lock()
defer r.mu.Unlock()
+ r.removeGroup(group.Metadata.Name)
for i := uint32(0); i < group.ResourceOpts.ShardNum; i++ {
k := key{group: group.Metadata.Name, shardID: i}
r.lookupTable = append(r.lookupTable, k)
@@ -104,6 +105,17 @@ func (r *roundRobinSelector) OnAddOrUpdate(schemaMetadata schema.Metadata) {
r.sortEntries()
}
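+// removeGroup removes every lookup-table entry that belongs to the given
+// group; the index only advances when no entry is removed, so the element
+// shifted into position i is re-checked.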
+func (r *roundRobinSelector) removeGroup(group string) {
+ for i := 0; i < len(r.lookupTable); {
+ if r.lookupTable[i].group == group {
+ copy(r.lookupTable[i:], r.lookupTable[i+1:])
+ r.lookupTable = r.lookupTable[:len(r.lookupTable)-1]
+ } else {
+ i++
+ }
+ }
+}
+
func (r *roundRobinSelector) OnDelete(schemaMetadata schema.Metadata) {
if schemaMetadata.Kind != schema.KindGroup {
return
@@ -111,15 +123,7 @@ func (r *roundRobinSelector) OnDelete(schemaMetadata schema.Metadata) {
r.mu.Lock()
defer r.mu.Unlock()
group := schemaMetadata.Spec.(*commonv1.Group)
- for i := uint32(0); i < group.ResourceOpts.ShardNum; i++ {
- k := key{group: group.Metadata.Name, shardID: i}
- for j := range r.lookupTable {
- if r.lookupTable[j] == k {
- r.lookupTable = append(r.lookupTable[:j], r.lookupTable[j+1:]...)
- break
- }
- }
- }
+ r.removeGroup(group.Metadata.Name)
}
func (r *roundRobinSelector) OnInit(kinds []schema.Kind) (bool, []int64) {
diff --git a/pkg/node/round_robin_test.go b/pkg/node/round_robin_test.go
index 223e1908..3b42cf84 100644
--- a/pkg/node/round_robin_test.go
+++ b/pkg/node/round_robin_test.go
@@ -133,20 +133,61 @@ func TestStringer(t *testing.T) {
assert.NotEmpty(t, selector.String())
}
-var groupSchema = schema.Metadata{
- TypeMeta: schema.TypeMeta{
- Kind: schema.KindGroup,
- },
- Spec: &commonv1.Group{
- Metadata: &commonv1.Metadata{
- Name: "group1",
+func TestChangeShard(t *testing.T) {
+ s := NewRoundRobinSelector(nil)
+ selector := s.(*roundRobinSelector)
+ setupGroup(selector)
+ selector.AddNode(&databasev1.Node{Metadata: &commonv1.Metadata{Name: "node1"}})
+ selector.AddNode(&databasev1.Node{Metadata: &commonv1.Metadata{Name: "node2"}})
+ _, err := selector.Pick("group1", "", 0)
+ assert.NoError(t, err)
+ _, err = selector.Pick("group1", "", 1)
+ assert.NoError(t, err)
+ // Reduce shard number to 1
+ selector.OnAddOrUpdate(groupSchema1)
+ _, err = selector.Pick("group1", "", 0)
+ assert.NoError(t, err)
+ _, err = selector.Pick("group1", "", 1)
+ assert.Error(t, err)
+ // Restore shard number to 2
+ setupGroup(selector)
+ node1, err := selector.Pick("group1", "", 0)
+ assert.NoError(t, err)
+ node2, err := selector.Pick("group1", "", 1)
+ assert.NoError(t, err)
+ assert.NotEqual(t, node1, node2)
+}
+
+var (
+ groupSchema = schema.Metadata{
+ TypeMeta: schema.TypeMeta{
+ Kind: schema.KindGroup,
},
- Catalog: commonv1.Catalog_CATALOG_MEASURE,
- ResourceOpts: &commonv1.ResourceOpts{
- ShardNum: 2,
+ Spec: &commonv1.Group{
+ Metadata: &commonv1.Metadata{
+ Name: "group1",
+ },
+ Catalog: commonv1.Catalog_CATALOG_MEASURE,
+ ResourceOpts: &commonv1.ResourceOpts{
+ ShardNum: 2,
+ },
},
- },
-}
+ }
+ groupSchema1 = schema.Metadata{
+ TypeMeta: schema.TypeMeta{
+ Kind: schema.KindGroup,
+ },
+ Spec: &commonv1.Group{
+ Metadata: &commonv1.Metadata{
+ Name: "group1",
+ },
+ Catalog: commonv1.Catalog_CATALOG_MEASURE,
+ ResourceOpts: &commonv1.ResourceOpts{
+ ShardNum: 1,
+ },
+ },
+ }
+)
func setupGroup(selector Selector) {
selector.(*roundRobinSelector).OnAddOrUpdate(groupSchema)
diff --git a/test/e2e-v2/cases/cluster/e2e.yaml b/test/e2e-v2/cases/cluster/e2e.yaml
index 9f321cc6..cb05907d 100644
--- a/test/e2e-v2/cases/cluster/e2e.yaml
+++ b/test/e2e-v2/cases/cluster/e2e.yaml
@@ -21,12 +21,12 @@ setup:
timeout: 20m
init-system-environment: ../../script/env
steps:
- - name: set PATH
- command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH
- - name: install yq
- command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq
- - name: install swctl
- command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl
+ - name: set PATH
+ command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH
+ - name: install yq
+ command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq
+ - name: install swctl
+ command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl
trigger:
action: http
@@ -46,5 +46,5 @@ verify:
# the interval between two retries, in millisecond.
interval: 10s
cases:
- - includes:
- - storage-cases.yaml
\ No newline at end of file
+ - includes:
+ - storage-cases.yaml
diff --git a/test/scale/Makefile b/test/scale/Makefile
new file mode 100644
index 00000000..f69bd63f
--- /dev/null
+++ b/test/scale/Makefile
@@ -0,0 +1,33 @@
+# Licensed to Apache Software Foundation (ASF) under one or more contributor
+# license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright
+# ownership. Apache Software Foundation (ASF) licenses this file to you under
+# the Apache License, Version 2.0 (the "License"); you may
+# not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+QPS ?= 10
+
+GROUP ?= default
+
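+# Example: make up_traffic QPS=50 GROUP=default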
+.PHONY: up_traffic
+up_traffic:
+	curl -XPOST 'http://localhost:12800/mock-data/segments/tasks?qps=$(QPS)&group=$(GROUP)' -H 'Content-Type: application/json' -d "@segment.tpl.json"
+
+.PHONY: ls_traffic
+ls_traffic:
+ curl -XGET 'http://localhost:12800/mock-data/segments/tasks'
+
+.PHONY: rm_traffic
+rm_traffic:
+ curl -XDELETE 'http://localhost:12800/mock-data/segments/tasks'
\ No newline at end of file
diff --git a/test/scale/README.md b/test/scale/README.md
new file mode 100644
index 00000000..e4d6d5d6
--- /dev/null
+++ b/test/scale/README.md
@@ -0,0 +1,219 @@
+# Scale Test
+
+## Provisioning the KinD cluster
+
+```bash
+kind create cluster --config kind.yaml
+```
+
+## Build BanyanDB and Load Image into KinD
+
+```bash
+make docker.build
+kind load docker-image apache/skywalking-banyandb:latest
+```
+
+## Deploy BanyanDB
+
+```bash
+helm registry login registry-1.docker.io
+
+helm install "scale-test" \
+ oci://ghcr.io/apache/skywalking-banyandb-helm/skywalking-banyandb-helm \
+ --version "0.0.0-973f59b" \
+ -n "default" \
+ --set image.repository=apache/skywalking-banyandb \
+ --set image.tag=latest \
+ --set standalone.enabled=false \
+ --set cluster.enabled=true \
+ --set cluster.data.replicas=1 \
+ --set etcd.enabled=true
+```
+
+## Deploy Data Generator
+
+```bash
+kubectl apply -f oap-pod.yaml
+```
+
+## Trigger Data Generation
+
+```bash
+make up_traffic
+```
+
+## Verify Route Table and Files on Disk
+
+Liaison nodes contain the same route table:
+
+```json
+{
+ "measure-default-0": "10.244.0.12:17912",
+ "measure-minute-0": "10.244.0.12:17912",
+ "measure-minute-1": "10.244.0.12:17912",
+ "stream-browser_error_log-0": "10.244.0.12:17912",
+ "stream-browser_error_log-1": "10.244.0.12:17912",
+ "stream-default-0": "10.244.0.12:17912",
+ "stream-log-0": "10.244.0.12:17912",
+ "stream-log-1": "10.244.0.12:17912",
+ "stream-segment-0": "10.244.0.12:17912",
+ "stream-segment-1": "10.244.0.12:17912",
+ "stream-zipkin_span-0": "10.244.0.12:17912",
+ "stream-zipkin_span-1": "10.244.0.12:17912"
+}
+```
+
+All shards are stored on the same node.
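+
+One way to confirm this is to list the shard directories inside the data pod (a sketch; the data root path is an assumption, adjust it to your chart values):
+
+```bash
+kubectl exec banyandb-0 -- ls /tmp/measure /tmp/stream
+```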
+
+## Case 1: Scale Out Data Nodes
+
+Set statefulset replicas to 2:
+
+```bash
+kubectl scale statefulset banyandb --replicas=2
+```
+
+Verify that the new node is added to the route table:
+
+```json
+{
+ "measure-default-0": "10.244.0.12:17912",
+ "measure-minute-0": "10.244.0.19:17912",
+ "measure-minute-1": "10.244.0.12:17912",
+ "stream-browser_error_log-0": "10.244.0.19:17912",
+ "stream-browser_error_log-1": "10.244.0.12:17912",
+ "stream-default-0": "10.244.0.19:17912",
+ "stream-log-0": "10.244.0.12:17912",
+ "stream-log-1": "10.244.0.19:17912",
+ "stream-segment-0": "10.244.0.12:17912",
+ "stream-segment-1": "10.244.0.19:17912",
+ "stream-zipkin_span-0": "10.244.0.12:17912",
+ "stream-zipkin_span-1": "10.244.0.19:17912"
+}
+```
+
+Shards are distributed across the two nodes.
+
+> Note: A TopNAggregation result measure is distributed independently of its source measure, so you may see
+> `measure-minute-0` and `measure-minute-1` on the same node. See https://github.com/apache/skywalking/issues/12526.
+
+## Case 2: Increase Shard Count
+
+Set the number of shards to 2:
+
+```bash
+bydbctl group update -f measure-default.yaml
+```
+
+Verify that the new shard is added to the route table:
+
+```json
+{
+ "measure-default-0": "10.244.0.12:17912",
+ "measure-default-1": "10.244.0.19:17912",
+ "measure-minute-0": "10.244.0.12:17912",
+ "measure-minute-1": "10.244.0.19:17912",
+ "stream-browser_error_log-0": "10.244.0.12:17912",
+ "stream-browser_error_log-1": "10.244.0.19:17912",
+ "stream-default-0": "10.244.0.12:17912",
+ "stream-log-0": "10.244.0.19:17912",
+ "stream-log-1": "10.244.0.12:17912",
+ "stream-segment-0": "10.244.0.19:17912",
+ "stream-segment-1": "10.244.0.12:17912",
+ "stream-zipkin_span-0": "10.244.0.19:17912",
+ "stream-zipkin_span-1": "10.244.0.12:17912"
+}
+```
+
+`measure-default` shards are now distributed across the two nodes.
+
+## Case 3: Scale Out to an Extremely Large Cluster
+
+Set statefulset replicas to 10:
+
+```bash
+kubectl scale statefulset banyandb --replicas=10
+```
+
+Deploy a new data generator:
+
+```bash
+kubectl apply -f oap-pod-xl.yaml
+```
+
+Verify that the new nodes are added to the route table:
+
+```json
+{
+ "measure-default-0": "10.244.0.51:17912",
+ "measure-default-1": "10.244.0.54:17912",
+ "measure-default-2": "10.244.0.55:17912",
+ "measure-default-3": "10.244.0.56:17912",
+ "measure-default-4": "10.244.0.57:17912",
+ "measure-default-5": "10.244.0.58:17912",
+ "measure-default-6": "10.244.0.60:17912",
+ "measure-default-7": "10.244.0.61:17912",
+ "measure-default-8": "10.244.0.62:17912",
+ "measure-default-9": "10.244.0.63:17912",
+ "measure-minute-0": "10.244.0.51:17912",
+ "measure-minute-1": "10.244.0.54:17912",
+ "measure-minute-2": "10.244.0.55:17912",
+ "measure-minute-3": "10.244.0.56:17912",
+ "measure-minute-4": "10.244.0.57:17912",
+ "measure-minute-5": "10.244.0.58:17912",
+ "measure-minute-6": "10.244.0.60:17912",
+ "measure-minute-7": "10.244.0.61:17912",
+ "measure-minute-8": "10.244.0.62:17912",
+ "measure-minute-9": "10.244.0.63:17912",
+ "stream-browser_error_log-0": "10.244.0.51:17912",
+ "stream-browser_error_log-1": "10.244.0.54:17912",
+ "stream-browser_error_log-2": "10.244.0.55:17912",
+ "stream-browser_error_log-3": "10.244.0.56:17912",
+ "stream-browser_error_log-4": "10.244.0.57:17912",
+ "stream-browser_error_log-5": "10.244.0.58:17912",
+ "stream-browser_error_log-6": "10.244.0.60:17912",
+ "stream-browser_error_log-7": "10.244.0.61:17912",
+ "stream-browser_error_log-8": "10.244.0.62:17912",
+ "stream-browser_error_log-9": "10.244.0.63:17912",
+ "stream-default-0": "10.244.0.51:17912",
+ "stream-default-1": "10.244.0.54:17912",
+ "stream-default-2": "10.244.0.55:17912",
+ "stream-default-3": "10.244.0.56:17912",
+ "stream-default-4": "10.244.0.57:17912",
+ "stream-default-5": "10.244.0.58:17912",
+ "stream-default-6": "10.244.0.60:17912",
+ "stream-default-7": "10.244.0.61:17912",
+ "stream-default-8": "10.244.0.62:17912",
+ "stream-default-9": "10.244.0.63:17912",
+ "stream-log-0": "10.244.0.51:17912",
+ "stream-log-1": "10.244.0.54:17912",
+ "stream-log-2": "10.244.0.55:17912",
+ "stream-log-3": "10.244.0.56:17912",
+ "stream-log-4": "10.244.0.57:17912",
+ "stream-log-5": "10.244.0.58:17912",
+ "stream-log-6": "10.244.0.60:17912",
+ "stream-log-7": "10.244.0.61:17912",
+ "stream-log-8": "10.244.0.62:17912",
+ "stream-log-9": "10.244.0.63:17912",
+ "stream-segment-0": "10.244.0.51:17912",
+ "stream-segment-1": "10.244.0.54:17912",
+ "stream-segment-2": "10.244.0.55:17912",
+ "stream-segment-3": "10.244.0.56:17912",
+ "stream-segment-4": "10.244.0.57:17912",
+ "stream-segment-5": "10.244.0.58:17912",
+ "stream-segment-6": "10.244.0.60:17912",
+ "stream-segment-7": "10.244.0.61:17912",
+ "stream-segment-8": "10.244.0.62:17912",
+ "stream-segment-9": "10.244.0.63:17912",
+ "stream-zipkin_span-0": "10.244.0.51:17912",
+ "stream-zipkin_span-1": "10.244.0.54:17912",
+ "stream-zipkin_span-2": "10.244.0.55:17912",
+ "stream-zipkin_span-3": "10.244.0.56:17912",
+ "stream-zipkin_span-4": "10.244.0.57:17912",
+ "stream-zipkin_span-5": "10.244.0.58:17912",
+ "stream-zipkin_span-6": "10.244.0.60:17912",
+ "stream-zipkin_span-7": "10.244.0.61:17912",
+ "stream-zipkin_span-8": "10.244.0.62:17912",
+ "stream-zipkin_span-9": "10.244.0.63:17912"
+}
+```
diff --git a/test/scale/kind.yaml b/test/scale/kind.yaml
new file mode 100644
index 00000000..cc8090f4
--- /dev/null
+++ b/test/scale/kind.yaml
@@ -0,0 +1,23 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+ extraPortMappings:
+ - containerPort: 12800
+ hostPort: 12800
+ protocol: TCP
diff --git a/test/scale/measure-default.yaml b/test/scale/measure-default.yaml
new file mode 100644
index 00000000..e4efcf7f
--- /dev/null
+++ b/test/scale/measure-default.yaml
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+catalog: CATALOG_MEASURE
+metadata:
+ name: measure-default
+resourceOpts:
+ segmentInterval:
+ num: 8
+ unit: UNIT_DAY
+ shardNum: 2
+ ttl:
+ num: 7
+ unit: UNIT_DAY
diff --git a/test/scale/oap-pod-xl.yaml b/test/scale/oap-pod-xl.yaml
new file mode 100644
index 00000000..17fb23dc
--- /dev/null
+++ b/test/scale/oap-pod-xl.yaml
@@ -0,0 +1,83 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v1
+kind: Pod
+metadata:
+ labels:
+ component: oap
+ name: data-generator
+ namespace: default
+spec:
+ containers:
+ - env:
+ - name: JAVA_OPTS
+ value: -Xmx2g -Xms2g
+ - name: SW_STORAGE
+ value: banyandb
+ - name: SW_STORAGE_BANYANDB_TARGETS
+ value: banyandb-grpc:17912
+ - name: SW_STORAGE_BANYANDB_METRICS_SHARDS_NUMBER
+ value: "10"
+ - name: SW_STORAGE_BANYANDB_RECORD_SHARDS_NUMBER
+ value: "10"
+ - name: SW_STORAGE_BANYANDB_SUPERDATASET_SHARDS_FACTOR
+ value: "1"
+ image: ghcr.io/apache/skywalking/data-generator:9b17ff1efeab7a20c870839f59eb0e6af485cd3f
+ imagePullPolicy: IfNotPresent
+ livenessProbe:
+ failureThreshold: 3
+ initialDelaySeconds: 5
+ periodSeconds: 10
+ successThreshold: 1
+ tcpSocket:
+ port: 12800
+ timeoutSeconds: 1
+ name: oap
+ ports:
+ - containerPort: 11800
+ name: grpc
+ protocol: TCP
+ - containerPort: 12800
+ name: rest
+ protocol: TCP
+ hostPort: 12800
+ readinessProbe:
+ failureThreshold: 3
+ initialDelaySeconds: 5
+ periodSeconds: 10
+ successThreshold: 1
+ tcpSocket:
+ port: 12800
+ timeoutSeconds: 1
+ resources: {}
+ startupProbe:
+ failureThreshold: 9
+ periodSeconds: 10
+ successThreshold: 1
+ tcpSocket:
+ port: 12800
+ timeoutSeconds: 1
+ dnsPolicy: ClusterFirst
+ enableServiceLinks: true
+ initContainers:
+ - command:
+ - sh
+ - -c
+ - for i in $(seq 1 60); do curl banyandb-http:17913/api/healthz && exit 0 || sleep 5; done; exit 1
+ image: curlimages/curl
+ imagePullPolicy: IfNotPresent
+ name: wait-for-banyandb
+ resources: {}
diff --git a/test/scale/oap-pod.yaml b/test/scale/oap-pod.yaml
new file mode 100644
index 00000000..3159ae59
--- /dev/null
+++ b/test/scale/oap-pod.yaml
@@ -0,0 +1,77 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v1
+kind: Pod
+metadata:
+ labels:
+ component: oap
+ name: data-generator
+ namespace: default
+spec:
+ containers:
+ - env:
+ - name: JAVA_OPTS
+ value: -Xmx2g -Xms2g
+ - name: SW_STORAGE
+ value: banyandb
+ - name: SW_STORAGE_BANYANDB_TARGETS
+ value: banyandb-grpc:17912
+ image: ghcr.io/apache/skywalking/data-generator:9b17ff1efeab7a20c870839f59eb0e6af485cd3f
+ imagePullPolicy: IfNotPresent
+ livenessProbe:
+ failureThreshold: 3
+ initialDelaySeconds: 5
+ periodSeconds: 10
+ successThreshold: 1
+ tcpSocket:
+ port: 12800
+ timeoutSeconds: 1
+ name: oap
+ ports:
+ - containerPort: 11800
+ name: grpc
+ protocol: TCP
+ - containerPort: 12800
+ name: rest
+ protocol: TCP
+ hostPort: 12800
+ readinessProbe:
+ failureThreshold: 3
+ initialDelaySeconds: 5
+ periodSeconds: 10
+ successThreshold: 1
+ tcpSocket:
+ port: 12800
+ timeoutSeconds: 1
+ resources: {}
+ startupProbe:
+ failureThreshold: 9
+ periodSeconds: 10
+ successThreshold: 1
+ tcpSocket:
+ port: 12800
+ timeoutSeconds: 1
+ dnsPolicy: ClusterFirst
+ enableServiceLinks: true
+ initContainers:
+ - command:
+ - sh
+ - -c
+ - for i in $(seq 1 60); do curl banyandb-http:17913/api/healthz && exit 0 || sleep 5; done; exit 1
+ image: curlimages/curl
+ imagePullPolicy: IfNotPresent
+ name: wait-for-banyandb
+ resources: {}
diff --git a/test/scale/segment.tpl.json b/test/scale/segment.tpl.json
new file mode 100644
index 00000000..2f4fe81d
--- /dev/null
+++ b/test/scale/segment.tpl.json
@@ -0,0 +1,111 @@
+{
+ "traceId": {
+ "type": "uuid",
+ "changingFrequency": "1"
+ },
+ "serviceInstanceName": {
+ "type": "randomString",
+ "length": "10",
+ "letters": true,
+ "numbers": true,
+ "domainSize": 10
+ },
+ "serviceName": {
+ "type": "fixedString",
+ "value": "service_"
+ },
+ "segments": {
+ "type": "randomList",
+ "size": 5,
+ "item": {
+ "endpointName": {
+ "type": "randomString",
+ "length": "10",
+ "prefix": "test_",
+ "letters": true,
+ "numbers": true,
+ "domainSize": 10
+ },
+ "error": {
+ "type": "randomInt",
+ "min": 1,
+ "max": 1
+ },
+ "now": {
+ "type": "time",
+ "stepMillisecond": 1000,
+ "waitMillisecond": 1000
+ },
+ "tags": {
+ "type": "randomList",
+ "size": 5,
+ "item": {
+ "key": {
+ "type": "randomString",
+ "length": "10",
+ "prefix": "test_tag_",
+ "letters": true,
+ "numbers": true,
+ "domainSize": 5
+ },
+ "value": {
+ "type": "randomString",
+ "length": "10",
+ "prefix": "test_value_",
+ "letters": true,
+ "numbers": true,
+ "domainSize": 10
+ }
+ }
+ },
+ "spans": {
+ "type": "randomList",
+ "size": 5,
+ "item": {
+ "latency": {
+ "type": "randomInt",
+ "min": 100,
+ "max": 1000
+ },
+ "operationName": {
+ "type": "randomString",
+ "length": "10",
+ "prefix": "test_endpoint_",
+ "letters": true,
+ "numbers": true
+ },
+ "componentId": {
+ "type": "randomInt",
+ "min": "0",
+ "max": "4"
+ },
+ "error": {
+ "type": "randomBool",
+ "possibility": "0.2"
+ },
+ "tags": {
+ "type": "randomList",
+ "size": 5,
+ "item": {
+ "key": {
+ "type": "randomString",
+ "length": "10",
+ "prefix": "test_tag_key_",
+ "letters": true,
+ "numbers": true,
+ "domainSize": 10
+ },
+ "value": {
+ "type": "randomString",
+ "length": "10",
+ "prefix": "test_tag_val_",
+ "letters": true,
+ "numbers": true
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+}
\ No newline at end of file