[
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eddy Truyen updated CASSANDRA-15717:
------------------------------------
Description:
Sorry for the slightly irrelevant post. This is not an issue with Cassandra but
possibly with the interaction between Cassandra and Kubernetes.
We experienced a performance degradation when running a single Cassandra
instance inside kubeadm 1.14 in comparison with running the Docker container
stand-alone.
We ran a write-only workload (YCSB benchmark workload A, load phase) against the
following user table:
{{ cqlsh> create keyspace ycsb
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE ycsb;
cqlsh> create table usertable (
y_id varchar primary key,
field0 varchar,
field1 varchar,
field2 varchar,
field3 varchar,
field4 varchar,
field5 varchar,
field6 varchar,
field7 varchar,
field8 varchar,
field9 varchar);}}
And using the following script:
{{python ./bin/ycsb load cassandra2-cql -P workloads/workloada \
  -p recordcount=1500000 -p operationcount=1500000 -p measurementtype=raw \
  -p cassandra.connecttimeoutmillis=60000 -p cassandra.readtimeoutmillis=60000 \
  -target 1500 -threads 20 -p hosts=localhost \
  > results/cassandra-docker/cassandra-docker-load-workloada-1-records-1500000-rnd-1762034446.txt
sleep 15}}
We used the following image: {{decomads/cassandra:2.2.16}}, which uses the
official {{cassandra:2.2.16}} as base image and adds a readinessProbe to it.
We used identical Docker configuration parameters by ensuring that the output
of {{docker inspect}} is as much as possible the same. First we ran the YCSB
benchmark in a container that is co-located with the Cassandra container in one
pod. Kubernetes then starts these containers with network mode
{{net=container:...}}: a separate container links up the YCSB and Cassandra
containers within the same network namespace so they can talk via localhost. By
this we hope to avoid interference from the CNI network plugin.
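As a sketch, the configuration comparison can be scripted as follows (the
container names {{cassandra-docker}} and {{cassandra-k8s}} are placeholders for
the actual containers in the two setups):
{{# Dump and diff the runtime configuration of the two Cassandra containers.
# cassandra-docker = stand-alone container, cassandra-k8s = the container
# started by the kubelet; both names are placeholders.
docker inspect cassandra-docker > /tmp/inspect-docker.json
docker inspect cassandra-k8s > /tmp/inspect-k8s.json
diff /tmp/inspect-docker.json /tmp/inspect-k8s.json}}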
We ran the Docker-only containers within the Kubernetes node using the default
bridge network.
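A minimal sketch of how that stand-alone baseline can be started (image and tag
as above; the host data path is a placeholder):
{{# Stand-alone Cassandra on the default bridge network, writing its data
# to a host-mounted filesystem (the host path is a placeholder):
docker run -d --name cassandra-docker \
  -v /srv/cassandra-data:/var/lib/cassandra \
  decomads/cassandra:2.2.16}}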
We first performed the experiment on an OpenStack VM with Ubuntu 16.04 (4GB
RAM, 4 CPU cores, 50GB disk) that runs on a physical node with 16 CPU cores.
Its storage is Ceph, however, and therefore distributed.
To rule out Ceph's distributed storage as a factor, we repeated the experiment
on minikube+VirtualBox (12GB RAM, 4 CPU cores, 30GB disk) on a Windows 10
laptop with 4 cores/8 logical processors and 16GB RAM. The same performance
degradation was measured.
Observations (on Ubuntu-OpenStack):
* Docker:
** Mean response latency of the YCSB benchmark: 1.5-1.7 ms
* Kubernetes:
** Mean response latency of the YCSB benchmark: 2.7-3.0 ms
* CPU usage of the Cassandra daemon JVM is considerably lower under Docker than
under Kubernetes (see my position paper:
[https://lirias.kuleuven.be/2788169?limo=0]).
Possible causes:
* Network overhead of the virtual bridge in Kubernetes is, in our opinion, not
the cause of the problem.
** We repeated the experiment running the Docker-only containers inside a
Kubernetes node, linking the containers with the {{--net=container:}} mechanism
as closely as we could to the pod setup (see the first sketch after this
list). The YCSB latency stayed the same.
* Disk I/O bottleneck: nodetool tablestats output is very similar in both
setups (see the second sketch after this list). Cassandra containers are
configured to write data to a filesystem that is mounted from the host inside
the container, and exactly the same Docker mount type is used.
** Write latency is very stable over multiple runs:
*** Kubernetes, ycsb usertable: 0.0167 ms.
*** Docker, ycsb usertable: 0.0150 ms.
** Compaction_history/compaction_in_progress is also very similar (see
attached files).
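A sketch of the {{--net=container:}} linking mentioned above (the YCSB client
image name is a placeholder; the Cassandra container is started as in the
baseline sketch earlier):
{{# Attach the YCSB container to the Cassandra container's network
# namespace, mimicking how Kubernetes wires containers in one pod.
# "ycsb-client" is a placeholder image containing the YCSB tree.
docker run --rm --net=container:cassandra-docker ycsb-client \
  python ./bin/ycsb load cassandra2-cql -P workloads/workloada \
  -p hosts=localhost -threads 20 -target 1500}}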
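And a sketch of how the table-level statistics can be compared between the two
setups (on Cassandra 2.2 the nodetool subcommand is {{cfstats}}; container
names as in the sketches above):
{{# Collect and diff per-table statistics for ycsb.usertable:
docker exec cassandra-docker nodetool cfstats ycsb.usertable > /tmp/stats-docker.txt
docker exec cassandra-k8s nodetool cfstats ycsb.usertable > /tmp/stats-k8s.txt
diff /tmp/stats-docker.txt /tmp/stats-k8s.txt}}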
Do you know of any other causes that might explain the difference in reported
YCSB response latency? Could it be that the Cassandra session is closed by
Kubernetes after each request? How can I diagnose this?
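For example, would counting the established connections on the CQL port while
the load runs be a valid way to check this? A stable count would suggest the
session is reused; a churning set of connections would suggest per-request
reconnects (a sketch; assumes {{ss}} is available inside the node):
{{# Sample the number of established connections to the CQL port (9042)
# once per second while the workload runs:
while true; do
  ss -tan | grep ':9042' | grep -c ESTAB
  sleep 1
done}}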
was:
This is my first JIRA issue. Sorry if I get something wrong in the reporting.
I experienced a performance degradation when running a single Cassandra
instance inside Kubernetes in comparison with running the Docker container
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses
cassandra:2.2.16 as base image and adds a readinessProbe to it.
I used identical Docker configuration parameters by ensuring that the output of
docker inspect is as much as possible the same. First we ran the YCSB
benchmark in a container that is co-located with the cassandra container in one
pod. Kubernetes then starts these containers with network mode
"net=container:...": a separate container links up the ycsb and cassandra
containers within the same network namespace so they can talk via localhost;
by this we hope to avoid interference from the CNI network plugin.
We ran the docker-only container within the Kubernetes node using the default
bridge network.
Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30GB) on a
physical laptop with 4 cores/8 logical processors and 16GB RAM, and on an
OpenStack VM with Ubuntu 16.04 (4GB, 4 CPU cores, 50GB) that runs on a physical
node with 16 CPU cores. Storage is Ceph.)
* A write-only workload (YCSB benchmark workload A - Load phase) using the
following user table:
cqlsh> create keyspace ycsb
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE ycsb;
cqlsh> create table usertable (
y_id varchar primary key,
field0 varchar,
field1 varchar,
field2 varchar,
field3 varchar,
field4 varchar,
field5 varchar,
field6 varchar,
field7 varchar,
field8 varchar,
field9 varchar);
* And using the following script: python ./bin/ycsb load cassandra2-cql \
  -P workloads/workloada -p recordcount=1500000 -p operationcount=1500000 \
  -p measurementtype=raw -p cassandra.connecttimeoutmillis=60000 \
  -p cassandra.readtimeoutmillis=60000 -target 1500 -threads 20 -p hosts=localhost \
  > results/cassandra-docker/cassandra-docker-load-workloada-1-records-1500000-rnd-1762034446.txt
sleep 15
Observations (on Ubuntu-OpenStack):
* Docker:
** Mean response latency of the YCSB benchmark: 1.5-1.7 ms
* Kubernetes:
** Mean response latency of the YCSB benchmark: 2.7-3.0 ms
* CPU usage of the Cassandra daemon JVM is considerably lower under Docker than
under Kubernetes (see my position paper:
[https://lirias.kuleuven.be/2788169?limo=0]).
Possible causes:
* Network overhead of the virtual bridge in the container orchestrator is, in
our opinion, not the cause of the problem.
** We repeated the experiment running the Docker-only containers inside a
Kubernetes node, linking the containers with the --net=container: mechanism as
closely as we could to the pod setup. The YCSB latency stayed the same.
* Disk I/O bottleneck: nodetool tablestats output is very similar in both
setups.
** Cassandra containers are configured to write data to a filesystem that is
mounted from the host inside the container. Exactly the same Docker mount type
is used.
** Write latency is very stable over multiple runs:
*** Kubernetes, ycsb usertable: 0.0167 ms.
*** Docker, ycsb usertable: 0.0150 ms.
** Compaction_history/compaction_in_progress is also very similar (as opposed
to earlier versions of the issue – sorry for the confusion!).
Do you know of any other causes that might explain the difference in reported
YCSB response latency?
> Benchmark performance difference between Docker and Kubernetes when running
> Cassandra:2.2.16 official Docker image
> ------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
> Project: Cassandra
> Issue Type: Bug
> Components: Test/benchmark
> Reporter: Eddy Truyen
> Priority: Normal
> Attachments: nodetool-compaction-history-docker-cassandra.txt,
> nodetool-compaction-history-kubeadm-cassandra.txt