ChenYi015 commented on PR #2499:
URL: https://github.com/apache/celeborn/pull/2499#issuecomment-2104238860
I did some e2e tests as follows:
1. Create a kind cluster with the following config:
```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
- role: worker
- role: worker
```
```shell
$ kind create cluster --name celeborn --config kind-config.yaml
Creating cluster "celeborn" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦 📦 📦 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-celeborn"
You can now use your cluster with:
kubectl cluster-info --context kind-celeborn
Thanks for using kind! 😊
```
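Note that the chart's worker `podAntiAffinity` (shown in the computed values below) uses `topologyKey: kubernetes.io/hostname` with `requiredDuringSchedulingIgnoredDuringExecution`, so the cluster needs at least as many worker nodes as worker replicas. As a small sketch (not part of the repo), the config above can be generated for a configurable worker count:

```shell
# Generate the kind config above for a configurable number of worker nodes.
# workers=5 matches replicaCount.worker=5, since the chart's required
# podAntiAffinity schedules at most one Celeborn worker pod per node.
workers=5
{
  printf 'kind: Cluster\n'
  printf 'apiVersion: kind.x-k8s.io/v1alpha4\n'
  printf 'nodes:\n'
  printf -- '- role: control-plane\n'
  for _ in $(seq 1 "$workers"); do
    printf -- '- role: worker\n'
  done
} > kind-config.yaml
grep -c 'role: worker' kind-config.yaml   # 5
```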
2. Build the Docker image locally and load it into the kind cluster
```shell
docker build -f docker/Dockerfile \
  -t docker.io/apache/celeborn:v0.4.0-incubating .
kind load docker-image --name celeborn \
  docker.io/apache/celeborn:v0.4.0-incubating
```
3. Install the helm chart
```shell
$ helm install celeborn charts/celeborn \
--namespace celeborn \
--create-namespace \
--set image.repository=apache/celeborn \
--set image.tag=v0.4.0-incubating \
--set image.pullPolicy=Never \
--debug
install.go:200: [debug] Original chart version: ""
install.go:217: [debug] CHART PATH:
/Users/chenyi/Code/github.com/apache/celeborn/charts/celeborn
client.go:134: [debug] creating 1 resource(s)
client.go:134: [debug] creating 5 resource(s)
NAME: celeborn
LAST DEPLOYED: Fri May 10 16:58:44 2024
NAMESPACE: celeborn
STATUS: deployed
REVISION: 1
TEST SUITE: None
USER-SUPPLIED VALUES:
image:
pullPolicy: Never
repository: apache/celeborn
tag: v0.4.0-incubating
COMPUTED VALUES:
affinity:
master:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- celeborn
- key: app.kubernetes.io/role
operator: In
values:
- master
topologyKey: kubernetes.io/hostname
worker:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- celeborn
- key: app.kubernetes.io/role
operator: In
values:
- worker
topologyKey: kubernetes.io/hostname
celeborn:
celeborn.application.heartbeat.timeout: 120s
celeborn.master.ha.enabled: true
celeborn.master.http.port: 9098
celeborn.metrics.enabled: true
celeborn.push.stageEnd.timeout: 120s
celeborn.rpc.dispatcher.numThreads: 4
celeborn.rpc.io.clientThreads: 64
celeborn.rpc.io.numConnectionsPerPeer: 2
celeborn.rpc.io.serverThreads: 64
celeborn.shuffle.chunk.size: 8m
celeborn.worker.fetch.io.threads: 32
celeborn.worker.flusher.buffer.size: 256K
celeborn.worker.heartbeat.timeout: 120s
celeborn.worker.http.port: 9096
celeborn.worker.monitor.disk.enabled: false
celeborn.worker.push.io.threads: 32
cluster:
name: cluster
dnsPolicy: ClusterFirst
environments:
CELEBORN_MASTER_JAVA_OPTS: -XX:-PrintGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -Xloggc:gc-master.out
-Dio.netty.leakDetectionLevel=advanced
CELEBORN_MASTER_MEMORY: 2g
CELEBORN_NO_DAEMONIZE: 1
CELEBORN_WORKER_JAVA_OPTS: -XX:-PrintGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -Xloggc:gc-worker.out
-Dio.netty.leakDetectionLevel=advanced
CELEBORN_WORKER_MEMORY: 2g
CELEBORN_WORKER_OFFHEAP_MEMORY: 12g
TZ: Asia/Shanghai
fullnameOverride: ""
hostNetwork: false
image:
pullPolicy: Never
pullSecrets: []
repository: apache/celeborn
tag: v0.4.0-incubating
nameOverride: ""
nodeSelector: {}
podAnnotations: {}
podMonitor:
enable: true
podMetricsEndpoint:
interval: 5s
portName: metrics
scheme: http
priorityClass:
master:
create: false
name: ""
value: 1000000000
worker:
create: false
name: ""
value: 999999000
replicaCount:
master: 3
worker: 5
resources:
master: {}
worker: {}
securityContext:
fsGroup: 10006
runAsGroup: 10006
runAsUser: 10006
service:
port: 9097
type: ClusterIP
tolerations: []
volumes:
master:
- capacity: 100Gi
hostPath: /mnt/celeborn_ratis
mountPath: /mnt/celeborn_ratis
type: hostPath
worker:
- capacity: 100Gi
diskType: SSD
hostPath: /mnt/disk1
mountPath: /mnt/disk1
type: hostPath
- capacity: 100Gi
diskType: SSD
hostPath: /mnt/disk2
mountPath: /mnt/disk2
type: hostPath
- capacity: 100Gi
diskType: SSD
hostPath: /mnt/disk3
mountPath: /mnt/disk3
type: hostPath
- capacity: 100Gi
diskType: SSD
hostPath: /mnt/disk4
mountPath: /mnt/disk4
type: hostPath
HOOKS:
MANIFEST:
---
# Source: celeborn/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: celeborn-conf
labels:
helm.sh/chart: celeborn-0.1.0
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/version: "0.4.0"
app.kubernetes.io/managed-by: Helm
data:
celeborn-defaults.conf: |-
celeborn.master.endpoints=celeborn-master-0.celeborn-master-svc.celeborn.svc.cluster.local,celeborn-master-1.celeborn-master-svc.celeborn.svc.cluster.local,celeborn-master-2.celeborn-master-svc.celeborn.svc.cluster.local,
celeborn.master.ha.node.0.host=celeborn-master-0.celeborn-master-svc.celeborn.svc.cluster.local
celeborn.master.ha.node.1.host=celeborn-master-1.celeborn-master-svc.celeborn.svc.cluster.local
celeborn.master.ha.node.2.host=celeborn-master-2.celeborn-master-svc.celeborn.svc.cluster.local
celeborn.master.ha.ratis.raft.server.storage.dir=/mnt/celeborn_ratis
celeborn.worker.storage.dirs=/mnt/disk1:disktype=SSD:capacity=100Gi,/mnt/disk2:disktype=SSD:capacity=100Gi,/mnt/disk3:disktype=SSD:capacity=100Gi,/mnt/disk4:disktype=SSD:capacity=100Gi
celeborn.application.heartbeat.timeout=120s
celeborn.master.ha.enabled=true
celeborn.master.http.port=9098
celeborn.metrics.enabled=true
celeborn.push.stageEnd.timeout=120s
celeborn.rpc.dispatcher.numThreads=4
celeborn.rpc.io.clientThreads=64
celeborn.rpc.io.numConnectionsPerPeer=2
celeborn.rpc.io.serverThreads=64
celeborn.shuffle.chunk.size=8m
celeborn.worker.fetch.io.threads=32
celeborn.worker.flusher.buffer.size=256K
celeborn.worker.heartbeat.timeout=120s
celeborn.worker.http.port=9096
celeborn.worker.monitor.disk.enabled=false
celeborn.worker.push.io.threads=32
celeborn-env.sh: |
CELEBORN_MASTER_JAVA_OPTS="-XX:-PrintGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-master.out
-Dio.netty.leakDetectionLevel=advanced"
CELEBORN_MASTER_MEMORY="2g"
CELEBORN_NO_DAEMONIZE="1"
CELEBORN_WORKER_JAVA_OPTS="-XX:-PrintGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-worker.out
-Dio.netty.leakDetectionLevel=advanced"
CELEBORN_WORKER_MEMORY="2g"
CELEBORN_WORKER_OFFHEAP_MEMORY="12g"
TZ="Asia/Shanghai"
log4j2.xml: |-
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<!--
~ Extra logging related to initialization of Log4j.
~ Set to debug or trace if log4j initialization is failing.
-->
<Configuration status="INFO">
<Appenders>
<Console name="stdout" target="SYSTEM_OUT">
<!--
~ In the pattern layout configuration below, we specify an
explicit `%ex` conversion
~ pattern for logging Throwables. If this was omitted,
then (by default) Log4J would
~ implicitly add an `%xEx` conversion pattern which logs
stacktraces with additional
~ class packaging information. That extra information can
sometimes add a substantial
~ performance overhead, so we disable it in our default
logging config.
-->
<PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p [%t]
%c{1}: %m%n%ex"/>
</Console>
<RollingRandomAccessFile name="file"
fileName="${env:CELEBORN_LOG_DIR}/celeborn.log"
filePattern="${env:CELEBORN_LOG_DIR}/celeborn.log.%d-%i">
<PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p [%t]
%c{1}: %m%n%ex"/>
<Policies>
<SizeBasedTriggeringPolicy size="200 MB"/>
</Policies>
<DefaultRolloverStrategy max="7">
<Delete basePath="${env:CELEBORN_LOG_DIR}" maxDepth="1">
<IfFileName glob="celeborn.log*">
<IfAny>
<IfAccumulatedFileSize exceeds="1 GB" />
<IfAccumulatedFileCount exceeds="10" />
</IfAny>
</IfFileName>
</Delete>
</DefaultRolloverStrategy>
</RollingRandomAccessFile>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="stdout"/>
<AppenderRef ref="file"/>
</Root>
<Logger name="org.apache.hadoop.hdfs" level="WARN"
additivity="false">
<Appender-ref ref="stdout" level="WARN" />
<Appender-ref ref="file" level="WARN"/>
</Logger>
</Loggers>
</Configuration>
metrics.properties: >-
*.sink.prometheusServlet.class=org.apache.celeborn.common.metrics.sink.PrometheusServlet
---
# Source: celeborn/templates/master/service.yaml
apiVersion: v1
kind: Service
metadata:
name: celeborn-master-svc
labels:
helm.sh/chart: celeborn-0.1.0
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/version: "0.4.0"
app.kubernetes.io/managed-by: Helm
spec:
selector:
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/role: master
ports:
- port: 9097
targetPort: 9097
protocol: TCP
name: celeborn-master
type: ClusterIP
clusterIP: None
---
# Source: celeborn/templates/worker/service.yaml
apiVersion: v1
kind: Service
metadata:
name: celeborn-worker-svc
labels:
helm.sh/chart: celeborn-0.1.0
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/version: "0.4.0"
app.kubernetes.io/managed-by: Helm
spec:
selector:
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/role: worker
type: ClusterIP
clusterIP: None
---
# Source: celeborn/templates/master/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: celeborn-master
labels:
helm.sh/chart: celeborn-0.1.0
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/version: "0.4.0"
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/role: master
spec:
serviceName: celeborn-master-svc
selector:
matchLabels:
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/role: master
template:
metadata:
labels:
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/role: master
app.kubernetes.io/tag: "v0.4.0-incubating"
spec:
initContainers:
- name: chown-celeborn-master-volume
image: alpine:3.18
imagePullPolicy: Never
command:
- chown
- 10006:10006
- /mnt/celeborn_ratis
volumeMounts:
- name: celeborn-master-vol-0
mountPath: /mnt/celeborn_ratis
securityContext:
runAsUser: 0
containers:
- name: celeborn
image: apache/celeborn:v0.4.0-incubating
imagePullPolicy: Never
command:
- /usr/bin/tini
- --
- /bin/sh
- -c
- "until true; do echo waiting for master; sleep 2; done && exec
/opt/celeborn/sbin/start-master.sh"
ports:
- containerPort: 9097
- containerPort: 9098
name: metrics
protocol: TCP
env:
- name: CELEBORN_MASTER_JAVA_OPTS
value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -Xloggc:gc-master.out
-Dio.netty.leakDetectionLevel=advanced"
- name: CELEBORN_MASTER_MEMORY
value: "2g"
- name: CELEBORN_NO_DAEMONIZE
value: "1"
- name: CELEBORN_WORKER_JAVA_OPTS
value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -Xloggc:gc-worker.out
-Dio.netty.leakDetectionLevel=advanced"
- name: CELEBORN_WORKER_MEMORY
value: "2g"
- name: CELEBORN_WORKER_OFFHEAP_MEMORY
value: "12g"
- name: TZ
value: "Asia/Shanghai"
volumeMounts:
- name: celeborn-volume
mountPath: /opt/celeborn/conf
readOnly: true
- name: celeborn-master-vol-0
mountPath: /mnt/celeborn_ratis
volumes:
- name: celeborn-volume
configMap:
name: celeborn-conf
- name: celeborn-master-vol-0
hostPath:
path: /mnt/celeborn_ratis/master
type: DirectoryOrCreate
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- celeborn
- key: app.kubernetes.io/role
operator: In
values:
- master
topologyKey: kubernetes.io/hostname
dnsPolicy: ClusterFirst
securityContext:
fsGroup: 10006
runAsGroup: 10006
runAsUser: 10006
terminationGracePeriodSeconds: 30
replicas: 3
---
# Source: celeborn/templates/worker/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: celeborn-worker
labels:
helm.sh/chart: celeborn-0.1.0
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/version: "0.4.0"
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/role: worker
spec:
serviceName: celeborn-worker-svc
selector:
matchLabels:
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/role: worker
template:
metadata:
labels:
app.kubernetes.io/name: celeborn
app.kubernetes.io/instance: celeborn
app.kubernetes.io/role: worker
app.kubernetes.io/tag: "v0.4.0-incubating"
spec:
initContainers:
- name: chown-celeborn-worker-volume
image: alpine:3.18
imagePullPolicy: Never
command:
- chown
- 10006:10006
- /mnt/disk1
- /mnt/disk2
- /mnt/disk3
- /mnt/disk4
volumeMounts:
- name: celeborn-worker-vol-0
mountPath: /mnt/disk1
- name: celeborn-worker-vol-1
mountPath: /mnt/disk2
- name: celeborn-worker-vol-2
mountPath: /mnt/disk3
- name: celeborn-worker-vol-3
mountPath: /mnt/disk4
securityContext:
runAsUser: 0
containers:
- name: celeborn
image: apache/celeborn:v0.4.0-incubating
imagePullPolicy: Never
command:
- /usr/bin/tini
- --
- /bin/sh
- -c
- "until true; do echo waiting for master; sleep 2; done && exec
/opt/celeborn/sbin/start-worker.sh"
ports:
- containerPort: 9096
name: metrics
protocol: TCP
env:
- name: CELEBORN_MASTER_JAVA_OPTS
value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -Xloggc:gc-master.out
-Dio.netty.leakDetectionLevel=advanced"
- name: CELEBORN_MASTER_MEMORY
value: "2g"
- name: CELEBORN_NO_DAEMONIZE
value: "1"
- name: CELEBORN_WORKER_JAVA_OPTS
value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -Xloggc:gc-worker.out
-Dio.netty.leakDetectionLevel=advanced"
- name: CELEBORN_WORKER_MEMORY
value: "2g"
- name: CELEBORN_WORKER_OFFHEAP_MEMORY
value: "12g"
- name: TZ
value: "Asia/Shanghai"
volumeMounts:
- mountPath: /opt/celeborn/conf
name: celeborn-volume
readOnly: true
- name: celeborn-worker-vol-0
mountPath: /mnt/disk1
- name: celeborn-worker-vol-1
mountPath: /mnt/disk2
- name: celeborn-worker-vol-2
mountPath: /mnt/disk3
- name: celeborn-worker-vol-3
mountPath: /mnt/disk4
volumes:
- name: celeborn-volume
configMap:
name: celeborn-conf
- name: celeborn-worker-vol-0
hostPath:
path: /mnt/disk1/worker
type: DirectoryOrCreate
- name: celeborn-worker-vol-1
hostPath:
path: /mnt/disk2/worker
type: DirectoryOrCreate
- name: celeborn-worker-vol-2
hostPath:
path: /mnt/disk3/worker
type: DirectoryOrCreate
- name: celeborn-worker-vol-3
hostPath:
path: /mnt/disk4/worker
type: DirectoryOrCreate
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- celeborn
- key: app.kubernetes.io/role
operator: In
values:
- worker
topologyKey: kubernetes.io/hostname
dnsPolicy: ClusterFirst
securityContext:
fsGroup: 10006
runAsGroup: 10006
runAsUser: 10006
terminationGracePeriodSeconds: 30
replicas: 5
NOTES:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
Celeborn
```
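The `celeborn.master.endpoints` value rendered into the ConfigMap above is just the per-replica StatefulSet pod DNS names joined by commas (the rendered value keeps a trailing comma). A quick sketch of how those names are formed, assuming the release name and namespace used in the install above:

```shell
# Derive the master endpoint list from the release name, namespace, and
# master replica count (values taken from the helm install above).
release=celeborn
namespace=celeborn
replicas=3
svc="${release}-master-svc"
endpoints=""
for i in $(seq 0 $((replicas - 1))); do
  endpoints="${endpoints}${release}-master-${i}.${svc}.${namespace}.svc.cluster.local,"
done
# The chart renders the value with a trailing comma; trim it here for readability.
echo "celeborn.master.endpoints=${endpoints%,}"
```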
4. Wait a moment and inspect the pod status
```shell
$ kubectl get pods -n celeborn
NAME READY STATUS RESTARTS AGE
celeborn-master-0 1/1 Running 0 44s
celeborn-master-1 1/1 Running 0 42s
celeborn-master-2 1/1 Running 0 40s
celeborn-worker-0 1/1 Running 0 44s
celeborn-worker-1 1/1 Running 0 42s
celeborn-worker-2 1/1 Running 0 40s
celeborn-worker-3 1/1 Running 0 38s
celeborn-worker-4 1/1 Running 0 35s
```
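Instead of eyeballing the table, readiness can also be checked mechanically; against a live cluster something like `kubectl wait --for=condition=Ready pod --all -n celeborn` would do. As an offline illustration, here is a sketch that scans the captured output above:

```shell
# Count pods that are not fully ready/Running in the captured output above.
status='celeborn-master-0   1/1     Running   0          44s
celeborn-master-1   1/1     Running   0          42s
celeborn-master-2   1/1     Running   0          40s
celeborn-worker-0   1/1     Running   0          44s
celeborn-worker-1   1/1     Running   0          42s
celeborn-worker-2   1/1     Running   0          40s
celeborn-worker-3   1/1     Running   0          38s
celeborn-worker-4   1/1     Running   0          35s'
not_ready=$(printf '%s\n' "$status" | awk '$2 != "1/1" || $3 != "Running"' | wc -l | tr -d ' ')
echo "$not_ready"   # 0
```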
5. Inspect the master logs
```shell
$ kubectl logs -n celeborn celeborn-master-0
...
24/05/10 16:58:46,907 WARN [main] JVMSource: Add gauge jvm.thread.deadlocks
failed, the value type class java.util.Collections$EmptySet is not a number
24/05/10 16:58:46,952 WARN [main] ThreadPoolSource: Add gauge is_terminating
failed, the value type class java.lang.Boolean is not a number
24/05/10 16:58:46,953 WARN [main] ThreadPoolSource: Add gauge is_terminated
failed, the value type class java.lang.Boolean is not a number
24/05/10 16:58:46,953 WARN [main] ThreadPoolSource: Add gauge is_shutdown
failed, the value type class java.lang.Boolean is not a number
24/05/10 16:58:46,954 INFO [main] Dispatcher: Dispatcher numThreads: 4
24/05/10 16:58:47,051 INFO [main] TransportClientFactory: mode NIO threads 64
24/05/10 16:58:47,151 WARN [main] ThreadPoolSource: Add gauge is_terminating
failed, the value type class java.lang.Boolean is not a number
24/05/10 16:58:47,151 WARN [main] ThreadPoolSource: Add gauge is_terminated
failed, the value type class java.lang.Boolean is not a number
24/05/10 16:58:47,151 WARN [main] ThreadPoolSource: Add gauge is_shutdown
failed, the value type class java.lang.Boolean is not a number
24/05/10 16:58:47,162 INFO [main] NettyRpcEnvFactory: Starting RPC Server
[Master] on
celeborn-master-0.celeborn-master-svc.celeborn.svc.cluster.local:9097 with
advisor endpoint
celeborn-master-0.celeborn-master-svc.celeborn.svc.cluster.local:9097
24/05/10 16:58:47,306 INFO [main] Utils: Successfully started service
'Master' on port 9097.
24/05/10 16:58:47,577 INFO [main] RaftServer: raft.rpc.type = NETTY (custom)
24/05/10 16:58:47,682 INFO [main] NettyConfigKeys$Server:
raft.netty.server.host = null (default)
24/05/10 16:58:47,684 INFO [main] NettyConfigKeys$Server:
raft.netty.server.port = 9872 (custom)
24/05/10 16:58:47,686 INFO [main] DataStreamServerImpl: raft.datastream.type
= DISABLED (default)
24/05/10 16:58:47,709 INFO [main] RaftServerConfigKeys:
raft.server.threadpool.proxy.cached = true (default)
24/05/10 16:58:47,709 INFO [main] RaftServerConfigKeys:
raft.server.threadpool.proxy.size = 0 (default)
24/05/10 16:58:47,710 INFO [main] RaftServerConfigKeys:
raft.server.rpc.slowness.timeout = 120s (custom)
24/05/10 16:58:47,711 INFO [main] RaftServerConfigKeys:
raft.server.leaderelection.leader.step-down.wait-time = 10s (default)
24/05/10 16:58:47,714 INFO [main] RaftServerConfigKeys:
raft.server.storage.dir = [/mnt/celeborn_ratis] (custom)
...
```
6. Inspect the worker logs
```shell
$ kubectl logs -n celeborn celeborn-worker-0
...
24/05/10 16:59:19,697 INFO [main] MasterClient: connect to master
celeborn-master-1.celeborn-master-svc.celeborn.svc.cluster.local:9097.
24/05/10 16:59:19,717 INFO [main] Worker: Register worker successfully.
24/05/10 16:59:19,723 INFO [main] PushDataHandler: diskReserveSize 5.0 GiB,
diskReserveRatio null
24/05/10 16:59:19,724 INFO [main] PushDataHandler: diskReserveSize 5.0 GiB,
diskReserveRatio null
24/05/10 16:59:19,725 INFO [main] Worker: Worker started.
24/05/10 16:59:27,568 INFO [worker-memory-manager-reporter] MemoryManager:
Direct memory usage: 8.0 MiB/12.0 GiB, disk buffer size: 0.0 B, sort memory
size: 0.0 B, read buffer size: 0.0 B
24/05/10 16:59:37,576 INFO [worker-memory-manager-reporter] MemoryManager:
Direct memory usage: 8.0 MiB/12.0 GiB, disk buffer size: 0.0 B, sort memory
size: 0.0 B, read buffer size: 0.0 B
24/05/10 16:59:47,585 INFO [worker-memory-manager-reporter] MemoryManager:
Direct memory usage: 8.0 MiB/12.0 GiB, disk buffer size: 0.0 B, sort memory
size: 0.0 B, read buffer size: 0.0 B
24/05/10 16:59:49,747 INFO [worker-forward-message-scheduler]
StorageManager: Updated diskInfos:
DiskInfo(maxSlots: 0, committed shuffles 0, running applications 0,
shuffleAllocations: Map(), mountPoint: /mnt/disk1, usableSpace: 32.2 GiB,
avgFlushTime: 0 ns, avgFetchTime: 0 ns, activeSlots: 0, storageType: SSD)
status: HEALTHY dirs /mnt/disk1/celeborn-worker/shuffle_data
DiskInfo(maxSlots: 0, committed shuffles 0, running applications 0,
shuffleAllocations: Map(), mountPoint: /mnt/disk3, usableSpace: 32.2 GiB,
avgFlushTime: 0 ns, avgFetchTime: 0 ns, activeSlots: 0, storageType: SSD)
status: HEALTHY dirs /mnt/disk3/celeborn-worker/shuffle_data
DiskInfo(maxSlots: 0, committed shuffles 0, running applications 0,
shuffleAllocations: Map(), mountPoint: /mnt/disk2, usableSpace: 32.2 GiB,
avgFlushTime: 0 ns, avgFetchTime: 0 ns, activeSlots: 0, storageType: SSD)
status: HEALTHY dirs /mnt/disk2/celeborn-worker/shuffle_data
DiskInfo(maxSlots: 0, committed shuffles 0, running applications 0,
shuffleAllocations: Map(), mountPoint: /mnt/disk4, usableSpace: 32.2 GiB,
avgFlushTime: 0 ns, avgFetchTime: 0 ns, activeSlots: 0, storageType: SSD)
status: HEALTHY dirs /mnt/disk4/celeborn-worker/shuffle_data
24/05/10 16:59:49,862 INFO [worker-expired-shuffle-cleaner]
ChunkStreamManager: Clean up expired shuffle keys
...
```