alanwake created ZEPPELIN-4946:
----------------------------------
Summary: zeppelin server failed to connect spark interpreter on
k8s
Key: ZEPPELIN-4946
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4946
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-server
Affects Versions: 0.9.0
Environment: zeppelin:0.9.0
k8s: 1.16.2
spark: spark-py:3.0-2.7
Here is my development environment:
1. A standalone Spark cluster on k8s; the master service is exposed at
spark://master-0.spark-master.spark.svc.cluster.local:7077
2. Zeppelin is deployed in the zeppelin namespace.
{code:java}
# deployment.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zeppelin-server-conf-map
  namespace: zeppelin
data:
  # 'serviceDomain' is a Domain name to use for accessing Zeppelin UI.
  # Should point IP address of 'zeppelin-server' service.
  #
  # Wildcard subdomain need to be point the same IP address to access service
  # inside of Pod (such as SparkUI).
  # i.e. if service domain is 'local.zeppelin-project.org', DNS configuration
  # should make 'local.zeppelin-project.org' and '*.local.zeppelin-project.org'
  # point the same address.
  #
  # Default value is 'local.zeppelin-project.org' while it points 127.0.0.1 and
  # `kubectl port-forward zeppelin-server` will give localhost to connects.
  # If you have your ingress controller configured to connect to
  # `zeppelin-server` service and have a domain name for it (with wildcard
  # subdomain point the same address), you can replace serviceDomain field with
  # your own domain.
  #SERVICE_DOMAIN: zeppelin-server.zeppelin.svc.cluster.local:8080
  SERVICE_DOMAIN: local.zeppelin-project.org:8080
  ZEPPELIN_K8S_SPARK_CONTAINER_IMAGE: spark-py:3.0-2.7
  ZEPPELIN_K8S_CONTAINER_IMAGE: apache/zeppelin:0.9.0
  ZEPPELIN_HOME: /zeppelin
  ZEPPELIN_SERVER_RPC_PORTRANGE: 12320:12320
  # default value of 'master' property for spark interpreter.
  #SPARK_MASTER: k8s://https://kubernetes.default.svc
  SPARK_MASTER: spark://master-0.spark-master.spark.svc.cluster.local:7077
  # default value of 'SPARK_HOME' property for spark interpreter.
  SPARK_HOME: /spark
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zeppelin
  namespace: zeppelin
  labels:
    app: zeppelin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zeppelin
  template:
    metadata:
      labels:
        app: zeppelin
    spec:
      nodeSelector:
        role: worker
      containers:
      - name: zeppelin
        image: apache/zeppelin:0.9.0
        securityContext:
          runAsUser: 0
        envFrom:
        - configMapRef:
            name: zeppelin-server-conf-map
        ports:
        - containerPort: 8080
          name: web
        - containerPort: 12320
          name: rpc
        resources:
          requests:
            cpu: 0.2
            memory: 200m
        volumeMounts:
        - name: podyaml
          mountPath: /zeppelin/k8s/interpreter
      volumes:
      - name: podyaml
        hostPath:
          path: /datadisk/nfs/zeppelin/k8s/interpreter/
{code}
{code:java}
// 100-interpreter-spec.yaml
// There may be a bug here:
// -c {{zeppelin.k8s.server.rpc.service}} does not work; it is empty.
// So I replaced it with the hard-coded value -c zeppelin-server.zeppelin.svc.cluster.local
{code}
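The symptom described above (a placeholder that ends up empty) can be reproduced with a small stand-in renderer. This is only a sketch, assuming the template engine renders undefined properties as an empty string; it is not Zeppelin's actual rendering code, and the names `PLACEHOLDER`, `render`, and `undefined_keys` are hypothetical helpers introduced for illustration.

```python
import re

# Matches placeholders such as {{zeppelin.k8s.server.rpc.service}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, props: dict) -> str:
    """Replace each {{key}} with props[key]; undefined keys become ''.

    Assumption: undefined properties render as an empty string.
    """
    return PLACEHOLDER.sub(lambda m: str(props.get(m.group(1), "")), template)


def undefined_keys(template: str, props: dict) -> list:
    """Return the placeholder keys that have no (non-empty) value in props."""
    return [key for key in PLACEHOLDER.findall(template) if not props.get(key)]


if __name__ == "__main__":
    arg = "-c {{zeppelin.k8s.server.rpc.service}}"
    # With no properties defined, the argument loses its value.
    print(repr(render(arg, {})))
    # Listing undefined keys before rendering makes the gap visible.
    print(undefined_keys(arg, {}))
```

Checking for undefined keys before rendering would turn a silently empty `-c` argument into an explicit error, which is easier to diagnose than a later connection failure.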
{code:java}
kind: Service
apiVersion: v1
metadata:
  name: zeppelin-server
  namespace: zeppelin
spec:
  type: NodePort
  ports:
  - port: 8080
    targetPort: 8080
    nodePort: 30080
    name: web
  - port: 12320
    name: rpc # port name is referenced in the code. So it shouldn't be changed.
  selector:
    app: zeppelin
{code}
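For reference, the hard-coded host name used above follows the usual Kubernetes Service DNS pattern `<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds that name and the RPC endpoint from the Service definition; the helper names `service_fqdn` and `rpc_endpoint` are hypothetical, and the cluster domain `cluster.local` is an assumption taken from the addresses in this report.

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Build the cluster-internal DNS name of a Kubernetes Service.

    Assumption: the cluster uses the given cluster domain
    ("cluster.local" in this report).
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def rpc_endpoint(service: str, namespace: str, port: int) -> str:
    """Return host:port for a named Service port, e.g. the 'rpc' port."""
    return f"{service_fqdn(service, namespace)}:{port}"


if __name__ == "__main__":
    # Values taken from the Service definition in this report.
    print(service_fqdn("zeppelin-server", "zeppelin"))
    print(rpc_endpoint("zeppelin-server", "zeppelin", 12320))
```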
Reporter: alanwake
Attachments: 1.txt, 2.txt
Help, please!
I am new here and unfamiliar with Java projects. The logs show nothing about
the remote address.
{code:java}
[root@master zeppelin]# kubectl get pods -n=zeppelin -o=wide
NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE                   NOMINATED NODE   READINESS GATES
spark-hpvbft               1/1     Running   0          13s     10.244.1.23   node01.51vrk8s.local   <none>           <none>
zeppelin-df54795fb-wddqs   1/1     Running   0          5m50s   10.244.1.22   node01.51vrk8s.local   <none>           <none>
{code}
{code:java}
[root@master ~]# kubectl logs spark-hpvbft -n=zeppelin
INFO [2020-07-10 04:19:37,576] ({FIFOScheduler-interpreter_1257482730-Worker-1} Logging.scala[logInfo]:57) - Initialized BlockManager: BlockManagerId(driver, spark-hpvbft, 36344, None)
INFO [2020-07-10 04:19:37,681] ({FIFOScheduler-interpreter_1257482730-Worker-1} ContextHandler.java[doStart]:855) - Started o.s.j.s.ServletContextHandler@69b63f8c{/metrics/json,null,AVAILABLE,@Spark}
INFO [2020-07-10 04:19:37,754] ({FIFOScheduler-interpreter_1257482730-Worker-1} BaseSparkScalaInterpreter.scala[spark2CreateContext]:293) - Created Spark session (without Hive support)
INFO [2020-07-10 04:19:41,316] ({FIFOScheduler-interpreter_1257482730-Worker-1} SparkShims.java[loadShims]:61) - Initializing shims for Spark 3.x
INFO [2020-07-10 04:19:42,727] ({FIFOScheduler-interpreter_1257482730-Worker-1} AbstractScheduler.java[runJob]:152) - Job 20150210-015259_1403135953 finished by scheduler interpreter_1257482730
{code}
See the attached file 1.txt for details.
{code:java}
[root@master ~]# kubectl logs zeppelin-df54795fb-wddqs -n=zeppelin
INFO [2020-07-10 04:19:34,427] ({SchedulerFactory2} RemoteInterpreter.java[call]:141) - Open RemoteInterpreter org.apache.zeppelin.spark.SparkInterpreter
INFO [2020-07-10 04:19:34,427] ({SchedulerFactory2} RemoteInterpreter.java[pushAngularObjectRegistryToRemote]:431) - Push local angular object registry from ZeppelinServer to remote interpreter group spark-shared_process
WARN [2020-07-10 04:19:42,736] ({SchedulerFactory2} NotebookServer.java[onStatusChange]:1901) - Job 20150210-015259_1403135953 is finished, status: ERROR, exception: null, result: %text warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
{code}
See the attached file 2.txt for details.
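Since the log above reports "Connection refused" without naming the remote address, one way to narrow it down is to probe the candidate endpoints directly, for example from inside the interpreter pod via `kubectl exec`. The following is a minimal sketch of such a probe; the function name `can_connect` is hypothetical, and which host and port to test (for instance the `web` port 8080 and the `rpc` port 12320 of the zeppelin-server Service) is an assumption based on the configuration in this report.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Refused connections, timeouts, and DNS failures all raise
    subclasses of OSError, so they are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (assumed endpoints, taken from the Service in this report):
#   can_connect("zeppelin-server.zeppelin.svc.cluster.local", 8080)
#   can_connect("zeppelin-server.zeppelin.svc.cluster.local", 12320)
```

A `False` result for only one of the ports would point at that specific listener rather than at DNS or general network reachability.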