[jira] [Commented] (LIVY-660) How can we use YARN and all the nodes in our cluster when submiting a pySpark job

Yiheng Wang (Jira) Thu, 05 Sep 2019 00:07:19 -0700


    [ 
https://issues.apache.org/jira/browse/LIVY-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923112#comment-16923112
 ]


Yiheng Wang commented on LIVY-660:
----------------------------------

I think it's better to post such question in livy user email group.

 

You just post your livy-client.conf. Do you modify your livy.conf?

> How can we use YARN and all the nodes in our cluster when submiting a pySpark 
> job
> ---------------------------------------------------------------------------------
>
>                 Key: LIVY-660
>                 URL: https://issues.apache.org/jira/browse/LIVY-660
>             Project: Livy
>          Issue Type: Question
>          Components: Server
>    Affects Versions: 0.6.0
>            Reporter: Sebastian Rama
>            Priority: Minor
>
> How can we use YARN and all the nodes in our cluster when submiting a pySpark 
> job?
> We have edited all the required .conf files but nothing happens..... =(
>  
>  
> [root@cdh-node06 conf]# cat livy-client.conf
> #
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements.  See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License.  You may obtain a copy of the License at
> #
> #    http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
> # Use this keystore for the SSL certificate and key.
> # livy.keystore =
> dew0wf-e
> # Specify the keystore password.
> # livy.keystore.password =
> #
> welfka
> # Specify the key password.
> # livy.key-password =
>  
> # Hadoop Credential Provider Path to get "livy.keystore.password" and 
> "livy.key-password".
> # Credential Provider can be created using command as follow:
> # hadoop credential create "livy.keystore.password" -value "secret" -provider 
> jceks://hdfs/path/to/livy.jceks
> # livy.hadoop.security.credential.provider.path =
>  
> # What host address to start the server on. By default, Livy will bind to all 
> network interfaces.
> # livy.server.host = 0.0.0.0
>  
> # What port to start the server on.
> # livy.server.port = 8998
>  
> # What base path ui should work on. By default UI is mounted on "/".
> # E.g.: livy.ui.basePath = /my_livy - result in mounting UI on /my_livy/
> # livy.ui.basePath = ""
>  
> # What spark master Livy sessions should use.
> livy.spark.master = yarn
>  
> # What spark deploy mode Livy sessions should use.
> livy.spark.deploy-mode = cluster
>  
> # Configure Livy server http request and response header size.
> # livy.server.request-header.size = 131072
> # livy.server.response-header.size = 131072
>  
> # Enabled to check whether timeout Livy sessions should be stopped.
> # livy.server.session.timeout-check = true
>  
> # Time in milliseconds on how long Livy will wait before timing out an idle 
> session.
> # livy.server.session.timeout = 1h
> #
> # How long a finished session state should be kept in LivyServer for query.
> # livy.server.session.state-retain.sec = 600s
>  
> # If livy should impersonate the requesting users when creating a new session.
> # livy.impersonation.enabled = true
>  
> # Logs size livy can cache for each session/batch. 0 means don't cache the 
> logs.
> # livy.cache-log.size = 200
>  
> # Comma-separated list of Livy RSC jars. By default Livy will upload jars 
> from its installation
> # directory every time a session is started. By caching these files in HDFS, 
> for example, startup
> # time of sessions on YARN can be reduced.
> # livy.rsc.jars =
>  
> # Comma-separated list of Livy REPL jars. By default Livy will upload jars 
> from its installation
> # directory every time a session is started. By caching these files in HDFS, 
> for example, startup
> # time of sessions on YARN can be reduced. Please list all the repl 
> dependencies including
> # Scala version-specific livy-repl jars, Livy will automatically pick the 
> right dependencies
> # during session creation.
> # livy.repl.jars =
>  
> # Location of PySpark archives. By default Livy will upload the file from 
> SPARK_HOME, but
> # by caching the file in HDFS, startup time of PySpark sessions on YARN can 
> be reduced.
> # livy.pyspark.archives =
>  
> # Location of the SparkR package. By default Livy will upload the file from 
> SPARK_HOME, but
> # by caching the file in HDFS, startup time of R sessions on YARN can be 
> reduced.
> # livy.sparkr.package =
>  
> # List of local directories from where files are allowed to be added to user 
> sessions. By
> # default it's empty, meaning users can only reference remote URIs when 
> starting their
> # sessions.
> # livy.file.local-dir-whitelist =
>  
> # Whether to enable csrf protection, by default it is false. If it is 
> enabled, client should add
> # http-header "X-Requested-By" in request if the http method is 
> POST/DELETE/PUT/PATCH.
> # livy.server.csrf-protection.enabled =
>  
> # Whether to enable HiveContext in livy interpreter, if it is true 
> hive-site.xml will be detected
> # on user request and then livy server classpath automatically.
> # livy.repl.enable-hive-context =
>  
> # Recovery mode of Livy. Possible values:
> # off: Default. Turn off recovery. Every time Livy shuts down, it stops and 
> forgets all sessions.
> # recovery: Livy persists session info to the state store. When Livy 
> restarts, it recovers
> #           previous sessions from the state store.
> # Must set livy.server.recovery.state-store and 
> livy.server.recovery.state-store.url to
> # configure the state store.
> # livy.server.recovery.mode = off
>  
> # Where Livy should store state to for recovery. Possible values:
> # <empty>: Default. State store disabled.
> # filesystem: Store state on a file system.
> # zookeeper: Store state in a Zookeeper instance.
> # livy.server.recovery.state-store =
>  
> # For filesystem state store, the path of the state store directory. Please 
> don't use a filesystem
> # that doesn't support atomic rename (e.g. S3). e.g. file:///tmp/livy or 
> hdfs:///.
> # For zookeeper, the address to the Zookeeper servers. e.g. 
> host1:port1,host2:port2
> # livy.server.recovery.state-store.url =
>  
> # If Livy can't find the yarn app within this time, consider it lost.
> # livy.server.yarn.app-lookup-timeout = 120s
> # When the cluster is busy, we may fail to launch yarn app in 
> app-lookup-timeout, then it would
> # cause session leakage, so we need to check session leakage.
> # How long to check livy session leakage
> # livy.server.yarn.app-leakage.check-timeout = 600s
> # how often to check livy session leakage
> # livy.server.yarn.app-leakage.check-interval = 60s
>  
> # How often Livy polls YARN to refresh YARN app state.
> # livy.server.yarn.poll-interval = 5s
> #
> # Days to keep Livy server request logs.
> # livy.server.request-log-retain.days = 5
>  
> # If the Livy Web UI should be included in the Livy Server. Enabled by 
> default.
> # livy.ui.enabled = true
>  
> # Whether to enable Livy server access control, if it is true then all the 
> income requests will
> # be checked if the requested user has permission.
> # livy.server.access-control.enabled = false
>  
> # Allowed users to access Livy, by default any user is allowed to access 
> Livy. If user want to
> # limit who could access Livy, user should list all the permitted users with 
> comma separated.
> # livy.server.access-control.allowed-users = *
>  
> # A list of users with comma separated has the permission to change other 
> user's submitted
> # session, like submitting statements, deleting session.
> # livy.server.access-control.modify-users =
>  
> # A list of users with comma separated has the permission to view other 
> user's infomation, like
> # submitted session state, statement results.
> # livy.server.access-control.view-users =
> #
> # Authentication support for Livy server
> # Livy has a built-in SPnego authentication support for HTTP requests  with 
> below configurations.
> # livy.server.auth.type = kerberos
> # livy.server.auth.kerberos.principal = <spnego principal>
> # livy.server.auth.kerberos.keytab = <spnego keytab>
> # livy.server.auth.kerberos.name-rules = DEFAULT
> #
> # If user wants to use custom authentication filter, configurations are:
> # livy.server.auth.type = <custom>
> # livy.server.auth.<custom>.class = <class of custom auth filter>
> # livy.server.auth.<custom>.param.<foo1> = <bar1>
> # livy.server.auth.<custom>.param.<foo2> = <bar2>
> export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera/jre/
> export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/
> export SPARK_CONF_DIR=$SPARK_HOME/conf
> export HADOOP_HOME=/etc/hadoop/
> export HADOOP_CONF_DIR=/etc/hadoop/conf
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Commented] (LIVY-660) How can we use YARN and all the nodes in our cluster when submiting a pySpark job

Reply via email to