Hi all,
I'm from Beijing DiDi Infinity Technology and Development Co., Ltd.
My company uses Livy to build a Spark service platform, and we hope to
unify the submission, monitoring and management of Spark jobs through
Livy. Currently, Livy only supports submitting jobs through the programming
interface or the REST API. However, individual users and large-scale
platforms that previously used the Spark client mostly submit jobs from
shell scripts. For those users, Livy does not provide a convenient way to
submit jobs.
Livy client is a higher-level Spark client. The difference between Livy
client and the Spark client is that jobs are submitted through Livy, so
users don't need to install the Spark client locally. Livy client is
designed to replace the various versions of the Spark client scattered
across machines. To minimize how much users notice the change of client,
Livy client accepts the Spark client's command line parameters, and because
jobs go through Livy, users get the full benefit of Livy's security
verification, session management, multi-tenancy and other features.
Currently, 12K+ Spark jobs are submitted through Livy daily in my company,
900+ of them through Livy client.
Livy client accepts the same command line input as the Spark client, so
users can keep running it from shell scripts. It parses the parameters on
the command line one by one, converts them into a map, and calls Livy
through the REST API to submit the job and fetch its progress. Control at
the Spark level is handed over to the backend Livy server cluster, and
submitting everything in cluster mode ensures consistent execution results.
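
To make the parse-and-convert step concrete, here is a minimal sketch, not
the actual Livy-Client code. The function name parse_command_line and the
FLAG_TO_FIELD table are hypothetical; the target field names (className,
numExecutors, conf, ...) are those of the request body for Livy's
POST /batches endpoint, and only a handful of spark-submit flags are
covered.

```python
import shlex

# Hypothetical mapping from spark-submit style flags to fields of the
# Livy POST /batches request body; the real Livy-Client may differ.
FLAG_TO_FIELD = {
    "--class": "className",
    "--name": "name",
    "--queue": "queue",
    "--driver-memory": "driverMemory",
    "--executor-memory": "executorMemory",
    "--num-executors": "numExecutors",
    "--executor-cores": "executorCores",
}
INT_FIELDS = {"numExecutors", "executorCores"}


def parse_command_line(command):
    """Convert spark-submit style arguments into a Livy batch request map."""
    tokens = shlex.split(command)
    body = {"conf": {}, "args": []}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--conf":
            # --conf key=value entries are collected into the "conf" map.
            key, _, value = tokens[i + 1].partition("=")
            body["conf"][key] = value
            i += 2
        elif tok in FLAG_TO_FIELD:
            field = FLAG_TO_FIELD[tok]
            value = tokens[i + 1]
            body[field] = int(value) if field in INT_FIELDS else value
            i += 2
        elif "file" not in body:
            if tok.startswith("--"):
                raise ValueError(f"unsupported option: {tok}")
            # The first positional token is the application file.
            body["file"] = tok
            i += 1
        else:
            # Everything after the application file is an application argument.
            body["args"].append(tok)
            i += 1
    return body
```

The resulting map could then be serialized to JSON and posted to the Livy
server; which endpoint the real client uses for each client type is not
stated here, so treat the batch body as an assumption.
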
Users start different Livy interpreters through different Livy clients, and
the livy-submit type directly launches SparkYarnApp, as shown in Figure 1:
[image: image2019-4-23 11_9_49.png]
Figure 1
Submitting a Spark job through Livy-Client involves these steps: submit
command line, parse parameters, separate variant params, start
interpreter, load hiveconf, execute code, poll for result, and print
result, as shown in Figure 2:
[image: 屏幕快照 2019-04-29 11.59.20.png]
Figure 2

   - Submit command line: on the command line, use the abbreviated options
   to set common Spark or Livy params, use --conf to set customized Spark
   params or Livy RSC params, and use --hiveconf to set Hive params. Run
   --help for usage guidance.
   - Parse parameters: use a parser like spark-submit's to parse the
   command line and initialize the SparkSubmitArgument class.
   - Separate variant params: separate the Spark config, Livy config and
   Hive config held in SparkSubmitArgument.
   - Start interpreter: start LivyInterpreter in Livy-Client and, at the
   same time, send a REST request to create a session on the Livy server.
   LivyInterpreter controls this session through the sessionId returned in
   the REST API response.
   - Load hiveconf: if the script contains hiveconf settings, apply them
   with the SET command after LivyInterpreter has started.
   - Execute code: users can pass -e or -f to have the client parse code
   line by line, or enter code interactively. All code is carried in the
   REST request; the Livy server receives the request and starts a
   statement. During execution, Livy-Client gets sparkUiUrl from the Livy
   session info, reads job or stage progress from the Spark UI, and prints
   it to the console.
   - Poll for result: Livy-Client uses the sessionId and statementId to
   build REST requests to Livy and fetches the execution result until the
   statement completes.
   - Print result: when the statement reaches a finished state, Livy-Client
   prints the statement's result field to the console. livy-submit does not
   print the result, only the progress.
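
The start interpreter, load hiveconf, execute code and poll for result
steps can be sketched roughly as follows. This is an illustration under
stated assumptions, not the Livy-Client implementation: run_code and the
injected request(method, path, body) callable are hypothetical names, the
callable is expected to return the decoded JSON response, and progress
reporting from the Spark UI is left out. The endpoints (POST /sessions,
POST and GET under /sessions/{id}/statements) and the state names are those
of the Livy REST API.

```python
import time


def run_code(request, code, kind="sql", spark_conf=None, hive_conf=None,
             poll_interval=1.0, sleep=time.sleep):
    """Create a Livy session, apply hiveconf, run code, return its output.

    `request(method, path, body)` is a hypothetical HTTP helper supplied by
    the caller; it must return the parsed JSON body of the response.
    """
    # Start interpreter: create a session and keep the returned session id.
    session = request("POST", "/sessions",
                      {"kind": kind, "conf": spark_conf or {}})
    session_id = session["id"]

    # Wait until the session is ready to accept statements.
    while True:
        state = request("GET", f"/sessions/{session_id}", None)["state"]
        if state == "idle":
            break
        if state in ("error", "dead", "killed", "shutting_down"):
            raise RuntimeError(f"session {session_id} ended in state {state}")
        sleep(poll_interval)

    def execute(snippet):
        # Execute code: the snippet travels in the body of the REST request.
        stmt = request("POST", f"/sessions/{session_id}/statements",
                       {"code": snippet})
        path = f"/sessions/{session_id}/statements/{stmt['id']}"
        # Poll for result: re-fetch the statement until it leaves the queue.
        while stmt["state"] in ("waiting", "running"):
            sleep(poll_interval)
            stmt = request("GET", path, None)
        if stmt["state"] != "available":
            raise RuntimeError(f"statement ended in state {stmt['state']}")
        return stmt["output"]

    # Load hiveconf: apply each setting with SET once the session is up.
    for key, value in (hive_conf or {}).items():
        execute(f"SET {key}={value}")

    return execute(code)
```

Passing the HTTP helper in as a parameter keeps the sketch independent of
any particular HTTP library and of the authentication the Livy server
requires.
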

Livy-Client has the following advantages over the Spark client:

   - Compared with the Spark client, Livy-Client almost never needs to be
   updated, which spares users the client upgrades that wear on their
   patience. The backend Livy server can change its Spark dependencies at
   any time, and users do not need to know about it.
   - Livy-Client is more lightweight than the Spark client and has no
   dependencies. Moreover, Spark jobs running in cluster mode do not occupy
   memory or compute resources on the local machine.
   - All jobs submitted by Livy-Client run in cluster mode, which makes
   troubleshooting more convenient.

For more information about Livy client, please see the Livy client design doc
<https://docs.google.com/document/d/1Sc-EHLBhLhmqVn7kQqUexxZ1vwEomb8lW-Vvlpj2Gmc/edit?usp=sharing>
or discuss it with us on the JIRA issue
<https://issues.apache.org/jira/browse/LIVY-596>.
