Hi all,

I'm from Beijing DiDi Infinity Technology and Development Co., Ltd. My company uses Livy to build a Spark service platform, and we hope to unify the submission, monitoring and management of Spark jobs through Livy.

Currently, Livy only supports submitting jobs through the programming interface or the REST API. However, individual users and large-scale platforms that previously used the Spark client mostly submit jobs from shell scripts. For those users, Livy does not provide a convenient way to submit jobs.

Livy client is a higher-level Spark client. The difference between Livy client and the Spark client is that jobs are submitted through Livy, so users don't need to install a Spark client locally. Livy client is designed to replace the various versions of the Spark client scattered across machines. To minimize the changes users notice, Livy client accepts the Spark client's command-line parameters, and because jobs go through Livy, users get Livy's security verification, session management, multi-tenancy and other features. Currently, 12K+ Spark jobs are submitted through Livy daily in my company, and 900+ jobs are submitted through Livy client.

Livy client accepts the same command line as the Spark client, so users can keep running it from shell scripts. It parses the parameters in the user's command line one by one, converts them into a map, and calls Livy through the REST API to submit the job and get its progress. Control at the Spark level is handed over to the backend Livy server cluster, and unified submission in cluster mode ensures consistent execution results.
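To illustrate the "parse the command line into a map" step, here is a minimal Python sketch. It is not the actual Livy client code. It assumes the job is sent to Livy's batch endpoint (POST /batches) and only handles a subset of spark-submit options; the function name build_batch_request and the lookup tables are made up for illustration, while the request field names follow the Livy REST API documentation.

```python
# Sketch only: map a spark-submit style argument list to a Livy batch request body.
# The tables below cover a subset of options and are an assumption of this example.
STRING_FLAGS = {
    "--class": "className",
    "--name": "name",
    "--queue": "queue",
    "--proxy-user": "proxyUser",
    "--driver-memory": "driverMemory",
    "--executor-memory": "executorMemory",
}
INT_FLAGS = {
    "--num-executors": "numExecutors",
    "--executor-cores": "executorCores",
    "--driver-cores": "driverCores",
}
LIST_FLAGS = {
    "--jars": "jars",
    "--files": "files",
    "--py-files": "pyFiles",
    "--archives": "archives",
}


def build_batch_request(argv):
    """Convert command-line arguments into a dict usable as a POST /batches body."""
    body = {}
    conf = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if not arg.startswith("--"):
            # First positional argument is the application file;
            # everything after it is passed to the application.
            body["file"] = arg
            if argv[i + 1:]:
                body["args"] = list(argv[i + 1:])
            break
        if i + 1 >= len(argv):
            raise ValueError(f"missing value for {arg}")
        value = argv[i + 1]
        if arg == "--conf":
            key, sep, val = value.partition("=")
            if not sep:
                raise ValueError(f"expected key=value after --conf, got {value!r}")
            conf[key] = val
        elif arg in STRING_FLAGS:
            body[STRING_FLAGS[arg]] = value
        elif arg in INT_FLAGS:
            body[INT_FLAGS[arg]] = int(value)
        elif arg in LIST_FLAGS:
            body[LIST_FLAGS[arg]] = value.split(",")
        else:
            raise ValueError(f"unsupported option: {arg}")
        i += 2
    if "file" not in body:
        raise ValueError("no application file given")
    if conf:
        body["conf"] = conf
    return body
```

The resulting dict could be serialized with json.dumps and posted to the Livy server by any HTTP client; authentication and error handling are left out here.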
Users start different Livy interpreters through different Livy clients, and the livy-submit type directly launches SparkYarnApp, as shown in Figure 1:

[image: image2019-4-23 11_9_49.png]
Figure 1

Submitting a Spark job through Livy-Client involves these steps: submit command line, parse parameters, separate variant params, start interpreter, load hiveconf, execute code, poll for result, print result, as shown in Figure 2:

[image: 屏幕快照 2019-04-29 11.59.20.png]
Figure 2
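The session-related steps in this sequence (start interpreter, load hiveconf, execute code, poll for result) can be sketched in Python roughly as follows. This is an illustration under assumptions, not the Livy client implementation: http is a hypothetical callable http(method, path, body) that returns the parsed JSON response, and run_code and wait_until are names invented for this example. The endpoints and state names are taken from the Livy REST API documentation.

```python
import time

# State names as listed in the Livy REST API documentation.
SESSION_FAILED = {"error", "dead", "killed", "shutting_down", "success"}
STATEMENT_DONE = {"available", "error", "cancelled"}


def wait_until(fetch, done, failed=frozenset(), interval=1.0, sleep=time.sleep):
    """Call fetch() until its "state" is in `done`; raise if it lands in `failed`."""
    while True:
        info = fetch()
        if info["state"] in done:
            return info
        if info["state"] in failed:
            raise RuntimeError(f"unexpected state: {info['state']}")
        sleep(interval)


def run_code(http, code, kind="sql", spark_conf=None, hive_conf=None,
             interval=1.0, sleep=time.sleep):
    """Create a session, apply hive settings, run `code`, return the statement output."""
    # Start interpreter: create the session and keep its id for later requests.
    session = http("POST", "/sessions", {"kind": kind, "conf": spark_conf or {}})
    base = f"/sessions/{session['id']}"
    wait_until(lambda: http("GET", base, None), {"idle"}, SESSION_FAILED,
               interval, sleep)

    def execute(snippet):
        # Execute code: one REST request per statement, then poll by statement id.
        stmt = http("POST", f"{base}/statements", {"code": snippet})
        path = f"{base}/statements/{stmt['id']}"
        return wait_until(lambda: http("GET", path, None), STATEMENT_DONE,
                          interval=interval, sleep=sleep)

    # Load hiveconf: activate each setting with a SET command.
    for key, value in (hive_conf or {}).items():
        execute(f"SET {key}={value}")

    # Poll result: the finished statement carries the result in "output".
    return execute(code)["output"]
```

Passing the transport and the sleep function as parameters keeps the sketch testable without a running Livy server. Progress reporting from the Spark UI and session cleanup are omitted.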
- Submit command line: on the command line, use the abbreviated options to set common Spark or Livy params, use --conf to set customized Spark params or Livy RSC params, and use --hiveconf to set Hive params. Pass --help for usage guidance.
- Parse parameters: use a parser like spark-submit's to parse the command line and initialize the SparkSubmitArgument class.
- Separate variant params: separate the Spark config, Livy config and Hive config in SparkSubmitArgument.
- Start interpreter: start LivyInterpreter in Livy-Client and, at the same time, submit a REST request to create a session on the Livy server. LivyInterpreter controls this session through the sessionId returned in the REST API response.
- Load hiveconf: if the script contains hiveconf settings, use the SET command to activate them after LivyInterpreter has started.
- Execute code: the user can pass -e or -f to make the client parse code line by line, or enter code interactively. All code is carried in REST requests; the Livy server receives each request and starts a statement. During execution, Livy-Client gets the sparkUiUrl from the Livy session info, fetches job or stage progress from the Spark UI, and prints it to the console.
- Poll result: Livy-Client uses the sessionId and statementId to construct REST requests to Livy and fetches the execution result until the statement completes.
- Print result: when the statement is in a finished state, Livy-Client prints the statement's result field to the console. livy-submit does not print the result, only progress.

Livy client has the following advantages over the Spark client:

- Compared with the Spark client, Livy-Client almost never needs to be updated, whereas client updates try the patience of most users. The backend Livy server can change the Spark dependency in real time, and users do not need to know about it.
- Livy-Client is more lightweight than the Spark client and has no dependencies. Moreover, Spark jobs running in cluster mode do not occupy memory and compute resources on the local machine.
- All jobs submitted by Livy-Client run in cluster mode, which makes troubleshooting more convenient.

For more information about Livy client, please see the Livy client design doc <https://docs.google.com/document/d/1Sc-EHLBhLhmqVn7kQqUexxZ1vwEomb8lW-Vvlpj2Gmc/edit?usp=sharing> or discuss with us in the issue <https://issues.apache.org/jira/browse/LIVY-596>.