This is an excellent discussion. As mentioned in an earlier
email, we agree with a number of Chester's suggestions, but we
have yet other concerns. I've researched this further in the past
several days, and I've queried my team. This email attempts to
capture those other concerns. Making yarn.Client private has prevented us from moving from Spark 1.0.x to Spark 1.2 or 1.3 despite many alluring new features. The SparkLauncher, which provides “support for programmatically running Spark jobs” (SPARK-3733 and SPARK-4924) will not work in our environment or for our use case -- which requires programmatically initiating and monitoring Spark jobs on Yarn in cluster mode from a cloud-based application server. It is not just that the Yarn ApplicationId is no longer directly or indirectly available. More critically, it violates constraints imposed by any application server and additional constraints imposed by security, process, and dynamic resource allocation requirements in our cloud services environment. In Spark 1.0 and 1.1, with yarn.Client public, our applications' job scheduler marshalls configuration and environmental resources necessary for any Spark job, including cluster-, data- or job-specific parameters, makes the appropriate calls to initialize and run yarn.Client, which together with the other classes in the spark-yarn module requests the Yarn resource manager to start and monitor a job (see Figure 1) on the cluster. (Our job scheduler is not Yarn replacement; it leverages Yarn to coordinate a variety of different Spark analytic and data enrichment jobs.) More recent Spark versions make yarn.Client private and thus remove that capability, but the SparkLauncher, scheduled for Spark 1.4, replaces this simple programmatic solution with one considerably more complicated. Based on our understanding, in this scenario, our job scheduler marshalls configuration and environmental resources for the SparkLauncher much as it did for yarn.Client. It then calls launch() to initialize a new Linux process to execute the spark-submit shell script with the specified configuration and environment, which in turn starts a new JVM (with the Spark assembly jar in its class path) that executes launcher.Main. 
This ultimately calls yarn.Client (see Figure 2). This is more than an arm's-length transaction; there are three legs: job scheduler SparkLauncher.launch() call → spark-submit bash execution → launcher.Main call to yarn.Client → Yarn resource manager allocation and execution of the job driver and executors.

Not only is this scenario unnecessarily complicated, it will simply not work. The "programmatic" call to SparkLauncher.launch() starts a new JVM, which is not allowed in any application server; an application server must own all its JVMs. Perhaps spark-submit and the launcher.Main JVM process could be hosted outside the application server, but only in violation of security and multi-tenant cloud architectural constraints.

We appreciate that yarn.Client was perhaps never intended to be public. Configuring it is not for the faint of heart, and some of its methods should indeed be private. But we wonder whether there is another option.

In researching and discussing these issues with Cloudera and others, we've been told that only one mechanism is supported for starting Spark jobs: the spark-submit scripts. We also have gathered (perhaps mistakenly) from discussions reaching back 20 months that Spark's intention is to have a unified job submission interface for all supported platforms. Unfortunately, this doesn't recognize the asymmetries among those platforms. Submitting a local Spark job, or a job to a Spark master in cluster mode, may indeed require initializing a separate process in order to pass configuration parameters via the environment and command line. But Spark's yarn.Client in cluster mode already has an arm's-length relationship with the Yarn resource manager. Configuration may be passed from the job scheduling application to yarn.Client as String or property-map variables and method parameters.

Our request is for a public yarn.Client or some reasonable facsimile. Thanks.
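For comparison, here is a sketch of the SparkLauncher path as we understand it from SPARK-4924 and the Spark 1.4 API; the jar path and main class are again hypothetical:

```scala
// Sketch only: assumes the org.apache.spark.launcher API planned for 1.4.
import org.apache.spark.launcher.SparkLauncher

val process: Process = new SparkLauncher()
  .setAppResource("/path/to/analytics-job.jar")  // hypothetical job jar
  .setMainClass("com.example.AnalyticsJob")      // hypothetical main class
  .setMaster("yarn-cluster")
  .setConf("spark.executor.memory", "2g")
  .launch()  // forks spark-submit in a NEW OS process and JVM

// The caller gets back only a java.lang.Process handle -- no ApplicationId,
// and the fork itself is what an application server forbids.
process.waitFor()
```

This is precisely the step that breaks for us: launch() returns an opaque Process rather than a handle to the Yarn application, and the forked JVM violates the application server's process ownership model.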
On 05/13/2015 08:22 PM, Patrick Wendell
wrote:
Hey Chester,

Thanks for sending this. It's very helpful to have this list.

The reason we made the Client API private was that it was never intended to be used by third parties programmatically, and we don't intend to support it in its current form as a stable API. We thought the fact that it was for internal use would be obvious, since it accepts arguments as a string array of CL args. It was always intended for command line use, and the stable API was the command line. When we migrated the Launcher library, we figured we covered most of the use cases in the off chance someone was using the Client. It appears we regressed one feature, which was a clean way to get the app ID.

The items you list here, 2-6, all seem like new feature requests rather than a regression caused by us making that API private. I think the way to move forward is for someone to design a proper long-term stable API for the things you mentioned here. That could be by extension of the Launcher library. Marcelo would be a natural to help with this effort, since he was heavily involved in both YARN support and the launcher, so I'm curious to hear his opinion on how best to move forward. I do see how apps that run Spark would benefit from having a control plane for querying status, both on YARN and elsewhere.

- Patrick

On Wed, May 13, 2015 at 5:44 AM, Chester At Work <[email protected]> wrote:
