Steve Loughran created SLIDER-330:
-------------------------------------
Summary: Slider to upload all JARs to AM via distributed cache
Key: SLIDER-330
URL: https://issues.apache.org/jira/browse/SLIDER-330
Project: Slider
Issue Type: New Feature
Components: client
Affects Versions: Slider 0.50
Reporter: Steve Loughran
Currently Slider uploads only the JARs we know (or think) aren't on the YARN
classpath, and builds the AM classpath from those JARs plus the
{{yarn.application.classpath}} conf option.
This is brittle across Hadoop versions and installations:
# the option {{yarn.application.classpath}} may be missing or wrong, at which
point the AM refuses to start
# if YARN adds a different version of a dependency we push up, we end up with
more than one version on the classpath, and the conflicts that follow.
# if the Hadoop installation itself has some older JARs which don't link up to
our code (e.g. the AM uses protobuf fields not in the hadoop-yarn JARs), then
the AM fails to start with linkage errors.
The solution: isolate our classpath by pushing up all dependencies.
This requires Slider to know all its dependencies, which can be done via one of:
* static coding (painful, brittle)
* chained ASM bytecode dependency analysis (as Twill does). Efficient and
near-invisible, but breaks for anything bound by introspection/reflection.
* upload {{$SLIDER_HOME/lib/*.jar}}. Relies on mvn to build the files, and on
the assembly code to add them to the RPM and zip.
The last option, uploading {{lib/*.jar}}, seems the easiest and should be
reliable, provided {{$SLIDER_HOME}} is set and {{$SLIDER_HOME/lib}} is populated.
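
As a rough illustration of the {{lib/*.jar}} option, the sketch below enumerates the JARs that would be pushed up. It is not Slider code: the class and method names are hypothetical, and it only covers the local listing step, not the upload itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Slider): lists the JARs under home/lib. */
final class LibJarLister {

    /**
     * Returns the sorted list of *.jar files directly under home/lib.
     * Fails fast if the lib directory is missing, because an empty upload
     * would leave the AM without its dependencies.
     */
    static List<Path> listLibJars(Path home) throws IOException {
        Path lib = home.resolve("lib");
        if (!Files.isDirectory(lib)) {
            throw new IOException("No lib directory under " + home);
        }
        List<Path> jars = new ArrayList<>();
        // The glob is matched against each entry's file name only.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(lib, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar);
            }
        }
        // Sort so the resulting classpath order is deterministic.
        Collections.sort(jars);
        return jars;
    }
}
```

Sorting is a design choice here rather than a requirement: a stable order makes classpath problems easier to reproduce across client runs.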
For slider-client-as-API, the homedir may be unset. Here we propose letting the
client config define the homedir, with that value overriding any env var.
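
The proposed precedence (client config first, env var as fallback) could be sketched as follows. The config key {{slider.home.dir}} is an assumption made for illustration, since this issue does not name one, and the class is hypothetical rather than existing Slider code.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver (not part of Slider) for the Slider home directory. */
final class SliderHomeResolver {

    // Assumed key name; the issue does not specify the real one.
    static final String CONF_KEY = "slider.home.dir";
    static final String ENV_VAR = "SLIDER_HOME";

    /**
     * Returns the home directory from the client config if set there,
     * otherwise from the environment, otherwise empty.
     * Blank values are treated as unset.
     */
    static Optional<String> resolveHome(Map<String, String> clientConf,
                                        Map<String, String> env) {
        String fromConf = clientConf.get(CONF_KEY);
        if (fromConf != null && !fromConf.trim().isEmpty()) {
            return Optional.of(fromConf.trim());
        }
        String fromEnv = env.get(ENV_VAR);
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Optional.of(fromEnv.trim());
        }
        return Optional.empty();
    }
}
```

A caller embedding the client would pass its own config map together with {{System.getenv()}}, and treat an empty result as a launch error.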
For performance, especially long-haul, the distributed cache should be used.