Steve Loughran created SLIDER-330:
-------------------------------------

             Summary: Slider to upload all JARs to AM via distributed cache
                 Key: SLIDER-330
                 URL: https://issues.apache.org/jira/browse/SLIDER-330
             Project: Slider
          Issue Type: New Feature
          Components: client
    Affects Versions: Slider 0.50
            Reporter: Steve Loughran


Currently Slider uploads only the JARs we know (or think) aren't on the YARN 
classpath, and builds the classpath from those plus the 
{{yarn.application.classpath}} conf option.

This is brittle across Hadoop versions and installations:
# the option {{yarn.application.classpath}} may be missing or wrong, at which 
point the AM refuses to start.
# if YARN adds a different version of a dependency we push up, we end up with 
more than one version on the classpath, and the pain that follows.
# if the Hadoop installation itself has some older JARs which don't link up to 
our code (e.g. the AM uses protobuf fields not in the hadoop-yarn JARs), then 
the AM fails to start with linkage errors.

The solution: isolate our classpath by pushing up all dependencies.

This requires Slider to know all its dependencies, which can be done in one of 
three ways:
* static coding (painful, brittle)
* chained ASM dependency analysis (as Twill does). Efficient and 
near-invisible, but breaks for any introspective binding.
* upload {{$SLIDER_HOME/lib/*.jar}}. Relies on mvn to build the files, and on 
the assembly code to add them to the RPM and zip.

The last option, uploading {{lib/*.jar}}, seems the easiest and should be 
reliable, provided {{$SLIDER_HOME}} is set.
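
As a rough illustration of the {{lib/*.jar}} option, the sketch below enumerates the JARs that would be candidates for upload. It uses only the JDK; the class and method names ({{LibJarLister}}, {{listLibJars}}) are hypothetical and not part of Slider, and the actual upload step is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Slider code: finds the JARs under <sliderHome>/lib.
final class LibJarLister {

    /** Returns the regular *.jar files directly under sliderHome/lib, sorted by path. */
    static List<Path> listLibJars(Path sliderHome) throws IOException {
        Path libDir = sliderHome.resolve("lib");
        if (!Files.isDirectory(libDir)) {
            // Fail early: without a lib directory there is nothing to upload.
            throw new IOException("No lib directory under " + sliderHome);
        }
        List<Path> jars = new ArrayList<>();
        // The glob matches file names only; subdirectories are not searched.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                if (Files.isRegularFile(jar)) {
                    jars.add(jar);
                }
            }
        }
        // Sort so the resulting classpath order is deterministic.
        Collections.sort(jars);
        return jars;
    }
}
```

Failing fast when {{lib}} is missing keeps a bad home directory from turning into an AM launch with an empty classpath.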

For slider-client-as-API, the home directory may be unset. Here we propose 
letting the client config define the home directory, with that value overriding 
any environment variable.
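
The proposed lookup order could be sketched as follows. This is only an assumption about how it might look: the config key {{slider.home.dir}} and the class {{SliderHomeResolver}} are made up for illustration, since this issue does not name either. The config and environment are passed in as plain maps so the logic stays easy to test.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not Slider code: picks the Slider home directory.
final class SliderHomeResolver {

    // Assumed config key; the issue does not define one.
    static final String CONF_KEY = "slider.home.dir";
    // Environment variable referenced in the issue as $SLIDER_HOME.
    static final String ENV_VAR = "SLIDER_HOME";

    /** The client config wins over the environment; empty if neither is set. */
    static Optional<String> resolve(Map<String, String> clientConf,
                                    Map<String, String> env) {
        String fromConf = clientConf.get(CONF_KEY);
        if (fromConf != null && !fromConf.trim().isEmpty()) {
            return Optional.of(fromConf.trim());
        }
        String fromEnv = env.get(ENV_VAR);
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Optional.of(fromEnv.trim());
        }
        return Optional.empty();
    }
}
```

A caller would pass {{System.getenv()}} as the second argument, and treat an empty result as "cannot upload lib/*.jar" rather than guessing a location.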

For performance, especially over long-haul links, the distributed cache should 
be used.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
