[
https://issues.apache.org/jira/browse/SAMZA-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Riccomini updated SAMZA-5:
--------------------------------
Affects Version/s: 0.6.0
> Better support YARN 2.X
> -----------------------
>
> Key: SAMZA-5
> URL: https://issues.apache.org/jira/browse/SAMZA-5
> Project: Samza
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Chris Riccomini
>
> Something that's been bothering me recently is how we can support multiple
> YARN 2.X clusters, all on different versions.
> At LinkedIn, we run integration and production YARN clusters for Samza on a
> specific version of YARN, but have a test cluster running a different
> (usually newer) version. For example, we might run integration and production
> clusters on 2.0.5-alpha, but the test cluster on 2.1.0.
> The problem we have right now is that it's somewhat difficult to build a
> binary that can run on all of these clusters. Samza packages are deployed
> through a .tgz file with a bin and lib directory. The lib directory contains
> all jars (including Hadoop) that the job needs to run. The bin directory has
> a run-class.sh script in it, which adds ../lib/*.jar to the classpath.
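The package layout described above might look roughly like this (file names are illustrative, not taken from an actual job):

```text
my-job.tgz
├── bin/
│   ├── run-job.sh
│   └── run-class.sh      # adds ../lib/*.jar to the classpath
└── lib/
    ├── my-job.jar
    ├── samza-*.jar
    └── hadoop-*.jar      # the YARN version is baked into the package here
```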
> To complicate matters, YARN's 2.X versions are all backwards incompatible at
> the protocol level. YARN 2.0.4-alpha can't run on a 2.0.5-alpha cluster, and
> vice-versa. At the API-level (for the APIs Samza uses) 2.0.3-alpha,
> 2.0.4-alpha, and 2.0.5-alpha are all compatible (i.e. you can compile and run
> Samza with any one of these versions). I haven't tested 2.1.0-beta yet, but
> I assume it's API-compatible with 2.0.3-alpha and onward (this should be
> verified, though). As a result, the incompatibility appears to be purely a
> runtime issue: YARN clusters simply reject RPC calls from protobuf RPC
> versions other than the one the cluster itself is running (2.0.5-alpha
> clusters reject all non-2.0.5-alpha RPC calls).
> Given this set of facts, right now you have to cross-build a Samza tarball
> for each version of Hadoop that you're running on
> (my-job-yarn-2.0.3-alpha.tgz, my-job-yarn-2.0.4-alpha.tgz, etc.). This is
> kind of annoying, and it breaks reproducibility somewhat. Generally, testing
> on one binary and running in production on another is a bad idea, even if
> they're theoretically exactly the same minus the YARN jars.
> I was thinking a better solution might be to:
> 1. Change the YarnConfig/YarnJob/ClientHelper to allow arbitrary resources
> with a config scheme like this: yarn.resource.<resource-name>.path=...
> 2. Change run-class.sh to add both ../lib/*.jar and ../.../hadoop-libs/*.jar
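Point 2 might amount to something like the following in run-class.sh. This is only a sketch: the function name and directory layout are assumptions, and the exact location of hadoop-libs relative to the package is elided in the issue, so the paths here are illustrative.

```shell
#!/bin/sh
# Sketch of the proposed run-class.sh change: build the classpath from both
# the job's own lib/ jars and a separately-attached hadoop-libs/ directory.
# Names and relative paths are illustrative, not from the real script.
build_classpath() {
  classpath=""
  for jar in "$@"; do
    [ -e "$jar" ] || continue   # skip unexpanded globs for missing dirs
    classpath="$classpath:$jar"
  done
  # Strip the leading colon before printing.
  printf '%s' "${classpath#:}"
}

base_dir=$(dirname "$0")/..
CLASSPATH=$(build_classpath "$base_dir"/lib/*.jar "$base_dir"/hadoop-libs/*.jar)
```

run-class.sh would then exec java with this CLASSPATH; since hadoop-libs/ is attached per-cluster rather than baked into the tarball, the same job package works against any YARN version.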
> This seems like a good change to me because:
> 1. It allows jobs to attach arbitrary resources in addition to hadoop-libs.
> 2. It allows a single binary to be run with multiple versions of hadoop (vs
> one binary per Hadoop/YARN version).
> 3. If a non-manual (automated, or CI-based) job deployment mechanism is used,
> which automatically runs run-job.sh, then that deployment mechanism can
> auto-inject the appropriate version of YARN (using --config
> yarn.resource.hadoop-libs.path=...) based on the YARN cluster that the
> deployment mechanism is executing the job on.
> 4. If run-job.sh is being run by a user, they can either use the --config
> override described in point 3, or a per-environment properties file
> (test.properties points at YARN 2.1.0-beta, prod.properties at YARN
> 2.0.5-alpha).
> 5. It means that rolling out a new version of YARN doesn't require a re-build
> of every job package that runs on the cluster; you just need to update the
> config path to point to the new Hadoop.
> 6. This would theoretically allow us to switch from a JSON/environment
> variable config-passing approach to a distributed cache config-passing
> approach (config is a file on HDFS, or HTTP, or wherever). Right now, config
> is resolved at job-execution time (run-job.sh/JobRunner) into key/value
> pairs, and is passed to the AM/containers via environment variable as a JSON
> blob. This is kind of hacky, has size limits, and is probably not the safest
> thing to do. A better (but perhaps less convenient) approach might be to
> require config to be a file which gets attached and un-tar'ed along with
> hadoop-libs and the job package. I haven't fully thought the file-based
> config feature through, but the proposed change would allow us to support
> this.
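Points 3 and 4 above could look something like this in practice: a deployment mechanism (or a per-environment properties file) maps each target cluster to the Hadoop libs matching its YARN version and injects that path at submit time. Everything below (function name, cluster names, HDFS paths) is hypothetical:

```shell
#!/bin/sh
# Hypothetical deployer-side mapping from target cluster to the hadoop-libs
# resource that matches its YARN version (all names and paths are invented).
hadoop_libs_for_cluster() {
  case "$1" in
    test) echo "hdfs:///libs/hadoop-2.1.0-beta.tgz" ;;
    prod) echo "hdfs:///libs/hadoop-2.0.5-alpha.tgz" ;;
    *)    echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}

# The deployer (or a user) would then submit with something like:
#   bin/run-job.sh ... \
#     --config yarn.resource.hadoop-libs.path="$(hadoop_libs_for_cluster prod)"
# or put the same key in test.properties / prod.properties.
```

Either way, the job tarball itself stays identical across clusters; only the injected yarn.resource.hadoop-libs.path value changes per environment.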
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira