Hi Chris,
That doesn’t seem to be working. I also noticed that YARN_HOME seems to be replaced with HADOOP_YARN_HOME. To be safe I set both. $ ls -la /usr/hadoop/conf total 12 drwxr-xr-x 2 root root 4096 Sep 12 17:17 . drwxr-xr-x 5 samza samza 4096 Sep 12 17:17 .. -rw-r--r-- 1 samza samza 1978 Sep 12 17:17 yarn-site.xml $ YARN_HOME=/usr/hadoop HADOOP_YARN_HOME=/usr/hadoop./gradlew samza-shell:runJob -PconfigPath=file:///usr/samza/config/wikipedia-feed.properties <- does not load the yarn-site.xml found in /usr/hadoop/conf/yarn-site.xml The documentation http://www.gradle.org/docs/current/userguide/tutorial_this_and_that.html for 14.2 Gradle properties and system properties is horrible. Is this the correct way to set the environment when using gradle tasks? It does not work. On Fri, Sep 12, 2014 at 12:41 PM, Chris Riccomini <[email protected]> wrote: > Hey Ethan, > I believe that you need to export YARN_HOME=<path to yarn home dir> > The YARN_HOME dir must have a conf/yarn-site.xml in it. > Cheers, > Chris > On 9/12/14 9:38 AM, "Ethan Setnik" <[email protected]> wrote: >>My initial tests suggest that the HA failover is handled gracefully in >>yarn 2.4.0 without any changes to Samza due to the abstraction of >>AMRMClient as mentioned by Zhijie. I’ve also confirmed that upgrading to >>yarn 2.5.0 works as well FWIW. >> >> >> >> >>The one thing that’s not working completely is >>org.apache.samza.job.JobRunner >> >> >> >> >> >>$ ./gradlew samza-shell:runJob >>-PconfigPath=file:///usr/samza/config/wikipedia-feed.properties >> >>... >> >> >> >> >>2014-09-12 16:13:08 JobRunner [INFO] job factory: >>org.apache.samza.job.yarn.YarnJobFactory >> >>2014-09-12 16:13:08 ClientHelper [INFO] trying to connect to RM >>0.0.0.0:8032 >> >>2014-09-12 16:13:08 RMProxy [INFO] Connecting to ResourceManager at >>/0.0.0.0:8032 >> >>… >> >> >> >> >>The JobRunner creates a new instance of YarnJobFactory which initializes >>a default instance of YarnConfiguration instead of reading an existing >>configuration from the file system. >> >> >> >> >> >>class YarnJobFactory extends StreamJobFactory { >> >> def getJob(config: Config) = { >> >> // TODO fix this. needed to support http package locations. >> >> val hConfig = new YarnConfiguration >> >> hConfig.set("fs.http.impl", classOf[HttpFileSystem].getName) >> >> >> >> >> new YarnJob(config, hConfig) >> >> } >> >>} >> >> >> >> >>This default config is then used to create and submit the job. >> >> >> >> >> >>class YarnJob(config: Config, hadoopConfig: Configuration) extends >>StreamJob { >> >> import YarnJob._ >> >> >> >> val client = new ClientHelper(hadoopConfig) >> >> >> >> >> >> >> >> >>It looks as if it’s always using the default configuration. How do I >>specify the environment for hadoop to load the yarn-site.xml I have >>placed in /usr/hadoop/etc/hadoop instead of using the defaults? >> >>On Thu, Sep 11, 2014 at 6:51 PM, Zhijie Shen <[email protected]> >>wrote: >> >>> W.R.T HA RM, AM doesn't need to make any change if it is using >>> AMRMClient(Async), which will automatically take care of the work >>>required >>> when RM failover happens. >>> One thing that may interest Samza is the work-preserving AM restarting. >>>If >>> Samza AM crash for some reason, when it restarts, it can try to get the >>>old >>> containers back, to continue the work. To achieve this, Samza needs some >>> logic optimization on the AM side. >>> On Thu, Sep 11, 2014 at 2:45 PM, Chris Riccomini < >>> [email protected]> wrote: >>>> Hey Ethan, >>>> >>>> Yes, we plan to support this, but haven't done any testing with it yet. >>>> >>>> The main question that I have is whether supporting HA RM requires >>>>Samza's >>>> AM to have code changes, or whether the YARN AM client will >>>>transparently >>>> handle RM failover. Until we run some tests on this (or are told >>>> otherwise), I just don't know (and haven't had time to investigate). >>>>Does >>>> anyone know the answer to this question? >>>> >>>> Cheers, >>>> Chris >>>> >>>> On 9/11/14 1:55 PM, "Ethan Setnik" <[email protected]> wrote: >>>> >>>> >Yarn 2.4.0 brings support for high availability configurations by >>>> >specifying a cluster of resource managers and a state store via >>>>zookeeper. >>>> > >>>> > >>>> >"When there are multiple RMs, the configuration (yarn-site.xml) used >>>>by >>>> >clients and nodes is expected to list all the RMs. Clients, >>>> >ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the >>>>RMs >>>> >in a round-robin fashion until they hit the Active RM. If the Active >>>>goes >>>> >down, they resume the round-robin polling until they hit the "new" >>>> >Active.² >>>> > >>>> > >>>> >Does Samza have any plans to support Yarn HA configurations? >>>> >>>> >>> -- >>> Zhijie Shen >>> Hortonworks Inc. >>> http://hortonworks.com/ >>> -- >>> CONFIDENTIALITY NOTICE >>> NOTICE: This message is intended for the use of the individual or >>>entity to >>> which it is addressed and may contain information that is confidential, >>> privileged and exempt from disclosure under applicable law. If the >>>reader >>> of this message is not the intended recipient, you are hereby notified >>>that >>> any printing, copying, dissemination, distribution, disclosure or >>> forwarding of this communication is strictly prohibited. If you have >>> received this communication in error, please contact the sender >>>immediately >>> and delete it from your system. Thank You.
