Hi Panagiotis,

Apologies for the late response. The issues you reported required some work and research on our side. We appreciate your feedback.
The first (and easiest) ask is to provide a way for the user to restart services. We are currently working on setting the services up as standard system services. We have also merged a fix to restart YARN through the resource manager's actions. Note that restarting YARN from the resource manager should trigger service restarts on the slaves (fixed in https://jujucharms.com/apache-hadoop-resourcemanager/trusty/2).

Regarding the config variables, we are considering a way to let users update Hadoop properties. We are still in the planning phase, so I cannot say when this will become available. For the specific property you suggest, namely "mapreduce.jobtracker.address", we are using the default value and we are a bit skeptical about deviating from it. How did you arrive at the "yarn" value for "mapreduce.jobtracker.address"? In the code you showed us, checkLocalJobRunnerConfiguration checks that the job tracker address is not "local" (through a deprecated property), which is actually the default for YARN. Is it possible that Giraph is not supposed to work on YARN?

For a configuration of Giraph and Hadoop that seems to work, you could look at Apache Bigtop, which packages and deploys the respective projects:

https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/giraph/templates/giraph-site.xml
https://github.com/apache/bigtop/blob/master/bigtop-deploy/puppet/modules/hadoop/templates/mapred-site.xml

Note, however, that a lot has changed on the Hadoop side since the last release of Giraph (Nov 2014).

Again, we would like to thank you for your feedback,
Konstantinos

On Tue, Apr 12, 2016 at 5:23 AM, Panagiotis Liakos <[email protected]> wrote:
> Hi again,
>
> I investigated this issue a little further and found out that with
> YARN there is no longer a single job tracker to run jobs; instead,
> each job has its own ApplicationMaster that takes care of execution
> flow.
> I was not able to build Apache Giraph using the flags suggested for
> Hadoop YARN in the README file of its release (i.e., "mvn
> -Phadoop_yarn -Dhadoop.version=2.2.0 <goals>").
> Therefore, I built Giraph with the Hadoop 2 profile (-Phadoop_2) and I
> submit my jobs as MapReduce applications.
>
> This works well for the SimpleShortestPathsComputation example of
> Giraph as long as I have set the 'mapreduce.jobtracker.address'
> property I mentioned in my previous e-mail.
> However, I am interested in executing the PageRank algorithm. When I
> tried to do that, my job failed and I found in the job history logs
> the error: "Aggregation is not enabled."
>
> After searching for this, I figured out that I additionally have to set
> the following properties in the yarn-site.xml file (the first one would
> probably be enough):
>
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>true</value>
> </property>
> <property>
>   <description>Where to aggregate logs to.</description>
>   <name>yarn.nodemanager.remote-app-log-dir</name>
>   <value>/tmp/logs</value>
> </property>
> <property>
>   <name>yarn.log-aggregation.retain-seconds</name>
>   <value>259200</value>
> </property>
> <property>
>   <name>yarn.log-aggregation.retain-check-interval-seconds</name>
>   <value>3600</value>
> </property>
>
> So, I added these to all my slave nodes as well as the
> resourcemanager. I could not find a way to restart Hadoop so that
> these changes would take effect (the stop scripts do not seem to
> work). My workaround was to rebuild my environment, add these
> properties before creating any relations, and then create my
> relations.
>
> And then "Job job_1460456563820_0003 completed successfully!!!" and I
> finally have my PageRank values :)
>
> Perhaps if I were able to build Giraph for Hadoop YARN I would be able
> to submit my jobs as YARN applications without changes in the client,
> slave, and resourcemanager configuration.
> However, I believe that in order to execute MapReduce jobs, one has
> to at least set the 'mapreduce.jobtracker.address' property, as is
> also suggested in this blog post:
>
> http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/#mapreduce
>
> --Panagiotis Liakos
>
>
> 2016-04-11 16:30 GMT+03:00 Panagiotis Liakos <[email protected]>:
> > Hi all,
> >
> > I am trying to set up a cluster with juju in the local environment
> > to submit jobs with Apache Giraph. You can find the details of my
> > setup at the end of this e-mail.
> >
> > I have downloaded and built Apache Giraph on my hadoop-client and I
> > want to try some examples that execute on two workers.
> >
> > After a number of failed attempts I found out that I have to set the
> > property 'mapreduce.jobtracker.address' (or the deprecated
> > 'mapred.job.tracker') to 'yarn' in order to run Giraph with more
> > than one worker.
> >
> > In particular, Giraph considered this property to be set to 'local'.
> > At first I found that I could set a custom attribute with:
> >   -ca giraph.SplitMasterWorker=false
> > to execute my job with one worker.
> > Then, after finding the code responsible for this behavior
> > (https://github.com/apache/giraph/blob/7e48523b520afee8e727d1e1aaab801a3bd80f06/giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java#L143)
> > I was able to set the correct Hadoop property and execute my job
> > with 2 workers.
> >
> > My question is, why is this property not set in the juju client
> > charm? Does it enable some otherwise undesired behavior?
> > I see that 'mapreduce.framework.name' is set to 'yarn' but
> > apparently this is not enough for Giraph.
> >
> > Thank you.
> >
> > --Panagiotis Liakos
>
> --
> Juju mailing list
> [email protected]
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju

--
Konstantinos Tsakalozos
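The behavior the thread circles around can be sketched in plain Java. This is a simplified stand-in for the check in Giraph's checkLocalJobRunnerConfiguration, not Giraph's actual code: the Map below is a stub for Hadoop's Configuration class, and the class and method names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the check discussed in the thread: Giraph refuses to run
// with multiple workers when the (deprecated) job tracker address
// resolves to "local", which is the value Hadoop falls back to when
// mapreduce.jobtracker.address is unset. The Map stands in for
// Hadoop's Configuration object.
public class LocalJobRunnerCheck {
    static boolean allowsMultipleWorkers(Map<String, String> conf) {
        // "local" is the implicit default, which is why jobs run with
        // a single worker out of the box.
        String jobTrackerAddr =
                conf.getOrDefault("mapreduce.jobtracker.address", "local");
        return !"local".equalsIgnoreCase(jobTrackerAddr);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(allowsMultipleWorkers(conf)); // unset -> false

        conf.put("mapreduce.jobtracker.address", "yarn");
        System.out.println(allowsMultipleWorkers(conf)); // explicit -> true
    }
}
```

This also shows why any non-"local" value passes the check: the code only tests for "local", so "yarn" works even though it is not an address in the usual sense.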
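For reference, the workaround the thread converges on can be written as a mapred-site.xml fragment. This is a minimal sketch based on the values discussed above; whether it is appropriate for your deployment depends on your Hadoop version, since 'mapred.job.tracker' is the deprecated alias of the same property.

```xml
<!-- mapred-site.xml fragment: override the implicit "local" default
     so Giraph jobs can run with more than one worker. -->
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>yarn</value>
</property>
```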
