Cory Johns has proposed merging lp:~bigdata-dev/charms/trusty/apache-hadoop-hdfs-secondary/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-hdfs-secondary/trunk.
Requested reviews:
  Juju Big Data Development (bigdata-dev)

For more details, see:
https://code.launchpad.net/~bigdata-dev/charms/trusty/apache-hadoop-hdfs-secondary/readme/+merge/252619

New READMEs and minor relation cleanups

-- 
Your team Juju Big Data Development is requested to review the proposed merge of lp:~bigdata-dev/charms/trusty/apache-hadoop-hdfs-secondary/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-hdfs-secondary/trunk.
=== added file 'README.dev.md'
--- README.dev.md	1970-01-01 00:00:00 +0000
+++ README.dev.md	2015-03-11 17:16:47 +0000
@@ -0,0 +1,54 @@
+## Overview
+
+This charm provides checkpointing and transaction consolidation for an Apache
+Hadoop deployment, and is intended to be used only as a part of that deployment.
+This document describes how this charm connects to and interacts with the
+other components of the deployment.
+
+
+## Provided Relations
+
+*There are no provided relations for this charm.*
+
+
+## Required Relations
+
+### namenode (dfs)
+
+This relation connects this charm to the apache-hadoop-hdfs-master charm.
+It is a bi-directional interface; the following keys are exchanged:
+
+* Sent to hdfs-master:
+
+  *There are no keys sent to the hdfs-master*
+
+* Received from hdfs-master:
+
+  * `private-address`: Address of the HDFS master unit, to be used as the NameNode
+  * `ready`: A flag indicating that HDFS is ready to store data
+
+Ports will soon be added to this relation.
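+
+As a rough sketch of how these keys might be consumed, a `namenode`
+relation hook could read them with the standard Juju hook tools
+(illustrative only; the charm's actual hooks may structure this
+differently):
+
+    #!/bin/bash
+    # hooks/namenode-relation-changed (hypothetical example)
+    namenode=$(relation-get private-address)
+    ready=$(relation-get ready)
+    if [ -z "$namenode" ] || [ -z "$ready" ]; then
+        juju-log "Waiting for hdfs-master to report ready"
+        exit 0
+    fi
+    # Point the SecondaryNameNode at $namenode and restart services here.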
+
+
+## Manual Deployment
+
+The easiest way to deploy the core Apache Hadoop platform is to use one of
+the [apache-core-batch-processing-* bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
+However, to manually deploy the base Apache Hadoop platform without using one
+of the bundles, you can use the following:
+
+    juju deploy apache-hadoop-hdfs-master hdfs-master
+    juju deploy apache-hadoop-hdfs-secondary secondary-namenode
+    juju deploy apache-hadoop-yarn-master yarn-master
+    juju deploy apache-hadoop-compute-slave compute-slave -n3
+    juju deploy apache-hadoop-client client
+    juju add-relation yarn-master hdfs-master
+    juju add-relation secondary-namenode hdfs-master
+    juju add-relation compute-slave yarn-master
+    juju add-relation compute-slave hdfs-master
+    juju add-relation client yarn-master
+    juju add-relation client hdfs-master
+
+This will create a scalable deployment with separate nodes for each master,
+and a three-unit compute slave (NodeManager and DataNode) cluster. The master
+charms also support co-location using the `--to` option to `juju deploy` for
+denser deployments, as shown below.
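+
+For example (assuming `hdfs-master` landed on machine 1; check
+`juju status` and adjust the machine number to match your environment):
+
+    juju deploy apache-hadoop-hdfs-master hdfs-master
+    juju deploy apache-hadoop-yarn-master yarn-master --to 1
+    juju deploy apache-hadoop-hdfs-secondary secondary-namenode --to 1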
=== modified file 'README.md'
--- README.md	2015-03-04 01:42:08 +0000
+++ README.md	2015-03-11 17:16:47 +0000
@@ -1,117 +1,31 @@
 ## Overview
 
-This charm is a component of the Apache Hadoop platform. It is intended
-to be deployed with the other components using the bundle:
-`bundle:~bigdata-charmers/apache-hadoop`
-
-**What is Apache Hadoop?**
-
 The Apache Hadoop software library is a framework that allows for the
 distributed processing of large data sets across clusters of computers
 using a simple programming model.
 
-It is designed to scale up from single servers to thousands of machines,
-each offering local computation and storage. Rather than rely on hardware
-to deliver high-avaiability, the library itself is designed to detect
-and handle failures at the application layer, so delivering a
-highly-availabile service on top of a cluster of computers, each of
-which may be prone to failures.
-
-Apache Hadoop 2.4.1 consists of significant improvements over the previous stable
-release (hadoop-1.x).
-
-Here is a short overview of the improvments to both HDFS and MapReduce.
-
- - **HDFS Federation**
-   In order to scale the name service horizontally, federation uses multiple
-   independent Namenodes/Namespaces. The Namenodes are federated, that is, the
-   Namenodes are independent and don't require coordination with each other.
-   The datanodes are used as common storage for blocks by all the Namenodes.
-   Each datanode registers with all the Namenodes in the cluster. Datanodes
-   send periodic heartbeats and block reports and handles commands from the
-   Namenodes.
-
-   More details are available in the HDFS Federation document:
-   <http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/Federation.html>
-
- - **MapReduce NextGen aka YARN aka MRv2**
-   The new architecture introduced in hadoop-0.23, divides the two major functions of the
-   JobTracker: resource management and job life-cycle management into separate components.
-   The new ResourceManager manages the global assignment of compute resources to
-   applications and the per-application ApplicationMaster manages the application‚
-   scheduling and coordination.
-   An application is either a single job in the sense of classic MapReduce jobs or a DAG of
-   such jobs.
-
-   The ResourceManager and per-machine NodeManager daemon, which manages the user
-   processes on that machine, form the computation fabric.
-
-   The per-application ApplicationMaster is, in effect, a framework specific
-   library and is tasked with negotiating resources from the ResourceManager and
-   working with the NodeManager(s) to execute and monitor the tasks.
-
-   More details are available in the YARN document:
-   <http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html>
+This charm deploys a node running the SecondaryNameNode component of
+[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/), which
+periodically consolidates and checkpoints the work done by the
+NameNode component, for efficiency and improved recovery in the event
+of a NameNode failure.
 
 ## Usage
 
-This charm manages the HDFS Secondary NameNode.
-It is intended to be used with `apache-hadoop-hdfs-master`.
-
-### Dense Deployment: Single YARN / HDFS master deployment
-
-In this configuration, the YARN and HDFS master components run on the same
-machine. This is useful for lower-resource deployments::
-
-    juju deploy apache-hadoop-hdfs-master hdfs-master
-    juju deploy apache-hadoop-hdfs-secondary secondary-namenode --to 1
-    juju deploy apache-hadoop-yarn-master yarn-master --to 1
-    juju deploy apache-hadoop-compute-slave compute-slave
-    juju deploy apache-hadoop-client client
-    juju add-relation yarn-master hdfs-master
-    juju add-relation secondary-namenode hdfs-master
-    juju add-relation compute-slave yarn-master
-    juju add-relation compute-slave hdfs-master
-    juju add-relation client yarn-master
-    juju add-relation client hdfs-master
-
-Note that the machine number (`--to 1`) should match the machine number
-for the `hdfs-master` charm. If you previously deployed other services
-in your environment, you may need to adjust the machine number appropriately.
-
-
-### Scale Out Deployment: Separate HDFS, YARN, and compute nodes
-
-In this configuration the HDFS and YARN deployments operate on
-different service units as separate services::
-
-    juju deploy apache-hadoop-hdfs-master hdfs-master
-    juju deploy apache-hadoop-hdfs-secondary secondary-namenode
-    juju deploy apache-hadoop-yarn-master yarn-master
-    juju deploy apache-hadoop-compute-slave compute-slave -n 3
-    juju deploy apache-hadoop-client client
-    juju add-relation yarn-master hdfs-master
-    juju add-relation secondary-namenode hdfs-master
-    juju add-relation compute-slave yarn-master
-    juju add-relation compute-slave hdfs-master
-    juju add-relation client yarn-master
-    juju add-relation client hdfs-master
-
-The `-n 3` option can be adjusted according to the number of compute nodes
-you need. You can also add additional compute nodes later::
-
-    juju add-unit compute-slave -n 2
-
-
-### To deploy a Hadoop service with elasticsearch service::
-    # deploy ElasticSearch locally:
-    **juju deploy elasticsearch elasticsearch**
-    # elasticsearch-hadoop.jar file will be added to LIBJARS path
-    # Recommanded to use hadoop -libjars option to included elk jar file
-    **juju add-unit -n elasticsearch**
-    # deploy hive service by any senarios mentioned above
-    # associate Hive with elasticsearch
-    **juju add-relation {hadoop master}:elasticsearch elasticsearch:client**
+This charm is intended to be deployed via one of the
+[bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
+For example:
+
+    juju quickstart u/bigdata-dev/apache-analytics-sql
+
+This will deploy the Apache Hadoop platform with Apache Hive available to
+perform SQL-like queries against your data.
+
+You can also manually load and run map-reduce jobs via the client:
+
+    juju scp my-job.jar client/0:
+    juju ssh client/0
+    hadoop jar my-job.jar
 
 
 ## Deploying in Network-Restricted Environments
@@ -120,12 +34,14 @@
 access. To deploy in this environment, you will need a local mirror to serve
 the packages and resources required by these charms.
 
+
 ### Mirroring Packages
 
 You can setup a local mirror for apt packages using squid-deb-proxy.
 For instructions on configuring juju to use this, see the
 [Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html).
 
+
 ### Mirroring Resources
 
 In addition to apt packages, the Apache Hadoop charms require a few binary
@@ -134,7 +50,7 @@
 of these resources:
 
     sudo pip install jujuresources
-    juju resources fetch --all apache-hadoop-hdfs-secondary/resources.yaml -d /tmp/resources
+    juju resources fetch --all apache-hadoop-compute-slave/resources.yaml -d /tmp/resources
    juju resources serve -d /tmp/resources
 
 This will fetch all of the resources needed by this charm and serve them via a
@@ -143,14 +59,19 @@
 You can fetch the resources for all of the Apache Hadoop charms
 (`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`,
-`apache-hadoop-compute-slave`, `apache-hadoop-client`, etc) into a single
+`apache-hadoop-hdfs-secondary`, `apache-hadoop-client`, etc) into a single
 directory and serve them all with a single `juju resources serve` instance.
 
 
 ## Contact Information
 
-amir sanjar <[email protected]>
+
+* Amir Sanjar <[email protected]>
+* Cory Johns <[email protected]>
+* Kevin Monroe <[email protected]>
+
 
 ## Hadoop
 
+
 - [Apache Hadoop](http://hadoop.apache.org/) home page
 - [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
 - [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)

=== modified file 'resources.yaml'
--- resources.yaml	2015-03-06 22:41:47 +0000
+++ resources.yaml	2015-03-11 17:16:47 +0000
@@ -8,8 +8,8 @@
   six:
     pypi: six
   charmhelpers:
-    pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150306222841-0ibvrfaungfdtkn8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
-    hash: 787fc1cc70fc89e653b08dd192ec702d844709e450ff67577b7c5e99b6bbf39b
+    pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
+    hash: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c
     hash_type: sha256
 optional_resources:
   hadoop-aarch64:
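The updated charmhelpers resource can be sanity-checked against the new
sha256 hash above before serving it from a local mirror (a minimal
sketch, assuming curl and coreutils are available):

    curl -L -o charmhelpers-0.2.2.tar.gz 'http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz'
    sha256sum charmhelpers-0.2.2.tar.gz
    # expected: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c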

