Yingyi Bu has uploaded a new change for review. https://asterix-gerrit.ics.uci.edu/1576
Change subject: Add documentation for Ansible and AWS installation options.
......................................................................

Add documentation for Ansible and AWS installation options.

Change-Id: I0036823392ab6dde8bddbce8b141aaf166f4e3ca
---
M README.md
A asterixdb/asterix-doc/src/site/markdown/ansible.md
A asterixdb/asterix-doc/src/site/markdown/aws.md
M asterixdb/asterix-doc/src/site/markdown/index.md
M asterixdb/asterix-doc/src/site/markdown/ncservice.md
M asterixdb/asterix-doc/src/site/site.xml
6 files changed, 350 insertions(+), 34 deletions(-)

  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/76/1576/1

diff --git a/README.md b/README.md
index 792e355..105286d 100644
--- a/README.md
+++ b/README.md
@@ -22,28 +22,28 @@
 AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data storage and analysis.
 AsterixDB has:

-- __Data model__</br>
+- __Data model__<br/>
 A semistructured NoSQL style data model (ADM) resulting from extending JSON with object database ideas

-- __Query languages__</br>
+- __Query languages__<br/>
 Two expressive and declarative query languages (SQL++ and AQL) that support a broad range of queries and analysis over semistructured data

-- __Scalability__</br>
+- __Scalability__<br/>
 A parallel runtime query execution engine, Apache Hyracks, that has been scale-tested on up to 1000+ cores and 500+ disks

-- __Native storage__</br>
+- __Native storage__<br/>
 Partitioned LSM-based data storage and indexing to support efficient ingestion and management of semistructured data

-- __External storage__</br>
+- __External storage__<br/>
 Support for query access to externally stored data (e.g., data in HDFS) as well as to data stored natively by AsterixDB

-- __Data types__</br>
+- __Data types__<br/>
 A rich set of primitive data types, including spatial and temporal data in addition to integer, floating point, and textual data

-- __Indexing__</br>
+- __Indexing__<br/>
 Secondary indexing options that include B+ trees, R trees, and inverted keyword (exact and fuzzy) index types

-- __Transactions__</br>
+- __Transactions__<br/>
 Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store

 Learn more about AsterixDB at its [website](http://asterixdb.apache.org).
diff --git a/asterixdb/asterix-doc/src/site/markdown/ansible.md b/asterixdb/asterix-doc/src/site/markdown/ansible.md
new file mode 100644
index 0000000..16d8973
--- /dev/null
+++ b/asterixdb/asterix-doc/src/site/markdown/ansible.md
@@ -0,0 +1,136 @@
+<!--
+ ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ !   http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ !-->
+
+## <a id="toc">Table of Contents</a> ##
+
+* [Introduction](#Introduction)
+* [Prerequisites](#Prerequisites)
+* [Configuration and parameters](#config)
+* [Manage the lifecycle of your instance](#lifecycle)
+
+## <a id="Introduction">Introduction</a>
+This installation option wraps the basic, low-level installation binaries described in the [NCService
+installation option](ncservice.html), and provides several simple scripts to deploy, start, stop,
+and erase an AsterixDB instance on a cluster without requiring users to interact with each individual
+node in the cluster.
+
+## <a id="Prerequisites">Prerequisites</a>
+ * Supported operating systems: **Linux** and **MacOS**
+
+ * Install pip on your client machine:
+
+        CentOS: sudo yum install python-pip
+        Ubuntu: sudo apt-get install python-pip
+        MacOS: brew install pip
+
+ * Install Ansible, boto, and boto3 on your client machine:
+
+        pip install ansible
+        pip install boto
+        pip install boto3
+
+   Make sure that the version of Ansible is no less than 2.2.1.0.
+
+ * Configure passwordless ssh from your current client that runs the scripts to all nodes listed in conf/inventory.
+
+ * Download a released [simple server package](http://asterixdb.apache.org/download.html).
+
+   Alternatively, you can follow the [instructions](https://github.com/apache/asterixdb#build-from-source) to
+   build from source.
+
+ * In the extracted directory from the `simple server package`, navigate to `opt/ansible/`
+
+        $cd opt/ansible
+
+   The following files and directories are in the directory `opt/ansible`:
+
+        README bin conf yaml
+
+   `bin` contains scripts that deploy, start, stop and erase an AsterixDB cluster instance, according to
+   the configuration specified in files under `conf/`. `yaml` contains internal Ansible scripts that the shell
+   scripts in `bin` use.
+
+## <a id="config">Configuration and parameters</a>
+ * **Parameters**. Edit the instance configuration file `conf/cc.conf` when necessary.
+   You can add/update whatever parameters in the **[common]** and **[nc]** sections (except IPs and ports).
+   For example:
+
+        [common]
+        log.level=INFO
+
+        [nc]
+        txn.log.dir=txnlog
+        iodevices=iodevice
+        command=asterixnc
+
+   More parameters and their usage can be found [here](ncservice.html#Parameters).
+   Note that with this installation option, all parameters in the **[cc]** section will use defaults and cannot be
+   changed.
+
+
+ * **Nodes and account**. Edit the inventory file `conf/inventory` when necessary.
+   You mostly only need to specify the node DNS names (or IPs) for the cluster controller, i.e., the master node,
+   in the **[cc]** section, and node controllers, i.e., slave nodes, in the **[ncs]** section.
+   The following example configures a local "cluster" that only has one slave node (localhost) and uses
+   localhost as the master node too.
+
+        [cc]
+        localhost
+
+        [ncs]
+        localhost
+
+   If the ssh user account for target machines is different from your current username, please uncomment
+   and edit the following two lines:
+
+        ;[all:vars]
+        ;ansible_ssh_user=<fill with your ssh account username>
+
+   If you want to specify advanced Ansible builtin variables, please refer to the following Ansible documentation:
+   http://docs.ansible.com/ansible/intro_inventory.html.
+
+ * **Remote working directories**. Edit `conf/instance_settings.yml` to change the instance binary directories
+   when necessary. By default, the binary directory will be under the home directory (as the value of
+   Ansible builtin variable ansible_env.HOME) of the ssh user account on each node.
+
+        # The parent directory for the working directory.
+        basedir: "{{ ansible_env.HOME }}"
+
+        # The working directory.
+        binarydir: "{{ basedir }}/{{ product }}"
+
+
+## <a id="lifecycle">Manage the lifecycle of your instance</a>
+ * Deploy the binary to all nodes:
+
+        bin/deploy.sh
+
+ * Launch your cluster instance:
+
+        bin/start.sh
+
+   Now you can use the cluster instance.
+
+ * If you want to stop the cluster instance, run the following script:
+
+        bin/stop.sh
+
+ * If you want to remove the binary on all nodes, run the following script:
+
+        bin/erase.sh
diff --git a/asterixdb/asterix-doc/src/site/markdown/aws.md b/asterixdb/asterix-doc/src/site/markdown/aws.md
new file mode 100644
index 0000000..18a51cb
--- /dev/null
+++ b/asterixdb/asterix-doc/src/site/markdown/aws.md
@@ -0,0 +1,170 @@
+<!--
+ ! Licensed to the Apache Software Foundation (ASF) under one
+ ! or more contributor license agreements. See the NOTICE file
+ ! distributed with this work for additional information
+ ! regarding copyright ownership. The ASF licenses this file
+ ! to you under the Apache License, Version 2.0 (the
+ ! "License"); you may not use this file except in compliance
+ ! with the License. You may obtain a copy of the License at
+ !
+ !   http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing,
+ ! software distributed under the License is distributed on an
+ ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ! KIND, either express or implied. See the License for the
+ ! specific language governing permissions and limitations
+ ! under the License.
+ !-->
+
+## <a id="toc">Table of Contents</a> ##
+
+* [Introduction](#Introduction)
+* [Prerequisites](#Prerequisites)
+* [Configuration](#config)
+* [Manage the lifecycle of your instance](#lifecycle)
+
+## <a id="Introduction">Introduction</a>
+ Note that you can always manually launch a number of Amazon Web Services EC2 instances and then run the
+ Ansible cluster installation scripts as described [here](ansible.html) separately to manage the
+ lifecycle of an AsterixDB instance on those EC2 instances.
+
+ However, via this installation option, we provide a combo solution for automating both AWS EC2
+ and AsterixDB, where you can run only one script to start/stop an AsterixDB instance on AWS.
+
+## <a id="Prerequisites">Prerequisites</a>
+ * Supported operating systems for the client: **Linux** and **MacOS**
+
+ * Supported operating systems for Amazon Web Services instances: **Linux**
+
+ * Install pip on your client machine:
+
+        CentOS: sudo yum install python-pip
+        Ubuntu: sudo apt-get install python-pip
+        MacOS: brew install pip
+
+ * Install Ansible, boto, and boto3 on your client machine:
+
+        pip install ansible
+        pip install boto
+        pip install boto3
+
+   Make sure that the version of Ansible is no less than 2.2.1.0.
+
+ * Download a released [simple server package](http://asterixdb.apache.org/download.html).
+
+   Alternatively, you can follow the [instructions](https://github.com/apache/asterixdb#build-from-source) to
+   build from source.
+
+ * In the extracted directory from the `simple server package`, navigate to `opt/aws/`
+
+        $cd opt/aws
+
+   The following files and directories are in the directory `opt/aws`:
+
+        README bin conf yaml
+
+   `bin` contains scripts that start and terminate an AWS-based cluster instance, according to the configuration
+   specified in files under `conf/`. `yaml` contains internal Ansible scripts that the shell scripts in `bin` use.
+
+ * Create an AWS account and an IAM user.
+
+   Set up a security group that you'd like to use for your AWS cluster.
+   **The security group should at least allow all TCP connections from anywhere.**
+   Fill `group` in `conf/aws_settings.yml` by the name of the security group.
+
+ * Retrieve your AWS EC2 key pair name and fill that for `keypair` in `conf/aws_settings.yml`;
+
+   retrieve your AWS IAM `access key ID` and fill that for `access_key_id` in `conf/aws_settings.yml`;
+
+   retrieve your AWS IAM `secret access key` and fill that for `secret_access_key` in `conf/aws_settings.yml`.
+
+   Note that you can only read or download `access key ID` and `secret access key` once from your AWS console.
+   If you forget them, you have to create new keys again and delete the old ones.
+
+ * Configure your ssh setting by editing `~/.ssh/config` and adding the following entry:
+
+        Host *.amazonaws.com
+        IdentityFile <path_of_private_key>
+
+   Note that \<path_of_private_key\> should be replaced by the path to the file that stores the private key for the
+   key pair that you uploaded to AWS and used in `conf/aws_settings.yml`. For example:
+
+        Host *.amazonaws.com
+        IdentityFile ~/.ssh/id_rsa
+
+### <a id="config">Configuration</a>
+ * **AWS settings**. Edit conf/aws_settings.yml. The meaning of each parameter is listed as follows:
+
+        # The OS image id for ec2 instances.
+        image: ami-76fa4116
+
+        # The data center region for ec2 instances.
+        region: us-west-2
+
+        # The tag for each ec2 machine.
+        tag: scale_test
+
+        # The name of a security group that appears in your AWS console.
+        group: default
+
+        # The name of a key pair that appears in your AWS console.
+        keypair: <to be filled>
+
+        # The AWS access key id for your IAM user.
+        access_key_id: <to be filled>
+
+        # The AWS secret key for your IAM user.
+        secret_access_key: <to be filled>
+
+        # The AWS instance type. A full list of available types is listed at:
+        # https://aws.amazon.com/ec2/instance-types/
+        instance_type: t2.micro
+
+        # The number of ec2 instances that construct a cluster.
+        count: 3
+
+        # The user name.
+        user: ec2-user
+
+        # Whether to reuse one nc machine to host cc.
+        cc_on_nc: false
+
+   **As described in [prerequisites](#Prerequisites), the following parameters must be customized correctly:**
+
+        # The name of a security group that appears in your AWS console.
+        group: default
+
+        # The name of a key pair that appears in your AWS console.
+        keypair: <to be filled>
+
+        # The AWS access key id for your IAM user.
+        access_key_id: <to be filled>
+
+        # The AWS secret key for your IAM user.
+        secret_access_key: <to be filled>
+
+ * **Remote working directories**. Edit conf/instance_settings.yml to change the instance binary directories
+   when necessary. By default, the binary directory will be under the home directory (as the value of
+   Ansible builtin variable ansible_env.HOME) of the ssh user account on each node.
+
+        # The parent directory for the working directory.
+        basedir: "{{ ansible_env.HOME }}"
+
+        # The working directory.
+        binarydir: "{{ basedir }}/{{ product }}"
+
+
+### <a id="lifecycle">Manage the lifecycle of your instance</a>
+ * Start an AWS-based cluster:
+
+        bin/start.sh
+
+   Now you can use the cluster instance through the public IP or DNS name of the master node.
+
+ * If you want to stop the cluster instance, run the following script:
+
+        bin/stop.sh
+
+   Note that it will destroy everything in the cluster instance you installed and terminate all AWS nodes
+   for the cluster.
diff --git a/asterixdb/asterix-doc/src/site/markdown/index.md b/asterixdb/asterix-doc/src/site/markdown/index.md
index bc7ca9c..303678d 100644
--- a/asterixdb/asterix-doc/src/site/markdown/index.md
+++ b/asterixdb/asterix-doc/src/site/markdown/index.md
@@ -19,26 +19,34 @@

 # AsterixDB #

-AsterixDB is a BDMS (Big Data Management System) with a rich feature set that
-sets it apart from other Big Data platforms.
-Its feature set makes it well-suited to modern needs such as web data
-warehousing and social data storage and analysis. AsterixDB has:
+AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data
+platforms. Its feature set makes it well-suited to modern needs such as web data warehousing and social data
+storage and analysis. AsterixDB has:

- * A semistructured NoSQL style data model (ADM) resulting from extending JSON
- with object database ideas
- * Two expressive and declarative query languages (SQL++ and AQL) that support a broad
- range of queries and analysis over semistructured data
- * A parallel runtime query execution engine, Apache Hyracks, that has been
- scale-tested on up to 1000+ cores and 500+ disks
- * Partitioned LSM-based data storage and indexing to support efficient
- ingestion and management of semistructured data
- * Support for query access to externally stored data (e.g., data in HDFS) as
- well as to data stored natively by AsterixDB
- * A rich set of primitive data types, including spatial and temporal data in
- addition to integer, floating point, and textual data
- * Secondary indexing options that include B+ trees, R trees, and inverted
- keyword (exact and fuzzy) index types
- * Support for fuzzy and spatial queries as well as for more traditional
- parametric queries
- * Basic transactional (concurrency and recovery) capabilities akin to those of
- a NoSQL store
+- __Data model__<br/>
+A semistructured NoSQL style data model ([ADM](datamodel.html)) resulting from extending JSON with object database ideas
+
+- __Query languages__<br/>
+Two expressive and declarative query languages ([SQL++](sqlpp/manual.html) and [AQL](aql/manual.html)) that
+support a broad range of queries and analysis over semistructured data
+
+- __Scalability__<br/>
+A parallel runtime query execution engine, Apache Hyracks, that has been scale-tested on up to 1000+ cores and
+500+ disks
+
+- __Native storage__<br/>
+Partitioned LSM-based data storage and indexing to support efficient ingestion and management of semistructured data
+
+- __External storage__<br/>
+Support for query access to externally stored data (e.g., data in HDFS) as well as to data stored natively by AsterixDB
+
+- __Data types__<br/>
+A rich set of primitive data types, including spatial and temporal data in addition to integer, floating point,
+and textual data
+
+- __Indexing__<br/>
+Secondary indexing options that include B+ trees, R trees, and inverted keyword (exact and fuzzy) index types
+
+- __Transactions__<br/>
+Basic transactional (concurrency and recovery) capabilities akin to those of a NoSQL store
+
diff --git a/asterixdb/asterix-doc/src/site/markdown/ncservice.md b/asterixdb/asterix-doc/src/site/markdown/ncservice.md
index 243d908..ad36a58 100644
--- a/asterixdb/asterix-doc/src/site/markdown/ncservice.md
+++ b/asterixdb/asterix-doc/src/site/markdown/ncservice.md
@@ -38,7 +38,7 @@
 This folder should contain 4 scripts, two pairs of `.sh` and `.bat` files respectively.
 `start-sample-cluster.sh` will simply start a basic sample cluster
-using the coniguration files located in `samples/local/conf/`.
+using the configuration files located in `samples/local/conf/`.

     user@localhost:~/a/o/l/bin $./start-sample-cluster.sh

@@ -374,7 +374,7 @@
 | common | txn.log.partitionsize | N/A | 268435456 (256 MB) |


-# For the optional NCService process configuration file, the following parameters, under "[ncservice]" section.
+For the optional NCService process configuration file, the following parameters, under "[ncservice]" section.

 | Parameter | Meaning | Default |
 |----------|--------|-------|
diff --git a/asterixdb/asterix-doc/src/site/site.xml b/asterixdb/asterix-doc/src/site/site.xml
index 99c87e5..3e768bf 100644
--- a/asterixdb/asterix-doc/src/site/site.xml
+++ b/asterixdb/asterix-doc/src/site/site.xml
@@ -75,8 +75,10 @@
     <menu name="Get Started - Installation">
       <item name="Option 1: using NCService" href="ncservice.html"/>
-      <item name="Option 2: using Managix" href="install.html"/>
-      <item name="Option 3: using YARN" href="yarn.html"/>
+      <item name="Option 2: using Ansible" href="ansible.html"/>
+      <item name="Option 3: using Amazon Web Services" href="aws.html"/>
+      <item name="Option 4: using YARN" href="yarn.html"/>
+      <item name="Option 5: using Managix (deprecated)" href="install.html"/>
     </menu>

     <menu name = "AsterixDB Primer">

--
To view, visit https://asterix-gerrit.ics.uci.edu/1576
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0036823392ab6dde8bddbce8b141aaf166f4e3ca
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Yingyi Bu <[email protected]>
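
Illustrative note, not part of this change: the new aws.md says that `group`, `keypair`, `access_key_id`, and `secret_access_key` in `conf/aws_settings.yml` must be customized before running `bin/start.sh`. A pre-flight check along the following lines could catch settings that still carry the `<to be filled>` placeholder. This is only a sketch under stated assumptions: the function names are hypothetical, nothing like it exists in the patch, and it parses only flat `key: value` lines rather than full YAML.

```python
# Hypothetical pre-flight check; not part of the AsterixDB scripts in this change.
# Assumption: the settings file uses only flat "key: value" lines, as in the
# conf/aws_settings.yml listing shown in aws.md, so no YAML library is needed.

PLACEHOLDER = "<to be filled>"
REQUIRED_KEYS = ("group", "keypair", "access_key_id", "secret_access_key")


def parse_flat_settings(text):
    """Collect 'key: value' pairs, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def unfilled_keys(text):
    """Return the required keys that are missing, empty, or still placeholders."""
    settings = parse_flat_settings(text)
    missing = []
    for key in REQUIRED_KEYS:
        value = settings.get(key, "")
        if value == "" or value == PLACEHOLDER:
            missing.append(key)
    return missing


# Sample text modeled on the listing in aws.md.
SAMPLE = """\
# The name of a security group that appears in your AWS console.
group: default

# The name of a key pair that appears in your AWS console.
keypair: <to be filled>

# The AWS access key id for your IAM user.
access_key_id: <to be filled>

# The AWS secret key for your IAM user.
secret_access_key: <to be filled>
"""

if __name__ == "__main__":
    print(unfilled_keys(SAMPLE))
```

Run against the sample above, the check reports the three keys that still hold the placeholder. A real version would read the file from `conf/aws_settings.yml` and would probably use a proper YAML parser.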
