http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/deployment/deployment-resource-isolation.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-resource-isolation.md b/docs/docs/deployment/deployment-resource-isolation.md deleted file mode 100644 index ee47802..0000000 --- a/docs/docs/deployment/deployment-resource-isolation.md +++ /dev/null @@ -1,112 +0,0 @@ -CGroup (abbreviated from control groups) is a Linux kernel feature to limit, account, and isolate resource usage (CPU, memory, disk I/O, etc.) of process groups.In Gearpump, we use cgroup to manage CPU resources. - -## Start CGroup Service - -CGroup feature is only supported by Linux whose kernel version is larger than 2.6.18. Please also make sure the SELinux is disabled before start CGroup. - -The following steps are supposed to be executed by root user. - -1. Check `/etc/cgconfig.conf` exist or not. If not exists, please `yum install libcgroup`. - -2. Run following command to see whether the **cpu** subsystem is already mounted to the file system. - - :::bash - lssubsys -m - - Each subsystem in CGroup will have a corresponding mount file path in local file system. For example, the following output shows that **cpu** subsystem is mounted to file path `/sys/fs/cgroup/cpu` - - :::bash - cpu /sys/fs/cgroup/cpu - net_cls /sys/fs/cgroup/net_cls - blkio /sys/fs/cgroup/blkio - perf_event /sys/fs/cgroup/perf_event - - -3. If you want to assign permission to user **gear** to launch Gearpump Worker and applications with resource isolation enabled, you need to check gear's uid and gid in `/etc/passwd` file, let's take **500** for example. - -4. Add following content to `/etc/cgconfig.conf` - - - # The mount point of cpu subsystem. - # If your system already mounted it, this segment should be eliminated. - mount { - cpu = /cgroup/cpu; - } - - # Here the group name "gearpump" represents a node in CGroup's hierarchy tree. - # When the CGroup service is started, there will be a folder generated under the mount point of cpu subsystem, - # whose name is "gearpump". - - group gearpump { - perm { - task { - uid = 500; - gid = 500; - } - admin { - uid = 500; - gid = 500; - } - } - cpu { - } - } - - - Please note that if the output of step 2 shows that **cpu** subsystem is already mounted, then the `mount` segment should not be included. - -4. Then Start cgroup service - - :::bash - sudo service cgconfig restart - - -5. There should be a folder **gearpump** generated under the mount point of cpu subsystem and its owner is **gear:gear**. - -6. Repeat the above-mentioned steps on each machine where you want to launch Gearpump. - -## Enable Cgroups in Gearpump -1. Login into the machine which has CGroup prepared with user **gear**. - - :::bash - ssh gear@node - - -2. Enter into Gearpump's home folder, edit gear.conf under folder `${GEARPUMP_HOME}/conf/` - - :::bash - gearpump.worker.executor-process-launcher = "org.apache.gearpump.cluster.worker.CGroupProcessLauncher" - - gearpump.cgroup.root = "gearpump" - - - Please note the gearpump.cgroup.root **gearpump** must be consistent with the group name in /etc/cgconfig.conf. - -3. Repeat the above-mentioned steps on each machine where you want to launch Gearpump - -4. Start the Gearpump cluster, please refer to [Deploy Gearpump in Standalone Mode](deployment-standalone) - -## Launch Application From Command Line -1. Login into the machine which has Gearpump distribution. - -2. Enter into Gearpump's home folder, edit gear.conf under folder `${GEARPUMP_HOME}/conf/` - - :::bash - gearpump.cgroup.cpu-core-limit-per-executor = ${your_preferred_int_num} - - - Here the configuration is the number of CPU cores per executor can use and -1 means no limitation - -3. Submit application - - :::bash - bin/gear app -jar examples/sol-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}-assembly.jar -streamProducer 10 -streamProcessor 10 - - -4. Then you can run command `top` to monitor the cpu usage. - -## Launch Application From Dashboard -If you want to submit the application from dashboard, by default the `gearpump.cgroup.cpu-core-limit-per-executor` is inherited from Worker's configuration. You can provide your own conf file to override it. - -## Limitations -Windows and Mac OS X don't support CGroup, so the resource isolation will not work even if you turn it on. There will not be any limitation for single executor's cpu usage.
http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/deployment/deployment-security.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-security.md b/docs/docs/deployment/deployment-security.md deleted file mode 100644 index e20fc67..0000000 --- a/docs/docs/deployment/deployment-security.md +++ /dev/null @@ -1,80 +0,0 @@ -Until now Gearpump supports deployment in a secured Yarn cluster and writing to secured HBase, where "secured" means Kerberos enabled. -Further security related feature is in progress. - -## How to launch Gearpump in a secured Yarn cluster -Suppose user `gear` will launch gearpump on YARN, then the corresponding principal `gear` should be created in KDC server. - -1. Create Kerberos principal for user `gear`, on the KDC machine - - :::bash - sudo kadmin.local - - In the kadmin.local or kadmin shell, create the principal - - :::bash - kadmin: addprinc gear/[email protected] - - Remember that user `gear` must exist on every node of Yarn. - -2. Upload the gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip to remote HDFS Folder, suggest to put it under `/usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip` - -3. Create HDFS folder /user/gear/, make sure all read-write rights are granted for user `gear` - - :::bash - drwxr-xr-x - gear gear 0 2015-11-27 14:03 /user/gear - - -4. Put the YARN configurations under classpath. - Before calling `yarnclient launch`, make sure you have put all yarn configuration files under classpath. Typically, you can just copy all files under `$HADOOP_HOME/etc/hadoop` from one of the YARN cluster machine to `conf/yarnconf` of gearpump. `$HADOOP_HOME` points to the Hadoop installation directory. - -5. Get Kerberos credentials to submit the job: - - :::bash - kinit gearpump/[email protected] - - - Here you can login with keytab or password. Please refer Kerberos's document for details. - - :::bash - yarnclient launch -package /usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip - - -## How to write to secured HBase -When the remote HBase is security enabled, a kerberos keytab and the corresponding principal name need to be -provided for the gearpump-hbase connector. Specifically, the `UserConfig` object passed into the HBaseSink should contain -`{("gearpump.keytab.file", "\\$keytab"), ("gearpump.kerberos.principal", "\\$principal")}`. example code of writing to secured HBase: - - :::scala - val principal = "gearpump/[email protected]" - val keytabContent = Files.toByteArray(new File("path_to_keytab_file")) - val appConfig = UserConfig.empty - .withString("gearpump.kerberos.principal", principal) - .withBytes("gearpump.keytab.file", keytabContent) - val sink = new HBaseSink(appConfig, "$tableName") - val sinkProcessor = DataSinkProcessor(sink, "$sinkNum") - val split = Processor[Split]("$splitNum") - val computation = split ~> sinkProcessor - val application = StreamApplication("HBase", Graph(computation), UserConfig.empty) - - -Note here the keytab file set into config should be a byte array. - -## Future Plan - -### More external components support -1. HDFS -2. Kafka - -### Authentication(Kerberos) -Since Gearpumpâs Master-Worker structure is similar to HDFSâs NameNode-DataNode and Yarnâs ResourceManager-NodeManager, we may follow the way they use. - -1. User creates kerberos principal and keytab for Gearpump. -2. Deploy the keytab files to all the cluster nodes. -3. Configure Gearpumpâs conf file, specify kerberos principal and local keytab file location. -4. Start Master and Worker. - -Every application has a submitter/user. We will separate the application from different users, like different log folders for different applications. -Only authenticated users can submit the application to Gearpump's Master. - -### Authorization -Hopefully more on this soon http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/deployment/deployment-standalone.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-standalone.md b/docs/docs/deployment/deployment-standalone.md deleted file mode 100644 index c9d5549..0000000 --- a/docs/docs/deployment/deployment-standalone.md +++ /dev/null @@ -1,59 +0,0 @@ -Standalone mode is a distributed cluster mode. That is, Gearpump runs as service without the help from other services (e.g. YARN). - -To deploy Gearpump in cluster mode, please first check that the [Pre-requisites](hardware-requirement) are met. - -### How to Install -You need to have Gearpump binary at hand. Please refer to [How to get gearpump distribution](get-gearpump-distribution) to get the Gearpump binary. - -You are suggested to unzip the package to same directory path on every machine you planned to install Gearpump. -To install Gearpump, you at least need to change the configuration in `conf/gear.conf`. - -Config | Default value | Description ------------- | ---------------|------------ -gearpump.hostname | "127.0.0.1" | Host or IP address of current machine. The ip/host need to be reachable from other machines in the cluster. -gearpump.cluster.masters | ["127.0.0.1:3000"] | List of all master nodes, with each item represents host and port of one master. -gearpump.worker.slots | 1000 | how many slots this worker has - -Besides this, there are other optional configurations related with logs, metrics, transports, ui. You can refer to [Configuration Guide](deployment-configuration) for more details. - -### Start the Cluster Daemons in Standlone mode -In Standalone mode, you can start master and worker in different JVMs. - -##### To start master: - - :::bash - bin/master -ip xx -port xx - -The ip and port will be checked against settings under `conf/gear.conf`, so you need to make sure they are consistent. - -**NOTE:** You may need to execute `chmod +x bin/*` in shell to make the script file `master` executable. - -**NOTE**: for high availability, please check [Master HA Guide](deployment-ha) - -##### To start worker: - - :::bash - bin/worker - -### Start UI - - :::bash - bin/services - - -After UI is started, you can browse to `http://{web_ui_host}:8090` to view the cluster status. -The default username and password is "admin:admin", you can check -[UI Authentication](deployment-ui-authentication) to find how to manage users. - - - -**NOTE:** The UI port can be configured in `gear.conf`. Check [Configuration Guide](deployment-configuration) for information. - -### Bash tool to start cluster - -There is a bash tool `bin/start-cluster.sh` can launch the cluster conveniently. You need to change the file `conf/masters`, `conf/workers` and `conf/dashboard` to specify the corresponding machines. -Before running the bash tool, please make sure the Gearpump package is already unzipped to the same directory path on every machine. -`bin/stop-cluster.sh` is used to stop the whole cluster of course. - -The bash tool is able to launch the cluster without changing the `conf/gear.conf` on every machine. The bash sets the `gearpump.cluster.masters` and other configurations using JAVA_OPTS. -However, please note when you log into any these unconfigured machine and try to launch the dashboard or submit the application, you still need to modify `conf/gear.conf` manually because the JAVA_OPTS is missing. http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/deployment/deployment-ui-authentication.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-ui-authentication.md b/docs/docs/deployment/deployment-ui-authentication.md deleted file mode 100644 index 0990192..0000000 --- a/docs/docs/deployment/deployment-ui-authentication.md +++ /dev/null @@ -1,290 +0,0 @@ -## What is this about? - -## How to enable UI authentication? - -1. Change config file gear.conf, find entry `gearpump-ui.gearpump.ui-security.authentication-enabled`, change the value to true - - :::bash - gearpump-ui.gearpump.ui-security.authentication-enabled = true - - - Restart the UI dashboard, and then the UI authentication is enabled. It will prompt for user name and password. - -## How many authentication methods Gearpump UI server support? - -Currently, It supports: - -1. Username-Password based authentication and -2. OAuth2 based authentication. - -User-Password based authentication is enabled when `gearpump-ui.gearpump.ui-security.authentication-enabled`, - and **CANNOT** be disabled. - -UI server admin can also choose to enable **auxiliary** OAuth2 authentication channel. - -## User-Password based authentication - - User-Password based authentication covers all authentication scenarios which requires - user to enter an explicit username and password. - - Gearpump provides a built-in ConfigFileBasedAuthenticator which verify user name and password - against password hashcode stored in config files. - - However, developer can choose to extends the `org.apache.gearpump.security.Authenticator` to provide a custom - User-Password based authenticator, to support LDAP, Kerberos, and Database-based authentication... - -### ConfigFileBasedAuthenticator: built-in User-Password Authenticator - -ConfigFileBasedAuthenticator store all user name and password hashcode in configuration file gear.conf. Here -is the steps to configure ConfigFileBasedAuthenticator. - -#### How to add or remove user? - -For the default authentication plugin, it has three categories of users: admins, users, and guests. - -* admins: have unlimited permission, like shutdown a cluster, add/remove machines. -* users: have limited permission to submit an application and etc.. -* guests: can not submit/kill applications, but can view the application status. - -System administrator can add or remove user by updating config file `conf/gear.conf`. - -Suppose we want to add user jerry as an administrator, here are the steps: - -1. Pick a password, and generate the digest for this password. Suppose we use password `ilovegearpump`, - to generate the digest: - - :::bash - bin/gear org.apache.gearpump.security.PasswordUtil -password ilovegearpump - - - It will generate a digest value like this: - - :::bash - CgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA== - - -2. Change config file conf/gear.conf at path `gearpump-ui.gearpump.ui-security.config-file-based-authenticator.admins`, - add user `jerry` in this list: - - :::bash - admins = { - ## Default Admin. Username: admin, password: admin - ## !!! Please replace this builtin account for production cluster for security reason. !!! - "admin" = "AeGxGOxlU8QENdOXejCeLxy+isrCv0TrS37HwA==" - "jerry" = "CgGxGOxlU8ggNdOXejCeLxy+isrCv0TrS37HwA==" - } - - -3. Restart the UI dashboard by `bin/services` to make the change effective. - -4. Group "admins" have very unlimited permission, you may want to restrict the permission. In that case - you can modify `gearpump-ui.gearpump.ui-security.config-file-based-authenticator.users` or - `gearpump-ui.gearpump.ui-security.config-file-based-authenticator.guests`. - -5. See description at `conf/gear.conf` to find more information. - -#### What is the default user and password? - -For ConfigFileBasedAuthenticator, Gearpump distribution is shipped with two default users: - -1. username: admin, password: admin -2. username: guest, password: guest - -User `admin` has unlimited permissions, while `guest` can only view the application status. - -For security reason, you need to remove the default users `admin` and `guest` for cluster in production. - -#### Is this secure? - -Firstly, we will NOT store any user password in any way so only the user himself knows the password. -We will use one-way hash digest to verify the user input password. - -### How to develop a custom User-Password Authenticator for LDAP, Database, and etc.. - -If developer choose to define his/her own User-Password based authenticator, it is required that user - modify configuration option: - - :::bash - ## Replace "org.apache.gearpump.security.CustomAuthenticator" with your real authenticator class. - gearpump.ui-security.authenticator = "org.apache.gearpump.security.CustomAuthenticator" - - -Make sure CustomAuthenticator extends interface: - - :::scala - trait Authenticator { - - def authenticate(user: String, password: String, ec: ExecutionContext): Future[AuthenticationResult] - } - - -## OAuth2 based authentication - -OAuth2 based authentication is commonly use to achieve social login with social network account. - -Gearpump provides generic OAuth2 Authentication support which allow user to extend to support new authentication sources. - -Basically, OAuth2 based Authentication contains these steps: - 1. User accesses Gearpump UI website, and choose to login with OAuth2 server. - 2. Gearpump UI website redirects user to OAuth2 server domain authorization endpoint. - 3. End user complete the authorization in the domain of OAuth2 server. - 4. OAuth2 server redirects user back to Gearpump UI server. - 5. Gearpump UI server verify the tokens and extract credentials from query - parameters and form fields. - -### Terminologies - -For terms like client Id, and client secret, please refers to guide [RFC 6749](https://tools.ietf.org/html/rfc6749) - -### Enable web proxy for UI server - -To enable OAuth2 authentication, the Gearpump UI server should have network access to OAuth2 server, as - some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make - sure you have configured the proxy properly for UI server. - -#### If you are on Windows - - :::bash - set JAVA_OPTS=-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088 - bin/services - - -#### If you are on Linux - - :::bash - export JAVA_OPTS="-Dhttp.proxyHost=xx.com -Dhttp.proxyPort=8088 -Dhttps.proxyHost=xx.com -Dhttps.proxyPort=8088" - bin/services - - -### Google Plus OAuth2 Authenticator - -Google Plus OAuth2 Authenticator does authentication with Google OAuth2 service. It extracts the email address -from Google user profile as credentials. - -To use Google OAuth2 Authenticator, there are several steps: - -1. Register your application (Gearpump UI server here) as an application to Google developer console. -2. Configure the Google OAuth2 information in gear.conf -3. Configure network proxy for Gearpump UI server if applies. - -#### Step1: Register your website as an OAuth2 Application on Google - -1. Create an application representing your website at [https://console.developers.google.com](https://console.developers.google.com) -2. In "API Manager" of your created application, enable API "Google+ API" -3. Create OAuth client ID for this application. In "Credentials" tab of "API Manager", -choose "Create credentials", and then select OAuth client ID. Follow the wizard -to set callback URL, and generate client ID, and client Secret. - -**NOTE:** Callback URL is NOT optional. - -#### Step2: Configure the OAuth2 information in gear.conf - -1. Enable OAuth2 authentication by setting `gearpump.ui-security.oauth2-authenticator-enabled` -as true. -2. Configure section `gearpump.ui-security.oauth2-authenticators.google` in gear.conf. Please make sure -class name, client ID, client Secret, and callback URL are set properly. - -**NOTE:** Callback URL set here should match what is configured on Google in step1. - -#### Step3: Configure the network proxy if applies. - -To enable OAuth2 authentication, the Gearpump UI server should have network access to Google service, as - some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make - sure you have configured the proxy properly for UI server. - -For guide of how to configure web proxy for UI server, please refer to section "Enable web proxy for UI server" above. - -#### Step4: Restart the UI server and try to click the Google login icon on UI server. - -### CloudFoundry UAA server OAuth2 Authenticator - -CloudFoundryUaaAuthenticator does authentication by using CloudFoundry UAA OAuth2 service. It extracts the email address - from Google user profile as credentials. - -For what is UAA (User Account and Authentication Service), please see guide: [UAA](https://github.com/cloudfoundry/uaa) - -To use Google OAuth2 Authenticator, there are several steps: - -1. Register your application (Gearpump UI server here) as an application to UAA with helper tool `uaac`. -2. Configure the Google OAuth2 information in gear.conf -3. Configure network proxy for Gearpump UI server if applies. - -#### Step1: Register your application to UAA with `uaac` - -1. Check tutorial on uaac at [https://docs.cloudfoundry.org/adminguide/uaa-user-management.html](https://docs.cloudfoundry.org/adminguide/uaa-user-management.html) -2. Open a bash shell, set the UAA server by command `uaac target` - - :::bash - uaac target [your uaa server url] - - -3. Login in as user admin by - - :::bash - uaac token client get admin -s MyAdminPassword - - -4. Create a new Application (Client) in UAA, - - :::bash - uaac client add [your_client_id] - --scope "openid cloud_controller.read" - --authorized_grant_types "authorization_code client_credentials refresh_token" - --authorities "openid cloud_controller.read" - --redirect_uri [your_redirect_url] - --autoapprove true - --secret [your_client_secret] - - -#### Step2: Configure the OAuth2 information in gear.conf - -1. Enable OAuth2 authentication by setting `gearpump.ui-security.oauth2-authenticator-enabled` as true. -2. Navigate to section `gearpump.ui-security.oauth2-authenticators.cloudfoundryuaa` -3. Config gear.conf `gearpump.ui-security.oauth2-authenticators.cloudfoundryuaa` section. -Please make sure class name, client ID, client Secret, and callback URL are set properly. - -**NOTE:** The callback URL here should match what you set on CloudFoundry UAA in step1. - -#### Step3: Configure network proxy for Gearpump UI server if applies - -To enable OAuth2 authentication, the Gearpump UI server should have network access to Google service, as - some requests are initiated directly inside Gearpump UI server. So, if you are behind a firewall, make - sure you have configured the proxy properly for UI server. - -For guide of how to configure web proxy for UI server, please refer to please refer to section "Enable web proxy for UI server" above. - -#### Step4: Restart the UI server and try to click the CloudFoundry login icon on UI server. - -#### Step5: You can also enable additional authenticator for CloudFoundry UAA by setting config: - - :::bash - additional-authenticator-enabled = true - - -Please see description in gear.conf for more information. - -#### Extends OAuth2Authenticator to support new Authorization service like Facebook, or Twitter. - -You can follow the Google OAuth2 example code to define a custom OAuth2Authenticator. Basically, the steps includes: - -1. Define an OAuth2Authenticator implementation. - -2. Add an configuration entry under `gearpump.ui-security.oauth2-authenticators`. For example: - - ## name of this authenticator - "socialnetworkx" { - "class" = "org.apache.gearpump.services.security.oauth2.impl.SocialNetworkXAuthenticator" - - ## Please make sure this URL matches the name - "callback" = "http://127.0.0.1:8090/login/oauth2/socialnetworkx/callback" - - "clientId" = "gearpump_test2" - "clientSecret" = "gearpump_test2" - "defaultUserRole" = "guest" - - ## Make sure socialnetworkx.png exists under dashboard/icons - "icon" = "/icons/socialnetworkx.png" - } - - - The configuration entry is supposed to be used by class `SocialNetworkXAuthenticator`. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/deployment/deployment-yarn.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/deployment-yarn.md b/docs/docs/deployment/deployment-yarn.md deleted file mode 100644 index 401fa46..0000000 --- a/docs/docs/deployment/deployment-yarn.md +++ /dev/null @@ -1,135 +0,0 @@ -## How to launch a Gearpump cluster on YARN - -1. Upload the `gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip` to remote HDFS Folder, suggest to put it under `/usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip` - -2. Make sure the home directory on HDFS is already created and all read-write rights are granted for user. For example, user gear's home directory is `/user/gear` - -3. Put the YARN configurations under classpath. - Before calling `yarnclient launch`, make sure you have put all yarn configuration files under classpath. Typically, you can just copy all files under `$HADOOP_HOME/etc/hadoop` from one of the YARN Cluster machine to `conf/yarnconf` of gearpump. `$HADOOP_HOME` points to the Hadoop installation directory. - -4. Launch the gearpump cluster on YARN - - :::bash - yarnclient launch -package /usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip - - - If you don't specify package path, it will read default package-path (`gearpump.yarn.client.package-path`) from `gear.conf`. - - **NOTE:** You may need to execute `chmod +x bin/*` in shell to make the script file `yarnclient` executable. - -5. After launching, you can browser the Gearpump UI via YARN resource manager dashboard. - -## How to configure the resource limitation of Gearpump cluster - -Before launching a Gearpump cluster, please change configuration section `gearpump.yarn` in `gear.conf` to configure the resource limitation, like: - -1. The number of worker containers. -2. The YARN container memory size for worker and master. - -## How to submit a application to Gearpump cluster. - -To submit the jar to the Gearpump cluster, we first need to know the Master address, so we need to get -a active configuration file first. - -There are two ways to get an active configuration file: - -1. Option 1: specify "-output" option when you launch the cluster. - - :::bash - yarnclient launch -package /usr/lib/gearpump/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip -output /tmp/mycluster.conf - - - It will return in console like this: - - :::bash - ==Application Id: application_1449802454214_0034 - - - -2. Option 2: Query the active configuration file - - :::bash - yarnclient getconfig -appid <yarn application id> -output /tmp/mycluster.conf - - - yarn application id can be found from the output of step1 or from YARN dashboard. - -3. After you downloaded the configuration file, you can launch application with that config file. - - :::bash - gear app -jar examples/wordcount-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.jar -conf /tmp/mycluster.conf - - -4. To run Storm application over Gearpump on YARN, please store the configuration file with `-output application.conf` - and then launch Storm application with - - :::bash - storm -jar examples/storm-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.jar storm.starter.ExclamationTopology exclamation - - -5. Now the application is running. To check this: - - :::bash - gear info -conf /tmp/mycluster.conf - - -6. To Start a UI server, please do: - - :::bash - services -conf /tmp/mycluster.conf - - - The default username and password is "admin:admin", you can check [UI Authentication](deployment-ui-authentication) to find how to manage users. - - -## How to add/remove machines dynamically. - -Gearpump yarn tool allows to dynamically add/remove machines. Here is the steps: - -1. First, query to get active resources. - - :::bash - yarnclient query -appid <yarn application id> - - - The console output will shows how many workers and masters there are. For example, I have output like this: - - :::bash - masters: - container_1449802454214_0034_01_000002(IDHV22-01:35712) - workers: - container_1449802454214_0034_01_000003(IDHV22-01:35712) - container_1449802454214_0034_01_000006(IDHV22-01:35712) - - -2. To add a new worker machine, you can do: - - :::bash - yarnclient addworker -appid <yarn application id> -count 2 - - - This will add two new workers machines. Run the command in first step to check whether the change is effective. - -3. To remove old machines, use: - - :::bash - yarnclient removeworker -appid <yarn application id> -container <worker container id> - - - The worker container id can be found from the output of step 1. For example "container_1449802454214_0034_01_000006" is a good container id. - -## Other usage: - -1. To kill a cluster, - - :::bash - yarnclient kill -appid <yarn application id> - - - **NOTE:** If the application is not launched successfully, then this command won't work. Please use "yarn application -kill <appId>" instead. - -2. To check the Gearpump version - - :::bash - yarnclient version -appid <yarn application id> - http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/deployment/get-gearpump-distribution.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/get-gearpump-distribution.md b/docs/docs/deployment/get-gearpump-distribution.md deleted file mode 100644 index 8a31113..0000000 --- a/docs/docs/deployment/get-gearpump-distribution.md +++ /dev/null @@ -1,83 +0,0 @@ -### Prepare the binary -You can either download pre-build release package or choose to build from source code. - -#### Download Release Binary - -If you choose to use pre-build package, then you don't need to build from source code. The release package can be downloaded from: - -##### [Download page](http://gearpump.incubator.apache.org/downloads.html) - -#### Build from Source code - -If you choose to build the package from source code yourself, you can follow these steps: - -1). Clone the Gearpump repository - - :::bash - git clone https://github.com/apache/incubator-gearpump.git - cd gearpump - - -2). Build package - - :::bash - ## Please use scala 2.11 - ## The target package path: output/target/gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip - sbt clean assembly packArchiveZip - - - After the build, there will be a package file gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip generated under output/target/ folder. - - **NOTE:** - Please set JAVA_HOME environment before the build. - - On linux: - - :::bash - export JAVA_HOME={path/to/jdk/root/path} - - - On Windows: - - :::bash - set JAVA_HOME={path/to/jdk/root/path} - - - **NOTE:** -The build requires network connection. If you are behind an enterprise proxy, make sure you have set the proxy in your env before running the build commands. -For windows: - - :::bash - set HTTP_PROXY=http://host:port - set HTTPS_PROXY= http://host:port - - -For Linux: - - :::bash - export HTTP_PROXY=http://host:port - export HTTPS_PROXY= http://host:port - - -### Gearpump package structure - -You need to flatten the `.zip` file to use it. On Linux, you can - - :::bash - unzip gearpump-{{SCALA_BINARY_VERSION}}-{{GEARPUMP_VERSION}}.zip - - -After decompression, the directory structure looks like picture 1. - - - -Under bin/ folder, there are script files for Linux(bash script) and Windows(.bat script). - -script | function ---------|------------ -local | You can start the Gearpump cluster in single JVM(local mode), or in a distributed cluster(cluster mode). To start the cluster in local mode, you can use the local /local.bat helper scripts, it is very useful for developing or troubleshooting. -master | To start Gearpump in cluster mode, you need to start one or more master nodes, which represent the global resource management center. master/master.bat is launcher script to boot the master node. -worker | To start Gearpump in cluster mode, you also need to start several workers, with each worker represent a set of local resources. worker/worker.bat is launcher script to start the worker node. -services | This script is used to start backend REST service and other services for frontend UI dashboard (Default user "admin, admin"). - -Please check [Command Line Syntax](../introduction/commandline) for more information for each script. http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/deployment/hardware-requirement.md ---------------------------------------------------------------------- diff --git a/docs/docs/deployment/hardware-requirement.md b/docs/docs/deployment/hardware-requirement.md deleted file mode 100644 index dfe765b..0000000 --- a/docs/docs/deployment/hardware-requirement.md +++ /dev/null @@ -1,30 +0,0 @@ -### Pre-requisite - -Gearpump cluster can be installed on Windows OS and Linux. - -Before installation, you need to decide how many machines are used to run this cluster. - -For each machine, the requirements are listed in table below. - -**Table: Environment requirement on single machine** - -Resource | Requirements ------------- | --------------------------- -Memory | 2GB free memory is required to run the cluster. For any production system, 32GB memory is recommended. -Java | JRE 6 or above -User permission | Root permission is not required -Network Ethernet |(TCP/IP) -CPU | Nothing special -HDFS installation | Default is not required. You only need to install it when you want to store the application jars in HDFS. -Kafka installation | Default is not required. You need to install Kafka when you want the at-least once message delivery feature. Currently, the only supported data source for this feature is Kafka - -**Table: The default port used in Gearpump:** - -| usage | Port | Description | ------------- | ---------------|------------ - Dashboard UI | 8090 | Web UI. -Dashboard web socket service | 8091 | UI backend web socket service for long connection. -Master port | 3000 | Every other role like worker, appmaster, executor, user use this port to communicate with Master. - -You need to ensure that your firewall has not banned these ports to ensure Gearpump can work correctly. -And you can modify the port configuration. Check [Configuration](deployment-configuration) section for details. http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/dev/dev-connectors.md ---------------------------------------------------------------------- diff --git a/docs/docs/dev/dev-connectors.md b/docs/docs/dev/dev-connectors.md deleted file mode 100644 index 01deb3e..0000000 --- a/docs/docs/dev/dev-connectors.md +++ /dev/null @@ -1,237 +0,0 @@ -## Basic Concepts -`DataSource` and `DataSink` are the two main concepts Gearpump use to connect with the outside world. - -### DataSource -`DataSource` is the start point of a streaming processing flow. - - -### DataSink -`DataSink` is the end point of a streaming processing flow. - -## Implemented Connectors - -### `DataSource` implemented -Currently, we have following `DataSource` supported. - -Name | Description ------| ---------- -`CollectionDataSource` | Convert a collection to a recursive data source. E.g. `seq(1, 2, 3)` will output `1,2,3,1,2,3...`. -`KafkaSource` | Read from Kafka. - -### `DataSink` implemented -Currently, we have following `DataSink` supported. - -Name | Description ------| ---------- -`HBaseSink` | Write the message to HBase. The message to write must be HBase `Put` or a tuple of `(rowKey, family, column, value)`. -`KafkaSink` | Write to Kafka. - -## Use of Connectors - -### Use of Kafka connectors - -To use Kafka connectors in your application, you first need to add the `gearpump-external-kafka` library dependency in your application: - -#### SBT - - :::sbt - "org.apache.gearpump" %% "gearpump-external-kafka" % {{GEARPUMP_VERSION}} - -#### XML - - :::xml - <dependency> - <groupId>org.apache.gearpump</groupId> - <artifactId>gearpump-external-kafka</artifactId> - <version>{{GEARPUMP_VERSION}}</version> - </dependency> - - -This is a simple example to read from Kafka and write it back using `KafkaSource` and `KafkaSink`. Users can optionally set a `CheckpointStoreFactory` such that Kafka offsets are checkpointed and at-least-once message delivery is guaranteed. - -#### Low level API - - :::scala - val appConfig = UserConfig.empty - val props = new Properties - props.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect) - props.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList) - props.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName) - val source = new KafkaSource(sourceTopic, props) - val checkpointStoreFactory = new KafkaStoreFactory(props) - source.setCheckpointStore(checkpointStoreFactory) - val sourceProcessor = DataSourceProcessor(source, sourceNum) - val sink = new KafkaSink(sinkTopic, props) - val sinkProcessor = DataSinkProcessor(sink, sinkNum) - val partitioner = new ShufflePartitioner - val computation = sourceProcessor ~ partitioner ~> sinkProcessor - val app = StreamApplication(appName, Graph(computation), appConfig) - -#### High level API - - :::scala - val props = new Properties - val appName = "KafkaDSL" - props.put(KafkaConfig.ZOOKEEPER_CONNECT_CONFIG, zookeeperConnect) - props.put(KafkaConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList) - props.put(KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX_CONFIG, appName) - - val app = StreamApp(appName, context) - - if (atLeastOnce) { - val checkpointStoreFactory = new KafkaStoreFactory(props) - KafkaDSL.createAtLeastOnceStream(app, sourceTopic, checkpointStoreFactory, props, sourceNum) - .writeToKafka(sinkTopic, props, sinkNum) - } else { - KafkaDSL.createAtMostOnceStream(app, sourceTopic, props, sourceNum) - .writeToKafka(sinkTopic, props, sinkNum) - } - - -In the above example, configurations are set through Java properties and shared by `KafkaSource`, `KafkaSink` and `KafkaCheckpointStoreFactory`. -Their configurations can be defined differently as below. - -#### `KafkaSource` configurations - -Name | Descriptions | Type | Default ----- | ------------ | ---- | ------- -`KafkaConfig.ZOOKEEPER_CONNECT_CONFIG` | Zookeeper connect string for Kafka topics management | String -`KafkaConfig.CLIENT_ID_CONFIG` | An id string to pass to the server when making requests | String | "" -`KafkaConfig.GROUP_ID_CONFIG` | A string that uniquely identifies a set of consumers within the same consumer group | "" -`KafkaConfig.FETCH_SLEEP_MS_CONFIG` | The amount of time(ms) to sleep when hitting fetch.threshold | Int | 100 -`KafkaConfig.FETCH_THRESHOLD_CONFIG` | Size of internal queue to keep Kafka messages. Stop fetching and go to sleep when hitting the threshold | Int | 10000 -`KafkaConfig.PARTITION_GROUPER_CLASS_CONFIG` | Partition grouper class to group partitions among source tasks | Class | DefaultPartitionGrouper -`KafkaConfig.MESSAGE_DECODER_CLASS_CONFIG` | Message decoder class to decode raw bytes from Kafka | Class | DefaultMessageDecoder -`KafkaConfig.TIMESTAMP_FILTER_CLASS_CONFIG` | Timestamp filter class to filter out late messages | Class | DefaultTimeStampFilter - - -#### `KafkaSink` configurations - -Name | Descriptions | Type | Default ----- | ------------ | ---- | ------- -`KafkaConfig.BOOTSTRAP_SERVERS_CONFIG` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster | String | -`KafkaConfig.CLIENT_ID_CONFIG` | An id string to pass to the server when making requests | String | "" - -#### `KafkaCheckpointStoreFactory` configurations - -Name | Descriptions | Type | Default ----- | ------------ | ---- | ------- -`KafkaConfig.ZOOKEEPER_CONNECT_CONFIG` | Zookeeper connect string for Kafka topics management | String | -`KafkaConfig.BOOTSTRAP_SERVERS_CONFIG` | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster | String | -`KafkaConfig.CHECKPOINT_STORE_NAME_PREFIX` | Name prefix for checkpoint store | String | "" -`KafkaConfig.REPLICATION_FACTOR` | Replication factor for checkpoint store topic | Int | 1 - -### Use of `HBaseSink` - -To use `HBaseSink` in your application, you first need to add the `gearpump-external-hbase` library dependency in your application: - -#### SBT - - :::sbt - "org.apache.gearpump" %% "gearpump-external-hbase" % {{GEARPUMP_VERSION}} - -#### XML - :::xml - <dependency> - <groupId>org.apache.gearpump</groupId> - <artifactId>gearpump-external-hbase</artifactId> - <version>{{GEARPUMP_VERSION}}</version> - </dependency> - - -To connect to HBase, you need to provide following info: - - * the HBase configuration to tell which HBase service to connect - * the table name (you must create the table yourself, see the [HBase documentation](https://hbase.apache.org/book.html)) - -Then, you can use `HBaseSink` in your application: - - :::scala - //create the HBase data sink - val sink = HBaseSink(UserConfig.empty, tableName, HBaseConfiguration.create()) - - //create Gearpump Processor - val sinkProcessor = DataSinkProcessor(sink, parallelism) - - - :::scala - //assume stream is a normal `Stream` in DSL - stream.writeToHbase(UserConfig.empty, tableName, parallelism, "write to HBase") - - -You can tune the connection to HBase via the HBase configuration passed in. If not passed, Gearpump will try to check local classpath to find a valid HBase configuration (`hbase-site.xml`). - -Attention, due to the issue discussed [here](http://stackoverflow.com/questions/24456484/hbase-managed-zookeeper-suddenly-trying-to-connect-to-localhost-instead-of-zooke) you may need to create additional configuration for your HBase sink: - - :::scala - def hadoopConfig = { - val conf = new Configuration() - conf.set("hbase.zookeeper.quorum", "zookeeperHost") - conf.set("hbase.zookeeper.property.clientPort", "2181") - conf - } - val sink = HBaseSink(UserConfig.empty, tableName, hadoopConfig) - - -## How to implement your own `DataSource` - -To implement your own `DataSource`, you need to implement two things: - -1. The data source itself -2. a helper class to easy the usage in a DSL - -### Implement your own `DataSource` -You need to implement a class derived from `org.apache.gearpump.streaming.transaction.api.TimeReplayableSource`. - -### Implement DSL helper (Optional) -If you would like to have a DSL at hand you may start with this customized stream; it is better if you can implement your own DSL helper. -You can refer `KafkaDSLUtil` as an example in Gearpump source. - -Below is some code snippet from `KafkaDSLUtil`: - - :::scala - object KafkaDSLUtil { - - def createStream[T]( - app: StreamApp, - topics: String, - parallelism: Int, - description: String, - properties: Properties): dsl.Stream[T] = { - app.source[T](new KafkaSource(topics, properties), parallelism, description) - } - } - - -## How to implement your own `DataSink` -To implement your own `DataSink`, you need to implement two things: - -1. The data sink itself -2. a helper class to make it easy use in DSL - -### Implement your own `DataSink` -You need to implement a class derived from `org.apache.gearpump.streaming.sink.DataSink`. - -### Implement DSL helper (Optional) -If you would like to have a DSL at hand you may start with this customized stream; it is better if you can implement your own DSL helper. -You can refer `HBaseDSLSink` as an example in Gearpump source. - -Below is some code snippet from `HBaseDSLSink`: - - :::scala - class HBaseDSLSink[T](stream: Stream[T]) { - def writeToHbase(userConfig: UserConfig, table: String, parallism: Int, description: String): Stream[T] = { - stream.sink(HBaseSink[T](userConfig, table), parallism, userConfig, description) - } - - def writeToHbase(userConfig: UserConfig, configuration: Configuration, table: String, parallism: Int, description: String): Stream[T] = { - stream.sink(HBaseSink[T](userConfig, table, configuration), parallism, userConfig, description) - } - } - - object HBaseDSLSink { - implicit def streamToHBaseDSLSink[T](stream: Stream[T]): HBaseDSLSink[T] = { - new HBaseDSLSink[T](stream) - } - } - http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/dev/dev-custom-serializer.md ---------------------------------------------------------------------- diff --git a/docs/docs/dev/dev-custom-serializer.md b/docs/docs/dev/dev-custom-serializer.md deleted file mode 100644 index b1abeda..0000000 --- a/docs/docs/dev/dev-custom-serializer.md +++ /dev/null @@ -1,137 +0,0 @@ -Gearpump has a built-in serialization framework with a shaded Kryo version, which allows you to customize how a specific message type can be serialized. - -#### Register a class before serialization. - -Note, to use built-in kryo serialization framework, Gearpump requires all classes to be registered explicitly before using, no matter you want to use a custom serializer or not. If not using custom serializer, Gearpump will use default com.esotericsoftware.kryo.serializers.FieldSerializer to serialize the class. - -To register a class, you need to change the configuration file gear.conf(or application.conf if you want it only take effect for single application). - - :::json - gearpump { - serializers { - ## We will use default FieldSerializer to serialize this class type - "org.apache.gearpump.UserMessage" = "" - - ## we will use custom serializer to serialize this class type - "org.apache.gearpump.UserMessage2" = "org.apache.gearpump.UserMessageSerializer" - } - } - - -#### How to define a custom serializer for built-in kryo serialization framework - -When you decide that you want to define a custom serializer, you can do this in two ways. - -Please note that Gearpump shaded the original Kryo dependency. The package name ```com.esotericsoftware``` was relocated to ```org.apache.gearpump.esotericsoftware```. So in the following customization, you should import corresponding shaded classes, the example code will show that part. - -In general you should use the shaded version of a library whenever possible in order to avoid binary incompatibilities, eg don't use: - - :::scala - import com.google.common.io.Files - - -but rather - - :::scala - import org.apache.gearpump.google.common.io.Files - - -##### System Level Serializer - -If the serializer is widely used, you can define a global serializer which is available to all applications(or worker or master) in the system. - -###### Step1: you first need to develop a java library which contains the custom serializer class. here is an example: - - :::scala - package org.apache.gearpump - - import org.apache.gearpump.esotericsoftware.kryo.{Kryo, Serializer} - import org.apache.gearpump.esotericsoftware.kryo.io.{Input, Output} - - class UserMessage(longField: Long, intField: Int) - - class UserMessageSerializer extends Serializer[UserMessage] { - override def write(kryo: Kryo, output: Output, obj: UserMessage) = { - output.writeLong(obj.longField) - output.writeInt(obj.intField) - } - - override def read(kryo: Kryo, input: Input, typ: Class[UserMessage]): UserMessage = { - val longField = input.readLong() - val intField = input.readInt() - new UserMessage(longField, intField) - } - } - - -###### Step2: Distribute the libraries - -Distribute the jar file to lib/ folder of every Gearpump installation in the cluster. - -###### Step3: change gear.conf on every machine of the cluster: - - :::json - gearpump { - serializers { - "org.apache.gearpump.UserMessage" = "org.apache.gearpump.UserMessageSerializer" - } - } - - -###### All set! - -##### Define Application level custom serializer -If all you want is to define an application level serializer, which is only visible to current application AppMaster and Executors(including tasks), you can follow a different approach. - -###### Step1: Define your custom Serializer class - -You should include the Serializer class in your application jar. Here is an example to define a custom serializer: - - :::scala - package org.apache.gearpump - - import org.apache.gearpump.esotericsoftware.kryo.{Kryo, Serializer} - import org.apache.gearpump.esotericsoftware.kryo.io.{Input, Output} - - class UserMessage(longField: Long, intField: Int) - - class UserMessageSerializer extends Serializer[UserMessage] { - override def write(kryo: Kryo, output: Output, obj: UserMessage) = { - output.writeLong(obj.longField) - output.writeInt(obj.intField) - } - - override def read(kryo: Kryo, input: Input, typ: Class[UserMessage]): UserMessage = { - val longField = input.readLong() - val intField = input.readInt() - new UserMessage(longField, intField) - } - } - - -###### Step2: Put a application.conf in your classpath on Client machine where you submit the application, - - :::json - ### content of application.conf - gearpump { - serializers { - "org.apache.gearpump.UserMessage" = "org.apache.gearpump.UserMessageSerializer" - } - } - - -###### Step3: All set! - -#### Advanced: Choose another serialization framework - -Note: This is only for advanced user which require deep customization of Gearpump platform. - -There are other serialization framework besides Kryo, like Protobuf. If user don't want to use the built-in kryo serialization framework, he can customize a new serialization framework. - -basically, user need to define in gear.conf(or application.conf for single application's scope) file like this: - - :::bash - gearpump.serialization-framework = "org.apache.gearpump.serializer.CustomSerializationFramework" - - -Please find an example in gearpump storm module, search "StormSerializationFramework" in source code. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/dev/dev-ide-setup.md ---------------------------------------------------------------------- diff --git a/docs/docs/dev/dev-ide-setup.md b/docs/docs/dev/dev-ide-setup.md deleted file mode 100644 index fa983ba..0000000 --- a/docs/docs/dev/dev-ide-setup.md +++ /dev/null @@ -1,29 +0,0 @@ -### Intellij IDE Setup - -1. In Intellij, download scala plugin. We are using scala version {{SCALA_BINARY_VERSION}} -2. Open menu "File->Open" to open Gearpump root project, then choose the Gearpump source folder. -3. All set. - -**NOTE:** Intellij Scala plugin is already bundled with sbt. If you have Scala plugin installed, please don't install additional sbt plugin. Check your settings at "Settings -> Plugins" -**NOTE:** If you are behind a proxy, to speed up the build, please set the proxy for sbt in "Settings -> Build Tools > SBT". in input field "VM parameters", add - - :::bash - -Dhttp.proxyHost=<proxy host> - -Dhttp.proxyPort=<port like 911> - -Dhttps.proxyHost=<proxy host> - -Dhttps.proxyPort=<port like 911> - - -### Eclipse IDE Setup - -I will show how to do this in eclipse LUNA. - -There is a sbt-eclipse plugin to generate eclipse project files, but seems there are some bugs, and some manual fix is still required. Here is the steps that works for me: - -1. Install latest version eclipse luna -2. Install latest scala-IDE http://scala-ide.org/download/current.html I use update site address: http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site -3. Open a sbt shell under the root folder of Gearpump. enter "eclipse", then we get all eclipse project file generated. -4. Use eclipse import wizard. File->Import->Existing projects into Workspace, make sure to tick the option "Search for nested projects" -5. Then it may starts to complain about encoding error, like "IO error while decoding". You need to fix the eclipse default text encoding by changing configuration at "Window->Preference->General->Workspace->Text file encoding" to UTF-8. -6. Then the project gearpump-external-kafka may still cannot compile. The reason is that there is some dependencies missing in generated .classpath file by sbt-eclipse. We need to do some manual fix. Right click on project icon of gearpump-external-kafka in eclipse, then choose menu "Build Path->Configure Build Path". A window will popup. Under the tab "projects", click add, choose "gearpump-streaming" -7. All set. Now the project should compile OK in eclipse. http://git-wip-us.apache.org/repos/asf/incubator-gearpump/blob/cc0578e5/docs/docs/dev/dev-non-streaming-example.md ---------------------------------------------------------------------- diff --git a/docs/docs/dev/dev-non-streaming-example.md b/docs/docs/dev/dev-non-streaming-example.md deleted file mode 100644 index 3d1d5e0..0000000 --- a/docs/docs/dev/dev-non-streaming-example.md +++ /dev/null @@ -1,133 +0,0 @@ -We'll use [Distributed Shell](https://github.com/apache/incubator-gearpump/blob/master/examples/distributedshell) as an example to illustrate how to do that. - -What Distributed Shell do is that user send a shell command to the cluster and the command will the executed on each node, then the result will be return to user. - -### Maven/Sbt Settings - -Repository and library dependencies can be found at [Maven Setting](http://gearpump.incubator.apache.org/downloads.html#maven-dependencies) - -### Define Executor Class - - :::scala - class ShellExecutor(executorContext: ExecutorContext, userConf : UserConfig) extends Actor{ - import executorContext._ - - override def receive: Receive = { - case ShellCommand(command, args) => - val process = Try(s"$command $args" !!) - val result = process match { - case Success(msg) => msg - case Failure(ex) => ex.getMessage - } - sender ! ShellCommandResult(executorId, result) - } - } - -So ShellExecutor just receive the ShellCommand and try to execute it and return the result to the sender, which is quite simple. - -### Define AppMaster Class -For a non-streaming application, you have to write your own AppMaster. - -Here is a typical user defined AppMaster, please note that some trivial codes are omitted. - - :::scala - class DistShellAppMaster(appContext : AppMasterContext, app : Application) extends ApplicationMaster { - protected var currentExecutorId = 0 - - override def preStart(): Unit = { - ActorUtil.launchExecutorOnEachWorker(masterProxy, getExecutorJvmConfig, self) - } - - override def receive: Receive = { - case ExecutorSystemStarted(executorSystem) => - import executorSystem.{address, worker, resource => executorResource} - val executorContext = ExecutorContext(currentExecutorId, worker.workerId, appId, self, executorResource) - val executor = context.actorOf(Props(classOf[ShellExecutor], executorContext, app.userConfig) - .withDeploy(Deploy(scope = RemoteScope(address))), currentExecutorId.toString) - executorSystem.bindLifeCycleWith(executor) - currentExecutorId += 1 - case StartExecutorSystemTimeout => - masterProxy ! ShutdownApplication(appId) - context.stop(self) - case msg: ShellCommand => - Future.fold(context.children.map(_ ? msg))(new ShellCommandResultAggregator) { (aggregator, response) => - aggregator.aggregate(response.asInstanceOf[ShellCommandResult]) - }.map(_.toString()) pipeTo sender - } - - private def getExecutorJvmConfig: ExecutorSystemJvmConfig = { - val config: Config = Option(app.clusterConfig).map(_.getConfig).getOrElse(ConfigFactory.empty()) - val jvmSetting = Util.resolveJvmSetting(config.withFallback(context.system.settings.config)).executor - ExecutorSystemJvmConfig(jvmSetting.classPath, jvmSetting.vmargs, - appJar, username, config) - } - } - - -So when this `DistShellAppMaster` started, first it will request resources to launch one executor on each node, which is done in method `preStart` - -Then the DistShellAppMaster's receive handler will handle the allocated resource to launch the `ShellExecutor` we want. If you want to write your application, you can just use this part of code. The only thing needed is replacing the Executor class. - -There may be a situation that the resource allocation failed which will bring the message `StartExecutorSystemTimeout`, the normal pattern to handle that is just what we do: shut down the application. - -The real application logic part is in `ShellCommand` message handler, which is specific to different applications. Here we distribute the shell command to each executor and aggregate the results to the client. - -For method `getExecutorJvmConfig`, you can just use this part of code in your own application. - -### Define Application -Now its time to launch the application. - - :::scala - object DistributedShell extends App with ArgumentsParser { - private val LOG: Logger = LogUtil.getLogger(getClass) - - override val options: Array[(String, CLIOption[Any])] = Array.empty - - LOG.info(s"Distributed shell submitting application...") - val context = ClientContext() - val appId = context.submit(Application[DistShellAppMaster]("DistributedShell", UserConfig.empty)) - context.close() - LOG.info(s"Distributed Shell Application started with appId $appId !") - } - -The application class extends `App` and `ArgumentsParser which make it easier to parse arguments and run main functions. This part is similar to the streaming applications. - -The main class `DistributeShell` will submit an application to `Master`, whose `AppMaster` is `DistShellAppMaster`. - -### Define an optional Client class - -Now, we can define a `Client` class to talk with `AppMaster` to pass our commands to it. - - :::scala - object DistributedShellClient extends App with ArgumentsParser { - implicit val timeout = Constants.FUTURE_TIMEOUT - import scala.concurrent.ExecutionContext.Implicits.global - private val LOG: Logger = LoggerFactory.getLogger(getClass) - - override val options: Array[(String, CLIOption[Any])] = Array( - "master" -> CLIOption[String]("<host1:port1,host2:port2,host3:port3>", required = true), - "appid" -> CLIOption[Int]("<the distributed shell appid>", required = true), - "command" -> CLIOption[String]("<shell command>", required = true), - "args" -> CLIOption[String]("<shell arguments>", required = true) - ) - - val config = parse(args) - val context = ClientContext(config.getString("master")) - val appid = config.getInt("appid") - val command = config.getString("command") - val arguments = config.getString("args") - val appMaster = context.resolveAppID(appid) - (appMaster ? ShellCommand(command, arguments)).map { reslut => - LOG.info(s"Result: $reslut") - context.close() - } - } - - -In the `DistributedShellClient`, it will resolve the appid to the real appmaster(the application id will be printed when launching `DistributedShell`). - -Once we got the `AppMaster`, then we can send `ShellCommand` to it and wait for the result. - -### Submit application - -After all these, you need to package everything into a uber jar and submit the jar to Gearpump Cluster. Please check [Application submission tool](../introduction/commandline) to command line tool syntax.
