This is an automated email from the ASF dual-hosted git repository. mwalch pushed a commit to branch gh-pages in repository https://gitbox.apache.org/repos/asf/fluo-website.git
The following commit(s) were added to refs/heads/gh-pages by this push:
     new 55511fd  Fixes #82 Moved docs source from Fluo repo to website (#83)
55511fd is described below

commit 55511fdbf89da3bc2825d8cf9194608e99198bd0
Author: Mike Walch <mwa...@apache.org>
AuthorDate: Mon Sep 25 16:02:44 2017 -0400

    Fixes #82 Moved docs source from Fluo repo to website (#83)
---
 _config.yml                                |  18 ++
 _fluo-docs-1-2/administration/metrics.md   | 189 +++++++++++++++++
 _fluo-docs-1-2/development/applications.md | 314 +++++++++++++++++++++++++++++
 _fluo-docs-1-2/getting-started/design.md   |  45 +++++
 _fluo-docs-1-2/getting-started/install.md  | 115 +++++++++++
 _fluo-docs-1-2/index.md                    |   4 +
 _layouts/default.html                      |   8 +-
 _layouts/fluo-docs-1.2.html                |  55 +++++
 css/fluo.scss                              |   6 +
 docs/fluo/1.0.0-incubating/index.md        |   2 +
 docs/fluo/1.1.0-incubating/index.md        |   2 +
 docs/index.md                              |   4 +-
 resources/docs/fluo-architecture.odg       | Bin 0 -> 16670 bytes
 resources/docs/fluo-architecture.png       | Bin 0 -> 61085 bytes
 14 files changed, 758 insertions(+), 4 deletions(-)

diff --git a/_config.yml b/_config.yml
index 423b318..657cb54 100644
--- a/_config.yml
+++ b/_config.yml
@@ -5,6 +5,12 @@ timezone: Etc/UTC

 permalink: pretty

+# Collection names cannot contain periods
+collections:
+  fluo-docs-1-2:
+    output: true
+    permalink: "/docs/fluo/1.2/:path"
+
 defaults:
   -
     scope:
@@ -34,6 +40,18 @@ defaults:
       values:
        layout: "tour"
        permalink: "/tour/:basename/"
+  -
+    scope:
+      path: "_fluo-docs-1-2"
+      type: "fluo-docs-1-2"
+    values:
+      layout: "fluo-docs-1.2"
+      title_prefix: "Fluo Documentation - "
+      version: "1.2.0"
+      minor_release: "1.2"
+      docs_base: "/docs/fluo/1.2"
+      javadoc_base: "https://static.javadoc.io/org.apache.fluo/fluo-api/1.1.0-incubating"
+      github_base: "https://github.com/apache/fluo/blob/master"

 # Number of posts displayed on the home page.
num_home_posts: 5

diff --git a/_fluo-docs-1-2/administration/metrics.md b/_fluo-docs-1-2/administration/metrics.md
new file mode 100644
index 0000000..9576ee6
--- /dev/null
+++ b/_fluo-docs-1-2/administration/metrics.md
@@ -0,0 +1,189 @@
+---
+title: Metrics
+category: administration
+order: 1
+---
+
+A Fluo application can be configured (in [fluo-app.properties]) to report metrics. When metrics are
+configured, Fluo will report some 'default' metrics about an application that help users monitor its
+performance. Users can also write code to report 'application-specific' metrics from their
+applications. Both 'application-specific' and 'default' metrics share the same reporter configured
+by [fluo-app.properties] and are described in detail below.
+
+## Configuring reporters
+
+Fluo metrics are not published by default. To publish metrics, configure a reporter in the 'metrics'
+section of [fluo-app.properties]. There are several different reporter types (e.g. Console, CSV,
+Graphite, JMX, SLF4J) that are implemented using [Dropwizard]. The choice of which reporter to use
+depends on the visualization tool used. If you are not currently using a visualization tool, there
+is documentation at the end of this page for reporting Fluo metrics to Grafana/InfluxDB.
+
+## Metrics names
+
+When Fluo metrics are reported, they are published using a naming scheme that encodes additional
+information. This additional information is represented using all-caps variables (e.g. `METRIC`)
+below.
+
+Default metrics start with `fluo.class` or `fluo.system` and have the following naming schemes:
+
+    fluo.class.APPLICATION.REPORTER_ID.METRIC.CLASS
+    fluo.system.APPLICATION.REPORTER_ID.METRIC
+
+Application metrics start with `fluo.app` and have the following scheme:
+
+    fluo.app.REPORTER_ID.METRIC
+
+The variables below describe the additional information that is encoded in metrics names.
+
+1. `APPLICATION` - Fluo application name
+2. `REPORTER_ID` - Unique ID of the Fluo oracle, worker, or client that is reporting the metric.
+   When running in YARN, this ID is of the format `worker-INSTANCE_ID` or `oracle-INSTANCE_ID`,
+   where `INSTANCE_ID` corresponds to the instance number. When not running in YARN, this ID
+   consists of a hostname and a base36 long that is unique across all Fluo processes.
+3. `METRIC` - Name of the metric. For 'default' metrics, this is set by Fluo. For 'application'
+   metrics, this is set by the user. Names should be unique and should not contain periods ('.').
+4. `CLASS` - Name of the Fluo observer or loader class that produced the metric. This allows things
+   like transaction collisions to be tracked per class.
+
+## Application-specific metrics
+
+Application metrics are implemented by retrieving a [MetricsReporter] from an [Observer], [Loader],
+or [FluoClient]. These metrics are named using the format `fluo.app.REPORTER_ID.METRIC`.
+
+## Default metrics
+
+Default metrics are reported either per Observer/Loader class or system-wide.
+
+Below are metrics that are reported from each Observer/Loader class that is configured in a Fluo
+application. These metrics are reported after each transaction and named using the format
+`fluo.class.APPLICATION.REPORTER_ID.METRIC.CLASS`.
+
+* tx_lock_wait_time - [Timer]
+  - Time the transaction spent waiting on locks held by other transactions.
+  - Only updated for transactions that have a non-zero lock time.
+* tx_execution_time - [Timer]
+  - Time the transaction took to execute.
+  - Updated for failed and successful transactions.
+  - This does not include commit time, only the time from start until commit is called.
+* tx_with_collision - [Meter]
+  - Rate of transactions with collisions.
+* tx_collisions - [Meter]
+  - Rate of collisions.
+* tx_entries_set - [Meter]
+  - Rate of row/columns set by transactions.
+* tx_entries_read - [Meter]
+  - Rate of row/columns read by transactions that existed.
+  - There is currently no count of all reads (including reads of non-existent data).
+* tx_locks_timedout - [Meter]
+  - Rate of timed-out locks rolled back by transactions.
+  - These are locks that are held for very long periods by another transaction that appears to be
+    alive based on ZooKeeper.
+* tx_locks_dead - [Meter]
+  - Rate of dead locks rolled back by transactions.
+  - These are locks held by a process that appears to be dead according to ZooKeeper.
+* tx_status_`<STATUS>` - [Meter]
+  - Rate of the different ways (i.e. `<STATUS>`) a transaction can terminate.
+
+Below are system-wide metrics that are reported for the entire Fluo application. These metrics are
+named using the format `fluo.system.APPLICATION.REPORTER_ID.METRIC`.
+
+* oracle_response_time - [Timer]
+  - Time each RPC call to the oracle for stamps took.
+* oracle_client_stamps - [Histogram]
+  - Number of stamps requested in each request for stamps to the server.
+* oracle_server_stamps - [Histogram]
+  - Number of stamps requested in each request for stamps from a client.
+* worker_notifications_queued - [Gauge]
+  - The current number of notifications queued for processing.
+* transactor_committing - [Gauge]
+  - The current number of transactions that are working their way through the commit steps.
+
+Histograms and Timers have a counter. In the case of a histogram, the counter is the number of times
+the metric was updated, not a sum of the updates. For example, if a request for 5 timestamps was
+made to the oracle followed by a request for 3 timestamps, then the count for `oracle_server_stamps`
+would be 2 and the mean would be (5+3)/2.
+
+## View metrics in Grafana/InfluxDB
+
+Fluo metrics can be sent to [InfluxDB], a time series database, and made viewable in [Grafana],
+a visualization tool, by following the instructions below:
+
+1. Follow the standard installation instructions for [InfluxDB] and [Grafana]. As for versions,
+   the instructions below were written using InfluxDB v0.9.4.2 and Grafana v2.5.0.
+
+2. Add the following to your InfluxDB configuration to configure it to accept metrics in Graphite
+   format from Fluo. The configuration below contains templates that transform the Graphite
+   metrics into a format that is usable in InfluxDB.
+
+   ```
+   [[graphite]]
+     bind-address = ":2003"
+     enabled = true
+     database = "fluo_metrics"
+     protocol = "tcp"
+     consistency-level = "one"
+     separator = "_"
+     batch-size = 1000
+     batch-pending = 5
+     batch-timeout = "1s"
+     templates = [
+       "fluo.class.*.*.*.*.* ..app.host.measurement.observer.field",
+       "fluo.class.*.*.*.* ..app.host.measurement.observer",
+       "fluo.system.*.*.*.* ..app.host.measurement.field",
+       "fluo.system.*.*.* ..app.host.measurement",
+       "fluo.app.*.*.* ..host.measurement.field",
+       "fluo.app.*.* ..host.measurement",
+     ]
+   ```
+
+3. Fluo distributes a file called `fluo_metrics_setup.txt` that contains a list of commands that
+   set up InfluxDB. These commands will configure an InfluxDB user, retention policies, and
+   continuous queries that downsample data for the historical dashboard in Grafana. Run the command
+   below to execute the commands in this file:
+
+   ```
+   $INFLUXDB_HOME/bin/influx -import -path $FLUO_HOME/contrib/influxdb/fluo_metrics_setup.txt
+   ```
+
+4. Configure the `fluo-app.properties` of your Fluo application to send Graphite metrics to
+   InfluxDB. Below is an example configuration. Remember to replace `<INFLUXDB_HOST>` with the
+   actual host.
+
+   ```
+   fluo.metrics.reporter.graphite.enable=true
+   fluo.metrics.reporter.graphite.host=<INFLUXDB_HOST>
+   fluo.metrics.reporter.graphite.port=2003
+   fluo.metrics.reporter.graphite.frequency=30
+   ```
+
+   The reporting frequency of 30 sec is required if you are using the provided Grafana dashboards
+   that are configured in the next step.
+
+5. Grafana needs to be configured to load dashboard JSON templates from a directory. Fluo
+   distributes two Grafana dashboard templates in its tarball distribution in the directory
+   `contrib/grafana`. Before restarting Grafana, copy the templates from your Fluo installation to
+   the `dashboards/` directory configured below.
+
+   ```
+   [dashboards.json]
+     enabled = true
+     path = <GRAFANA_HOME>/dashboards
+   ```
+
+6. If you restart Grafana, you will see the Fluo dashboards configured, but all of their charts
+   will be empty unless you have a Fluo application running and configured to send data to
+   InfluxDB. When you start sending data, you may need to refresh the dashboard page in the
+   browser to start viewing metrics.
+
+[Grafana]: http://grafana.org/
+[InfluxDB]: https://influxdb.com/
+[fluo-app.properties]: {{ page.github_base }}/modules/distribution/src/main/config/fluo-app.properties
+[Dropwizard]: https://dropwizard.github.io/metrics/3.1.0/
+[MetricsReporter]: {{ page.javadoc_base }}/org/apache/fluo/api/metrics/MetricsReporter.html
+[Observer]: {{ page.javadoc_base }}/org/apache/fluo/api/observer/Observer.html
+[Loader]: {{ page.javadoc_base }}/org/apache/fluo/api/client/Loader.html
+[FluoClient]: {{ page.javadoc_base }}/org/apache/fluo/api/client/FluoClient.html
+[Timer]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#timers
+[Counter]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#counters
+[Histogram]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#histograms
+[Gauge]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#gauges
+[Meter]: https://dropwizard.github.io/metrics/3.1.0/getting-started/#meters

diff --git a/_fluo-docs-1-2/development/applications.md b/_fluo-docs-1-2/development/applications.md
new file mode 100644
index 0000000..a1fb8f9
--- /dev/null
+++ b/_fluo-docs-1-2/development/applications.md
@@ -0,0 +1,314 @@
+---
+title: Applications
+category: development
+order: 1
+---
+
+Once you have Fluo installed and running on your cluster, you can run Fluo applications consisting
+of [clients and observers][design].
This documentation shows how to:
+
+ * Create a Fluo client
+ * Create a Fluo observer
+ * Initialize a Fluo application
+ * Start and stop a Fluo application (which consists of oracle and worker processes)
+
+## Fluo Maven Dependencies
+
+For both clients and observers, you will need to include the following in your Maven pom:
+
+```xml
+<dependency>
+  <groupId>org.apache.fluo</groupId>
+  <artifactId>fluo-api</artifactId>
+  <version>1.2.0</version>
+</dependency>
+<dependency>
+  <groupId>org.apache.fluo</groupId>
+  <artifactId>fluo-core</artifactId>
+  <version>1.2.0</version>
+  <scope>runtime</scope>
+</dependency>
+```
+
+Fluo provides a classpath command to help users build a runtime classpath. This command, along with
+the `hadoop jar` command, is useful when writing scripts to run Fluo client code. These commands
+allow the scripts to use the versions of Hadoop, Accumulo, and Zookeeper installed on a cluster.
+
+## Creating a Fluo client
+
+To create a [FluoClient], you will need to provide it with a [FluoConfiguration] object that is
+configured to connect to your Fluo instance.
+
+If you have access to the [fluo-conn.properties] file that was used to configure your Fluo
+instance, you can use it to build a [FluoConfiguration] object with all necessary properties:
+
+```java
+FluoConfiguration config = new FluoConfiguration(new File("fluo-conn.properties"));
+config.setApplicationName("myapp");
+```
+
+You can also create an empty [FluoConfiguration] object and set properties using Java:
+
+```java
+FluoConfiguration config = new FluoConfiguration();
+config.setInstanceZookeepers("localhost/fluo");
+config.setApplicationName("myapp");
+```
+
+Once you have a [FluoConfiguration] object, pass it to the `newClient()` method of [FluoFactory]
+to create a [FluoClient]:
+
+```java
+try (FluoClient client = FluoFactory.newClient(config)) {
+
+  try (Transaction tx = client.newTransaction()) {
+    // read and write some data
+    tx.commit();
+  }
+
+  try (Snapshot snapshot = client.newSnapshot()) {
+    // read some data
+  }
+}
+```
+
+It may help to reference the [API javadocs][API] while you are learning the Fluo API.
+
+## Creating a Fluo observer
+
+To create an observer, follow these steps:
+
+1. Create one or more classes that extend [Observer] like the example below. Please use [slf4j]
+   for any logging in observers, as [slf4j] supports multiple logging implementations. This is
+   necessary as Fluo applications have a hard requirement on [logback] when running in YARN.
+
+   ```java
+   public class InvertObserver implements Observer {
+
+     @Override
+     public void process(TransactionBase tx, Bytes row, Column col) throws Exception {
+       // read value
+       Bytes value = tx.get(row, col);
+       // invert row and value
+       tx.set(value, new Column("inv", "data"), row);
+     }
+   }
+   ```
+
+2. Create a class that implements [ObserverProvider] like the example below. The purpose of this
+   class is to associate a set of Observers with the columns that trigger them. The class can
+   register multiple observers.
+
+   ```java
+   class AppObserverProvider implements ObserverProvider {
+     @Override
+     public void provide(Registry or, Context ctx) {
+       // setup InvertObserver to be triggered when the column obs:data is modified
+       or.forColumn(new Column("obs", "data"), NotificationType.STRONG)
+         .useObserver(new InvertObserver());
+
+       // Observer is a functional interface, so observers can be written as lambdas.
+       or.forColumn(new Column("new", "data"), NotificationType.WEAK)
+         .useObserver((tx, row, col) -> {
+           Bytes combined = combineNewAndOld(tx, row);
+           tx.set(row, new Column("current", "data"), combined);
+         });
+     }
+   }
+   ```
+
+3. Build a jar containing these classes and include this jar in the `lib/` directory of your Fluo
+   application.
+4. Configure your Fluo application to use this observer provider by modifying the application
+   section of [fluo-app.properties]. Set `fluo.observer.provider` to the observer provider class
+   name.
+5. Initialize your Fluo application as described in the next section. During initialization, Fluo
+   will obtain the observed columns from the ObserverProvider and persist the columns in
+   Zookeeper. These columns persisted in Zookeeper are used by transactions to know when to
+   trigger observers.
+
+## Initializing a Fluo Application
+
+Before a Fluo application can run, it must be initialized. Below is an overview of what
+initialization does and some of the properties that may be set for initialization.
+
+ * **Initialize ZooKeeper** : Each application has its own area in ZooKeeper used for
+   configuration, Oracle state, and worker coordination. All properties, except
+   `fluo.connections.*`, are copied into ZooKeeper. For example, if `fluo.worker.num.threads=128`
+   was set, then when a worker process starts it will read this from ZooKeeper.
+ * **Copy observer jars to DFS** : Fluo worker processes need the jars containing observers.
+   These are provided in one of the following ways.
+   * Set the property `fluo.observer.init.dir` to a local directory containing observer jars. The
+     jars in this directory are copied to DFS under `<fluo.dfs.root>/<app name>`. When a worker
+     is started, the jars are pulled from DFS and added to its classpath.
+   * Set the property `fluo.observer.jars.url` to a directory in DFS containing observer jars. No
+     copying is done. When a worker is started, the jars are pulled from this location and added
+     to its classpath.
+   * Do not set any of the properties above and have the mechanism that starts the worker process
+     add the needed jars to the classpath.
+ * **Create Accumulo table** : Each Fluo application creates and configures an Accumulo table.
+   The `fluo.accumulo.*` properties determine which Accumulo instance is used. For performance
+   reasons, Fluo runs its own code in Accumulo tablet servers. Fluo attempts to copy Fluo jars
+   into DFS and configure Accumulo to use them. Fluo first checks the property
+   `fluo.accumulo.jars` and, if set, copies the jars listed there. If that property is not set,
+   then Fluo looks on the classpath to find jars. Jars are copied to a location under
+   `<fluo.dfs.root>/<app name>`.
+
+Below are the steps to initialize an application from the command line. It is also possible to
+initialize an application using Fluo's Java API.
+
+1. Create a copy of [fluo-app.properties] for your Fluo application.
+
+       cp $FLUO_HOME/conf/fluo-app.properties /path/to/myapp/fluo-app.properties
+
+2. Edit your copy of [fluo-app.properties] and make sure to set the following:
+
+   * Class name of your ObserverProvider
+   * Paths to your Fluo observer jars
+   * Accumulo configuration
+   * DFS configuration
+
+   When configuring the observer section of fluo-app.properties, you can configure your instance
+   for the [phrasecount] application if you have not created your own application. See the
+   [phrasecount] example for instructions. You can also choose not to configure any observers,
+   but your workers will be idle when started.
+
+3. Run the command below to initialize your Fluo application. Change `myapp` to your application
+   name:
+
+       fluo init myapp /path/to/myapp/fluo-app.properties
+
+   A Fluo application only needs to be initialized once. After initialization, the Fluo
+   application name is used to start/stop the application and scan the Fluo table.
+
+4. Run `fluo list`, which connects to Fluo and lists applications, to verify initialization.
+
+5. Run `fluo config myapp` to see what configuration is stored in ZooKeeper.
+
+## Starting your Fluo application
+
+Follow the instructions below to start a Fluo application, which consists of an oracle and
+multiple workers.
+
+1. Configure [fluo-env.sh] and [fluo-conn.properties] if you have not already.
+
+2. Run Fluo application processes using the `fluo oracle` and `fluo worker` commands. Fluo
+   applications are typically run with one oracle process and multiple worker processes. The
+   commands below will start a Fluo oracle and two workers on your local machine:
+
+       fluo oracle myapp &> oracle.log &
+       fluo worker myapp &> worker1.log &
+       fluo worker myapp &> worker2.log &
+
+   The commands will retrieve your application configuration and observer jars (using your
+   application name) before starting the oracle or worker process.
+
+If you want to distribute the processes of your Fluo application across a cluster, you will need
+to install Fluo on every node where you want to run a Fluo process and follow the instructions
+above on each node.
+
+## Managing your Fluo application
+
+When you have data in your Fluo application, you can view it using the command `fluo scan myapp`.
+Pipe the output to `less` using the command `fluo scan myapp | less` if you want to page through
+the data.
+
+To list all Fluo applications, run `fluo list`.
+
+To stop your Fluo application, run `jps -m | grep Fluo` to find process IDs and use `kill` to
+stop them.
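The find-and-kill procedure above can be sketched as a small helper. This is an illustrative sketch, not part of the Fluo API or distribution; the sample `jps -m` output lines below (including the main-class names) are hypothetical and will differ depending on your Fluo version and how processes were launched.

```java
import java.util.ArrayList;
import java.util.List;

public class FluoProcessFinder {

  // Given lines of `jps -m` output, return the PIDs of lines mentioning "Fluo".
  // This mirrors what `jps -m | grep Fluo` selects on the command line.
  static List<Integer> findFluoPids(List<String> jpsLines) {
    List<Integer> pids = new ArrayList<>();
    for (String line : jpsLines) {
      if (line.contains("Fluo")) {
        // jps -m prints "<pid> <main class> <arguments>"
        pids.add(Integer.parseInt(line.split("\\s+")[0]));
      }
    }
    return pids;
  }

  public static void main(String[] args) {
    // Hypothetical jps -m output; real main-class names vary.
    List<String> sample = List.of(
        "12345 FluoOracle myapp",
        "12346 FluoWorker myapp",
        "99999 Jps -m");
    System.out.println(findFluoPids(sample)); // prints [12345, 12346]
  }
}
```

Each returned PID could then be passed to `kill`, just as the shell pipeline does.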
+
+## Running application code
+
+The `fluo exec <app name> <class> {arguments}` command provides an easy way to execute application
+code. It will execute a class with a main method if a jar containing the class is included with
+the observer jars configured at initialization. When the class is run, Fluo classes and
+dependencies will be on the classpath. The `fluo exec` command can inject the application's
+configuration if the class is written in the following way. Defining the injection point is
+optional.
+
+```java
+import javax.inject.Inject;
+
+public class AppCommand {
+
+  // when run with the fluo exec command, the application's configuration will be injected
+  @Inject
+  private static FluoConfiguration fluoConfig;
+
+  public static void main(String[] args) throws Exception {
+    try (FluoClient fluoClient = FluoFactory.newClient(fluoConfig)) {
+      // do stuff with Fluo
+    }
+  }
+}
+```
+
+## Application Configuration
+
+For configuring observers, Fluo provides a simple mechanism to set and access
+application-specific configuration. See the javadoc on [FluoClient].getAppConfiguration() for
+more details.
+
+## Debugging Applications
+
+While monitoring [Fluo metrics][metrics] can detect problems (like too many transaction
+collisions) in a Fluo application, [metrics][metrics] may not provide enough information to debug
+the root cause of the problem. To help debug Fluo applications, low-level logging of transactions
+can be turned on by setting the following loggers to TRACE:
+
+| Logger             | Level | Information                                                                                                                                              |
+|--------------------|-------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
+| fluo.tx            | TRACE | Provides detailed information about what transactions read and wrote                                                                                      |
+| fluo.tx.summary    | TRACE | Provides a one-line summary about each transaction executed                                                                                               |
+| fluo.tx.collisions | TRACE | Provides details about what data was involved when a transaction collides with another transaction                                                        |
+| fluo.tx.scan       | TRACE | Provides logging for each cell read by a scan. The scan summary is logged at the `fluo.tx` level, which allows suppression of `fluo.tx.scan` while still seeing summaries. |
+
+Below is an example log after setting `fluo.tx` to TRACE. The number following `txid: ` is the
+transaction's start timestamp from the Oracle.
+
+```
+2015-02-11 18:24:05,341 [fluo.tx ] TRACE: txid: 3 begin() thread: 198
+2015-02-11 18:24:05,343 [fluo.tx ] TRACE: txid: 3 class: com.SimpleLoader
+2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 3 get(4333, stat count ) -> null
+2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 3 set(4333, stat count , 1)
+2015-02-11 18:24:05,441 [fluo.tx ] TRACE: txid: 3 commit() -> SUCCESSFUL commitTs: 4
+2015-02-11 18:24:05,341 [fluo.tx ] TRACE: txid: 5 begin() thread: 198
+2015-02-11 18:24:05,442 [fluo.tx ] TRACE: txid: 3 close()
+2015-02-11 18:24:05,343 [fluo.tx ] TRACE: txid: 5 class: com.SimpleLoader
+2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 5 get(4333, stat count ) -> 1
+2015-02-11 18:24:05,357 [fluo.tx ] TRACE: txid: 5 set(4333, stat count , 2)
+2015-02-11 18:24:05,441 [fluo.tx ] TRACE: txid: 5 commit() -> SUCCESSFUL commitTs: 6
+2015-02-11 18:24:05,442 [fluo.tx ] TRACE: txid: 5 close()
+```
+
+The log above traces the following sequence of events.
+
+* Transaction T1 has a start timestamp of `3`.
+* The thread with id `198` is executing T1; it is running code from the class `com.SimpleLoader`.
+* T1 reads row `4333` and column `stat count`, which does not exist.
+* T1 sets row `4333` and column `stat count` to `1`.
+* T1 commits successfully and its commit timestamp from the Oracle is `4`.
+* Transaction T2 has a start timestamp of `5` (because `5` > `4`, it can see what T1 wrote).
+* T2 reads a value of `1` for row `4333` and column `stat count`.
+* T2 sets row `4333` and column `stat count` to `2`.
+* T2 commits successfully with a commit timestamp of `6`.
+
+Below is an example log after only setting `fluo.tx.collisions` to TRACE. This setting will only
+log trace information when a collision occurs. Unlike the previous example, what the transaction
+read and wrote is not logged. This shows that a transaction with a start timestamp of `106` and a
+class name of `com.SimpleLoader` collided with another transaction on row `r1` and column
+`fam1 qual1`.
+
+```
+2015-02-11 18:17:02,639 [tx.collisions] TRACE: txid: 106 class: com.SimpleLoader
+2015-02-11 18:17:02,639 [tx.collisions] TRACE: txid: 106 collisions: {r1=[fam1 qual1 ]}
+```
+
+When applications read and write arbitrary binary data, these trace logs are hard to read. To
+make them human readable, non-ASCII characters are escaped using hex. The convention used is
+`\xDD`, where `D` is a hex digit. The `\` character is also escaped to make the output
+unambiguous.
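The escaping convention just described can be sketched as a small routine. This is an illustrative re-implementation, not Fluo's actual logging code; the exact set of characters Fluo treats as printable is an assumption here.

```java
public class TraceEscaper {

  // Escape bytes for trace output: backslash becomes \\, non-printable and
  // non-ASCII bytes become \xDD (two hex digits), printable ASCII passes through.
  static String escape(byte[] data) {
    StringBuilder sb = new StringBuilder();
    for (byte b : data) {
      int v = b & 0xff;
      if (v == '\\') {
        sb.append("\\\\");
      } else if (v >= 32 && v <= 126) { // assumed printable-ASCII range
        sb.append((char) v);
      } else {
        sb.append(String.format("\\x%02X", v));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] data = {'a', 'b', (byte) 0xFF, '\\', 'c'};
    System.out.println(escape(data)); // prints ab\xFF\\c
  }
}
```

With this convention, any trace line can be mapped back to the original bytes without ambiguity, since a literal backslash is always doubled.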
+
+[design]: {{ page.docs_base }}/getting-started/design/
+[FluoFactory]: {{ page.javadoc_base}}/org/apache/fluo/api/client/FluoFactory.html
+[FluoClient]: {{ page.javadoc_base}}/org/apache/fluo/api/client/FluoClient.html
+[FluoConfiguration]: {{ page.javadoc_base}}/org/apache/fluo/api/config/FluoConfiguration.html
+[Observer]: {{ page.javadoc_base}}/org/apache/fluo/api/observer/Observer.html
+[ObserverProvider]: {{ page.javadoc_base}}/org/apache/fluo/api/observer/ObserverProvider.html
+[fluo-conn.properties]: {{ page.github_base}}/modules/distribution/src/main/config/fluo-conn.properties
+[fluo-app.properties]: {{ page.github_base}}/modules/distribution/src/main/config/fluo-app.properties
+[API]: https://fluo.apache.org/apidocs/
+[metrics]: {{ page.docs_base }}/administration/metrics/
+[slf4j]: http://www.slf4j.org/
+[logback]: http://logback.qos.ch/
+[phrasecount]: https://github.com/astralway/phrasecount
+[fluo-env.sh]: {{ page.github_base}}/modules/distribution/src/main/config/fluo-env.sh

diff --git a/_fluo-docs-1-2/getting-started/design.md b/_fluo-docs-1-2/getting-started/design.md
new file mode 100644
index 0000000..00d6f7a
--- /dev/null
+++ b/_fluo-docs-1-2/getting-started/design.md
@@ -0,0 +1,45 @@
+---
+title: Design
+category: getting-started
+order: 1
+---
+
+The diagram below provides an overview of Apache Fluo's design.
+
+![fluo-architecture][1]
+
+## Fluo Application
+
+A **Fluo application** maintains a large-scale computation using a series of small transactional
+updates. Fluo applications store their data in a **Fluo table**, which has a similar structure
+(row, column, value) to an **Accumulo table** except that a Fluo table has no timestamps. A Fluo
+table is implemented using an Accumulo table. While you could scan the Accumulo table used to
+implement a Fluo table using an Accumulo client, you would read extra implementation-related data
+in addition to your data. Therefore, developers should only interact with the data in a Fluo
+table by writing Fluo client or observer code:
+
+* **Clients** ingest data or interact with Fluo from external applications (REST services,
+  crawlers, etc). These are generally user-started processes that use the Fluo API.
+* **Observers** are user-provided functions run by Fluo workers that execute transactions in
+  response to notifications. Notifications are set by Fluo transactions, executing in a client or
+  observer, when a requested column is modified.
+
+Multiple Fluo applications can run on a cluster at the same time. Fluo applications
+consist of an oracle process and a configurable number of worker processes:
+
+* The **Oracle** process allocates timestamps for transactions. While only one Oracle is
+  required, Fluo can be configured to run extra Oracles that can take over if the primary Oracle
+  fails.
+* **Worker** processes run user code (called **observers**) that perform transactions. All
+  workers run the same observers. The number of worker instances is configured to handle the
+  processing workload.
+
+## Fluo Dependencies
+
+Fluo requires the following software to be running on the cluster:
+
+* **Accumulo** - Fluo stores its data in Accumulo and uses Accumulo's conditional mutations for
+  transactions.
+* **Hadoop** - Each Fluo application runs its oracle and worker processes as Hadoop YARN
+  applications. HDFS is also required for Accumulo.
+* **Zookeeper** - Fluo stores its metadata and state information in Zookeeper. Zookeeper is also
+  required for Accumulo.
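The Oracle's core role described above — handing out strictly increasing timestamps so transactions can order their reads and writes — can be illustrated with a toy model. This sketch is not Fluo code: a real oracle coordinates through ZooKeeper, persists allocation state for failover, and serves batches of stamps over RPC, none of which is modeled here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of an oracle: hand out strictly increasing timestamps,
// optionally in contiguous blocks (returning the first stamp of the block).
public class ToyOracle {
  private final AtomicLong next = new AtomicLong(1);

  // Allocate a block of 'count' timestamps; the caller owns [result, result + count).
  long getTimestamps(int count) {
    return next.getAndAdd(count);
  }

  public static void main(String[] args) {
    ToyOracle oracle = new ToyOracle();
    long t1 = oracle.getTimestamps(1); // start ts for transaction T1
    long c1 = oracle.getTimestamps(1); // commit ts for T1
    long t2 = oracle.getTimestamps(1); // start ts for T2; t2 > c1, so T2 sees T1's writes
    System.out.println(t1 + " " + c1 + " " + t2); // prints 1 2 3
  }
}
```

Because every stamp is strictly greater than all previously issued stamps, a transaction that starts after another commits is guaranteed to observe its writes — the ordering property the trace-log walkthrough in the application docs relies on.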
+ +[1]: /resources/docs/fluo-architecture.png diff --git a/_fluo-docs-1-2/getting-started/install.md b/_fluo-docs-1-2/getting-started/install.md new file mode 100644 index 0000000..68de78b --- /dev/null +++ b/_fluo-docs-1-2/getting-started/install.md @@ -0,0 +1,115 @@ +--- +title: Installation +category: getting-started +order: 2 +--- + +Instructions for installing Apache Fluo and starting a Fluo application on a cluster where +Accumulo, Hadoop & Zookeeper are running. If you need help setting up these dependencies, see the +[related projects page][related] for external projects that may help. + +## Requirements + +Before you install Fluo, the following software must be installed and running on your local machine +or cluster: + +| Software | Recommended Version | Minimum Version | +|-------------|---------------------|-----------------| +| [Accumulo] | 1.7.2 | 1.6.1 | +| [Hadoop] | 2.7.2 | 2.6.0 | +| [Zookeeper] | 3.4.8 | | +| [Java] | JDK 8 | JDK 8 | + +## Obtain a distribution + +Before you can install Fluo, you will need to obtain a distribution tarball. It is recommended that +you download the [latest release][release]. You can also build a distribution from the master +branch by following these steps which create a tarball in `modules/distribution/target`: + + git clone https://github.com/apache/fluo.git + cd fluo/ + mvn package + +## Install Fluo + +After you obtain a Fluo distribution tarball, follow these steps to install Fluo. + +1. Choose a directory with plenty of space and untar the distribution: + + tar -xvzf fluo-1.2.0-bin.tar.gz + cd fluo-1.2.0 + + The distribution contains a `fluo` script in `bin/` that administers Fluo and the + following configuration files in `conf/`: + + | Configuration file | Description | + |------------------------------|----------------------------------------------------------------------------------------------| + | [fluo-env.sh] | Configures classpath for `fluo` script. Required for all commands. 
| + | [fluo-conn.properties] | Configures connection to Fluo. Required for all commands. | + | [fluo-app.properties] | Template for configuration file passed to `fluo init` when initializing Fluo application. | + | [log4j.properties] | Configures logging | + | [fluo.properties.deprecated] | Deprecated Fluo configuration file. Replaced by fluo-conn.properties and fluo-app.properties | + +2. Configure [fluo-env.sh] to set up your classpath using jars from the versions of Hadoop, Accumulo, and +Zookeeper that you are using. Choose one of the two ways below to make these jars available to Fluo: + + * Set `HADOOP_PREFIX`, `ACCUMULO_HOME`, and `ZOOKEEPER_HOME` in your environment or configure + these variables in [fluo-env.sh]. Fluo will look in these locations for jars. + * Run `./lib/fetch.sh ahz` to download Hadoop, Accumulo, and Zookeeper jars to `lib/ahz` and + configure [fluo-env.sh] to look in this directory. By default, this command will download the + default versions set in [lib/ahz/pom.xml]. If you are not using the default versions, you can + override them: + + ./lib/fetch.sh ahz -Daccumulo.version=1.7.2 -Dhadoop.version=2.7.2 -Dzookeeper.version=3.4.8 + +3. Fluo needs more dependencies than what is available from Hadoop, Accumulo, and Zookeeper. These + extra dependencies need to be downloaded to `lib/` using the command below: + + ./lib/fetch.sh extra + +You are now ready to use the `fluo` script. + +## Fluo command script + +The Fluo command script is located at `bin/fluo` of your Fluo installation. All Fluo commands are +invoked by this script. + +Modify and add the following to your `~/.bashrc` if you want to be able to execute the fluo script +from any directory: + + export PATH=/path/to/fluo-1.2.0/bin:$PATH + +Source your `.bashrc` for the changes to take effect and test the script + + source ~/.bashrc + fluo + +Running the script without any arguments prints a description of all commands. 
+ + ./bin/fluo + +## Tuning Accumulo + +Fluo will reread the same data frequently when it checks conditions on mutations. When Fluo +initializes a table, it enables data caching to make this more efficient. However, you may need to +increase the amount of memory available for caching in the tserver by increasing +`tserver.cache.data.size`. Increasing this may require increasing the maximum tserver Java heap size +in `accumulo-env.sh`. + +Fluo runs many client threads, so you will want to ensure the tablet server has enough threads. You +should probably increase the `tserver.server.threads.minimum` Accumulo setting. + +Using at least Accumulo 1.6.1 is recommended because it fixed multiple performance bugs. + +[Accumulo]: https://accumulo.apache.org/ +[Hadoop]: http://hadoop.apache.org/ +[Zookeeper]: http://zookeeper.apache.org/ +[Java]: http://openjdk.java.net/ +[related]: https://fluo.apache.org/related-projects/ +[release]: https://fluo.apache.org/download/ +[fluo-conn.properties]: {{ page.github_base }}/modules/distribution/src/main/config/fluo-conn.properties +[fluo-app.properties]: {{ page.github_base }}/modules/distribution/src/main/config/fluo-app.properties +[log4j.properties]: {{ page.github_base }}/modules/distribution/src/main/config/log4j.properties +[fluo.properties.deprecated]: {{ page.github_base }}/modules/distribution/src/main/config/fluo.properties.deprecated +[fluo-env.sh]: {{ page.github_base }}/modules/distribution/src/main/config/fluo-env.sh +[lib/ahz/pom.xml]: {{ page.github_base }}/modules/distribution/src/main/lib/ahz/pom.xml diff --git a/_fluo-docs-1-2/index.md b/_fluo-docs-1-2/index.md new file mode 100644 index 0000000..349b42d --- /dev/null +++ b/_fluo-docs-1-2/index.md @@ -0,0 +1,4 @@ +--- +title: Apache Fluo documentation +redirect_to: getting-started/design +--- diff --git a/_layouts/default.html b/_layouts/default.html index 15f5342..dcd034c 100644 --- a/_layouts/default.html +++ b/_layouts/default.html @@ -34,7 +34,13 @@ <ul
class="navbar-nav nav"> <li><a href="{{ site.baseurl }}/release/">Releases</a></li> <li><a href="{{ site.baseurl }}/tour/">Tour</a></li> - <li><a href="{{ site.baseurl }}/docs/">Docs</a></li> + <li class="dropdown"> + <a class="dropdown-toggle" data-toggle="dropdown" href="#">Docs<span class="caret"></span></a> + <ul class="dropdown-menu"> + <li><a href="{{ site.baseurl }}/docs/fluo/{{ site.latest_fluo_release }}/">Fluo</a></li> + <li><a href="{{ site.baseurl }}/docs/fluo-recipes/{{ site.latest_recipes_release }}/">Fluo Recipes</a></li> + </ul> + </li> <li><a href="{{ site.baseurl }}/api/">API</a></li> <li class="dropdown"> <a class="dropdown-toggle" data-toggle="dropdown" href="#">Community<span class="caret"></span></a> diff --git a/_layouts/fluo-docs-1.2.html b/_layouts/fluo-docs-1.2.html new file mode 100644 index 0000000..c53ffd3 --- /dev/null +++ b/_layouts/fluo-docs-1.2.html @@ -0,0 +1,55 @@ +--- +layout: default +--- + +<div class="row"> + <div class="col-md-2"> + <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true" data-spy="affix"> + <div class="panel panel-default"> + {% assign mydocs = site.fluo-docs-1-2 | group_by: 'category' %} + {% assign categories = "getting-started,development,administration" | split: "," %} + {% for pcat in categories %} + {% for dcat in mydocs %} + {% if pcat == dcat.name %} + <div class="panel-heading" role="tab" id="heading{{ pcat }}"> + <h4 class="panel-title"> + <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapse{{ pcat }}" aria-expanded="{% if pcat == page.category %}true{% else %}false{% endif %}" aria-controls="collapse{{ pcat }}"> + {{ pcat | capitalize | replace: "-", " " }} + </a> + </h4> + </div> + <div id="collapse{{pcat}}" class="panel-collapse collapse{% if pcat == page.category %} in{% endif %}" role="tabpanel" aria-labelledby="heading{{ pcat }}"> + <div class="panel-body"> + {% assign items = dcat.items | sort: 'order' %} + {% for item in items %} + <div class="row 
doc-sidebar-link"><a href="{{ item.url }}">{{ item.title }}</a></div> + {% endfor %} + </div> + </div> + {% endif %} + {% endfor %} + {% endfor %} + </div> + </div> + </div> + <div class="col-md-10"> + {% if page.category %} + <p>Fluo {{ page.version }} documentation >> {{ page.category | capitalize | replace: "-", " " }} >> {{ page.title }}</p> + {% endif %} + + <div class="alert alert-danger" style="margin-bottom: 0px;" role="alert">This documentation is for a future release of Fluo! <a href="{{ site.baseurl }}/docs/fluo/{{ site.latest_fluo_release }}/">View documentation for the latest release</a>.</div> + + {% unless page.nodoctitle %} + <div class="row"> + <div class="col-md-10"><h1>{{ page.title }}</h1></div> + <div class="col-md-2"><a class="pull-right" style="margin-top: 25px;" href="https://github.com/apache/fluo-website/edit/master/{{ page.path }}" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + {% endunless %} + {{ content }} + + <div class="row" style="margin-top: 20px;"> + <div class="col-md-10"><strong>Find documentation for all Fluo releases in the <a href="{{ site.baseurl }}/docs/">archive</a></strong></div> + <div class="col-md-2"><a class="pull-right" href="https://github.com/apache/fluo-website/edit/master/{{ page.path }}" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div> + </div> + </div> +</div> diff --git a/css/fluo.scss b/css/fluo.scss index 150608e..6e9d7fe 100644 --- a/css/fluo.scss +++ b/css/fluo.scss @@ -30,6 +30,12 @@ code { margin-left: 0px; } +.doc-sidebar-link { + font-size: 14px; + margin-bottom: 10px; + margin-left: 0px; +} + tr:nth-child(even) {background: #F3F3F3} tr:nth-child(odd) {background: #FFF} diff --git a/docs/fluo/1.0.0-incubating/index.md b/docs/fluo/1.0.0-incubating/index.md index 4373f85..9ecbe94 100644 --- a/docs/fluo/1.0.0-incubating/index.md +++ b/docs/fluo/1.0.0-incubating/index.md @@ -33,6 +33,8 @@ Below are 
helpful resources for Fluo application developers: * [Metrics] - Fluo metrics are visible via JMX by default but can be configured to send to Graphite or Ganglia +**Find documentation for all Fluo releases in the [archive](/docs/)**. + [related]: /related-projects/ [tour]: /tour/ [accumulo]: https://accumulo.apache.org diff --git a/docs/fluo/1.1.0-incubating/index.md b/docs/fluo/1.1.0-incubating/index.md index 9f5e520..18914d7 100644 --- a/docs/fluo/1.1.0-incubating/index.md +++ b/docs/fluo/1.1.0-incubating/index.md @@ -33,6 +33,8 @@ Below are helpful resources for Fluo application developers: * [Metrics] - Fluo metrics are visible via JMX by default but can be configured to send to Graphite or Ganglia +**Find documentation for all Fluo releases in the [archive](/docs/)**. + [fluo]: https://fluo.apache.org/ [related]: https://fluo.apache.org/related-projects/ [tour]: https://fluo.apache.org/tour/ diff --git a/docs/index.md b/docs/index.md index b7ace5f..9e718e5 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,13 +1,11 @@ --- layout: page -title: Documentation +title: Documentation Archive redirect_from: - /docs/fluo/ - /docs/fluo-recipes/ --- -For a general overview of Fluo, take the [Fluo tour](/tour/). - [Apache Fluo] and [Apache Fluo Recipes] have separate documentation as they are different repositories with their own release cycle. ### Apache Fluo documentation diff --git a/resources/docs/fluo-architecture.odg b/resources/docs/fluo-architecture.odg new file mode 100644 index 0000000..fb2a9ad Binary files /dev/null and b/resources/docs/fluo-architecture.odg differ diff --git a/resources/docs/fluo-architecture.png b/resources/docs/fluo-architecture.png new file mode 100644 index 0000000..3ba96fd Binary files /dev/null and b/resources/docs/fluo-architecture.png differ -- To stop receiving notification emails like this one, please contact ['"commits@fluo.apache.org" <commits@fluo.apache.org>'].