incubator-rya git commit: RYA-342 RYA-321 Added documentation on the shell and PCJ Updater. Closes #226.

caleb Fri, 29 Sep 2017 13:13:14 -0700

Repository: incubator-rya
Updated Branches:
  refs/heads/master 33ef52cbb -> 5db4c8234



RYA-342 RYA-321 Added documentation on the shell and PCJ Updater. Closes #226.


Project: http://git-wip-us.apache.org/repos/asf/incubator-rya/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-rya/commit/5db4c823
Tree: http://git-wip-us.apache.org/repos/asf/incubator-rya/tree/5db4c823
Diff: http://git-wip-us.apache.org/repos/asf/incubator-rya/diff/5db4c823

Branch: refs/heads/master
Commit: 5db4c8234d9ec6d102dac4587394d37fef0a9246
Parents: 33ef52c
Author: jdasch <[email protected]>
Authored: Mon Sep 11 14:47:12 2017 -0400
Committer: Caleb Meier <[email protected]>
Committed: Fri Sep 29 13:11:25 2017 -0700

----------------------------------------------------------------------
 README.md                                       |   2 +-
 extras/rya.manual/src/site/markdown/_index.md   |   1 -
 extras/rya.manual/src/site/markdown/index.md    |   3 +-
 .../rya.manual/src/site/markdown/pcj-updater.md | 513 +++++++++++++++++++
 extras/rya.manual/src/site/markdown/shell.md    | 334 ++++++++++++
 extras/rya.manual/src/site/site.xml             |   4 +-
 6 files changed, 853 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-rya/blob/5db4c823/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 640d1b6..45399de 100644
--- a/README.md
+++ b/README.md
@@ -339,7 +339,7 @@ myRepository.shutDown();
 ```
 
 
-[RYA]: http://rya.incubator.apache.org/ 
+[Apache Rya]: http://rya.incubator.apache.org/ 
 [Accumulo]: https://accumulo.apache.org/
 [ZooKeeper]: https://zookeeper.apache.org/
 [Hadoop]: http://hadoop.apache.org/

http://git-wip-us.apache.org/repos/asf/incubator-rya/blob/5db4c823/extras/rya.manual/src/site/markdown/_index.md
----------------------------------------------------------------------
diff --git a/extras/rya.manual/src/site/markdown/_index.md b/extras/rya.manual/src/site/markdown/_index.md
index 9e682a5..901170a 100644
--- a/extras/rya.manual/src/site/markdown/_index.md
+++ b/extras/rya.manual/src/site/markdown/_index.md
@@ -45,4 +45,3 @@ This project contains documentation about Apache Rya, a scalable RDF triple stor
 
 # Development
 - [Building From Source](build-source.md)
-- [LTS Maven Settings XML](maven-settings.md)

http://git-wip-us.apache.org/repos/asf/incubator-rya/blob/5db4c823/extras/rya.manual/src/site/markdown/index.md
----------------------------------------------------------------------
diff --git a/extras/rya.manual/src/site/markdown/index.md b/extras/rya.manual/src/site/markdown/index.md
index 9e682a5..4e009a0 100644
--- a/extras/rya.manual/src/site/markdown/index.md
+++ b/extras/rya.manual/src/site/markdown/index.md
@@ -32,6 +32,8 @@ This project contains documentation about Apache Rya, a scalable RDF triple stor
 - [Pre-computed Joins](loadPrecomputedJoin.md)
 - [Inferencing](infer.md)
 - [MapReduce Interface](mapreduce.md)
+- [Shell Interface](shell.md)
+- [Incremental Join Maintenance Application (PCJ Updater)](pcj-updater.md)
 
 # Samples
 - [Typical First Steps](sm-firststeps.md)
@@ -45,4 +47,3 @@ This project contains documentation about Apache Rya, a scalable RDF triple stor
 
 # Development
 - [Building From Source](build-source.md)
-- [LTS Maven Settings XML](maven-settings.md)

http://git-wip-us.apache.org/repos/asf/incubator-rya/blob/5db4c823/extras/rya.manual/src/site/markdown/pcj-updater.md
----------------------------------------------------------------------
diff --git a/extras/rya.manual/src/site/markdown/pcj-updater.md b/extras/rya.manual/src/site/markdown/pcj-updater.md
new file mode 100644
index 0000000..8f3c27f
--- /dev/null
+++ b/extras/rya.manual/src/site/markdown/pcj-updater.md
@@ -0,0 +1,513 @@
+<!--
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+-->
+# Incremental Join Maintenance Application (PCJ Updater)
+
+The Apache Rya `rya.pcj.fluo.app` project contains an [Apache Fluo] Incremental
+Join Maintenance Application (PCJ Updater).  
+
+The [Rya Shell Interface](shell.md) provides a command line utility for the 
+registration of new persisted queries within the Rya-Fluo incremental join 
+maintenance application.  This section provides instructions on setting up the 
+maintenance application on a distributed Apache Hadoop YARN execution 
+environment with Apache Accumulo.
+
+## Installation of Fluo and the Rya PCJ Updater Application
+
+There are a number of steps required to ensure that both Fluo and the Rya PCJ 
+Updater Application are configured correctly for the target execution 
+environment.
+
+### 1. Fluo Installation
+
+To install the `rya.pcj.fluo.app`, it is necessary to download the Apache Fluo 
+1.0.0-incubating release.
+
+```sh
+wget https://www.apache.org/dist/incubator/fluo/fluo/1.0.0-incubating/fluo-1.0.0-incubating-bin.tar.gz
+tar xzvf fluo-1.0.0-incubating-bin.tar.gz
+```
+
+
+
+### 2. Fluo Configuration
+Below is an abridged version of instructions for configuring Fluo to work with 
+Rya.  For complete installation instructions, see the 
+[Apache Fluo 1.0.0-incubating Documentation].
+
+``` sh
+cd fluo-1.0.0-incubating
+
+# copy the example properties to the conf directory
+cp conf/examples/* conf/
+
+# edit the base fluo properties file which is used for new applications
+vi conf/fluo.properties
+```
+
+The following properties in the `conf/fluo.properties` file should be
+uncommented and populated with appropriate values for your 
+Accumulo/Hadoop (YARN)/Zookeeper execution environment:
+
+```
+fluo.client.zookeeper.connect=${fluo.client.accumulo.zookeepers}/fluo
+fluo.client.accumulo.instance=<accumulo instance name>
+fluo.client.accumulo.user=<accumulo user name>
+fluo.client.accumulo.password=<accumulo user password>
+fluo.client.accumulo.zookeepers=<your zookeeper connect string>
+fluo.admin.hdfs.root=hdfs://<your hdfs host name>:8020
+```
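+
+As an illustration only, a populated version of these properties might look
+like the following.  Every value shown (instance name, user name, password, and
+host names) is a hypothetical placeholder and must be replaced with the values
+for your own environment:
+
+```
+# Hypothetical example values; substitute your own
+fluo.client.zookeeper.connect=${fluo.client.accumulo.zookeepers}/fluo
+fluo.client.accumulo.instance=myAccumuloInstance
+fluo.client.accumulo.user=myUserName
+fluo.client.accumulo.password=myPassword
+fluo.client.accumulo.zookeepers=zoo1:2181,zoo2:2181,zoo3:2181
+fluo.admin.hdfs.root=hdfs://namenode.example.com:8020
+```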
+
+### 3. Fluo Classpath Configuration
+Fluo defers realization of dependencies until as late as possible.  You can 
+either download dependencies from the internet, or install on a system that 
+already has the dependencies installed on it.  Regardless of approach taken,
+the `fluo-1.0.0-incubating/conf/fluo-env.sh` file will need to be tailored to
+your execution environment.  See the 
+[Apache Fluo 1.0.0-incubating Install Instructions] for more information.
+
+The following instructions go through the steps of downloading dependencies 
+from the internet.  Note that you will still need a system with the correct 
+version of Hadoop installed on it, as `bin/fluo` requires the 
+`$HADOOP_PREFIX/bin/hdfs` command to be available.
+
+``` sh
+# If using a vendor's distribution of Hadoop, edit lib/ahz/pom.xml to specify
+# the vendor's Maven repository, for example:
+#
+#    <repositories>
+#        <repository>
+#            <id>vendor</id>
+#            <url>https://repository.vendor.com/content/repositories/releases/</url>
+#        </repository>
+#    </repositories>
+vi lib/ahz/pom.xml
+./lib/fetch.sh ahz -Daccumulo.version=1.7.3 -Dhadoop.version=2.6.0-vendor5.8.5 -Dzookeeper.version=3.4.5-vendor5.8.5
+
+# Otherwise fetch the desired Apache release versions of Accumulo, Hadoop, and Zookeeper
+./lib/fetch.sh ahz -Daccumulo.version=1.7.3 -Dhadoop.version=2.6.5 -Dzookeeper.version=3.4.6
+
+# Then fetch the remaining Fluo dependencies
+./lib/fetch.sh extra
+```
+
+Next, it is necessary to update the `fluo-1.0.0-incubating/conf/fluo-env.sh` 
+file to use the locally downloaded libraries.
+
+```
+vi conf/fluo-env.sh
+```
+
+The listing below highlights a few modifications that may need to be made to 
+`fluo-env.sh` to adapt it to your system:
+
+1) Define a value for the environment variable `HADOOP_PREFIX` if it is not
+   already set.  The correct value depends on your system configuration and 
+   could be `/usr`, `/usr/lib/hadoop`, or perhaps another path.
+2) Depending on the value used for `HADOOP_PREFIX`, which may or may not 
+   include a directory for `$HADOOP_PREFIX/etc/hadoop`, it may be necessary to 
+   modify the shell variable `CLASSPATH` to include the Hadoop configuration 
+   directory.  In the following listing, we append the directory 
+   `/etc/hadoop/conf` to the `CLASSPATH`.
+3) Uncomment the call to `setupClasspathFromLib` and comment out the call to 
+   `setupClasspathFromSystem`.
+
+```sh
+# Sets HADOOP_PREFIX if it is not already set.  Please modify the
+# export statement to use the correct directory.  Remove the test
+# statement to override any previously set environment.
+
+#test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/path/to/hadoop
+test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/usr
+
+#
+# ...
+#
+
+# This function obtains Accumulo, Hadoop, and Zookeeper jars from
+# $FLUO_HOME/lib/ahz/. Before using this function, make sure you run
+# `./lib/fetch.sh ahz` to download dependencies to this directory.
+setupClasspathFromLib(){
+  #CLASSPATH="$FLUO_HOME/lib/*:$FLUO_HOME/lib/logback/*:$FLUO_HOME/lib/ahz/*"
+  CLASSPATH="$FLUO_HOME/lib/*:$FLUO_HOME/lib/logback/*:$FLUO_HOME/lib/ahz/*:/etc/hadoop/conf"
+}
+
+# Call one of the following functions to setup the classpath or write your own
+# bash code to setup the classpath for Fluo. You must also run the command
+# `./lib/fetch.sh extra` to download extra Fluo dependencies before using Fluo.
+
+#setupClasspathFromSystem
+setupClasspathFromLib
+```
+As discussed above, Fluo requires some hadoop configuration files to be
+accessible, either in the `$HADOOP_PREFIX/etc/hadoop` directory, or on the 
+classpath.  The requirements for these configuration files are system specific,
+and it is recommended that they be copied from the target system.  However, if
+configuring manually, the required files `core-site.xml` and 
+`yarn-site.xml` should have at a minimum the following properties configured.
+
+In the file `core-site.xml`:
+
+```
+  <property>
+    <name>fs.defaultFS</name>
+    <value>hdfs://[your hdfs host name]:8020</value>
+  </property>
+```
+
+In the file `yarn-site.xml`:
+
+```
+  <property>
+    <name>yarn.resourcemanager.hostname</name>
+    <value>[your yarn resourcemanager hostname]</value>
+  </property>
+```
+
+### 4. Create and Configure a New Fluo App for the Rya PCJ Updater
+
+Now that Fluo has been configured to work with your target 
+Accumulo/Hadoop/Zookeeper execution environment, it is time to specify a Fluo 
+App definition for the Rya Incremental Join Maintenance Application (PCJ 
+Updater).
+
+Note that this documentation refers to this Fluo App with the 
+fluoApplicationId `rya_pcj_updater`, but the current convention is for the 
+fluoApplicationId to extend the Rya instance name.  For example, if the Rya 
+instance is `my_rya_instance_`, then the recommended fluoApplicationId would 
+be `my_rya_instance_pcj_updater`.
+
+The `bin/fluo new <fluoApplicationId>` command uses the base 
+`fluo-1.0.0-incubating/conf/fluo.properties` file that was configured earlier 
+in this guide as a template for this Fluo Application.
+
+```sh
+# Create the new Fluo Application
+bin/fluo new rya_pcj_updater
+
+# Edit the Fluo Application Configuration
+vi apps/rya_pcj_updater/conf/fluo.properties
+```
+
+Add the following entries under Observer properties in the 
+`apps/rya_pcj_updater/conf/fluo.properties` file.
+
+```
+# Observer properties
+# -------------------
+# Specifies observers
+# fluo.observer.0=com.foo.Observer1
+# Can optionally have configuration key values
+# fluo.observer.1=com.foo.Observer2,configKey1=configVal1,configKey2=configVal2
+fluo.observer.0=org.apache.rya.indexing.pcj.fluo.app.observers.TripleObserver
+fluo.observer.1=org.apache.rya.indexing.pcj.fluo.app.observers.StatementPatternObserver
+fluo.observer.2=org.apache.rya.indexing.pcj.fluo.app.observers.JoinObserver
+fluo.observer.3=org.apache.rya.indexing.pcj.fluo.app.observers.FilterObserver
+fluo.observer.4=org.apache.rya.indexing.pcj.fluo.app.observers.AggregationObserver
+fluo.observer.5=org.apache.rya.indexing.pcj.fluo.app.observers.ProjectionObserver
+#fluo.observer.5=org.apache.rya.indexing.pcj.fluo.app.observers.ConstructQueryResultObserver
+fluo.observer.6=org.apache.rya.indexing.pcj.fluo.app.observers.QueryResultObserver,pcj.fluo.export.rya.enabled=true,pcj.fluo.export.rya.ryaInstanceName=rya_,pcj.fluo.export.rya.accumuloInstanceName=myAccumuloInstance,pcj.fluo.export.rya.zookeeperServers=zoo1;zoo2;zoo3,pcj.fluo.export.rya.exporterUsername=myUserName,pcj.fluo.export.rya.exporterPassword=myPassword,pcj.fluo.export.kafka.enabled=true,bootstrap.servers=myKafkaBroker:9092,key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer,value.serializer=org.apache.rya.indexing.pcj.fluo.app.export.kafka.KryoVisibilityBindingSetSerializer
+```
+
+Description of configuration keys for the 
+`org.apache.rya.indexing.pcj.fluo.app.observers.QueryResultObserver`:
+
+Key                                      | Description
+---------------------------------------- | -------------
+pcj.fluo.export.rya.enabled              | If true, `pcj.fluo.export.rya.*` prefixed properties will be used for exporting query results to Rya. If false, they are ignored and can be omitted.
+pcj.fluo.export.rya.ryaInstanceName      | The Rya instance (e.g., `my_rya_instance_`) this PCJ Updater app should be exporting to.
+pcj.fluo.export.rya.accumuloInstanceName | The Accumulo instance that is hosting the specified Rya instance.
+pcj.fluo.export.rya.zookeeperServers     | The Zookeeper connect string for the Zookeepers that are used by the Accumulo instance that is hosting the specified Rya instance.  Note, the `host:port` values are separated by semi-colons instead of the traditional commas.
+pcj.fluo.export.rya.exporterUsername     | The Accumulo username to be used for the Rya export operation.
+pcj.fluo.export.rya.exporterPassword     | The Accumulo password to be used for the Rya export operation.
+pcj.fluo.export.kafka.enabled            | If true, the `bootstrap.servers`, `key.serializer`, and `value.serializer` properties will be used for exporting query results to Kafka. If false, they are ignored and can be omitted.
+bootstrap.servers                        | A `hostname:port` string specifying a Kafka broker.  Note, multiple bootstrap servers are not currently supported.
+key.serializer                           | The Kafka serializer class that should be used for keys published to the query result topic.  Default value: `org.apache.kafka.common.serialization.ByteArraySerializer`.
+value.serializer                         | The Kafka serializer class that should be used for values published to the query result topic.  Default value: `org.apache.rya.indexing.pcj.fluo.app.export.kafka.KryoVisibilityBindingSetSerializer`.
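+
+For example, if query results only need to be exported to Rya, the table above
+indicates that the Kafka export can be disabled and its properties omitted.
+The following is a sketch of such an entry, reusing the placeholder values 
+from the earlier listing (`rya_`, `myAccumuloInstance`, `zoo1;zoo2;zoo3`, 
+`myUserName`, `myPassword`); it has not been validated against a running 
+cluster:
+
+```
+# Export to Rya only; Kafka export disabled (placeholder values)
+fluo.observer.6=org.apache.rya.indexing.pcj.fluo.app.observers.QueryResultObserver,pcj.fluo.export.rya.enabled=true,pcj.fluo.export.rya.ryaInstanceName=rya_,pcj.fluo.export.rya.accumuloInstanceName=myAccumuloInstance,pcj.fluo.export.rya.zookeeperServers=zoo1;zoo2;zoo3,pcj.fluo.export.rya.exporterUsername=myUserName,pcj.fluo.export.rya.exporterPassword=myPassword,pcj.fluo.export.kafka.enabled=false
+```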
+
+Depending on the workload, it may be necessary to increase the resources of a 
+Fluo worker's YARN container, or to distribute the Observers defined in the 
+listing above into multiple Fluo workers that are located in multiple YARN 
+containers to scale performance.  The following table contains descriptions of 
+relevant properties in the `YARN properties` section of the `fluo.properties` 
+file that can be tailored.
+
+Key                            | Description
+-------------------------------| -------------
+fluo.yarn.worker.instances     | Defines the number of YARN containers used for executing Observers.  Allows for scaling out.
+fluo.yarn.worker.max.memory.mb | Defines the amount of memory in Megabytes that should be allocated to a worker's YARN container.  Allows for scaling up.
+fluo.yarn.worker.num.cores     | Defines the number of CPUs that should be allocated to a worker's YARN container.  Allows for scaling up.
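+
+As a rough sketch, scaling out to several larger workers might be expressed
+with entries like the following.  The values are hypothetical and chosen only
+for illustration; appropriate settings depend on your workload and cluster:
+
+```
+# YARN properties (illustrative values only)
+fluo.yarn.worker.instances=3
+fluo.yarn.worker.max.memory.mb=4096
+fluo.yarn.worker.num.cores=2
+```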
+
+
+### 5. Stage the Rya PCJ Updater Fluo App Jar
+
+The Rya PCJ Updater Fluo App is packaged as a special uber jar that contains a 
+subset of dependencies.  This jar is represented by the Maven coordinate 
+`org.apache.rya:rya.pcj.fluo.app:3.2.11-incubating:fluo-app` and, when Rya is 
+built from source, it can be found here:
+`rya/extras/rya.pcj.fluo/pcj.fluo.app/target/rya.pcj.fluo.app-3.2.11-incubating-fluo-app.jar`.
+
+The Rya fluo-app jar needs to be copied to Fluo here: 
+`fluo-1.0.0-incubating/apps/rya_pcj_updater/lib/rya.pcj.fluo.app-3.2.11-incubating-fluo-app.jar`
+
+
+### 6. Initialize the Rya PCJ Updater Fluo App
+
+The initialization step creates entries in the Zookeeper cluster for this Fluo 
+application.
+
+This step also copies the Fluo jars over to HDFS so Accumulo tablet servers can
+access custom Fluo iterators.
+
+```sh
+bin/fluo init rya_pcj_updater
+```
+
+
+### 7. Create the Rya instance for this Rya PCJ Updater
+
+The [Rya Shell Interface](shell.md) provides an interface to create Rya 
+instances.  See that documentation for more information on the shell.
+
+To create and connect to a Rya instance that is configured to use a PCJ 
+Updater, use the following commands in the Rya shell:
+
+```
+$ rya
+
+ _____                _____ _          _ _
+|  __ \              / ____| |        | | |
+| |__) |   _  __ _  | (___ | |__   ___| | |
+|  _  / | | |/ _` |  \___ \| '_ \ / _ \ | |
+| | \ \ |_| | (_| |  ____) | | | |  __/ | |
+|_|  \_\__, |\__,_| |_____/|_| |_|\___|_|_|
+        __/ |
+       |___/
+3.2.11-incubating
+
+Welcome to the Rya Shell.
+
+Execute one of the connect commands to start interacting with an instance of Rya.
+You may press tab at any time to see which of the commands are available.
+rya>
+rya> connect-accumulo --username myUserName --instanceName myAccumuloInstance --zookeepers zoo1,zoo2,zoo3
+Password: *********
+Connected. You must select a Rya instance to interact with next.
+rya/myAccumuloInstance> install-with-parameters --instanceName rya_ --enablePcjIndex --fluoPcjAppName rya_pcj_updater
+
+A Rya instance will be installed using the following values:
+   Instance Name: rya_
+   Use Shard Balancing: false
+   Use Entity Centric Indexing: false
+   Use Free Text Indexing: false
+   Use Geospatial Indexing: false
+   Use Temporal Indexing: false
+   Use Precomputed Join Indexing: true
+   PCJ Updater Fluo Application Name: rya_pcj_updater
+
+Continue with the install? (y/n) y
+The Rya instance named 'rya_' has been installed.
+rya/myAccumuloInstance> connect-rya --instance rya_
+rya/myAccumuloInstance:rya_>
+
+```
+
+
+### 8. Start the Rya PCJ Updater Fluo App
+
+Now that the Rya instance has been created, issue the following command to 
+start the Rya PCJ Updater on YARN:
+
+```sh
+bin/fluo start rya_pcj_updater
+```
+
+### 9.  Creating and Deleting PCJ Queries
+
+Once the PCJ Updater app has been started, it is possible to register and 
+unregister SPARQL queries with it using the `create-pcj` and `delete-pcj` Rya 
+shell commands.  Details on registered PCJ queries can be viewed using the 
+`print-instance-details` Rya shell command.  See the 
+[Rya Shell Interface](shell.md) documentation for more information on this 
+step.
+
+
+### 10.  Stop the Rya PCJ Updater Fluo App
+
+To stop the Rya PCJ Updater on YARN, issue the following command:
+
+```sh
+bin/fluo stop rya_pcj_updater
+```
+
+## Troubleshooting
+
+### Notification Latency
+
+Fluo employs a scan backoff that dynamically adjusts the scan interval between 
+a minimum and maximum delay to reduce the amount of scanning overhead if the
+database becomes idle with no modifications.  This reduced overhead comes with 
+a cost of increased latency for an initial notification on an idle database.
+
+There are two internal fluo properties (`fluo.implScanTask.minSleep` and 
+`fluo.implScanTask.maxSleep`, both in milliseconds) that can be modified to 
+tailor the scanning overhead and maximum initial notification latency for your
+use case.
+
+For the scenario where a database tends to be active and frequently modified, 
+scan latency will largely be influenced by the property 
+`fluo.implScanTask.minSleep`, which has a default value of 5 seconds.
+
+For the scenario where a database tends to be idle and infrequently modified, 
+scan latency will largely be influenced by the property 
+`fluo.implScanTask.maxSleep`, which has a default value of 5 minutes.
+
+To configure these settings, modify your Fluo Application's 
+`fluo-1.0.0-incubating/apps/rya_pcj_updater/conf/fluo.properties` file to 
+contain the following section and tailor the values for your use case:
+
+```
+# Fluo Internal Implementation Properties (Not part of public API)
+# ----------------------------------------------------------------
+# fluo.implScanTask.minSleep default value is 5000ms (5 seconds)
+fluo.implScanTask.minSleep = 5000
+# fluo.implScanTask.maxSleep default value is 300000ms (5 minutes)
+fluo.implScanTask.maxSleep = 300000
+```
+
+
+### VFS Classloader and Fluo Iterators
+
+Accumulo may generate warnings that the Apache Commons VFS classloader cannot 
+find Fluo jars on HDFS, or that Accumulo is unable to find Fluo iterators. 
+There are typically two reasons why this occurs: HDFS Accessibility or the 
+Accumulo VFS Cache Dir.
+
+#### HDFS Accessibility
+This occurs when the Fluo jars `fluo-api-1.0.0-incubating.jar` and 
+`fluo-accumulo-1.0.0-incubating.jar` were not copied to HDFS, or were copied 
+with permissions that make them inaccessible to the Accumulo tablet servers.  
+Verify the property `fluo.admin.accumulo.classpath` in 
+`fluo-1.0.0-incubating/apps/rya_pcj_updater/conf/fluo.properties` is correct.
+The default value is typically adequate: 
+
+```
+fluo.admin.accumulo.classpath=${fluo.admin.hdfs.root}/fluo/lib/fluo-api-1.0.0-incubating.jar,${fluo.admin.hdfs.root}/fluo/lib/fluo-accumulo-1.0.0-incubating.jar
+```
+It is possible to verify that the correct Fluo iterators are installed for the
+table by running this command in the Accumulo shell:
+`config -t rya_pcj_updater -f iterators`.
+
+#### Accumulo VFS Cache Dir
+The configuration of `accumulo/conf/accumulo-site.xml` needs to be updated to
+explicitly include a definition for the property `general.vfs.cache.dir`.  
+Depending on system configuration, `/tmp` or `/var/lib/accumulo` may be 
+appropriate values.  The Accumulo tablet servers need to be restarted to pick
+up the new property.  An example entry is listed below:
+  
+```
+<property>
+  <name>general.vfs.cache.dir</name>
+  <value>/var/lib/accumulo</value>
+  <description>Directory to use for the vfs cache. The cache will keep a soft 
+  reference to all of the classes loaded in the VM. This should be on local 
+  disk on each node with sufficient space. It defaults to /tmp and will use a 
+  directory with the format "accumulo-vfs-cache-" + 
+  System.getProperty("user.name","nouser")</description>
+</property>
+```
+
+### Blocked Ports
+If the YARN NodeManagers in your cluster have firewalls enabled, it will be
+necessary to specify and open a dedicated port for the Fluo Oracle YARN 
+container. The Oracle is a mandatory component of every Fluo Application.
+
+To specify the port, modify your Fluo Application's 
+`fluo-1.0.0-incubating/apps/rya_pcj_updater/conf/fluo.properties` file to 
+contain the following section:
+
+```
+# Fluo Internal Implementation Properties (Not part of public API)
+# ----------------------------------------------------------------
+# The Fluo Oracle uses a random free port by default.  Specify a port
+# here and open it on the firewall of all potential YARN NodeManagers.
+fluo.impl.oracle.port=[port number]
+```
+
+Fluo's underlying [Apache Twill] version does not support assignment of a port 
+or port range to the Resource Manager's Tracking URL.  As a result, it is 
+always assigned to a random free port on a NodeManager.  This makes it 
+impossible to use some of Fluo's administrative functionality 
+(for example, `bin/fluo stop rya_pcj_updater`) on a cluster where firewalls are
+enabled on the NodeManagers.  Even with this limitation, it is still possible 
+to successfully launch the Rya PCJ Updater app and terminate it when desired.
+
+If your target execution environment has firewalls enabled, the following 
+issues may occur while starting and stopping.
+
+#### Starting Issues
+It is likely that the command `bin/fluo start rya_pcj_updater` 
+will time out while waiting for a ResourceReport from the Twill TrackerService, 
+or that it will log a series of `java.net.NoRouteToHostException` exceptions 
+like those in the following listing:
+
+```
+...
+15:57:39.802 [main] INFO  o.a.f.cluster.runner.YarnAppRunner - Waiting for ResourceReport from Twill. Elapsed time = 10000 ms
+15:57:45.913 [ STARTING] INFO  o.a.h.y.c.api.impl.YarnClientImpl - Submitted application application_1496425295778_0015
+15:57:49.838 [main] INFO  o.a.f.cluster.runner.YarnAppRunner - Waiting for ResourceReport from Twill. Elapsed time = 20000 ms
+15:57:53.434 [main] ERROR o.a.twill.yarn.ResourceReportClient - Exception getting resource report from http://<my-application-master-host>:<random-port>/resources.
+    java.net.NoRouteToHostException: No route to host
+      at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_102]
+      at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[na:1.8.0_102]
+      at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[na:1.8.0_102]
+      at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_102]
+      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_102]
+      at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_102]
+      at java.net.Socket.connect(Socket.java:538) ~[na:1.8.0_102]
+      at sun.net.NetworkClient.doConnect(NetworkClient.java:180) ~[na:1.8.0_102]
+      at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) ~[na:1.8.0_102]
+      at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) ~[na:1.8.0_102]
+      at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) ~[na:1.8.0_102]
+      at sun.net.www.http.HttpClient.New(HttpClient.java:308) ~[na:1.8.0_102]
+      at sun.net.www.http.HttpClient.New(HttpClient.java:326) ~[na:1.8.0_102]
+      at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169) ~[na:1.8.0_102]
+      at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105) ~[na:1.8.0_102]
+      at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999) ~[na:1.8.0_102]
+      at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933) ~[na:1.8.0_102]
+      at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1513) ~[na:1.8.0_102]
+      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_102]
+      at java.net.URL.openStream(URL.java:1045) ~[na:1.8.0_102]
+      at org.apache.twill.yarn.ResourceReportClient.get(ResourceReportClient.java:52) ~[twill-yarn-0.6.0-incubating.jar:0.6.0-incubating]
+      at org.apache.twill.yarn.YarnTwillController.getResourceReport(YarnTwillController.java:303) [twill-yarn-0.6.0-incubating.jar:0.6.0-incubating]
+      at org.apache.fluo.cluster.runner.YarnAppRunner.getResourceReport(YarnAppRunner.java:302) [fluo-cluster-1.0.0-incubating.jar:1.0.0-incubating]
+      at org.apache.fluo.cluster.runner.YarnAppRunner.start(YarnAppRunner.java:232) [fluo-cluster-1.0.0-incubating.jar:1.0.0-incubating]
+      at org.apache.fluo.cluster.command.FluoCommand.main(FluoCommand.java:74) [fluo-cluster-1.0.0-incubating.jar:1.0.0-incubating]
+...
+```
+As long as the application is submitted and is shown to be running in the 
+Hadoop YARN UI for running applications, the Rya PCJ Updater app has likely 
+been started correctly.  To verify, look at the YARN container log files to
+ensure that no unexpected errors occurred.
+
+#### Stopping Issues
+It is likely that the command `bin/fluo stop rya_pcj_updater`
+will fail.  If that occurs, look up the YARN Application-Id in the YARN UI, 
+or with the command `yarn application -list` and then kill it with a command
+similar to: `yarn application -kill application_1503402439867_0009`.
+
+
+[Apache Fluo]: https://fluo.apache.org/
+[Apache Fluo 1.0.0-incubating Documentation]: https://fluo.apache.org/docs/fluo/1.0.0-incubating/
+[Apache Fluo 1.0.0-incubating Install Instructions]: https://fluo.apache.org/docs/fluo/1.0.0-incubating/install/
+[Apache Twill]: http://twill.apache.org/

http://git-wip-us.apache.org/repos/asf/incubator-rya/blob/5db4c823/extras/rya.manual/src/site/markdown/shell.md
----------------------------------------------------------------------
diff --git a/extras/rya.manual/src/site/markdown/shell.md b/extras/rya.manual/src/site/markdown/shell.md
new file mode 100644
index 0000000..f641d13
--- /dev/null
+++ b/extras/rya.manual/src/site/markdown/shell.md
@@ -0,0 +1,334 @@
+<!--
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+-->
+# Shell Interface
+
+The Apache Rya `rya.shell` project contains a client shell application to 
+simplify common interactions with Rya.
+
+## Installation
+
+When building from source, the binary distribution of the Rya Shell is stored
+in the artifact `rya.shell-<version>-bin.tar.gz`.
+
+To install, simply extract the archive to the desired output directory:
+
+``` sh
+tar xzvf rya.shell-3.2.11-incubating-bin.tar.gz
+```
+
+You can optionally install the Rya Shell by adding its `bin` directory to your 
+shell's `PATH`.
+
+``` sh
+$ echo 'export PATH="$PATH:/path/to/rya.shell-3.2.11-incubating/bin"' >> ~/.bash_profile
+```
+
+## Launching, Exiting and Help
+
+```
+# Launch the shell
+$ cd rya.shell-3.2.11-incubating-bin
+$ bin/rya
+
+# Or, if you added the rya shell to your path, you can just type:
+$ rya
+
+ _____                _____ _          _ _ 
+|  __ \              / ____| |        | | |
+| |__) |   _  __ _  | (___ | |__   ___| | |
+|  _  / | | |/ _` |  \___ \| '_ \ / _ \ | |
+| | \ \ |_| | (_| |  ____) | | | |  __/ | |
+|_|  \_\__, |\__,_| |_____/|_| |_|\___|_|_|
+        __/ |                              
+       |___/                               
+3.2.11-incubating
+
+Welcome to the Rya Shell.
+
+Execute one of the connect commands to start interacting with an instance of Rya.
+You may press tab at any time to see which of the commands are available.
+rya> 
+
+```
+To leave the shell, type `exit` or `quit`.
+
+``` sh
+rya> exit
+```
+
+To view a listing of all available commands use the `help` command.
+
+```
+rya> help
+* ! - Allows execution of operating system (OS) commands
+* // - Inline comment markers (start of line only)
+* ; - Inline comment markers (start of line only)
+* add-user - Adds an authorized user to the Rya instance.
+* clear - Clears the console
+* cls - Clears the console
+* connect-accumulo - Connect the shell to an instance of Accumulo.
+* connect-rya - Connect to a specific Rya instance
+* create-pcj - Creates and starts the maintenance of a new PCJ using a Fluo application.
+* date - Displays the local date and time
+* delete-pcj - Deletes and halts maintenance of a PCJ.
+* disconnect - Disconnect the shell's Rya storage connection (Accumulo).
+* exit - Exits the shell
+* help - List all commands usage
+* install - Create a new instance of Rya interactively.
+* install-with-parameters - Create a new instance of Rya with command line parameters.
+* list-instances - List the names of the installed Rya instances.
+* load-data - Loads RDF Statement data from a local file to the connected Rya instance.
+* print-connection-details - Print information about the Shell's Rya storage connection.
+* print-instance-details - Print information about how the Rya instance is configured.
+* quit - Exits the shell
+* remove-user - Removes an authorized user from the Rya instance.
+* script - Parses the specified resource file and executes its commands
+* sparql-query - Executes the provided SPARQL Query on the connected Rya instance.
+* system properties - Shows the shell's properties
+* uninstall - Uninstall an instance of Rya.
+* version - Displays shell version
+```
+
+Appending `help` to a command lists that command's mandatory options:
+
+```
+rya> connect-accumulo help
+You should specify option (--username, --instanceName, --zookeepers) for this command
+```
+
+Running `help` followed by a command name prints complete documentation of
+that command's options:
+
+```
+rya> help connect-accumulo
+Keyword:                   connect-accumulo
+Description:               Connect the shell to an instance of Accumulo.
+ Keyword:                  username
+   Help:                   The username that will be used to connect to Accummulo.
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+ Keyword:                  instanceName
+   Help:                   The name of the Accumulo instance that will be connected to.
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+ Keyword:                  zookeepers
+   Help:                   A comma delimited list of zookeeper server hostnames.
+   Mandatory:              true
+   Default if specified:   '__NULL__'
+   Default if unspecified: '__NULL__'
+
+* connect-accumulo - Connect the shell to an instance of Accumulo.
+```
+
+## Context Sensitive Commands
+
+Some commands may not be available to the user until certain preconditions are 
+met.  For example, you cannot create a Rya instance until you are connected to 
+an Accumulo instance.
+
+Pressing the tab character while at the `rya>` prompt will display the available 
+commands for the current shell context (or state).
+
+Pressing the tab key while typing a command autocompletes the command.
+Subsequent tab key presses then suggest the mandatory options for that
+command.
+
+## Scripting
+
+It is possible to script the Rya Shell by writing multiple commands to a text 
+file and then loading them into the shell with the `script` command:
+
+```
+rya> script --file rya.shell-3.2.11-incubating/examples/example.script
+```
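+
+As a minimal sketch, a script file might contain the lines below.  This is a
+hypothetical file, not the contents of the bundled `examples/example.script`.
+It uses only commands shown elsewhere on this page and assumes the shell is
+already connected to a Rya instance when `script` is run.
+
+```
+// Load sample data, query it, then print the instance configuration.
+load-data --file examples/triples.nt
+sparql-query --file examples/Query1.sparql
+print-instance-details
+```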
+
+## Logs
+
+Logging for the Rya Shell is written to the `rya.shell-3.2.11-incubating/logs`
+directory.  Configuration of the logging is controlled by the 
+`rya.shell-3.2.11-incubating/conf/log4j.properties` file.
+
+## Creating a Rya Instance
+
+Creating a Rya instance first requires making a connection to Accumulo.  See
+the following Rya shell listing:
+
+```
+rya> connect-accumulo --username myUserName --instanceName myAccumuloInstance --zookeepers zoo1,zoo2,zoo3
+Password: *********
+Connected. You must select a Rya instance to interact with next.
+rya/myAccumuloInstance>
+```
+
+Once connected to Accumulo, there are two options for creating a Rya instance:
+
+- Interactive, with the `install` command.  This is useful for a guided install.
+- Parameterized, with the `install-with-parameters` command.  This is useful for a scripted install.
+
+Example of creating and connecting to a Rya instance using the interactive `install` command:
+
+```
+rya/myAccumuloInstance> install
+Rya Instance Name [default: rya_]: rya1_
+Use Shard Balancing (improves streamed input write speeds) [default: false]:
+Use Entity Centric Indexing [default: true]:
+Use Free Text Indexing [default: true]:
+Use Geospatial Indexing [default: true]:
+Use Temporal Indexing [default: true]:
+Use Precomputed Join Indexing [default: true]:
+Use a Fluo application to update the PCJ Index? (y/n) n
+
+A Rya instance will be installed using the following values:
+   Instance Name: rya1_
+   Use Shard Balancing: false
+   Use Entity Centric Indexing: true
+   Use Free Text Indexing: true
+   Use Geospatial Indexing: true
+   Use Temporal Indexing: true
+   Use Precomputed Join Indexing: true
+   Not using a PCJ Updater Fluo Application
+
+Continue with the install? (y/n) y
+The Rya instance named 'rya1_' has been installed.
+rya/myAccumuloInstance> connect-rya --instance rya1_
+rya/myAccumuloInstance:rya1_>
+```
+
+Example of creating and connecting to a Rya instance using the parameterized `install-with-parameters` command:
+
+```
+rya/myAccumuloInstance> install-with-parameters --instanceName rya_ --enablePcjIndex --fluoPcjAppName rya_pcj_updater
+
+A Rya instance will be installed using the following values:
+   Instance Name: rya_
+   Use Shard Balancing: false
+   Use Entity Centric Indexing: false
+   Use Free Text Indexing: false
+   Use Geospatial Indexing: false
+   Use Temporal Indexing: false
+   Use Precomputed Join Indexing: true
+   PCJ Updater Fluo Application Name: rya_pcj_updater
+
+Continue with the install? (y/n) y
+The Rya instance named 'rya_' has been installed.
+rya/myAccumuloInstance> connect-rya --instance rya_
+rya/myAccumuloInstance:rya_>
+```
+
+## Deleting a Rya Instance
+
+To delete a Rya instance, first connect to it with `connect-rya`, then run the `uninstall` command:
+
+```
+rya/myAccumuloInstance:rya1_> uninstall
+Are you sure you want to uninstall this instance of Rya named 'rya1_'? y
+The Rya instance named 'rya1_' has been uninstalled.
+```
+
+## Loading Data
+
+The `load-data` command can be used to load RDF Statement data in a variety of formats.  If only the `--file` option is specified, the shell will attempt to determine the file format from the filename.  To specify the format explicitly, include the `--format` option.  Use the `help load-data` command to see a list of all available formats.
+
+```
+rya/myAccumuloInstance:rya1_> load-data --file examples/triples.nt
+Detected RDF Format: N-Triples (mimeTypes=text/plain; ext=nt)
+Loaded the file: 'examples/triples.nt' successfully in 1.843 seconds.
+rya/myAccumuloInstance:rya1_>
+
+```
+
+## Issuing a SPARQL Query
+
+Use the `sparql-query` command to launch an interactive prompt for composing a
+SPARQL query to be executed on the connected Rya instance.  To load an existing
+SPARQL query from a file, add the `--file` option with a filepath to the command.
+
+```
+rya/myAccumuloInstance:rya_> sparql-query --file examples/Query1.sparql
+Loaded Query:
+PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+SELECT ?thing ?name WHERE {
+  ?thing <http://predicates#name> ?name .
+  ?thing rdf:type <http://types#Monkey> .
+}
+Executing Query...
+Query Result:
+thing,name
+http://Thing1,Thing 1
+http://Thing3,Thing 3
+Retrieved 2 results in 0.039 seconds.
+```
+
+## Creating a PCJ Query
+
+Use the `create-pcj` command to launch an interactive prompt for composing a
+SPARQL query that will be registered with the Rya PCJ Updater Fluo App for the
+connected Rya instance.  At least one export strategy must be specified
+with the `--exportToKafka` and/or `--exportToRya` command options.  Note that the
+Rya PCJ Updater Fluo App must be configured to support the specified export
+strategy.
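+
+For example, the following invocation would start the prompt with results
+exported back to Rya.  It is a sketch that assumes `--exportToRya` is a
+switch that takes no value, like `--enablePcjIndex` in the
+`install-with-parameters` example.  The prompt text and command output are
+omitted because they depend on the installation.
+
+```
+rya/myAccumuloInstance:rya_> create-pcj --exportToRya
+```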
+
+## Deleting a PCJ Query
+
+Use the `delete-pcj --pcjId` command to delete a SPARQL query that is registered
+with the Rya PCJ Updater Fluo App.  To get a list of registered queries, use the
+`print-instance-details` command.
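+
+For example, the following would delete the first PCJ in the
+`print-instance-details` listing shown in the next section.  The ID is taken
+from that listing, and the command's output is omitted here.
+
+```
+rya/myAccumuloInstance:rya_> delete-pcj --pcjId a49cbc7a5c83429fa8f375cc75ed9ee7
+```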
+
+## Printing Instance Details
+
+The `print-instance-details` command displays the configuration of the currently connected Rya instance and any associated PCJs that may have been added with the `create-pcj` command.
+
+```
+rya/myAccumuloInstance:rya_> print-instance-details
+General Metadata:
+  Instance Name: rya_
+  RYA Version: 3.2.11-incubating
+  Users: myUserName
+Secondary Indicies:
+  Entity Centric Index:
+    Enabled: false
+  Free Text Index:
+    Enabled: false
+  Temporal Index:
+    Enabled: false
+  PCJ Index:
+    Enabled: true
+    Fluo App Name: rya_pcj_updater
+    PCJs:
+      ID: a49cbc7a5c83429fa8f375cc75ed9ee7
+        Update Strategy: INCREMENTAL
+        Last Update Time: unavailable
+      ID: a5741933fb464cbda9abc607d9028926
+        Update Strategy: INCREMENTAL
+        Last Update Time: unavailable
+      ID: d5635bdd1b484d05ba596f9e16b46d9a
+        Update Strategy: INCREMENTAL
+        Last Update Time: unavailable
+Statistics:
+  Prospector:
+    Last Update Time: unavailable
+  Join Selectivity:
+    Last Updated Time: unavailable
+```
+

http://git-wip-us.apache.org/repos/asf/incubator-rya/blob/5db4c823/extras/rya.manual/src/site/site.xml
----------------------------------------------------------------------
diff --git a/extras/rya.manual/src/site/site.xml b/extras/rya.manual/src/site/site.xml
index a5fab57..fd6fcc9 100644
--- a/extras/rya.manual/src/site/site.xml
+++ b/extras/rya.manual/src/site/site.xml
@@ -45,7 +45,9 @@ under the License.
         <item name="Evaluation Table" href="eval.html"/>
         <item name="Pre-computed Joins" href="loadPrecomputedJoin.html"/>
         <item name="Inferencing" href="infer.html"/>
-        <item name="MapReduce Interface" href="mapreduce.html"/>
+        <item name="MapReduce Interface" href="mapreduce.html"/>
+        <item name="Shell Interface" href="shell.html"/>
+        <item name="Incremental Join Maintenance" href="pcj-updater.html"/>
     </menu>
 
     <menu name="Samples">
