Github user kchilton2 commented on a diff in the pull request:
https://github.com/apache/incubator-rya/pull/273#discussion_r170708490
--- Diff: extras/rya.manual/src/site/markdown/rya-streams.md ---
@@ -0,0 +1,385 @@
+<!--
+[comment]: # Licensed to the Apache Software Foundation (ASF) under one
+[comment]: # or more contributor license agreements. See the NOTICE file
+[comment]: # distributed with this work for additional information
+[comment]: # regarding copyright ownership. The ASF licenses this file
+[comment]: # to you under the Apache License, Version 2.0 (the
+[comment]: # "License"); you may not use this file except in compliance
+[comment]: # with the License. You may obtain a copy of the License at
+[comment]: #
+[comment]: # http://www.apache.org/licenses/LICENSE-2.0
+[comment]: #
+[comment]: # Unless required by applicable law or agreed to in writing,
+[comment]: # software distributed under the License is distributed on an
+[comment]: # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+[comment]: # KIND, either express or implied. See the License for the
+[comment]: # specific language governing permissions and limitations
+[comment]: # under the License.
+-->
+
+# Rya Streams
+
+Introduced in 3.2.12
+
+## Disclaimer
+This is a Beta feature. We do not guarantee newer versions of Rya Streams
+will be compatible with this version. You may need to remove all of your
+queries and their associated data from your Rya Streams system and then
+reprocess them using the upgraded system.
+
+# Table of Contents
+- [Introduction](#introduction)
+- [Architecture](#architecture)
+- [Quick Start](#quick-start)
+- [Use Cases](#use-cases)
+- [Future Work](#future-work)
+
+<div id='introduction'/>
+
+## Introduction
+Rya Streams is a system that processes SPARQL queries over streams of RDF
+Statements that may have Visibilities attached to them. It does this by
+utilizing Kafka as a data processing platform.
+
+There are three basic building blocks that the system depends on:
+
+* **Streams Query** - This is a SPARQL query that is registered wth Rya
Streams.
+ It is associated with a specific Rya instance because that Rya instance
+ determines which Statements the query will evaluate. It has an ID that
+ uniquely identifies it across all of the queries that are managed by the
+ system, whether or not the system should be processing it, and whether
or not
+ the results of the query needs to be inserted back into the Rya instance
the
+ source statements come from.
+
+* **Query Change Log** - A list of changes that have been performed to the
+ Streams Queries of a specific Rya Instance. This log contains the
absolute
+ truth about what queries are registered, which are running, and which
generate
+ new statements that need to be inserted back into Rya.
+
+* **Query Manager** - A daemon application that reacts to new/deleted
Query
+ Change Logs as well as new entries within those logs. It starts and stops
+ Kafka Streams processing jobs for the active Streams Queries.
+
+The Quick Start section explains how the Rya Streams Client is used to
interact
+with the system using a simple SPARQL query and some sample Statements.
+
+<div id='architecture'/>
+
+## Architecture ##
+
+The following image is a high level view of how Rya Streams interacts with
+Rya to process queries.
+
+
+
+1. The Rya Streams client is used to register/update/delete queries.
+2. Rya Streams notices the change starts/stops processing a query based on
+ what the change was.
+3. Statements are discovered for ingest by whatever application is loading
data
+ into Rya.
+4. Those Statements are written to Rya Streams so that the Streams Queries
may
+ process them and produce results.
+5. Those Statements are also written to the Rya instance for ad-hoc
querying.
+6. Rya Streams produces Visibility Binding Sets and/or Visibility
Statements
+ that are written back to Rya.
+7. Those same Visibility Binding Sets and/or Visibility Statements are made
+ available to other systems as well.
+
+<div id='quick-start'/>
+
+## Quick Start ##
+This tutorial demonstrates how to install and start the Rya Streams
system. It
+must be configured and running on its own before any Rya instances may use
it.
+After performing the steps of this quick start, you will have installed
the
+system, demonstrated that it is functioning properly, and then may use the
Rya
+Shell to configure Rya instances to use it.
+
+This tutorial assumes you are starting fresh and have no existing Kafka,
or
+Zookeeper data. The two services must already be installed and running.
+
+### Step 1: Download the applications ###
+
+You can fetch the artifacts you need to follow this Quick Start from our
+[downloads page](http://rya.apache.org/download/). Click on the release of
+interest and follow the "Central repository for Maven and other dependency
+managers" URL.
+
+Fetch the following two artifacts:
+
+Artifact Id | Type
+--- | ---
+rya.streams.client | shaded jar
+rya.streams.query-manager | rpm
+
+### Step 2: Install the Query Manager ###
+
+Copy the RPM to the CentOS 7 machine the Query Manager will be installed
on.
+Install it using the following command:
+
+```
+yum install -y rya.streams.query-manager-3.2.12-incubating.noarch.rpm
+```
+
+It will install the program to **/opt/rya-streams-query-manager-3.2.12**.
Follow
+the directions that are in the README.txt file within that directory to
finish
+configuration.
+
+### Step 3: Register a Streams Query with Rya Streams ###
+
+Use the Rya Streams Client to register the following Streams Query:
+
+```
+SELECT *
+WHERE {
+ ?person <urn:talksTo> ?employee .
+ ?employee <urn:worksAt> ?business
+}
+```
+We assume Kafka is running on the local machine using the standard Kafka
port.
+Issue the following command:
+
+```
+java -jar rya.streams.client-3.2.12-incubating-SNAPSHOT-shaded.jar
add-query \
+ -i localhost -p 9092 -r rya-streams-quick-start -a true \
+ -q "SELECT * WHERE { ?person <urn:talksTo> ?employee .?employee
<urn:worksAt> ?business }"
+```
+The Query Manager should eventually see that this query was registered and
+start a Rya Streams job that will begin processing it using any Visibility
+Statements that have been loaded into Rya Streams. If no results are
observed
+in step 6 , then verify the Query Manager is working properly. It's logs
can be
+found in **/opt/rya-streams-query-manager-3.2.12/logs**.
+
+### Step 4: Start watching for results ###
+
+We need to fetch the Query ID of the query we want to watch. This can be
looked
+up by issuing the following command:
+
+```
+java -jar rya.streams.client-3.2.12-incubating-SNAPSHOT-shaded.jar
list-queries \
+ -i localhost -p 9092 -r rya-streams-quick-start"
+```
+
+The client will print something that looks like this:
+
+```
+Queries in Rya Streams:
+---------------------------------------------------------
+ID: 8dd689ee-9d16-4aa7-91c0-667cdb3ed81a Is Active: true Query:
SELECT * WHERE { ?person <urn:talksTo> ?employee .?employee <urn:worksAt>
?business }
+```
+
+Now we know that if we want to reference the query we registered is step
3, we
+need to use the Query ID **8dd689ee-9d16-4aa7-91c0-667cdb3ed81a**. Start
+watching the Stream Query's output using the following command:
+
+```
+java -jar rya.streams.client-3.2.12-incubating-SNAPSHOT-shaded.jar
stream-results \
+ -i localhost -p 9092 -r rya-streams-quick-start" -q
8dd689ee-9d16-4aa7-91c0-667cdb3ed81a
+```
+
+The command will not finish. The console will print results once they are
produced
+by the Rya Streams job that is processing the query. Results will appear
once
+Statements have been loaded.
+
+### Step 5: Load data into the input topic ###
+
+In a new terminal, create a file named **quick-start-data.nt** that
contains
+the following text:
+
+```
+<urn:Bob> <urn:worksAt> <urn:TacoPlace> .
+<urn:Charlie> <urn:worksAt> <urn:BurngerJoint> .
--- End diff --
Done.
---