This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 6558483b1964e4f98adb256608862260a83c253c
Author: Mergebot <[email protected]>
AuthorDate: Thu Aug 24 20:03:11 2017 +0000

    Prepare repository for deployment.
---
 content/documentation/dsls/sql/index.html | 132 +++++++++++++++++++++---------
 1 file changed, 93 insertions(+), 39 deletions(-)

diff --git a/content/documentation/dsls/sql/index.html b/content/documentation/dsls/sql/index.html
index 37d6b56..af40420 100644
--- a/content/documentation/dsls/sql/index.html
+++ b/content/documentation/dsls/sql/index.html
@@ -147,9 +147,8 @@
   <li><a href="#functionality">3. Functionality in Beam SQL</a>
     <ul>
       <li><a href="#features">3.1. Supported Features</a></li>
-      <li><a href="#beamsqlrow">3.2. Intro of BeamSqlRow</a></li>
-      <li><a href="#data-type">3.3. Data Types</a></li>
-      <li><a href="#built-in-functions">3.4. built-in SQL functions</a></li>
+      <li><a href="#data-type">3.2. Data Types</a></li>
+      <li><a href="#built-in-functions">3.3. built-in SQL functions</a></li>
     </ul>
   </li>
   <li><a href="#internal-of-sql">4. The Internal of Beam SQL</a></li>
@@ -162,37 +161,109 @@
 </blockquote>
 
 <h1 id="a-nameoverviewa1-overview"><a name="overview"></a>1. Overview</h1>
-<p>SQL is a well-adopted standard to process data with concise syntax. With 
DSL APIs, now <code class="highlighter-rouge">PCollection</code>s can be 
queried with standard SQL statements, like a regular table. The DSL APIs 
leverage <a href="http://calcite.apache.org/">Apache Calcite</a> to parse and 
optimize SQL queries, then translate into a composite Beam <code 
class="highlighter-rouge">PTransform</code>. In this way, both SQL and normal 
Beam <code class="highlighter-rouge">PTransform</ [...]
+<p>SQL is a well-adopted standard to process data with concise syntax. With 
DSL APIs (currently available only in Java), now <code 
class="highlighter-rouge">PCollection</code>s can be queried with standard SQL 
statements, like a regular table. The DSL APIs leverage <a 
href="http://calcite.apache.org/">Apache Calcite</a> to parse and optimize SQL 
queries, then translate into a composite Beam <code 
class="highlighter-rouge">PTransform</code>. In this way, both SQL and normal 
Beam <code cla [...]
+
+<p>There are two main pieces to the SQL DSL API:</p>
+
+<ul>
+  <li><a 
href="/documentation/sdks/javadoc/2.1.0/index.html?org/apache/beam/sdk/values/BeamRecord.html">BeamRecord</a>:
 a new data type used to define composite records (i.e., rows) that consist of 
multiple, named columns of primitive data types. All SQL DSL queries must be 
made against collections of type <code 
class="highlighter-rouge">PCollection&lt;BeamRecord&gt;</code>. Note that <code 
class="highlighter-rouge">BeamRecord</code> itself is not SQL-specific, 
however, and may also be u [...]
+  <li><a 
href="/documentation/sdks/javadoc/2.1.0/index.html?org/apache/beam/sdk/extensions/sql/BeamSql.html">BeamSql</a>:
 the interface for creating <code class="highlighter-rouge">PTransforms</code> 
from SQL queries.</li>
+</ul>
+
+<p>We’ll look at each of these below.</p>
 
 <h1 id="a-nameusagea2-usage-of-dsl-apis"><a name="usage"></a>2. Usage of DSL 
APIs</h1>
-<p><code class="highlighter-rouge">BeamSql</code> is the only interface(with 
two methods <code class="highlighter-rouge">BeamSql.query()</code> and <code 
class="highlighter-rouge">BeamSql.simpleQuery()</code>) for developers. It 
wraps the back-end details of parsing/validation/assembling, and deliver a Beam 
SDK style API that can take either simple TABLE_FILTER queries or complex 
queries containing JOIN/GROUP_BY etc.</p>
 
-<p><em>Note</em>, the two methods are equivalent in functionality, <code 
class="highlighter-rouge">BeamSql.query()</code> applies on a <code 
class="highlighter-rouge">PCollectionTuple</code> with one or many input <code 
class="highlighter-rouge">PCollection</code>s; <code 
class="highlighter-rouge">BeamSql.simpleQuery()</code> is a simplified API 
which applies on single <code class="highlighter-rouge">PCollection</code>.</p>
+<h2 id="beamrecord">BeamRecord</h2>
 
-<p><a 
href="https://github.com/apache/beam/blob/DSL_SQL/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java">BeamSqlExample</a>
 in code repository shows the usage of both APIs:</p>
+<p>Before applying a SQL query to a <code 
class="highlighter-rouge">PCollection</code>, the data in the collection must 
be in <code class="highlighter-rouge">BeamRecord</code> format. A <code 
class="highlighter-rouge">BeamRecord</code> represents a single, immutable row 
in a Beam SQL <code class="highlighter-rouge">PCollection</code>. The names and 
types of the fields/columns in the record are defined by its associated <a 
href="/documentation/sdks/javadoc/2.1.0/index.html?org/apache/beam [...]
 
-<div class="highlighter-rouge"><pre class="highlight"><code>//Step 1. create a 
source PCollection with Create.of();
-BeamRecordSqlType type = BeamRecordSqlType.create(fieldNames, fieldTypes);
-...
+<p>A <code class="highlighter-rouge">PCollection&lt;BeamRecord&gt;</code> can 
be created explicitly or implicitly:</p>
 
-PCollection&lt;BeamRecord&gt; inputTable = PBegin.in(p).apply(Create.of(row...)
-    .withCoder(type.getRecordCoder()));
+<p>Explicitly:</p>
+<ul>
+  <li><strong>From in-memory data</strong> (typically for unit testing). In 
this case, the record type and coder must be specified explicitly:
+    <div class="highlighter-rouge"><pre class="highlight"><code>// Define the 
record type (i.e., schema).
+List&lt;String&gt; fieldNames = Arrays.asList("appId", "description", 
"rowtime");
+List&lt;Integer&gt; fieldTypes = Arrays.asList(Types.INTEGER, Types.VARCHAR, 
Types.TIMESTAMP);
+BeamRecordSqlType appType = BeamRecordSqlType.create(fieldNames, fieldTypes);
+
+// Create a concrete row with that type.
+BeamRecord row = new BeamRecord(appType, 1, "Some cool app", new Date());
+
+// Create a source PCollection containing only that row.
+PCollection&lt;BeamRecord&gt; testApps = PBegin
+    .in(p)
+    .apply(Create.of(row)
+                 .withCoder(appType.getRecordCoder()));
+</code></pre>
+    </div>
+  </li>
+  <li><strong>From a <code 
class="highlighter-rouge">PCollection&lt;T&gt;</code></strong> where <code 
class="highlighter-rouge">T</code> is not already a <code 
class="highlighter-rouge">BeamRecord</code>, by applying a <code 
class="highlighter-rouge">PTransform</code> that converts input records to 
<code class="highlighter-rouge">BeamRecord</code> format:
+    <div class="highlighter-rouge"><pre class="highlight"><code>// An example 
POJO class.
+class AppPojo {
+  ...
+  public final Integer appId;
+  public final String description;
+  public final Date timestamp;
+}
+
+// Acquire a collection of Pojos somehow.
+PCollection&lt;AppPojo&gt; pojos = ...
+
+// Convert them to BeamRecords with the same schema as defined above via a DoFn.
+PCollection&lt;BeamRecord&gt; apps = pojos.apply(
+    ParDo.of(new DoFn&lt;AppPojo, BeamRecord&gt;() {
+      @ProcessElement
+      public void processElement(ProcessContext c) {
+        // Read the current input element before converting it.
+        AppPojo pojo = c.element();
+        c.output(new BeamRecord(appType, pojo.appId, pojo.description, pojo.timestamp));
+      }
+    }));
+</code></pre>
+    </div>
+  </li>
+</ul>
 
-//Step 2. (Case 1) run a simple SQL query over input PCollection with 
BeamSql.simpleQuery;
-PCollection&lt;BeamRecord&gt; outputStream = inputTable.apply(
-    BeamSql.simpleQuery("select c1, c2, c3 from PCOLLECTION where c1=1"));
+<p>Implicitly:</p>
+<ul>
+  <li><strong>As the result of a <code 
class="highlighter-rouge">BeamSql</code> <code 
class="highlighter-rouge">PTransform</code></strong> applied to a <code 
class="highlighter-rouge">PCollection&lt;BeamRecord&gt;</code> (details in the 
next section).</li>
+</ul>
 
+<p>Once you have a <code 
class="highlighter-rouge">PCollection&lt;BeamRecord&gt;</code> in hand, you may 
use the <code class="highlighter-rouge">BeamSql</code> APIs to apply SQL 
queries to it.</p>
 
-//Step 2. (Case 2) run the query with BeamSql.query over result PCollection of 
(case 1);
-PCollection&lt;BeamRecord&gt; outputStream2 =
-    PCollectionTuple.of(new TupleTag&lt;BeamRecord&gt;("CASE1_RESULT"), 
outputStream)
-        .apply(BeamSql.query("select c2, sum(c3) from CASE1_RESULT group by 
c2"));
+<h2 id="beamsql">BeamSql</h2>
+
+<p><code class="highlighter-rouge">BeamSql</code> provides two methods for 
generating a <code class="highlighter-rouge">PTransform</code> from a SQL 
query, both of which are equivalent except for the number of inputs they 
support:</p>
+
+<ul>
+  <li><code class="highlighter-rouge">BeamSql.query()</code>, which may be 
applied to a single <code class="highlighter-rouge">PCollection</code>. The 
input collection must be referenced via the table name <code 
class="highlighter-rouge">PCOLLECTION</code> in the query:
+    <div class="highlighter-rouge"><pre 
class="highlight"><code>PCollection&lt;BeamRecord&gt; filteredNames = 
testApps.apply(
+    BeamSql.query("SELECT appId, description, rowtime FROM PCOLLECTION WHERE appId=1"));
 </code></pre>
-</div>
+    </div>
+  </li>
+  <li><code class="highlighter-rouge">BeamSql.queryMulti()</code>, which may 
be applied to a <code class="highlighter-rouge">PCollectionTuple</code> 
containing one or more tagged <code 
class="highlighter-rouge">PCollection&lt;BeamRecord&gt;</code>s. The tuple tag 
for each <code class="highlighter-rouge">PCollection</code> in the tuple 
defines the table name that may be used to query it. Note that table names are 
bound to the specific <code class="highlighter-rouge">PCollectionTuple</code>,  
[...]
+    <div class="highlighter-rouge"><pre class="highlight"><code>// Create a 
reviews PCollection to join to our apps PCollection.
+BeamRecordSqlType reviewType = BeamRecordSqlType.create(
+  Arrays.asList("appId", "reviewerId", "rating", "rowtime"),
+  Arrays.asList(Types.INTEGER, Types.INTEGER, Types.FLOAT, Types.TIMESTAMP));
+PCollection&lt;BeamRecord&gt; reviews = ... [records w/ reviewType schema] ...
+
+// Compute the # of reviews and average rating per app via a JOIN.
+PCollectionTuple appsAndReviews = PCollectionTuple
+    .of(new TupleTag&lt;BeamRecord&gt;("Apps"), apps)
+    .and(new TupleTag&lt;BeamRecord&gt;("Reviews"), reviews);
+PCollection&lt;BeamRecord&gt; output = appsAndReviews.apply(
+    BeamSql.queryMulti("SELECT Apps.appId, COUNT(Reviews.rating), AVG(Reviews.rating) "
+                        + "FROM Apps INNER JOIN Reviews ON Apps.appId = Reviews.appId "
+                        + "GROUP BY Apps.appId"));
+</code></pre>
+    </div>
+  </li>
+</ul>
 
-<p>In Step 1, a <code 
class="highlighter-rouge">PCollection&lt;BeamRecord&gt;</code> is prepared as 
the source dataset. The work to generate a queriable <code 
class="highlighter-rouge">PCollection&lt;BeamRecord&gt;</code> is beyond the 
scope of Beam SQL DSL.</p>
+<p>Both methods wrap the back-end details of parsing/validation/assembling, 
and deliver a Beam SDK style API that can express simple TABLE_FILTER queries 
up to complex queries containing JOIN/GROUP_BY etc.</p>
 
-<p>Step 2(Case 1) shows the usage to run a query with <code 
class="highlighter-rouge">BeamSql.simpleQuery()</code>, be aware that the input 
<code class="highlighter-rouge">PCollection</code> is named with a fixed table 
name <strong>PCOLLECTION</strong>. Step 2(Case 2) is another example to run a 
query with <code class="highlighter-rouge">BeamSql.query()</code>. A Table name 
is specified when adding <code class="highlighter-rouge">PCollection</code> to 
<code class="highlighter-rouge">PCol [...]
+<p><a 
href="https://github.com/apache/beam/blob/DSL_SQL/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java">BeamSqlExample</a>
 in the code repository shows basic usage of both APIs.</p>
 
 <h1 id="a-namefunctionalitya3-functionality-in-beam-sql"><a 
name="functionality"></a>3. Functionality in Beam SQL</h1>
 <p>Just as the unified model for both bounded and unbounded data in Beam, SQL 
DSL provides the same functionalities for bounded and unbounded <code 
class="highlighter-rouge">PCollection</code> as well.</p>
@@ -334,23 +405,6 @@ PCollection&lt;BeamSqlRow&gt; result =
 </code></pre>
 </div>
 
-<h2 id="a-namebeamsqlrowa32-intro-of-beamrecord"><a name="beamsqlrow"></a>3.2. 
Intro of BeamRecord</h2>
-<p><code class="highlighter-rouge">BeamRecord</code>, described by <code 
class="highlighter-rouge">BeamRecordType</code>(extended <code 
class="highlighter-rouge">BeamRecordSqlType</code> in Beam SQL) and 
encoded/decoded by <code class="highlighter-rouge">BeamRecordCoder</code>, 
represents a single, immutable row in a Beam SQL <code 
class="highlighter-rouge">PCollection</code>. Similar as <em>row</em> in 
relational database, each <code class="highlighter-rouge">BeamRecord</code> 
consists  [...]
-
-<p>A Beam SQL <code class="highlighter-rouge">PCollection</code> can be 
created from an external source, in-memory data or derive from another SQL 
query. For <code class="highlighter-rouge">PCollection</code>s from external 
source or in-memory data, it’s required to specify coder explcitly; <code 
class="highlighter-rouge">PCollection</code> derived from SQL query has the 
coder set already. Below is one example:</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code>//define the input 
row format
-List&lt;String&gt; fieldNames = Arrays.asList("c1", "c2", "c3");
-List&lt;Integer&gt; fieldTypes = Arrays.asList(Types.INTEGER, Types.VARCHAR, 
Types.DOUBLE);
-BeamRecordSqlType type = BeamRecordSqlType.create(fieldNames, fieldTypes);
-BeamRecord row = new BeamRecord(type, 1, "row", 1.0);
-
-//create a source PCollection with Create.of();
-PCollection&lt;BeamRecord&gt; inputTable = PBegin.in(p).apply(Create.of(row)
-    .withCoder(type.getRecordCoder()));
-</code></pre>
-</div>
-
 <h2 id="a-namedata-typea33-data-types"><a name="data-type"></a>3.3. Data 
Types</h2>
 <p>Each type in Beam SQL maps to a Java class that holds the value in <code 
class="highlighter-rouge">BeamRecord</code>. The following table lists the 
relation between SQL types and Java classes, which are supported in current 
repository:</p>
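
The idea that each SQL type is backed by one Java class can be sketched without any Beam dependency. The snippet below is only an illustration: RowSchemaSketch and its matches() method are hypothetical helpers that do not exist in Beam, and the type-to-class map is an assumed subset chosen for the example, not the authoritative table from the page. It checks a row's values against a list of java.sql.Types codes such as the fieldTypes list used in the BeamRecord example.

```java
import java.math.BigDecimal;
import java.sql.Types;
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Beam: checks row values against a schema
// declared with java.sql.Types codes.
class RowSchemaSketch {
  // Assumed SQL-type-to-Java-class mapping, for illustration only.
  static final Map<Integer, Class<?>> JAVA_CLASS_FOR = new HashMap<>();
  static {
    JAVA_CLASS_FOR.put(Types.INTEGER, Integer.class);
    JAVA_CLASS_FOR.put(Types.BIGINT, Long.class);
    JAVA_CLASS_FOR.put(Types.FLOAT, Float.class);
    JAVA_CLASS_FOR.put(Types.DOUBLE, Double.class);
    JAVA_CLASS_FOR.put(Types.DECIMAL, BigDecimal.class);
    JAVA_CLASS_FOR.put(Types.VARCHAR, String.class);
    JAVA_CLASS_FOR.put(Types.TIMESTAMP, Date.class);
  }

  // Returns true when every value is an instance of the class mapped to its
  // column type. Unknown type codes, null values, and length mismatches fail.
  static boolean matches(List<Integer> fieldTypes, Object... values) {
    if (fieldTypes.size() != values.length) {
      return false;
    }
    for (int i = 0; i < values.length; i++) {
      Class<?> expected = JAVA_CLASS_FOR.get(fieldTypes.get(i));
      if (expected == null || !expected.isInstance(values[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Integer> fieldTypes =
        Arrays.asList(Types.INTEGER, Types.VARCHAR, Types.TIMESTAMP);
    // An int literal boxes to Integer, so this row fits the schema.
    System.out.println(matches(fieldTypes, 1, "Some cool app", new Date()));  // true
    // A long literal boxes to Long, which is not the class assumed for INTEGER.
    System.out.println(matches(fieldTypes, 1L, "Some cool app", new Date())); // false
  }
}
```

A check like this is useful mainly in unit tests, where a mismatch between a declared column type and the Java value passed for it is easier to diagnose before the row reaches a query.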
 
