[
https://issues.apache.org/jira/browse/BEAM-4494?focusedWorklogId=151907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-151907
]
ASF GitHub Bot logged work on BEAM-4494:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Oct/18 22:46
Start Date: 05/Oct/18 22:46
Worklog Time Spent: 10m
Work Description: swegner closed pull request #6574: [BEAM-4494] Migrate
recent website changes from beam-site to beam
URL: https://github.com/apache/beam/pull/6574
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
diff --git a/website/.github/PULL_REQUEST_TEMPLATE.md b/website/.github/PULL_REQUEST_TEMPLATE.md
index 78623592142..ffb5c426404 100644
--- a/website/.github/PULL_REQUEST_TEMPLATE.md
+++ b/website/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,3 +1,10 @@
+**Deprecation notice:** The website is being migrated to
+https://github.com/apache/beam/tree/master/website
+
+Please create new pull requests against the above repo.
+
+---
+
*Please* add a meaningful description for your change here.
Once your pull request has been opened and assigned a number, please edit the
diff --git a/website/Rakefile b/website/Rakefile
index a64cb046502..00ee3c6023c 100644
--- a/website/Rakefile
+++ b/website/Rakefile
@@ -16,7 +16,8 @@ task :test do
/ai.google/, # https://issues.apache.org/jira/browse/INFRA-16527
/globenewswire.com/, # https://issues.apache.org/jira/browse/BEAM-5518
/www.se-radio.net/, # BEAM-5611: Can fail with rate limit HTTP 508 error
- /beam.apache.org\/releases/ # BEAM-4499 remove once publishing is migrated
+ /beam.apache.org\/releases/, # BEAM-4499 remove once publishing is migrated
+ /atrato.io/ # BEAM-5665 atrato.io seems to be down
],
:parallel => { :in_processes => 4 },
}).run
diff --git a/website/_config.yml b/website/_config.yml
index 4a421474a92..50826fa5341 100644
--- a/website/_config.yml
+++ b/website/_config.yml
@@ -60,7 +60,7 @@ kramdown:
toc_levels: 2..6
# The most recent release of Beam.
-release_latest: 2.6.0
+release_latest: 2.7.0
# Plugins are configured in the Gemfile.
diff --git a/website/src/_data/authors.yml b/website/src/_data/authors.yml
index 6d34750a0b0..1950caeb9fb 100644
--- a/website/src/_data/authors.yml
+++ b/website/src/_data/authors.yml
@@ -18,6 +18,9 @@ aljoscha:
altay:
name: Ahmet Altay
email: [email protected]
+ccy:
+ name: Charles Chen
+ email: [email protected]
davor:
name: Davor Bonaci
email: [email protected]
diff --git a/website/src/_includes/section-menu/sdks.html b/website/src/_includes/section-menu/sdks.html
index 61e5f0cf84e..e9a661ab96f 100644
--- a/website/src/_includes/section-menu/sdks.html
+++ b/website/src/_includes/section-menu/sdks.html
@@ -64,7 +64,7 @@
<ul class="section-nav-list">
<li><a href="{{ site.baseurl }}/documentation/dsls/sql/data-types/">Data types</a></li>
<li><a href="{{ site.baseurl }}/documentation/dsls/sql/lexical/">Lexical structure</a></li>
- <li><a href="{{ site.baseurl }}/documentation/dsls/sql/create-table/">CREATE TABLE</a></li>
+ <li><a href="{{ site.baseurl }}/documentation/dsls/sql/create-external-table/">CREATE EXTERNAL TABLE</a></li>
<li><a href="{{ site.baseurl }}/documentation/dsls/sql/select/">SELECT</a></li>
<li><a href="{{ site.baseurl }}/documentation/dsls/sql/windowing-and-triggering/">Windowing & Triggering</a></li>
<li><a href="{{ site.baseurl }}/documentation/dsls/sql/joins/">Joins</a></li>
diff --git a/website/src/_posts/2018-06-26-beam-2.5.0.md b/website/src/_posts/2018-06-26-beam-2.5.0.md
index 9ee57a526a6..fe6d3baf1b7 100644
--- a/website/src/_posts/2018-06-26-beam-2.5.0.md
+++ b/website/src/_posts/2018-06-26-beam-2.5.0.md
@@ -28,7 +28,7 @@ please check the detailed release notes.
# New Features / Improvements
## Go SDK support
-The Go SDK has been officially accepted into the project, after an incubation period and community effort. Go pipelines run on Dataflow runner. More details are [here](https://beam.apache.org/documentation/sdks/go/).
+The Go SDK has been officially accepted into the project, after an incubation period and community effort. Go pipelines run on Dataflow runner. More details are [here]({{ site.baseurl }}/documentation/sdks/go/).
## Parquet support
Support for Apache Parquet format was added. It uses Parquet 1.10 release
which, thanks to AvroParquerWriter's API changes, allows FileIO.Sink
implementation.
diff --git a/website/src/_posts/2018-10-03-beam-2.7.0.md b/website/src/_posts/2018-10-03-beam-2.7.0.md
new file mode 100644
index 00000000000..c515de63c5c
--- /dev/null
+++ b/website/src/_posts/2018-10-03-beam-2.7.0.md
@@ -0,0 +1,76 @@
+---
+layout: post
+title: "Apache Beam 2.7.0"
+date: 2018-10-03 00:00:01 -0800
+excerpt_separator: <!--more-->
+categories: blog
+authors:
+ - ccy
+
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+We are happy to present the new 2.7.0 release of Beam. This release includes both improvements and new functionality.
+See the [download page]({{ site.baseurl }}/get-started/downloads/#270-2018-10-02) for this release.<!--more-->
+For more information on changes in 2.7.0, check out the
+[detailed release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343654).
+
+## New Features / Improvements
+
+### New I/Os
+
+* KuduIO
+* Amazon SNS sink
+* Amazon SqsIO
+
+### Dependency Upgrades
+
+* Apache Calcite dependency upgraded to 1.17.0
+* Apache Derby dependency upgraded to 10.14.2.0
+* Apache HTTP components upgraded (see release notes).
+
+### Portability
+
+* Experimental support for Python on local Flink runner for simple
+examples, see latest information here:
+{{ site.baseurl }}/contribute/portability/#status.
+
+## Miscellaneous Fixes
+
+### I/Os
+
+* KinesisIO, fixed dependency issue
+
+## List of Contributors
+
+According to git shortlog, the following 72 people contributed
+to the 2.7.0 release. Thank you to all contributors!
+
+Ahmet Altay, Alan Myrvold, Alexey Romanenko, Aljoscha Krettek,
+Andrew Pilloud, Ankit Jhalaria, Ankur Goenka, Anton Kedin, Boyuan
+Zhang, Carl McGraw, Carlos Alonso, cclauss, Chamikara Jayalath,
+Charles Chen, Cory Brzycki, Daniel Oliveira, Dariusz Aniszewski,
+devinduan, Eric Beach, Etienne Chauchot, Eugene Kirpichov, Garrett
+Jones, Gene Peters, Gleb Kanterov, Henning Rohde, Henry Suryawirawan,
Holden Karau, Huygaa Batsaikhan, Ismaël Mejía, Jason Kuster,
Jean-Baptiste Onofré, Joachim van der Herten, Jozef Vilcek, jxlewis, Kai
+Jiang, Katarzyna Kucharczyk, Kenn Knowles, Krzysztof Trubalski, Kyle
+Winkelman, Leen Toelen, Luis Enrique Ortíz Ramirez, Lukasz Cwik,
+Łukasz Gajowy, Luke Cwik, Mark Liu, Matthias Feys, Maximilian Michels,
+Melissa Pashniak, Mikhail Gryzykhin, Mikhail Sokolov, mingmxu, Norbert
+Chen, Pablo Estrada, Prateek Chanda, Raghu Angadi, Ravi Pathak, Reuven
+Lax, Robert Bradshaw, Robert Burke, Rui Wang, Ryan Williams, Sindy Li,
+Thomas Weise, Tim Robertson, Tormod Haavi, Udi Meiri, Vaclav Plajt,
+Valentyn Tymofieiev, xiliu, XuMingmin, Yifan Zou, Yueyang Qiu.
diff --git a/website/src/contribute/dependencies.md b/website/src/contribute/dependencies.md
index 99ec6e690a1..c8c4ccc9f10 100644
--- a/website/src/contribute/dependencies.md
+++ b/website/src/contribute/dependencies.md
@@ -52,16 +52,17 @@ In addition to this, Beam community members might identify other critical depend
These kind of urgently required upgrades might not get automatically picked up
by the Jenkins job for few months. So Beam community has to act to identify
such issues and perform upgrades early.
-## JIRA Automation
+## JIRA Issue Automation
In order to track the dependency upgrade process, JIRA tickets will be created
per significant outdated dependency based on the report. A bot named *Beam Jira
Bot* was created for managing JIRA issues. Beam community agrees on the
following policies that creates and updates issues.
-* Issues will be named as "Beam Dependency Update Request: <dep_name> <dep_newest_version>".
-* Issues will be created under the component *"dependencies"*
-* Issues will be assigned to the primary owner of the dependencies, who are mentioned in the dependency ownership files. ([Java Dependency Owners](https://github.com/apache/beam/blob/master/ownership/JAVA_DEPENDENCY_OWNERS.yaml) and [Python Dependency Owners](https://github.com/apache/beam/blob/master/ownership/PYTHON_DEPENDENCY_OWNERS.yaml))
-* If more than one owners found for a dependency, the first owner will be picked as the primary owner, the others will be pinged in the issue's description.
-* If no owners found, leave the assignee empty. The component lead is responsible for triaging the issue.
-* Avoid creating duplicate issues. Updating the descriptions of the open issues created by the previous dependency check.
-* The dependency sometimes is not able to be upgraded, the issue should be closed as *"won't fix"*. And, the bot should avoid recreating issues with "won't fix".
+* Title (summary) of the issues will be in the format "Beam Dependency Update Request: <dep_name>" where <dep_name> is the dependency artifact name.
+* Issues will be created under the component *"dependencies"*.
+* Owners of dependencies will be notified by tagging the corresponding JIRA IDs mentioned in the ownership files in the issue description. See [Java Dependency Owners](https://github.com/apache/beam/blob/master/ownership/JAVA_DEPENDENCY_OWNERS.yaml) and [Python Dependency Owners](https://github.com/apache/beam/blob/master/ownership/PYTHON_DEPENDENCY_OWNERS.yaml) for current owners for Java SDK and Python SDK dependencies respectively.
+* Automated tool will not create duplicate issues for the same dependency. Instead the tool will look for an existing JIRA when one has to be created for a given dependency and description of the JIRA will be updated with latest information, for example, current version of the dependency.
+* If a Beam community member determines that a given dependency should not be upgraded the corresponding JIRA issue can be closed with a fix version specified.
+* Automated tool will reopen a JIRA for a given dependency when one of following conditions is met:
+ * Next SDK release is for a fix version mentioned in the JIRA.
+ * Six months __and__ three or more minor releases have passed since the JIRA was closed.
## Upgrading identified outdated dependencies
@@ -91,4 +92,4 @@ __Dependencies of Java SDK components that may cause issues to other components
## Dependency updates and backwards compatibility
-Beam releases [adhere to](https://beam.apache.org/get-started/downloads/) semantic versioning. Hence, community members should take care when updating dependencies. Minor version updates to dependencies should be backwards compatible in most cases. Some updates to dependencies though may result in backwards incompatible API or functionality changes to Beam. PR reviewers and committers should take care to detect any dependency updates that could potentially introduce backwards incompatible changes to Beam before merging and PRs that update dependencies should include a statement regarding this verification in the form of a PR comment. Dependency updates that result in backwards incompatible changes to non-experimental features of Beam should be held till next major version release of Beam. Any exceptions to this policy should only occur in extreme cases (for example, due to a security vulnerability of an existing dependency that is only fixed in a subsequent major version) and should be discussed in the Beam dev list. Note that backwards incompatible changes to experimental features may be introduced in a minor version release.
+Beam releases [adhere to]({{ site.baseurl }}/get-started/downloads/) semantic versioning. Hence, community members should take care when updating dependencies. Minor version updates to dependencies should be backwards compatible in most cases. Some updates to dependencies though may result in backwards incompatible API or functionality changes to Beam. PR reviewers and committers should take care to detect any dependency updates that could potentially introduce backwards incompatible changes to Beam before merging and PRs that update dependencies should include a statement regarding this verification in the form of a PR comment. Dependency updates that result in backwards incompatible changes to non-experimental features of Beam should be held till next major version release of Beam. Any exceptions to this policy should only occur in extreme cases (for example, due to a security vulnerability of an existing dependency that is only fixed in a subsequent major version) and should be discussed in the Beam dev list. Note that backwards incompatible changes to experimental features may be introduced in a minor version release.
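The reopen policy introduced in the `dependencies.md` change above reduces to a single predicate. The following is a minimal Python sketch of that rule, not the actual Beam Jira Bot code. The function name `should_reopen`, its parameters, and the approximation of "six months" as 183 days are all hypothetical.

```python
from datetime import date, timedelta
from typing import Set

# Assumption: "six months" is approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def should_reopen(closed_on: date, today: date, minor_releases_since_close: int,
                  next_release: str, fix_versions: Set[str]) -> bool:
    """Hypothetical helper: decide whether a closed dependency JIRA is reopened."""
    # Condition 1: the next SDK release matches a fix version recorded on the issue.
    if next_release in fix_versions:
        return True
    # Condition 2: BOTH six months AND three or more minor releases since closing.
    return (today - closed_on >= SIX_MONTHS
            and minor_releases_since_close >= 3)
```

Condition 2 requires both thresholds. An issue closed eight months ago with only two intervening minor releases stays closed.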
diff --git a/website/src/contribute/index.md b/website/src/contribute/index.md
index c061f388964..230c3df2ae1 100644
--- a/website/src/contribute/index.md
+++ b/website/src/contribute/index.md
@@ -348,7 +348,7 @@ We are also working on writing Performance Tests for IOs and developing a Perfor
- running Performance Tests on runners other than Dataflow and Direct
- improving existing Performance Testing Framework and it's documentation
-See the [documentation](https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests) and the [initial proposal](https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit?usp=sharing)(for file based tests).
+See the [documentation]({{ site.baseurl }}/documentation/io/testing/#i-o-transform-integration-tests) and the [initial proposal](https://docs.google.com/document/d/1dA-5s6OHiP_cz-NRAbwapoKF5MEC1wKps4A5tFbIPKE/edit?usp=sharing)(for file based tests).
If you're willing to help in this area, tag the following people in PRs:
[@chamikaramj](https://github.com/chamikaramj),
[@DariuszAniszewski](https://github.com/dariuszaniszewski),
[@lgajowy](https://github.com/lgajowy), [@szewi](https://github.com/szewi),
[@kkucharc](https://github.com/kkucharc)
diff --git a/website/src/contribute/postcommits-policies.md b/website/src/contribute/postcommits-policies.md
index b4e6d356f27..8e6af7217ef 100644
--- a/website/src/contribute/postcommits-policies.md
+++ b/website/src/contribute/postcommits-policies.md
@@ -49,8 +49,7 @@ When a post-commit test fails, follow the provided steps for your situation.
### I found a test failure {#found-failing-test}
-1. Create a [JIRA issue](https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20test-failures)
- and assign it to yourself.
+1. Create a [JIRA issue](https://s.apache.org/beam-test-failure) and assign it to yourself.
1. Do high level triage of the failure.
1. [Assign the JIRA issue to a relevant person]({{ site.baseurl }}/contribute/postcommits-guides/index.html#find_specialist).
diff --git a/website/src/contribute/release-guide.md b/website/src/contribute/release-guide.md
index d3072c7ca31..c184a72e9db 100644
--- a/website/src/contribute/release-guide.md
+++ b/website/src/contribute/release-guide.md
@@ -196,8 +196,9 @@ When contributors resolve an issue in JIRA, they are tagging it with a release t
__Attention__: Only PMC has permission to perform this. If you are not a PMC,
please ask for help in dev@ mailing list.
-1. In JIRA, navigate to the [`Beam > Administration > Versions`](https://issues.apache.org/jira/plugins/servlet/project-config/BEAM/versions).
-1. Add a new release: choose the next minor version number compared to the one currently underway, select today’s date as the `Start Date`, and choose `Add`.
+1. In JIRA, navigate to [`Beam > Administration > Versions`](https://issues.apache.org/jira/plugins/servlet/project-config/BEAM/versions).
+1. Add a new release. Choose the next minor version number after the version currently underway, select the release cut date (today’s date) as the `Start Date`, and choose `Add`.
+1. At the end of the release, go to the same page and mark the recently released version as released. Use the `...` menu and choose `Release`.
### Triage release-blocking issues in JIRA
diff --git a/website/src/contribute/runner-guide.md b/website/src/contribute/runner-guide.md
index a0aa2a81491..bd70899b18d 100644
--- a/website/src/contribute/runner-guide.md
+++ b/website/src/contribute/runner-guide.md
@@ -566,7 +566,7 @@ collection of log files, or a database table. The capabilities are:
* timestamps to associate with each element read
* `splitAtFraction` for dynamic splitting to enable work stealing, and other
methods to support it - see the [Beam blog post on dynamic work
- rebalancing](https://beam.apache.org/blog/2016/05/18/splitAtFraction-method.html)
+ rebalancing]({{ site.baseurl }}/blog/2016/05/18/splitAtFraction-method.html)
The `BoundedSource` does not report a watermark currently. Most of the time,
reading
from a bounded source can be parallelized in ways that result in utterly
out-of-order
diff --git a/website/src/contribute/testing.md b/website/src/contribute/testing.md
index 2d847a79bb5..7e21c228b52 100644
--- a/website/src/contribute/testing.md
+++ b/website/src/contribute/testing.md
@@ -107,13 +107,13 @@ NeedsRunner is a category of tests that require a Beam runner. To run
NeedsRunner tests:
```
-$ ./gradlew :runners:direct-java:needsRunnerTests
+$ ./gradlew beam-runners-direct-java:needsRunnerTests
```
To run a single NeedsRunner test use the `test` property, e.g.
```
-$ ./gradlew :runners:direct-java:needsRunnerTests --tests org.apache.beam.sdk.transforms.MapElementsTest.testMapBasic
+$ ./gradlew beam-runners-direct-java:needsRunnerTests --tests org.apache.beam.sdk.transforms.MapElementsTest.testMapBasic
```
will run the `MapElementsTest.testMapBasic()` test.
@@ -123,7 +123,7 @@ NeedsRunner tests in modules that are not required to build runners (e.g.
command:
```
-$ ./gradlew sdks:java:io:google-cloud-platform:test --tests org.apache.beam.sdk.io.gcp.spanner.SpannerIOWriteTest
+$ ./gradlew beam-sdks-java-io-google-cloud-platform:test --tests org.apache.beam.sdk.io.gcp.spanner.SpannerIOWriteTest
```
### ValidatesRunner
diff --git a/website/src/documentation/dsls/sql/create-table.md b/website/src/documentation/dsls/sql/create-external-table.md
similarity index 85%
rename from website/src/documentation/dsls/sql/create-table.md
rename to website/src/documentation/dsls/sql/create-external-table.md
index e481fe81766..a6f6e323495 100644
--- a/website/src/documentation/dsls/sql/create-table.md
+++ b/website/src/documentation/dsls/sql/create-external-table.md
@@ -1,9 +1,11 @@
---
layout: section
-title: "Beam SQL: CREATE TABLE Statement"
+title: "Beam SQL: CREATE EXTERNAL TABLE Statement"
section_menu: section-menu/sdks.html
-permalink: /documentation/dsls/sql/create-table/
-redirect_from: /documentation/dsls/sql/statements/create-table/
+permalink: /documentation/dsls/sql/create-external-table/
+redirect_from:
+ - /documentation/dsls/sql/statements/create-table/
+ - /documentation/dsls/sql/create-table/
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
@@ -19,20 +21,20 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
-# CREATE TABLE
+# CREATE EXTERNAL TABLE
-Beam SQL's `CREATE TABLE` statement registers a virtual table that maps to an
-[external storage system](https://beam.apache.org/documentation/io/built-in/).
-For some storage systems, `CREATE TABLE` does not create a physical table until
+Beam SQL's `CREATE EXTERNAL TABLE` statement registers a virtual table that maps to an
+[external storage system]({{ site.baseurl }}/documentation/io/built-in/).
+For some storage systems, `CREATE EXTERNAL TABLE` does not create a physical table until
a write occurs. After the physical table exists, you can access the table with
the `SELECT`, `JOIN`, and `INSERT INTO` statements.
-The `CREATE TABLE` statement includes a schema and extended clauses.
+The `CREATE EXTERNAL TABLE` statement includes a schema and extended clauses.
## Syntax
```
-CREATE TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
+CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
TYPE type
[LOCATION location]
[TBLPROPERTIES tblProperties]
@@ -48,7 +50,7 @@ tableElement: columnName fieldType [ NOT NULL ]
ignores the statement instead of returning an error.
* `tableName`: The case sensitive name of the table to create and register,
specified as an
- [Identifier](https://beam.apache.org/documentation/dsls/sql/lexical/#identifiers).
+ [Identifier]({{ site.baseurl }}/documentation/dsls/sql/lexical/#identifiers).
The table name does not need to match the name in the underlying data
storage system.
* `tableElement`: `columnName` `fieldType` `[ NOT NULL ]`
@@ -63,7 +65,7 @@ tableElement: columnName fieldType [ NOT NULL ]
* `ROW<tableElement [, tableElement ]*>`
* `NOT NULL`: Optional. Indicates that the column is not nullable.
* `type`: The I/O transform that backs the virtual table, specified as an
- [Identifier](https://beam.apache.org/documentation/dsls/sql/lexical/#identifiers)
+ [Identifier]({{ site.baseurl }}/documentation/dsls/sql/lexical/#identifiers)
with one of the following values:
* `bigquery`
* `pubsub`
@@ -71,11 +73,11 @@ tableElement: columnName fieldType [ NOT NULL ]
* `text`
* `location`: The I/O specific location of the underlying table, specified as
a [String
- Literal](https://beam.apache.org/documentation/dsls/sql/lexical/#string-literals).
+ Literal]({{ site.baseurl }}/documentation/dsls/sql/lexical/#string-literals).
See the I/O specific sections for `location` format requirements.
* `tblProperties`: The I/O specific quoted key value JSON object with extra
configuration, specified as a [String
- Literal](https://beam.apache.org/documentation/dsls/sql/lexical/#string-literals).
+ Literal]({{ site.baseurl }}/documentation/dsls/sql/lexical/#string-literals).
See the I/O specific sections for `tblProperties` format requirements.
## BigQuery
@@ -83,7 +85,7 @@ tableElement: columnName fieldType [ NOT NULL ]
### Syntax
```
-CREATE TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
+CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
TYPE bigquery
LOCATION '[PROJECT_ID]:[DATASET].[TABLE]'
```
@@ -183,7 +185,7 @@ as follows:
### Example
```
-CREATE TABLE users (id INTEGER, username VARCHAR)
+CREATE EXTERNAL TABLE users (id INTEGER, username VARCHAR)
TYPE bigquery
LOCATION 'testing-integration:apache.users'
```
@@ -193,7 +195,7 @@ LOCATION 'testing-integration:apache.users'
### Syntax
```
-CREATE TABLE [ IF NOT EXISTS ] tableName
+CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName
(
event_timestamp TIMESTAMP,
attributes MAP<VARCHAR, VARCHAR>,
@@ -263,7 +265,7 @@ declare a special set of columns, as shown below.
### Example
```
-CREATE TABLE locations (event_timestamp TIMESTAMP, attributes MAP<VARCHAR, VARCHAR>, payload ROW<id INTEGER, location VARCHAR>)
+CREATE EXTERNAL TABLE locations (event_timestamp TIMESTAMP, attributes MAP<VARCHAR, VARCHAR>, payload ROW<id INTEGER, location VARCHAR>)
TYPE pubsub
LOCATION 'projects/testing-integration/topics/user-location'
```
@@ -275,7 +277,7 @@ KafkaIO is experimental in Beam SQL.
### Syntax
```
-CREATE TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
+CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
TYPE kafka
LOCATION 'kafka://localhost:2181/brokers'
TBLPROPERTIES '{"bootstrap.servers":"localhost:9092", "topics": ["topic1", "topic2"]}'
@@ -313,7 +315,7 @@ access the same underlying data.
### Syntax
```
-CREATE TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
+CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tableName (tableElement [, tableElement ]*)
TYPE text
LOCATION '/home/admin/orders'
TBLPROPERTIES '{"format: "Excel"}'
@@ -345,7 +347,7 @@ Only simple types are supported.
### Example
```
-CREATE TABLE orders (id INTEGER, price INTEGER)
+CREATE EXTERNAL TABLE orders (id INTEGER, price INTEGER)
TYPE text
LOCATION '/home/admin/orders'
```
diff --git a/website/src/documentation/dsls/sql/select.md b/website/src/documentation/dsls/sql/select.md
index 24bd728b98c..f3a135f57a7 100644
--- a/website/src/documentation/dsls/sql/select.md
+++ b/website/src/documentation/dsls/sql/select.md
@@ -32,59 +32,684 @@ batch/streaming model:
- [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins)
- [Windowing & Triggering]({{ site.baseurl}}/documentation/dsls/sql/windowing-and-triggering/)
-Below is a curated grammar of the supported syntax in Beam SQL
+Query statements scan one or more tables or expressions and return the computed
+result rows. This topic describes the syntax for SQL queries in Beam.
+## SQL Syntax
+
+ query_statement:
+ [ WITH with_query_name AS ( query_expr ) [, ...] ]
+ query_expr
+
+ query_expr:
+ { select | ( query_expr ) | query_expr set_op query_expr }
+ [ LIMIT count [ OFFSET skip_rows ] ]
+
+ select:
+ SELECT [{ ALL | DISTINCT }]
+ { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
+ [ REPLACE ( expression [ AS ] column_name [, ...] ) ]
+ | expression [ [ AS ] alias ] } [, ...]
+ [ FROM from_item [, ...] ]
+ [ WHERE bool_expression ]
+ [ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ]
+ [ HAVING bool_expression ]
+
+ set_op:
+ UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT
+
+ from_item: {
+ table_name [ [ AS ] alias ] |
+ join |
+ ( query_expr ) [ [ AS ] alias ]
+ with_query_name [ [ AS ] alias ]
+ }
+
+ join:
+ from_item [ join_type ] JOIN from_item
+ [ { ON bool_expression | USING ( join_column [, ...] ) } ]
+
+ join_type:
+ { INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }
+
+Notation:
+
+- Square brackets "\[ \]" indicate optional clauses.
+- Parentheses "( )" indicate literal parentheses.
+- The vertical bar "|" indicates a logical OR.
+- Curly braces "{ }" enclose a set of options.
+- A comma followed by an ellipsis within square brackets "\[, ... \]"
+ indicates that the preceding item can repeat in a comma-separated list.
+
+## SELECT list
+
+Syntax:
+
+ SELECT [{ ALL | DISTINCT }]
+ { [ expression. ]*
+ | expression [ [ AS ] alias ] } [, ...]
+
+The `SELECT` list defines the columns that the query will return. Expressions in
+the `SELECT` list can refer to columns in any of the `from_item`s in its
+corresponding `FROM` clause.
+
+Each item in the `SELECT` list is one of:
+
+- \*
+- `expression`
+- `expression.*`
+
+### SELECT \*
+
+`SELECT *`, often referred to as *select star*, produces one output column for
+each column that is visible after executing the full query.
+
+```
+SELECT * FROM (SELECT 'apple' AS fruit, 'carrot' AS vegetable);
+
++-------+-----------+
+| fruit | vegetable |
++-------+-----------+
+| apple | carrot |
++-------+-----------+
+```
+
+### SELECT `expression`
+
+Items in a `SELECT` list can be expressions. These expressions evaluate to a
+single value and produce one output column, with an optional explicit `alias`.
+
+If the expression does not have an explicit alias, it receives an implicit alias
+according to the rules for [implicit aliases](#implicit-aliases), if possible.
+Otherwise, the column is anonymous and you cannot refer to it by name elsewhere
+in the query.
+
+### SELECT `expression.*` {#select-expression_1}
+
+An item in a `SELECT` list can also take the form of `expression.*`. This
+produces one output column for each column or top-level field of `expression`.
+The expression must be a table alias.
+
+The following query produces one output column for each column in the table
+`groceries`, aliased as `g`.
+
+```
+WITH groceries AS
+ (SELECT 'milk' AS dairy,
+ 'eggs' AS protein,
+ 'bread' AS grain)
+SELECT g.*
+FROM groceries AS g;
+
++-------+---------+-------+
+| dairy | protein | grain |
++-------+---------+-------+
+| milk | eggs | bread |
++-------+---------+-------+
+```
+
+### SELECT modifiers
+
+You can modify the results returned from a `SELECT` query, as follows.
+
+#### SELECT DISTINCT
+
+A `SELECT DISTINCT` statement discards duplicate rows and returns only the
+remaining rows. `SELECT DISTINCT` cannot return columns of the following types:
+
+- STRUCT
+- ARRAY
+
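The duplicate elimination performed by `SELECT DISTINCT` can be illustrated with a short Python sketch. It is not Beam SQL's implementation. It assumes rows are modeled as tuples of scalar values, a hypothetical stand-in for Beam's row type. Loosely mirroring the STRUCT/ARRAY restriction above, unhashable values such as Python lists cannot be deduplicated this way. First-seen order is kept only for readability, because SQL makes no ordering guarantee here.

```python
from typing import Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, ...]  # hypothetical row model: a tuple of scalar values


def select_distinct(rows: Iterable[Row]) -> List[Row]:
    """Keep the first occurrence of each row and drop later duplicates."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:  # tuples hash and compare by value
            seen.add(row)
            result.append(row)
    return result
```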
+#### SELECT ALL
+
+A `SELECT ALL` statement returns all rows, including duplicate rows. `SELECT
+ALL` is the default behavior of `SELECT`.
+
+### Aliases
+
+See [Aliases](#aliases_2) for information on syntax and visibility for
+`SELECT` list aliases.
+
+## FROM clause
+
+The `FROM` clause indicates the table or tables from which to retrieve rows, and
+specifies how to join those rows together to produce a single stream of rows for
+processing in the rest of the query.
+
+### Syntax
+
+ from_item: {
+ table_name [ [ AS ] alias ] |
+ join |
+ ( query_expr ) [ [ AS ] alias ] |
+ with_query_name [ [ AS ] alias ]
+ }
+
+#### table\_name
+
+The name (optionally qualified) of an existing table.
+
+ SELECT * FROM Roster;
+ SELECT * FROM beam.Roster;
+
+#### join
+
+See [JOIN Types](#join-types) below and [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins).
+
+#### select {#select_1}
+
+`( select ) [ [ AS ] alias ]` is a table [subquery](#subqueries).
+
+#### with\_query\_name
+
+The query names in a `WITH` clause (see [WITH Clause](#with-clause)) act like
+names of temporary tables that you can reference anywhere in the `FROM` clause.
+In the example below, `subQ1` and `subQ2` are `with_query_names`.
+
+Example:
+
+ WITH
+ subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52),
+ subQ2 AS (SELECT SchoolID FROM subQ1)
+ SELECT DISTINCT * FROM subQ2;
+
+The `WITH` clause hides any permanent tables with the same name for the duration
+of the query, unless you qualify the table name, e.g. `beam.Roster`.
+
+### Subqueries
+
+A subquery is a query that appears inside another statement, and is written
+inside parentheses. These are also referred to as "sub-SELECTs" or "nested
+SELECTs". The full `SELECT` syntax is valid in subqueries.
+
+There are two types of subquery:
+
+- Expression Subqueries
+ which you can use in a query wherever expressions are valid. Expression
+ subqueries return a single value.
+- Table subqueries, which you can use only in a `FROM` clause. The outer query
+ treats the result of the subquery as a table.
+
+Note that there must be parentheses around both types of subqueries.
+
+Example:
+
+```
+SELECT AVG ( PointsScored )
+FROM
+( SELECT PointsScored
+ FROM Stats
+ WHERE SchoolID = 77 )
+```
+
+Optionally, a table subquery can have an alias.
+
+Example:
+
+```
+SELECT r.LastName
+FROM
+( SELECT * FROM Roster) AS r;
+```
+
+### Aliases {#aliases_1}
+
+See [Aliases](#aliases_2) for information on syntax and visibility for
+`FROM` clause aliases.
+
+## JOIN types
+
+Also see [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins).
+
+### Syntax {#syntax_1}
+
+ join:
+ from_item [ join_type ] JOIN from_item
+ [ ON bool_expression | USING ( join_column [, ...] ) ]
+
+ join_type:
+ { INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }
+
+The `JOIN` clause merges two `from_item`s so that the `SELECT` clause can query
+them as one source. The `join_type` and `ON` or `USING` clause (a "join
+condition") specify how to combine and discard rows from the two `from_item`s to
+form a single source.
+
+All `JOIN` clauses require a `join_type`.
+
+A `JOIN` clause requires a join condition unless one of the following conditions
+is true:
+
+- `join_type` is `CROSS`.
+- One or both of the `from_item`s is not a table, e.g. an `array_path` or
+ `field_path`.
+
+### \[INNER\] JOIN
+
+An `INNER JOIN`, or simply `JOIN`, effectively calculates the Cartesian product
+of the two `from_item`s and discards all rows that do not meet the join
+condition. "Effectively" means that it is possible to implement an `INNER JOIN`
+without actually calculating the Cartesian product.
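To make the "effectively" point concrete, here is a minimal Python sketch that models tables as lists of dicts. It is not Beam SQL's implementation: the helpers `inner_join_cartesian` and `inner_join_hashed` and the sample rows are hypothetical. The first helper filters the full Cartesian product; the second uses a hash lookup on the join key and never builds the product, yet both return the same rows for an equality condition.

```python
from itertools import product

# Hypothetical sample rows; each row is a dict of column name -> value.
roster = [
    {"LastName": "Adams", "SchoolID": 50},
    {"LastName": "Buchanan", "SchoolID": 52},
]
player_stats = [
    {"LastName": "Adams", "PointsScored": 3},
    {"LastName": "Buchanan", "PointsScored": 13},
    {"LastName": "Coolidge", "PointsScored": 1},
]


def inner_join_cartesian(left, right, condition):
    """Build every (l, r) pair, then keep only pairs that meet the condition."""
    return [(l, r) for l, r in product(left, right) if condition(l, r)]


def inner_join_hashed(left, right, key):
    """Same result for an equality condition, without materializing the product."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


def same_last_name(l, r):
    return l["LastName"] == r["LastName"]


cartesian = inner_join_cartesian(roster, player_stats, same_last_name)
hashed = inner_join_hashed(roster, player_stats, "LastName")
```

Neither helper models SQL `NULL` handling: in SQL a `NULL` key never satisfies `=`, whereas Python's `None == None` is `True`.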
+
+### CROSS JOIN
+
+`CROSS JOIN` is generally not yet supported.
+
+### FULL \[OUTER\] JOIN
+
+A `FULL OUTER JOIN` (or simply `FULL JOIN`) returns all fields for all rows in
+both `from_item`s that meet the join condition.
+
+`FULL` indicates that *all rows* from both `from_item`s are returned, even if
+they do not meet the join condition. For streaming jobs, this means all rows
+that are not late according to the default trigger and, if a non-global window
+is applied, that belong to the same window.
+
+`OUTER` indicates that if a given row from one `from_item` does not join to any
+row in the other `from_item`, the row will return with NULLs for all columns
+from the other `from_item`.
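The following minimal Python sketch models `FULL OUTER JOIN` on bounded data only; it ignores the streaming window and trigger behavior described above. The helper `full_outer_join` and the sample rows are hypothetical. A `None` in place of a whole row stands for "all columns from that side are NULL".

```python
def full_outer_join(left, right, condition):
    """Pair matching rows; an unmatched row is paired with None (all-NULL columns)."""
    result = []
    matched_right = set()
    for l in left:
        matched = False
        for i, r in enumerate(right):
            if condition(l, r):
                result.append((l, r))
                matched_right.add(i)
                matched = True
        if not matched:
            # No partner on the right: keep the left row with NULLs on the right.
            result.append((l, None))
    # Right rows that never matched are kept too, with NULLs on the left.
    result.extend((None, r) for i, r in enumerate(right) if i not in matched_right)
    return result


roster = [{"LastName": "Adams"}, {"LastName": "Eisenhower"}]
stats = [
    {"LastName": "Adams", "PointsScored": 3},
    {"LastName": "Coolidge", "PointsScored": 1},
]
rows = full_outer_join(roster, stats, lambda l, r: l["LastName"] == r["LastName"])
```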
+
+Also see [Joins]({{ site.baseurl}}/documentation/dsls/sql/joins).
+
+### LEFT \[OUTER\] JOIN
+
+The result of a `LEFT OUTER JOIN` (or simply `LEFT JOIN`) for two `from_item`s
+always retains all rows of the left `from_item` in the `JOIN` clause, even if no
+rows in the right `from_item` satisfy the join predicate.
+
+`LEFT` indicates that all rows from the *left* `from_item` are returned; if a
+given row from the left `from_item` does not join to any row in the *right*
+`from_item`, the row will return with NULLs for all columns from the right
+`from_item`. Rows from the right `from_item` that do not join to any row in the
+left `from_item` are discarded.
+
+### RIGHT \[OUTER\] JOIN
+
+The result of a `RIGHT OUTER JOIN` (or simply `RIGHT JOIN`) is similar and
+symmetric to that of `LEFT OUTER JOIN`.
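The symmetry between `LEFT` and `RIGHT` joins can be sketched in Python as follows: a right join is a left join with the inputs swapped, with each result pair put back in (left, right) order. This is an illustrative model on lists of dicts, not Beam's implementation; the helper names and sample rows are hypothetical, and `None` stands for an all-NULL side.

```python
def left_outer_join(left, right, condition):
    """Keep every left row; pair it with None when no right row matches."""
    result = []
    for l in left:
        matches = [(l, r) for r in right if condition(l, r)]
        result.extend(matches if matches else [(l, None)])
    return result


def right_outer_join(left, right, condition):
    """A RIGHT JOIN as a LEFT JOIN with swapped inputs, re-ordered as (left, right)."""
    swapped = left_outer_join(right, left, lambda r, l: condition(l, r))
    return [(l, r) for r, l in swapped]


roster = [{"LastName": "Adams"}, {"LastName": "Eisenhower"}]
stats = [
    {"LastName": "Adams", "PointsScored": 3},
    {"LastName": "Coolidge", "PointsScored": 1},
]
same_name = lambda l, r: l["LastName"] == r["LastName"]
left_rows = left_outer_join(roster, stats, same_name)
right_rows = right_outer_join(roster, stats, same_name)
```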
+
+### ON clause
+
+The `ON` clause contains a `bool_expression`. A combined row (the result of
+joining two rows) meets the join condition if `bool_expression` returns TRUE.
+
+Example:
+
+```
+SELECT * FROM Roster INNER JOIN PlayerStats
+ON Roster.LastName = PlayerStats.LastName;
+```
+
+### USING clause
+
+The `USING` clause requires a list of one or more columns that occur in both
+input tables. It performs an equality comparison on each of those columns, and
+the rows meet the join condition if all of the equality comparisons return TRUE.
+
+In most cases, a statement with the `USING` keyword is equivalent to using the
+`ON` keyword. For example, the statement:
+
+```
+SELECT FirstName
+FROM Roster INNER JOIN PlayerStats
+USING (LastName);
+```
+
+is equivalent to:
+
+```
+SELECT FirstName
+FROM Roster INNER JOIN PlayerStats
+ON Roster.LastName = PlayerStats.LastName;
+```
+
+The results from queries with `USING` do differ from queries that use `ON` when
+you use `SELECT *`. To illustrate this, consider the query:
+
+```
+SELECT * FROM Roster INNER JOIN PlayerStats
+USING (LastName);
+```
+
+This statement returns the rows from `Roster` and `PlayerStats` where
+`Roster.LastName` is the same as `PlayerStats.LastName`. The results include a
+single `LastName` column.
+
+By contrast, consider the following query:
+
+```
+SELECT * FROM Roster INNER JOIN PlayerStats
+ON Roster.LastName = PlayerStats.LastName;
+```
+
+This statement returns the rows from `Roster` and `PlayerStats` where
+`Roster.LastName` is the same as `PlayerStats.LastName`. The results include two
+`LastName` columns; one from `Roster` and one from `PlayerStats`.
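The difference in `SELECT *` output can be modeled in a few lines of Python. In this sketch (hypothetical helpers and data, not Beam's implementation), `ON` keeps every column from both inputs, qualified with the table name, while `USING` emits the join column once. Placing the merged column first is a choice made in this sketch, and it assumes the non-join columns have distinct names.

```python
def join_on(left, right, left_name, right_name, col):
    """ON-style SELECT *: every column from both inputs, qualified by table name."""
    out = []
    for l in left:
        for r in right:
            if l[col] == r[col]:
                row = {f"{left_name}.{k}": v for k, v in l.items()}
                row.update({f"{right_name}.{k}": v for k, v in r.items()})
                out.append(row)
    return out


def join_using(left, right, col):
    """USING-style SELECT *: the join column once, then the remaining columns."""
    out = []
    for l in left:
        for r in right:
            if l[col] == r[col]:
                row = {col: l[col]}
                row.update({k: v for k, v in l.items() if k != col})
                row.update({k: v for k, v in r.items() if k != col})
                out.append(row)
    return out


roster = [{"LastName": "Adams", "FirstName": "John"}]
player_stats = [{"LastName": "Adams", "PointsScored": 3}]
on_rows = join_on(roster, player_stats, "Roster", "PlayerStats", "LastName")
using_rows = join_using(roster, player_stats, "LastName")
```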
+
+### Sequences of JOINs
+
+The `FROM` clause can contain multiple `JOIN` clauses in sequence.
+
+Example:
+
+```
+SELECT * FROM a LEFT JOIN b ON TRUE LEFT JOIN c ON TRUE;
+```
+
+where `a`, `b`, and `c` are any `from_item`s. JOINs are bound from left to
+right, but you can insert parentheses to group them in a different order.
+
+## WHERE clause
+
+### Syntax {#syntax_2}
+
+```
+WHERE bool_expression
+```
+
+The `WHERE` clause filters out rows by evaluating each row against
+`bool_expression`, and discards all rows that do not return TRUE (that is, rows
+that return FALSE or NULL).
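The "discard FALSE and NULL" rule can be sketched in Python with `None` standing in for SQL `NULL`. The helpers `sql_eq` and `where` are hypothetical illustrations of three-valued logic for a single `=` comparison, not Beam APIs.

```python
def sql_eq(a, b):
    """SQL '=': returns None (NULL) if either operand is NULL, else a bool."""
    if a is None or b is None:
        return None
    return a == b


def where(rows, predicate):
    """Keep a row only when the predicate is exactly True (not False, not NULL)."""
    return [row for row in rows if predicate(row) is True]


roster = [
    {"LastName": "Adams", "SchoolID": 52},
    {"LastName": "Buchanan", "SchoolID": 50},
    {"LastName": "Coolidge", "SchoolID": None},
]
kept = where(roster, lambda row: sql_eq(row["SchoolID"], 52))
```

The `Coolidge` row is dropped because its comparison yields NULL, just like the `Buchanan` row whose comparison yields FALSE.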
+
+Example:
+
+```
+SELECT * FROM Roster
+WHERE SchoolID = 52;
+```
+
+The `bool_expression` can contain multiple sub-conditions.
+
+Example:
+
+```
+SELECT * FROM Roster
+WHERE LastName LIKE 'Mc%' OR LastName LIKE 'Mac%';
+```
+
+You cannot reference column aliases from the `SELECT` list in the `WHERE`
+clause.
+
+The condition of an `INNER JOIN` can also be written as an equivalent `WHERE`
+clause. For example, a query using `INNER JOIN` and `ON` has an equivalent
+expression using `CROSS JOIN` and `WHERE`.
+
+Example - this query:
+
+```
+SELECT * FROM Roster INNER JOIN TeamMascot
+ON Roster.SchoolID = TeamMascot.SchoolID;
+```
+
+is equivalent to:
+
+```
+SELECT * FROM Roster CROSS JOIN TeamMascot
+WHERE Roster.SchoolID = TeamMascot.SchoolID;
+```
+
+## GROUP BY clause
+
+Also see [Windowing & Triggering]({{ site.baseurl}}/documentation/dsls/sql/windowing-and-triggering/)
+
+### Syntax {#syntax_3}
+
+ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) }
+
+The `GROUP BY` clause groups together rows in a table with non-distinct values
+for the `expression` in the `GROUP BY` clause. For multiple rows in the source
+table with non-distinct values for `expression`, the `GROUP BY` clause produces
+a single combined row. `GROUP BY` is commonly used when aggregate functions are
+present in the `SELECT` list, or to eliminate redundancy in the output.
+
+Example:
+
+```
+SELECT SUM(PointsScored), LastName
+FROM PlayerStats
+GROUP BY LastName;
+```
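The example query above can be modeled in Python as "one output row per distinct key". This is a minimal sketch with a hypothetical helper and made-up rows. The output key `sum_points` is also hypothetical, because in SQL the `SUM(...)` column has no alias. Unlike SQL's `SUM`, the sketch does not skip NULL values, and its output order (first-seen key) is not something SQL guarantees.

```python
from collections import defaultdict

player_stats = [
    {"LastName": "Adams", "PointsScored": 7},
    {"LastName": "Buchanan", "PointsScored": 13},
    {"LastName": "Adams", "PointsScored": 3},
]


def group_by_sum(rows, key, value):
    """Return one row per distinct `key` value holding the sum of `value` in that group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return [{"sum_points": total, key: k} for k, total in totals.items()]


result = group_by_sum(player_stats, "LastName", "PointsScored")
```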
+
+## HAVING clause
+
+### Syntax {#syntax_4}
+
+```
+HAVING bool_expression
+```
+
+The `HAVING` clause is similar to the `WHERE` clause: it filters out rows that
+do not return TRUE when they are evaluated against the `bool_expression`.
+
+As with the `WHERE` clause, the `bool_expression` can be any expression that
+returns a boolean, and can contain multiple sub-conditions.
+
+The `HAVING` clause differs from the `WHERE` clause in that:
+
+- The `HAVING` clause requires `GROUP BY` or aggregation to be present in the
+ query.
+- The `HAVING` clause occurs after `GROUP BY` and aggregation.
+ This means that the `HAVING` clause is evaluated once for every
+ aggregated row in the result set. This differs from the `WHERE` clause,
+ which is evaluated before `GROUP BY` and aggregation.
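The evaluation order can be sketched in Python: `WHERE` tests each input row, the rows are grouped and summed, and `HAVING` then tests each aggregated group. The function `where_group_having` and its data are hypothetical; the sketch groups by `LastName` and sums `PointsScored`.

```python
from collections import defaultdict


def where_group_having(rows, where, having):
    """Sketch of evaluation order: WHERE, then GROUP BY with SUM, then HAVING."""
    filtered = [r for r in rows if where(r)]  # WHERE: evaluated per input row
    totals = defaultdict(int)
    for r in filtered:  # GROUP BY LastName, SUM(PointsScored)
        totals[r["LastName"]] += r["PointsScored"]
    # HAVING: evaluated once per aggregated row
    return [name for name, total in totals.items() if having(total)]


stats = [
    {"LastName": "Adams", "PointsScored": 10},
    {"LastName": "Adams", "PointsScored": 8},
    {"LastName": "Buchanan", "PointsScored": 13},
]
# Only the group total for Adams (18) exceeds 15, although no single row does.
having_result = where_group_having(stats, lambda r: True, lambda total: total > 15)
where_result = where_group_having(stats, lambda r: r["PointsScored"] > 15, lambda total: True)
```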
+
+The `HAVING` clause can reference columns available via the `FROM` clause, as
+well as `SELECT` list aliases. Expressions referenced in the `HAVING` clause
+must either appear in the `GROUP BY` clause or they must be the result of an
+aggregate function:
+
+```
+SELECT LastName
+FROM Roster
+GROUP BY LastName
+HAVING SUM(PointsScored) > 15;
```
-query:
- {
- select
- | query UNION [ ALL ] query
- | query MINUS [ ALL ] query
- | query INTERSECT [ ALL ] query
- }
- [ ORDER BY orderItem [, orderItem ]* LIMIT count [OFFSET offset] ]
-orderItem:
- expression [ ASC | DESC ]
+## Set operators
-select:
- SELECT
- { * | projectItem [, projectItem ]* }
- FROM tableExpression
- [ WHERE booleanExpression ]
- [ GROUP BY { groupItem [, groupItem ]* } ]
- [ HAVING booleanExpression ]
+### Syntax {#syntax_6}
-projectItem:
- expression [ [ AS ] columnAlias ]
- | tableAlias . *
+ UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT
-tableExpression:
- tableReference [, tableReference ]*
- | tableExpression [ ( LEFT | RIGHT ) [ OUTER ] ] JOIN tableExpression [
joinCondition ]
+Set operators combine results from two or more input queries into a single
+result set. You must specify `ALL` or `DISTINCT`; if you specify `ALL`, then all
+rows are retained. If `DISTINCT` is specified, duplicate rows are discarded.
-booleanExpression:
- expression [ IS NULL | IS NOT NULL ]
- | expression [ > | >= | = | < | <= | <> ] expression
- | booleanExpression [ AND | OR ] booleanExpression
- | NOT booleanExpression
- | '(' booleanExpression ')'
+If a given row R appears exactly m times in the first input query and n times in
+the second input query (m >= 0, n >= 0):
-joinCondition:
- ON booleanExpression
+- For `UNION ALL`, R appears exactly m + n times in the result.
+- For `UNION DISTINCT`, the `DISTINCT` is computed after the `UNION` is
+ computed, so R appears exactly one time (provided m + n > 0).
+- For `INTERSECT DISTINCT`, R appears exactly one time in the output if m > 0
+ and n > 0.
+- For `EXCEPT DISTINCT`, R appears exactly one time in the output if m > 0 and
+ n = 0.
+- If there are more than two input queries, the above operations generalize
+ and the output is the same as if the inputs were combined incrementally from
+ left to right.
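These multiplicity rules can be checked with a small Python model in which each input query's result is a list of hashable row tuples. The helper names are hypothetical and the sketch is not Beam's implementation. SQL does not specify result order; the sketch simply keeps first-seen order.

```python
from collections import Counter


def union_all(a, b):
    """R appears m + n times."""
    return a + b


def union_distinct(a, b):
    """R appears once if m + n > 0."""
    return list(dict.fromkeys(a + b))


def intersect_distinct(a, b):
    """R appears once if m > 0 and n > 0."""
    in_b = set(b)
    return list(dict.fromkeys(r for r in a if r in in_b))


def except_distinct(a, b):
    """R appears once if m > 0 and n == 0."""
    in_b = set(b)
    return list(dict.fromkeys(r for r in a if r not in in_b))


q1 = [(1,), (1,), (2,)]
q2 = [(1,), (3,)]
counts = Counter(union_all(q1, q2))  # (1,) appears 2 + 1 = 3 times
```

With more than two inputs, the operations compose left to right, for example `union_all(union_all(q1, q2), q3)`.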
-tableReference:
- tableName [ [ AS ] alias ]
+The following rules apply:
-values:
- VALUES expression [, expression ]*
+- For set operations other than `UNION ALL`, all column types must support
+ equality comparison.
+- The input queries on each side of the operator must return the same number
+ of columns.
+- The operators pair the columns returned by each input query according to the
+ columns' positions in their respective `SELECT` lists. That is, the first
+ column in the first input query is paired with the first column in the
+ second input query.
+- The result set always uses the column names from the first input query.
+- The result set always uses the supertypes of input types in corresponding
+ columns, so paired columns must also have either the same data type or a
+ common supertype.
+- You must use parentheses to separate different set operations; for this
+ purpose, set operations such as `UNION ALL` and `UNION DISTINCT` are
+ different. If the statement only repeats the same set operation,
parentheses
+ are not necessary.
-groupItem:
- expression
- | '(' expression [, expression ]* ')'
- | HOP '(' expression [, expression ]* ')'
- | TUMBLE '(' expression [, expression ]* ')'
- | SESSION '(' expression [, expression ]* ')'
+Examples:
```
+query1 UNION ALL (query2 UNION DISTINCT query3)
+query1 UNION ALL query2 UNION ALL query3
+```
+
+Invalid:
+
+ query1 UNION ALL query2 UNION DISTINCT query3
+ query1 UNION ALL query2 INTERSECT ALL query3; // INVALID.
+
+### UNION
+
+The `UNION` operator combines the result sets of two or more input queries by
+pairing columns from the result set of each query and vertically concatenating
+them.
+
+### INTERSECT
+
+The `INTERSECT` operator returns rows that are found in the result sets of both
+the left and right input queries. Unlike `EXCEPT`, the positioning of the input
+queries (to the left vs. right of the `INTERSECT` operator) does not matter.
+
+### EXCEPT
+
+The `EXCEPT` operator returns rows from the left input query that are not
+present in the right input query.
+
+## LIMIT clause and OFFSET clause
+
+### Syntax {#syntax_7}
+
+```
+LIMIT count [ OFFSET skip_rows ]
+```
+
+`LIMIT` specifies a non-negative `count` of type INTEGER, and no more than `count`
+rows will be returned. `LIMIT 0` returns 0 rows. If there is a set operation,
+`LIMIT` is applied after the set operation is evaluated.
+
+`OFFSET` specifies a non-negative `skip_rows` of type INTEGER, and only rows from
+that offset in the table will be considered.
+
+These clauses accept only literal or parameter values.
+
+Which rows `LIMIT` and `OFFSET` return is unspecified.
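As a rough Python model (a hypothetical helper, not Beam's implementation), `LIMIT count OFFSET skip_rows` skips the first `skip_rows` rows and then returns at most `count` rows. Because row order is unspecified, the sketch simply uses whatever order its input happens to have.

```python
from itertools import islice


def limit_offset(rows, count, skip_rows=0):
    """Skip the first `skip_rows` rows, then return at most `count` rows."""
    if count < 0 or skip_rows < 0:
        raise ValueError("count and skip_rows must be non-negative")
    return list(islice(rows, skip_rows, skip_rows + count))


page = limit_offset(range(10), 3, skip_rows=2)
```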
+
+## WITH clause
+
+The `WITH` clause contains one or more named subqueries which execute every time
+a subsequent `SELECT` statement references them. Any clause or subquery can
+reference subqueries you define in the `WITH` clause. This includes any `SELECT`
+statements on either side of a set operator, such as `UNION`.
+
+Example:
+
+```
+WITH subQ1 AS (SELECT SchoolID FROM Roster),
+ subQ2 AS (SELECT OpponentID FROM PlayerStats)
+SELECT * FROM subQ1
+UNION ALL
+SELECT * FROM subQ2;
+```
+
+## Aliases {#aliases_2}
+
+An alias is a temporary name given to a table, column, or expression present in
+a query. You can introduce explicit aliases in the `SELECT` list or `FROM`
+clause, or Beam will infer an implicit alias for some expressions.
+Expressions with neither an explicit nor implicit alias are anonymous and the
+query cannot reference them by name.
+
+### Explicit alias syntax
+
+You can introduce explicit aliases in either the `FROM` clause or the `SELECT`
+list.
+
+In a `FROM` clause, you can introduce explicit aliases for any item, including
+tables, arrays, subqueries, and `UNNEST` clauses, using `[AS] alias`. The `AS`
+keyword is optional.
+
+Example:
+
+```
+SELECT s.FirstName, s2.SongName
+FROM Singers AS s JOIN Songs AS s2 ON s.SingerID = s2.SingerID;
+```
+
+You can introduce explicit aliases for any expression in the `SELECT` list using
+`[AS] alias`. The `AS` keyword is optional.
+
+Example:
+
+```
+SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname
+FROM Singers s;
+```
+
+### Explicit alias visibility
+
+After you introduce an explicit alias in a query, there are restrictions on
+where else in the query you can reference that alias. These restrictions on
+alias visibility are the result of Beam's name scoping rules.
+
+#### FROM clause aliases
+
+Beam processes aliases in a `FROM` clause from left to right, and aliases
+are visible only to subsequent `JOIN` clauses.
+
+### Ambiguous aliases
+
+Beam reports an error if a name is ambiguous, meaning it can resolve to
+more than one unique object.
+
+Examples:
+
+This query contains column names that conflict between tables, since both
+`Singers` and `Songs` have a column named `SingerID`:
+
+```
+SELECT SingerID
+FROM Singers, Songs;
+```
+
+### Implicit aliases
+
+In the `SELECT` list, if there is an expression that does not have an explicit
+alias, Beam assigns an implicit alias according to the following rules.
+There can be multiple columns with the same alias in the `SELECT` list.
+
+- For identifiers, the alias is the identifier. For example, `SELECT abc`
+ implies `AS abc`.
+- For path expressions, the alias is the last identifier in the path. For
+ example, `SELECT abc.def.ghi` implies `AS ghi`.
+- For field access using the "dot" member field access operator, the alias is
+ the field name. For example, `SELECT (struct_function()).fname` implies
+ `AS fname`.
+
+In all other cases, there is no implicit alias, so the column is anonymous and
+cannot be referenced by name. The data from that column will still be returned
+and the displayed query results may have a generated label for that column, but
+the label cannot be used like an alias.
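The `SELECT`-list rules above can be summarized as a toy Python classifier. The function `implicit_alias` is hypothetical and deliberately simplified: it only recognizes plain identifiers, dotted paths, and a parenthesized expression followed by `.field`, and it ignores quoted identifiers and other syntax a real SQL parser handles.

```python
import re
from typing import Optional

_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"


def implicit_alias(expr: str) -> Optional[str]:
    """Return the implicit alias of a SELECT-list expression, or None if anonymous."""
    expr = expr.strip()
    # Identifier or path expression: abc -> abc, abc.def.ghi -> ghi
    if re.fullmatch(rf"{_IDENT}(\.{_IDENT})*", expr):
        return expr.split(".")[-1]
    # Field access on a parenthesized expression: (struct_function()).fname -> fname
    match = re.fullmatch(rf"\(.*\)\.({_IDENT})", expr)
    if match:
        return match.group(1)
    # Any other expression, such as LOWER(x) or a + b, is anonymous.
    return None
```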
+
+In a `FROM` clause, `from_item`s are not required to have an alias. The
+following rules apply:
+
+If there is an expression that does not have an explicit alias, Beam assigns
+an implicit alias in these cases:
+
+- For identifiers, the alias is the identifier. For example, `FROM abc`
+ implies `AS abc`.
+- For path expressions, the alias is the last identifier in the path. For
+ example, `FROM abc.def.ghi` implies `AS ghi`.
+
+Table subqueries do not have implicit aliases.
+
+`FROM UNNEST(x)` does not have an implicit alias.
+> Portions of this page are modifications based on
+> [work](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax)
+> created and
+> [shared by Google](https://developers.google.com/terms/site-policies)
+> and used according to terms described in the [Creative Commons 3.0
+> Attribution License](http://creativecommons.org/licenses/by/3.0/).
diff --git a/website/src/documentation/dsls/sql/shell.md b/website/src/documentation/dsls/sql/shell.md
index 6f9c32cf187..4ef670a4158 100644
--- a/website/src/documentation/dsls/sql/shell.md
+++ b/website/src/documentation/dsls/sql/shell.md
@@ -59,17 +59,17 @@ The shell converts the queries into Beam pipelines, runs them using `DirectRunne
## Declaring Tables
-Before reading data from a source or writing data to a destination, you must declare a virtual table using the `CREATE TABLE` statement. For example, if you have a local CSV file `"test-file.csv"` in the current folder, you can create a table with the following statement:
+Before reading data from a source or writing data to a destination, you must declare a virtual table using the `CREATE EXTERNAL TABLE` statement. For example, if you have a local CSV file `"test-file.csv"` in the current folder, you can create a table with the following statement:
```
-0: BeamSQL> CREATE TABLE csv_file (field1 VARCHAR, field2 INTEGER) TYPE text LOCATION 'test-file.csv';
+0: BeamSQL> CREATE EXTERNAL TABLE csv_file (field1 VARCHAR, field2 INTEGER) TYPE text LOCATION 'test-file.csv';
No rows affected (0.042 seconds)
```
-The `CREATE TABLE` statement registers the CSV file as a table in Beam SQL and specifies the table's schema. This statement does not directly create a persistent physical table; it only describes the source/sink to Beam SQL so that you can use the table in the queries that read data and write data.
+The `CREATE EXTERNAL TABLE` statement registers the CSV file as a table in Beam SQL and specifies the table's schema. This statement does not directly create a persistent physical table; it only describes the source/sink to Beam SQL so that you can use the table in the queries that read data and write data.
-_For more information about `CREATE TABLE` syntax and supported table types, see the [CREATE TABLE reference page]({{ site.baseurl }}/documentation/dsls/sql/create-table/)._
+_For more information about `CREATE EXTERNAL TABLE` syntax and supported table types, see the [CREATE EXTERNAL TABLE reference page]({{ site.baseurl }}/documentation/dsls/sql/create-external-table/)._
## Reading and Writing Data
diff --git a/website/src/documentation/io/testing.md b/website/src/documentation/io/testing.md
index 7e212bca62f..abfa3a6fb55 100644
--- a/website/src/documentation/io/testing.md
+++ b/website/src/documentation/io/testing.md
@@ -97,7 +97,7 @@ Python:
### Implementing unit tests {#implementing-unit-tests}
-A general guide to writing Unit Tests for all transforms can be found in the [PTransform Style Guide](https://beam.apache.org/contribute/ptransform-style-guide/#testing ). We have expanded on a few important points below.
+A general guide to writing Unit Tests for all transforms can be found in the [PTransform Style Guide]({{ site.baseurl }}/contribute/ptransform-style-guide/#testing ). We have expanded on a few important points below.
If you are using the `Source` API, make sure to exhaustively unit-test your code. A minor implementation error can lead to data corruption or data loss (such as skipping or duplicating records) that can be hard for your users to detect. Also look into using <span class="language-java">`SourceTestUtils`</span><span class="language-py">`source_test_utils`</span> - it is a key piece of testing `Source` implementations.
@@ -164,13 +164,13 @@ Prerequisites:
You won’t need to invoke PerfKit Benchmarker directly. Run `./gradlew performanceTest` task in project's root directory, passing kubernetes scripts of your choice (located in .test_infra/kubernetes directory). It will setup PerfKitBenchmarker for you.
-Example run with the [Direct](https://beam.apache.org/documentation/runners/direct/) runner:
+Example run with the [Direct]({{ site.baseurl }}/documentation/runners/direct/) runner:
```
./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=1000"]' -DitModule=sdks/java/io/jdbc/ -DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT -DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml" -DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml" -DintegrationTestRunner=direct
```
-Example run with the [Google Cloud Dataflow](https://beam.apache.org/documentation/runners/dataflow/) runner:
+Example run with the [Google Cloud Dataflow]({{ site.baseurl }}/documentation/runners/dataflow/) runner:
```
./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=1000", "--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' -DitModule=sdks/java/io/jdbc/ -DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT -DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml" -DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml" -DintegrationTestRunner=dataflow
```
diff --git a/website/src/get-started/downloads.md b/website/src/get-started/downloads.md
index 6f66479bb6d..34d02e1931e 100644
--- a/website/src/get-started/downloads.md
+++ b/website/src/get-started/downloads.md
@@ -79,6 +79,13 @@ versions denoted `0.x.y`.
## Releases
+### 2.7.0 (2018-10-02)
+Official [source code download](https://dist.apache.org/repos/dist/release/beam/2.7.0/apache-beam-2.7.0-source-release.zip)
+[SHA-512](https://dist.apache.org/repos/dist/release/beam/2.7.0/apache-beam-2.7.0-source-release.zip.sha512)
+[signature](https://dist.apache.org/repos/dist/release/beam/2.7.0/apache-beam-2.7.0-source-release.zip.asc).
+
+[Release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343654).
+
### 2.6.0 (2018-08-08)
Official [source code download](https://archive.apache.org/dist/beam/2.6.0/apache-beam-2.6.0-source-release.zip)
[SHA-512](https://archive.apache.org/dist/beam/2.6.0/apache-beam-2.6.0-source-release.zip.sha512)
diff --git a/website/src/get-started/quickstart-java.md b/website/src/get-started/quickstart-java.md
index cc070fec71f..35bb74d5c6e 100644
--- a/website/src/get-started/quickstart-java.md
+++ b/website/src/get-started/quickstart-java.md
@@ -162,7 +162,7 @@ $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
{:.runner-dataflow}
```
-Make sure you complete the setup steps at https://beam.apache.org/documentation/runners/dataflow/#setup
+Make sure you complete the setup steps at {{ site.baseurl }}/documentation/runners/dataflow/#setup
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
@@ -213,7 +213,7 @@ PS> mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
{:.runner-dataflow}
```
-Make sure you complete the setup steps at https://beam.apache.org/documentation/runners/dataflow/#setup
+Make sure you complete the setup steps at {{ site.baseurl }}/documentation/runners/dataflow/#setup
PS> mvn compile exec:java -D exec.mainClass=org.apache.beam.examples.WordCount `
-D exec.args="--runner=DataflowRunner --project=<your-gcp-project> `
diff --git a/website/src/get-started/quickstart-py.md b/website/src/get-started/quickstart-py.md
index d6da9ef7ea4..f3d5d1bc455 100644
--- a/website/src/get-started/quickstart-py.md
+++ b/website/src/get-started/quickstart-py.md
@@ -190,7 +190,7 @@ This runner is not yet available for the Python SDK.
{:.runner-dataflow}
```
# As part of the initial setup, install Google Cloud Platform specific extra components. Make sure you
-# complete the setup steps at https://beam.apache.org/documentation/runners/dataflow/#setup
+# complete the setup steps at {{ site.baseurl }}/documentation/runners/dataflow/#setup
pip install apache-beam[gcp]
python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://<your-gcs-bucket>/counts \
diff --git a/website/src/get-started/wordcount-example.md b/website/src/get-started/wordcount-example.md
index a1cdbd12662..539d9d7404a 100644
--- a/website/src/get-started/wordcount-example.md
+++ b/website/src/get-started/wordcount-example.md
@@ -1148,7 +1148,7 @@ unbounded datasets. If your dataset has a fixed number of elements, it is a boun
dataset and all of the data can be processed together. For bounded datasets,
the question to ask is "Do I have all of the data?" If data continuously
arrives (such as an endless stream of game scores in the
-[Mobile gaming example](https://beam.apache.org/get-started/mobile-gaming-example/),
+[Mobile gaming example]({{ site.baseurl }}/get-started/mobile-gaming-example/),
it is an unbounded dataset. An unbounded dataset is never available for
processing at any one time, so the data must be processed using a streaming
pipeline that runs continuously. The dataset will only be complete up to a
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 151907)
Time Spent: 7h (was: 6h 50m)
> Migrate website source code to apache/beam [website-migration] branch
> ---------------------------------------------------------------------
>
> Key: BEAM-4494
> URL: https://issues.apache.org/jira/browse/BEAM-4494
> Project: Beam
> Issue Type: Sub-task
> Components: website
> Reporter: Scott Wegner
> Assignee: Scott Wegner
> Priority: Major
> Labels: beam-site-automation-reliability
> Fix For: 2.5.0
>
> Time Spent: 7h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)