This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 32ed5ae Publishing website 2020/10/16 00:03:38 at commit 15fd04d
32ed5ae is described below
commit 32ed5ae6a04274caa205e039cbee78b6c6cc9bc7
Author: jenkins <[email protected]>
AuthorDate: Fri Oct 16 00:03:38 2020 +0000
Publishing website 2020/10/16 00:03:38 at commit 15fd04d
---
website/generated-content/documentation/index.xml | 102 +++++++++++++++------
.../documentation/io/built-in/snowflake/index.html | 85 +++++++++++------
.../documentation/runners/twister2/index.html | 2 +-
website/generated-content/sitemap.xml | 2 +-
4 files changed, 137 insertions(+), 54 deletions(-)
diff --git a/website/generated-content/documentation/index.xml
b/website/generated-content/documentation/index.xml
index 3c75142..0d3ea63 100644
--- a/website/generated-content/documentation/index.xml
+++ b/website/generated-content/documentation/index.xml
@@ -1648,7 +1648,7 @@ Where parameters can be:</p>
<p><code>--privateKeyPath</code> Path to Private Key file. Required
for Private Key authentication only.</p>
<p><code>--rawPrivateKey</code> Private Key. Required for Private Key
authentication only.</p>
<p><code>--privateKeyPassphrase</code> Private Key&rsquo;s
passphrase. Required for Private Key authentication only.</p>
-<p><code>--stagingBucketName</code> External bucket path ending with
<code>/</code>. I.e. <code>gs://bucket/</code>. Sub-directories are
allowed.</p>
+<p><code>--stagingBucketName</code> External bucket path ending with
<code>/</code>, e.g. <code>{gs,s3}://bucket/</code>.
Sub-directories are allowed.</p>
<p><code>--storageIntegrationName</code> Storage integration
name</p>
<p><code>--warehouse</code> Warehouse to use. Optional.</p>
<p><code>--database</code> Database name to connect to.
Optional.</p>
@@ -1681,15 +1681,15 @@ Example: --table=TEST_TABLE
Example: --query=‘SELECT column FROM TABLE’
--storageIntegrationName=&lt;SNOWFLAKE STORAGE INTEGRATION NAME&gt;
Example: --storageIntegrationName=my_integration
---stagingBucketName=&lt;GCS BUCKET NAME&gt;
-Example: --stagingBucketName=gs://my_gcp_bucket/
+--stagingBucketName=&lt;GCS OR S3 BUCKET&gt;
+Example: --stagingBucketName={gs,s3}://bucket/
--runner=&lt;DirectRunner/DataflowRunner&gt;
Example: --runner=DataflowRunner
--project=&lt;FOR DATAFLOW RUNNER: GCP PROJECT NAME&gt;
Example: --project=my_project
--tempLocation=&lt;FOR DATAFLOW RUNNER: GCS TEMP LOCATION STARTING
WITH gs://…&gt;
-Example: --tempLocation=gs://my_bucket/temp/
+Example: --tempLocation=gs://bucket/temp/
--region=&lt;FOR DATAFLOW RUNNER: GCP REGION&gt;
Example: --region=us-east-1
--appName=&lt;OPTIONAL: DATAFLOW JOB NAME PREFIX&gt;
@@ -1713,10 +1713,10 @@ Example: --table=TEST_TABLE
Example: --database=TEST_DATABASE
&#34;--storageIntegrationName=&lt;SNOWFLAKE STORAGE INTEGRATION
NAME&gt;&#34;,
Example: --storageIntegrationName=my_integration
-&#34;--stagingBucketName=&lt;GCS BUCKET NAME&gt;&#34;,
-Example: --stagingBucketName=gs://my_gcp_bucket
+&#34;--stagingBucketName=&lt;GCS OR S3 BUCKET&gt;&#34;,
+Example: --stagingBucketName={gs,s3}://bucket
&#34;--externalLocation=&lt;GCS BUCKET URL STARTING WITH
GS://&gt;&#34;,
-Example: --tempLocation=gs://my_bucket/temp/
+Example: --tempLocation=gs://bucket/temp/
]&#39; --no-build-cache</code></pre>
</p>
<p>All parameters starting with “&ndash;” are surrounded with
double quotation marks and separated by commas:</p>
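The quoting rule above can be sketched as a small helper. This is illustrative only; `ArgsFormatter` is not part of SnowflakeIO or Beam:

```java
import java.util.List;
import java.util.stream.Collectors;

public final class ArgsFormatter {
    // Wraps each "--key=value" parameter in double quotes and joins the
    // quoted parameters with commas, per the quoting rule described above.
    public static String format(List<String> params) {
        return params.stream()
            .map(p -> "\"" + p + "\"")
            .collect(Collectors.joining(",\n"));
    }

    private ArgsFormatter() {}
}
```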
@@ -1786,9 +1786,9 @@ Example: --tempLocation=gs://my_bucket/temp/
</ul>
</li>
<li>
-<p><code>--stagingBucketName=&lt;GCS BUCKET
NAME&gt;</code></p>
+<p><code>--stagingBucketName=&lt;GCS OR S3
BUCKET&gt;</code></p>
<ul>
-<li>Google Cloud Services bucket where the Beam files will be
staged.</li>
+<li>Google Cloud Storage bucket or AWS S3 bucket where the Beam files will
be staged.</li>
</ul>
</li>
<li>
@@ -1833,7 +1833,7 @@ Example: --tempLocation=gs://my_bucket/temp/
<p><code>--privateKeyPassphrase</code> Private Key&rsquo;s
passphrase. Required for Private Key authentication only.</p>
</li>
<li>
-<p><code>--stagingBucketName</code> external bucket path ending with
<code>/</code>. I.e. <code>gs://bucket/</code>. Sub-directories are
allowed.</p>
+<p><code>--stagingBucketName</code> External bucket path ending with
<code>/</code>, e.g. <code>{gs,s3}://bucket/</code>.
Sub-directories are allowed.</p>
</li>
<li>
<p><code>--storageIntegrationName</code> Storage integration
name.</p>
@@ -1860,7 +1860,6 @@ Example: --tempLocation=gs://my_bucket/temp/
<p><code>--snowPipe</code> SnowPipe name. Optional.</p>
</li>
</ul>
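The `--stagingBucketName` contract above (a `gs://` or `s3://` path ending with `/`) can be validated up front with a small helper. This is an illustrative sketch under those stated rules, not part of SnowflakeIO:

```java
import java.util.regex.Pattern;

public final class StagingBucketValidator {
    // Supported schemes per the docs: gs:// (GCS) or s3:// (AWS S3),
    // followed by a bucket name and optional sub-directories.
    private static final Pattern SCHEME = Pattern.compile("^(gs|s3)://[^/]+.*");

    // Rejects unsupported schemes and appends a trailing "/" if missing,
    // matching the requirement that the path end with "/".
    public static String normalize(String path) {
        if (path == null || !SCHEME.matcher(path).matches()) {
            throw new IllegalArgumentException(
                "stagingBucketName must start with gs:// or s3://: " + path);
        }
        return path.endsWith("/") ? path : path + "/";
    }

    private StagingBucketValidator() {}
}
```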
-<p><strong>Note</strong>: table and query is not in pipeline options
by default, it may be added by <a
href="https://beam.apache.org/documentation/io/built-in/snowflake/#extending-pipeline-options">extending</a>
PipelineOptions.</p>
<p>Currently, SnowflakeIO <strong>doesn&rsquo;t support</strong> the
following options at runtime:</p>
<ul>
<li>
@@ -1891,7 +1890,7 @@ Example: --tempLocation=gs://my_bucket/temp/
<span class="n">SnowflakeIO</span><span
class="o">.&lt;</span><span class="n">type</span><span
class="o">&gt;</span><span class="n">write</span><span
class="o">()</span>
<span class="o">.</span><span
class="na">withDataSourceConfiguration</span><span
class="o">(</span><span class="n">dc</span><span
class="o">)</span>
<span class="o">.</span><span class="na">toTable</span><span
class="o">(</span><span
class="s">&#34;MY_TABLE&#34;</span><span class="o">)</span>
-<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span class="s">&#34;BUCKET
NAME&#34;</span><span class="o">)</span>
+<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span
class="s">&#34;BUCKET&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withStorageIntegrationName</span><span
class="o">(</span><span class="s">&#34;STORAGE INTEGRATION
NAME&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withUserDataMapper</span><span class="o">(</span><span
class="n">mapper</span><span class="o">)</span>
<span class="o">)</span></code></pre></div>
@@ -1907,15 +1906,21 @@ Replace type with the data type of the PCollection
object to write; for example,
</li>
<li>
<p><code>.withStagingBucketName()</code> Accepts a cloud bucket path
ending with a slash.
--Example:
<code>.withStagingBucketName(&quot;gs://mybucket/my/dir/&quot;)</code></p>
+Example:
<code>.withStagingBucketName(&quot;{gs,s3}://bucket/my/dir/&quot;)</code></p>
</li>
<li>
-<p><code>.withStorageIntegrationName()</code> Accepts a name of a
Snowflake storage integration object created according to Snowflake
documentationt. Example:
+<p><code>.withStorageIntegrationName()</code> Accepts a name of a
Snowflake storage integration object created according to Snowflake
documentation. Examples:
<pre><code>CREATE OR REPLACE STORAGE INTEGRATION test_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS =
(&#39;gcs://bucket/&#39;);</code></pre>
+<pre><code>CREATE STORAGE INTEGRATION test_integration
+TYPE = EXTERNAL_STAGE
+STORAGE_PROVIDER = S3
+ENABLED = TRUE
+STORAGE_AWS_ROLE_ARN = &#39;&lt;ARN ROLE NAME&gt;&#39;
+STORAGE_ALLOWED_LOCATIONS = (&#39;s3://bucket/&#39;)</code></pre>
Then:
<pre><code>.withStorageIntegrationName(test_integration)</code></pre>
</p>
@@ -1941,7 +1946,7 @@ SnowflakeIO uses COPY statements behind the scenes to
write (using <a href="h
<div class=language-java>
<div class="highlight"><pre class="chroma"><code
class="language-java" data-lang="java"><span
class="n">data</span><span class="o">.</span><span
class="na">apply</span><span class="o">(</span>
<span class="n">SnowflakeIO</span><span
class="o">.&lt;</span><span class="n">type</span><span
class="o">&gt;</span><span class="n">write</span><span
class="o">()</span>
-<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span class="s">&#34;BUCKET
NAME&#34;</span><span class="o">)</span>
+<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span
class="s">&#34;BUCKET&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withStorageIntegrationName</span><span
class="o">(</span><span class="s">&#34;STORAGE INTEGRATION
NAME&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withDataSourceConfiguration</span><span
class="o">(</span><span class="n">dc</span><span
class="o">)</span>
<span class="o">.</span><span
class="na">withUserDataMapper</span><span class="o">(</span><span
class="n">mapper</span><span class="o">)</span>
@@ -1972,7 +1977,7 @@ SnowflakeIO uses COPY statements behind the scenes to
write (using <a href="h
<p><code>.withStagingBucketName()</code></p>
<ul>
<li>Accepts a cloud bucket path ending with a slash.</li>
-<li>Example:
<code>.withStagingBucketName(&quot;gs://mybucket/my/dir/&quot;)</code></li>
+<li>Example:
<code>.withStagingBucketName(&quot;{gs,s3}://bucket/my/dir/&quot;)</code></li>
</ul>
</li>
<li>
@@ -1985,6 +1990,12 @@ TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS =
(&#39;gcs://bucket/&#39;);</code></pre>
+<pre><code>CREATE STORAGE INTEGRATION test_integration
+TYPE = EXTERNAL_STAGE
+STORAGE_PROVIDER = S3
+ENABLED = TRUE
+STORAGE_AWS_ROLE_ARN = &#39;&lt;ARN ROLE NAME&gt;&#39;
+STORAGE_ALLOWED_LOCATIONS = (&#39;s3://bucket/&#39;)</code></pre>
Then:
<pre><code>.withStorageIntegrationName(test_integration)</code></pre>
</li>
@@ -2088,7 +2099,7 @@ AS COPY INTO stream_table from
@streamstage;</code></pre>
<span class="n">SnowflakeIO</span><span
class="o">.&lt;~&gt;</span><span
class="n">write</span><span class="o">()</span>
<span class="o">.</span><span
class="na">withDataSourceConfiguration</span><span
class="o">(</span><span class="n">dc</span><span
class="o">)</span>
<span class="o">.</span><span class="na">toTable</span><span
class="o">(</span><span
class="s">&#34;MY_TABLE&#34;</span><span class="o">)</span>
-<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span class="s">&#34;BUCKET
NAME&#34;</span><span class="o">)</span>
+<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span
class="s">&#34;BUCKET&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withStorageIntegrationName</span><span
class="o">(</span><span class="s">&#34;STORAGE INTEGRATION
NAME&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withUserDataMapper</span><span class="o">(</span><span
class="n">mapper</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withQueryTransformation</span><span
class="o">(</span><span class="n">query</span><span
class="o">)</span>
@@ -2114,7 +2125,7 @@ AS COPY INTO stream_table from
@streamstage;</code></pre>
<span class="n">SnowflakeIO</span><span
class="o">.&lt;~&gt;</span><span
class="n">write</span><span class="o">()</span>
<span class="o">.</span><span
class="na">withDataSourceConfiguration</span><span
class="o">(</span><span class="n">dc</span><span
class="o">)</span>
<span class="o">.</span><span class="na">toTable</span><span
class="o">(</span><span
class="s">&#34;MY_TABLE&#34;</span><span class="o">)</span>
-<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span class="s">&#34;BUCKET
NAME&#34;</span><span class="o">)</span>
+<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span
class="s">&#34;BUCKET&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withStorageIntegrationName</span><span
class="o">(</span><span class="s">&#34;STORAGE INTEGRATION
NAME&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withUserDataMapper</span><span class="o">(</span><span
class="n">mapper</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withWriteDisposition</span><span class="o">(</span><span
class="n">TRUNCATE</span><span class="o">)</span>
@@ -2137,7 +2148,7 @@ AS COPY INTO stream_table from
@streamstage;</code></pre>
<span class="n">SnowflakeIO</span><span
class="o">.&lt;~&gt;</span><span
class="n">write</span><span class="o">()</span>
<span class="o">.</span><span
class="na">withDataSourceConfiguration</span><span
class="o">(</span><span class="n">dc</span><span
class="o">)</span>
<span class="o">.</span><span class="na">toTable</span><span
class="o">(</span><span
class="s">&#34;MY_TABLE&#34;</span><span class="o">)</span>
-<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span class="s">&#34;BUCKET
NAME&#34;</span><span class="o">)</span>
+<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span
class="s">&#34;BUCKET&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withStorageIntegrationName</span><span
class="o">(</span><span class="s">&#34;STORAGE INTEGRATION
NAME&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withUserDataMapper</span><span class="o">(</span><span
class="n">mapper</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withCreateDisposition</span><span
class="o">(</span><span class="n">CREATE_NEVER</span><span
class="o">)</span>
@@ -2158,7 +2169,7 @@ A table schema is a list of <code>SFColumn</code>
objects with name and ty
<span class="n">SnowflakeIO</span><span
class="o">.&lt;~&gt;</span><span
class="n">write</span><span class="o">()</span>
<span class="o">.</span><span
class="na">withDataSourceConfiguration</span><span
class="o">(</span><span class="n">dc</span><span
class="o">)</span>
<span class="o">.</span><span class="na">toTable</span><span
class="o">(</span><span
class="s">&#34;MY_TABLE&#34;</span><span class="o">)</span>
-<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span class="s">&#34;BUCKET
NAME&#34;</span><span class="o">)</span>
+<span class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span
class="s">&#34;BUCKET&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withStorageIntegrationName</span><span
class="o">(</span><span class="s">&#34;STORAGE INTEGRATION
NAME&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withUserDataMapper</span><span class="o">(</span><span
class="n">mapper</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withTableSchema</span><span class="o">(</span><span
class="n">tableSchema</span><span class="o">)</span>
@@ -2174,7 +2185,7 @@ A table schema is a list of <code>SFColumn</code>
objects with name and ty
<span class="n">SnowflakeIO</span><span
class="o">.&lt;</span><span
class="n">USER_DATA_TYPE</span><span class="o">&gt;</span><span
class="n">read</span><span class="o">()</span>
<span class="o">.</span><span
class="na">withDataSourceConfiguration</span><span
class="o">(</span><span class="n">dc</span><span
class="o">)</span>
<span class="o">.</span><span class="na">fromTable</span><span
class="o">(</span><span
class="s">&#34;MY_TABLE&#34;</span><span class="o">)</span>
<span class="c1">// or .fromQuery(&#34;QUERY&#34;)
-</span><span class="c1"></span> <span
class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span class="s">&#34;BUCKET
NAME&#34;</span><span class="o">)</span>
+</span><span class="c1"></span> <span
class="o">.</span><span
class="na">withStagingBucketName</span><span
class="o">(</span><span
class="s">&#34;BUCKET&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withStorageIntegrationName</span><span
class="o">(</span><span class="s">&#34;STORAGE INTEGRATION
NAME&#34;</span><span class="o">)</span>
<span class="o">.</span><span
class="na">withCsvMapper</span><span class="o">(</span><span
class="n">mapper</span><span class="o">)</span>
<span class="o">.</span><span class="na">withCoder</span><span
class="o">(</span><span class="n">coder</span><span
class="o">));</span>
@@ -2210,6 +2221,12 @@ TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS =
(&#39;gcs://bucket/&#39;);</code></pre>
+<pre><code>CREATE STORAGE INTEGRATION test_integration
+TYPE = EXTERNAL_STAGE
+STORAGE_PROVIDER = S3
+ENABLED = TRUE
+STORAGE_AWS_ROLE_ARN = &#39;&lt;ARN ROLE NAME&gt;&#39;
+STORAGE_ALLOWED_LOCATIONS = (&#39;s3://bucket/&#39;)</code></pre>
Then:
<pre><code>.withStorageIntegrationName(test_integration)</code></pre>
</p>
@@ -2230,7 +2247,7 @@ Then:
<p><strong>Note</strong>:
SnowflakeIO uses COPY statements behind the scenes to read (using <a
href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY
to location</a>) files staged in cloud storage. StagingBucketName will be
used as a temporary location for storing CSV files. Those temporary directories
will be named <code>sf_copy_csv_DATE_TIME_RANDOMSUFFIX</code> and they
will be removed automatically once the Read operation finishes.</p>
<h3 id="csvmapper">CSVMapper</h3>
-<p>SnowflakeIO uses a <a
href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY
INTO <location></a> statement to move data from a Snowflake table to
Google Cloud Storage as CSV files. These files are then downloaded via <a
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/io/FileIO.html">FileIO</a>
and processed line by line. Each line is split into an array of Strings using
the <a href="http://ope [...]
+<p>SnowflakeIO uses a <a
href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY
INTO <location></a> statement to move data from a Snowflake table to
GCS/S3 as CSV files. These files are then downloaded via <a
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/io/FileIO.html">FileIO</a>
and processed line by line. Each line is split into an array of Strings using
the <a href="http://opencsv.sourcefor [...]
<p>The CSVMapper’s job is to let the user convert the
array of Strings to a user-defined type, e.g. GenericRecord for Avro or Parquet
files, or a custom POJO.</p>
<p>Example implementation of CsvMapper for GenericRecord:
<div class=language-java>
@@ -2246,6 +2263,39 @@ SnowflakeIO uses COPY statements behind the scenes to
read (using <a href="ht
<span class="o">}</span></code></pre></div>
</div>
</p>
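For the custom-POJO case mentioned above, the row-mapping logic reduces to a plain function from `String[]` to the target type, which in a pipeline would be wrapped in `.withCsvMapper(...)`. A minimal sketch — the `User` type and its field order are hypothetical:

```java
public class CsvMapperSketch {
    // Hypothetical target POJO; field order mirrors the CSV columns.
    public static class User {
        public final String name;
        public final int age;

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // One CSV row (already split into fields by SnowflakeIO) becomes one User.
    public static User mapRow(String[] parts) {
        return new User(parts[0], Integer.parseInt(parts[1]));
    }
}
```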
+<h2 id="using-snowflakeio-with-aws-s3">Using SnowflakeIO with AWS S3</h2>
+<p>To use an AWS S3 bucket as <code>stagingBucketName</code>, it
is required to:</p>
+<ol>
+<li>Create a <code>PipelineOptions</code> interface that <a
href="https://beam.apache.org/documentation/io/built-in/snowflake/#extending-pipeline-options">extends</a>
<code>SnowflakePipelineOptions</code> and <a
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/aws/options/S3Options.html">S3Options</a>
+with <code>AwsAccessKey</code> and <code>AwsSecretKey</code>
options. Example:</li>
+</ol>
+<p>
+<div class=language-java>
+<div class="highlight"><pre class="chroma"><code
class="language-java" data-lang="java"><span class="kd">public</span>
<span class="kd">interface</span> <span
class="nc">AwsPipelineOptions</span> <span class="kd">extends</span>
<span class="n">SnowflakePipelineOptions</span><span
class="o">,</span> <span class="n">S3Options</span> <span
class="o">{</span>
+<span class="nd">@Description</span><span
class="o">(</span><span class="s">&#34;AWS Access
Key&#34;</span><span class="o">)</span>
+<span class="nd">@Default.String</span><span
class="o">(</span><span
class="s">&#34;access_key&#34;</span><span class="o">)</span>
+<span class="n">String</span> <span
class="nf">getAwsAccessKey</span><span class="o">();</span>
+<span class="kt">void</span> <span
class="nf">setAwsAccessKey</span><span class="o">(</span><span
class="n">String</span> <span class="n">awsAccessKey</span><span
class="o">);</span>
+<span class="nd">@Description</span><span
class="o">(</span><span class="s">&#34;AWS secret
key&#34;</span><span class="o">)</span>
+<span class="nd">@Default.String</span><span
class="o">(</span><span
class="s">&#34;secret_key&#34;</span><span class="o">)</span>
+<span class="n">String</span> <span
class="nf">getAwsSecretKey</span><span class="o">();</span>
+<span class="kt">void</span> <span
class="nf">setAwsSecretKey</span><span class="o">(</span><span
class="n">String</span> <span class="n">awsSecretKey</span><span
class="o">);</span>
+<span class="o">}</span></code></pre></div>
+</div>
+2. Set the <code>AwsCredentialsProvider</code> option using the
<code>AwsAccessKey</code> and <code>AwsSecretKey</code>
options.</p>
+<p>
+<div class=language-java>
+<div class="highlight"><pre class="chroma"><code
class="language-java" data-lang="java"><span
class="n">options</span><span class="o">.</span><span
class="na">setAwsCredentialsProvider</span><span class="o">(</span>
+<span class="k">new</span> <span
class="n">AWSStaticCredentialsProvider</span><span class="o">(</span>
+<span class="k">new</span> <span
class="n">BasicAWSCredentials</span><span class="o">(</span><span
class="n">options</span><span class="o">.</span><span
class="na">getAwsAccessKey</span><span class="o">(),</span> <span
class="n">options</span><span class="o">.</span><span
class="na">getAwsSecretKey</span><span class="o">())</span>
+<span class="o">)</span>
+<span class="o">);</span></code></pre></div>
+</div>
+3. Create the pipeline.</p>
+<div class=language-java>
+<div class="highlight"><pre class="chroma"><code
class="language-java" data-lang="java"><span class="n">Pipeline</span>
<span class="n">p</span> <span class="o">=</span> <span
class="n">Pipeline</span><span class="o">.</span><span
class="na">create</span><span class="o">(</span><span
class="n">options</span><span
class="o">);</span></code></pre></div>
+</div>
+<p><strong>Note</strong>: remember to set <code>awsRegion</code> from <a
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/aws/options/S3Options.html">S3Options</a>.</p>
<h2 id="using-snowflakeio-in-python-sdk">Using SnowflakeIO in Python
SDK</h2>
<h3 id="intro">Intro</h3>
<p>The Snowflake cross-language implementation supports both reading and
writing operations for the Python programming language, thanks to
@@ -2275,13 +2325,13 @@ and <a
href="https://beam.apache.org/roadmap/connectors-multi-sdk/#cross-lang
<p><code>database</code> Name of the Snowflake database to use.</p>
</li>
<li>
-<p><code>staging_bucket_name</code> Name of the Google Cloud Storage
bucket. Bucket will be used as a temporary location for storing CSV files.
Those temporary directories will be named
<code>sf_copy_csv_DATE_TIME_RANDOMSUFFIX</code> and they will be removed
automatically once Read operation finishes.</p>
+<p><code>staging_bucket_name</code> Name of the Google Cloud Storage
bucket or AWS S3 bucket. The bucket will be used as a temporary location for
storing CSV files. Those temporary directories will be named
<code>sf_copy_csv_DATE_TIME_RANDOMSUFFIX</code> and they will be removed
automatically once the Read operation finishes.</p>
</li>
<li>
<p><code>storage_integration_name</code> Is the name of a Snowflake
storage integration object created according to <a
href="https://docs.snowflake.net/manuals/sql-reference/sql/create-storage-integration.html">Snowflake
documentation</a>.</p>
</li>
<li>
-<p><code>csv_mapper</code> Specifies a function which must translate
user-defined object to array of strings. SnowflakeIO uses a <a
href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY
INTO <location></a> statement to move data from a Snowflake table to
Google Cloud Storage as CSV files. These files are then downloaded via <a
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/io/FileIO.html">FileI
[...]
+<p><code>csv_mapper</code> Specifies a function which must translate a
user-defined object to an array of strings. SnowflakeIO uses a <a
href="https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html">COPY
INTO <location></a> statement to move data from a Snowflake table to
GCS/S3 as CSV files. These files are then downloaded via <a
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/io/FileIO.html">FileIO</a>
and p [...]
Example:
<div class=language-py>
<div class="highlight"><pre class="chroma"><code class="language-py"
data-lang="py"><span class="k">def</span> <span
class="nf">csv_mapper</span><span class="p">(</span><span
class="n">strings_array</span><span class="p">):</span>
@@ -2339,7 +2389,7 @@ Example:
<span class="n">private_key_passphrase</span><span
class="o">=&lt;</span><span class="n">PASSWORD</span> <span
class="n">FOR</span> <span class="n">KEY</span><span
class="o">&gt;</span><span class="p">,</span>
<span class="n">schema</span><span
class="o">=&lt;</span><span class="n">SNOWFLAKE</span> <span
class="n">SCHEMA</span><span class="o">&gt;</span><span
class="p">,</span>
<span class="n">database</span><span
class="o">=&lt;</span><span class="n">SNOWFLAKE</span> <span
class="n">DATABASE</span><span class="o">&gt;</span><span
class="p">,</span>
-<span class="n">staging_bucket_name</span><span
class="o">=&lt;</span><span class="n">GCS</span> <span
class="n">BUCKET</span> <span class="n">NAME</span><span
class="o">&gt;</span><span class="p">,</span>
+<span class="n">staging_bucket_name</span><span
class="o">=&lt;</span><span class="n">GCS</span> <span
class="n">OR</span> <span class="n">S3</span> <span
class="n">BUCKET</span><span class="o">&gt;</span><span
class="p">,</span>
<span class="n">storage_integration_name</span><span
class="o">=&lt;</span><span class="n">SNOWFLAKE</span> <span
class="n">STORAGE</span> <span class="n">INTEGRATION</span> <span
class="n">NAME</span><span class="o">&gt;</span><span
class="p">,</span>
<span class="n">create_disposition</span><span
class="o">=&lt;</span><span class="n">CREATE</span> <span
class="n">DISPOSITION</span><span class="o">&gt;</span><span
class="p">,</span>
<span class="n">write_disposition</span><span
class="o">=&lt;</span><span class="n">WRITE</span> <span
class="n">DISPOSITION</span><span class="o">&gt;</span><span
class="p">,</span>
@@ -2363,7 +2413,7 @@ Example:
<p><code>database</code> Name of the Snowflake database to use.</p>
</li>
<li>
-<p><code>staging_bucket_name</code> Path to Google Cloud Storage
bucket ended with slash. Bucket will be used to save CSV files which will end
up in Snowflake. Those CSV files will be saved under “staging_bucket_name”
path.</p>
+<p><code>staging_bucket_name</code> Path to a Google Cloud Storage
bucket or AWS S3 bucket ending with a slash. The bucket will be used to save CSV
files which will end up in Snowflake. Those CSV files will be saved under the
“staging_bucket_name” path.</p>
</li>
<li>
<p><code>storage_integration_name</code> Is the name of a Snowflake
storage integration object created according to <a
href="https://docs.snowflake.net/manuals/sql-reference/sql/create-storage-integration.html">Snowflake
documentation</a>.</p>
diff --git
a/website/generated-content/documentation/io/built-in/snowflake/index.html
b/website/generated-content/documentation/io/built-in/snowflake/index.html
index 542e6c0..a5004d1 100644
--- a/website/generated-content/documentation/io/built-in/snowflake/index.html
+++ b/website/generated-content/documentation/io/built-in/snowflake/index.html
@@ -1,7 +1,7 @@
<!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta
http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport
content="width=device-width,initial-scale=1"><title>Apache Snowflake I/O
connector</title><meta name=description content="Apache Beam is an open source,
unified model and set of language-specific SDKs for defining and executing data
processing workflows, and also data ingestion and integration flows, supporting
Enterprise Integration Patterns (EIPs) an [...]
<span class=sr-only>Toggle navigation</span>
<span class=icon-bar></span><span class=icon-bar></span><span
class=icon-bar></span></button>
-<a href=/ class=navbar-brand><img alt=Brand style=height:25px
src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask
closed"></div><div id=navbar class="navbar-container closed"><ul class="nav
navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a
href=/documentation/>Documentation</a></li><li><a
href=/documentation/sdks/java/>Languages</a></li><li><a
href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a
href=/roadmap/>Roadmap</a></li>< [...]
+<a href=/ class=navbar-brand><img alt=Brand style=height:25px
src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask
closed"></div><div id=navbar class="navbar-container closed"><ul class="nav
navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a
href=/documentation/>Documentation</a></li><li><a
href=/documentation/sdks/java/>Languages</a></li><li><a
href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a
href=/roadmap/>Roadmap</a></li>< [...]
<span class=o>.</span><span class=na>fromArgs</span><span
class=o>(</span><span class=n>args</span><span class=o>)</span>
<span class=o>.</span><span class=na>withValidation</span><span
class=o>()</span>
<span class=o>.</span><span class=na>as</span><span
class=o>(</span><span class=n>SnowflakePipelineOptions</span><span
class=o>.</span><span class=na>class</span><span class=o>);</span>
@@ -15,7 +15,7 @@
<span class=o>.</span><span class=na>withServerName</span><span
class=o>(</span><span class=n>options</span><span class=o>.</span><span
class=na>getServerName</span><span class=o>())</span>
<span class=o>.</span><span class=na>withDatabase</span><span
class=o>(</span><span class=n>options</span><span class=o>.</span><span
class=na>getDatabase</span><span class=o>())</span>
<span class=o>.</span><span class=na>withWarehouse</span><span
class=o>(</span><span class=n>options</span><span class=o>.</span><span
class=na>getWarehouse</span><span class=o>())</span>
- <span class=o>.</span><span class=na>withSchema</span><span
class=o>(</span><span class=n>options</span><span class=o>.</span><span
class=na>getSchema</span><span
class=o>());</span></code></pre></div></div>Where parameters can
be:</p><ul><li><code>.withUrl(...)</code><ul><li>JDBC-like URL for your
Snowflake account, including account name and region, without any
parameters.</li><li>Example:
<code>.withUrl("jdbc:snowflake://account.snowflakecomputing.com")</code></li></ul></l
[...]
+ <span class=o>.</span><span class=na>withSchema</span><span
class=o>(</span><span class=n>options</span><span class=o>.</span><span
class=na>getSchema</span><span
class=o>());</span></code></pre></div></div>Where parameters can
be:</p><ul><li><code>.withUrl(...)</code><ul><li>JDBC-like URL for your
Snowflake account, including account name and region, without any
parameters.</li><li>Example:
<code>.withUrl("jdbc:snowflake://account.snowflakecomputing.com")</code></li></ul></l
[...]
--args="
--serverName=<SNOWFLAKE SERVER NAME>
Example: --serverName=account.region.gcp.snowflakecomputing.com
@@ -33,15 +33,15 @@
Example: --query=‘SELECT column FROM TABLE’
--storageIntegrationName=<SNOWFLAKE STORAGE INTEGRATION NAME>
Example: --storageIntegrationName=my_integration
- --stagingBucketName=<GCS BUCKET NAME>
- Example: --stagingBucketName=gs://my_gcp_bucket/
+ --stagingBucketName=<GCS OR S3 BUCKET>
+ Example: --stagingBucketName={gs,s3}://bucket/
--runner=<DirectRunner/DataflowRunner>
Example: --runner=DataflowRunner
--project=<FOR DATAFLOW RUNNER: GCP PROJECT NAME>
Example: --project=my_project
--tempLocation=<FOR DATAFLOW RUNNER: GCS TEMP LOCATION STARTING
WITH gs://…>
- Example: --tempLocation=gs://my_bucket/temp/
+ Example: --tempLocation=gs://bucket/temp/
--region=<FOR DATAFLOW RUNNER: GCP REGION>
Example: --region=us-central1
--appName=<OPTIONAL: DATAFLOW JOB NAME PREFIX>
@@ -61,26 +61,31 @@
Example: --database=TEST_DATABASE
"--storageIntegrationName=<SNOWFLAKE STORAGE INTEGRATION
NAME>",
Example: --storageIntegrationName=my_integration
- "--stagingBucketName=<GCS BUCKET NAME>",
- Example: --stagingBucketName=gs://my_gcp_bucket
+ "--stagingBucketName=<GCS OR S3 BUCKET>",
+ Example: --stagingBucketName={gs,s3}://bucket
"--externalLocation=<GCS BUCKET URL STARTING WITH GS://>",
- Example: --tempLocation=gs://my_bucket/temp/
-]' --no-build-cache</code></pre></p><p>Where all parameters are starting
with “–”, they are surrounded with double quotation and separated with
comma:</p><ul><li><p><code>--serverName=<SNOWFLAKE SERVER
NAME></code></p><ul><li>Specifies the full name of your account (provided by
Snowflake). Note that your full account name might include additional segments
that identify the region and cloud platform where your account is
hosted.</li><li>Example: <code>--serverName=xy12345.eu- [...]
+ Example: --externalLocation=gs://bucket/
+]' --no-build-cache</code></pre></p><p>All parameters starting
with &ldquo;--&rdquo; are surrounded with double quotation marks and separated by
commas:</p><ul><li><p><code>--serverName=<SNOWFLAKE SERVER
NAME></code></p><ul><li>Specifies the full name of your account (provided by
Snowflake). Note that your full account name might include additional segments
that identify the region and cloud platform where your account is
hosted.</li><li>Example: <code>--serverName=xy12345.eu- [...]
<span class=n>SnowflakeIO</span><span class=o>.<</span><span
class=n>type</span><span class=o>></span><span class=n>write</span><span
class=o>()</span>
<span class=o>.</span><span
class=na>withDataSourceConfiguration</span><span class=o>(</span><span
class=n>dc</span><span class=o>)</span>
<span class=o>.</span><span class=na>toTable</span><span
class=o>(</span><span class=s>"MY_TABLE"</span><span class=o>)</span>
- <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET NAME"</span><span class=o>)</span>
+ <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withStorageIntegrationName</span><span class=o>(</span><span
class=s>"STORAGE INTEGRATION NAME"</span><span class=o>)</span>
<span class=o>.</span><span class=na>withUserDataMapper</span><span
class=o>(</span><span class=n>mapper</span><span class=o>)</span>
<span class=o>)</span></code></pre></div></div>Replace type with the data type
of the PCollection object to write; for example, SnowflakeIO.<String> for an
input PCollection of Strings.</p><p>All of the parameters below are
required:</p><ul><li><p><code>.withDataSourceConfiguration()</code> Accepts a
DatasourceConfiguration object.</p></li><li><p><code>.toTable()</code> Accepts
the target Snowflake table
name.</p></li><li><p><code>.withStagingBucketName()</code> Accepts a cloud
bucket path [...]
--Example:
<code>.withStagingBucketName("gs://mybucket/my/dir/")</code></p></li><li><p><code>.withStorageIntegrationName()</code>
Accepts a name of a Snowflake storage integration object created according to
Snowflake documentationt. Example:<pre><code>CREATE OR REPLACE STORAGE
INTEGRATION test_integration
+-Example:
<code>.withStagingBucketName("{gs,s3}://bucket/my/dir/")</code></p></li><li><p><code>.withStorageIntegrationName()</code>
Accepts a name of a Snowflake storage integration object created according to
Snowflake documentation. Examples:<pre><code>CREATE OR REPLACE STORAGE
INTEGRATION test_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
-STORAGE_ALLOWED_LOCATIONS =
('gcs://bucket/');</code></pre>Then:<pre><code>.withStorageIntegrationName(test_integration)</code></pre></p></li><li><p><code>.withUserDataMapper()</code>
Accepts the UserDataMapper function that will map a user’s PCollection
to an array of String values
<code>(String[])</code>.</p></li></ul><p><strong>Note</strong>:
+STORAGE_ALLOWED_LOCATIONS =
('gcs://bucket/');</code></pre><pre><code>CREATE STORAGE INTEGRATION
test_integration
+TYPE = EXTERNAL_STAGE
+STORAGE_PROVIDER = S3
+ENABLED = TRUE
+STORAGE_AWS_ROLE_ARN = '<ARN ROLE NAME>'
+STORAGE_ALLOWED_LOCATIONS =
('s3://bucket/')</code></pre>Then:<pre><code>.withStorageIntegrationName(test_integration)</code></pre></p></li><li><p><code>.withUserDataMapper()</code>
Accepts the UserDataMapper function that will map a user’s PCollection
to an array of String values
<code>(String[])</code>.</p></li></ul><p><strong>Note</strong>:
SnowflakeIO uses COPY statements behind the scenes to write (using <a
href=https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html>COPY
to table</a>). StagingBucketName will be used to save CSV files which will end
up in Snowflake. Those CSV files will be saved under the “stagingBucketName”
path.</p><p><strong>Optional</strong> for
batching:</p><ul><li><code>.withQuotationMark()</code><ul><li>Default value:
<code>'</code> (single quotation mark).</li><li>Accepts String [...]
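The <code>.withQuotationMark()</code> option controls the character used to wrap field values in the staged CSV files. The effect can be sketched with Python's csv module; this is only an illustrative stand-in for the SnowflakeIO CSV writer, and the <code>to_csv_line</code> helper is hypothetical:

```python
import csv
import io


def to_csv_line(row, quotechar="'"):
    """Render one row the way a CSV writer with a custom quote character would."""
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in the chosen quotation character.
    writer = csv.writer(buf, quotechar=quotechar, quoting=csv.QUOTE_ALL)
    writer.writerow(row)
    return buf.getvalue().strip()


print(to_csv_line(["Alice", "42"]))  # 'Alice','42'
```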
<span class=n>SnowflakeIO</span><span class=o>.<</span><span
class=n>type</span><span class=o>></span><span class=n>write</span><span
class=o>()</span>
- <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET NAME"</span><span class=o>)</span>
+ <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withStorageIntegrationName</span><span class=o>(</span><span
class=s>"STORAGE INTEGRATION NAME"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withDataSourceConfiguration</span><span class=o>(</span><span
class=n>dc</span><span class=o>)</span>
<span class=o>.</span><span class=na>withUserDataMapper</span><span
class=o>(</span><span class=n>mapper</span><span class=o>)</span>
@@ -88,11 +93,16 @@ SnowflakeIO uses COPY statements behind the scenes to write
(using <a href=https
<span class=o>.</span><span class=na>withFlushTimeLimit</span><span
class=o>(</span><span class=n>Duration</span><span class=o>.</span><span
class=na>millis</span><span class=o>(</span><span class=n>time</span><span
class=o>))</span>
<span class=o>.</span><span class=na>withFlushRowLimit</span><span
class=o>(</span><span class=n>rowsNumber</span><span class=o>)</span>
<span class=o>.</span><span class=na>withShardsNumber</span><span
class=o>(</span><span class=n>shardsNumber</span><span class=o>)</span>
-<span class=o>)</span></code></pre></div></div></p><h4
id=parameters>Parameters</h4><p><strong>Required</strong> for
streaming:</p><ul><li><p><code>.withDataSourceConfiguration()</code></p><ul><li>Accepts
a DatasourceConfiguration
object.</li></ul></li><li><p><code>.toTable()</code></p><ul><li>Accepts the
target Snowflake table name.</li><li>Example:
<code>.toTable("MY_TABLE)</code></li></ul></li><li><p><code>.withStagingBucketName()</code></p><ul><li>Accepts
a cloud bucket path ended wi [...]
+<span class=o>)</span></code></pre></div></div></p><h4
id=parameters>Parameters</h4><p><strong>Required</strong> for
streaming:</p><ul><li><p><code>.withDataSourceConfiguration()</code></p><ul><li>Accepts
a DatasourceConfiguration
object.</li></ul></li><li><p><code>.toTable()</code></p><ul><li>Accepts the
target Snowflake table name.</li><li>Example:
<code>.toTable("MY_TABLE")</code></li></ul></li><li><p><code>.withStagingBucketName()</code></p><ul><li>Accepts
a cloud bucket path ended wi [...]
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
-STORAGE_ALLOWED_LOCATIONS =
('gcs://bucket/');</code></pre>Then:<pre><code>.withStorageIntegrationName(test_integration)</code></pre></li></ul></li><li><p><code>.withSnowPipe()</code></p><ul><li><p>Accepts
the target SnowPipe name. <code>.withSnowPipe()</code> accepts the exact name
of snowpipe.
+STORAGE_ALLOWED_LOCATIONS =
('gcs://bucket/');</code></pre><pre><code>CREATE STORAGE INTEGRATION
test_integration
+TYPE = EXTERNAL_STAGE
+STORAGE_PROVIDER = S3
+ENABLED = TRUE
+STORAGE_AWS_ROLE_ARN = '<ARN ROLE NAME>'
+STORAGE_ALLOWED_LOCATIONS =
('s3://bucket/')</code></pre>Then:<pre><code>.withStorageIntegrationName(test_integration)</code></pre></li></ul></li><li><p><code>.withSnowPipe()</code></p><ul><li><p>Accepts
the target SnowPipe name; provide the exact name of the
snowpipe.
Example:<pre><code>CREATE OR REPLACE PIPE test_database.public.test_gcs_pipe
AS COPY INTO stream_table from
@streamstage;</code></pre></p></li><li><p>Then:<pre><code>.withSnowPipe(test_gcs_pipe)</code></pre></p></li></ul></li></ul><p><strong>Note</strong>:
it is important to provide <strong>schema</strong> and
<strong>database</strong>
names.</p><ul><li><code>.withUserDataMapper()</code><ul><li>Accepts the <a
href=https://beam.apache.org/documentation/io/built-in/snowflake/#userdatamapper-function>UserDataMapper</a>
function that will map a user’s PCollec [...]
<span class=k>return</span> <span class=o>(</span><span
class=n>SnowflakeIO</span><span class=o>.</span><span
class=na>UserDataMapper</span><span class=o><</span><span
class=n>Long</span><span class=o>>)</span> <span class=n>recordLine</span>
<span class=o>-></span> <span class=k>new</span> <span
class=n>String</span><span class=o>[]</span> <span class=o>{</span><span
class=n>recordLine</span><span class=o>.</span><span
class=na>toString</span><span class=o>()};</span>
@@ -101,7 +111,7 @@ AS COPY INTO stream_table from
@streamstage;</code></pre></p></li><li><p>Then:<p
<span class=n>SnowflakeIO</span><span class=o>.<~></span><span
class=n>write</span><span class=o>()</span>
<span class=o>.</span><span
class=na>withDataSourceConfiguration</span><span class=o>(</span><span
class=n>dc</span><span class=o>)</span>
<span class=o>.</span><span class=na>toTable</span><span
class=o>(</span><span class=s>"MY_TABLE"</span><span class=o>)</span>
- <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET NAME"</span><span class=o>)</span>
+ <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withStorageIntegrationName</span><span class=o>(</span><span
class=s>"STORAGE INTEGRATION NAME"</span><span class=o>)</span>
<span class=o>.</span><span class=na>withUserDataMapper</span><span
class=o>(</span><span class=n>mapper</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withQueryTransformation</span><span class=o>(</span><span
class=n>query</span><span class=o>)</span>
@@ -109,7 +119,7 @@ AS COPY INTO stream_table from
@streamstage;</code></pre></p></li><li><p>Then:<p
<span class=n>SnowflakeIO</span><span class=o>.<~></span><span
class=n>write</span><span class=o>()</span>
<span class=o>.</span><span
class=na>withDataSourceConfiguration</span><span class=o>(</span><span
class=n>dc</span><span class=o>)</span>
<span class=o>.</span><span class=na>toTable</span><span
class=o>(</span><span class=s>"MY_TABLE"</span><span class=o>)</span>
- <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET NAME"</span><span class=o>)</span>
+ <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withStorageIntegrationName</span><span class=o>(</span><span
class=s>"STORAGE INTEGRATION NAME"</span><span class=o>)</span>
<span class=o>.</span><span class=na>withUserDataMapper</span><span
class=o>(</span><span class=n>mapper</span><span class=o>)</span>
<span class=o>.</span><span class=na>withWriteDisposition</span><span
class=o>(</span><span class=n>TRUNCATE</span><span class=o>)</span>
@@ -117,7 +127,7 @@ AS COPY INTO stream_table from
@streamstage;</code></pre></p></li><li><p>Then:<p
<span class=n>SnowflakeIO</span><span class=o>.<~></span><span
class=n>write</span><span class=o>()</span>
<span class=o>.</span><span
class=na>withDataSourceConfiguration</span><span class=o>(</span><span
class=n>dc</span><span class=o>)</span>
<span class=o>.</span><span class=na>toTable</span><span
class=o>(</span><span class=s>"MY_TABLE"</span><span class=o>)</span>
- <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET NAME"</span><span class=o>)</span>
+ <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withStorageIntegrationName</span><span class=o>(</span><span
class=s>"STORAGE INTEGRATION NAME"</span><span class=o>)</span>
<span class=o>.</span><span class=na>withUserDataMapper</span><span
class=o>(</span><span class=n>mapper</span><span class=o>)</span>
<span class=o>.</span><span class=na>withCreateDisposition</span><span
class=o>(</span><span class=n>CREATE_NEVER</span><span class=o>)</span>
@@ -132,7 +142,7 @@ A table schema is a list of <code>SFColumn</code> objects
with name and type cor
<span class=n>SnowflakeIO</span><span class=o>.<~></span><span
class=n>write</span><span class=o>()</span>
<span class=o>.</span><span
class=na>withDataSourceConfiguration</span><span class=o>(</span><span
class=n>dc</span><span class=o>)</span>
<span class=o>.</span><span class=na>toTable</span><span
class=o>(</span><span class=s>"MY_TABLE"</span><span class=o>)</span>
- <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET NAME"</span><span class=o>)</span>
+ <span class=o>.</span><span class=na>withStagingBucketName</span><span
class=o>(</span><span class=s>"BUCKET"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withStorageIntegrationName</span><span class=o>(</span><span
class=s>"STORAGE INTEGRATION NAME"</span><span class=o>)</span>
<span class=o>.</span><span class=na>withUserDataMapper</span><span
class=o>(</span><span class=n>mapper</span><span class=o>)</span>
<span class=o>.</span><span class=na>withTableSchema</span><span
class=o>(</span><span class=n>tableSchema</span><span class=o>)</span>
@@ -140,7 +150,7 @@ A table schema is a list of <code>SFColumn</code> objects
with name and type cor
<span class=n>SnowflakeIO</span><span class=o>.<</span><span
class=n>USER_DATA_TYPE</span><span class=o>></span><span
class=n>read</span><span class=o>()</span>
<span class=o>.</span><span
class=na>withDataSourceConfiguration</span><span class=o>(</span><span
class=n>dc</span><span class=o>)</span>
<span class=o>.</span><span class=na>fromTable</span><span
class=o>(</span><span class=s>"MY_TABLE"</span><span class=o>)</span>
<span class=c1>// or .fromQuery("QUERY")
-</span><span class=c1></span> <span class=o>.</span><span
class=na>withStagingBucketName</span><span class=o>(</span><span
class=s>"BUCKET NAME"</span><span class=o>)</span>
+</span><span class=c1></span> <span class=o>.</span><span
class=na>withStagingBucketName</span><span class=o>(</span><span
class=s>"BUCKET"</span><span class=o>)</span>
<span class=o>.</span><span
class=na>withStorageIntegrationName</span><span class=o>(</span><span
class=s>"STORAGE INTEGRATION NAME"</span><span class=o>)</span>
<span class=o>.</span><span class=na>withCsvMapper</span><span
class=o>(</span><span class=n>mapper</span><span class=o>)</span>
<span class=o>.</span><span class=na>withCoder</span><span
class=o>(</span><span class=n>coder</span><span class=o>));</span>
@@ -148,8 +158,13 @@ A table schema is a list of <code>SFColumn</code> objects
with name and type cor
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
-STORAGE_ALLOWED_LOCATIONS =
('gcs://bucket/');</code></pre>Then:<pre><code>.withStorageIntegrationName(test_integration)</code></pre></p></li><li><p><code>.withCsvMapper(mapper)</code></p><ul><li>Accepts
a <a
href=https://beam.apache.org/documentation/io/built-in/snowflake/#csvmapper>CSVMapper</a>
instance for mapping String[] to
USER_DATA_TYPE.</li></ul></li><li><p><code>.withCoder(coder)</code></p><ul><li>Accepts
the <a href=https://beam.apache.org/releases/javadoc/current/org/ [...]
-SnowflakeIO uses COPY statements behind the scenes to read (using <a
href=https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html>COPY
to location</a>) files staged in cloud storage.StagingBucketName will be used
as a temporary location for storing CSV files. Those temporary directories will
be named <code>sf_copy_csv_DATE_TIME_RANDOMSUFFIX</code> and they will be
removed automatically once Read operation finishes.</p><h3
id=csvmapper>CSVMapper</h3><p>SnowflakeIO use [...]
+STORAGE_ALLOWED_LOCATIONS =
('gcs://bucket/');</code></pre><pre><code>CREATE STORAGE INTEGRATION
test_integration
+TYPE = EXTERNAL_STAGE
+STORAGE_PROVIDER = S3
+ENABLED = TRUE
+STORAGE_AWS_ROLE_ARN = '<ARN ROLE NAME>'
+STORAGE_ALLOWED_LOCATIONS =
('s3://bucket/')</code></pre>Then:<pre><code>.withStorageIntegrationName(test_integration)</code></pre></p></li><li><p><code>.withCsvMapper(mapper)</code></p><ul><li>Accepts
a <a
href=https://beam.apache.org/documentation/io/built-in/snowflake/#csvmapper>CSVMapper</a>
instance for mapping String[] to
USER_DATA_TYPE.</li></ul></li><li><p><code>.withCoder(coder)</code></p><ul><li>Accepts
the <a href=https://beam.apache.org/releases/javadoc/current/org/ap [...]
+SnowflakeIO uses COPY statements behind the scenes to read (using <a
href=https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html>COPY
to location</a>) files staged in cloud storage. StagingBucketName will be used
as a temporary location for storing CSV files. Those temporary directories will
be named <code>sf_copy_csv_DATE_TIME_RANDOMSUFFIX</code> and they will be
removed automatically once the Read operation finishes.</p><h3
id=csvmapper>CSVMapper</h3><p>SnowflakeIO use [...]
<span class=k>return</span> <span class=o>(</span><span
class=n>SnowflakeIO</span><span class=o>.</span><span
class=na>CsvMapper</span><span class=o><</span><span
class=n>GenericRecord</span><span class=o>>)</span>
<span class=n>parts</span> <span class=o>-></span> <span
class=o>{</span>
<span class=k>return</span> <span class=k>new</span> <span
class=n>GenericRecordBuilder</span><span class=o>(</span><span
class=n>PARQUET_SCHEMA</span><span class=o>)</span>
@@ -158,7 +173,25 @@ SnowflakeIO uses COPY statements behind the scenes to read
(using <a href=https:
<span class=o>[...]</span>
<span class=o>.</span><span class=na>build</span><span
class=o>();</span>
<span class=o>};</span>
-<span class=o>}</span></code></pre></div></div></p><h2
id=using-snowflakeio-in-python-sdk>Using SnowflakeIO in Python SDK</h2><h3
id=intro>Intro</h3><p>Snowflake cross-language implementation is supporting
both reading and writing operations for Python programming language, thanks to
+<span class=o>}</span></code></pre></div></div></p><h2
id=using-snowflakeio-with-aws-s3>Using SnowflakeIO with AWS S3</h2><p>To use
an AWS S3 bucket as <code>stagingBucketName</code>, it is required
to:</p><ol><li>Create a <code>PipelineOptions</code> interface which is <a
href=https://beam.apache.org/documentation/io/built-in/snowflake/#extending-pipeline-options>extending</a>
<code>SnowflakePipelineOptions</code> and <a
href=https://beam.apache.org/releases/javadoc/current/org/apache [...]
+with <code>AwsAccessKey</code> and <code>AwsSecretKey</code> options.
Example:</li></ol><p><div class=language-java><div class=highlight><pre
class=chroma><code class=language-java data-lang=java><span
class=kd>public</span> <span class=kd>interface</span> <span
class=nc>AwsPipelineOptions</span> <span class=kd>extends</span> <span
class=n>SnowflakePipelineOptions</span><span class=o>,</span> <span
class=n>S3Options</span> <span class=o>{</span>
+
+ <span class=nd>@Description</span><span class=o>(</span><span
class=s>"AWS Access Key"</span><span class=o>)</span>
+ <span class=nd>@Default.String</span><span class=o>(</span><span
class=s>"access_key"</span><span class=o>)</span>
+ <span class=n>String</span> <span class=nf>getAwsAccessKey</span><span
class=o>();</span>
+
+ <span class=kt>void</span> <span class=nf>setAwsAccessKey</span><span
class=o>(</span><span class=n>String</span> <span
class=n>awsAccessKey</span><span class=o>);</span>
+
+ <span class=nd>@Description</span><span class=o>(</span><span
class=s>"AWS secret key"</span><span class=o>)</span>
+ <span class=nd>@Default.String</span><span class=o>(</span><span
class=s>"secret_key"</span><span class=o>)</span>
+ <span class=n>String</span> <span class=nf>getAwsSecretKey</span><span
class=o>();</span>
+
+ <span class=kt>void</span> <span class=nf>setAwsSecretKey</span><span
class=o>(</span><span class=n>String</span> <span
class=n>awsSecretKey</span><span class=o>);</span>
+<span class=o>}</span></code></pre></div></div>2. Set
the <code>AwsCredentialsProvider</code> option by using <code>AwsAccessKey</code>
and <code>AwsSecretKey</code> options.</p><p><div class=language-java><div
class=highlight><pre class=chroma><code class=language-java
data-lang=java><span class=n>options</span><span class=o>.</span><span
class=na>setAwsCredentialsProvider</span><span class=o>(</span>
+ <span class=k>new</span> <span
class=n>AWSStaticCredentialsProvider</span><span class=o>(</span>
+ <span class=k>new</span> <span class=n>BasicAWSCredentials</span><span
class=o>(</span><span class=n>options</span><span class=o>.</span><span
class=na>getAwsAccessKey</span><span class=o>(),</span> <span
class=n>options</span><span class=o>.</span><span
class=na>getAwsSecretKey</span><span class=o>())</span>
+ <span class=o>)</span>
+<span class=o>);</span></code></pre></div></div>3. Create pipeline</p><div
class=language-java><div class=highlight><pre class=chroma><code
class=language-java data-lang=java><span class=n>Pipeline</span> <span
class=n>p</span> <span class=o>=</span> <span class=n>Pipeline</span><span
class=o>.</span><span class=na>create</span><span class=o>(</span><span
class=n>options</span><span class=o>);</span></code></pre></div></div><p>Note:
remember to set <code>awsRegion</code> from <a href=htt [...]
cross-language which is part of <a
href=https://beam.apache.org/roadmap/portability/>Portability Framework
Roadmap</a> which aims to provide full interoperability
across the Beam ecosystem. From a developer perspective it means the
possibility of combining transforms written in different
languages (Java/Python/Go).</p><p>For more information about cross-language,
please see <a href=https://beam.apache.org/roadmap/connectors-multi-sdk/>multi
sdk efforts</a>
and <a
href=https://beam.apache.org/roadmap/connectors-multi-sdk/#cross-language-transforms-api-and-expansion-service>Cross-language
transforms API and expansion service</a> articles.</p><h3
id=reading-from-snowflake-1>Reading from Snowflake</h3><p>One of the functions
of SnowflakeIO is reading Snowflake tables - either full tables via table name
or custom data via query. Output of the read transform is a <a
href=https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apac
[...]
@@ -166,7 +199,7 @@ and <a
href=https://beam.apache.org/roadmap/connectors-multi-sdk/#cross-language
<span class=k>with</span> <span class=n>TestPipeline</span><span
class=p>(</span><span class=n>options</span><span class=o>=</span><span
class=n>PipelineOptions</span><span class=p>(</span><span
class=n>OPTIONS</span><span class=p>))</span> <span class=k>as</span> <span
class=n>p</span><span class=p>:</span>
<span class=p>(</span><span class=n>p</span>
<span class=o>|</span> <span class=n>ReadFromSnowflake</span><span
class=p>(</span><span class=o>...</span><span class=p>)</span>
- <span class=o>|</span> <span class=o><</span><span
class=n>FURTHER</span> <span class=n>TRANSFORMS</span><span
class=o>></span><span class=p>)</span></code></pre></div></div><h4
id=required-parameters>Required
parameters</h4><ul><li><p><code>server_name</code> Full Snowflake server name
with an account, zone, and domain.</p></li><li><p><code>schema</code> Name of
the Snowflake schema in the database to
use.</p></li><li><p><code>database</code> Name of the Snowflake database [...]
+ <span class=o>|</span> <span class=o><</span><span
class=n>FURTHER</span> <span class=n>TRANSFORMS</span><span
class=o>></span><span class=p>)</span></code></pre></div></div><h4
id=required-parameters>Required
parameters</h4><ul><li><p><code>server_name</code> Full Snowflake server name
with an account, zone, and domain.</p></li><li><p><code>schema</code> Name of
the Snowflake schema in the database to
use.</p></li><li><p><code>database</code> Name of the Snowflake database [...]
Example:<div class=language-py><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=k>def</span> <span
class=nf>csv_mapper</span><span class=p>(</span><span
class=n>strings_array</span><span class=p>):</span>
<span class=k>return</span> <span class=n>User</span><span
class=p>(</span><span class=n>strings_array</span><span class=p>[</span><span
class=mi>0</span><span class=p>],</span> <span class=nb>int</span><span
class=p>(</span><span class=n>strings_array</span><span class=p>[</span><span
class=mi>1</span><span
class=p>]))</span></code></pre></div></div></p></li><li><p><code>table</code>
or <code>query</code> Specifies a Snowflake table name or custom SQL
query</p></li></ul><h4 id=auth [...]
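The <code>csv_mapper</code> above converts each array of strings read from the staged CSV files into a user-defined object. A self-contained version, with a hypothetical <code>User</code> namedtuple standing in for the user's own type, behaves like this:

```python
from collections import namedtuple

# Hypothetical user-defined type; any class with the right fields works.
User = namedtuple("User", ["name", "age"])


def csv_mapper(strings_array):
    """Map one CSV row (a list of strings) to a User object."""
    return User(strings_array[0], int(strings_array[1]))


print(csv_mapper(["Alice", "42"]))  # User(name='Alice', age=42)
```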
@@ -183,7 +216,7 @@ Example:<div class=language-py><div class=highlight><pre
class=chroma><code clas
<span class=n>private_key_passphrase</span><span
class=o>=<</span><span class=n>PASSWORD</span> <span class=n>FOR</span>
<span class=n>KEY</span><span class=o>></span><span class=p>,</span>
<span class=n>schema</span><span class=o>=<</span><span
class=n>SNOWFLAKE</span> <span class=n>SCHEMA</span><span
class=o>></span><span class=p>,</span>
<span class=n>database</span><span class=o>=<</span><span
class=n>SNOWFLAKE</span> <span class=n>DATABASE</span><span
class=o>></span><span class=p>,</span>
- <span class=n>staging_bucket_name</span><span
class=o>=<</span><span class=n>GCS</span> <span class=n>BUCKET</span> <span
class=n>NAME</span><span class=o>></span><span class=p>,</span>
+ <span class=n>staging_bucket_name</span><span
class=o>=<</span><span class=n>GCS</span> <span class=n>OR</span> <span
class=n>S3</span> <span class=n>BUCKET</span><span class=o>></span><span
class=p>,</span>
<span class=n>storage_integration_name</span><span
class=o>=<</span><span class=n>SNOWFLAKE</span> <span class=n>STORAGE</span>
<span class=n>INTEGRATION</span> <span class=n>NAME</span><span
class=o>></span><span class=p>,</span>
<span class=n>create_disposition</span><span
class=o>=<</span><span class=n>CREATE</span> <span
class=n>DISPOSITION</span><span class=o>></span><span class=p>,</span>
<span class=n>write_disposition</span><span
class=o>=<</span><span class=n>WRITE</span> <span
class=n>DISPOSITION</span><span class=o>></span><span class=p>,</span>
@@ -193,7 +226,7 @@ Example:<div class=language-py><div class=highlight><pre
class=chroma><code clas
<span class=n>query</span><span class=o>=<</span><span
class=n>IF</span> <span class=n>NOT</span> <span class=n>TABLE</span> <span
class=n>THEN</span> <span class=n>QUERY</span><span class=o>></span><span
class=p>,</span>
<span class=n>role</span><span class=o>=<</span><span
class=n>SNOWFLAKE</span> <span class=n>ROLE</span><span
class=o>></span><span class=p>,</span>
<span class=n>warehouse</span><span class=o>=<</span><span
class=n>SNOWFLAKE</span> <span class=n>WAREHOUSE</span><span
class=o>></span><span class=p>,</span>
- <span class=n>expansion_service</span><span
class=o>=<</span><span class=n>EXPANSION</span> <span class=n>SERVICE</span>
<span class=n>ADDRESS</span><span class=o>></span><span
class=p>))</span></code></pre></div></div><h4 id=required-parameters-1>Required
parameters</h4><ul><li><p><code>server_name</code> Full Snowflake server name
with account, zone and domain.</p></li><li><p><code>schema</code> Name of the
Snowflake schema in the database to use.</p></li><li><p><code> [...]
+ <span class=n>expansion_service</span><span
class=o>=<</span><span class=n>EXPANSION</span> <span class=n>SERVICE</span>
<span class=n>ADDRESS</span><span class=o>></span><span
class=p>))</span></code></pre></div></div><h4 id=required-parameters-1>Required
parameters</h4><ul><li><p><code>server_name</code> Full Snowflake server name
with account, zone and domain.</p></li><li><p><code>schema</code> Name of the
Snowflake schema in the database to use.</p></li><li><p><code> [...]
Example:<div class=language-py><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=k>def</span> <span
class=nf>user_data_mapper</span><span class=p>(</span><span
class=n>user</span><span class=p>):</span>
<span class=k>return</span> <span class=p>[</span><span
class=n>user</span><span class=o>.</span><span class=n>name</span><span
class=p>,</span> <span class=nb>str</span><span class=p>(</span><span
class=n>user</span><span class=o>.</span><span class=n>age</span><span
class=p>)]</span></code></pre></div></div></p></li><li><p><code>table</code> or
<code>query</code> Specifies a Snowflake table name or custom SQL
query</p></li></ul><h4 id=authentication-parameters-1>Authentication para [...]
<span class=p>{</span>
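The <code>user_data_mapper</code> shown for writing is the inverse of the CSV mapper: it flattens each object back into a list of strings, one entry per CSV column. A minimal self-contained sketch, with a hypothetical <code>User</code> namedtuple standing in for the user's own type:

```python
from collections import namedtuple

# Hypothetical user-defined type; any object with these fields works.
User = namedtuple("User", ["name", "age"])


def user_data_mapper(user):
    """Map one User object to a CSV row: every value becomes a string."""
    return [user.name, str(user.age)]


print(user_data_mapper(User("Alice", 42)))  # ['Alice', '42']
```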
diff --git
a/website/generated-content/documentation/runners/twister2/index.html
b/website/generated-content/documentation/runners/twister2/index.html
index c05fdd2..03cedb6 100644
--- a/website/generated-content/documentation/runners/twister2/index.html
+++ b/website/generated-content/documentation/runners/twister2/index.html
@@ -7,7 +7,7 @@ a Twister2 cluster either as a local deployment or distributed
deployment using,
Kubernetes, Slurm, etc.</p><p>The Twister2 runner is suitable for large scale
batch jobs, especially jobs that
require high performance, and provides:</p><ul><li>Batch pipeline
support.</li><li>Support for HPC environments, including proprietary interconnects
such as Infiniband.</li><li>Distributed massively parallel data processing
engine with high performance using
Bulk Synchronous Parallel (BSP) style execution.</li><li>Native support for
Beam side-inputs.</li></ul><p>The <a
href=/documentation/runners/capability-matrix/>Beam Capability Matrix</a>
documents the
-supported capabilities of the Jet Runner.</p><h2
id=running-wordcount-with-the-twister2-runner>Running WordCount with the
Twister2 Runner</h2><h3 id=generating-the-beam-examples-project>Generating the
Beam examples project</h3><p>Just follow the instruction from the <a
href=/get-started/quickstart-java/#get-the-wordcount-code>Java Quickstart
page</a></p><h3 id=running-wordcount-on-a-twister2-local-deployment>Running
WordCount on a Twister2 Local Deployment</h3><p>Issue following command [...]
+supported capabilities of the Twister2 Runner.</p><h2
id=running-wordcount-with-the-twister2-runner>Running WordCount with the
Twister2 Runner</h2><h3 id=generating-the-beam-examples-project>Generating the
Beam examples project</h3><p>Just follow the instruction from the <a
href=/get-started/quickstart-java/#get-the-wordcount-code>Java Quickstart
page</a></p><h3 id=running-wordcount-on-a-twister2-local-deployment>Running
WordCount on a Twister2 Local Deployment</h3><p>Issue following com [...]
-DskipTests \
-Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="\
diff --git a/website/generated-content/sitemap.xml
b/website/generated-content/sitemap.xml
index 6a19a7e..9ed1d40 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.24.0/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/blog/p
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.24.0/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-09-18T12:38:38-07:00</lastmod></url><url><loc>/blog/p
[...]
\ No newline at end of file