This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new d90bb3bc550 Publishing website 2023/07/11 22:16:02 at commit 33518fb
d90bb3bc550 is described below
commit d90bb3bc5504632b0875bc75b703150a6efd9b19
Author: jenkins <[email protected]>
AuthorDate: Tue Jul 11 22:16:02 2023 +0000
Publishing website 2023/07/11 22:16:02 at commit 33518fb
---
website/generated-content/contribute/index.xml | 5 +++-
.../contribute/release-guide/index.html | 6 +++--
.../io/built-in/google-bigquery/index.html | 28 ++++++++++++++++++----
website/generated-content/sitemap.xml | 2 +-
4 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/website/generated-content/contribute/index.xml
b/website/generated-content/contribute/index.xml
index 0af6e7da522..47ea65bedc0 100644
--- a/website/generated-content/contribute/index.xml
+++ b/website/generated-content/contribute/index.xml
@@ -1201,6 +1201,7 @@ You don&rsquo;t need to wait for the action to
complete to start running the
<ol>
<li>Clone the repo at the selected RC tag.</li>
<li>Run gradle publish to push java artifacts into Maven staging
repo.</li>
+<li>Stage SDK docker images to <a
href="https://hub.docker.com/search?q=apache%2Fbeam&amp;type=image">docker
hub Apache organization</a>.</li>
</ol>
</li>
</ul>
@@ -1235,9 +1236,11 @@ Some additional validation should be done during the rc
validation step.</li>
<p><strong>The script will:</strong></p>
<ol>
<li>Clone the repo at the selected RC tag.</li>
-<li>Stage source release into dist.apache.org dev <a
href="https://dist.apache.org/repos/dist/dev/beam/">repo</a>.</li>
+<li>Stage source release into dist.apache.org dev <a
href="https://dist.apache.org/repos/dist/dev/beam/">repo</a>.
+Skip this step if you already did it with the build_release_candidate GitHub
Actions workflow.</li>
<li>Stage, sign, and hash the Python source distribution and wheels into the
dist.apache.org dev repo Python directory.</li>
<li>Stage SDK docker images to <a
href="https://hub.docker.com/search?q=apache%2Fbeam&amp;type=image">docker
hub Apache organization</a>.
+Skip this step if you already did it with the build_release_candidate GitHub
Actions workflow.
Note: if you are not a member of the <a
href="https://hub.docker.com/orgs/apache/teams/beam"><code>beam</code>
DockerHub team</a> you will need
help with this step. Please email <code>dev@</code> and ask a member of
the <code>beam</code> DockerHub team for help.</li>
<li>Create a PR to update beam-site; changes include:
diff --git a/website/generated-content/contribute/release-guide/index.html
b/website/generated-content/contribute/release-guide/index.html
index 31108ece97c..28af8361a77 100644
--- a/website/generated-content/contribute/release-guide/index.html
+++ b/website/generated-content/contribute/release-guide/index.html
@@ -142,12 +142,14 @@ The final state of the repository should match this
diagram:</p><p><img src=/ima
adjust the version, and add the tag locally. If it looks good, run it again
with <code>--push-tag</code>.
If you already have a clone that includes the <code>${COMMIT_REF}</code> then
you can omit <code>--clone</code>. This
is perfectly safe since the script does not depend on the current working
tree.</p><p>See the source of the script for more details, or to run commands
manually in case of a problem.</p><h3
id=run-build_release_candidate-github-action-to-create-a-release-candidate>Run
build_release_candidate GitHub Action to create a release
candidate</h3><p>Note: This step is partially automated (in progress), so part
of the rc creation is done by GitHub Actions and the rest is done by a script.
-You don’t need to wait for the action to complete to start running the
script.</p><ul><li><p><strong>Action</strong> <a
href=https://github.com/apache/beam/actions/workflows/build_release_candidate.yml>build_release_candidate</a>
(click <code>run workflow</code>)</p></li><li><p><strong>The script
will:</strong></p><ol><li>Clone the repo at the selected RC tag.</li><li>Run
gradle publish to push java artifacts into Maven staging
repo.</li></ol></li></ul><h4 id=tasks-you-need-to-do-m [...]
+You don’t need to wait for the action to complete to start running the
script.</p><ul><li><p><strong>Action</strong> <a
href=https://github.com/apache/beam/actions/workflows/build_release_candidate.yml>build_release_candidate</a>
(click <code>run workflow</code>)</p></li><li><p><strong>The script
will:</strong></p><ol><li>Clone the repo at the selected RC tag.</li><li>Run
gradle publish to push java artifacts into Maven staging repo.</li><li>Stage
SDK docker images to <a href="http [...]
They should contain all relevant parts for each module, including
<code>pom.xml</code>, jar, test jar, javadoc, etc.
Artifact names should follow <a
href=https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22>the
existing format</a>, in which the artifact name mirrors the directory structure, e.g.,
<code>beam-sdks-java-io-kafka</code>.
Carefully review any new artifacts.
Some additional validation should be done during the rc validation
step.</li></ol></li></ol><h3
id=run-build_release_candidatesh-to-create-a-release-candidate>Run
build_release_candidate.sh to create a release
candidate</h3><ul><li><p><strong>Script:</strong> <a
href=https://github.com/apache/beam/blob/master/release/src/main/scripts/build_release_candidate.sh>build_release_candidate.sh</a></p></li><li><p><strong>Usage</strong></p><pre><code>./beam/release/src/main/scripts/build_release_
[...]
-</code></pre></li><li><p><strong>The script will:</strong></p><ol><li>Clone
the repo at the selected RC tag.</li><li>Stage source release into
dist.apache.org dev <a
href=https://dist.apache.org/repos/dist/dev/beam/>repo</a>.</li><li>Stage, sign
and hash python source distribution and wheels into dist.apache.org dev repo
python dir</li><li>Stage SDK docker images to <a
href="https://hub.docker.com/search?q=apache%2Fbeam&type=image">docker hub
Apache organization</a>.
+</code></pre></li><li><p><strong>The script will:</strong></p><ol><li>Clone
the repo at the selected RC tag.</li><li>Stage source release into
dist.apache.org dev <a
href=https://dist.apache.org/repos/dist/dev/beam/>repo</a>.
+Skip this step if you already did it with the build_release_candidate GitHub
Actions workflow.</li><li>Stage, sign, and hash the Python source distribution
and wheels into the dist.apache.org dev repo Python directory.</li><li>Stage SDK docker images
to <a href="https://hub.docker.com/search?q=apache%2Fbeam&type=image">docker
hub Apache organization</a>.
+Skip this step if you already did it with the build_release_candidate GitHub
Actions workflow.
Note: if you are not a member of the <a
href=https://hub.docker.com/orgs/apache/teams/beam><code>beam</code> DockerHub
team</a> you will need
help with this step. Please email <code>dev@</code> and ask a member of the
<code>beam</code> DockerHub team for help.</li><li>Create a PR to update
beam-site; changes include:<ul><li>Copy the Python docs into
beam-site</li><li>Copy the Java docs into beam-site</li><li><strong>NOTE</strong>: Do not merge this PR
until after an RC has been approved (see “Finalize the
Release”).</li></ul></li></ol></li></ul><h4
id=tasks-you-need-to-do-manually-1>Tasks you need to do manually</h4><ol><li
[...]
Please note that dependencies for the SDKs with different Python versions vary.
diff --git
a/website/generated-content/documentation/io/built-in/google-bigquery/index.html
b/website/generated-content/documentation/io/built-in/google-bigquery/index.html
index 896c92d6904..5d99d128df3 100644
---
a/website/generated-content/documentation/io/built-in/google-bigquery/index.html
+++
b/website/generated-content/documentation/io/built-in/google-bigquery/index.html
@@ -327,7 +327,11 @@ GitHub</a>.</p><div class="language-java snippet"><div
class="notebook-skip code
<span class=k>return</span> <span class=n>rows</span><span class=o>;</span>
<span class=o>}</span>
-<span class=o>}</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=c1># The SDK for Python does not
support the BigQuery Storage API.</span></code></pre></div></div></div><p>The
following code snippet reads wit [...]
+<span class=o>}</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=n>max_temperatures</span> <span
class=o>=</span> <span class=p>(</span>
+ <span class=n>pipeline</span>
+ <span class=o>|</span> <span
class=s1>'ReadTableWithStorageAPI'</span> <span class=o>>></span>
<span class=n>beam</span><span class=o>.</span><span class=n>io</span><span
class=o>.</span><span class=n>ReadFromBigQuery</span><span class=p>(</span>
+ <span class=n>table</span><span class=o>=</span><span
class=n>table_spec</span><span class=p>,</span> <span
class=n>method</span><span class=o>=</span><span class=n>beam</span><span
class=o>.</span><span class=n>io</span><span class=o>.</span><span
class=n>ReadFromBigQuery</span><span class=o>.</span><span
class=n>Method</span><span class=o>.</span><span
class=n>DIRECT_READ</span><span class=p>)</span>
+ <span class=o>|</span> <span class=n>beam</span><span
class=o>.</span><span class=n>Map</span><span class=p>(</span><span
class=k>lambda</span> <span class=n>elem</span><span class=p>:</span> <span
class=n>elem</span><span class=p>[</span><span
class=s1>'max_temperature'</span><span
class=p>]))</span></code></pre></div></div></div><p>The following code snippet
reads with a query string.</p><div class="language-java snippet"><div
class="notebook-skip code-snippet"><a class=cop [...]
<span class=kn>import</span> <span
class=nn>org.apache.beam.sdk.Pipeline</span><span class=o>;</span>
<span class=kn>import</span> <span
class=nn>org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO</span><span
class=o>;</span>
<span class=kn>import</span> <span
class=nn>org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method</span><span
class=o>;</span>
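For readers skimming the escaped HTML in the hunk above, the new Python read
snippet amounts to the following sketch. The table_spec value is illustrative
and pipeline is assumed to be a Beam pipeline object; only the
ReadFromBigQuery call and the Map are taken from the diff itself:

    import apache_beam as beam

    # Illustrative table spec; substitute your own project:dataset.table.
    table_spec = 'clouddataflow-readonly:samples.weather_stations'

    with beam.Pipeline() as pipeline:
        # Read through the BigQuery Storage Read API (DIRECT_READ) and
        # keep only the max_temperature field of each row.
        max_temperatures = (
            pipeline
            | 'ReadTableWithStorageAPI' >> beam.io.ReadFromBigQuery(
                table=table_spec,
                method=beam.io.ReadFromBigQuery.Method.DIRECT_READ)
            | beam.Map(lambda elem: elem['max_temperature']))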
@@ -623,7 +627,7 @@ be replaced.</p><div class="language-java snippet"><div
class="notebook-skip cod
You can either keep retrying, or return the failed records in a separate
<code>PCollection</code> using the <code>WriteResult.getFailedInserts()</code>
method.</p><h3 id=storage-write-api>Using the Storage Write API</h3><p>Starting
with version 2.36.0 of the Beam SDK for Java, you can use the
<a href=https://cloud.google.com/bigquery/docs/write-api>BigQuery Storage
Write API</a>
-from the BigQueryIO connector.</p><h4 id=exactly-once-semantics>Exactly-once
semantics</h4><p>To write to BigQuery using the Storage Write API, set
<code>withMethod</code> to
+from the BigQueryIO connector.</p><p>Starting with version 2.47.0 of the Beam
SDK for Python, the SDK also supports the BigQuery Storage Write API.</p><p
class=language-py>The BigQuery Storage Write API for the Python SDK currently
has some limitations on supported data types. Because this method uses
cross-language transforms, it is limited to the types supported at the
cross-language boundary. For example,
<code>apache_beam.utils.timestamp.Timestamp</code> is needed to write a
<code>TIMESTAMP</code> BigQuery [...]
<a
href=https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#STORAGE_WRITE_API><code>Method.STORAGE_WRITE_API</code></a>.
Here’s an example transform that writes to BigQuery using the Storage Write
API and exactly-once semantics:</p><p><div class="language-java snippet"><div
class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-java data-lang=java><span class=n>WriteResult</span> <span
class=n>writeResult</span> <span class=o>=</span> [...]
<span class=n>BigQueryIO</span><span class=o>.</span><span
class=na>writeTableRows</span><span class=o>()</span>
@@ -631,7 +635,10 @@ Here’s an example transform that writes to BigQuery using
the Storage Write AP
<span class=o>.</span><span class=na>withWriteDisposition</span><span
class=o>(</span><span class=n>WriteDisposition</span><span
class=o>.</span><span class=na>WRITE_APPEND</span><span class=o>)</span>
<span class=o>.</span><span class=na>withCreateDisposition</span><span
class=o>(</span><span class=n>CreateDisposition</span><span
class=o>.</span><span class=na>CREATE_NEVER</span><span class=o>)</span>
<span class=o>.</span><span class=na>withMethod</span><span
class=o>(</span><span class=n>Method</span><span class=o>.</span><span
class=na>STORAGE_WRITE_API</span><span class=o>)</span>
-<span class=o>);</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=c1># The SDK for Python does not
support the BigQuery Storage
API.</span></code></pre></div></div></div></p><p>If you want to change the
behav [...]
+<span class=o>);</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=n>quotes</span> <span
class=o>|</span> <span class=s2>"WriteTableWithStorageAPI"</span> <span
class=o>>></span> <span class=n>be [...]
+ <span class=n>table_spec</span><span class=p>,</span>
+ <span class=n>schema</span><span class=o>=</span><span
class=n>table_schema</span><span class=p>,</span>
+ <span class=n>method</span><span class=o>=</span><span
class=n>beam</span><span class=o>.</span><span class=n>io</span><span
class=o>.</span><span class=n>WriteToBigQuery</span><span class=o>.</span><span
class=n>Method</span><span class=o>.</span><span
class=n>STORAGE_WRITE_API</span><span
class=p>)</span></code></pre></div></div></div></p><p>If you want to change the
behavior of BigQueryIO so that all the BigQuery sinks
for your pipeline use the Storage Write API by default, set the
<a
href=https://github.com/apache/beam/blob/2c18ce0ccd7705473aa9ecc443dcdbe223dd9449/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L82-L86><code>UseStorageWriteApi</code>
option</a>.</p><p>If your pipeline needs to create the table (in case it
doesn’t exist and you
specified the create disposition as <code>CREATE_IF_NEEDED</code>), you must
provide a
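De-escaped, the new Python write snippet in this hunk is the sketch below.
quotes is assumed to be a PCollection of dicts with 'source' and 'quote'
fields; table_spec is as above, and table_schema matches the schema dict
added in a later hunk of this same diff:

    import apache_beam as beam

    # Schema for the quotes example (mirrors the dict added further down).
    table_schema = {
        'fields': [
            {'name': 'source', 'type': 'STRING', 'mode': 'NULLABLE'},
            {'name': 'quote', 'type': 'STRING', 'mode': 'REQUIRED'},
        ]
    }

    # Exactly-once write through the BigQuery Storage Write API.
    quotes | "WriteTableWithStorageAPI" >> beam.io.WriteToBigQuery(
        table_spec,
        schema=table_schema,
        method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API)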
@@ -645,12 +652,23 @@ binary protocol.</p><p><div class="language-java
snippet"><div class="notebook-s
<span class=k>new</span> <span
class=n>TableFieldSchema</span><span class=o>()</span>
<span class=o>.</span><span class=na>setName</span><span
class=o>(</span><span class=s>"user_name"</span><span class=o>)</span>
<span class=o>.</span><span class=na>setType</span><span
class=o>(</span><span class=s>"STRING"</span><span class=o>)</span>
- <span class=o>.</span><span class=na>setMode</span><span
class=o>(</span><span class=s>"REQUIRED"</span><span
class=o>)));</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=c1># The SDK [...]
+ <span class=o>.</span><span class=na>setMode</span><span
class=o>(</span><span class=s>"REQUIRED"</span><span
class=o>)));</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=n>table_sche [...]
+ <span class=s1>'fields'</span><span class=p>:</span> <span
class=p>[{</span>
+ <span class=s1>'name'</span><span class=p>:</span> <span
class=s1>'source'</span><span class=p>,</span> <span
class=s1>'type'</span><span class=p>:</span> <span
class=s1>'STRING'</span><span class=p>,</span> <span
class=s1>'mode'</span><span class=p>:</span> <span
class=s1>'NULLABLE'</span>
+ <span class=p>},</span> <span class=p>{</span>
+ <span class=s1>'name'</span><span class=p>:</span> <span
class=s1>'quote'</span><span class=p>,</span> <span
class=s1>'type'</span><span class=p>:</span> <span
class=s1>'STRING'</span><span class=p>,</span> <span
class=s1>'mode'</span><span class=p>:</span> <span
class=s1>'REQUIRED'</span>
+ <span class=p>}]</span>
+<span class=p>}</span></code></pre></div></div></div></p><p>For streaming
pipelines, you need to set two additional parameters: the number
of streams and the triggering frequency.</p><p><div class="language-java
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-java data-lang=java><span class=n>BigQueryIO</span><span
class=o>.</span><span class=na>writeTableRows</span><span class=o>()</span>
<span class=c1>// ...
</span><span class=c1></span> <span class=o>.</span><span
class=na>withTriggeringFrequency</span><span class=o>(</span><span
class=n>Duration</span><span class=o>.</span><span
class=na>standardSeconds</span><span class=o>(</span><span
class=n>5</span><span class=o>))</span>
<span class=o>.</span><span
class=na>withNumStorageWriteApiStreams</span><span class=o>(</span><span
class=n>3</span><span class=o>)</span>
-<span class=o>);</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=c1># The SDK for Python does not
support the BigQuery Storage
API.</span></code></pre></div></div></div></p><p>The number of streams defines
t [...]
+<span class=o>);</span></code></pre></div></div></div><div class="language-py
snippet"><div class="notebook-skip code-snippet"><a class=copy type=button
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code
class=language-py data-lang=py><span class=c1># The Python SDK doesn't
currently support setting the number of write streams</span>
+<span class=n>quotes</span> <span class=o>|</span> <span
class=s2>"StorageWriteAPIWithFrequency"</span> <span
class=o>>></span> <span class=n>beam</span><span class=o>.</span><span
class=n>io</span><span class=o>.</span><span
class=n>WriteToBigQuery</span><span class=p>(</span>
+ <span class=n>table_spec</span><span class=p>,</span>
+ <span class=n>schema</span><span class=o>=</span><span
class=n>table_schema</span><span class=p>,</span>
+ <span class=n>method</span><span class=o>=</span><span
class=n>beam</span><span class=o>.</span><span class=n>io</span><span
class=o>.</span><span class=n>WriteToBigQuery</span><span class=o>.</span><span
class=n>Method</span><span class=o>.</span><span
class=n>STORAGE_WRITE_API</span><span class=p>,</span>
+ <span class=n>triggering_frequency</span><span class=o>=</span><span
class=mi>5</span><span
class=p>)</span></code></pre></div></div></div></p><p>The number of streams
defines the parallelism of the BigQueryIO Write transform
and roughly corresponds to the number of Storage Write API streams that the
pipeline uses. You can set it explicitly on the transform via
<a
href=https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withNumStorageWriteApiStreams-int-><code>withNumStorageWriteApiStreams</code></a>
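Likewise, the new streaming snippet above reduces to the sketch below. As its
leading comment notes, the Python SDK does not currently expose the number of
write streams, so only the triggering frequency (in seconds) is set; quotes,
table_spec, and table_schema are as in the previous sketch:

    import apache_beam as beam

    # Streaming write via the Storage Write API, triggering every 5 seconds.
    quotes | "StorageWriteAPIWithFrequency" >> beam.io.WriteToBigQuery(
        table_spec,
        schema=table_schema,
        method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
        triggering_frequency=5)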
diff --git a/website/generated-content/sitemap.xml
b/website/generated-content/sitemap.xml
index 75596dc1504..43a9b580b16 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod></url><url><loc>/blog/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod></url><url><loc>/categories/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod></url><url><loc>/blog/managing-beam-dependencies-in-java/</loc><lastmod>2023-07-11T11:31:17-04:00</lastmod>
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod></url><url><loc>/blog/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod></url><url><loc>/categories/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod></url><url><loc>/blog/managing-beam-dependencies-in-java/</loc><lastmod>2023-07-11T15:34:39-04:00</lastmod>
[...]
\ No newline at end of file