This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 6ca65070cf2 Publishing website 2024/02/22 23:37:08 at commit 11f9bce
6ca65070cf2 is described below
commit 6ca65070cf243fdd5cf0591ac6609a59a893c7d9
Author: runner <runner@main-runner-zt478-cqkl7>
AuthorDate: Thu Feb 22 23:37:08 2024 +0000
Publishing website 2024/02/22 23:37:08 at commit 11f9bce
---
.../sdks/yaml-inline-python/index.html | 128 ++++++++++++++++++++-
.../documentation/sdks/yaml/index.html | 8 +-
website/generated-content/sitemap.xml | 2 +-
3 files changed, 132 insertions(+), 6 deletions(-)
diff --git
a/website/generated-content/documentation/sdks/yaml-inline-python/index.html
b/website/generated-content/documentation/sdks/yaml-inline-python/index.html
index 86b5625fce2..5afa8335c32 100644
--- a/website/generated-content/documentation/sdks/yaml-inline-python/index.html
+++ b/website/generated-content/documentation/sdks/yaml-inline-python/index.html
@@ -36,8 +36,132 @@
<img class=banner-img-mobile
src=/images/banners/tour-of-beam/tour-of-beam-mobile.png alt="Start Tour of
Beam"></a></div><div class=swiper-slide><a
href=https://beam.apache.org/documentation/ml/overview/><img
class=banner-img-desktop
src=/images/banners/machine-learning/machine-learning-desktop.jpg alt="Machine
Learning">
<img class=banner-img-mobile
src=/images/banners/machine-learning/machine-learning-mobile.jpg alt="Machine
Learning"></a></div></div><div class=swiper-pagination></div></div><script
src=/js/swiper-bundle.min.min.e0e8f81b0b15728d35ff73c07f42ddbb17a108d6f23df4953cb3e60df7ade675.js></script>
<script
src=/js/sliders/top-banners.min.91104c476b3d8123ebee5ed9a8168556ec546abb698549551b38a0cee187ee1c.js></script>
-<script>function showSearch(){addPlaceholder();var
e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function
addPlaceholder(){$("input:text").attr("placeholder","What are you looking
for?")}function endSearch(){var
e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function
blockScroll(){$("body").toggleClass(" [...]
-<a
href=https://beam.apache.org/documentation/sdks/yaml-inline-python/>https://beam.apache.org/documentation/sdks/yaml-inline-python/</a></p></div></div><footer
class=footer><div class=footer__contained><div class=footer__cols><div
class="footer__cols__col footer__cols__col__logos"><div
class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg
class=footer__logo alt="Beam logo"></div><div
class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg
class=footer__logo a [...]
+<script>function showSearch(){addPlaceholder();var
e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function
addPlaceholder(){$("input:text").attr("placeholder","What are you looking
for?")}function endSearch(){var
e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function
blockScroll(){$("body").toggleClass(" [...]
+<code>PyTransform</code> type, simply referencing them by fully qualified name.
+For example,</p><pre tabindex=0><code>- type: PyTransform
+  config:
+    constructor: apache_beam.pkg.module.SomeTransform
+    args: [1, 'foo']
+    kwargs:
+      baz: 3
+</code></pre><p>will invoke the transform
<code>apache_beam.pkg.module.SomeTransform(1, 'foo', baz=3)</code>.
+This fully qualified name can be any PTransform class or other callable that
+returns a PTransform. Note, however, that PTransforms that do not accept or
+return schema’d data may not be as usable from YAML.
+Restoring the schema-ness after a non-schema returning transform can be done
+by using the <code>callable</code> option on <code>MapToFields</code> which
takes the entire element
+as an input, e.g.</p><pre tabindex=0><code>- type: PyTransform
+  config:
+    constructor: apache_beam.pkg.module.SomeTransform
+    args: [1, 'foo']
+    kwargs:
+      baz: 3
+- type: MapToFields
+  config:
+    language: python
+    fields:
+      col1:
+        callable: 'lambda element: element.col1'
+        output_type: string
+      col2:
+        callable: 'lambda element: element.col2'
+        output_type: integer
+</code></pre><p>This can be used to call arbitrary transforms in the Beam SDK,
e.g.</p><pre tabindex=0><code>pipeline:
+  transforms:
+    - type: PyTransform
+      name: ReadFromTsv
+      input: {}
+      config:
+        constructor: apache_beam.io.ReadFromCsv
+        kwargs:
+          path: '/path/to/*.tsv'
+          sep: '\t'
+          skip_blank_lines: True
+          true_values: ['yes']
+          false_values: ['no']
+          comment: '#'
+          on_bad_lines: 'skip'
+          binary: False
+          splittable: False
+</code></pre><h2 id=defining-a-transform-inline-using-__constructor__>Defining
a transform inline using <code>__constructor__</code></h2><p>If the desired
transform does not exist, one can define it inline as well.
+This is done with the special <code>__constructor__</code> keyword,
+similar to how cross-language transforms are done.</p><p>With the
<code>__constructor__</code> keyword, one defines a Python callable that, on
+invocation, <em>returns</em> the desired transform. The first argument (or
<code>source</code>
+keyword argument, if there are no positional arguments)
+is interpreted as the Python code. For example</p><pre tabindex=0><code>-
type: PyTransform
+  config:
+    constructor: __constructor__
+    kwargs:
+      source: |
+        import apache_beam as beam
+
+        def create_my_transform(inc):
+          return beam.Map(lambda x: beam.Row(a=x.col2 + inc))
+
+      inc: 10
+</code></pre><p>will apply <code>beam.Map(lambda x: beam.Row(a=x.col2 +
10))</code> to the incoming
+PCollection.</p><p>As a class object can be invoked as its own constructor,
this allows one to
+define a <code>beam.PTransform</code> inline, e.g.</p><pre tabindex=0><code>-
type: PyTransform
+  config:
+    constructor: __constructor__
+    kwargs:
+      source: |
+        class MyPTransform(beam.PTransform):
+          def __init__(self, inc):
+            self._inc = inc
+          def expand(self, pcoll):
+            return pcoll | beam.Map(lambda x: beam.Row(a=x.col2 + self._inc))
+
+      inc: 10
+</code></pre><p>which works exactly as one would expect.</p><h2
id=defining-a-transform-inline-using-__callable__>Defining a transform inline
using <code>__callable__</code></h2><p>The <code>__callable__</code> keyword
works similarly, but instead of defining a
+callable that returns an applicable <code>PTransform</code> one simply defines
the
+expansion to be performed as a callable. This is analogous to
BeamPython’s
+<code>ptransform.ptransform_fn</code> decorator.</p><p>In this case one can
simply write</p><pre tabindex=0><code>- type: PyTransform
+  config:
+    constructor: __callable__
+    kwargs:
+      source: |
+        def my_ptransform(pcoll, inc):
+          return pcoll | beam.Map(lambda x: beam.Row(a=x.col2 + inc))
+
+      inc: 10
+</code></pre><h1 id=external-transforms>External transforms</h1><p>One can
also invoke PTransforms defined elsewhere via a <code>python</code> provider,
+for example</p><pre tabindex=0><code>pipeline:
+  transforms:
+    - ...
+    - type: MyTransform
+      config:
+        kwarg: whatever
+
+providers:
+  - ...
+  - type: python
+    input: ...
+    config:
+      packages:
+        - 'some_pypi_package>=version'
+    transforms:
+      MyTransform: 'pkg.module.MyTransform'
+</code></pre><p>These can be defined inline as well, with or without
dependencies, e.g.</p><pre tabindex=0><code>pipeline:
+  transforms:
+    - ...
+    - type: ToCase
+      input: ...
+      config:
+        upper: True
+
+providers:
+  - type: python
+    config: {}
+    transforms:
+      'ToCase': |
+        @beam.ptransform_fn
+        def ToCase(pcoll, upper):
+          if upper:
+            return pcoll | beam.Map(lambda x: str(x).upper())
+          else:
+            return pcoll | beam.Map(lambda x: str(x).lower())
+</code></pre></div></div><footer class=footer><div
class=footer__contained><div class=footer__cols><div class="footer__cols__col
footer__cols__col__logos"><div class=footer__cols__col__logo><img
src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div
class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg
class=footer__logo alt="Apache logo"></div></div><div class=footer-wrapper><div
class=wrapper-grid><div class=footer__cols__col><div class=footer__c [...]
<a href=https://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></div><div class="footer__cols__col
footer__cols__col__logos"><div class=footer__cols__col--group><div
class=footer__cols__col__logo><a href=https://github.com/apache/beam><im [...]
\ No newline at end of file
diff --git a/website/generated-content/documentation/sdks/yaml/index.html
b/website/generated-content/documentation/sdks/yaml/index.html
index a05b94675ac..ff48cf71c28 100644
--- a/website/generated-content/documentation/sdks/yaml/index.html
+++ b/website/generated-content/documentation/sdks/yaml/index.html
@@ -69,8 +69,10 @@ runner such as Flink or Dataflow.</p><p>Once the
prerequisites are installed, yo
in a yaml file as</p><pre tabindex=0><code>python -m apache_beam.yaml.main
--yaml_pipeline_file=/path/to/pipeline.yaml [other pipeline options such as the
runner]
</code></pre><p>You can do a dry-run of your pipeline using the render runner
to see what the
execution graph is, e.g.</p><pre tabindex=0><code>python -m
apache_beam.yaml.main --yaml_pipeline_file=/path/to/pipeline.yaml
--runner=apache_beam.runners.render.RenderRunner --render_output=out.png
[--render_port=0]
-</code></pre><p>(This requires <a
href=https://graphviz.org/download/>Graphviz</a> to be installed to render the
pipeline.)</p><p>We intend to support running a pipeline on Dataflow by
directly passing the
-yaml specification to a template, no local installation of the Beam SDKs
required.</p><h2 id=example-pipelines>Example pipelines</h2><p>Here is a simple
pipeline that reads some data from csv files and
+</code></pre><p>(This requires <a
href=https://graphviz.org/download/>Graphviz</a> to be installed to render the
pipeline.)</p><p>You can also submit a YAML pipeline directly by using the
Dataflow CLI command
+<a
href=https://cloud.google.com/sdk/gcloud/reference/beta/dataflow/yaml/run><code>gcloud
beta dataflow yaml run</code></a>.
+When you use the <code>gcloud</code> CLI, you don’t need to install the
Beam SDKs locally.</p><pre tabindex=0><code>gcloud beta dataflow yaml run
job_name --yaml-pipeline-file=/path/to/pipeline.yaml --region=europe-west1
+</code></pre><h2 id=example-pipelines>Example pipelines</h2><p>Here is a
simple pipeline that reads some data from csv files and
writes it out in json format.</p><pre tabindex=0><code>pipeline:
transforms:
- type: ReadFromCsv
@@ -463,7 +465,7 @@ providers:
- /path/to/local/package.zip
transforms:
MyCustomTransform: "pkg.subpkg.PTransformClassOrCallable"
-</code></pre><h2 id=other-resources>Other Resources</h2><ul><li><a
href=https://gist.github.com/robertwb/2cb26973f1b1203e8f5f8f88c5764da0>Example
pipelines</a></li><li><a
href=https://github.com/Polber/beam/tree/jkinard/bug-bash/sdks/python/apache_beam/yaml/examples>More
examples</a></li><li><a
href=https://gist.github.com/robertwb/64e2f51ff88320eeb6ffd96634202df7>Transform
glossary</a></li></ul><p>Additional documentation in this
directory</p><ul><li><a href=yaml_mapping.md>Mapping</a>< [...]
+</code></pre><h2 id=other-resources>Other Resources</h2><ul><li><a
href=https://gist.github.com/robertwb/2cb26973f1b1203e8f5f8f88c5764da0>Example
pipelines</a></li><li><a
href=https://github.com/Polber/beam/tree/jkinard/bug-bash/sdks/python/apache_beam/yaml/examples>More
examples</a></li><li><a
href=https://gist.github.com/robertwb/64e2f51ff88320eeb6ffd96634202df7>Transform
glossary</a></li></ul></div></div><footer class=footer><div
class=footer__contained><div class=footer__cols><div cl [...]
<a href=https://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></div><div class="footer__cols__col
footer__cols__col__logos"><div class=footer__cols__col--group><div
class=footer__cols__col__logo><a href=https://github.com/apache/beam><im [...]
\ No newline at end of file
diff --git a/website/generated-content/sitemap.xml
b/website/generated-content/sitemap.xml
index af889f77d2b..622e7a7e6ca 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.54.0/</loc><lastmod>2024-02-22T16:51:24+01:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2024-02-22T16:51:24+01:00</lastmod></url><url><loc>/blog/</loc><lastmod>2024-02-22T16:51:24+01:00</lastmod></url><url><loc>/categories/</loc><lastmod>2024-02-22T16:51:24+01:00</lastmod></url><url><loc>/catego
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.54.0/</loc><lastmod>2024-02-22T16:13:55-05:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2024-02-22T16:13:55-05:00</lastmod></url><url><loc>/blog/</loc><lastmod>2024-02-22T16:13:55-05:00</lastmod></url><url><loc>/categories/</loc><lastmod>2024-02-22T16:13:55-05:00</lastmod></url><url><loc>/catego
[...]
\ No newline at end of file