This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push: new b5db0e5368d Publishing website 2025/07/16 17:46:04 at commit 84d423f b5db0e5368d is described below commit b5db0e5368df00e42f699ee7118346f3f038e45b Author: runner <runner@main-runner-frrkx-7cnqr> AuthorDate: Wed Jul 16 17:46:05 2025 +0000 Publishing website 2025/07/16 17:46:04 at commit 84d423f --- .../extensions/create-external-table/index.html | 63 +++++++++++++++++++++- website/generated-content/sitemap.xml | 2 +- 2 files changed, 62 insertions(+), 3 deletions(-) diff --git a/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html b/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html index c562b194beb..1c5a5122d8a 100644 --- a/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html +++ b/website/generated-content/documentation/dsls/sql/extensions/create-external-table/index.html @@ -35,7 +35,7 @@ <img class=banner-img-mobile src=/images/banners/tour-of-beam/tour-of-beam-mobile.png alt="Start Tour of Beam"></a></div><div class=swiper-slide><a href=https://beam.apache.org/documentation/ml/overview/><img class=banner-img-desktop src=/images/banners/machine-learning/machine-learning-desktop.jpg alt="Machine Learning"> <img class=banner-img-mobile src=/images/banners/machine-learning/machine-learning-mobile.jpg alt="Machine Learning"></a></div></div><div class=swiper-pagination></div><div class=swiper-button-prev></div><div class=swiper-button-next></div></div><script src=/js/swiper-bundle.min.min.e0e8f81b0b15728d35ff73c07f42ddbb17a108d6f23df4953cb3e60df7ade675.js></script> <script src=/js/sliders/top-banners.min.afa7d0a19acf7a3b28ca369490b3d401a619562a2a4c9612577be2f66a4b9855.js></script> -<script>function showSearch(){addPlaceholder();var 
e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function addPlaceholder(){$("input:text").attr("placeholder","What are you looking for?")}function endSearch(){var e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function blockScroll(){$("body").toggleClass(" [...] +<script>function showSearch(){addPlaceholder();var e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function addPlaceholder(){$("input:text").attr("placeholder","What are you looking for?")}function endSearch(){var e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function blockScroll(){$("body").toggleClass(" [...] <a href=/documentation/io/built-in/>external storage system</a>. For some storage systems, <code>CREATE EXTERNAL TABLE</code> does not create a physical table until a write occurs. After the physical table exists, you can access the table with @@ -256,7 +256,66 @@ See the following table:</li></ul></li></ul><div class=table-container-wrapper>< types specified in the schema using org.apache.commons.csv.</li></ul></li></ul><h3 id=schema-5>Schema</h3><p>Only simple types are supported.</p><h3 id=example-6>Example</h3><pre tabindex=0><code>CREATE EXTERNAL TABLE orders (id INTEGER, price INTEGER) TYPE text LOCATION '/home/admin/orders' -</code></pre><h2 id=generic-payload-handling>Generic Payload Handling</h2><p>Certain data sources and sinks support generic payload handling. This handling +</code></pre><h2 id=datagen>DataGen</h2><p>The <strong>DataGen</strong> connector allows for creating tables based on in-memory data generation. This is useful for developing and testing queries locally without requiring access to external systems. 
The DataGen connector is built-in; no additional dependencies are required. It is available in Beam 2.67.0 and later.</p><p>Tables can be either <strong>bounded</strong> (generating a fixed number of rows) or <strong>unbounded</strong> (generating a str [...] +</span></span></span><span class=line><span class=cl><span class=w></span><span class=k>TYPE</span><span class=w> </span><span class=n>datagen</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=p>[</span><span class=n>TBLPROPERTIES</span><span class=w> </span><span class=n>tblProperties</span><span class=p>]</span><span class=w> +</span></span></span></code></pre></div><h3 id=table-properties-tblproperties>Table Properties (<code>TBLPROPERTIES</code>)</h3><p>The <code>TBLPROPERTIES</code> JSON object is used to configure the generator’s behavior.</p><h4 id=general-options>General Options</h4><table><thead><tr><th style=text-align:left>Key</th><th style=text-align:left>Required</th><th style=text-align:left>Description</th></tr></thead><tbody><tr><td style=text-align:left><code>number-of-rows</code></td><td [...]
+</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>id</span><span class=w> </span><span class=nb>BIGINT</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>product_name</span><span class=w> </span><span class=nb>VARCHAR</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=p>)</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=k>TYPE</span><span class=w> </span><span class=n>datagen</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=n>TBLPROPERTIES</span><span class=w> </span><span class=s1>'{ +</span></span></span><span class=line><span class=cl><span class=s1> "number-of-rows": "1000" +</span></span></span><span class=line><span class=cl><span class=s1>}'</span><span class=w> +</span></span></span></code></pre></div><h4 id=unbounded-streaming-table>Unbounded Streaming Table</h4><p>This example creates a streaming table that generates 10 rows per second.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-sql data-lang=sql><span class=line><span class=cl><span class=k>CREATE</span><span class=w> </span><span class=k>EXTERNAL</span><span class=w> </span><span class=k>TABLE</span><span class=w> </span><span class=n>user_impressions</span><sp [...] 
+</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>user_id</span><span class=w> </span><span class=nb>VARCHAR</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>impression_time</span><span class=w> </span><span class=k>TIMESTAMP</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=p>)</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=k>TYPE</span><span class=w> </span><span class=n>datagen</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=n>TBLPROPERTIES</span><span class=w> </span><span class=s1>'{ +</span></span></span><span class=line><span class=cl><span class=s1> "rows-per-second": "10" +</span></span></span><span class=line><span class=cl><span class=s1>}'</span><span class=w> +</span></span></span></code></pre></div><hr><h4 id=bounded-table-with-custom-field-generation>Bounded Table with Custom Field Generation</h4><p>This is a comprehensive example demonstrating various field-level customizations. The table is bounded because a sequence generator is used.</p><div class=highlight><pre tabindex=0 class=chroma><code class=language-sql data-lang=sql><span class=line><span class=cl><span class=k>CREATE</span><span class=w> </span><span class=k>EXTERNAL</span><span [...] 
+</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>event_id</span><span class=w> </span><span class=nb>BIGINT</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>user_id</span><span class=w> </span><span class=nb>VARCHAR</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>click_timestamp</span><span class=w> </span><span class=k>TIMESTAMP</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>score</span><span class=w> </span><span class=n>DOUBLE</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=p>)</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=k>TYPE</span><span class=w> </span><span class=s1>'datagen'</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=n>TBLPROPERTIES</span><span class=w> </span><span class=s1>'{ +</span></span></span><span class=line><span class=cl><span class=s1> "number-of-rows": "1000000", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.event_id.kind": "sequence", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.event_id.start": "1", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.event_id.end": "1000000", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.user_id.kind": "random", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.user_id.length": "12", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.click_timestamp.kind": "random", +</span></span></span><span class=line><span class=cl><span class=s1> 
"fields.click_timestamp.max-past": "60000", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.score.kind": "random", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.score.min": "0.0", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.score.max": "1.0", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.score.null-rate": "0.1" +</span></span></span><span class=line><span class=cl><span class=s1>}'</span><span class=w> +</span></span></span></code></pre></div><h4 id=unbounded-streaming-table-with-event-time>Unbounded Streaming Table with Event Time</h4><p>This example creates a streaming table that generates 10 rows per second. It uses the <code>click_timestamp</code> column to drive the event-time watermark, allowing for up to 5 seconds of out-of-order data. The <code>ingestion_timestamp</code> column is populated separately with the processing time.</p><div class=highlight><pre tabindex=0 class=chroma [...] 
+</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>event_id</span><span class=w> </span><span class=nb>BIGINT</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>user_id</span><span class=w> </span><span class=nb>VARCHAR</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>click_timestamp</span><span class=w> </span><span class=k>TIMESTAMP</span><span class=p>,</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=n>ingestion_timestamp</span><span class=w> </span><span class=k>TIMESTAMP</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=p>)</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=k>TYPE</span><span class=w> </span><span class=s1>'datagen'</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w></span><span class=n>TBLPROPERTIES</span><span class=w> </span><span class=s1>'{ +</span></span></span><span class=line><span class=cl><span class=s1> "rows-per-second": "10", +</span></span></span><span class=line><span class=cl><span class=s1> "timestamp.behavior": "event-time", +</span></span></span><span class=line><span class=cl><span class=s1> "event-time.timestamp-column": "click_timestamp", +</span></span></span><span class=line><span class=cl><span class=s1> "event-time.max-out-of-orderness": "5000", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.event_id.kind": "sequence", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.event_id.start": "1", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.event_id.end": "1000000", +</span></span></span><span class=line><span 
class=cl><span class=s1> "fields.user_id.kind": "random", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.user_id.length": "12", +</span></span></span><span class=line><span class=cl><span class=s1> "fields.ingestion_timestamp.kind": "timestamp" +</span></span></span><span class=line><span class=cl><span class=s1>}'</span><span class=w> +</span></span></span></code></pre></div><h2 id=generic-payload-handling>Generic Payload Handling</h2><p>Certain data sources and sinks support generic payload handling. This handling parses a byte array payload field into a table schema. The following schemas are supported by this handling. All require at least setting <code>"format": "<type>"</code>, and may require other properties.</p><ul><li><code>avro</code>: Avro<ul><li>An Avro schema is automatically generated from the specified field diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml index 8441193e28c..e3ef8e8b1c2 100644 --- a/website/generated-content/sitemap.xml +++ b/website/generated-content/sitemap.xml @@ -1 +1 @@ -<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2025-07-16T07:22:40-04:00</lastmod></url><url><loc>/blog/</loc><lastmod>2025-07-16T07:22:40-04:00</lastmod></url><url><loc>/categories/</loc><lastmod>2025-07-16T07:22:40-04:00</lastmod></url><url><loc>/blog/beam-summit-2025-hackathon-pcollectors-blog/</loc><lastmod>2025-07-16T07:22:40-04:00< [...] 
\ No newline at end of file +<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2025-07-16T12:42:51-04:00</lastmod></url><url><loc>/blog/</loc><lastmod>2025-07-16T12:42:51-04:00</lastmod></url><url><loc>/categories/</loc><lastmod>2025-07-16T12:42:51-04:00</lastmod></url><url><loc>/blog/beam-summit-2025-hackathon-pcollectors-blog/</loc><lastmod>2025-07-16T12:42:51-04:00< [...] \ No newline at end of file