This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch asf-staging in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push: new c45b074 Commit build products c45b074 is described below commit c45b07486b481ea7264cc3fed5309ecbb778b2f6 Author: Build Pelican (action) <priv...@infra.apache.org> AuthorDate: Tue Jul 29 22:42:20 2025 +0000 Commit build products --- blog/2025/06/09/metadata-handling/index.html | 21 ++++++++++++-------- .../tim-saucer-dewey-dunnington-andrew-lamb.html | 2 +- blog/category/blog.html | 2 +- blog/feed.xml | 2 +- blog/feeds/all-en.atom.xml | 23 +++++++++++++--------- blog/feeds/blog.atom.xml | 23 +++++++++++++--------- ...im-saucer-dewey-dunnington-andrew-lamb.atom.xml | 23 +++++++++++++--------- ...tim-saucer-dewey-dunnington-andrew-lamb.rss.xml | 2 +- blog/index.html | 2 +- 9 files changed, 60 insertions(+), 40 deletions(-) diff --git a/blog/2025/06/09/metadata-handling/index.html b/blog/2025/06/09/metadata-handling/index.html index 95655b5..8a3ecaf 100644 --- a/blog/2025/06/09/metadata-handling/index.html +++ b/blog/2025/06/09/metadata-handling/index.html @@ -62,13 +62,18 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which are user defined -data types. The data is stored using one of the existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the -metadata specifies how we are to interpret the stored data. 
These can be used for things -like specifying a currency on a floating point value, indicating that a fixed length -binary data is a UUID, or adding geometric information to a binary array.</p> -<p>The use of extension types was one of the primary motivations for adding metadata -to the function processing, but arbitrary metadata can be put on the input and +<p>Metadata is specified as a map of key-value pairs of strings. This extra metadata is used +by Arrow implementations implement <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a> and can also be used to add +use case-specific context to a column of values where the formality of an extension type +is not required. In previous versions of DataFusion field metadata was propagated through +certain operations (e.g., renaming or selecting a column) but was not accessible to others +(e.g., scalar, window, or aggregate function calls). In the new implementation, during +processing of all user defined functions we pass the input field information and allow +user defined function implementations to return field information to the caller.</p> +<p><a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">Extension types</a> are user defined data types where the data is stored using one of the +existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the metadata specifies how we are to interpret the +stored data. The use of extension types was one of the primary motivations for adding +metadata to the function processing, but arbitrary metadata can be put on the input and output fields. This allows for a range of other interesting use cases.</p> <h2>Why metadata handling is important</h2> <p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. 
Each @@ -226,7 +231,7 @@ context about columns in Arrow record batches.</p> forward in the ability to handle more interesting types of data. We can validate the input data matches not only the data types but also the intent of the data to be processed. We can enable complex operations on binary data because we understand the encoding used. We -can also use metadata to create new and interesting user defined data types.</p> +can also use metadata to create new and interesting user defined data types. </p> <h2>Get Involved</h2> <p>The DataFusion team is an active and engaging community and we would love to have you join us and help the project.</p> diff --git a/blog/author/tim-saucer-dewey-dunnington-andrew-lamb.html b/blog/author/tim-saucer-dewey-dunnington-andrew-lamb.html index 63a29a9..66deb91 100644 --- a/blog/author/tim-saucer-dewey-dunnington-andrew-lamb.html +++ b/blog/author/tim-saucer-dewey-dunnington-andrew-lamb.html @@ -47,7 +47,7 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p> </div><!-- /.entry-content --> +<p>Metadata is specified as a map of key-value pairs of strings. This …</p> </div><!-- /.entry-content --> </article></li> </ol><!-- /#posts-list --> </section><!-- /#content --> diff --git a/blog/category/blog.html b/blog/category/blog.html index 6455876..d7d0128 100644 --- a/blog/category/blog.html +++ b/blog/category/blog.html @@ -318,7 +318,7 @@ limitations under the License. 
<p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p> </div><!-- /.entry-content --> +<p>Metadata is specified as a map of key-value pairs of strings. This …</p> </div><!-- /.entry-content --> </article></li> <li><article class="hentry"> <header> <h2 class="entry-title"><a href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" rel="bookmark" title="Permalink to Apache DataFusion Comet 0.8.0 Release">Apache DataFusion Comet 0.8.0 Release</a></h2> </header> diff --git a/blog/feed.xml b/blog/feed.xml index 43540d4..1458ca2 100644 --- a/blog/feed.xml +++ b/blog/feed.xml @@ -208,7 +208,7 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tim Saucer, Dewey Dunnington, Andrew Lamb</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</ca [...] +<p>Metadata is specified as a map of key-value pairs of strings. 
This …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tim Saucer, Dewey Dunnington, Andrew Lamb</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.8.0 Release</title><link>https://datafusion.apache.org/b [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml index b9718d0..ef3b25f 100644 --- a/blog/feeds/all-en.atom.xml +++ b/blog/feeds/all-en.atom.xml @@ -2393,7 +2393,7 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p></summary><content type="html"><!-- +<p>Metadata is specified as a map of key-value pairs of strings. This …</p></summary><content type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -2412,13 +2412,18 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. 
Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which are user defined -data types. The data is stored using one of the existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the -metadata specifies how we are to interpret the stored data. These can be used for things -like specifying a currency on a floating point value, indicating that a fixed length -binary data is a UUID, or adding geometric information to a binary array.</p> -<p>The use of extension types was one of the primary motivations for adding metadata -to the function processing, but arbitrary metadata can be put on the input and +<p>Metadata is specified as a map of key-value pairs of strings. This extra metadata is used +by Arrow implementations implement <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a> and can also be used to add +use case-specific context to a column of values where the formality of an extension type +is not required. In previous versions of DataFusion field metadata was propagated through +certain operations (e.g., renaming or selecting a column) but was not accessible to others +(e.g., scalar, window, or aggregate function calls). 
In the new implementation, during +processing of all user defined functions we pass the input field information and allow +user defined function implementations to return field information to the caller.</p> +<p><a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">Extension types</a> are user defined data types where the data is stored using one of the +existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the metadata specifies how we are to interpret the +stored data. The use of extension types was one of the primary motivations for adding +metadata to the function processing, but arbitrary metadata can be put on the input and output fields. This allows for a range of other interesting use cases.</p> <h2>Why metadata handling is important</h2> <p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. Each @@ -2576,7 +2581,7 @@ context about columns in Arrow record batches.</p> forward in the ability to handle more interesting types of data. We can validate the input data matches not only the data types but also the intent of the data to be processed. We can enable complex operations on binary data because we understand the encoding used. We -can also use metadata to create new and interesting user defined data types.</p> +can also use metadata to create new and interesting user defined data types. </p> <h2>Get Involved</h2> <p>The DataFusion team is an active and engaging community and we would love to have you join us and help the project.</p> diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml index 2288fdd..10a1e31 100644 --- a/blog/feeds/blog.atom.xml +++ b/blog/feeds/blog.atom.xml @@ -2393,7 +2393,7 @@ limitations under the License. 
<p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p></summary><content type="html"><!-- +<p>Metadata is specified as a map of key-value pairs of strings. This …</p></summary><content type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -2412,13 +2412,18 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which are user defined -data types. The data is stored using one of the existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the -metadata specifies how we are to interpret the stored data. 
These can be used for things -like specifying a currency on a floating point value, indicating that a fixed length -binary data is a UUID, or adding geometric information to a binary array.</p> -<p>The use of extension types was one of the primary motivations for adding metadata -to the function processing, but arbitrary metadata can be put on the input and +<p>Metadata is specified as a map of key-value pairs of strings. This extra metadata is used +by Arrow implementations implement <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a> and can also be used to add +use case-specific context to a column of values where the formality of an extension type +is not required. In previous versions of DataFusion field metadata was propagated through +certain operations (e.g., renaming or selecting a column) but was not accessible to others +(e.g., scalar, window, or aggregate function calls). In the new implementation, during +processing of all user defined functions we pass the input field information and allow +user defined function implementations to return field information to the caller.</p> +<p><a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">Extension types</a> are user defined data types where the data is stored using one of the +existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the metadata specifies how we are to interpret the +stored data. The use of extension types was one of the primary motivations for adding +metadata to the function processing, but arbitrary metadata can be put on the input and output fields. This allows for a range of other interesting use cases.</p> <h2>Why metadata handling is important</h2> <p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. 
Each @@ -2576,7 +2581,7 @@ context about columns in Arrow record batches.</p> forward in the ability to handle more interesting types of data. We can validate the input data matches not only the data types but also the intent of the data to be processed. We can enable complex operations on binary data because we understand the encoding used. We -can also use metadata to create new and interesting user defined data types.</p> +can also use metadata to create new and interesting user defined data types. </p> <h2>Get Involved</h2> <p>The DataFusion team is an active and engaging community and we would love to have you join us and help the project.</p> diff --git a/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml b/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml index 5aaa821..e221645 100644 --- a/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml +++ b/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.atom.xml @@ -18,7 +18,7 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p></summary><content type="html"><!-- +<p>Metadata is specified as a map of key-value pairs of strings. This …</p></summary><content type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -37,13 +37,18 @@ limitations under the License. 
<p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which are user defined -data types. The data is stored using one of the existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the -metadata specifies how we are to interpret the stored data. These can be used for things -like specifying a currency on a floating point value, indicating that a fixed length -binary data is a UUID, or adding geometric information to a binary array.</p> -<p>The use of extension types was one of the primary motivations for adding metadata -to the function processing, but arbitrary metadata can be put on the input and +<p>Metadata is specified as a map of key-value pairs of strings. This extra metadata is used +by Arrow implementations implement <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a> and can also be used to add +use case-specific context to a column of values where the formality of an extension type +is not required. In previous versions of DataFusion field metadata was propagated through +certain operations (e.g., renaming or selecting a column) but was not accessible to others +(e.g., scalar, window, or aggregate function calls). 
In the new implementation, during +processing of all user defined functions we pass the input field information and allow +user defined function implementations to return field information to the caller.</p> +<p><a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">Extension types</a> are user defined data types where the data is stored using one of the +existing <a href="https://arrow.apache.org/docs/format/Columnar.html#data-types">Arrow data types</a> but the metadata specifies how we are to interpret the +stored data. The use of extension types was one of the primary motivations for adding +metadata to the function processing, but arbitrary metadata can be put on the input and output fields. This allows for a range of other interesting use cases.</p> <h2>Why metadata handling is important</h2> <p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. Each @@ -201,7 +206,7 @@ context about columns in Arrow record batches.</p> forward in the ability to handle more interesting types of data. We can validate the input data matches not only the data types but also the intent of the data to be processed. We can enable complex operations on binary data because we understand the encoding used. We -can also use metadata to create new and interesting user defined data types.</p> +can also use metadata to create new and interesting user defined data types. </p> <h2>Get Involved</h2> <p>The DataFusion team is an active and engaging community and we would love to have you join us and help the project.</p> diff --git a/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.rss.xml b/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.rss.xml index da47fce..5606b92 100644 --- a/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.rss.xml +++ b/blog/feeds/tim-saucer-dewey-dunnington-andrew-lamb.rss.xml @@ -18,4 +18,4 @@ limitations under the License. 
<p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tim Saucer, Dewey Dunnington, Andrew Lamb</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</ca [...] \ No newline at end of file +<p>Metadata is specified as a map of key-value pairs of strings. This …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tim Saucer, Dewey Dunnington, Andrew Lamb</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</category></item></channel></rss> \ No newline at end of file diff --git a/blog/index.html b/blog/index.html index 05fdc47..21c06dc 100644 --- a/blog/index.html +++ b/blog/index.html @@ -413,7 +413,7 @@ limitations under the License. <p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions which enables a variety of interesting improvements. Now users can access metadata on the input columns to functions and produce metadata in the output.</p> -<p>One use case for this metadata is to enable <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, which …</p></p> +<p>Metadata is specified as a map of key-value pairs of strings. 
This …</p></p> <footer> <ul class="actions"> <div style="text-align: right"><a href="/blog/2025/06/09/metadata-handling" class="button medium">Continue Reading</a></div>