This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new cf25209  Commit build products
cf25209 is described below

commit cf252093d678a2c8956efbaff7e0699879684252
Author: Build Pelican (action) <[email protected]>
AuthorDate: Tue Sep 23 17:11:48 2025 +0000

    Commit build products
---
 .../09/21/custom-types-using-metadata/index.html   | 370 +++++++++++++++++++++
 ...-dunningtonwherobots-andrew-lambinfluxdata.html |  64 ++++
 output/category/blog.html                          |  32 ++
 output/feed.xml                                    |  24 +-
 output/feeds/all-en.atom.xml                       | 272 ++++++++++++++-
 output/feeds/blog.atom.xml                         | 272 ++++++++++++++-
 ...ningtonwherobots-andrew-lambinfluxdata.atom.xml | 272 +++++++++++++++
 ...nningtonwherobots-andrew-lambinfluxdata.rss.xml |  24 ++
 .../metadata-handling/arrow_record_batch.png       | Bin 0 -> 224968 bytes
 output/index.html                                  |  41 +++
 10 files changed, 1368 insertions(+), 3 deletions(-)

diff --git a/output/2025/09/21/custom-types-using-metadata/index.html 
b/output/2025/09/21/custom-types-using-metadata/index.html
new file mode 100644
index 0000000..816136c
--- /dev/null
+++ b/output/2025/09/21/custom-types-using-metadata/index.html
@@ -0,0 +1,370 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="x-ua-compatible" content="ie=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Implementing User Defined Types and Custom Metadata in DataFusion - 
Apache DataFusion Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<link href="/blog/css/app.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script>  </head>
+  <body class="d-flex flex-column h-100">
+  <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth 
navbar example">
+    <div class="container-fluid">
+        <a class="navbar-brand" href="/blog"><img 
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache 
DataFusion Blog</a>
+        <button class="navbar-toggler" type="button" data-bs-toggle="collapse" 
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" 
aria-label="Toggle navigation">
+            <span class="navbar-toggler-icon"></span>
+        </button>
+
+        <div class="collapse navbar-collapse" id="navbarADP">
+            <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/about.html">About</a>
+                </li>
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/feed.xml">RSS</a>
+                </li>
+            </ul>
+        </div>
+    </div>
+</nav>    
+<!-- article contents -->
+<div id="contents">
+  <div class="bg-white p-4 p-md-5 rounded">
+    <div class="row justify-content-center">
+      <div class="col-12 col-md-8 main-content">
+        <h1>
+          Implementing User Defined Types and Custom Metadata in DataFusion
+        </h1>
+        <p>Posted on: Sun 21 September 2025 by Tim Saucer(rerun.io), Dewey 
Dunnington(Wherobots), Andrew Lamb(InfluxData)</p>
+
+        <aside class="toc-container d-md-none mb-2">
+          <div class="toc"><span class="toctitle">Contents</span><ul>
+<li><a href="#user-defined-types-extension-types">User defined types == 
extension types</a></li>
+<li><a href="#metadata-in-apache-arrow-fields">Metadata in Apache Arrow 
Fields</a></li>
+<li><a href="#metadata-handling">Metadata handling</a></li>
+<li><a href="#how-to-use-metadata-in-user-defined-functions">How to use 
metadata in user defined functions</a></li>
+<li><a href="#extension-types">Extension types</a></li>
+<li><a href="#other-use-cases">Other use cases</a></li>
+<li><a href="#acknowledgements">Acknowledgements</a></li>
+<li><a href="#conclusion">Conclusion</a></li>
+<li><a href="#get-involved">Get Involved</a></li>
+</ul>
+</div>
+        </aside>
+
+        <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">Apache DataFusion</a> significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the output.</p>
+<h2 id="user-defined-types-extension-types">User defined types == extension 
types<a class="headerlink" href="#user-defined-types-extension-types" 
title="Permanent link">¶</a></h2>
+<p>DataFusion directly uses <a href="https://arrow.apache.org">Apache Arrow</a>'s <a href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html">DataTypes</a> as its type system. This
+has several benefits: it is simple to explain, supports a rich set of
+both scalar and nested types, provides true zero copy interoperability with other Arrow
+implementations, and has world-class library support (via <a href="https://github.com/apache/arrow-rs">arrow-rs</a>). However, one
+challenge of directly using the Arrow type system is that there is no distinction
+between logical types and physical types. For example, the Arrow type system
+contains multiple types which can store "String"s (sequences of UTF8 encoded
+bytes) such as <code>Utf8</code>, <code>LargeUtf8</code>, <code>Dictionary(Utf8)</code>, and <code>Utf8View</code>.</p>
+<p>Apache Arrow does, however, provide <a href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types">extension types</a>, a form of logical type
+information that describes how to interpret data stored in one of the existing
+physical types. With the improved support for metadata in DataFusion 48.0.0, it
+is now easier to implement user defined types using Arrow extension types.</p>
+<h2 id="metadata-in-apache-arrow-fields">Metadata in Apache Arrow 
<code>Field</code>s<a class="headerlink" 
href="#metadata-in-apache-arrow-fields" title="Permanent link">¶</a></h2>
+<p>The <a href="https://arrow.apache.org/docs/format/Columnar.html">Arrow specification</a> defines Metadata as a map of key-value pairs of
+strings. This metadata is used to attach extension types and use case-specific
+context to a column of values. The Rust implementation of Apache Arrow,
+<a href="https://github.com/apache/arrow-rs">arrow-rs</a>, stores metadata on <a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a>s, but prior to DataFusion 48.0.0, many of
+DataFusion's internal APIs used <a href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html">DataTypes</a> directly, and thus did not propagate
+metadata through all operations.</p>
+<p>In previous versions of DataFusion, <code>Field</code> metadata was propagated through certain
+operations (e.g., renaming or selecting a column) but not through
+others (e.g., scalar, window, or aggregate function calls). In DataFusion 48.0.0
+and later, all user defined functions are passed the full
+input <code>Field</code> information and can return <code>Field</code> information to the caller.</p>
+<p>Supporting extension types was a key motivation for adding metadata to
+function processing, but the same mechanism can store arbitrary metadata on the
+input and output fields, which supports other interesting use cases as we describe
+later in this post.</p>
+<h2 id="metadata-handling">Metadata handling<a class="headerlink" 
href="#metadata-handling" title="Permanent link">¶</a></h2>
+<p>Arrow record batches carry a <a href="https://docs.rs/arrow/latest/arrow/datatypes/struct.Schema.html">Schema</a> in addition to the Arrow arrays. Each
+<a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> in this <code>Schema</code> contains a name, data type, nullability, and metadata. The
+metadata is specified as a map of key-value pairs of strings. In the new
+implementation, we pass the input field information to all user defined functions
+during processing.</p>
+<figure>
+<img alt="Relationship between a Record Batch, its schema, and the underlying arrays. There is a one-to-one relationship between each Field in the Schema and Array entry in the Columns." class="img-responsive" src="/blog/images/metadata-handling/arrow_record_batch.png" width="100%"/>
+<figcaption>
+<b>Figure 1:</b> Relationship between a Record Batch, its schema, and the underlying arrays. There is a one-to-one relationship between each Field in the Schema and Array entry in the Columns.
+  </figcaption>
+</figure>
+<p>It is often desirable to write a generic function for reuse. Prior versions 
of
+user defined functions only had access to the <code>DataType</code> of the 
input columns.
+This works well for some features that only rely on the types of data, but 
other
+use cases may need additional information that describes the data.</p>
+<p>For example, suppose we wish to write a function that takes in a UUID and returns a string
+of the <a href="https://www.ietf.org/rfc/rfc9562.html#section-4.1">variant</a> of the input field. We would want this function to be able to handle
+all of the string types and also a binary encoded UUID. Because the Arrow specification does not
+contain an unsigned 128-bit type, it is common to encode a UUID as a fixed sized binary
+array where each element is 16 bytes long. With the metadata handling in <a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a>
+we can validate during planning that the input data not only has the correct underlying
+data type, but that it also represents the right <em>kind</em> of data. The UUID example is a
+common one, and it is included in the <a href="https://arrow.apache.org/docs/format/CanonicalExtensions.html">canonical extension types</a> that are now
+supported in DataFusion.</p>
+<p>Another common application of metadata handling is understanding the encoding of a blob of data.
+Suppose you have a column that contains image data. Most likely this data is 
stored as
+an array of <code>u8</code> data. Without knowing a priori what the encoding 
of that blob of data is,
+you cannot ensure you are using the correct methods for decoding it. You may 
work around
+this by adding another column to your data source indicating the encoding, but 
this can be
+wasteful for systems where the encoding never changes. Instead, you could use 
metadata to
+specify the encoding for the entire column.</p>
+<h2 id="how-to-use-metadata-in-user-defined-functions">How to use metadata in 
user defined functions<a class="headerlink" 
href="#how-to-use-metadata-in-user-defined-functions" title="Permanent 
link">¶</a></h2>
+<p>When working with metadata for <a href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html">user defined scalar functions</a>, there are typically two
+places in the function definition that require implementation.</p>
+<ul>
+<li>Computing the return field from the arguments</li>
+<li>Invocation</li>
+</ul>
+<p>During planning, we will attempt to call the function <a href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.return_field_from_args">return_field_from_args()</a>. This will
+provide a list of input fields to the function and return the output field. To evaluate
+metadata on the input side, you can write a function similar to this example:</p>
+<pre><code class="language-rust">fn return_field_from_args(
+    &amp;self,
+    args: ReturnFieldArgs,
+) -&gt; datafusion::common::Result&lt;FieldRef&gt; {
+    if args.arg_fields.len() != 1 {
+        return exec_err!("Incorrect number of arguments for uuid_version");
+    }
+
+    let input_field = &amp;args.arg_fields[0];
+    if &amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        let Ok(CanonicalExtensionType::Uuid(_)) = input_field.try_canonical_extension_type()
+        else {
+            return exec_err!("Input field must contain the UUID canonical extension type");
+        };
+    }
+
+    let is_nullable = args.arg_fields[0].is_nullable();
+
+    Ok(Arc::new(Field::new(self.name(), DataType::UInt32, is_nullable)))
+}
+</code></pre>
+<p>In this example, we take advantage of the fact that we already have support for extension
+types that evaluate metadata. If you were attempting to check for metadata other than
+extension type support, you could have instead written a snippet such as:</p>
+<pre><code class="language-rust">    if &amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        // Fail during planning if the expected metadata key is missing
+        let _ = input_field
+            .metadata()
+            .get("ARROW:extension:metadata")
+            .ok_or(exec_datafusion_err!("Input field must contain the UUID canonical extension type"))?;
+    }
+</code></pre>
+<p>If you are writing a user defined function that instead returns metadata on its output,
+you can add it directly to the <code>Field</code> that is the output of the <code>return_field_from_args</code>
+call. In our above example, we could change the return line to:</p>
+<pre><code class="language-rust">    Ok(Arc::new(
+        Field::new(self.name(), DataType::UInt32, is_nullable).with_metadata(
+            [("my_key".to_string(), "my_value".to_string())]
+                .into_iter()
+                .collect(),
+        ),
+    ))
+</code></pre>
+<p>By checking the metadata during the planning process, we can identify errors early in
+the query process. There are cases where we wish to have access to this metadata during
+execution as well. The function <a href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#tymethod.invoke_with_args">invoke_with_args</a> in the user defined function takes
+the updated struct <a href="https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarFunctionArgs.html">ScalarFunctionArgs</a>. This now contains the input fields, which can
+be used to check for metadata. For example, you can do the following:</p>
+<pre><code class="language-rust">fn invoke_with_args(&amp;self, args: 
ScalarFunctionArgs) -&gt; Result&lt;ColumnarValue&gt; {
+    assert_eq!(args.arg_fields.len(), 1);
+    let my_value = args.arg_fields[0]
+        .metadata()
+        .get("encoding_type");
+    ...
+</code></pre>
+<p>In this snippet we have extracted an <code>Option&lt;&amp;String&gt;</code> from the input field metadata
+which we can then use to determine which functions we might want to call. We could
+then parse the returned value to determine what type of encoding to use when
+evaluating the array in the arguments. Since <code>return_field_from_args</code> takes <code>&amp;self</code> rather than <code>&amp;mut self</code>,
+the result of this lookup cannot be stored on the function during planning, so the lookup is performed here, during execution.</p>
+<p>The description in this section applies to scalar user defined functions, 
but equivalent
+support exists for aggregate and window functions.</p>
+<h2 id="extension-types">Extension types<a class="headerlink" 
href="#extension-types" title="Permanent link">¶</a></h2>
+<p>Extension types are one of the primary motivations for this enhancement in
+<a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a>. The official Rust implementation of Apache Arrow, <a href="https://github.com/apache/arrow-rs">arrow-rs</a>,
+already contains support for the <a href="https://arrow.apache.org/docs/format/CanonicalExtensions.html">canonical extension types</a>. This support includes
+helper functions such as <code>try_canonical_extension_type()</code> in the earlier example.</p>
+<p>For a concrete example of how extension types can be used in DataFusion 
functions,
+there is an <a href="https://github.com/timsaucer/datafusion_extension_type_examples">example repository</a> that demonstrates using UUIDs. The UUID extension
+type specifies that the data are stored as a Fixed Size Binary of length 16. 
In the
+DataFusion core functions, we have the ability to generate string 
representations of
+UUIDs that match the version 4 specification. These are helpful, but a user may
+wish to do additional work with UUIDs where having them in the dense 
representation
+is preferable. Alternatively, the user may already have data with the binary encoding
+and want to extract values such as the version, timestamp, or string
+representation.</p>
+<p>In the example repository we have created three user defined functions: 
<code>UuidVersion</code>,
+<code>StringToUuid</code>, and <code>UuidToString</code>. Each of these 
implements <code>ScalarUDFImpl</code> and can
+be used as follows:</p>
+<pre><code class="language-rust">async fn main() -&gt; Result&lt;()&gt; {
+    let ctx = create_context()?;
+
+    // get a DataFrame from the context
+    let mut df = ctx.table("t").await?;
+
+    // Create the string UUIDs
+    df = df.select(vec![uuid().alias("string_uuid")])?;
+
+    // Convert string UUIDs to canonical extension UUIDs
+    let string_to_uuid = ScalarUDF::new_from_impl(StringToUuid::default());
+    df = df.with_column("uuid", 
string_to_uuid.call(vec![col("string_uuid")]))?;
+
+    // Extract version number from canonical extension UUIDs
+    let version = ScalarUDF::new_from_impl(UuidVersion::default());
+    df = df.with_column("version", version.call(vec![col("uuid")]))?;
+
+    // Convert back to a string
+    let uuid_to_string = ScalarUDF::new_from_impl(UuidToString::default());
+    df = df.with_column("string_round_trip", 
uuid_to_string.call(vec![col("uuid")]))?;
+
+    df.show().await?;
+
+    Ok(())
+}
+</code></pre>
+<p>The <a href="https://github.com/timsaucer/datafusion_extension_type_examples">example repository</a> also contains a crate that demonstrates how to expose these
+UDFs to <a href="https://datafusion.apache.org/python/">datafusion-python</a>. This requires version 48.0.0 or later.</p>
+<h2 id="other-use-cases">Other use cases<a class="headerlink" 
href="#other-use-cases" title="Permanent link">¶</a></h2>
+<p>The metadata attached to the fields can be used to store <em>any</em> user 
data in key/value
+pairs. Some of the other use cases that have been identified include:</p>
+<ul>
+<li>Creating output for downstream systems. One user of DataFusion produces
+  <a href="https://rerun.io/blog/column-chunks">data visualizations</a> that are dependent upon metadata in record batch fields. By
+  enabling metadata on output of user defined functions, we can now produce 
batches
+  that are directly consumable by these systems.</li>
+<li>Describing the relationships between columns of data. You can store data about how
+  one column of data relates to another and use this during function evaluation. For
+  example, in robotics it is common to use <a href="https://wiki.ros.org/tf2">transforms</a> to describe how to convert
+  from one coordinate system to another. It can be convenient to send the 
function
+  all the columns that contain transform information and then allow the 
function
+  to determine which columns to use based on the metadata. This allows for
+  encapsulation of the transform logic within the user function.</li>
+<li>Storing logical types of the data model. <a href="https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/">InfluxDB</a> uses field metadata to specify
+  which columns are used for tags, times, and fields.</li>
+</ul>
+<p>Based on the experience of the authors, we recommend caution when using 
metadata
+for use cases other than type extension. One issue that can arise is that as columns
+are used to compute new fields, some functions may pass through the metadata 
and the
+semantic meaning may change. For example, suppose you decided to use metadata 
to
+store some kind of statistics for the entire stream of record batches. Then 
you pass
+that column through a filter that removes many rows of data. Your statistics
+metadata may now be invalid, even though it was passed through the filter.</p>
+<p>Similarly, if you use metadata to form relations between one column and 
another and
+the naming of the columns has changed at some point in your workflow, then the metadata
+may refer to the wrong column of data. This can be mitigated by
+not relying on column naming but rather adding additional metadata to all 
columns of
+interest.</p>
+<h2 id="acknowledgements">Acknowledgements<a class="headerlink" 
href="#acknowledgements" title="Permanent link">¶</a></h2>
+<p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a>
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.</p>
+<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" 
title="Permanent link">¶</a></h2>
+<p>The enhanced metadata handling in <a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion 48.0.0</a> is a significant step
+forward in the ability to handle more interesting types of data. Users can
+validate that the input data matches the intent of the data to be processed, enable
+complex operations on binary data because we understand the encoding used, and 
+use metadata to create new and interesting user defined data types.
+We can't wait to see what you build with it!</p>
+<h2 id="get-involved">Get Involved<a class="headerlink" href="#get-involved" 
title="Permanent link">¶</a></h2>
+<p>The DataFusion team is an active and engaging community and we would love 
to have you join
+us and help the project.</p>
+<p>Here are some ways to get involved:</p>
+<ul>
+<li>Learn more by visiting the <a href="https://datafusion.apache.org/index.html">DataFusion</a> project page.</li>
+<li>Try out the project and provide feedback, file issues, and contribute code.</li>
+<li>Work on a <a href="https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">good first issue</a>.</li>
+<li>Reach out to us via the <a href="https://datafusion.apache.org/contributor-guide/communication.html">communication doc</a>.</li>
+</ul>
+
+<!--
+  Comments Section
+  Loaded only after explicit visitor consent to comply with ASF policy.
+-->
+
+<div id="comments">
+  <hr>
+  <h3>Comments</h3>
+
+  <!-- Local loader script -->
+  <script src="/content/js/giscus-consent.js" defer></script>
+
+  <!-- Consent UI -->
+  <div id="giscus-consent">
+    <p>
+        We use <a href="https://giscus.app/">Giscus</a> for comments, powered by GitHub Discussions.
+        To respect your privacy, Giscus and comments will load only if you click "Show Comments".
+    </p>
+
+    <div class="consent-actions">
+      <button id="giscus-load" type="button">Show Comments</button>
+      <button id="giscus-revoke" type="button" hidden>Hide Comments</button>
+    </div>
+
+    <noscript>JavaScript is required to load comments from Giscus.</noscript>
+  </div>
+
+  <!-- Container where Giscus will render -->
+  <div id="comment-thread"></div>
+</div>      </div>
+      <aside class="toc-container d-none d-md-block col-md-4 col-xl-3 ms-xl-2">
+        <div class="toc"><span class="toctitle">Contents</span><ul>
+<li><a href="#user-defined-types-extension-types">User defined types == 
extension types</a></li>
+<li><a href="#metadata-in-apache-arrow-fields">Metadata in Apache Arrow 
Fields</a></li>
+<li><a href="#metadata-handling">Metadata handling</a></li>
+<li><a href="#how-to-use-metadata-in-user-defined-functions">How to use 
metadata in user defined functions</a></li>
+<li><a href="#extension-types">Extension types</a></li>
+<li><a href="#other-use-cases">Other use cases</a></li>
+<li><a href="#acknowledgements">Acknowledgements</a></li>
+<li><a href="#conclusion">Conclusion</a></li>
+<li><a href="#get-involved">Get Involved</a></li>
+</ul>
+</div>
+      </aside>
+    </div>
+  </div>
+</div>    
+    <!-- footer -->
+    <div class="row g-0">
+      <div class="col-12">
+        <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+          Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/>
+          Apache&reg; and the Apache feather logo are trademarks of The Apache 
Software Foundation.
+        </p>
+      </div>
+    </div>
+    <script src="/blog/js/bootstrap.bundle.min.js"></script>  </main>
+  </body>
+</html>
diff --git 
a/output/author/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.html
 
b/output/author/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.html
new file mode 100644
index 0000000..e03bb2b
--- /dev/null
+++ 
b/output/author/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.html
@@ -0,0 +1,64 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+        <title>Apache DataFusion Blog - Articles by Tim Saucer(rerun.io), 
Dewey Dunnington(Wherobots), Andrew Lamb(InfluxData)</title>
+        <meta charset="utf-8" />
+        <meta name="generator" content="Pelican" />
+        <link href="https://datafusion.apache.org/blog/feed.xml" type="application/rss+xml" rel="alternate" title="Apache DataFusion Blog RSS Feed" />
+</head>
+
+<body id="index" class="home">
+        <header id="banner" class="body">
+                <h1><a href="https://datafusion.apache.org/blog/">Apache DataFusion Blog</a></h1>
+        </header><!-- /#banner -->
+        <nav id="menu"><ul>
+            <li><a href="https://datafusion.apache.org/blog/pages/about.html">About</a></li>
+            <li><a href="https://datafusion.apache.org/blog/pages/index.html">index</a></li>
+            <li><a href="https://datafusion.apache.org/blog/category/blog.html">blog</a></li>
+        </ul></nav><!-- /#menu -->
+<section id="content">
+<h2>Articles by Tim Saucer(rerun.io), Dewey Dunnington(Wherobots), Andrew 
Lamb(InfluxData)</h2>
+
+<ol id="post-list">
+        <li><article class="hentry">
+                <header> <h2 class="entry-title"><a href="https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata" rel="bookmark" title="Permalink to Implementing User Defined Types and Custom Metadata in DataFusion">Implementing User Defined Types and Custom Metadata in DataFusion</a></h2> </header>
+                <footer class="post-info">
+                    <time class="published" 
datetime="2025-09-21T00:00:00+00:00"> Sun 21 September 2025 </time>
+                    <address class="vcard author">By
+                        <a class="url fn" href="https://datafusion.apache.org/blog/author/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.html">Tim Saucer(rerun.io), Dewey Dunnington(Wherobots), Andrew Lamb(InfluxData)</a>
+                    </address>
+                </footer><!-- /.post-info -->
+                <div class="entry-content"> <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">Apache DataFusion</a> significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the output.</p>
+<h2 id="user-defined-types-extension-types">User defined types == extension 
types<a class="headerlink" href="#user-defined-types-extension-types" 
title="Permanent link">¶</a></h2>
+<p>DataFusion directly uses <a href="https://arrow.apache.org">Apache Arrow</a>'s <a href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html">DataTypes</a> as its type system. This
+has …</p> </div><!-- /.entry-content -->
+        </article></li>
+</ol><!-- /#posts-list -->
+</section><!-- /#content -->
+        <footer id="contentinfo" class="body">
+                <address id="about" class="vcard body">
+                Proudly powered by <a href="https://getpelican.com/">Pelican</a>,
+                which takes great advantage of <a href="https://www.python.org/">Python</a>.
+                </address><!-- /#about -->
+        </footer><!-- /#contentinfo -->
+</body>
+</html>
\ No newline at end of file
diff --git a/output/category/blog.html b/output/category/blog.html
index 1b7eb1a..eb412a0 100644
--- a/output/category/blog.html
+++ b/output/category/blog.html
@@ -21,6 +21,38 @@
 <h2>Articles in the blog category</h2>
 
 <ol id="post-list">
+        <li><article class="hentry">
+                <header> <h2 class="entry-title"><a href="https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata" rel="bookmark" title="Permalink to Implementing User Defined Types and Custom Metadata in DataFusion">Implementing User Defined Types and Custom Metadata in DataFusion</a></h2> </header>
+                <footer class="post-info">
+                    <time class="published" 
datetime="2025-09-21T00:00:00+00:00"> Sun 21 September 2025 </time>
+                    <address class="vcard author">By
+                        <a class="url fn" href="https://datafusion.apache.org/blog/author/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.html">Tim Saucer(rerun.io), Dewey Dunnington(Wherobots), Andrew Lamb(InfluxData)</a>
+                    </address>
+                </footer><!-- /.post-info -->
+                <div class="entry-content"> <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+-->
+
+<p><a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/";>Apache 
DataFusion</a> significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.</p>
+<h2 id="user-defined-types-extension-types">User defined types == extension 
types<a class="headerlink" href="#user-defined-types-extension-types" 
title="Permanent link">¶</a></h2>
+<p>DataFusion directly uses <a href="https://arrow.apache.org";>Apache 
Arrow</a>'s <a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html";>DataTypes</a>
 as its type system. This
+has …</p> </div><!-- /.entry-content -->
+        </article></li>
         <li><article class="hentry">
                 <header> <h2 class="entry-title"><a 
href="https://datafusion.apache.org/blog/2025/09/16/datafusion-comet-0.10.0"; 
rel="bookmark" title="Permalink to Apache DataFusion Comet 0.10.0 
Release">Apache DataFusion Comet 0.10.0 Release</a></h2> </header>
                 <footer class="post-info">
diff --git a/output/feed.xml b/output/feed.xml
index 9e545dd..5e9f341 100644
--- a/output/feed.xml
+++ b/output/feed.xml
@@ -1,5 +1,27 @@
 <?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue,
 16 Sep 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 
0.10.0 
Release</title><link>https://datafusion.apache.org/blog/2025/09/16/datafusion-comet-0.10.0</link><description>&lt;!--
+<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Sun,
 21 Sep 2025 00:00:00 +0000</lastBuildDate><item><title>Implementing User 
Defined Types and Custom Metadata in 
DataFusion</title><link>https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has …&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/";>Tim Saucer(rerun.io), Dewey 
Dunnington(Wherobots), Andrew Lamb(InfluxData)</dc:creator><pubDate>Sun, 21 Sep 
2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-09-21:/blog/2025/09/21/custom-types-using-metadata</guid><category>blog</category></item><item><title>Apache
 DataFusion Comet 0.10.0 
Release</title><link>https://datafusion.apache.org/blog/2025/09/16/datafusion-co
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index c27f401..e216160 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -1,5 +1,275 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom";><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/"; 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"; 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-09-16T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion Comet 0.10.0 Release</title><link 
href="https://datafusion.apache.org/blog/2025/09/16/datafusion-comet-0.10.0"; r 
[...]
+<feed xmlns="http://www.w3.org/2005/Atom";><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/"; 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"; 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-09-21T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Implementing
 User Defined Types and Custom Metadata in DataFusion</title><link 
href="https://datafusion.apache.org/blog/2025/09/21 [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has …&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has several benefits, including being simple to explain, supporting a rich set of
+both scalar and nested types, true zero copy interoperability with other Arrow
+implementations, and world-class library support (via &lt;a 
href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;). However, one
+challenge of directly using the Arrow type system is there is no distinction
+between logical types and physical types. For example, the Arrow type system
+contains multiple types which can store "String"s (sequences of UTF8 encoded
+bytes) such as &lt;code&gt;Utf8&lt;/code&gt;, 
&lt;code&gt;LargeUtf8&lt;/code&gt;, &lt;code&gt;Dictionary(Utf8)&lt;/code&gt;, 
and &lt;code&gt;Utf8View&lt;/code&gt;. &lt;/p&gt;
+&lt;p&gt;However, Apache Arrow does provide &lt;a 
href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types"&gt;extension
 types&lt;/a&gt;, a version of logical type
+information, which describe how to interpret data stored in one of the existing
+physical types. With the improved support for metadata in DataFusion 48.0.0, it
+is now easier to implement user defined types using Arrow extension 
types.&lt;/p&gt;
+&lt;h2 id="metadata-in-apache-arrow-fields"&gt;Metadata in Apache Arrow 
&lt;code&gt;Field&lt;/code&gt;s&lt;a class="headerlink" 
href="#metadata-in-apache-arrow-fields" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The &lt;a 
href="https://arrow.apache.org/docs/format/Columnar.html"&gt;Arrow 
specification&lt;/a&gt; defines Metadata as a map of key-value pairs of
+strings. This metadata is used to attach extension types and use case-specific
+context to a column of values. The Rust implementation of Apache Arrow,
+&lt;a href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;, stores 
metadata on &lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;s,
 but prior to DataFusion 48.0.0, many of
+DataFusion's internal APIs used &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 directly, and thus did not propagate
+metadata through all operations.&lt;/p&gt;
+&lt;p&gt;In previous versions of DataFusion &lt;code&gt;Field&lt;/code&gt; 
metadata was propagated through certain
+operations (e.g., renaming or selecting a column) but not through
+others (e.g., scalar, window, or aggregate function calls). In DataFusion 
48.0.0
+and later, all user defined functions are passed the full
+input &lt;code&gt;Field&lt;/code&gt; information and can return 
&lt;code&gt;Field&lt;/code&gt; information to the caller.&lt;/p&gt;
+&lt;p&gt;Supporting extension types was a key motivation for adding metadata 
to
+function processing, but the same mechanism can store arbitrary metadata on the
+input and output fields, which supports other interesting use cases as we 
describe
+later in this post.&lt;/p&gt;
+&lt;h2 id="metadata-handling"&gt;Metadata handling&lt;a class="headerlink" 
href="#metadata-handling" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;Arrow record batches carry a &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/struct.Schema.html"&gt;Schema&lt;/a&gt;
 in addition to the Arrow arrays. Each
+&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 in this &lt;code&gt;Schema&lt;/code&gt; contains a name, data type, 
nullability, and metadata. The
+metadata is specified as a map of key-value pairs of strings.  In the new
+implementation, during processing of all user defined functions we pass the 
input
+field information.&lt;/p&gt;
+&lt;figure&gt;
+&lt;img alt="Relationship between a Record Batch, its schema, and the 
underlying arrays. There is a one to one relationship between each Field in the 
Schema and Array entry in the Columns." class="img-responsive" 
src="/blog/images/metadata-handling/arrow_record_batch.png" width="100%"/&gt;
+&lt;figcaption&gt;
+&lt;b&gt;Figure 1:&lt;/b&gt; Relationship between a Record Batch, its schema, 
and the underlying arrays. There is a one to one relationship between each 
Field in the Schema and Array entry in the Columns.
+  &lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;It is often desirable to write a generic function for reuse. Prior 
versions of
+user defined functions only had access to the 
&lt;code&gt;DataType&lt;/code&gt; of the input columns.
+This works well for some features that only rely on the types of data, but 
other
+use cases may need additional information that describes the data.&lt;/p&gt;
+&lt;p&gt;For example, suppose we wish to write a function that takes in a UUID 
and returns a string
+of the &lt;a 
href="https://www.ietf.org/rfc/rfc9562.html#section-4.1"&gt;variant&lt;/a&gt; 
of the input field. We would want this function to be able to handle
+all of the string types and also a binary encoded UUID. Because the Arrow 
specification does not
+contain an unsigned 128 bit type, it is common to encode a UUID as a fixed 
size binary
+array where each element is 16 bytes long. With the metadata handling in 
DataFusion 48.0.0
+we can validate during planning that the input data not only has the correct 
underlying
+data type, but that it also represents the right &lt;em&gt;kind&lt;/em&gt; of 
data. The UUID example is a
+common one, and it is included in the &lt;a 
href="https://arrow.apache.org/docs/format/CanonicalExtensions.html"&gt;canonical
 extension types&lt;/a&gt; that are now
+supported in DataFusion.&lt;/p&gt;
+&lt;p&gt;Another common application of metadata handling is understanding 
encoding of a blob of data.
+Suppose you have a column that contains image data. Most likely this data is 
stored as
+an array of &lt;code&gt;u8&lt;/code&gt; data. Without knowing a priori what 
the encoding of that blob of data is,
+you cannot ensure you are using the correct methods for decoding it. You may 
work around
+this by adding another column to your data source indicating the encoding, but 
this can be
+wasteful for systems where the encoding never changes. Instead, you could use 
metadata to
+specify the encoding for the entire column.&lt;/p&gt;
+&lt;h2 id="how-to-use-metadata-in-user-defined-functions"&gt;How to use 
metadata in user defined functions&lt;a class="headerlink" 
href="#how-to-use-metadata-in-user-defined-functions" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;When working with metadata for &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html"&gt;user
 defined scalar functions&lt;/a&gt;, there are typically two
+places in the function definition that require implementation.&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Computing the return field from the arguments&lt;/li&gt;
+&lt;li&gt;Invocation&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;During planning, we will attempt to call the function &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.return_field_from_args"&gt;return_field_from_args()&lt;/a&gt;.
 This will
+provide a list of input fields to the function and return the output field. To 
evaluate
+metadata on the input side, you can write a function similar to this 
example:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;fn return_field_from_args(
+    &amp;amp;self,
+    args: ReturnFieldArgs,
+) -&amp;gt; datafusion::common::Result&amp;lt;FieldRef&amp;gt; {
+    if args.arg_fields.len() != 1 {
+        return exec_err!("Incorrect number of arguments for uuid_version");
+    }
+
+    let input_field = &amp;amp;args.arg_fields[0];
+    if &amp;amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        let Ok(CanonicalExtensionType::Uuid(_)) = 
input_field.try_canonical_extension_type()
+        else {
+            return exec_err!("Input field must contain the UUID canonical 
extension type");
+        };
+    }
+
+    let is_nullable = args.arg_fields[0].is_nullable();
+
+    Ok(Arc::new(Field::new(self.name(), DataType::UInt32, is_nullable)))
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;In this example, we take advantage of the fact that we already have 
support for extension
+types that evaluate metadata. If you need to check for metadata 
other than
+extension type support, you could instead write a snippet such 
as:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;    if 
&amp;amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        let _ = input_field
+            .metadata()
+            .get("ARROW:extension:metadata")
+            .ok_or(exec_datafusion_err!("Input field must contain the UUID 
canonical extension type"))?;
+    }
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;If you are writing a user defined function that will instead return 
metadata on output,
+you can add this directly into the &lt;code&gt;Field&lt;/code&gt; that is the 
output of the &lt;code&gt;return_field_from_args&lt;/code&gt;
+call. In our above example, we could change the return line to:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;    Ok(Arc::new(
+        Field::new(self.name(), DataType::UInt32, is_nullable).with_metadata(
+            [("my_key".to_string(), "my_value".to_string())]
+                .into_iter()
+                .collect(),
+        ),
+    ))
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;By checking the metadata during the planning process, we can identify 
errors early in
+the query process. There are cases where we wish to have access to this 
metadata during
+execution as well. The function &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#tymethod.invoke_with_args"&gt;invoke_with_args&lt;/a&gt;
 in the user defined function takes
+the updated struct &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarFunctionArgs.html"&gt;ScalarFunctionArgs&lt;/a&gt;.
 This now contains the input fields, which can
+be used to check for metadata. For example, you can do the following:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;fn 
invoke_with_args(&amp;amp;self, args: ScalarFunctionArgs) -&amp;gt; 
Result&amp;lt;ColumnarValue&amp;gt; {
+    assert_eq!(args.arg_fields.len(), 1);
+    let my_value = args.arg_fields[0]
+        .metadata()
+        .get("encoding_type");
+    ...
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;In this snippet we have extracted an 
&lt;code&gt;Option&amp;lt;&amp;amp;String&amp;gt;&lt;/code&gt; from the input field 
metadata
+which we can then use to determine which functions we might want to call. We 
could
+then parse the returned value to determine what type of encoding to use when
+evaluating the array in the arguments. Since 
&lt;code&gt;return_field_from_args&lt;/code&gt; is not &lt;code&gt;&amp;amp;mut 
self&lt;/code&gt;
+this check could not be performed during the planning stage.&lt;/p&gt;
+&lt;p&gt;The description in this section applies to scalar user defined 
functions, but equivalent
+support exists for aggregate and window functions.&lt;/p&gt;
+&lt;h2 id="extension-types"&gt;Extension types&lt;a class="headerlink" 
href="#extension-types" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;Extension types are one of the primary motivations for this 
enhancement in
+DataFusion 48.0.0. The official Rust implementation of Apache Arrow, &lt;a 
href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;,
+already contains support for the &lt;a 
href="https://arrow.apache.org/docs/format/CanonicalExtensions.html"&gt;canonical
 extension types&lt;/a&gt;. This support includes
+helper functions such as 
&lt;code&gt;try_canonical_extension_type()&lt;/code&gt; in the earlier 
example.&lt;/p&gt;
+&lt;p&gt;For a concrete example of how extension types can be used in 
DataFusion functions,
+there is an &lt;a 
href="https://github.com/timsaucer/datafusion_extension_type_examples"&gt;example
 repository&lt;/a&gt; that demonstrates using UUIDs. The UUID extension
+type specifies that the data are stored as a Fixed Size Binary of length 16. 
In the
+DataFusion core functions, we have the ability to generate string 
representations of
+UUIDs that match the version 4 specification. These are helpful, but a user may
+wish to do additional work with UUIDs where having them in the dense 
representation
+is preferable. Alternatively, the user may already have data with the binary 
encoding
+and want to extract values such as the version, timestamp, or string
+representation.&lt;/p&gt;
+&lt;p&gt;In the example repository we have created three user defined 
functions: &lt;code&gt;UuidVersion&lt;/code&gt;,
+&lt;code&gt;StringToUuid&lt;/code&gt;, and 
&lt;code&gt;UuidToString&lt;/code&gt;. Each of these implements 
&lt;code&gt;ScalarUDFImpl&lt;/code&gt; and can
+be used as follows:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;#[tokio::main]
+async fn main() -&amp;gt; 
Result&amp;lt;()&amp;gt; {
+    let ctx = create_context()?;
+
+    // get a DataFrame from the context
+    let mut df = ctx.table("t").await?;
+
+    // Create the string UUIDs
+    df = df.select(vec![uuid().alias("string_uuid")])?;
+
+    // Convert string UUIDs to canonical extension UUIDs
+    let string_to_uuid = ScalarUDF::new_from_impl(StringToUuid::default());
+    df = df.with_column("uuid", 
string_to_uuid.call(vec![col("string_uuid")]))?;
+
+    // Extract version number from canonical extension UUIDs
+    let version = ScalarUDF::new_from_impl(UuidVersion::default());
+    df = df.with_column("version", version.call(vec![col("uuid")]))?;
+
+    // Convert back to a string
+    let uuid_to_string = ScalarUDF::new_from_impl(UuidToString::default());
+    df = df.with_column("string_round_trip", 
uuid_to_string.call(vec![col("uuid")]))?;
+
+    df.show().await?;
+
+    Ok(())
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;The &lt;a 
href="https://github.com/timsaucer/datafusion_extension_type_examples"&gt;example
 repository&lt;/a&gt; also contains a crate that demonstrates how to expose 
these
+UDFs to &lt;a 
href="https://datafusion.apache.org/python/"&gt;datafusion-python&lt;/a&gt;. 
This requires version 48.0.0 or later.&lt;/p&gt;
+&lt;h2 id="other-use-cases"&gt;Other use cases&lt;a class="headerlink" 
href="#other-use-cases" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The metadata attached to the fields can be used to store 
&lt;em&gt;any&lt;/em&gt; user data in key/value
+pairs. Some of the other use cases that have been identified include:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Creating output for downstream systems. One user of DataFusion 
produces
+  &lt;a href="https://rerun.io/blog/column-chunks"&gt;data 
visualizations&lt;/a&gt; that are dependent upon metadata in record batch 
fields. By
+  enabling metadata on output of user defined functions, we can now produce 
batches
+  that are directly consumable by these systems.&lt;/li&gt;
+&lt;li&gt;Describing the relationships between columns of data. You can store 
data about how
+  one column of data relates to another and use these during function 
evaluation. For
+  example, in robotics it is common to use &lt;a 
href="https://wiki.ros.org/tf2"&gt;transforms&lt;/a&gt; to describe how to 
convert
+  from one coordinate system to another. It can be convenient to send the 
function
+  all the columns that contain transform information and then allow the 
function
+  to determine which columns to use based on the metadata. This allows for
+  encapsulation of the transform logic within the user function.&lt;/li&gt;
+&lt;li&gt;Storing logical types of the data model. &lt;a 
href="https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/"&gt;InfluxDB&lt;/a&gt;
 uses field metadata to specify
+  which columns are used for tags, times, and fields.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;Based on the experience of the authors, we recommend caution when 
using metadata
+for use cases other than type extension. One issue that can arise is that as 
columns
+are used to compute new fields, some functions may pass through the metadata 
and the
+semantic meaning may change. For example, suppose you decided to use metadata 
to
+store some kind of statistics for the entire stream of record batches. Then 
you pass
+that column through a filter that removes many rows of data. Your statistics
+metadata may now be invalid, even though it was passed through the 
filter.&lt;/p&gt;
+&lt;p&gt;Similarly, if you use metadata to form relations between one column 
and another and
+the naming of the columns has changed at some point in your workflow, then the 
metadata
+may refer to the wrong column of data. This can be 
mitigated by
+not relying on column naming but rather adding additional metadata to all 
columns of
+interest.&lt;/p&gt;
+&lt;h2 id="acknowledgements"&gt;Acknowledgements&lt;a class="headerlink" 
href="#acknowledgements" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;We would like to thank &lt;a 
href="https://rerun.io"&gt;Rerun.io&lt;/a&gt; for sponsoring the development of 
this work. &lt;a href="https://rerun.io"&gt;Rerun.io&lt;/a&gt;
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.&lt;/p&gt;
+&lt;h2 id="conclusion"&gt;Conclusion&lt;a class="headerlink" 
href="#conclusion" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The enhanced metadata handling in DataFusion 48.0.0 is a 
significant step
+forward in the ability to handle more interesting types of data. Users can
+validate that the input data matches the intent of the data to be processed, enable
+complex operations on binary data because we understand the encoding used, and 
+use metadata to create new and interesting user defined data types.
+We can't wait to see what you build with it!&lt;/p&gt;
+&lt;h2 id="get-involved"&gt;Get Involved&lt;a class="headerlink" 
href="#get-involved" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The DataFusion team is an active and engaging community and we would 
love to have you join
+us and help the project.&lt;/p&gt;
+&lt;p&gt;Here are some ways to get involved:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Learn more by visiting the &lt;a 
href="https://datafusion.apache.org/index.html"&gt;DataFusion&lt;/a&gt; project 
page.&lt;/li&gt;
+&lt;li&gt;Try out the project and provide feedback, file issues, and 
contribute code.&lt;/li&gt;
+&lt;li&gt;Work on a &lt;a 
href="https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22"&gt;good
 first issue&lt;/a&gt;.&lt;/li&gt;
+&lt;li&gt;Reach out to us via the &lt;a 
href="https://datafusion.apache.org/contributor-guide/communication.html"&gt;communication
 doc&lt;/a&gt;.&lt;/li&gt;
+&lt;/ul&gt;</content><category 
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.10.0 
Release</title><link 
href="https://datafusion.apache.org/blog/2025/09/16/datafusion-comet-0.10.0"; 
rel="alternate"></link><published>2025-09-16T00:00:00+00:00</published><updated>2025-09-16T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-09-16:/blog/2025/09/16/datafusion-comet-0.10.0</id><summary
 type="html">&lt;!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index 424ab3f..1f83133 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -1,5 +1,275 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom";><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/"; 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"; 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-09-16T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion Comet 0.10.0 Release</title><link 
href="https://datafusion.apache.org/blog/2025/09/16/datafusion-comet-0.10 [...]
+<feed xmlns="http://www.w3.org/2005/Atom";><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/"; 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"; 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-09-21T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Implementing
 User Defined Types and Custom Metadata in DataFusion</title><link 
href="https://datafusion.apache.org/blog/2025/ [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has …&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has several benefits, including being simple to explain, a rich set of
+both scalar and nested types, true zero copy interoperability with other Arrow
+implementations, and world-class library support (via &lt;a 
href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;). However, one
+challenge of directly using the Arrow type system is there is no distinction
+between logical types and physical types. For example, the Arrow type system
+contains multiple types which can store "String"s (sequences of UTF8 encoded
+bytes) such as &lt;code&gt;Utf8&lt;/code&gt;, 
&lt;code&gt;LargeUtf8&lt;/code&gt;, &lt;code&gt;Dictionary(Utf8)&lt;/code&gt;, 
and &lt;code&gt;Utf8View&lt;/code&gt;. &lt;/p&gt;
+&lt;p&gt;However, Apache Arrow does provide &lt;a 
href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types"&gt;extension
 types&lt;/a&gt;, a version of logical type
+information, which describe how to interpret data stored in one of the existing
+physical types. With the improved support for metadata in DataFusion 48.0.0, it
+is now easier to implement user defined types using Arrow extension 
types.&lt;/p&gt;
+&lt;h2 id="metadata-in-apache-arrow-fields"&gt;Metadata in Apache Arrow 
&lt;code&gt;Field&lt;/code&gt;s&lt;a class="headerlink" 
href="#metadata-in-apache-arrow-fields" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The &lt;a 
href="https://arrow.apache.org/docs/format/Columnar.html"&gt;Arrow 
specification&lt;/a&gt; defines Metadata as a map of key-value pairs of
+strings. This metadata is used to attach extension types and use case-specific
+context to a column of values. The Rust implementation of Apache Arrow,
+&lt;a href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;, stores 
metadata on &lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;s,
 but prior to DataFusion 48.0.0, many of
+DataFusion's internal APIs used &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 directly, and thus did not propagate
+metadata through all operations.&lt;/p&gt;
+&lt;p&gt;In previous versions of DataFusion &lt;code&gt;Field&lt;/code&gt; 
metadata was propagated through certain
+operations (e.g., renaming or selecting a column) but not through 
+others (e.g., scalar, window, or aggregate function calls). In DataFusion 
48.0.0 
+and later, all user defined functions are passed the full
+input &lt;code&gt;Field&lt;/code&gt; information and can return 
&lt;code&gt;Field&lt;/code&gt; information to the caller.&lt;/p&gt;
+&lt;p&gt;Supporting extension types was a key motivation for adding metadata 
to the
+function processing, but the same mechanism can store arbitrary metadata on the
+input and output fields, which supports other interesting use cases as we 
describe
+later in this post.&lt;/p&gt;
+&lt;h2 id="metadata-handling"&gt;Metadata handling&lt;a class="headerlink" 
href="#metadata-handling" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;Arrow record batches carry a &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/struct.Schema.html"&gt;Schema&lt;/a&gt;
 in addition to the Arrow arrays. Each
+&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 in this &lt;code&gt;Schema&lt;/code&gt; contains a name, data type, 
nullability, and metadata. The
+metadata is specified as a map of key-value pairs of strings.  In the new
+implementation, during processing of all user defined functions we pass the 
input
+field information.&lt;/p&gt;
+&lt;figure&gt;
+&lt;img alt="Relationship between a Record Batch, its schema, and the 
underlying arrays. There is a one to one relationship between each Field in the 
Schema and Array entry in the Columns." class="img-responsive" 
src="/blog/images/metadata-handling/arrow_record_batch.png" width="100%"/&gt;
+&lt;figcaption&gt;
+&lt;b&gt;Figure 1:&lt;/b&gt; Relationship between a Record Batch, its schema, 
and the underlying arrays. There is a one to one relationship between each 
Field in the Schema and Array entry in the Columns.
+  &lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;It is often desirable to write a generic function for reuse. Prior 
versions of
+user defined functions only had access to the 
&lt;code&gt;DataType&lt;/code&gt; of the input columns.
+This works well for some features that only rely on the types of data, but 
other
+use cases may need additional information that describes the data.&lt;/p&gt;
+&lt;p&gt;For example, suppose we wish to write a function that takes in a UUID 
and returns a string
+of the &lt;a 
href="https://www.ietf.org/rfc/rfc9562.html#section-4.1"&gt;variant&lt;/a&gt; 
of the input field. We would want this function to be able to handle
+all of the string types and also a binary encoded UUID. Because the Arrow 
specification does not
+contain an unsigned 128 bit type, it is common to encode a UUID as a fixed 
sized binary
+array where each element is 16 bytes long. With the metadata handling in 
DataFusion 48.0.0
+we can validate during planning that the input data not only has the correct 
underlying
+data type, but that it also represents the right &lt;em&gt;kind&lt;/em&gt; of 
data. The UUID example is a
+common one, and it is included in the &lt;a 
href="https://arrow.apache.org/docs/format/CanonicalExtensions.html"&gt;canonical
 extension types&lt;/a&gt; that are now
+supported in DataFusion.&lt;/p&gt;
+&lt;p&gt;Another common application of metadata handling is understanding 
the encoding of a blob of data.
+Suppose you have a column that contains image data. Most likely this data is 
stored as
+an array of &lt;code&gt;u8&lt;/code&gt; data. Without knowing a priori what 
the encoding of that blob of data is,
+you cannot ensure you are using the correct methods for decoding it. You may 
work around
+this by adding another column to your data source indicating the encoding, but 
this can be
+wasteful for systems where the encoding never changes. Instead, you could use 
metadata to
+specify the encoding for the entire column.&lt;/p&gt;
+&lt;h2 id="how-to-use-metadata-in-user-defined-functions"&gt;How to use 
metadata in user defined functions&lt;a class="headerlink" 
href="#how-to-use-metadata-in-user-defined-functions" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;When working with metadata for &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html"&gt;user
 defined scalar functions&lt;/a&gt;, there are typically two
+places in the function definition that require implementation.&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Computing the return field from the arguments&lt;/li&gt;
+&lt;li&gt;Invocation&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;During planning, we will attempt to call the function &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.return_field_from_args"&gt;return_field_from_args()&lt;/a&gt;.
 This will
+provide a list of input fields to the function and return the output field. To 
evaluate
+metadata on the input side, you can write a function similar to this 
example:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;fn return_field_from_args(
+    &amp;amp;self,
+    args: ReturnFieldArgs,
+) -&amp;gt; datafusion::common::Result&amp;lt;FieldRef&amp;gt; {
+    if args.arg_fields.len() != 1 {
+        return exec_err!("Incorrect number of arguments for uuid_version");
+    }
+
+    let input_field = &amp;amp;args.arg_fields[0];
+    if &amp;amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        let Ok(CanonicalExtensionType::Uuid(_)) = 
input_field.try_canonical_extension_type()
+        else {
+            return exec_err!("Input field must contain the UUID canonical 
extension type");
+        };
+    }
+
+    let is_nullable = args.arg_fields[0].is_nullable();
+
+    Ok(Arc::new(Field::new(self.name(), DataType::UInt32, is_nullable)))
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;In this example, we take advantage of the fact that we already have 
support for extension
+types that evaluate metadata. If you wanted to check for metadata 
other than
+extension type support, you could instead write a snippet such 
as:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;    if 
&amp;amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        // Look up a raw metadata key directly on the input field.
+        let _ = input_field
+            .metadata()
+            .get("ARROW:extension:metadata")
+            .ok_or(exec_datafusion_err!("Input field must contain the UUID 
canonical extension type"))?;
+    }
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;If you are writing a user defined function that will instead return 
metadata on output
+you can add this directly into the &lt;code&gt;Field&lt;/code&gt; that is the 
output of the &lt;code&gt;return_field_from_args&lt;/code&gt;
+call. In our above example, we could change the return line to:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;    Ok(Arc::new(
+        Field::new(self.name(), DataType::UInt32, is_nullable).with_metadata(
+            [("my_key".to_string(), "my_value".to_string())]
+                .into_iter()
+                .collect(),
+        ),
+    ))
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;By checking the metadata during the planning process, we can identify 
errors early in
+the query process. There are cases where we wish to have access to this 
metadata during
+execution as well. The function &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#tymethod.invoke_with_args"&gt;invoke_with_args&lt;/a&gt;
 in the user defined function takes
+the updated struct &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarFunctionArgs.html"&gt;ScalarFunctionArgs&lt;/a&gt;.
 This now contains the input fields, which can
+be used to check for metadata. For example, you can do the following:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;fn 
invoke_with_args(&amp;amp;self, args: ScalarFunctionArgs) -&amp;gt; 
Result&amp;lt;ColumnarValue&amp;gt; {
+    assert_eq!(args.arg_fields.len(), 1);
+    let my_value = args.arg_fields[0]
+        .metadata()
+        .get("encoding_type");
+    ...
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;In this snippet we have extracted an 
&lt;code&gt;Option&amp;lt;&amp;amp;String&amp;gt;&lt;/code&gt; from the input field 
metadata
+which we can then use to determine which functions we might want to call. We 
could
+then parse the returned value to determine what type of encoding to use when
+evaluating the array in the arguments. Since 
&lt;code&gt;return_field_from_args&lt;/code&gt; is not &lt;code&gt;&amp;amp;mut 
self&lt;/code&gt;
+this check could not be performed during the planning stage.&lt;/p&gt;
+&lt;p&gt;The description in this section applies to scalar user defined 
functions, but equivalent
+support exists for aggregate and window functions.&lt;/p&gt;
+&lt;h2 id="extension-types"&gt;Extension types&lt;a class="headerlink" 
href="#extension-types" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;Extension types are one of the primary motivations for this 
enhancement in
+DataFusion 48.0.0. The official Rust implementation of Apache Arrow, &lt;a 
href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;,
+already contains support for the &lt;a 
href="https://arrow.apache.org/docs/format/CanonicalExtensions.html"&gt;canonical
 extension types&lt;/a&gt;. This support includes
+helper functions such as 
&lt;code&gt;try_canonical_extension_type()&lt;/code&gt; in the earlier 
example.&lt;/p&gt;
+&lt;p&gt;For a concrete example of how extension types can be used in 
DataFusion functions,
+there is an &lt;a 
href="https://github.com/timsaucer/datafusion_extension_type_examples"&gt;example
 repository&lt;/a&gt; that demonstrates using UUIDs. The UUID extension
+type specifies that the data are stored as a Fixed Size Binary of length 16. 
In the
+DataFusion core functions, we have the ability to generate string 
representations of
+UUIDs that match the version 4 specification. These are helpful, but a user may
+wish to do additional work with UUIDs where having them in the dense 
representation
+is preferable. Alternatively, the user may already have data with the binary 
encoding
+and we want to extract values such as the version, timestamp, or string
+representation.&lt;/p&gt;
+&lt;p&gt;In the example repository we have created three user defined 
functions: &lt;code&gt;UuidVersion&lt;/code&gt;,
+&lt;code&gt;StringToUuid&lt;/code&gt;, and 
&lt;code&gt;UuidToString&lt;/code&gt;. Each of these implements 
&lt;code&gt;ScalarUDFImpl&lt;/code&gt; and can
+be used as follows:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;async fn main() -&amp;gt; 
Result&amp;lt;()&amp;gt; {
+    let ctx = create_context()?;
+
+    // get a DataFrame from the context
+    let mut df = ctx.table("t").await?;
+
+    // Create the string UUIDs
+    df = df.select(vec![uuid().alias("string_uuid")])?;
+
+    // Convert string UUIDs to canonical extension UUIDs
+    let string_to_uuid = ScalarUDF::new_from_impl(StringToUuid::default());
+    df = df.with_column("uuid", 
string_to_uuid.call(vec![col("string_uuid")]))?;
+
+    // Extract version number from canonical extension UUIDs
+    let version = ScalarUDF::new_from_impl(UuidVersion::default());
+    df = df.with_column("version", version.call(vec![col("uuid")]))?;
+
+    // Convert back to a string
+    let uuid_to_string = ScalarUDF::new_from_impl(UuidToString::default());
+    df = df.with_column("string_round_trip", 
uuid_to_string.call(vec![col("uuid")]))?;
+
+    df.show().await?;
+
+    Ok(())
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;The &lt;a 
href="https://github.com/timsaucer/datafusion_extension_type_examples"&gt;example
 repository&lt;/a&gt; also contains a crate that demonstrates how to expose 
these
+UDFs to &lt;a 
href="https://datafusion.apache.org/python/"&gt;datafusion-python&lt;/a&gt;. 
This requires version 48.0.0 or later.&lt;/p&gt;
+&lt;h2 id="other-use-cases"&gt;Other use cases&lt;a class="headerlink" 
href="#other-use-cases" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The metadata attached to the fields can be used to store 
&lt;em&gt;any&lt;/em&gt; user data in key/value
+pairs. Some of the other use cases that have been identified include:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Creating output for downstream systems. One user of DataFusion 
produces
+  &lt;a href="https://rerun.io/blog/column-chunks"&gt;data 
visualizations&lt;/a&gt; that are dependent upon metadata in record batch 
fields. By
+  enabling metadata on output of user defined functions, we can now produce 
batches
+  that are directly consumable by these systems.&lt;/li&gt;
+&lt;li&gt;Describing the relationships between columns of data. You can store 
data about how
+  one column of data relates to another and use these during function 
evaluation. For
+  example, in robotics it is common to use &lt;a 
href="https://wiki.ros.org/tf2"&gt;transforms&lt;/a&gt; to describe how to 
convert
+  from one coordinate system to another. It can be convenient to send the 
function
+  all the columns that contain transform information and then allow the 
function
+  to determine which columns to use based on the metadata. This allows for
+  encapsulation of the transform logic within the user function.&lt;/li&gt;
+&lt;li&gt;Storing logical types of the data model. &lt;a 
href="https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/"&gt;InfluxDB&lt;/a&gt;
 uses field metadata to specify
+  which columns are used for tags, times, and fields.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;Based on the experience of the authors, we recommend caution when 
using metadata
+for use cases other than type extension. One issue that can arise is that as 
columns
+are used to compute new fields, some functions may pass through the metadata 
and the
+semantic meaning may change. For example, suppose you decided to use metadata 
to
+store some kind of statistics for the entire stream of record batches. Then 
you pass
+that column through a filter that removes many rows of data. Your statistics
+metadata may now be invalid, even though it was passed through the 
filter.&lt;/p&gt;
+&lt;p&gt;Similarly, if you use metadata to form relations between one column 
and another and
+the naming of the columns has changed at some point in your workflow, then the 
metadata
+may indicate an incorrect column of data it is referring to. This can be 
mitigated by
+not relying on column naming but rather adding additional metadata to all 
columns of
+interest.&lt;/p&gt;
+&lt;h2 id="acknowledgements"&gt;Acknowledgements&lt;a class="headerlink" 
href="#acknowledgements" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;We would like to thank &lt;a 
href="https://rerun.io"&gt;Rerun.io&lt;/a&gt; for sponsoring the development of 
this work. &lt;a href="https://rerun.io"&gt;Rerun.io&lt;/a&gt;
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.&lt;/p&gt;
+&lt;h2 id="conclusion"&gt;Conclusion&lt;a class="headerlink" 
href="#conclusion" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The enhanced metadata handling in DataFusion 48.0.0 is a 
significant step
+forward in the ability to handle more interesting types of data. Users can
+validate the input data matches the intent of the data to be processed, enable
+complex operations on binary data because we understand the encoding used, and 
+use metadata to create new and interesting user defined data types.
+We can't wait to see what you build with it!&lt;/p&gt;
+&lt;h2 id="get-involved"&gt;Get Involved&lt;a class="headerlink" 
href="#get-involved" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The DataFusion team is an active and engaging community and we would 
love to have you join
+us and help the project.&lt;/p&gt;
+&lt;p&gt;Here are some ways to get involved:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Learn more by visiting the &lt;a 
href="https://datafusion.apache.org/index.html"&gt;DataFusion&lt;/a&gt; project 
page.&lt;/li&gt;
+&lt;li&gt;Try out the project and provide feedback, file issues, and 
contribute code.&lt;/li&gt;
+&lt;li&gt;Work on a &lt;a 
href="https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22"&gt;good
 first issue&lt;/a&gt;.&lt;/li&gt;
+&lt;li&gt;Reach out to us via the &lt;a 
href="https://datafusion.apache.org/contributor-guide/communication.html"&gt;communication
 doc&lt;/a&gt;.&lt;/li&gt;
+&lt;/ul&gt;</content><category 
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.10.0 
Release</title><link 
href="https://datafusion.apache.org/blog/2025/09/16/datafusion-comet-0.10.0" 
rel="alternate"></link><published>2025-09-16T00:00:00+00:00</published><updated>2025-09-16T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-09-16:/blog/2025/09/16/datafusion-comet-0.10.0</id><summary
 type="html">&lt;!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git 
a/output/feeds/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.atom.xml
 
b/output/feeds/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.atom.xml
new file mode 100644
index 0000000..1243d75
--- /dev/null
+++ 
b/output/feeds/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.atom.xml
@@ -0,0 +1,272 @@
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - Tim 
Saucer(rerun.io), Dewey Dunnington(Wherobots), Andrew 
Lamb(InfluxData)</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.atom.xml"
 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-09-21T00:00:00+00:00</updated><subtitle></subtitle><entry><
 [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has …&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has several benefits, including being simple to explain, a rich set of
+both scalar and nested types, true zero copy interoperability with other Arrow
+implementations, and world-class library support (via &lt;a 
href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;). However, one
+challenge of directly using the Arrow type system is there is no distinction
+between logical types and physical types. For example, the Arrow type system
+contains multiple types which can store "String"s (sequences of UTF8 encoded
+bytes) such as &lt;code&gt;Utf8&lt;/code&gt;, 
&lt;code&gt;LargeUtf8&lt;/code&gt;, &lt;code&gt;Dictionary(Utf8)&lt;/code&gt;, 
and &lt;code&gt;Utf8View&lt;/code&gt;. &lt;/p&gt;
+&lt;p&gt;However, Apache Arrow does provide &lt;a 
href="https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types"&gt;extension
 types&lt;/a&gt;, a version of logical type
+information, which describe how to interpret data stored in one of the existing
+physical types. With the improved support for metadata in DataFusion 48.0.0, it
+is now easier to implement user defined types using Arrow extension 
types.&lt;/p&gt;
+&lt;h2 id="metadata-in-apache-arrow-fields"&gt;Metadata in Apache Arrow 
&lt;code&gt;Field&lt;/code&gt;s&lt;a class="headerlink" 
href="#metadata-in-apache-arrow-fields" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The &lt;a 
href="https://arrow.apache.org/docs/format/Columnar.html"&gt;Arrow 
specification&lt;/a&gt; defines Metadata as a map of key-value pairs of
+strings. This metadata is used to attach extension types and use case-specific
+context to a column of values. The Rust implementation of Apache Arrow,
+&lt;a href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;, stores 
metadata on &lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;s,
 but prior to DataFusion 48.0.0, many of
+DataFusion's internal APIs used &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 directly, and thus did not propagate
+metadata through all operations.&lt;/p&gt;
+&lt;p&gt;In previous versions of DataFusion &lt;code&gt;Field&lt;/code&gt; 
metadata was propagated through certain
+operations (e.g., renaming or selecting a column) but not through 
+others (e.g., scalar, window, or aggregate function calls). In DataFusion 
48.0.0 
+and later, all user defined functions are passed the full
+input &lt;code&gt;Field&lt;/code&gt; information and can return 
&lt;code&gt;Field&lt;/code&gt; information to the caller.&lt;/p&gt;
+&lt;p&gt;Supporting extension types was a key motivation for adding metadata 
to the
+function processing, but the same mechanism can store arbitrary metadata on the
+input and output fields, which supports other interesting use cases as we 
describe
+later in this post.&lt;/p&gt;
+&lt;h2 id="metadata-handling"&gt;Metadata handling&lt;a class="headerlink" 
href="#metadata-handling" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;Arrow record batches carry a &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/struct.Schema.html"&gt;Schema&lt;/a&gt;
 in addition to the Arrow arrays. Each
+&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 in this &lt;code&gt;Schema&lt;/code&gt; contains a name, data type, 
nullability, and metadata. The
+metadata is specified as a map of key-value pairs of strings.  In the new
+implementation, during processing of all user defined functions we pass the 
input
+field information.&lt;/p&gt;
+&lt;figure&gt;
+&lt;img alt="Relationship between a Record Batch, its schema, and the 
underlying arrays. There is a one to one relationship between each Field in the 
Schema and Array entry in the Columns." class="img-responsive" 
src="/blog/images/metadata-handling/arrow_record_batch.png" width="100%"/&gt;
+&lt;figcaption&gt;
+&lt;b&gt;Figure 1:&lt;/b&gt; Relationship between a Record Batch, its schema, 
and the underlying arrays. There is a one to one relationship between each 
Field in the Schema and Array entry in the Columns.
+  &lt;/figcaption&gt;
+&lt;/figure&gt;
+&lt;p&gt;It is often desirable to write a generic function for reuse. Prior 
versions of
+user defined functions only had access to the 
&lt;code&gt;DataType&lt;/code&gt; of the input columns.
+This works well for some features that only rely on the types of data, but 
other
+use cases may need additional information that describes the data.&lt;/p&gt;
+&lt;p&gt;For example, suppose we wish to write a function that takes in a UUID 
and returns a string
+of the &lt;a 
href="https://www.ietf.org/rfc/rfc9562.html#section-4.1"&gt;variant&lt;/a&gt; 
of the input field. We would want this function to be able to handle
+all of the string types and also a binary encoded UUID. Because the Arrow 
specification does not
+contain an unsigned 128 bit type, it is common to encode a UUID as a fixed 
sized binary
+array where each element is 16 bytes long. With the metadata handling in 
DataFusion 48.0.0
+we can validate during planning that the input data not only has the correct 
underlying
+data type, but that it also represents the right &lt;em&gt;kind&lt;/em&gt; of 
data. The UUID example is a
+common one, and it is included in the &lt;a 
href="https://arrow.apache.org/docs/format/CanonicalExtensions.html"&gt;canonical
 extension types&lt;/a&gt; that are now
+supported in DataFusion.&lt;/p&gt;
+&lt;p&gt;Another common application of metadata handling is understanding 
the encoding of a blob of data.
+Suppose you have a column that contains image data. Most likely this data is 
stored as
+an array of &lt;code&gt;u8&lt;/code&gt; data. Without knowing a priori what 
the encoding of that blob of data is,
+you cannot ensure you are using the correct methods for decoding it. You may 
work around
+this by adding another column to your data source indicating the encoding, but 
this can be
+wasteful for systems where the encoding never changes. Instead, you could use 
metadata to
+specify the encoding for the entire column.&lt;/p&gt;
+&lt;h2 id="how-to-use-metadata-in-user-defined-functions"&gt;How to use 
metadata in user defined functions&lt;a class="headerlink" 
href="#how-to-use-metadata-in-user-defined-functions" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;When working with metadata for &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html"&gt;user
 defined scalar functions&lt;/a&gt;, there are typically two
+places in the function definition that require implementation.&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Computing the return field from the arguments&lt;/li&gt;
+&lt;li&gt;Invocation&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;During planning, we will attempt to call the function &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.return_field_from_args"&gt;return_field_from_args()&lt;/a&gt;.
 This will
+provide a list of input fields to the function and return the output field. To 
evaluate
+metadata on the input side, you can write a function similar to this 
example:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;fn return_field_from_args(
+    &amp;amp;self,
+    args: ReturnFieldArgs,
+) -&amp;gt; datafusion::common::Result&amp;lt;FieldRef&amp;gt; {
+    if args.arg_fields.len() != 1 {
+        return exec_err!("Incorrect number of arguments for uuid_version");
+    }
+
+    let input_field = &amp;amp;args.arg_fields[0];
+    if &amp;amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        let Ok(CanonicalExtensionType::Uuid(_)) = 
input_field.try_canonical_extension_type()
+        else {
+            return exec_err!("Input field must contain the UUID canonical 
extension type");
+        };
+    }
+
+    let is_nullable = args.arg_fields[0].is_nullable();
+
+    Ok(Arc::new(Field::new(self.name(), DataType::UInt32, is_nullable)))
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;In this example, we take advantage of the fact that we already have 
support for extension
+types that evaluate metadata. If you wanted to check for metadata other than
+extension type support, you could instead write a snippet such 
as:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;    if 
&amp;amp;DataType::FixedSizeBinary(16) == input_field.data_type() {
+        let _ = input_field
+            .metadata()
+            .get("ARROW:extension:metadata")
+            .ok_or(exec_datafusion_err!("Input field must contain the UUID 
canonical extension type"))?;
+    }
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;If you are writing a user defined function that will instead return 
metadata on output
+you can add this directly into the &lt;code&gt;Field&lt;/code&gt; that is the 
output of the &lt;code&gt;return_field_from_args&lt;/code&gt;
+call. In our above example, we could change the return line to:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;    Ok(Arc::new(
+        Field::new(self.name(), DataType::UInt32, is_nullable).with_metadata(
+            [("my_key".to_string(), "my_value".to_string())]
+                .into_iter()
+                .collect(),
+        ),
+    ))
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;By checking the metadata during the planning process, we can identify 
errors early in
+the query process. There are cases where we wish to have access to this 
metadata during
+execution as well. The function &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#tymethod.invoke_with_args"&gt;invoke_with_args&lt;/a&gt;
 in the user defined function takes
+the updated struct &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarFunctionArgs.html"&gt;ScalarFunctionArgs&lt;/a&gt;.
 This now contains the input fields, which can
+be used to check for metadata. For example, you can do the following:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;fn 
invoke_with_args(&amp;amp;self, args: ScalarFunctionArgs) -&amp;gt; 
Result&amp;lt;ColumnarValue&amp;gt; {
+    assert_eq!(args.arg_fields.len(), 1);
+    let my_value = args.arg_fields[0]
+        .metadata()
+        .get("encoding_type");
+    ...
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;In this snippet we have extracted an 
&lt;code&gt;Option&amp;lt;&amp;amp;String&amp;gt;&lt;/code&gt; from the input field 
metadata
+which we can then use to determine which functions we might want to call. We 
could
+then parse the returned value to determine what type of encoding to use when
+evaluating the array in the arguments. Since 
&lt;code&gt;return_field_from_args&lt;/code&gt; is not &lt;code&gt;&amp;amp;mut 
self&lt;/code&gt;
+this check could not be performed during the planning stage.&lt;/p&gt;
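+&lt;p&gt;As a minimal sketch of that parsing step, the value read from the 
metadata can be mapped onto an enum before choosing a decoding routine. The 
&lt;code&gt;Encoding&lt;/code&gt; enum, the &lt;code&gt;parse_encoding&lt;/code&gt; 
helper, and the accepted strings are hypothetical and are not part of 
DataFusion:&lt;/p&gt;

```rust
// Hypothetical encodings; the variants and the accepted metadata strings
// are assumptions made for this sketch.
#[derive(Debug, PartialEq)]
enum Encoding {
    Utf8,
    Latin1,
    Unknown,
}

// Map the raw metadata value (for example the string stored under the
// "encoding_type" key) onto an encoding. Takes an owned String so the
// value cloned out of the metadata map can be passed in directly.
fn parse_encoding(raw: String) -> Encoding {
    match raw.to_ascii_lowercase().as_str() {
        "utf8" | "utf-8" => Encoding::Utf8,
        "latin1" => Encoding::Latin1,
        // Anything unrecognized is reported rather than guessed at.
        _ => Encoding::Unknown,
    }
}
```

+&lt;p&gt;Inside &lt;code&gt;invoke_with_args&lt;/code&gt;, the optional 
metadata value could be cloned and passed to this helper, falling back to 
&lt;code&gt;Encoding::Unknown&lt;/code&gt; when the key is absent.&lt;/p&gt;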
+&lt;p&gt;The description in this section applies to scalar user defined 
functions, but equivalent
+support exists for aggregate and window functions.&lt;/p&gt;
+&lt;h2 id="extension-types"&gt;Extension types&lt;a class="headerlink" 
href="#extension-types" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;Extension types are one of the primary motivations for this 
enhancement in
+&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The official Rust implementation of Apache Arrow, &lt;a 
href="https://github.com/apache/arrow-rs"&gt;arrow-rs&lt;/a&gt;,
+already contains support for the &lt;a 
href="https://arrow.apache.org/docs/format/CanonicalExtensions.html"&gt;canonical
 extension types&lt;/a&gt;. This support includes
+helper functions such as 
&lt;code&gt;try_canonical_extension_type()&lt;/code&gt; in the earlier 
example.&lt;/p&gt;
+&lt;p&gt;For a concrete example of how extension types can be used in 
DataFusion functions,
+there is an &lt;a 
href="https://github.com/timsaucer/datafusion_extension_type_examples"&gt;example
 repository&lt;/a&gt; that demonstrates using UUIDs. The UUID extension
+type specifies that the data are stored as a Fixed Size Binary of length 16. 
In the
+DataFusion core functions, we have the ability to generate string 
representations of
+UUIDs that match the version 4 specification. These are helpful, but a user may
+wish to do additional work with UUIDs where having them in the dense 
representation
+is preferable. Alternatively, the user may already have data with the binary 
encoding
+and we want to extract values such as the version, timestamp, or string
+representation.&lt;/p&gt;
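+&lt;p&gt;To illustrate why the dense representation is convenient, here is a 
rough sketch of reading the version directly from the 16 raw bytes. The 
&lt;code&gt;uuid_version&lt;/code&gt; helper is hypothetical and is not taken 
from the example repository; it only assumes the standard UUID layout (RFC 
9562), in which the version is the high four bits of byte 6:&lt;/p&gt;

```rust
// Hypothetical helper, not the example repository's implementation.
// Reads the version from the 16 raw bytes of a UUID, i.e. one value of a
// FixedSizeBinary(16) column.
fn uuid_version(bytes: [u8; 16]) -> u8 {
    // Standard UUID layout: the version is stored in the high 4 bits
    // of byte 6 (zero-indexed).
    bytes[6] >> 4
}
```

+&lt;p&gt;For the UUID 67e55044-10b1-426f-9247-bb680e5fe0c8, byte 6 is 0x42, 
so this helper reports version 4 without any string parsing.&lt;/p&gt;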
+&lt;p&gt;In the example repository we have created three user defined 
functions: &lt;code&gt;UuidVersion&lt;/code&gt;,
+&lt;code&gt;StringToUuid&lt;/code&gt;, and 
&lt;code&gt;UuidToString&lt;/code&gt;. Each of these implements 
&lt;code&gt;ScalarUDFImpl&lt;/code&gt; and can
+be used as follows:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;#[tokio::main]
+async fn main() -&amp;gt; 
Result&amp;lt;()&amp;gt; {
+    let ctx = create_context()?;
+
+    // get a DataFrame from the context
+    let mut df = ctx.table("t").await?;
+
+    // Create the string UUIDs
+    df = df.select(vec![uuid().alias("string_uuid")])?;
+
+    // Convert string UUIDs to canonical extension UUIDs
+    let string_to_uuid = ScalarUDF::new_from_impl(StringToUuid::default());
+    df = df.with_column("uuid", 
string_to_uuid.call(vec![col("string_uuid")]))?;
+
+    // Extract version number from canonical extension UUIDs
+    let version = ScalarUDF::new_from_impl(UuidVersion::default());
+    df = df.with_column("version", version.call(vec![col("uuid")]))?;
+
+    // Convert back to a string
+    let uuid_to_string = ScalarUDF::new_from_impl(UuidToString::default());
+    df = df.with_column("string_round_trip", 
uuid_to_string.call(vec![col("uuid")]))?;
+
+    df.show().await?;
+
+    Ok(())
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;The &lt;a 
href="https://github.com/timsaucer/datafusion_extension_type_examples"&gt;example
 repository&lt;/a&gt; also contains a crate that demonstrates how to expose 
these
+UDFs to &lt;a 
href="https://datafusion.apache.org/python/"&gt;datafusion-python&lt;/a&gt;. 
This requires version 48.0.0 or later.&lt;/p&gt;
+&lt;h2 id="other-use-cases"&gt;Other use cases&lt;a class="headerlink" 
href="#other-use-cases" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The metadata attached to the fields can be used to store 
&lt;em&gt;any&lt;/em&gt; user data in key/value
+pairs. Some of the other use cases that have been identified include:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Creating output for downstream systems. One user of DataFusion 
produces
+  &lt;a href="https://rerun.io/blog/column-chunks"&gt;data 
visualizations&lt;/a&gt; that are dependent upon metadata in record batch 
fields. By
+  enabling metadata on output of user defined functions, we can now produce 
batches
+  that are directly consumable by these systems.&lt;/li&gt;
+&lt;li&gt;Describing the relationships between columns of data. You can store 
data about how
+  one column of data relates to another and use these during function 
evaluation. For
+  example, in robotics it is common to use &lt;a 
href="https://wiki.ros.org/tf2"&gt;transforms&lt;/a&gt; to describe how to 
convert
+  from one coordinate system to another. It can be convenient to send the 
function
+  all the columns that contain transform information and then allow the 
function
+  to determine which columns to use based on the metadata. This allows for
+  encapsulation of the transform logic within the user function.&lt;/li&gt;
+&lt;li&gt;Storing logical types of the data model. &lt;a 
href="https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout/"&gt;InfluxDB&lt;/a&gt;
 uses field metadata to specify
+  which columns are used for tags, times, and fields.&lt;/li&gt;
+&lt;/ul&gt;
+&lt;p&gt;Based on the experience of the authors, we recommend caution when 
using metadata
+for use cases other than type extension. One issue that can arise is that as 
columns
+are used to compute new fields, some functions may pass through the metadata 
and the
+semantic meaning may change. For example, suppose you decided to use metadata 
to
+store some kind of statistics for the entire stream of record batches. Then 
you pass
+that column through a filter that removes many rows of data. Your statistics
+metadata may now be invalid, even though it was passed through the 
filter.&lt;/p&gt;
+&lt;p&gt;Similarly, if you use metadata to form relations between one column 
and another and
+the columns are renamed at some point in your workflow, then the metadata
+may refer to the wrong column. This can be mitigated by
+not relying on column naming but rather adding additional metadata to all 
columns of
+interest.&lt;/p&gt;
+&lt;h2 id="acknowledgements"&gt;Acknowledgements&lt;a class="headerlink" 
href="#acknowledgements" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;We would like to thank &lt;a 
href="https://rerun.io"&gt;Rerun.io&lt;/a&gt; for sponsoring the development of 
this work. &lt;a href="https://rerun.io"&gt;Rerun.io&lt;/a&gt;
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.&lt;/p&gt;
+&lt;h2 id="conclusion"&gt;Conclusion&lt;a class="headerlink" 
href="#conclusion" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The enhanced metadata handling in &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt; is a 
significant step
+forward in the ability to handle more interesting types of data. Users can
+validate that the input data matches the intent of the data to be processed, enable
+complex operations on binary data because we understand the encoding used, and 
+use metadata to create new and interesting user defined data types.
+We can't wait to see what you build with it!&lt;/p&gt;
+&lt;h2 id="get-involved"&gt;Get Involved&lt;a class="headerlink" 
href="#get-involved" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;The DataFusion team is an active and engaging community and we would 
love to have you join
+us and help the project.&lt;/p&gt;
+&lt;p&gt;Here are some ways to get involved:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;Learn more by visiting the &lt;a 
href="https://datafusion.apache.org/index.html"&gt;DataFusion&lt;/a&gt; project 
page.&lt;/li&gt;
+&lt;li&gt;Try out the project and provide feedback, file issues, and 
contribute code.&lt;/li&gt;
+&lt;li&gt;Work on a &lt;a 
href="https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22"&gt;good
 first issue&lt;/a&gt;.&lt;/li&gt;
+&lt;li&gt;Reach out to us via the &lt;a 
href="https://datafusion.apache.org/contributor-guide/communication.html"&gt;communication
 doc&lt;/a&gt;.&lt;/li&gt;
+&lt;/ul&gt;</content><category term="blog"></category></entry></feed>
\ No newline at end of file
diff --git 
a/output/feeds/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.rss.xml
 
b/output/feeds/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.rss.xml
new file mode 100644
index 0000000..1762954
--- /dev/null
+++ 
b/output/feeds/tim-saucerrerunio-dewey-dunningtonwherobots-andrew-lambinfluxdata.rss.xml
@@ -0,0 +1,24 @@
+<?xml version="1.0" encoding="utf-8"?>
+<rss version="2.0"><channel><title>Apache DataFusion Blog - Tim 
Saucer(rerun.io), Dewey Dunnington(Wherobots), Andrew 
Lamb(InfluxData)</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Sun,
 21 Sep 2025 00:00:00 +0000</lastBuildDate><item><title>Implementing User 
Defined Types and Custom Metadata in 
DataFusion</title><link>https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+
+&lt;p&gt;&lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;Apache
 DataFusion&lt;/a&gt; significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.&lt;/p&gt;
+&lt;h2 id="user-defined-types-extension-types"&gt;User defined types == 
extension types&lt;a class="headerlink" 
href="#user-defined-types-extension-types" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
+&lt;p&gt;DataFusion directly uses &lt;a 
href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt;'s &lt;a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html"&gt;DataTypes&lt;/a&gt;
 as its type system. This
+has …&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">Tim Saucer(rerun.io), Dewey 
Dunnington(Wherobots), Andrew Lamb(InfluxData)</dc:creator><pubDate>Sun, 21 Sep 
2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-09-21:/blog/2025/09/21/custom-types-using-metadata</guid><category>blog</category></item></channel></rss>
\ No newline at end of file
diff --git a/output/images/metadata-handling/arrow_record_batch.png 
b/output/images/metadata-handling/arrow_record_batch.png
new file mode 100644
index 0000000..d925b32
Binary files /dev/null and 
b/output/images/metadata-handling/arrow_record_batch.png differ
diff --git a/output/index.html b/output/index.html
index 40fc132..a83b228 100644
--- a/output/index.html
+++ b/output/index.html
@@ -45,6 +45,47 @@
             <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
 
 
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/09/21/custom-types-using-metadata">Implementing User Defined 
Types and Custom Metadata in DataFusion</a></h1>
+                        <p>Posted on: Sun 21 September 2025 by Tim 
Saucer(rerun.io), Dewey Dunnington(Wherobots), Andrew Lamb(InfluxData)</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<p><a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">Apache 
DataFusion</a> significantly improves support for user
+defined types and metadata. The user defined function APIs let users access
+metadata on the input columns to functions and produce metadata in the 
output.</p>
+<h2 id="user-defined-types-extension-types">User defined types == extension 
types<a class="headerlink" href="#user-defined-types-extension-types" 
title="Permanent link">¶</a></h2>
+<p>DataFusion directly uses <a href="https://arrow.apache.org">Apache 
Arrow</a>'s <a 
href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html">DataTypes</a>
 as its type system. This
+has …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/09/21/custom-types-using-metadata" class="button 
medium">Continue Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
     <!-- Post -->
     <div class="row">
         <div class="callout">

