This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git


The following commit(s) were added to refs/heads/asf-staging by this push:
     new c762288  Commit build products
c762288 is described below

commit c762288786e15f64cb155398e6a89359d259ee66
Author: Build Pelican (action) <priv...@infra.apache.org>
AuthorDate: Sun Jun 8 20:49:01 2025 +0000

    Commit build products
---
 blog/2025/05/06/datafusion-comet-0.8.0/index.html |   2 +-
 blog/2025/06/09/metadata-handling/index.html      | 125 ++++++++++++++++++++++
 blog/author/pmc.html                              |   2 +-
 blog/author/tim-saucer.html                       | 107 ++++++++++++++++++
 blog/category/blog.html                           |  40 ++++++-
 blog/feed.xml                                     |  23 +++-
 blog/feeds/all-en.atom.xml                        |  89 ++++++++++++++-
 blog/feeds/blog.atom.xml                          |  89 ++++++++++++++-
 blog/feeds/pmc.atom.xml                           |   4 +-
 blog/feeds/pmc.rss.xml                            |   2 +-
 blog/feeds/tim-saucer.atom.xml                    |  85 +++++++++++++++
 blog/feeds/tim-saucer.rss.xml                     |  21 ++++
 blog/index.html                                   |  40 ++++++-
 13 files changed, 614 insertions(+), 15 deletions(-)

diff --git a/blog/2025/05/06/datafusion-comet-0.8.0/index.html 
b/blog/2025/05/06/datafusion-comet-0.8.0/index.html
index adf2bc8..12c6841 100644
--- a/blog/2025/05/06/datafusion-comet-0.8.0/index.html
+++ b/blog/2025/05/06/datafusion-comet-0.8.0/index.html
@@ -64,7 +64,7 @@ limitations under the License.
 <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a 
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
 <p>Comet is an accelerator for Apache Spark that translates Spark physical 
plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code changes.</p>
-<p>This release covers approximately SIX weeks of development work and is the 
result of merging 81 PRs from 11
+<p>This release covers approximately six weeks of development work and is the 
result of merging 81 PRs from 11
 contributors. See the <a 
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change
 log</a> for more information.</p>
 <h2>Release Highlights</h2>
 <h3>Performance &amp; Stability</h3>
diff --git a/blog/2025/06/09/metadata-handling/index.html 
b/blog/2025/06/09/metadata-handling/index.html
new file mode 100644
index 0000000..4e3bd54
--- /dev/null
+++ b/blog/2025/06/09/metadata-handling/index.html
@@ -0,0 +1,125 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="x-ua-compatible" content="ie=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Metadata handling in user defined functions - Apache DataFusion 
Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script>  </head>
+  <body class="d-flex flex-column h-100">
+  <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth 
navbar example">
+    <div class="container-fluid">
+        <a class="navbar-brand" href="/blog"><img 
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache 
DataFusion Blog</a>
+        <button class="navbar-toggler" type="button" data-bs-toggle="collapse" 
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" 
aria-label="Toggle navigation">
+            <span class="navbar-toggler-icon"></span>
+        </button>
+
+        <div class="collapse navbar-collapse" id="navbarADP">
+            <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/about.html">About</a>
+                </li>
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/feed.xml">RSS</a>
+                </li>
+            </ul>
+        </div>
+    </div>
+</nav>    
+
+
+<!-- page contents -->
+<div id="contents">
+    <div class="bg-white p-5 rounded">
+        <div class="col-sm-8 mx-auto">
+          <h1>
+              Metadata handling in user defined functions
+          </h1>
+              <p>Posted on: Mon 09 June 2025 by Tim Saucer</p>
+              <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+-->
+<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 
48.0.0</a> introduced a change in the interface for writing custom functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide variety of other use 
cases.</p>
+<p>TODO: UPDATE LINKS</p>
+<h1>Why metadata handling is important</h1>
+<p>Arrow record batches carry a <code>Schema</code> in addition to the 
Arrow arrays. Each
+<a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> 
in this <code>Schema</code> contains a name, data type, nullability, and 
metadata. The
+metadata is specified as a map of key-value pairs of strings. In the new
+implementation, we pass the input field information to every user defined 
function
+during processing.</p>
+<p>It is often desirable to write a generic function for reuse. With the prior 
version of
+user defined functions, we only had access to the <code>DataType</code> of the 
input columns. This
+works well for some features that only rely on the types of data. Other use 
cases may
+need additional information that describes the data.</p>
+<p>For example, suppose I write a function that computes the force of gravity 
on an object
+based on its mass. The general equation is <code>F = m * g</code> where 
<code>g = 9.8 m/s^2</code>. Suppose
+our documentation for the function specifies the output will be in Newtons. 
This is only
+valid if the input unit is in kilograms. With our metadata enhancement, we 
could update
+this function to now evaluate the input units, perform any kind of required
+transformation, and give consistent output every time. We could also have the 
function
+return an error if an invalid input was given, such as providing an input 
where the
+metadata says the units are in <code>meters</code> instead of a unit of 
mass.</p>
+<p>One common application of metadata handling is understanding encoding of a 
blob of data.
+Suppose you have a column that contains image data. You could use metadata to 
specify
+the encoding of the image data so you could use the appropriate decoder.</p>
+<h1>How to use metadata in user defined functions</h1>
+<p>Input metadata is available in two phases of a user defined function: 
planning and
+execution. In both phases we have access to this field information. 
This allows
+the user to determine the appropriate output fields during planning and to 
validate the
+input. For other use cases, it may only be necessary to access these fields 
during execution.
+We leave this open to the user.</p>
+<p>For all types of user defined functions we now evaluate the output <a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> 
as well. You can
+use this to attach your own metadata to a function's output or to pass 
through metadata from
+one or more of your inputs.</p>
+<p>In addition to metadata, the input field information carries nullability. 
With this you can
+derive the nullability of your output from the inputs instead of declaring a 
single fixed value.
+For example, you could write a function to convert a string to uppercase. If 
we know the
+input field is non-nullable, then we can set the output field to non-nullable 
as well.</p>
+<h1>Extension types</h1>
+<p>TODO</p>
+<h1>Working with literals</h1>
+<p>TODO</p>
+<h1>Thanks to our sponsor</h1>
+<p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for 
sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a>
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.</p>
+<h1>Conclusion</h1>
+<p>TODO</p>
+        </div>
+      </div>
+    </div>    
+    <!-- footer -->
+    <div class="row">
+      <div class="large-12 medium-12 columns">
+        <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+          Copyright 2025, <a href="https://www.apache.org/">The Apache 
Software Foundation</a>, Licensed under the <a 
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 
2.0</a>.<br/>
+          Apache&reg; and the Apache feather logo are trademarks of The Apache 
Software Foundation.
+        </p>
+      </div>
+    </div>
+    <script src="/blog/js/bootstrap.bundle.min.js"></script>  </main>
+  </body>
+</html>
diff --git a/blog/author/pmc.html b/blog/author/pmc.html
index e8c80cb..33ebb44 100644
--- a/blog/author/pmc.html
+++ b/blog/author/pmc.html
@@ -76,7 +76,7 @@ limitations under the License.
 <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a 
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
 <p>Comet is an accelerator for Apache Spark that translates Spark physical 
plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code changes.</p>
-<p>This release covers approximately SIX weeks of development …</p></p>
+<p>This release covers approximately six weeks of development …</p></p>
                         <footer>
                             <ul class="actions">
                                 <div style="text-align: right"><a 
href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue 
Reading</a></div>
diff --git a/blog/author/tim-saucer.html b/blog/author/tim-saucer.html
new file mode 100644
index 0000000..2d88c10
--- /dev/null
+++ b/blog/author/tim-saucer.html
@@ -0,0 +1,107 @@
+    <!doctype html>
+    <html class="no-js" lang="en" dir="ltr">
+    <head>
+        <meta charset="utf-8">
+        <meta http-equiv="x-ua-compatible" content="ie=edge">
+        <meta name="viewport" content="width=device-width, initial-scale=1.0">
+        <title>Apache DataFusion Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script>        <link 
href="/blog/css/blog_index.css" rel="stylesheet">
+    </head>
+    <body class="d-flex flex-column h-100">
+    <main class="flex-shrink-0">
+        <div>
+
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth 
navbar example">
+    <div class="container-fluid">
+        <a class="navbar-brand" href="/blog"><img 
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache 
DataFusion Blog</a>
+        <button class="navbar-toggler" type="button" data-bs-toggle="collapse" 
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" 
aria-label="Toggle navigation">
+            <span class="navbar-toggler-icon"></span>
+        </button>
+
+        <div class="collapse navbar-collapse" id="navbarADP">
+            <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/about.html">About</a>
+                </li>
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/feed.xml">RSS</a>
+                </li>
+            </ul>
+        </div>
+    </div>
+</nav>
+            <div id="contents">
+                <div class="bg-white p-5 rounded">
+                    <div class="col-sm-8 mx-auto">
+<div id="contents">
+    <div class="bg-white p-5 rounded">
+        <div class="col-sm-8 mx-auto">
+
+            <h3>Welcome to the Apache DataFusion Blog!</h3>
+            <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
+
+
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/06/09/metadata-handling">Metadata handling in user defined 
functions</a></h1>
+                        <p>Posted on: Mon 09 June 2025 by Tim Saucer</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+-->
+<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 
48.0.0</a> introduced a change in the interface for writing custom functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/06/09/metadata-handling" class="button medium">Continue 
Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
+
+        </div>
+    </div>
+</div>                    </div>
+                </div>
+            </div>
+
+    <!-- footer -->
+    <div class="row">
+      <div class="large-12 medium-12 columns">
+        <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+          Copyright 2025, <a href="https://www.apache.org/">The Apache 
Software Foundation</a>, Licensed under the <a 
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 
2.0</a>.<br/>
+          Apache&reg; and the Apache feather logo are trademarks of The Apache 
Software Foundation.
+        </p>
+      </div>
+    </div>
+    <script src="/blog/js/bootstrap.bundle.min.js"></script>        </div>
+    </main>
+    </body>
+    </html>
diff --git a/blog/category/blog.html b/blog/category/blog.html
index e52d46d..da62b01 100644
--- a/blog/category/blog.html
+++ b/blog/category/blog.html
@@ -47,6 +47,44 @@
             <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
 
 
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/06/09/metadata-handling">Metadata handling in user defined 
functions</a></h1>
+                        <p>Posted on: Mon 09 June 2025 by Tim Saucer</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+-->
+<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 
48.0.0</a> introduced a change in the interface for writing custom functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/06/09/metadata-handling" class="button medium">Continue 
Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
     <!-- Post -->
     <div class="row">
         <div class="callout">
@@ -76,7 +114,7 @@ limitations under the License.
 <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a 
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
 <p>Comet is an accelerator for Apache Spark that translates Spark physical 
plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code changes.</p>
-<p>This release covers approximately SIX weeks of development …</p></p>
+<p>This release covers approximately six weeks of development …</p></p>
                         <footer>
                             <ul class="actions">
                                 <div style="text-align: right"><a 
href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue 
Reading</a></div>
diff --git a/blog/feed.xml b/blog/feed.xml
index 339530b..4e17682 100644
--- a/blog/feed.xml
+++ b/blog/feed.xml
@@ -1,5 +1,24 @@
 <?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue,
 06 May 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 
0.8.0 
Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</link><description>&lt;!--
+<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
 09 Jun 2025 00:00:00 +0000</lastBuildDate><item><title>Metadata handling in 
user defined 
functions</title><link>https://datafusion.apache.org/blog/2025/06/09/metadata-handling</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">Tim 
Saucer</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</category></item><item><title>Apache
 DataFusion Comet 0.8.0 
Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</link><descri
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
@@ -20,7 +39,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 
May 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>User
 defined Window Functions in 
DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window-
 [...]
+&lt;p&gt;This release covers approximately six weeks of development 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 
May 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>User
 defined Window Functions in 
DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window-
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index 22ed6b8..d7693d2 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -1,5 +1,88 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion Comet 0.8.0 Release</title><link 
href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" rel 
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-09T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Metadata
 handling in user defined functions</title><link 
href="https://datafusion.apache.org/blog/2025/06/09/metadata-handling" re [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide variety of other use 
cases.&lt;/p&gt;
+&lt;p&gt;TODO: UPDATE LINKS&lt;/p&gt;
+&lt;h1&gt;Why metadata handling is important&lt;/h1&gt;
+&lt;p&gt;Arrow record batches carry a &lt;code&gt;Schema&lt;/code&gt; 
in addition to the Arrow arrays. Each
+&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 in this &lt;code&gt;Schema&lt;/code&gt; contains a name, data type, 
nullability, and metadata. The
+metadata is specified as a map of key-value pairs of strings. In the new
+implementation, we pass the input field information to every user defined 
function
+during processing.&lt;/p&gt;
+&lt;p&gt;It is often desirable to write a generic function for reuse. With the 
prior version of
+user defined functions, we only had access to the 
&lt;code&gt;DataType&lt;/code&gt; of the input columns. This
+works well for some features that only rely on the types of data. Other use 
cases may
+need additional information that describes the data.&lt;/p&gt;
+&lt;p&gt;For example, suppose I write a function that computes the force of 
gravity on an object
+based on its mass. The general equation is &lt;code&gt;F = m * g&lt;/code&gt; 
where &lt;code&gt;g = 9.8 m/s^2&lt;/code&gt;. Suppose
+our documentation for the function specifies the output will be in Newtons. 
This is only
+valid if the input unit is in kilograms. With our metadata enhancement, we 
could update
+this function to now evaluate the input units, perform any kind of required
+transformation, and give consistent output every time. We could also have the 
function
+return an error if an invalid input was given, such as providing an input 
where the
+metadata says the units are in &lt;code&gt;meters&lt;/code&gt; instead of a 
unit of mass.&lt;/p&gt;
+&lt;p&gt;One common application of metadata handling is understanding encoding 
of a blob of data.
+Suppose you have a column that contains image data. You could use metadata to 
specify
+the encoding of the image data so you could use the appropriate 
decoder.&lt;/p&gt;
+&lt;h1&gt;How to use metadata in user defined functions&lt;/h1&gt;
+&lt;p&gt;Input metadata is available in two phases of a user defined 
function: planning and
+execution. In both phases we have access to this field information. 
This allows
+the user to determine the appropriate output fields during planning and to 
validate the
+input. For other use cases, it may only be necessary to access these fields 
during execution.
+We leave this open to the user.&lt;/p&gt;
+&lt;p&gt;For all types of user defined functions we now evaluate the output 
&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 as well. You can
+use this to attach your own metadata to a function's output or to pass 
through metadata from
+one or more of your inputs.&lt;/p&gt;
+&lt;p&gt;In addition to metadata, the input field information carries 
nullability. With this you can
+derive the nullability of your output from the inputs instead of declaring a 
single fixed value.
+For example, you could write a function to convert a string to uppercase. If 
we know the
+input field is non-nullable, then we can set the output field to non-nullable 
as well.&lt;/p&gt;
+&lt;h1&gt;Extension types&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;
+&lt;h1&gt;Working with literals&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;
+&lt;h1&gt;Thanks to our sponsor&lt;/h1&gt;
+&lt;p&gt;We would like to thank &lt;a 
href="https://rerun.io"&gt;Rerun.io&lt;/a&gt; for sponsoring the development of 
this work. &lt;a href="https://rerun.io"&gt;Rerun.io&lt;/a&gt;
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.&lt;/p&gt;
+&lt;h1&gt;Conclusion&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;</content><category 
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 
Release</title><link 
href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" 
rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</id><summary
 type="html">&lt;!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
@@ -20,7 +103,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+&lt;p&gt;This release covers approximately six weeks of development 
…&lt;/p&gt;</summary><content type="html">&lt;!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
@@ -41,7 +124,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development work and 
is the result of merging 81 PRs from 11
+&lt;p&gt;This release covers approximately six weeks of development work and 
is the result of merging 81 PRs from 11
 contributors. See the &lt;a 
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md"&gt;change
 log&lt;/a&gt; for more information.&lt;/p&gt;
 &lt;h2&gt;Release Highlights&lt;/h2&gt;
 &lt;h3&gt;Performance &amp;amp; Stability&lt;/h3&gt;
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index 612af25..596face 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -1,5 +1,88 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion Comet 0.8.0 Release</title><link 
href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0 [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-09T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Metadata
 handling in user defined functions</title><link 
href="https://datafusion.apache.org/blog/2025/06/09/metadata-handlin [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide variety of other use 
cases.&lt;/p&gt;
+&lt;p&gt;TODO: UPDATE LINKS&lt;/p&gt;
+&lt;h1&gt;Why metadata handling is important&lt;/h1&gt;
+&lt;p&gt;Arrow record batches carry a &lt;code&gt;Schema&lt;/code&gt; 
in addition to the Arrow arrays. Each
+&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 in this &lt;code&gt;Schema&lt;/code&gt; contains a name, data type, 
nullability, and metadata. The
+metadata is specified as a map of key-value pairs of strings.  In the new
+implementation, during processing of all user defined functions we pass the 
input
+field information.&lt;/p&gt;
+&lt;p&gt;It is often desirable to write a generic function for reuse. With the 
prior version of
+user defined functions, we only had access to the 
&lt;code&gt;DataType&lt;/code&gt; of the input columns. This
+works well for some features that only rely on the types of data. Other use 
cases may
+need additional information that describes the data.&lt;/p&gt;
+&lt;p&gt;For example, suppose I write a function that computes the force of 
gravity on an object
+based on its mass. The general equation is &lt;code&gt;F = m * g&lt;/code&gt; 
where &lt;code&gt;g = 9.8 m/s²&lt;/code&gt;. Suppose
+our documentation for the function specifies the output will be in Newtons. 
This is only
+valid if the input unit is in kilograms. With our metadata enhancement, we 
could update
+this function to now evaluate the input units, perform any kind of required
+transformation, and give consistent output every time. We could also have the 
function
+return an error if an invalid input was given, such as providing an input 
where the
+metadata says the units are in &lt;code&gt;meters&lt;/code&gt; instead of a 
unit of mass.&lt;/p&gt;
+&lt;p&gt;One common application of metadata handling is understanding the encoding 
of a blob of data.
+Suppose you have a column that contains image data. You could use metadata to 
specify
+the encoding of the image data so you could use the appropriate 
decoder.&lt;/p&gt;
+&lt;h1&gt;How to use metadata in user defined functions&lt;/h1&gt;
+&lt;p&gt;Input metadata is available in two different phases of a user defined 
function. During both
+planning and execution, we have access to this field information. 
This allows
+the user to determine the appropriate output fields during planning and to 
validate the
+input. For other use cases, it may only be necessary to access these fields 
during execution.
+We leave this open to the user.&lt;/p&gt;
+&lt;p&gt;For all types of user defined functions we now evaluate the output 
&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 as well. You can
+specify this to create your own metadata from your functions or to pass 
through metadata from
+one or more of your inputs.&lt;/p&gt;
+&lt;p&gt;In addition to metadata, the input field information carries 
nullability. With this information you can
+set the nullability of your output more precisely instead of always reporting the 
same nullability.
+For example, you could write a function to convert a string to uppercase. If 
we know the
+input field is non-nullable, then we can set the output field to non-nullable 
as well.&lt;/p&gt;
+&lt;h1&gt;Extension types&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;
+&lt;h1&gt;Working with literals&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;
+&lt;h1&gt;Thanks to our sponsor&lt;/h1&gt;
+&lt;p&gt;We would like to thank &lt;a 
href="https://rerun.io"&gt;Rerun.io&lt;/a&gt; for sponsoring the development of 
this work. &lt;a href="https://rerun.io"&gt;Rerun.io&lt;/a&gt;
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.&lt;/p&gt;
+&lt;h1&gt;Conclusion&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;</content><category 
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 
Release</title><link 
href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" 
rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</id><summary
 type="html">&lt;!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
@@ -20,7 +103,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+&lt;p&gt;This release covers approximately six weeks of development 
…&lt;/p&gt;</summary><content type="html">&lt;!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
@@ -41,7 +124,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development work and 
is the result of merging 81 PRs from 11
+&lt;p&gt;This release covers approximately six weeks of development work and 
is the result of merging 81 PRs from 11
 contributors. See the &lt;a 
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md"&gt;change
 log&lt;/a&gt; for more information.&lt;/p&gt;
 &lt;h2&gt;Release Highlights&lt;/h2&gt;
 &lt;h3&gt;Performance &amp;amp; Stability&lt;/h3&gt;
diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml
index 3a42c96..c338c99 100644
--- a/blog/feeds/pmc.atom.xml
+++ b/blog/feeds/pmc.atom.xml
@@ -20,7 +20,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+&lt;p&gt;This release covers approximately six weeks of development 
…&lt;/p&gt;</summary><content type="html">&lt;!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
@@ -41,7 +41,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development work and 
is the result of merging 81 PRs from 11
+&lt;p&gt;This release covers approximately six weeks of development work and 
is the result of merging 81 PRs from 11
 contributors. See the &lt;a 
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md"&gt;change
 log&lt;/a&gt; for more information.&lt;/p&gt;
 &lt;h2&gt;Release Highlights&lt;/h2&gt;
 &lt;h3&gt;Performance &amp;amp; Stability&lt;/h3&gt;
diff --git a/blog/feeds/pmc.rss.xml b/blog/feeds/pmc.rss.xml
index ea12593..b7a4fe1 100644
--- a/blog/feeds/pmc.rss.xml
+++ b/blog/feeds/pmc.rss.xml
@@ -20,7 +20,7 @@ limitations under the License.
 &lt;p&gt;The Apache DataFusion PMC is pleased to announce version 0.8.0 of the 
&lt;a href="https://datafusion.apache.org/comet/"&gt;Comet&lt;/a&gt; 
subproject.&lt;/p&gt;
 &lt;p&gt;Comet is an accelerator for Apache Spark that translates Spark 
physical plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code 
changes.&lt;/p&gt;
-&lt;p&gt;This release covers approximately SIX weeks of development 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 
May 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache
 DataFusion Comet 0.7.0 
Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</li
 [...]
+&lt;p&gt;This release covers approximately six weeks of development 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 
May 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache
 DataFusion Comet 0.7.0 
Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</li
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/blog/feeds/tim-saucer.atom.xml b/blog/feeds/tim-saucer.atom.xml
new file mode 100644
index 0000000..36c6201
--- /dev/null
+++ b/blog/feeds/tim-saucer.atom.xml
@@ -0,0 +1,85 @@
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - Tim 
Saucer</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/tim-saucer.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-09T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Metadata
 handling in user defined functions</title><link 
href="https://datafusion.apache.org/blog/2025/06/09/meta [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide variety of other use 
cases.&lt;/p&gt;
+&lt;p&gt;TODO: UPDATE LINKS&lt;/p&gt;
+&lt;h1&gt;Why metadata handling is important&lt;/h1&gt;
+&lt;p&gt;Arrow record batches carry a &lt;code&gt;Schema&lt;/code&gt; 
in addition to the Arrow arrays. Each
+&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 in this &lt;code&gt;Schema&lt;/code&gt; contains a name, data type, 
nullability, and metadata. The
+metadata is specified as a map of key-value pairs of strings.  In the new
+implementation, during processing of all user defined functions we pass the 
input
+field information.&lt;/p&gt;
+&lt;p&gt;It is often desirable to write a generic function for reuse. With the 
prior version of
+user defined functions, we only had access to the 
&lt;code&gt;DataType&lt;/code&gt; of the input columns. This
+works well for some features that only rely on the types of data. Other use 
cases may
+need additional information that describes the data.&lt;/p&gt;
+&lt;p&gt;For example, suppose I write a function that computes the force of 
gravity on an object
+based on its mass. The general equation is &lt;code&gt;F = m * g&lt;/code&gt; 
where &lt;code&gt;g = 9.8 m/s²&lt;/code&gt;. Suppose
+our documentation for the function specifies the output will be in Newtons. 
This is only
+valid if the input unit is in kilograms. With our metadata enhancement, we 
could update
+this function to now evaluate the input units, perform any kind of required
+transformation, and give consistent output every time. We could also have the 
function
+return an error if an invalid input was given, such as providing an input 
where the
+metadata says the units are in &lt;code&gt;meters&lt;/code&gt; instead of a 
unit of mass.&lt;/p&gt;
+&lt;p&gt;One common application of metadata handling is understanding the encoding 
of a blob of data.
+Suppose you have a column that contains image data. You could use metadata to 
specify
+the encoding of the image data so you could use the appropriate 
decoder.&lt;/p&gt;
+&lt;h1&gt;How to use metadata in user defined functions&lt;/h1&gt;
+&lt;p&gt;Input metadata is available in two different phases of a user defined 
function. During both
+planning and execution, we have access to this field information. 
This allows
+the user to determine the appropriate output fields during planning and to 
validate the
+input. For other use cases, it may only be necessary to access these fields 
during execution.
+We leave this open to the user.&lt;/p&gt;
+&lt;p&gt;For all types of user defined functions we now evaluate the output 
&lt;a 
href="https://arrow.apache.org/docs/format/Glossary.html#term-field"&gt;Field&lt;/a&gt;
 as well. You can
+specify this to create your own metadata from your functions or to pass 
through metadata from
+one or more of your inputs.&lt;/p&gt;
+&lt;p&gt;In addition to metadata, the input field information carries 
nullability. With this information you can
+set the nullability of your output more precisely instead of always reporting the 
same nullability.
+For example, you could write a function to convert a string to uppercase. If 
we know the
+input field is non-nullable, then we can set the output field to non-nullable 
as well.&lt;/p&gt;
+&lt;h1&gt;Extension types&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;
+&lt;h1&gt;Working with literals&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;
+&lt;h1&gt;Thanks to our sponsor&lt;/h1&gt;
+&lt;p&gt;We would like to thank &lt;a 
href="https://rerun.io"&gt;Rerun.io&lt;/a&gt; for sponsoring the development of 
this work. &lt;a href="https://rerun.io"&gt;Rerun.io&lt;/a&gt;
+is building a data visualization system for Physical AI and uses metadata to 
specify 
+context about columns in Arrow record batches.&lt;/p&gt;
+&lt;h1&gt;Conclusion&lt;/h1&gt;
+&lt;p&gt;TODO&lt;/p&gt;</content><category 
term="blog"></category></entry></feed>
\ No newline at end of file
diff --git a/blog/feeds/tim-saucer.rss.xml b/blog/feeds/tim-saucer.rss.xml
new file mode 100644
index 0000000..a5d3197
--- /dev/null
+++ b/blog/feeds/tim-saucer.rss.xml
@@ -0,0 +1,21 @@
+<?xml version="1.0" encoding="utf-8"?>
+<rss version="2.0"><channel><title>Apache DataFusion Blog - Tim 
Saucer</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
 09 Jun 2025 00:00:00 +0000</lastBuildDate><item><title>Metadata handling in 
user defined 
functions</title><link>https://datafusion.apache.org/blog/2025/06/09/metadata-handling</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+--&gt;
+&lt;p&gt;&lt;a 
href="https://github.com/apache/datafusion/tree/48.0.0-rc3"&gt;DataFusion 
48.0.0&lt;/a&gt; introduced a change in the interface for writing custom 
functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">Tim 
Saucer</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</category></item></channel></rss>
\ No newline at end of file
diff --git a/blog/index.html b/blog/index.html
index a85fc5d..59a67a7 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -44,6 +44,44 @@
             <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
 
 
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/06/09/metadata-handling">Metadata handling in user defined 
functions</a></h1>
+                        <p>Posted on: Mon 09 June 2025 by Tim Saucer</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}x
+-->
+<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 
48.0.0</a> introduced a change in the interface for writing custom functions
+which enables a variety of interesting improvements. Now users can access 
additional
+data about the input columns to functions, such as their nullability and 
metadata. This
+enables processing of extension types as well as a wide …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/06/09/metadata-handling" class="button medium">Continue 
Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
     <!-- Post -->
     <div class="row">
         <div class="callout">
@@ -73,7 +111,7 @@ limitations under the License.
 <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a 
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
 <p>Comet is an accelerator for Apache Spark that translates Spark physical 
plans to DataFusion physical plans for
 improved performance and efficiency without requiring any code changes.</p>
-<p>This release covers approximately SIX weeks of development …</p></p>
+<p>This release covers approximately six weeks of development …</p></p>
                         <footer>
                             <ul class="actions">
                                 <div style="text-align: right"><a 
href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue 
Reading</a></div>

