This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch asf-staging in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push: new c762288 Commit build products c762288 is described below commit c762288786e15f64cb155398e6a89359d259ee66 Author: Build Pelican (action) <priv...@infra.apache.org> AuthorDate: Sun Jun 8 20:49:01 2025 +0000 Commit build products --- blog/2025/05/06/datafusion-comet-0.8.0/index.html | 2 +- blog/2025/06/09/metadata-handling/index.html | 125 ++++++++++++++++++++++ blog/author/pmc.html | 2 +- blog/author/tim-saucer.html | 107 ++++++++++++++++++ blog/category/blog.html | 40 ++++++- blog/feed.xml | 23 +++- blog/feeds/all-en.atom.xml | 89 ++++++++++++++- blog/feeds/blog.atom.xml | 89 ++++++++++++++- blog/feeds/pmc.atom.xml | 4 +- blog/feeds/pmc.rss.xml | 2 +- blog/feeds/tim-saucer.atom.xml | 85 +++++++++++++++ blog/feeds/tim-saucer.rss.xml | 21 ++++ blog/index.html | 40 ++++++- 13 files changed, 614 insertions(+), 15 deletions(-) diff --git a/blog/2025/05/06/datafusion-comet-0.8.0/index.html b/blog/2025/05/06/datafusion-comet-0.8.0/index.html index adf2bc8..12c6841 100644 --- a/blog/2025/05/06/datafusion-comet-0.8.0/index.html +++ b/blog/2025/05/06/datafusion-comet-0.8.0/index.html @@ -64,7 +64,7 @@ limitations under the License. <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development work and is the result of merging 81 PRs from 11 +<p>This release covers approximately six weeks of development work and is the result of merging 81 PRs from 11 contributors. 
See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change log</a> for more information.</p> <h2>Release Highlights</h2> <h3>Performance & Stability</h3> diff --git a/blog/2025/06/09/metadata-handling/index.html b/blog/2025/06/09/metadata-handling/index.html new file mode 100644 index 0000000..4e3bd54 --- /dev/null +++ b/blog/2025/06/09/metadata-handling/index.html @@ -0,0 +1,125 @@ +<!doctype html> +<html class="no-js" lang="en" dir="ltr"> + <head> + <meta charset="utf-8"> + <meta http-equiv="x-ua-compatible" content="ie=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1.0"> + <title>Metadata handling in user defined functions - Apache DataFusion Blog</title> +<link href="/blog/css/bootstrap.min.css" rel="stylesheet"> +<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet"> +<link href="/blog/css/headerlink.css" rel="stylesheet"> +<link href="/blog/highlight/default.min.css" rel="stylesheet"> +<script src="/blog/highlight/highlight.js"></script> +<script>hljs.highlightAll();</script> </head> + <body class="d-flex flex-column h-100"> + <main class="flex-shrink-0"> +<!-- nav bar --> +<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth navbar example"> + <div class="container-fluid"> + <a class="navbar-brand" href="/blog"><img src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache DataFusion Blog</a> + <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" aria-label="Toggle navigation"> + <span class="navbar-toggler-icon"></span> + </button> + + <div class="collapse navbar-collapse" id="navbarADP"> + <ul class="navbar-nav me-auto mb-2 mb-lg-0"> + <li class="nav-item"> + <a class="nav-link" href="/blog/about.html">About</a> + </li> + <li class="nav-item"> + <a class="nav-link" href="/blog/feed.xml">RSS</a> + </li> + </ul> + </div> + </div> +</nav> + + +<!-- page 
contents --> +<div id="contents"> + <div class="bg-white p-5 rounded"> + <div class="col-sm-8 mx-auto"> + <h1> + Metadata handling in user defined functions + </h1> + <p>Posted on: Mon 09 June 2025 by Tim Saucer</p> + <!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide variety of other use cases.</p> +<p>TODO: UPDATE LINKS</p> +<h1>Why metadata handling is important</h1> +<p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. Each +<a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> in this <code>Schema</code> contains a name, data type, nullability, and metadata. The +metadata is specified as a map of key-value pairs of strings. 
In the new +implementation, during processing of all user defined functions we pass the input +field information.</p> +<p>It is often desirable to write a generic function for reuse. With the prior version of +user defined functions, we only had access to the <code>DataType</code> of the input columns. This +works well for some features that only rely on the types of data. Other use cases may +need additional information that describes the data.</p> +<p>For example, suppose I write a function that computes the force of gravity on an object +based on its mass. The general equation is <code>F = m * g</code> where <code>g = 9.8 m/s^2</code>. Suppose +our documentation for the function specifies the output will be in Newtons. This is only +valid if the input unit is in kilograms. With our metadata enhancement, we could update +this function to now evaluate the input units, perform any kind of required +transformation, and give consistent output every time. We could also have the function +return an error if an invalid input was given, such as providing an input where the +metadata says the units are in <code>meters</code> instead of a unit of mass.</p> +<p>One common application of metadata handling is understanding encoding of a blob of data. +Suppose you have a column that contains image data. You could use metadata to specify +the encoding of the image data so you could use the appropriate decoder.</p> +<h1>How to use metadata in user defined functions</h1> +<p>Using input metadata occurs in two different phases of a user defined function. During both +the planning phase and execution, we have access to this field information. This allows +the user to determine the appropriate output fields during planning and to validate the +input. For other use cases, it may only be necessary to access these fields during execution. 
+We leave this open to the user.</p> +<p>For all types of user defined functions we now evaluate the output <a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> as well. You can +specify this to create your own metadata from your functions or to pass through metadata from +one or more of your inputs.</p> +<p>In addition to metadata the input field information carries nullability. With these you can +create more expressive nullability of your output data instead of having a single output. +For example, you could write a function to convert a string to uppercase. If we know the +input field is non-nullable, then we can set the output field to non-nullable as well.</p> +<h1>Extension types</h1> +<p>TODO</p> +<h1>Working with literals</h1> +<p>TODO</p> +<h1>Thanks to our sponsor</h1> +<p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a> +is building a data visualization system for Physical AI and uses metadata to specify +context about columns in Arrow record batches.</p> +<h1>Conclusion</h1> +<p>TODO</p> + </div> + </div> + </div> + <!-- footer --> + <div class="row"> + <div class="large-12 medium-12 columns"> + <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. + </p> + </div> + </div> + <script src="/blog/js/bootstrap.bundle.min.js"></script> </main> + </body> +</html> diff --git a/blog/author/pmc.html b/blog/author/pmc.html index e8c80cb..33ebb44 100644 --- a/blog/author/pmc.html +++ b/blog/author/pmc.html @@ -76,7 +76,7 @@ limitations under the License. 
<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></p> +<p>This release covers approximately six weeks of development …</p></p> <footer> <ul class="actions"> <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue Reading</a></div> diff --git a/blog/author/tim-saucer.html b/blog/author/tim-saucer.html new file mode 100644 index 0000000..2d88c10 --- /dev/null +++ b/blog/author/tim-saucer.html @@ -0,0 +1,107 @@ + <!doctype html> + <html class="no-js" lang="en" dir="ltr"> + <head> + <meta charset="utf-8"> + <meta http-equiv="x-ua-compatible" content="ie=edge"> + <meta name="viewport" content="width=device-width, initial-scale=1.0"> + <title>Apache DataFusion Blog</title> +<link href="/blog/css/bootstrap.min.css" rel="stylesheet"> +<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet"> +<link href="/blog/css/headerlink.css" rel="stylesheet"> +<link href="/blog/highlight/default.min.css" rel="stylesheet"> +<script src="/blog/highlight/highlight.js"></script> +<script>hljs.highlightAll();</script> <link href="/blog/css/blog_index.css" rel="stylesheet"> + </head> + <body class="d-flex flex-column h-100"> + <main class="flex-shrink-0"> + <div> + +<!-- nav bar --> +<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth navbar example"> + <div class="container-fluid"> + <a class="navbar-brand" href="/blog"><img src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache DataFusion Blog</a> + <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarADP" aria-controls="navbarADP" 
aria-expanded="false" aria-label="Toggle navigation"> + <span class="navbar-toggler-icon"></span> + </button> + + <div class="collapse navbar-collapse" id="navbarADP"> + <ul class="navbar-nav me-auto mb-2 mb-lg-0"> + <li class="nav-item"> + <a class="nav-link" href="/blog/about.html">About</a> + </li> + <li class="nav-item"> + <a class="nav-link" href="/blog/feed.xml">RSS</a> + </li> + </ul> + </div> + </div> +</nav> + <div id="contents"> + <div class="bg-white p-5 rounded"> + <div class="col-sm-8 mx-auto"> +<div id="contents"> + <div class="bg-white p-5 rounded"> + <div class="col-sm-8 mx-auto"> + + <h3>Welcome to the Apache DataFusion Blog!</h3> + <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + + + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/06/09/metadata-handling">Metadata handling in user defined functions</a></h1> + <p>Posted on: Mon 09 June 2025 by Tim Saucer</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/06/09/metadata-handling" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> + + </div> + </div> +</div> </div> + </div> + </div> + + <!-- footer --> + <div class="row"> + <div class="large-12 medium-12 columns"> + <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. + </p> + </div> + </div> + <script src="/blog/js/bootstrap.bundle.min.js"></script> </div> + </main> + </body> + </html> diff --git a/blog/category/blog.html b/blog/category/blog.html index e52d46d..da62b01 100644 --- a/blog/category/blog.html +++ b/blog/category/blog.html @@ -47,6 +47,44 @@ <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/06/09/metadata-handling">Metadata handling in user defined functions</a></h1> + <p>Posted on: Mon 09 June 2025 by Tim Saucer</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. 
See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/06/09/metadata-handling" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> <!-- Post --> <div class="row"> <div class="callout"> @@ -76,7 +114,7 @@ limitations under the License. 
<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></p> +<p>This release covers approximately six weeks of development …</p></p> <footer> <ul class="actions"> <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue Reading</a></div> diff --git a/blog/feed.xml b/blog/feed.xml index 339530b..4e17682 100644 --- a/blog/feed.xml +++ b/blog/feed.xml @@ -1,5 +1,24 @@ <?xml version="1.0" encoding="utf-8"?> -<rss version="2.0"><channel><title>Apache DataFusion Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue, 06 May 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 0.8.0 Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</link><description><!-- +<rss version="2.0"><channel><title>Apache DataFusion Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon, 09 Jun 2025 00:00:00 +0000</lastBuildDate><item><title>Metadata handling in user defined functions</title><link>https://datafusion.apache.org/blog/2025/06/09/metadata-handling</link><description><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. 
You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tim Saucer</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.8.0 Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</link><descri [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -20,7 +39,7 @@ limitations under the License. 
<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>User defined Window Functions in DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window- [...] +<p>This release covers approximately six weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>User defined Window Functions in DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window- [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml index 22ed6b8..d7693d2 100644 --- a/blog/feeds/all-en.atom.xml +++ b/blog/feeds/all-en.atom.xml @@ -1,5 +1,88 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" rel [...] +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-09T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Metadata handling in user defined functions</title><link href="https://datafusion.apache.org/blog/2025/06/09/metadata-handling" re [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide variety of other use cases.</p> +<p>TODO: UPDATE LINKS</p> +<h1>Why metadata handling is important</h1> +<p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. 
Each +<a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> in this <code>Schema</code> contains a name, data type, nullability, and metadata. The +metadata is specified as a map of key-value pairs of strings. In the new +implementation, during processing of all user defined functions we pass the input +field information.</p> +<p>It is often desirable to write a generic function for reuse. With the prior version of +user defined functions, we only had access to the <code>DataType</code> of the input columns. This +works well for some features that only rely on the types of data. Other use cases may +need additional information that describes the data.</p> +<p>For example, suppose I write a function that computes the force of gravity on an object +based on its mass. The general equation is <code>F = m * g</code> where <code>g = 9.8 m/s^2</code>. Suppose +our documentation for the function specifies the output will be in Newtons. This is only +valid if the input unit is in kilograms. With our metadata enhancement, we could update +this function to now evaluate the input units, perform any kind of required +transformation, and give consistent output every time. We could also have the function +return an error if an invalid input was given, such as providing an input where the +metadata says the units are in <code>meters</code> instead of a unit of mass.</p> +<p>One common application of metadata handling is understanding encoding of a blob of data. +Suppose you have a column that contains image data. You could use metadata to specify +the encoding of the image data so you could use the appropriate decoder.</p> +<h1>How to use metadata in user defined functions</h1> +<p>Using input metadata occurs in two different phases of a user defined function. During both +the planning phase and execution, we have access to this field information. 
This allows +the user to determine the appropriate output fields during planning and to validate the +input. For other use cases, it may only be necessary to access these fields during execution. +We leave this open to the user.</p> +<p>For all types of user defined functions we now evaluate the output <a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> as well. You can +specify this to create your own metadata from your functions or to pass through metadata from +one or more of your inputs.</p> +<p>In addition to metadata the input field information carries nullability. With these you can +create more expressive nullability of your output data instead of having a single output. +For example, you could write a function to convert a string to uppercase. If we know the +input field is non-nullable, then we can set the output field to non-nullable as well.</p> +<h1>Extension types</h1> +<p>TODO</p> +<h1>Working with literals</h1> +<p>TODO</p> +<h1>Thanks to our sponsor</h1> +<p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work. <a href="https://rerun.io">Rerun.io</a> +is building a data visualization system for Physical AI and uses metadata to specify +context about columns in Arrow record batches.</p> +<h1>Conclusion</h1> +<p>TODO</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</id><summary type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -20,7 +103,7 @@ limitations under the License. 
<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></summary><content type="html"><!-- +<p>This release covers approximately six weeks of development …</p></summary><content type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -41,7 +124,7 @@ limitations under the License. <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development work and is the result of merging 81 PRs from 11 +<p>This release covers approximately six weeks of development work and is the result of merging 81 PRs from 11 contributors. 
See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change log</a> for more information.</p> <h2>Release Highlights</h2> <h3>Performance &amp; Stability</h3> diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml index 612af25..596face 100644 --- a/blog/feeds/blog.atom.xml +++ b/blog/feeds/blog.atom.xml @@ -1,5 +1,88 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0 [...] +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-09T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Metadata handling in user defined functions</title><link href="https://datafusion.apache.org/blog/2025/06/09/metadata-handlin [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. 
You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. 
This +enables processing of extension types as well as a wide variety of other use cases.</p> +<p>TODO: UPDATE LINKS</p> +<h1>Why metadata handling is important</h1> +<p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. Each +<a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> in this <code>Schema</code> contains a name, data type, nullability, and metadata. The +metadata is specified as a map of key-value pairs of strings. In the new +implementation, during processing of all user defined functions we pass the input +field information.</p> +<p>It is often desirable to write a generic function for reuse. With the prior version of +user defined functions, we only had access to the <code>DataType</code> of the input columns. This +works well for some features that only rely on the types of data. Other use cases may +need additional information that describes the data.</p> +<p>For example, suppose I write a function that computes the force of gravity on an object +based on its mass. The general equation is <code>F = m * g</code> where <code>g = 9.8 m/s^2</code>. Suppose +our documentation for the function specifies the output will be in Newtons. This is only +valid if the input unit is in kilograms. With our metadata enhancement, we could update +this function to now evaluate the input units, perform any kind of required +transformation, and give consistent output every time. We could also have the function +return an error if an invalid input was given, such as providing an input where the +metadata says the units are in <code>meters</code> instead of a unit of mass.</p> +<p>One common application of metadata handling is understanding encoding of a blob of data. +Suppose you have a column that contains image data.
You could use metadata to specify +the encoding of the image data so you could use the appropriate decoder.</p> +<h1>How to use metadata in user defined functions</h1> +<p>Using input metadata occurs in two different phases of a user defined function. Both during +the planning phase and execution, we have access to this field information. This allows +the user to determine the appropriate output fields during planning and to validate the +input. For other use cases, it may only be necessary to access these fields during execution. +We leave this open to the user.</p> +<p>For all types of user defined functions we now evaluate the output <a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> as well. You can +specify this to create your own metadata from your functions or to pass through metadata from +one or more of your inputs.</p> +<p>In addition to metadata the input field information carries nullability. With these you can +create more expressive nullability of your output data instead of having a single output. +For example, you could write a function to convert a string to uppercase. If we know the +input field is non-nullable, then we can set the output field to non-nullable as well.</p> +<h1>Extension types</h1> +<p>TODO</p> +<h1>Working with literals</h1> +<p>TODO</p> +<h1>Thanks to our sponsor</h1> +<p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work.
<a href="https://rerun.io">Rerun.io</a> +is building a data visualization system for Physical AI and uses metadata to specify +context about columns in Arrow record batches.</p> +<h1>Conclusion</h1> +<p>TODO</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</id><summary type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -20,7 +103,7 @@ limitations under the License. <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></summary><content type="html"><!-- +<p>This release covers approximately six weeks of development …</p></summary><content type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -41,7 +124,7 @@ limitations under the License. 
<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development work and is the result of merging 81 PRs from 11 +<p>This release covers approximately six weeks of development work and is the result of merging 81 PRs from 11 contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change log</a> for more information.</p> <h2>Release Highlights</h2> <h3>Performance &amp; Stability</h3> diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml index 3a42c96..c338c99 100644 --- a/blog/feeds/pmc.atom.xml +++ b/blog/feeds/pmc.atom.xml @@ -20,7 +20,7 @@ limitations under the License. <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></summary><content type="html"><!-- +<p>This release covers approximately six weeks of development …</p></summary><content type="html"><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -41,7 +41,7 @@ limitations under the License. 
<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development work and is the result of merging 81 PRs from 11 +<p>This release covers approximately six weeks of development work and is the result of merging 81 PRs from 11 contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change log</a> for more information.</p> <h2>Release Highlights</h2> <h3>Performance &amp; Stability</h3> diff --git a/blog/feeds/pmc.rss.xml b/blog/feeds/pmc.rss.xml index ea12593..b7a4fe1 100644 --- a/blog/feeds/pmc.rss.xml +++ b/blog/feeds/pmc.rss.xml @@ -20,7 +20,7 @@ limitations under the License. <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.7.0 Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</li [...] 
+<p>This release covers approximately six weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.7.0 Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</li [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with diff --git a/blog/feeds/tim-saucer.atom.xml b/blog/feeds/tim-saucer.atom.xml new file mode 100644 index 0000000..36c6201 --- /dev/null +++ b/blog/feeds/tim-saucer.atom.xml @@ -0,0 +1,85 @@ +<?xml version="1.0" encoding="utf-8"?> +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - Tim Saucer</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/tim-saucer.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-09T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Metadata handling in user defined functions</title><link href="https://datafusion.apache.org/blog/2025/06/09/meta [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. 
You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. 
This +enables processing of extension types as well as a wide variety of other use cases.</p> +<p>TODO: UPDATE LINKS</p> +<h1>Why metadata handling is important</h1> +<p>Data in Arrow record batches carry a <code>Schema</code> in addition to the Arrow arrays. Each +<a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> in this <code>Schema</code> contains a name, data type, nullability, and metadata. The +metadata is specified as a map of key-value pairs of strings. In the new +implementation, during processing of all user defined functions we pass the input +field information.</p> +<p>It is often desirable to write a generic function for reuse. With the prior version of +user defined functions, we only had access to the <code>DataType</code> of the input columns. This +works well for some features that only rely on the types of data. Other use cases may +need additional information that describes the data.</p> +<p>For example, suppose I write a function that computes the force of gravity on an object +based on its mass. The general equation is <code>F = m * g</code> where <code>g = 9.8 m/s^2</code>. Suppose +our documentation for the function specifies the output will be in Newtons. This is only +valid if the input unit is in kilograms. With our metadata enhancement, we could update +this function to now evaluate the input units, perform any kind of required +transformation, and give consistent output every time. We could also have the function +return an error if an invalid input was given, such as providing an input where the +metadata says the units are in <code>meters</code> instead of a unit of mass.</p> +<p>One common application of metadata handling is understanding encoding of a blob of data. +Suppose you have a column that contains image data.
You could use metadata to specify +the encoding of the image data so you could use the appropriate decoder.</p> +<h1>How to use metadata in user defined functions</h1> +<p>Using input metadata occurs in two different phases of a user defined function. Both during +the planning phase and execution, we have access to this field information. This allows +the user to determine the appropriate output fields during planning and to validate the +input. For other use cases, it may only be necessary to access these fields during execution. +We leave this open to the user.</p> +<p>For all types of user defined functions we now evaluate the output <a href="https://arrow.apache.org/docs/format/Glossary.html#term-field">Field</a> as well. You can +specify this to create your own metadata from your functions or to pass through metadata from +one or more of your inputs.</p> +<p>In addition to metadata the input field information carries nullability. With these you can +create more expressive nullability of your output data instead of having a single output. +For example, you could write a function to convert a string to uppercase. If we know the +input field is non-nullable, then we can set the output field to non-nullable as well.</p> +<h1>Extension types</h1> +<p>TODO</p> +<h1>Working with literals</h1> +<p>TODO</p> +<h1>Thanks to our sponsor</h1> +<p>We would like to thank <a href="https://rerun.io">Rerun.io</a> for sponsoring the development of this work.
<a href="https://rerun.io">Rerun.io</a> +is building a data visualization system for Physical AI and uses metadata to specify +context about columns in Arrow record batches.</p> +<h1>Conclusion</h1> +<p>TODO</p></content><category term="blog"></category></entry></feed> \ No newline at end of file diff --git a/blog/feeds/tim-saucer.rss.xml b/blog/feeds/tim-saucer.rss.xml new file mode 100644 index 0000000..a5d3197 --- /dev/null +++ b/blog/feeds/tim-saucer.rss.xml @@ -0,0 +1,21 @@ +<?xml version="1.0" encoding="utf-8"?> +<rss version="2.0"><channel><title>Apache DataFusion Blog - Tim Saucer</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon, 09 Jun 2025 00:00:00 +0000</lastBuildDate><item><title>Metadata handling in user defined functions</title><link>https://datafusion.apache.org/blog/2025/06/09/metadata-handling</link><description><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. 
Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tim Saucer</dc:creator><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-06-09:/blog/2025/06/09/metadata-handling</guid><category>blog</category></item></channel></rss> \ No newline at end of file diff --git a/blog/index.html b/blog/index.html index a85fc5d..59a67a7 100644 --- a/blog/index.html +++ b/blog/index.html @@ -44,6 +44,44 @@ <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/06/09/metadata-handling">Metadata handling in user defined functions</a></h1> + <p>Posted on: Mon 09 June 2025 by Tim Saucer</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %}x +--> +<p><a href="https://github.com/apache/datafusion/tree/48.0.0-rc3">DataFusion 48.0.0</a> introduced a change in the interface for writing custom functions +which enables a variety of interesting improvements. Now users can access additional +data about the input columns to functions, such as their nullability and metadata. This +enables processing of extension types as well as a wide …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/06/09/metadata-handling" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> <!-- Post --> <div class="row"> <div class="callout"> @@ -73,7 +111,7 @@ limitations under the License. <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately SIX weeks of development …</p></p> +<p>This release covers approximately six weeks of development …</p></p> <footer> <ul class="actions"> <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue Reading</a></div>