This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new c282056 Commit build products
c282056 is described below
commit c2820565021b0cfe8b71886e873e41898b780e20
Author: Build Pelican (action) <[email protected]>
AuthorDate: Tue Feb 25 11:01:31 2025 +0000
Commit build products
---
output/2025/02/20/datafusion-45.0.0/index.html | 226 +++++++++++++++++++++
output/author/pmc.html | 42 ++++
output/category/blog.html | 42 ++++
output/feed.xml | 25 ++-
output/feeds/all-en.atom.xml | 190 ++++++++++++++++-
output/feeds/blog.atom.xml | 190 ++++++++++++++++-
output/feeds/pmc.atom.xml | 190 ++++++++++++++++-
output/feeds/pmc.rss.xml | 25 ++-
.../datafusion-45.0.0/performance_over_time.png | Bin 0 -> 52136 bytes
output/index.html | 42 ++++
10 files changed, 967 insertions(+), 5 deletions(-)
diff --git a/output/2025/02/20/datafusion-45.0.0/index.html
b/output/2025/02/20/datafusion-45.0.0/index.html
new file mode 100644
index 0000000..cdf1377
--- /dev/null
+++ b/output/2025/02/20/datafusion-45.0.0/index.html
@@ -0,0 +1,226 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="x-ua-compatible" content="ie=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Apache DataFusion 45.0.0 Released - Apache DataFusion Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script> </head>
+ <body class="d-flex flex-column h-100">
+ <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth
navbar example">
+ <div class="container-fluid">
+ <a class="navbar-brand" href="/blog"><img
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache
DataFusion Blog</a>
+ <button class="navbar-toggler" type="button" data-bs-toggle="collapse"
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false"
aria-label="Toggle navigation">
+ <span class="navbar-toggler-icon"></span>
+ </button>
+
+ <div class="collapse navbar-collapse" id="navbarADP">
+ <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/about.html">About</a>
+ </li>
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/feed.xml">RSS</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+</nav>
+
+
+<!-- page contents -->
+<div id="contents">
+ <div class="bg-white p-5 rounded">
+ <div class="col-sm-8 mx-auto">
+ <h1>
+ Apache DataFusion 45.0.0 Released
+ </h1>
+ <p>Posted on: Thu 20 February 2025 by pmc</p>
+ <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details -->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion 45.0.0</a>. This
blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development for DataFusion!</p>
+<p><a href="https://datafusion.apache.org/">Apache DataFusion</a> is an
extensible query engine, written in <a
href="https://www.rust-lang.org/">Rust</a>, that
+uses <a href="https://arrow.apache.org">Apache Arrow</a> as its in-memory
format. DataFusion is used by developers to
+create new, fast data-centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While <a
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals">DataFusion’s
primary design
+goal</a> is to accelerate the creation of other data-centric systems, it also provides a
+reasonable out-of-the-box experience as a <a
href="https://datafusion.apache.org/user-guide/dataframe.html">dataframe
library</a>,
+<a href="https://datafusion.apache.org/python/">Python library</a> and <a
href="https://datafusion.apache.org/user-guide/cli/">command line SQL
tool</a>.</p>
+<p>DataFusion's core thesis is that as a community, together we can build much
more
+advanced technology than any of us as individuals or companies could do alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.</p>
+<h2>Community Growth 📈</h2>
+<p>In the last 6 months, between <code>40.0.0</code> and <code>45.0.0</code>,
our community has continued to
+grow in new and exciting ways.</p>
+<ol>
+<li>We added several PMC members and new committers: <a
href="https://github.com/jayzhan211">@jayzhan211</a> and <a
href="https://github.com/jonahgao">@jonahgao</a> joined the PMC,
+ <a href="https://github.com/2010YOUY01">@2010YOUY01</a>, <a
href="https://github.com/rachelint">@rachelint</a>, <a
href="https://github.com/findepi/">@findepi</a>, <a
href="https://github.com/iffyio">@iffyio</a>, <a
href="https://github.com/goldmedal">@goldmedal</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/Michael-J-Ward">@Michael-J-Ward</a> and <a
href="https://github.com/korowa">@korowa</a>
+ joined as committers. See the <a
href="https://lists.apache.org/[email protected]">mailing
list</a> for more details.</li>
+<li>In the <a href="https://github.com/apache/arrow-datafusion">core
DataFusion repo</a> alone we reviewed and accepted almost 1600 PRs from 206
different
+ contributors, created over 1100 issues and closed 751 of them 🚀. All changes
are listed in the detailed
+ <a
href="https://github.com/apache/datafusion/tree/main/dev/changelog">changelogs</a>.</li>
+<li>DataFusion focused meetups happened in multiple cities around the world:
<a
href="https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273">Hangzhou</a>,
<a href="https://github.com/apache/datafusion/discussions/11431">Belgrade</a>,
<a href="https://github.com/apache/datafusion/discussions/11213">New York</a>,
+ <a
href="https://github.com/apache/datafusion/discussions/10348">Seattle</a>, <a
href="https://github.com/apache/datafusion/discussions/12894">Chicago</a>, <a
href="https://github.com/apache/datafusion/discussions/13165">Boston</a> and <a
href="https://github.com/apache/datafusion/discussions/12988">Amsterdam</a> as
well as a Rust NYC meetup focused on DataFusion.</li>
+</ol>
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+ 1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+ 206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+<p>DataFusion has applied to participate in <a
href="https://summerofcode.withgoogle.com/">Google Summer of Code</a> with a
+<a href="https://github.com/apache/datafusion/issues/14478">number of
ideas</a> for projects with mentors already selected. Additionally, <a
href="https://github.com/apache/datafusion/issues/14373">some ideas</a> on
+how to make DataFusion an ideal choice for university database projects
such as the
+<a href="https://15445.courses.cs.cmu.edu/spring2025/">CMU database
classes</a> have been put forward.</p>
+<p>In addition, DataFusion has been appearing publicly more and more, both
online and offline. Here are some highlights:</p>
+<ol>
+<li>A <a href="https://uwheel.rs/post/datafusion_uwheel/">demonstration of how
uwheel</a> is integrated into DataFusion</li>
+<li>Integrating StringView into DataFusion - <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/">part
1</a> and <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/">part
2</a></li>
+<li><a href="https://techontherocks.show/3">Building streams</a> with
DataFusion</li>
+<li><a href="https://blog.haoxp.xyz/posts/caching-datafusion">Caching in
DataFusion</a>: Don't read twice</li>
+<li><a href="https://blog.haoxp.xyz/posts/parquet-to-arrow/">Parquet pruning
in DataFusion</a>: Read no more than you need</li>
+<li>DataFusion was named one of CRN's <a
href="https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3">10
coolest open source software tools of 2024</a></li>
+<li><a href="https://www.denormalized.io/blog/building-databases">Building
databases over a weekend</a></li>
+</ol>
+<h2>Improved Performance 🚀</h2>
+<p>DataFusion hit a milestone in its development by becoming <a
href="https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/">the
fastest single node engine</a>
+for querying Apache Parquet files in the <a
href="https://benchmark.clickhouse.com/">ClickBench</a> benchmark with the
43.0.0 release. A <a
href="https://github.com/apache/datafusion/issues/12821">lot
+of work</a> went into making this happen! While other engines have
subsequently gotten faster,
+displacing DataFusion from the top spot, DataFusion remains near the top,
and we <a href="https://github.com/apache/datafusion/issues/14586">are planning
+more improvements</a>.</p>
+<p><img alt="ClickBench performance results over time for DataFusion"
class="img-responsive"
src="/blog/images/datafusion-45.0.0/performance_over_time.png"
width="100%"/></p>
+<p><strong>Figure 1</strong>: ClickBench performance improved over 33% between
DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). </p>
+<p>The work of <a
href="https://github.com/apache/datafusion/issues/10918">integrating</a> the
new <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html">Arrow
StringView</a>, which significantly improves performance
+for workloads that scan, filter, and group by variable-length string and binary
data, was completed
+and enabled by default in the past 6 months. The improvement is especially
pronounced for Parquet
+files due to <a href="https://github.com/apache/arrow-rs/issues/5530">upstream
work in the parquet reader</a>. Kudos to <a
href="https://github.com/XiangpengHong">@XiangpengHong</a>, <a
href="https://github.com/AriesDevil">@AriesDevil</a>,
+<a href="https://github.com/PsiACE">@PsiACE</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/a10y">@a10y</a>, and <a
href="https://github.com/RinChanNOWWW">@RinChanNOWWW</a> for driving this
project.</p>
+<h2>Improved Quality 📋</h2>
+<p>DataFusion's overall quality continues to improve. In addition to ongoing
bug
+fixes, one of the most exciting improvements in the last 6 months was the
addition of the
+<a href="https://github.com/apache/datafusion/pull/13936">SQLite sqllogictest
suite</a> thanks to <a href="https://github.com/Omega359">@Omega359</a>. These
tests run over 5 million
+SQL statements on every push to the main branch.</p>
+<p>Support for <a
href="https://github.com/apache/datafusion/pull/13651">explicitly checking
logical plan invariants</a> was added by <a
href="https://github.com/wiedld">@wiedld</a>. These checks
+can help catch implicit changes that might cause problems during upgrades.</p>
+<p>We have also started other quality initiatives, such as making it <a
href="https://github.com/apache/datafusion/issues/13525">easier to use
DataFusion</a>
+based on <a href="https://glaredb.com/">GlareDB</a>'s experience and adding
more <a href="https://github.com/apache/datafusion/issues/13661">extensive
prerelease testing</a>.</p>
+<h2>Improved Documentation 📚</h2>
+<p>We continue to improve the documentation to make it easier to get started
using DataFusion.
+During the last 6 months, two projects were initiated to migrate the function
documentation
+away from strictly static Markdown files. First, <a
href="https://github.com/apache/datafusion/pull/12668">@Omega359</a> added support for
generating function
+documentation from code, and <a
href="https://github.com/jonathanc-n">@jonathanc-n</a> and others helped with
the migration.
+Then <a href="https://github.com/comphead">@comphead</a> led a project to <a
href="https://github.com/apache/datafusion/pull/12822">create a doc macro</a>
that makes it even easier to write
+function documentation. A special thanks to <a
href="https://github.com/Chen-Yuan-Lai">@Chen-Yuan-Lai</a> for migrating many
functions to
+the new syntax.</p>
+<p>Additionally, the <a
href="https://github.com/apache/datafusion/pull/13877">examples</a> were <a
href="https://github.com/apache/datafusion/pull/13905">refactored</a> and <a
href="https://github.com/apache/datafusion/pull/13950">cleaned up</a> to
improve their usefulness.</p>
+<h2>New Features ✨</h2>
+<p>There are too many new features in the last 6 months to list them all, but
here
+are some highlights:</p>
+<h3>Functions</h3>
+<ul>
+<li>Uniform Window Functions: <code>BuiltInWindowFunctions</code> was removed
and all window functions are now implemented as UDFs (<a href="https://github.com/jcsherin">@jcsherin</a>)</li>
+<li>Uniform Aggregate Functions: <code>BuiltInAggregateFunctions</code> was
removed and all aggregate functions are now implemented as UDFs</li>
+<li>As mentioned above, function documentation was moved out of the static Markdown
files</li>
+<li>New functions and SQL support were added, including '<a
href="https://github.com/apache/datafusion/pull/13799">show functions</a>', '<a
href="https://github.com/apache/datafusion/pull/11347">to_local_time</a>',
+ '<a
href="https://github.com/apache/datafusion/pull/12970">regexp_count</a>', '<a
href="https://github.com/apache/datafusion/pull/11969">map_extract</a>', '<a
href="https://github.com/apache/datafusion/pull/12211">array_distance</a>', '<a
href="https://github.com/apache/datafusion/pull/12329">array_any_value</a>',
'<a href="https://github.com/apache/datafusion/pull/12474">greatest</a>',
+ '<a href="https://github.com/apache/datafusion/pull/13786">least</a>', '<a
href="https://github.com/apache/datafusion/pull/14217">arrays_overlap</a>'</li>
+</ul>
+<h3>FFI</h3>
+<ul>
+<li>Foreign Function Interface work has started. This should allow for
+ <a href="https://github.com/apache/datafusion/pull/12920">using table
providers</a> across languages and versions of DataFusion. This
+ is especially pertinent for integration with <a
href="https://delta-io.github.io/delta-rs/">delta-rs</a> and other table
formats.</li>
+</ul>
+<h3>Materialized Views</h3>
+<ul>
+<li><a href="https://github.com/suremarc">@suremarc</a> has added a <a
href="https://github.com/datafusion-contrib/datafusion-materialized-views">materialized
view implementation</a> in datafusion-contrib 🚀</li>
+</ul>
+<h3>Substrait</h3>
+<ul>
+<li>A lot of work went into improving and enhancing Substrait support (<a
href="https://github.com/Blizzara">@Blizzara</a>, <a
href="https://github.com/westonpace">@westonpace</a>, <a
href="https://github.com/tokoko">@tokoko</a>, <a
href="https://github.com/vbarua">@vbarua</a>, <a
href="https://github.com/LatrecheYasser">@LatrecheYasser</a>, <a
href="https://github.com/notfilippo">@notfilippo</a> and others)</li>
+</ul>
+<h2>Looking Ahead: The Next Six Months 🔭</h2>
+<p>One of the long-term goals of <a
href="https://github.com/alamb">@alamb</a>, DataFusion's PMC chair, has been to
have
+<a href="https://www.influxdata.com/blog/datafusion-2025-influxdb/">1000
DataFusion-based projects</a>. This may be the year that happens!</p>
+<p>The community has been <a
href="https://github.com/apache/datafusion/issues/14580">discussing what we
will work on in the next six months</a>.
+Some major initiatives are likely to be:</p>
+<ol>
+<li><em>Performance</em>: A <a
href="https://github.com/apache/datafusion/issues/14482">number of items have
been identified</a> as areas that could use additional work</li>
+<li><em>Memory usage</em>: Tracking and improving memory usage, statistics and
spilling to disk </li>
+<li><em><a href="https://summerofcode.withgoogle.com/">Google Summer of
Code</a> (GSoC)</em>: If DataFusion is selected as a mentoring organization, we
will start accepting and supporting student projects</li>
+<li><em>FFI</em>: Extending the FFI implementation to support all types of
UDFs and <code>SessionContext</code></li>
+<li><em>Spark Functions</em>: A <a
href="https://github.com/apache/datafusion/issues/5600">proposal has been made
to add a crate</a> covering Spark-compatible built-in functions</li>
+</ol>
+<h2>How to Get Involved</h2>
+<p>DataFusion is not a project built or driven by a single person, company, or
+foundation. Rather, our community of users and contributors work together to
+build a shared technology that none of us could have built alone.</p>
+<p>If you are interested in joining us, we would love to have you. You can try
out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests or code. A list of open issues suitable for beginners is <a
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">here</a>
and you
+can find how to reach us on the <a
href="https://datafusion.apache.org/contributor-guide/communication.html">communication
doc</a>.</p>
+ </div>
+ </div>
+ </div>
+ <!-- footer -->
+ <div class="row">
+ <div class="large-12 medium-12 columns">
+ <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+ Copyright 2025, <a href="https://www.apache.org/">The Apache
Software Foundation</a>, Licensed under the <a
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version
2.0</a>.<br/>
+ Apache® and the Apache feather logo are trademarks of The Apache
Software Foundation.
+ </p>
+ </div>
+ </div>
+ <script src="/blog/js/bootstrap.bundle.min.js"></script> </main>
+ </body>
+</html>
diff --git a/output/author/pmc.html b/output/author/pmc.html
index 1c09344..b70432d 100644
--- a/output/author/pmc.html
+++ b/output/author/pmc.html
@@ -47,6 +47,48 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/02/20/datafusion-45.0.0">Apache DataFusion 45.0.0
Released</a></h1>
+ <p>Posted on: Thu 20 February 2025 by pmc</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details -->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion 45.0.0</a>. This
blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/02/20/datafusion-45.0.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
<!-- Post -->
<div class="row">
<div class="callout">
diff --git a/output/category/blog.html b/output/category/blog.html
index 739cf1c..9c158d8 100644
--- a/output/category/blog.html
+++ b/output/category/blog.html
@@ -47,6 +47,48 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/02/20/datafusion-45.0.0">Apache DataFusion 45.0.0
Released</a></h1>
+ <p>Posted on: Thu 20 February 2025 by pmc</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details -->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion 45.0.0</a>. This
blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/02/20/datafusion-45.0.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
<!-- Post -->
<div class="row">
<div class="callout">
diff --git a/output/feed.xml b/output/feed.xml
index d9f2c7b..68fa22c 100644
--- a/output/feed.xml
+++ b/output/feed.xml
@@ -1,5 +1,28 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
17 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.6.0
Release</title><link>https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Thu,
20 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion
45.0.0
Released</title><link>https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0</link><description><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Thu, 20
Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-20:/blog/2025/02/20/datafusion-45.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.6.0
Release</title><link>https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index 974c2b8..936aba6 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -1,5 +1,193 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.6.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0" rel
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-20T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion 45.0.0 Released</title><link
href="https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0"
rel="alterna [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></summary><content type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development for DataFusion!</p>
+<p><a href="https://datafusion.apache.org/">Apache
DataFusion</a> is an extensible query engine, written in <a
href="https://www.rust-lang.org/">Rust</a>, that
+uses <a href="https://arrow.apache.org">Apache Arrow</a> as its
in-memory format. DataFusion is used by developers to
+create new, fast data-centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While <a
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals">DataFusion&rsquo;s
primary design
+goal</a> is to accelerate the creation of other data-centric systems, it
also provides a
+reasonable out-of-the-box experience as a <a
href="https://datafusion.apache.org/user-guide/dataframe.html">dataframe
library</a>,
+<a href="https://datafusion.apache.org/python/">Python library</a>
and <a href="https://datafusion.apache.org/user-guide/cli/">command line
SQL tool</a>.</p>
+<p>DataFusion's core thesis is that as a community, together we can
build much more
+advanced technology than any of us as individuals or companies could do alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.</p>
+<h2>Community Growth 📈</h2>
+<p>In the last 6 months, between <code>40.0.0</code> and
<code>45.0.0</code>, our community has continued to
+grow in new and exciting ways.</p>
+<ol>
+<li>We added several PMC members and new committers: <a
href="https://github.com/jayzhan211">@jayzhan211</a> and <a
href="https://github.com/jonahgao">@jonahgao</a> joined the PMC,
+ <a href="https://github.com/2010YOUY01">@2010YOUY01</a>, <a
href="https://github.com/rachelint">@rachelint</a>, <a
href="https://github.com/findepi/">@findepi</a>, <a
href="https://github.com/iffyio">@iffyio</a>, <a
href="https://github.com/goldmedal">@goldmedal</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/Michael-J-Ward">@Michael-J-Ward</a> and <a
href="https [...]
+ joined as committers. See the <a
href="https://lists.apache.org/[email protected]">mailing
list</a> for more details.</li>
+<li>In the <a
href="https://github.com/apache/arrow-datafusion">core DataFusion
repo</a> alone we reviewed and accepted almost 1600 PRs from 206 different
+ contributors, created over 1100 issues, and closed 751 of them 🚀. All changes
are listed in the detailed
+ <a
href="https://github.com/apache/datafusion/tree/main/dev/changelog">changelogs</a>.</li>
+<li>DataFusion-focused meetups took place in multiple cities around the
world: <a
href="https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273">Hangzhou</a>,
<a
href="https://github.com/apache/datafusion/discussions/11431">Belgrade</a>,
<a href="https://github.com/apache/datafusion/discussions/11213">New
York</a>,
+ <a
href="https://github.com/apache/datafusion/discussions/10348">Seattle</a>,
<a
href="https://github.com/apache/datafusion/discussions/12894">Chicago</a>,
<a
href="https://github.com/apache/datafusion/discussions/13165">Boston</a>
and <a
href="https://github.com/apache/datafusion/discussions/12988">Amsterdam</a>
as well as a Rust NYC meetup focused on DataFusion.</li>
+</ol>
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+ 1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+ 206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+<p>DataFusion has applied to be part of <a
href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>
with a
+<a href="https://github.com/apache/datafusion/issues/14478">number of
ideas</a> for projects with mentors already selected. Additionally, <a
href="https://github.com/apache/datafusion/issues/14373">some
ideas</a> on
+how to make DataFusion an ideal choice for university database projects
such as the
+<a href="https://15445.courses.cs.cmu.edu/spring2025/">CMU database
classes</a> have been put forward.</p>
+<p>In addition, DataFusion has been appearing publicly more and more,
both online and offline. Here are some highlights:</p>
+<ol>
+<li>A <a
href="https://uwheel.rs/post/datafusion_uwheel/">demonstration of how
uwheel</a> is integrated into DataFusion</li>
+<li>Integrating StringView into DataFusion - <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/">part
1</a> and <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/">part
2</a></li>
+<li><a href="https://techontherocks.show/3">Building
streams</a> with DataFusion</li>
+<li><a
href="https://blog.haoxp.xyz/posts/caching-datafusion">Caching in
DataFusion</a>: Don't read twice</li>
+<li><a
href="https://blog.haoxp.xyz/posts/parquet-to-arrow/">Parquet pruning in
DataFusion</a>: Read no more than you need</li>
+<li>DataFusion is one of <a
href="https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3">The
10 coolest open source software tools</a></li>
+<li><a
href="https://www.denormalized.io/blog/building-databases">Building
databases over a weekend</a></li>
+</ol>
+<h2>Improved Performance 🚀</h2>
+<p>DataFusion hit a milestone in its development by becoming <a
href="https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/">the
fastest single node engine</a>
+for querying Apache Parquet files in the <a
href="https://benchmark.clickhouse.com/">ClickBench</a> benchmark with
the 43.0.0 release. A <a
href="https://github.com/apache/datafusion/issues/12821">lot
+of work</a> went into making this happen! While other engines have
since gotten faster,
+displacing DataFusion from the top spot, DataFusion remains near the top,
and we <a href="https://github.com/apache/datafusion/issues/14586">are
planning
+more improvements</a>.</p>
+<p><img alt="ClickBench performance results over time for DataFusion"
class="img-responsive"
src="/blog/images/datafusion-45.0.0/performance_over_time.png"
width="100%"/></p>
+<p><strong>Figure 1</strong>: ClickBench performance
improved by over 33% between DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). </p>
+<p>The work of <a
href="https://github.com/apache/datafusion/issues/10918">integrating</a>
the new <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html">Arrow
StringView</a>, which significantly improves performance
+for workloads that scan, filter, and group by variable-length string and binary
data, was completed
+and enabled by default in the past 6 months. The improvement is especially
pronounced for Parquet
+files due to <a
href="https://github.com/apache/arrow-rs/issues/5530">upstream work in the
parquet reader</a>. Kudos to <a
href="https://github.com/XiangpengHong">@XiangpengHong</a>, <a
href="https://github.com/AriesDevil">@AriesDevil</a>,
+<a href="https://github.com/PsiACE">@PsiACE</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/a10y">@a10y</a>, and <a
href="https://github.com/RinChanNOWWW">@RinChanNOWWW</a> for driving
this project.</p>
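<p>For readers new to StringView, below is a minimal, std-only Rust sketch of the 16-byte "view" layout that the Arrow columnar format specifies for its view types: a 4-byte length, followed either by the value itself inline (for values of at most 12 bytes) or by a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset into that buffer. It is an illustration under those assumptions, not the arrow-rs <code>GenericByteViewArray</code> implementation, and <code>make_view</code>, <code>read_view</code>, and <code>views_may_be_equal</code> are hypothetical helper names. Because short strings live inside the view and long strings carry their first 4 bytes there, many comparisons can be settled without touching the data buffers, which is plausibly part of why string filters and grouping get faster.</p>

```rust
// Minimal sketch, assuming the Arrow spec's 16-byte view layout.
// NOT the arrow-rs implementation; all helper names are hypothetical.

/// Longest value that fits entirely inside a view.
const MAX_INLINE_LEN: usize = 12;

/// Read a little-endian u32 from the first 4 bytes of `bytes`.
fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Build the 16-byte view for `value`. For long values, `buffer_index`
/// and `offset` record where the full bytes live in the data buffers.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= MAX_INLINE_LEN {
        // Short value: stored inline, remaining bytes stay zero.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Long value: 4-byte prefix, then buffer index and offset.
        view[4..8].copy_from_slice(&value[0..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Resolve a view back to the bytes it describes.
fn read_view<'a>(view: &'a [u8; 16], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = read_u32(&view[0..4]) as usize;
    if len <= MAX_INLINE_LEN {
        &view[4..4 + len]
    } else {
        let index = read_u32(&view[8..12]) as usize;
        let offset = read_u32(&view[12..16]) as usize;
        &buffers[index][offset..offset + len]
    }
}

/// Cheap pre-check that never reads the data buffers: compares the
/// length and the first 4 bytes. `false` means the values differ;
/// `true` only means they *may* be equal.
fn views_may_be_equal(a: &[u8; 16], b: &[u8; 16]) -> bool {
    a[0..8] == b[0..8]
}
```

<p>A full equality check for values longer than 12 bytes still needs <code>read_view</code>; the prefix check only rules out mismatches cheaply.</p>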
+<h2>Improved Quality 📋</h2>
+<p>DataFusion continues to improve overall in quality. In addition to
ongoing bug
+fixes, one of the most exciting improvements in the last 6 months was the
addition of the
+<a href="https://github.com/apache/datafusion/pull/13936">SQLite
sqllogictest suite</a> thanks to <a
href="https://github.com/Omega359">@Omega359</a>. These tests run over
5 million
+SQL statements on every push to the main branch.</p>
+<p>Support for <a
href="https://github.com/apache/datafusion/pull/13651">explicitly checking
logical plan invariants</a> was added by <a
href="https://github.com/wiedld">@wiedld</a>. These checks
+can help catch implicit changes that might cause problems during
upgrades.</p>
+<p>We have also started other quality initiatives to make it <a
href="https://github.com/apache/datafusion/issues/13525">easier to use
DataFusion</a>
+based on <a href="https://glaredb.com/">GlareDB</a>'s experience,
along with more <a
href="https://github.com/apache/datafusion/issues/13661">extensive
prerelease testing</a>. </p>
+<h2>Improved Documentation 📚</h2>
+<p>We continue to improve the documentation to make it easier to get
started using DataFusion.
+During the last 6 months, two projects were initiated to migrate the function
documentation
+away from strictly static markdown files. First, <a
href="https://github.com/apache/datafusion/pull/12668">@Omega359</a>
made it possible for function
+documentation to be generated from code, and <a
href="https://github.com/jonathanc-n">@jonathanc-n</a> and others
helped with the migration.
+Then <a href="https://github.com/comphead">@comphead</a> led a
project to <a
href="https://github.com/apache/datafusion/pull/12822">create a doc
macro</a> to allow for an even easier way to write
+function documentation. A special thanks to <a
href="https://github.com/Chen-Yuan-Lai">@Chen-Yuan-Lai</a> for
migrating many functions to
+the new syntax.</p>
+<p>Additionally, the <a
href="https://github.com/apache/datafusion/pull/13877">examples</a>
were <a
href="https://github.com/apache/datafusion/pull/13905">refactored</a>
and <a href="https://github.com/apache/datafusion/pull/13950">cleaned
up</a> to improve their usefulness.</p>
+<h2>New Features ✨</h2>
+<p>There are too many new features in the last 6 months to list them
all, but here
+are some highlights:</p>
+<h3>Functions</h3>
+<ul>
+<li>Uniform Window Functions:
<code>BuiltInWindowFunctions</code> was removed and all window functions now use
UDFs (<a
href="https://github.com/jcsherin">@jcsherin</a>)</li>
+<li>Uniform Aggregate Functions:
<code>BuiltInAggregateFunctions</code> was removed and all aggregate functions now use
UDFs</li>
+<li>As mentioned above, function documentation was extracted from the
markdown files</li>
+<li>New functions and SQL support were added, including '<a
href="https://github.com/apache/datafusion/pull/13799">show
functions</a>', '<a
href="https://github.com/apache/datafusion/pull/11347">to_local_time</a>',
+ '<a
href="https://github.com/apache/datafusion/pull/12970">regexp_count</a>',
'<a
href="https://github.com/apache/datafusion/pull/11969">map_extract</a>',
'<a
href="https://github.com/apache/datafusion/pull/12211">array_distance</a>',
'<a
href="https://github.com/apache/datafusion/pull/12329">array_any_value</a>',
'<a
href="https://github.com/apache/datafusion/pull/12474">greatest</a>',
+ '<a
href="https://github.com/apache/datafusion/pull/13786">least</a>',
'<a
href="https://github.com/apache/datafusion/pull/14217">arrays_overlap</a>'</li>
+</ul>
+<h3>FFI</h3>
+<ul>
+<li>Work on a Foreign Function Interface (FFI) has started. This should allow for
+ <a href="https://github.com/apache/datafusion/pull/12920">using table
providers</a> across languages and versions of DataFusion. This
+ is especially pertinent for integration with <a
href="https://delta-io.github.io/delta-rs/">delta-rs</a> and other
table formats.</li>
+</ul>
+<h3>Materialized Views</h3>
+<ul>
+<li><a href="https://github.com/suremarc">@suremarc</a> has
added a <a
href="https://github.com/datafusion-contrib/datafusion-materialized-views">materialized
view implementation</a> in datafusion-contrib 🚀</li>
+</ul>
+<h3>Substrait</h3>
+<ul>
+<li>A lot of work was put into improving and enhancing Substrait support
(<a href="https://github.com/Blizzara">@Blizzara</a>, <a
href="https://github.com/westonpace">@westonpace</a>, <a
href="https://github.com/tokoko">@tokoko</a>, <a
href="https://github.com/vbarua">@vbarua</a>, <a
href="https://github.com/LatrecheYasser">@LatrecheYasser</a>, <a
href="https://github.com/notfilippo">@notfilippo</a> and others) [...]
+</ul>
+<h2>Looking Ahead: The Next Six Months 🔭</h2>
+<p>One of the long-term goals of <a
href="https://github.com/alamb">@alamb</a>, DataFusion's PMC chair,
has been to have
+<a href="https://www.influxdata.com/blog/datafusion-2025-influxdb/">1000
DataFusion based projects</a>. This may be the year that
happens!</p>
+<p>The community has been <a
href="https://github.com/apache/datafusion/issues/14580">discussing what we
will work on in the next six months</a>.
+Some major initiatives are likely to be:</p>
+<ol>
+<li><em>Performance</em>: A <a
href="https://github.com/apache/datafusion/issues/14482">number of items
have been identified</a> as areas that could use additional
work</li>
+<li><em>Memory usage</em>: Tracking and improving memory
usage, statistics and spilling to disk </li>
+<li><em><a
href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>
(GSoC)</em>: If DataFusion is selected as a participating project, we will start
accepting and supporting student projects</li>
+<li><em>FFI</em>: Extending the FFI implementation to
support all types of UDFs and <code>SessionContext</code></li>
+<li><em>Spark Functions</em>: A <a
href="https://github.com/apache/datafusion/issues/5600">proposal has been
made to add a crate</a> covering Spark-compatible built-in functions
</li>
+</ol>
+<h2>How to Get Involved</h2>
+<p>DataFusion is not a project built or driven by a single person,
company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.</p>
+<p>If you are interested in joining us, we would love to have you. You
can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests or code. A list of open issues suitable for beginners is <a
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">here</a>
and you
+can find how to reach us on the <a
href="https://datafusion.apache.org/contributor-guide/communication.html">communication
doc</a>.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.6.0
Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0"
rel="alternate"></link><published>2025-02-17T00:00:00+00:00</published><updated>2025-02-17T00:00:00+00:00</updated><author><name>pmc</name></author
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index d980ab4..c1ccfd3 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -1,5 +1,193 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.6.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0 [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-20T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion 45.0.0 Released</title><link
href="https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0" rel="al
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></summary><content type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development for DataFusion!</p>
+<p><a href="https://datafusion.apache.org/">Apache
DataFusion</a> is an extensible query engine, written in <a
href="https://www.rust-lang.org/">Rust</a>, that
+uses <a href="https://arrow.apache.org">Apache Arrow</a> as its
in-memory format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While <a
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals">DataFusion&rsquo;s
primary design
+goal</a> is to accelerate the creation of other data-centric systems, it
also provides a
+reasonable out-of-the-box experience as a <a
href="https://datafusion.apache.org/user-guide/dataframe.html">dataframe
library</a>,
+<a href="https://datafusion.apache.org/python/">Python library</a>
and <a href="https://datafusion.apache.org/user-guide/cli/">command line
SQL tool</a>.</p>
+<p>DataFusion's core thesis is that as a community, together we can
build much more
+advanced technology than any of us as individuals or companies could build alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.</p>
+<h2>Community Growth 📈</h2>
+<p>In the last 6 months, between <code>40.0.0</code> and
<code>45.0.0</code>, our community has continued to
+grow in new and exciting ways.</p>
+<ol>
+<li>We added several PMC members and new committers: <a
href="https://github.com/jayzhan211">@jayzhan211</a> and <a
href="https://github.com/jonahgao">@jonahgao</a> joined the PMC,
+ <a href="https://github.com/2010YOUY01">@2010YOUY01</a>, <a
href="https://github.com/rachelint">@rachelint</a>, <a
href="https://github.com/findepi/">@findepi</a>, <a
href="https://github.com/iffyio">@iffyio</a>, <a
href="https://github.com/goldmedal">@goldmedal</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/Michael-J-Ward">@Michael-J-Ward</a> and <a
href="https [...]
+ joined as committers. See the <a
href="https://lists.apache.org/[email protected]">mailing
list</a> for more details.</li>
+<li>In the <a
href="https://github.com/apache/arrow-datafusion">core DataFusion
repo</a> alone we reviewed and accepted almost 1600 PRs from 206 different
+ contributors, created over 1100 issues, and closed 751 of them 🚀. All changes
are listed in the detailed
+ <a
href="https://github.com/apache/datafusion/tree/main/dev/changelog">changelogs</a>.</li>
+<li>DataFusion-focused meetups took place in multiple cities around the
world: <a
href="https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273">Hangzhou</a>,
<a
href="https://github.com/apache/datafusion/discussions/11431">Belgrade</a>,
<a href="https://github.com/apache/datafusion/discussions/11213">New
York</a>,
+ <a
href="https://github.com/apache/datafusion/discussions/10348">Seattle</a>,
<a
href="https://github.com/apache/datafusion/discussions/12894">Chicago</a>,
<a
href="https://github.com/apache/datafusion/discussions/13165">Boston</a>
and <a
href="https://github.com/apache/datafusion/discussions/12988">Amsterdam</a>
as well as a Rust NYC meetup focused on DataFusion.</li>
+</ol>
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+ 1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+ 206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+<p>DataFusion has applied to be part of <a
href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>
with a
+<a href="https://github.com/apache/datafusion/issues/14478">number of
ideas</a> for projects with mentors already selected. Additionally, <a
href="https://github.com/apache/datafusion/issues/14373">some
ideas</a> on
+how to make DataFusion an ideal choice for university database projects
such as the
+<a href="https://15445.courses.cs.cmu.edu/spring2025/">CMU database
classes</a> have been put forward.</p>
+<p>In addition, DataFusion has been appearing publicly more and more,
both online and offline. Here are some highlights:</p>
+<ol>
+<li>A <a
href="https://uwheel.rs/post/datafusion_uwheel/">demonstration of how
uwheel</a> is integrated into DataFusion</li>
+<li>Integrating StringView into DataFusion - <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/">part
1</a> and <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/">part
2</a></li>
+<li><a href="https://techontherocks.show/3">Building
streams</a> with DataFusion</li>
+<li><a
href="https://blog.haoxp.xyz/posts/caching-datafusion">Caching in
DataFusion</a>: Don't read twice</li>
+<li><a
href="https://blog.haoxp.xyz/posts/parquet-to-arrow/">Parquet pruning in
DataFusion</a>: Read no more than you need</li>
+<li>DataFusion is one of <a
href="https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3">The
10 coolest open source software tools</a></li>
+<li><a
href="https://www.denormalized.io/blog/building-databases">Building
databases over a weekend</a></li>
+</ol>
+<h2>Improved Performance 🚀</h2>
+<p>DataFusion hit a milestone in its development by becoming <a
href="https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/">the
fastest single node engine</a>
+for querying Apache Parquet files in the <a
href="https://benchmark.clickhouse.com/">ClickBench</a> benchmark with
the 43.0.0 release. A <a
href="https://github.com/apache/datafusion/issues/12821">lot
+of work</a> went into making this happen! While other engines have
since gotten faster,
+displacing DataFusion from the top spot, DataFusion remains near the top,
and we <a href="https://github.com/apache/datafusion/issues/14586">are
planning
+more improvements</a>.</p>
+<p><img alt="ClickBench performance results over time for DataFusion"
class="img-responsive"
src="/blog/images/datafusion-45.0.0/performance_over_time.png"
width="100%"/></p>
+<p><strong>Figure 1</strong>: ClickBench performance
improved by over 33% between DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). </p>
+<p>The work of <a
href="https://github.com/apache/datafusion/issues/10918">integrating</a>
the new <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html">Arrow
StringView</a>, which significantly improves performance
+for workloads that scan, filter, and group by variable-length string and binary
data, was completed
+and enabled by default in the past 6 months. The improvement is especially
pronounced for Parquet
+files due to <a
href="https://github.com/apache/arrow-rs/issues/5530">upstream work in the
parquet reader</a>. Kudos to <a
href="https://github.com/XiangpengHong">@XiangpengHong</a>, <a
href="https://github.com/AriesDevil">@AriesDevil</a>,
+<a href="https://github.com/PsiACE">@PsiACE</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/a10y">@a10y</a>, and <a
href="https://github.com/RinChanNOWWW">@RinChanNOWWW</a> for driving
this project.</p>
+<h2>Improved Quality 📋</h2>
+<p>DataFusion continues to improve overall in quality. In addition to
ongoing bug
+fixes, one of the most exciting improvements in the last 6 months was the
addition of the
+<a href="https://github.com/apache/datafusion/pull/13936">SQLite
sqllogictest suite</a> thanks to <a
href="https://github.com/Omega359">@Omega359</a>. These tests run over
5 million
+SQL statements on every push to the main branch.</p>
+<p>Support for <a
href="https://github.com/apache/datafusion/pull/13651">explicitly checking
logical plan invariants</a> was added by <a
href="https://github.com/wiedld">@wiedld</a>. These checks
+can help catch implicit changes that might cause problems during
upgrades.</p>
+<p>We have also started other quality initiatives to make it <a
href="https://github.com/apache/datafusion/issues/13525">easier to use
DataFusion</a>
+based on <a href="https://glaredb.com/">GlareDB</a>'s experience,
along with more <a
href="https://github.com/apache/datafusion/issues/13661">extensive
prerelease testing</a>. </p>
+<h2>Improved Documentation 📚</h2>
+<p>We continue to improve the documentation to make it easier to get
started using DataFusion.
+During the last 6 months, two projects were initiated to migrate the function
documentation
+away from strictly static markdown files. First, <a
href="https://github.com/apache/datafusion/pull/12668">@Omega359</a>
made it possible for function
+documentation to be generated from code, and <a
href="https://github.com/jonathanc-n">@jonathanc-n</a> and others
helped with the migration.
+Then <a href="https://github.com/comphead">@comphead</a> led a
project to <a
href="https://github.com/apache/datafusion/pull/12822">create a doc
macro</a> to allow for an even easier way to write
+function documentation. A special thanks to <a
href="https://github.com/Chen-Yuan-Lai">@Chen-Yuan-Lai</a> for
migrating many functions to
+the new syntax.</p>
+<p>Additionally, the <a
href="https://github.com/apache/datafusion/pull/13877">examples</a>
were <a
href="https://github.com/apache/datafusion/pull/13905">refactored</a>
and <a href="https://github.com/apache/datafusion/pull/13950">cleaned
up</a> to improve their usefulness.</p>
+<h2>New Features ✨</h2>
+<p>There are too many new features in the last 6 months to list them
all, but here
+are some highlights:</p>
+<h3>Functions</h3>
+<ul>
+<li>Uniform Window Functions:
<code>BuiltInWindowFunctions</code> was removed and all window functions now use
UDFs (<a
href="https://github.com/jcsherin">@jcsherin</a>)</li>
+<li>Uniform Aggregate Functions:
<code>BuiltInAggregateFunctions</code> was removed and all aggregate functions now use
UDFs</li>
+<li>As mentioned above, function documentation was extracted from the
markdown files</li>
+<li>New functions and SQL support were added, including '<a
href="https://github.com/apache/datafusion/pull/13799">show
functions</a>', '<a
href="https://github.com/apache/datafusion/pull/11347">to_local_time</a>',
+ '<a
href="https://github.com/apache/datafusion/pull/12970">regexp_count</a>',
'<a
href="https://github.com/apache/datafusion/pull/11969">map_extract</a>',
'<a
href="https://github.com/apache/datafusion/pull/12211">array_distance</a>',
'<a
href="https://github.com/apache/datafusion/pull/12329">array_any_value</a>',
'<a
href="https://github.com/apache/datafusion/pull/12474">greatest</a>',
+ '<a
href="https://github.com/apache/datafusion/pull/13786">least</a>',
'<a
href="https://github.com/apache/datafusion/pull/14217">arrays_overlap</a>'</li>
+</ul>
+<h3>FFI</h3>
+<ul>
+<li>Work on a Foreign Function Interface (FFI) has started. This should allow for
+ <a href="https://github.com/apache/datafusion/pull/12920">using table
providers</a> across languages and versions of DataFusion. This
+ is especially pertinent for integration with <a
href="https://delta-io.github.io/delta-rs/">delta-rs</a> and other
table formats.</li>
+</ul>
+<h3>Materialized Views</h3>
+<ul>
+<li><a href="https://github.com/suremarc">@suremarc</a> has
added a <a
href="https://github.com/datafusion-contrib/datafusion-materialized-views">materialized
view implementation</a> in datafusion-contrib 🚀</li>
+</ul>
+<h3>Substrait</h3>
+<ul>
+<li>A lot of work was put into improving and enhancing Substrait support
(<a href="https://github.com/Blizzara">@Blizzara</a>, <a
href="https://github.com/westonpace">@westonpace</a>, <a
href="https://github.com/tokoko">@tokoko</a>, <a
href="https://github.com/vbarua">@vbarua</a>, <a
href="https://github.com/LatrecheYasser">@LatrecheYasser</a>, <a
href="https://github.com/notfilippo">@notfilippo</a> and others) [...]
+</ul>
+<h2>Looking Ahead: The Next Six Months 🔭</h2>
+<p>One of the long-term goals of <a
href="https://github.com/alamb">@alamb</a>, DataFusion's PMC chair,
has been to have
+<a href="https://www.influxdata.com/blog/datafusion-2025-influxdb/">1000
DataFusion based projects</a>. This may be the year that
happens!</p>
+<p>The community has been <a
href="https://github.com/apache/datafusion/issues/14580">discussing what we
will work on in the next six months</a>.
+Some major initiatives are likely to be:</p>
+<ol>
+<li><em>Performance</em>: A <a
href="https://github.com/apache/datafusion/issues/14482">number of items
have been identified</a> as areas that could use additional
work</li>
+<li><em>Memory usage</em>: Tracking and improving memory
usage, statistics and spilling to disk </li>
+<li><em><a
href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>
(GSoC)</em>: If DataFusion is selected as a participating project, we will start
accepting and supporting student projects</li>
+<li><em>FFI</em>: Extending the FFI implementation to
support all types of UDFs and <code>SessionContext</code></li>
+<li><em>Spark Functions</em>: A <a
href="https://github.com/apache/datafusion/issues/5600">proposal has been
made to add a crate</a> covering Spark-compatible built-in functions
</li>
+</ol>
+<h2>How to Get Involved</h2>
+<p>DataFusion is not a project built or driven by a single person,
company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.</p>
+<p>If you are interested in joining us, we would love to have you. You
can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests or code. A list of open issues suitable for beginners is <a
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">here</a>
and you
+can find how to reach us on the <a
href="https://datafusion.apache.org/contributor-guide/communication.html">communication
doc</a>.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.6.0
Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0"
rel="alternate"></link><published>2025-02-17T00:00:00+00:00</published><updated>2025-02-17T00:00:00+00:00</updated><author><name>pmc</name></author
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/pmc.atom.xml b/output/feeds/pmc.atom.xml
index 2f91701..985a30c 100644
--- a/output/feeds/pmc.atom.xml
+++ b/output/feeds/pmc.atom.xml
@@ -1,5 +1,193 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
pmc</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.6.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0"
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
pmc</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-20T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion 45.0.0 Released</title><link
href="https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0"
rel="alte [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></summary><content type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development for DataFusion!</p>
+<p><a href="https://datafusion.apache.org/">Apache
DataFusion</a> is an extensible query engine, written in <a
href="https://www.rust-lang.org/">Rust</a>, that
+uses <a href="https://arrow.apache.org">Apache Arrow</a> as its
in-memory format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+machine learning and streaming applications. While <a
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals">DataFusion&rsquo;s
primary design
+goal</a> is to accelerate the creation of other data-centric systems, it
also provides a
+reasonable experience directly out of the box as a <a
href="https://datafusion.apache.org/user-guide/dataframe.html">dataframe
library</a>,
+<a href="https://datafusion.apache.org/python/">Python library</a>
and <a href="https://datafusion.apache.org/user-guide/cli/">command line
SQL tool</a>.</p>
+<p>DataFusion's core thesis is that as a community, together we can
build much more
+advanced technology than any of us as individuals or companies could build alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation, and focus on
+what makes our projects unique.</p>
+<h2>Community Growth 📈</h2>
+<p>In the last 6 months, between <code>40.0.0</code> and
<code>45.0.0</code>, our community has continued to
+grow in new and exciting ways.</p>
+<ol>
+<li>We added several PMC members and new committers: <a
href="https://github.com/jayzhan211">@jayzhan211</a> and <a
href="https://github.com/jonahgao">@jonahgao</a> joined the PMC,
+ <a href="https://github.com/2010YOUY01">@2010YOUY01</a>, <a
href="https://github.com/rachelint">@rachelint</a>, <a
href="https://github.com/findepi/">@findepi</a>, <a
href="https://github.com/iffyio">@iffyio</a>, <a
href="https://github.com/goldmedal">@goldmedal</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/Michael-J-Ward">@Michael-J-Ward</a> and <a
href="https [...]
+ joined as committers. See the <a
href="https://lists.apache.org/[email protected]">mailing
list</a> for more details.</li>
+<li>In the <a
href="https://github.com/apache/arrow-datafusion">core DataFusion
repo</a> alone, we reviewed and accepted almost 1600 PRs from 206 different
+ contributors, created over 1100 issues, and closed 751 of them 🚀. All changes
are listed in the detailed
+ <a
href="https://github.com/apache/datafusion/tree/main/dev/changelog">changelogs</a>.</li>
+<li>DataFusion-focused meetups happened in multiple cities around the
world: <a
href="https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273">Hangzhou</a>,
<a
href="https://github.com/apache/datafusion/discussions/11431">Belgrade</a>,
<a href="https://github.com/apache/datafusion/discussions/11213">New
York</a>,
+ <a
href="https://github.com/apache/datafusion/discussions/10348">Seattle</a>,
<a
href="https://github.com/apache/datafusion/discussions/12894">Chicago</a>,
<a
href="https://github.com/apache/datafusion/discussions/13165">Boston</a>
and <a
href="https://github.com/apache/datafusion/discussions/12988">Amsterdam</a>
as well as a Rust NYC meetup focused on DataFusion.</li>
+</ol>
+<!--
+$ git log --pretty=oneline 40.0.0..45.0.0 . | wc -l
+ 1532 (up from 1453)
+
+$ git shortlog -sn 40.0.0..45.0.0 . | wc -l
+ 206 (up from 182)
+
+https://crates.io/crates/datafusion/45.0.0
+DataFusion 45 released Feb 7, 2025
+
+https://crates.io/crates/datafusion/40.0.0
+DataFusion 40 released July 12, 2024
+
+Issues created in this time: 375 open, 751 closed (from 321 open, 781 closed)
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2024-07-12..2025-02-07
+
+Issues closed: 956 (up from 911)
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2024-07-12..2025-02-07
+
+PRs merged in this time 1597 (up from 1490)
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2024-07-12..2025-02-07
+
+-->
+<p>DataFusion has applied to take part in <a
href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>
with a
+<a href="https://github.com/apache/datafusion/issues/14478">number of
ideas</a> for projects, each with mentors already selected. Additionally, <a
href="https://github.com/apache/datafusion/issues/14373">some
ideas</a> on
+how to make DataFusion an ideal choice for university database projects,
such as the
+<a href="https://15445.courses.cs.cmu.edu/spring2025/">CMU database
classes</a>, have been put forward.</p>
+<p>In addition, DataFusion has been appearing publicly more and more,
both online and offline. Here are some highlights:</p>
+<ol>
+<li>A <a
href="https://uwheel.rs/post/datafusion_uwheel/">demonstration of how
uwheel</a> is integrated into DataFusion</li>
+<li>Integrating StringView into DataFusion - <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/">part
1</a> and <a
href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/">part
2</a></li>
+<li><a href="https://techontherocks.show/3">Building
streams</a> with DataFusion</li>
+<li><a
href="https://blog.haoxp.xyz/posts/caching-datafusion">Caching in
DataFusion</a>: Don't read twice</li>
+<li><a
href="https://blog.haoxp.xyz/posts/parquet-to-arrow/">Parquet pruning in
DataFusion</a>: Read no more than you need</li>
+<li>DataFusion is one of <a
href="https://www.crn.com/news/software/2024/the-10-coolest-open-source-software-tools-of-2024?page=3">The
10 coolest open source software tools of 2024</a></li>
+<li><a
href="https://www.denormalized.io/blog/building-databases">Building
databases over a weekend</a></li>
+</ol>
+<h2>Improved Performance 🚀</h2>
+<p>DataFusion hit a milestone in its development by becoming <a
href="https://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/">the
fastest single-node engine</a>
+for querying Apache Parquet files in the <a
href="https://benchmark.clickhouse.com/">ClickBench</a> benchmark with
the 43.0.0 release. A <a
href="https://github.com/apache/datafusion/issues/12821">lot
+of work</a> went into making this happen! Although other engines have
since gotten faster,
+displacing DataFusion from the top spot, DataFusion remains near the top,
and we <a href="https://github.com/apache/datafusion/issues/14586">are
planning
+more improvements</a>.</p>
+<p><img alt="ClickBench performance results over time for DataFusion"
class="img-responsive"
src="/blog/images/datafusion-45.0.0/performance_over_time.png"
width="100%"/></p>
+<p><strong>Figure 1</strong>: ClickBench performance
improved over 33% between DataFusion 33
+(released Nov. 2023) and DataFusion 45 (released Feb. 2025). </p>
+<p>The work of <a
href="https://github.com/apache/datafusion/issues/10918">integrating</a>
the new <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html">Arrow
StringView</a>, which significantly improves performance
+for workloads that scan, filter, and group by variable-length string and binary
data, was completed
+and enabled by default in the past 6 months. The improvement is especially
pronounced for Parquet
+files due to <a
href="https://github.com/apache/arrow-rs/issues/5530">upstream work in the
parquet reader</a>. Kudos to <a
href="https://github.com/XiangpengHong">@XiangpengHong</a>, <a
href="https://github.com/AriesDevil">@AriesDevil</a>,
+<a href="https://github.com/PsiACE">@PsiACE</a>, <a
href="https://github.com/Weijun-H">@Weijun-H</a>, <a
href="https://github.com/a10y">@a10y</a>, and <a
href="https://github.com/RinChanNOWWW">@RinChanNOWWW</a> for driving
this project.</p>
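+<p>To illustrate why the view layout helps these workloads, here is a minimal, self-contained Rust sketch of a simplified string-view column. It is not the arrow-rs <code>GenericByteViewArray</code> API. The <code>View</code> and <code>ViewColumn</code> types are hypothetical, and the sketch assumes a single data buffer. It follows the Arrow view layout, where each 16-byte view stores the string length plus either the whole string (12 bytes or fewer) or a 4-byte prefix, a buffer index, and an offset. As a result, an equality filter can reject most rows from the view alone, without reading the string bytes.</p>

```rust
// Illustrative sketch only: a simplified model of the Arrow "view" layout.
// This is NOT the arrow-rs API; `View` and `ViewColumn` are hypothetical names.

/// One 16-byte view: a 4-byte length followed by a 12-byte payload.
/// - len <= 12: the payload holds the string bytes inline (zero padded).
/// - len > 12:  the payload holds a 4-byte prefix, a 4-byte buffer index
///              and a 4-byte offset into that buffer.
#[derive(Clone, Copy)]
struct View {
    len: u32,
    payload: [u8; 12],
}

const MAX_INLINE: usize = 12;

/// A toy string-view column that uses a single data buffer for long strings.
struct ViewColumn {
    views: Vec<View>,
    buffer: Vec<u8>,
}

impl ViewColumn {
    fn new() -> Self {
        ViewColumn { views: Vec::new(), buffer: Vec::new() }
    }

    fn push(&mut self, s: &str) {
        let bytes = s.as_bytes();
        let mut payload = [0u8; 12];
        if bytes.len() <= MAX_INLINE {
            // Short strings live entirely inside the view.
            payload[..bytes.len()].copy_from_slice(bytes);
        } else {
            // Long strings are appended to the buffer; the view keeps a prefix.
            let offset = self.buffer.len() as u32;
            self.buffer.extend_from_slice(bytes);
            payload[..4].copy_from_slice(&bytes[..4]);
            payload[4..8].copy_from_slice(&0u32.to_le_bytes()); // buffer index (always 0 here)
            payload[8..12].copy_from_slice(&offset.to_le_bytes());
        }
        self.views.push(View { len: bytes.len() as u32, payload });
    }

    /// Returns the bytes of element `i`, reading the buffer only for long strings.
    fn value(&self, i: usize) -> &[u8] {
        let v = &self.views[i];
        let len = v.len as usize;
        if len <= MAX_INLINE {
            &v.payload[..len]
        } else {
            let mut off = [0u8; 4];
            off.copy_from_slice(&v.payload[8..12]);
            let start = u32::from_le_bytes(off) as usize;
            &self.buffer[start..start + len]
        }
    }

    /// Equality test that checks the length and up to 4 prefix bytes from the
    /// view first, and only compares the full bytes when those already match.
    fn eq_str(&self, i: usize, needle: &str) -> bool {
        let v = &self.views[i];
        let n = needle.as_bytes();
        if v.len as usize != n.len() {
            return false;
        }
        let k = n.len().min(4);
        if &v.payload[..k] != &n[..k] {
            return false;
        }
        self.value(i) == n
    }

    /// Indices of rows equal to `needle`, like a simple `WHERE col = needle`.
    fn filter_eq(&self, needle: &str) -> Vec<usize> {
        (0..self.views.len()).filter(|&i| self.eq_str(i, needle)).collect()
    }
}

fn main() {
    let mut col = ViewColumn::new();
    col.push("apple"); // inlined: 5 bytes
    col.push("a considerably longer string"); // 28 bytes: stored in the buffer
    col.push("apricot"); // inlined: 7 bytes
    println!("view size: {} bytes", std::mem::size_of::<View>());
    println!("{:?}", col.filter_eq("apricot"));
    println!("{}", String::from_utf8_lossy(col.value(1)));
}
```

+<p>Keeping short strings inline means many comparisons never touch the data buffer at all. Real view arrays also support multiple data buffers, which this sketch omits.</p>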
+<h2>Improved Quality 📋</h2>
+<p>DataFusion continues to improve overall in quality. In addition to
ongoing bug
+fixes, one of the most exciting improvements in the last 6 months was the
addition of the
+<a href="https://github.com/apache/datafusion/pull/13936">SQLite
sqllogictest suite</a>, thanks to <a
href="https://github.com/Omega359">@Omega359</a>. These tests run over
5 million
+SQL statements on every push to the main branch.</p>
+<p>Support for <a
href="https://github.com/apache/datafusion/pull/13651">explicitly checking
logical plan invariants</a> was added by <a
href="https://github.com/wiedld">@wiedld</a>; these checks
+can help catch implicit changes that might cause problems during
upgrades.</p>
+<p>We have also started other quality initiatives to make it <a
href="https://github.com/apache/datafusion/issues/13525">easier to use
DataFusion</a>
+based on <a href="https://glaredb.com/">GlareDB</a>'s experience,
along with more <a
href="https://github.com/apache/datafusion/issues/13661">extensive
prerelease testing</a>. </p>
+<h2>Improved Documentation 📚</h2>
+<p>We continue to improve the documentation to make it easier to get
started using DataFusion.
+During the last 6 months, two projects were initiated to migrate the function
documentation
+away from strictly static markdown files. First, <a
href="https://github.com/apache/datafusion/pull/12668">@Omega359</a>
made it possible for function
+documentation to be generated from code, and <a
href="https://github.com/jonathanc-n">@jonathanc-n</a> and others
helped with the migration;
+then <a href="https://github.com/comphead">@comphead</a> led a
project to <a
href="https://github.com/apache/datafusion/pull/12822">create a doc
macro</a> that provides an even easier way to write
+function documentation. A special thanks to <a
href="https://github.com/Chen-Yuan-Lai">@Chen-Yuan-Lai</a> for
migrating many functions to
+the new syntax.</p>
+<p>Additionally, the <a
href="https://github.com/apache/datafusion/pull/13877">examples</a>
were <a
href="https://github.com/apache/datafusion/pull/13905">refactored</a>
and <a href="https://github.com/apache/datafusion/pull/13950">cleaned
up</a> to improve their usefulness.</p>
+<h2>New Features ✨</h2>
+<p>There are too many new features in the last 6 months to list them
all, but here
+are some highlights:</p>
+<h3>Functions</h3>
+<ul>
+<li>Uniform Window Functions:
<code>BuiltInWindowFunctions</code> was removed and all window functions now use
UDFs (<a
href="https://github.com/jcsherin">@jcsherin</a>)</li>
+<li>Uniform Aggregate Functions:
<code>BuiltInAggregateFunctions</code> was removed and all aggregate functions now use
UDFs</li>
+<li>As mentioned above, function documentation was extracted from the
markdown files</li>
+<li>Some new functions and SQL support were added, including '<a
href="https://github.com/apache/datafusion/pull/13799">show
functions</a>', '<a
href="https://github.com/apache/datafusion/pull/11347">to_local_time</a>',
+ '<a
href="https://github.com/apache/datafusion/pull/12970">regexp_count</a>',
'<a
href="https://github.com/apache/datafusion/pull/11969">map_extract</a>',
'<a
href="https://github.com/apache/datafusion/pull/12211">array_distance</a>',
'<a
href="https://github.com/apache/datafusion/pull/12329">array_any_value</a>',
'<a
href="https://github.com/apache/datafusion/pull/12474">greatest</a>',
+ '<a
href="https://github.com/apache/datafusion/pull/13786">least</a>',
'<a
href="https://github.com/apache/datafusion/pull/14217">arrays_overlap</a>'</li>
+</ul>
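+<p>To illustrate the semantics of two of these functions, the sketch below models <code>greatest</code> and <code>least</code> in plain Rust, with SQL <code>NULL</code> represented as <code>None</code>. It assumes the PostgreSQL-style behavior described in DataFusion's function documentation: <code>NULL</code> arguments are ignored, and the result is <code>NULL</code> only when every argument is <code>NULL</code>. These functions are hypothetical stand-ins, not DataFusion's implementation, which operates on Arrow arrays.</p>

```rust
// Sketch of NULL-skipping GREATEST/LEAST semantics over nullable integers,
// modelling SQL NULL as `None`. Illustrative only; this is not DataFusion code.

/// Largest non-NULL argument, or NULL (None) if every argument is NULL.
fn greatest(args: &[Option<i64>]) -> Option<i64> {
    // `flatten` skips the `None` entries; `max` returns None for an empty iterator.
    args.iter().flatten().copied().max()
}

/// Smallest non-NULL argument, or NULL (None) if every argument is NULL.
fn least(args: &[Option<i64>]) -> Option<i64> {
    args.iter().flatten().copied().min()
}

fn main() {
    // Roughly: SELECT greatest(3, NULL, 7), least(3, NULL, 7), greatest(NULL, NULL)
    println!("{:?}", greatest(&[Some(3), None, Some(7)])); // Some(7)
    println!("{:?}", least(&[Some(3), None, Some(7)])); // Some(3)
    println!("{:?}", greatest(&[None, None])); // None
}
```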
+<h3>FFI</h3>
+<ul>
+<li>Foreign Function Interface work has started. This should allow for
+ <a href="https://github.com/apache/datafusion/pull/12920">using table
providers</a> across languages and versions of DataFusion. This
+ is especially pertinent for integration with <a
href="https://delta-io.github.io/delta-rs/">delta-rs</a> and other
table formats.</li>
+</ul>
+<h3>Materialized Views</h3>
+<ul>
+<li><a href="https://github.com/suremarc">@suremarc</a> has
added a <a
href="https://github.com/datafusion-contrib/datafusion-materialized-views">materialized
view implementation</a> in datafusion-contrib 🚀</li>
+</ul>
+<h3>Substrait</h3>
+<ul>
+<li>A lot of work was put into improving and enhancing Substrait support
(<a href="https://github.com/Blizzara">@Blizzara</a>, <a
href="https://github.com/westonpace">@westonpace</a>, <a
href="https://github.com/tokoko">@tokoko</a>, <a
href="https://github.com/vbarua">@vbarua</a>, <a
href="https://github.com/LatrecheYasser">@LatrecheYasser</a>, <a
href="https://github.com/notfilippo">@notfilippo</a> and others) [...]
+</ul>
+<h2>Looking Ahead: The Next Six Months</h2>
+<p>One of the long term goals of <a
href="https://github.com/alamb">@alamb</a>, DataFusion's PMC chair,
has been to have
+<a href="https://www.influxdata.com/blog/datafusion-2025-influxdb/">1000
DataFusion based projects</a>. This may be the year that
happens!</p>
+<p>The community has been <a
href="https://github.com/apache/datafusion/issues/14580">discussing what we
will work on in the next six months</a>.
+Some major initiatives are likely to be:</p>
+<ol>
+<li><em>Performance</em>: A <a
href="https://github.com/apache/datafusion/issues/14482">number of items
have been identified</a> as areas that could use additional
work</li>
+<li><em>Memory usage</em>: Tracking and improving memory
usage, statistics and spilling to disk </li>
+<li><em><a
href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>
(GSoC)</em>: DataFusion will hopefully be selected as a project, allowing us to start
accepting and supporting student projects</li>
+<li><em>FFI</em>: Extending the FFI implementation to
support all types of UDFs and SessionContext</li>
+<li><em>Spark Functions</em>: A <a
href="https://github.com/apache/datafusion/issues/5600">proposal has been
made to add a crate</a> covering Spark-compatible built-in functions
</li>
+</ol>
+<h2>How to Get Involved</h2>
+<p>DataFusion is not a project built or driven by a single person,
company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.</p>
+<p>If you are interested in joining us, we would love to have you. You
can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests or code. A list of open issues suitable for beginners is <a
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">here</a>
and you
+can find how to reach us on the <a
href="https://datafusion.apache.org/contributor-guide/communication.html">communication
doc</a>.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.6.0
Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0"
rel="alternate"></link><published>2025-02-17T00:00:00+00:00</published><updated>2025-02-17T00:00:00+00:00</updated><author><name>pmc</name></author
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/pmc.rss.xml b/output/feeds/pmc.rss.xml
index a0ed98c..7620f23 100644
--- a/output/feeds/pmc.rss.xml
+++ b/output/feeds/pmc.rss.xml
@@ -1,5 +1,28 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion Blog -
pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
17 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.6.0
Release</title><link>https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion Blog -
pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Thu,
20 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion
45.0.0
Released</title><link>https://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0</link><description><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details
-->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion
45.0.0</a>. This blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Thu, 20
Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-20:/blog/2025/02/20/datafusion-45.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.6.0
Release</title><link>https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/images/datafusion-45.0.0/performance_over_time.png
b/output/images/datafusion-45.0.0/performance_over_time.png
new file mode 100644
index 0000000..dc899fa
Binary files /dev/null and
b/output/images/datafusion-45.0.0/performance_over_time.png differ
diff --git a/output/index.html b/output/index.html
index befc312..1085005 100644
--- a/output/index.html
+++ b/output/index.html
@@ -44,6 +44,48 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/02/20/datafusion-45.0.0">Apache DataFusion 45.0.0
Released</a></h1>
+ <p>Posted on: Thu 20 February 2025 by pmc</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/11631 for details -->
+<h2>Introduction</h2>
+<p>We are very proud to announce <a
href="https://crates.io/crates/datafusion/45.0.0">DataFusion 45.0.0</a>. This
blog highlights some of the
+many major improvements since we released <a
href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion
40.0.0</a> and a preview of
+what the community is thinking about in the next 6 months. It has been an
exciting
+period of development …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/02/20/datafusion-45.0.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
<!-- Post -->
<div class="row">
<div class="callout">