This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 6f9968b Commit build products
6f9968b is described below
commit 6f9968bc95e113ce2c11c57cb2e5c72b2e1ca434
Author: Build Pelican (action) <[email protected]>
AuthorDate: Fri Feb 6 01:47:45 2026 +0000
Commit build products
---
output/2022/02/28/datafusion-7.0.0/index.html | 2 +-
output/2023/01/19/datafusion-16.0.0/index.html | 2 +-
output/2024/01/19/datafusion-34.0.0/index.html | 2 +-
.../2024/08/20/python-datafusion-40.0.0/index.html | 2 +-
.../index.html | 4 ++--
.../datafusion-python-udf-comparisons/index.html | 8 +++----
.../2024/12/14/datafusion-python-43.1.0/index.html | 4 ++--
.../2025/03/30/datafusion-python-46.0.0/index.html | 2 +-
output/feeds/all-en.atom.xml | 26 +++++++++++-----------
output/feeds/blog.atom.xml | 26 +++++++++++-----------
output/feeds/pmc.atom.xml | 6 ++---
output/feeds/timsaucer.atom.xml | 16 ++++++-------
output/feeds/xiangpeng-hao-andrew-lamb.atom.xml | 4 ++--
13 files changed, 52 insertions(+), 52 deletions(-)
diff --git a/output/2022/02/28/datafusion-7.0.0/index.html b/output/2022/02/28/datafusion-7.0.0/index.html
index 808cc12..9067f72 100644
--- a/output/2022/02/28/datafusion-7.0.0/index.html
+++ b/output/2022/02/28/datafusion-7.0.0/index.html
@@ -125,7 +125,7 @@ git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc
<li>Switch from <code>std::sync::Mutex</code> to
<code>parking_lot::Mutex</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1720">#1720</a></li>
<li>New Features</li>
<li>Support for memory tracking and spilling to disk<ul>
-<li>MemoryMananger and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
+<li>MemoryManager and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>Out of core sort <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>New metrics</li>
<li><code>Gauge</code> and <code>CurrentMemoryUsage</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1682">#1682</a></li>
diff --git a/output/2023/01/19/datafusion-16.0.0/index.html b/output/2023/01/19/datafusion-16.0.0/index.html
index ddfa5ae..103c43f 100644
--- a/output/2023/01/19/datafusion-16.0.0/index.html
+++ b/output/2023/01/19/datafusion-16.0.0/index.html
@@ -192,7 +192,7 @@ required synchronous access to all relevant catalog
information.</p>
<li>Automatic coercions ast between Date and Timestamp <a
href="https://github.com/apache/arrow-datafusion/issues/4726">#4726</a></li>
<li>Support type coercion for timestamp and utf8 <a
href="https://github.com/apache/arrow-datafusion/issues/4312">#4312</a></li>
<li>Full support for time32 and time64 literal values
(<code>ScalarValue</code>) <a
href="https://github.com/apache/arrow-datafusion/issues/4156">#4156</a></li>
-<li>New functions, incuding <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
+<li>New functions, including <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
<li>Compressed CSV/JSON support <a
href="https://github.com/apache/arrow-datafusion/issues/3642">#3642</a></li>
</ul>
<p>The community has also invested in new <a
href="https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/README.md">sqllogic
based</a> tests to keep improving DataFusion's quality with less effort.</p>
diff --git a/output/2024/01/19/datafusion-34.0.0/index.html b/output/2024/01/19/datafusion-34.0.0/index.html
index 9740d49..25ea78d 100644
--- a/output/2024/01/19/datafusion-34.0.0/index.html
+++ b/output/2024/01/19/datafusion-34.0.0/index.html
@@ -256,7 +256,7 @@ LIMIT 3;
3 rows in set. Query took 0.053 seconds.
</code></pre>
<h3 id="growth-of-datafusion">Growth of DataFusion 📈<a class="headerlink"
href="#growth-of-datafusion" title="Permanent link">¶</a></h3>
-<p>DataFusion has been appearing more publically in the wild. For example
+<p>DataFusion has been appearing more publicly in the wild. For example
* New projects built using DataFusion such as <a
href="https://lancedb.com/">lancedb</a>, <a
href="https://glaredb.com/">GlareDB</a>, <a
href="https://www.arroyo.dev/">Arroyo</a>, and <a
href="https://github.com/cmu-db/optd">optd</a>.
* Public talks such as <a
href="https://www.youtube.com/watch?v=AJU9rdRNk9I">Apache Arrow Datafusion:
Vectorized
Execution Framework For Maximum Performance</a> in <a
href="https://www.bagevent.com/event/8432178">CommunityOverCode Asia 2023</a>
diff --git a/output/2024/08/20/python-datafusion-40.0.0/index.html b/output/2024/08/20/python-datafusion-40.0.0/index.html
index af3d973..6a0276e 100644
--- a/output/2024/08/20/python-datafusion-40.0.0/index.html
+++ b/output/2024/08/20/python-datafusion-40.0.0/index.html
@@ -105,7 +105,7 @@ to their Rust counterparts.</li>
<p>The most significant difference is that we have added wrapper functions and
classes for most of the
user facing interface. These wrappers, written in Python, contain both
documentation and type
annotations.</p>
-<p>This documenation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
+<p>This documentation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
the available functions and classes to see the breadth of available
functionality.</p>
<p>Modern IDEs use language servers such as
<a
href="https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance">Pylance</a>
or
diff --git a/output/2024/09/13/string-view-german-style-strings-part-2/index.html b/output/2024/09/13/string-view-german-style-strings-part-2/index.html
index 7712d8e..9f2ac71 100644
--- a/output/2024/09/13/string-view-german-style-strings-part-2/index.html
+++ b/output/2024/09/13/string-view-german-style-strings-part-2/index.html
@@ -107,8 +107,8 @@ Figure 1 illustrates the difference between the output of
both string representa
<h1 id="when-to-gc">When to GC?<a class="headerlink" href="#when-to-gc"
title="Permanent link">¶</a></h1>
<p>Zero-copy <code>take/filter</code> is great for generating large arrays
quickly, but it is suboptimal for highly selective filters, where most of the
strings are filtered out. When the cardinality drops, StringViewArray buffers
become sparse—only a small subset of the bytes in the buffer’s memory are
referred to by any <code>view</code>. This leads to excessive memory usage,
especially in a <a
href="https://github.com/apache/datafusion/issues/11628">filter-then-coalesce
scenario</a>. [...]
<p>To release unused memory, we implemented a <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.gc">garbage
collection (GC)</a> routine to consolidate the data into a new buffer to
release the old sparse buffer(s). As the GC operation copies strings, similarly
to StringArray, we must be careful about when to call it. If we call GC too
early, we cause unnecessary copying, losing much of the benefit of
StringViewArray. If we call GC too late, we hold [...]
-<p><code>arrow-rs</code> implements the GC process, but it is up to users to
decide when to call it. We leverage the semantics of the query engine and
observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalseceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, which aligns perfectly with the
scenario of G [...]
-We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside <code>CoalseceBatchesExec</code>[^5]
with a heuristic that estimates when the buffers are too sparse.</p>
+<p><code>arrow-rs</code> implements the GC process, but it is up to users to
decide when to call it. We leverage the semantics of the query engine and
observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalesceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, which aligns perfectly with the
scenario of G [...]
+We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside <code>CoalesceBatchesExec</code>[^5]
with a heuristic that estimates when the buffers are too sparse.</p>
<h2 id="the-art-of-function-inlining-not-too-much-not-too-little">The art of
function inlining: not too much, not too little<a class="headerlink"
href="#the-art-of-function-inlining-not-too-much-not-too-little"
title="Permanent link">¶</a></h2>
<p>Like string inlining, <em>function</em> inlining is the process of
embedding a short function into the caller to avoid the overhead of function
calls (caller/callee save).
Usually, the Rust compiler does a good job of deciding when to inline.
However, it is possible to override its default using the <a
href="https://doc.rust-lang.org/reference/attributes/codegen.html#the-inline-attribute"><code>#[inline(always)]</code>
directive</a>.
diff --git a/output/2024/11/19/datafusion-python-udf-comparisons/index.html b/output/2024/11/19/datafusion-python-udf-comparisons/index.html
index b6943f4..9a8443d 100644
--- a/output/2024/11/19/datafusion-python-udf-comparisons/index.html
+++ b/output/2024/11/19/datafusion-python-udf-comparisons/index.html
@@ -149,7 +149,7 @@ than a join can be significantly faster. This is worth
profiling for your specif
<p>I have a DataFrame with many values that I want to aggregate. I have
already analyzed it and
determined there is a noise level below which I do not want to include in my
analysis. I want to
compute a sum of only values that are above my noise threshold.</p>
-<p>This can be done fairly easy without leaning on a User Defined Aggegate
Function (UDAF). You can
+<p>This can be done fairly easy without leaning on a User Defined Aggregate
Function (UDAF). You can
simply filter the DataFrame and then aggregate using the built-in
<code>sum</code> function. Here, we
demonstrate doing this as a UDF primarily as an example of how to write UDAFs.
We will use the
PyArrow compute approach.</p>
@@ -310,7 +310,7 @@ Python, is to primarily demonstrate how to make the Python
to Rust with Python w
transition. In the second implementation you can see how we can iterate
through all of the arrays
ourselves.</p>
<p>In this first example, we are hard coding the values of interest, but in
the following section
-we demonstrate passing these in during initalization.</p>
+we demonstrate passing these in during initialization.</p>
<pre><code class="language-rust">#[pyfunction]
pub fn tuple_filter_fn(
py: Python<'_>,
@@ -533,13 +533,13 @@ how much they have ordered total. We want to ignore small
orders, which we defin
import pyarrow as pa
import pyarrow.compute as pc
-IGNORE_THESHOLD = 5000.0
+IGNORE_THRESHOLD = 5000.0
class AboveThresholdAccum(Accumulator):
def __init__(self) -> None:
self._sum = 0.0
def update(self, values: pa.Array) -> None:
- over_threshold = pc.greater(values, pa.scalar(IGNORE_THESHOLD))
+ over_threshold = pc.greater(values, pa.scalar(IGNORE_THRESHOLD))
sum_above = pc.sum(values.filter(over_threshold)).as_py()
if sum_above is None:
sum_above = 0.0
diff --git a/output/2024/12/14/datafusion-python-43.1.0/index.html b/output/2024/12/14/datafusion-python-43.1.0/index.html
index ac438a2..1d61bc6 100644
--- a/output/2024/12/14/datafusion-python-43.1.0/index.html
+++ b/output/2024/12/14/datafusion-python-43.1.0/index.html
@@ -95,7 +95,7 @@ consistent method for exposing these data structures across
libraries.</p>
<p>In <a href="https://github.com/apache/datafusion-python/pull/825">PR
#825</a>, we introduced support for both importing and exporting Arrow data in
<code>datafusion-python</code>. With this improvement, you can now use a
single function call to import
a table from <strong>any</strong> Python library that implements the <a
href="https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html">Arrow
PyCapsule Interface</a>.
-Many popular libaries, such as <a href="https://pandas.pydata.org/">Pandas</a>
and <a href="https://pola.rs/">Polars</a>
+Many popular libraries, such as <a
href="https://pandas.pydata.org/">Pandas</a> and <a
href="https://pola.rs/">Polars</a>
already support these interfaces.</p>
<p>Suppose you have a Pandas and Polars DataFrames named
<code>df_pandas</code> or <code>df_polars</code>, respectively:</p>
<pre><code class="language-python">ctx = SessionContext()
@@ -155,7 +155,7 @@ of the blog post describing how these enhancements can lead
to 20-200% performan
gains in some tests.</p>
<p>During our testing we identified some cases where we needed to adjust
workflows to
account for the fact that StringView is now the default type for string based
operations.
-First, when performing manipulations on string objects there is a perfomance
loss when
+First, when performing manipulations on string objects there is a performance
loss when
needing to cast from string to string view or vice versa. To reap the best
performance,
ideally all of your string type data will use StringView. For most users this
should be
transparent. However if you specify a schema for reading or creating data,
then you
diff --git a/output/2025/03/30/datafusion-python-46.0.0/index.html b/output/2025/03/30/datafusion-python-46.0.0/index.html
index b570e0f..7101db4 100644
--- a/output/2025/03/30/datafusion-python-46.0.0/index.html
+++ b/output/2025/03/30/datafusion-python-46.0.0/index.html
@@ -117,7 +117,7 @@ to register the view and then use it in another place:</p>
<pre><code class="language-python">ctx.register_view("view1", df1)
</code></pre>
<p>And then in another portion of your code which has access to the same
session context
-you can retrive the DataFrame with:</p>
+you can retrieve the DataFrame with:</p>
<pre><code>df2 = ctx.table("view1")
</code></pre>
<h2 id="asynchronous-iteration-of-record-batches">Asynchronous Iteration of
Record Batches<a class="headerlink"
href="#asynchronous-iteration-of-record-batches" title="Permanent
link">¶</a></h2>
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index b283d32..b42ac30 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -7056,7 +7056,7 @@ to register the view and then use it in another
place:</p>
<pre><code class="language-python">ctx.register_view("view1", df1)
</code></pre>
<p>And then in another portion of your code which has access to the same
session context
-you can retrive the DataFrame with:</p>
+you can retrieve the DataFrame with:</p>
<pre><code>df2 = ctx.table("view1")
</code></pre>
<h2 id="asynchronous-iteration-of-record-batches">Asynchronous Iteration
of Record Batches<a class="headerlink"
href="#asynchronous-iteration-of-record-batches" title="Permanent
link">¶</a></h2>
@@ -8690,7 +8690,7 @@ consistent method for exposing these data structures
across libraries.</p>
<p>In <a
href="https://github.com/apache/datafusion-python/pull/825">PR
#825</a>, we introduced support for both importing and exporting Arrow
data in
<code>datafusion-python</code>. With this improvement, you can now
use a single function call to import
a table from <strong>any</strong> Python library that implements
the <a
href="https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html">Arrow
PyCapsule Interface</a>.
-Many popular libaries, such as <a
href="https://pandas.pydata.org/">Pandas</a> and <a
href="https://pola.rs/">Polars</a>
+Many popular libraries, such as <a
href="https://pandas.pydata.org/">Pandas</a> and <a
href="https://pola.rs/">Polars</a>
already support these interfaces.</p>
<p>Suppose you have a Pandas and Polars DataFrames named
<code>df_pandas</code> or <code>df_polars</code>,
respectively:</p>
<pre><code class="language-python">ctx = SessionContext()
@@ -8750,7 +8750,7 @@ of the blog post describing how these enhancements can
lead to 20-200% performan
gains in some tests.</p>
<p>During our testing we identified some cases where we needed to adjust
workflows to
account for the fact that StringView is now the default type for string based
operations.
-First, when performing manipulations on string objects there is a perfomance
loss when
+First, when performing manipulations on string objects there is a performance
loss when
needing to cast from string to string view or vice versa. To reap the best
performance,
ideally all of your string type data will use StringView. For most users this
should be
transparent. However if you specify a schema for reading or creating data,
then you
@@ -8987,7 +8987,7 @@ than a join can be significantly faster. This is worth
profiling for your specif
<p>I have a DataFrame with many values that I want to aggregate. I have
already analyzed it and
determined there is a noise level below which I do not want to include in my
analysis. I want to
compute a sum of only values that are above my noise threshold.</p>
-<p>This can be done fairly easy without leaning on a User Defined
Aggegate Function (UDAF). You can
+<p>This can be done fairly easy without leaning on a User Defined
Aggregate Function (UDAF). You can
simply filter the DataFrame and then aggregate using the built-in
<code>sum</code> function. Here, we
demonstrate doing this as a UDF primarily as an example of how to write UDAFs.
We will use the
PyArrow compute approach.</p>
@@ -9148,7 +9148,7 @@ Python, is to primarily demonstrate how to make the
Python to Rust with Python w
transition. In the second implementation you can see how we can iterate
through all of the arrays
ourselves.</p>
<p>In this first example, we are hard coding the values of interest, but
in the following section
-we demonstrate passing these in during initalization.</p>
+we demonstrate passing these in during initialization.</p>
<pre><code class="language-rust">#[pyfunction]
pub fn tuple_filter_fn(
py: Python&lt;'_&gt;,
@@ -9371,13 +9371,13 @@ how much they have ordered total. We want to ignore
small orders, which we defin
import pyarrow as pa
import pyarrow.compute as pc
-IGNORE_THESHOLD = 5000.0
+IGNORE_THRESHOLD = 5000.0
class AboveThresholdAccum(Accumulator):
def __init__(self) -&gt; None:
self._sum = 0.0
def update(self, values: pa.Array) -&gt; None:
- over_threshold = pc.greater(values, pa.scalar(IGNORE_THESHOLD))
+ over_threshold = pc.greater(values, pa.scalar(IGNORE_THRESHOLD))
sum_above = pc.sum(values.filter(over_threshold)).as_py()
if sum_above is None:
sum_above = 0.0
@@ -9952,8 +9952,8 @@ Figure 1 illustrates the difference between the output of
both string representa
<h1 id="when-to-gc">When to GC?<a class="headerlink"
href="#when-to-gc" title="Permanent link">¶</a></h1>
<p>Zero-copy <code>take/filter</code> is great for
generating large arrays quickly, but it is suboptimal for highly selective
filters, where most of the strings are filtered out. When the cardinality
drops, StringViewArray buffers become sparse—only a small subset of the bytes
in the buffer’s memory are referred to by any <code>view</code>.
This leads to excessive memory usage, especially in a <a
href="https://github.com/apache/datafusion/issues/11628"> [...]
<p>To release unused memory, we implemented a <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.gc">garbage
collection (GC)</a> routine to consolidate the data into a new buffer to
release the old sparse buffer(s). As the GC operation copies strings, similarly
to StringArray, we must be careful about when to call it. If we call GC too
early, we cause unnecessary copying, losing much of the benefit of
StringViewArray. If we call GC [...]
-<p><code>arrow-rs</code> implements the GC process, but it
is up to users to decide when to call it. We leverage the semantics of the
query engine and observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalseceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, whi [...]
-We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside
<code>CoalseceBatchesExec</code>[^5] with a heuristic that
estimates when the buffers are too sparse.</p>
+<p><code>arrow-rs</code> implements the GC process, but it
is up to users to decide when to call it. We leverage the semantics of the
query engine and observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalesceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, whi [...]
+We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside
<code>CoalesceBatchesExec</code>[^5] with a heuristic that
estimates when the buffers are too sparse.</p>
<h2 id="the-art-of-function-inlining-not-too-much-not-too-little">The
art of function inlining: not too much, not too little<a class="headerlink"
href="#the-art-of-function-inlining-not-too-much-not-too-little"
title="Permanent link">¶</a></h2>
<p>Like string inlining, <em>function</em> inlining is the
process of embedding a short function into the caller to avoid the overhead of
function calls (caller/callee save).
Usually, the Rust compiler does a good job of deciding when to inline.
However, it is possible to override its default using the <a
href="https://doc.rust-lang.org/reference/attributes/codegen.html#the-inline-attribute"><code>#[inline(always)]</code>
directive</a>.
@@ -10152,7 +10152,7 @@ to their Rust counterparts.</li>
<p>The most significant difference is that we have added wrapper
functions and classes for most of the
user facing interface. These wrappers, written in Python, contain both
documentation and type
annotations.</p>
-<p>This documenation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
+<p>This documentation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
the available functions and classes to see the breadth of available
functionality.</p>
<p>Modern IDEs use language servers such as
<a
href="https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance">Pylance</a>
or
@@ -11076,7 +11076,7 @@ LIMIT 3;
3 rows in set. Query took 0.053 seconds.
</code></pre>
<h3 id="growth-of-datafusion">Growth of DataFusion 📈<a
class="headerlink" href="#growth-of-datafusion" title="Permanent
link">¶</a></h3>
-<p>DataFusion has been appearing more publically in the wild. For example
+<p>DataFusion has been appearing more publicly in the wild. For example
* New projects built using DataFusion such as <a
href="https://lancedb.com/">lancedb</a>, <a
href="https://glaredb.com/">GlareDB</a>, <a
href="https://www.arroyo.dev/">Arroyo</a>, and <a
href="https://github.com/cmu-db/optd">optd</a>.
* Public talks such as <a
href="https://www.youtube.com/watch?v=AJU9rdRNk9I">Apache Arrow Datafusion:
Vectorized
Execution Framework For Maximum Performance</a> in <a
href="https://www.bagevent.com/event/8432178">CommunityOverCode Asia
2023</a>
@@ -11828,7 +11828,7 @@ required synchronous access to all relevant catalog
information.</p>
<li>Automatic coercions ast between Date and Timestamp <a
href="https://github.com/apache/arrow-datafusion/issues/4726">#4726</a></li>
<li>Support type coercion for timestamp and utf8 <a
href="https://github.com/apache/arrow-datafusion/issues/4312">#4312</a></li>
<li>Full support for time32 and time64 literal values
(<code>ScalarValue</code>) <a
href="https://github.com/apache/arrow-datafusion/issues/4156">#4156</a></li>
-<li>New functions, incuding <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
+<li>New functions, including <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
<li>Compressed CSV/JSON support <a
href="https://github.com/apache/arrow-datafusion/issues/3642">#3642</a></li>
</ul>
<p>The community has also invested in new <a
href="https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/README.md">sqllogic
based</a> tests to keep improving DataFusion's quality with less
effort.</p>
@@ -12680,7 +12680,7 @@ git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc
<li>Switch from <code>std::sync::Mutex</code> to
<code>parking_lot::Mutex</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1720">#1720</a></li>
<li>New Features</li>
<li>Support for memory tracking and spilling to disk<ul>
-<li>MemoryMananger and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
+<li>MemoryManager and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>Out of core sort <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>New metrics</li>
<li><code>Gauge</code> and
<code>CurrentMemoryUsage</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1682">#1682</a></li>
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index aeeec57..df7fbb2 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -7056,7 +7056,7 @@ to register the view and then use it in another
place:</p>
<pre><code class="language-python">ctx.register_view("view1", df1)
</code></pre>
<p>And then in another portion of your code which has access to the same
session context
-you can retrive the DataFrame with:</p>
+you can retrieve the DataFrame with:</p>
<pre><code>df2 = ctx.table("view1")
</code></pre>
<h2 id="asynchronous-iteration-of-record-batches">Asynchronous Iteration
of Record Batches<a class="headerlink"
href="#asynchronous-iteration-of-record-batches" title="Permanent
link">¶</a></h2>
@@ -8690,7 +8690,7 @@ consistent method for exposing these data structures
across libraries.</p>
<p>In <a
href="https://github.com/apache/datafusion-python/pull/825">PR
#825</a>, we introduced support for both importing and exporting Arrow
data in
<code>datafusion-python</code>. With this improvement, you can now
use a single function call to import
a table from <strong>any</strong> Python library that implements
the <a
href="https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html">Arrow
PyCapsule Interface</a>.
-Many popular libaries, such as <a
href="https://pandas.pydata.org/">Pandas</a> and <a
href="https://pola.rs/">Polars</a>
+Many popular libraries, such as <a
href="https://pandas.pydata.org/">Pandas</a> and <a
href="https://pola.rs/">Polars</a>
already support these interfaces.</p>
<p>Suppose you have a Pandas and Polars DataFrames named
<code>df_pandas</code> or <code>df_polars</code>,
respectively:</p>
<pre><code class="language-python">ctx = SessionContext()
@@ -8750,7 +8750,7 @@ of the blog post describing how these enhancements can
lead to 20-200% performan
gains in some tests.</p>
<p>During our testing we identified some cases where we needed to adjust
workflows to
account for the fact that StringView is now the default type for string based
operations.
-First, when performing manipulations on string objects there is a perfomance
loss when
+First, when performing manipulations on string objects there is a performance
loss when
needing to cast from string to string view or vice versa. To reap the best
performance,
ideally all of your string type data will use StringView. For most users this
should be
transparent. However if you specify a schema for reading or creating data,
then you
@@ -8987,7 +8987,7 @@ than a join can be significantly faster. This is worth
profiling for your specif
<p>I have a DataFrame with many values that I want to aggregate. I have
already analyzed it and
determined there is a noise level below which I do not want to include in my
analysis. I want to
compute a sum of only values that are above my noise threshold.</p>
-<p>This can be done fairly easy without leaning on a User Defined
Aggegate Function (UDAF). You can
+<p>This can be done fairly easy without leaning on a User Defined
Aggregate Function (UDAF). You can
simply filter the DataFrame and then aggregate using the built-in
<code>sum</code> function. Here, we
demonstrate doing this as a UDF primarily as an example of how to write UDAFs.
We will use the
PyArrow compute approach.</p>
@@ -9148,7 +9148,7 @@ Python, is to primarily demonstrate how to make the
Python to Rust with Python w
transition. In the second implementation you can see how we can iterate
through all of the arrays
ourselves.</p>
<p>In this first example, we are hard coding the values of interest, but
in the following section
-we demonstrate passing these in during initalization.</p>
+we demonstrate passing these in during initialization.</p>
<pre><code class="language-rust">#[pyfunction]
pub fn tuple_filter_fn(
py: Python&lt;'_&gt;,
@@ -9371,13 +9371,13 @@ how much they have ordered total. We want to ignore
small orders, which we defin
import pyarrow as pa
import pyarrow.compute as pc
-IGNORE_THESHOLD = 5000.0
+IGNORE_THRESHOLD = 5000.0
class AboveThresholdAccum(Accumulator):
def __init__(self) -&gt; None:
self._sum = 0.0
def update(self, values: pa.Array) -&gt; None:
- over_threshold = pc.greater(values, pa.scalar(IGNORE_THESHOLD))
+ over_threshold = pc.greater(values, pa.scalar(IGNORE_THRESHOLD))
sum_above = pc.sum(values.filter(over_threshold)).as_py()
if sum_above is None:
sum_above = 0.0
@@ -9952,8 +9952,8 @@ Figure 1 illustrates the difference between the output of
both string representa
<h1 id="when-to-gc">When to GC?<a class="headerlink"
href="#when-to-gc" title="Permanent link">¶</a></h1>
<p>Zero-copy <code>take/filter</code> is great for
generating large arrays quickly, but it is suboptimal for highly selective
filters, where most of the strings are filtered out. When the cardinality
drops, StringViewArray buffers become sparse—only a small subset of the bytes
in the buffer’s memory are referred to by any <code>view</code>.
This leads to excessive memory usage, especially in a <a
href="https://github.com/apache/datafusion/issues/11628"> [...]
<p>To release unused memory, we implemented a <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.gc">garbage
collection (GC)</a> routine to consolidate the data into a new buffer to
release the old sparse buffer(s). As the GC operation copies strings, similarly
to StringArray, we must be careful about when to call it. If we call GC too
early, we cause unnecessary copying, losing much of the benefit of
StringViewArray. If we call GC [...]
-<p><code>arrow-rs</code> implements the GC process, but it
is up to users to decide when to call it. We leverage the semantics of the
query engine and observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalseceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, whi [...]
-We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside
<code>CoalseceBatchesExec</code>[^5] with a heuristic that
estimates when the buffers are too sparse.</p>
+<p><code>arrow-rs</code> implements the GC process, but it
is up to users to decide when to call it. We leverage the semantics of the
query engine and observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalesceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, whi [...]
+We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside
<code>CoalesceBatchesExec</code>[^5] with a heuristic that
estimates when the buffers are too sparse.</p>
<h2 id="the-art-of-function-inlining-not-too-much-not-too-little">The
art of function inlining: not too much, not too little<a class="headerlink"
href="#the-art-of-function-inlining-not-too-much-not-too-little"
title="Permanent link">¶</a></h2>
<p>Like string inlining, <em>function</em> inlining is the
process of embedding a short function into the caller to avoid the overhead of
function calls (caller/callee save).
Usually, the Rust compiler does a good job of deciding when to inline.
However, it is possible to override its default using the <a
href="https://doc.rust-lang.org/reference/attributes/codegen.html#the-inline-attribute"><code>#[inline(always)]</code>
directive</a>.
@@ -10152,7 +10152,7 @@ to their Rust counterparts.</li>
<p>The most significant difference is that we have added wrapper
functions and classes for most of the
user facing interface. These wrappers, written in Python, contain both
documentation and type
annotations.</p>
-<p>This documenation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
+<p>This documentation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
the available functions and classes to see the breadth of available
functionality.</p>
<p>Modern IDEs use language servers such as
<a
href="https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance">Pylance</a>
or
@@ -11076,7 +11076,7 @@ LIMIT 3;
3 rows in set. Query took 0.053 seconds.
</code></pre>
<h3 id="growth-of-datafusion">Growth of DataFusion 📈<a
class="headerlink" href="#growth-of-datafusion" title="Permanent
link">¶</a></h3>
-<p>DataFusion has been appearing more publically in the wild. For example
+<p>DataFusion has been appearing more publicly in the wild. For example
* New projects built using DataFusion such as <a
href="https://lancedb.com/">lancedb</a>, <a
href="https://glaredb.com/">GlareDB</a>, <a
href="https://www.arroyo.dev/">Arroyo</a>, and <a
href="https://github.com/cmu-db/optd">optd</a>.
* Public talks such as <a
href="https://www.youtube.com/watch?v=AJU9rdRNk9I">Apache Arrow Datafusion:
Vectorized
Execution Framework For Maximum Performance</a> in <a
href="https://www.bagevent.com/event/8432178">CommunityOverCode Asia
2023</a>
@@ -11828,7 +11828,7 @@ required synchronous access to all relevant catalog
information.</p>
<li>Automatic coercions ast between Date and Timestamp <a
href="https://github.com/apache/arrow-datafusion/issues/4726">#4726</a></li>
<li>Support type coercion for timestamp and utf8 <a
href="https://github.com/apache/arrow-datafusion/issues/4312">#4312</a></li>
<li>Full support for time32 and time64 literal values
(<code>ScalarValue</code>) <a
href="https://github.com/apache/arrow-datafusion/issues/4156">#4156</a></li>
-<li>New functions, incuding <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
+<li>New functions, including <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
<li>Compressed CSV/JSON support <a
href="https://github.com/apache/arrow-datafusion/issues/3642">#3642</a></li>
</ul>
<p>The community has also invested in new <a
href="https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/README.md">sqllogic
based</a> tests to keep improving DataFusion's quality with less
effort.</p>
@@ -12680,7 +12680,7 @@ git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc
<li>Switch from <code>std::sync::Mutex</code> to
<code>parking_lot::Mutex</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1720">#1720</a></li>
<li>New Features</li>
<li>Support for memory tracking and spilling to disk<ul>
-<li>MemoryMananger and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
+<li>MemoryManager and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>Out of core sort <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>New metrics</li>
<li><code>Gauge</code> and
<code>CurrentMemoryUsage</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1682">#1682</a></li>
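As an aside for readers skimming this diff: the `AboveThresholdAccum` hunk above shows a filtered-sum accumulator built on `pyarrow.compute`. The same logic can be sketched dependency-free in plain Python; the class and constant names mirror the diff (using the corrected `IGNORE_THRESHOLD` spelling), but this standalone version is illustrative only, with plain lists standing in for `pa.Array`:

```python
# Illustrative, dependency-free sketch of the filtered-sum accumulator
# shown in the hunk above. The real version uses pyarrow.compute
# (pc.greater + Array.filter + pc.sum); plain lists stand in here so
# the accumulation logic is easy to follow.

IGNORE_THRESHOLD = 5000.0  # the constant the commit renames (THESHOLD -> THRESHOLD)


class AboveThresholdAccum:
    """Accumulates the sum of values strictly greater than the threshold."""

    def __init__(self) -> None:
        self._sum = 0.0

    def update(self, values) -> None:
        # Equivalent of: pc.sum(values.filter(pc.greater(values, threshold)))
        self._sum += sum(v for v in values if v > IGNORE_THRESHOLD)

    def evaluate(self) -> float:
        return self._sum
```

For example, updating with `[100.0, 6000.0, 7500.0]` ignores the small order and accumulates 13500.0, matching what the pyarrow version computes per batch.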
diff --git a/output/feeds/pmc.atom.xml b/output/feeds/pmc.atom.xml
index 498ee90..470aa0e 100644
--- a/output/feeds/pmc.atom.xml
+++ b/output/feeds/pmc.atom.xml
@@ -3840,7 +3840,7 @@ LIMIT 3;
3 rows in set. Query took 0.053 seconds.
</code></pre>
<h3 id="growth-of-datafusion">Growth of DataFusion 📈<a
class="headerlink" href="#growth-of-datafusion" title="Permanent
link">¶</a></h3>
-<p>DataFusion has been appearing more publically in the wild. For example
+<p>DataFusion has been appearing more publicly in the wild. For example
* New projects built using DataFusion such as <a
href="https://lancedb.com/">lancedb</a>, <a
href="https://glaredb.com/">GlareDB</a>, <a
href="https://www.arroyo.dev/">Arroyo</a>, and <a
href="https://github.com/cmu-db/optd">optd</a>.
* Public talks such as <a
href="https://www.youtube.com/watch?v=AJU9rdRNk9I">Apache Arrow Datafusion:
Vectorized
Execution Framework For Maximum Performance</a> in <a
href="https://www.bagevent.com/event/8432178">CommunityOverCode Asia
2023</a>
@@ -4269,7 +4269,7 @@ required synchronous access to all relevant catalog
information.</p>
<li>Automatic coercions ast between Date and Timestamp <a
href="https://github.com/apache/arrow-datafusion/issues/4726">#4726</a></li>
<li>Support type coercion for timestamp and utf8 <a
href="https://github.com/apache/arrow-datafusion/issues/4312">#4312</a></li>
<li>Full support for time32 and time64 literal values
(<code>ScalarValue</code>) <a
href="https://github.com/apache/arrow-datafusion/issues/4156">#4156</a></li>
-<li>New functions, incuding <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
+<li>New functions, including <code>uuid()</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4041">#4041</a>,
<code>current_time</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4054">#4054</a>,
<code>current_date</code> <a
href="https://github.com/apache/arrow-datafusion/issues/4022">#4022</a></li>
<li>Compressed CSV/JSON support <a
href="https://github.com/apache/arrow-datafusion/issues/3642">#3642</a></li>
</ul>
<p>The community has also invested in new <a
href="https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/README.md">sqllogic
based</a> tests to keep improving DataFusion's quality with less
effort.</p>
@@ -5121,7 +5121,7 @@ git shortlog -sn 5.0.0..6.0.0 datafusion datafusion-cli
datafusion-examples | wc
<li>Switch from <code>std::sync::Mutex</code> to
<code>parking_lot::Mutex</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1720">#1720</a></li>
<li>New Features</li>
<li>Support for memory tracking and spilling to disk<ul>
-<li>MemoryMananger and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
+<li>MemoryManager and DiskManager <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>Out of core sort <a
href="https://github.com/apache/arrow-datafusion/pull/1526">#1526</a></li>
<li>New metrics</li>
<li><code>Gauge</code> and
<code>CurrentMemoryUsage</code> <a
href="https://github.com/apache/arrow-datafusion/pull/1682">#1682</a></li>
diff --git a/output/feeds/timsaucer.atom.xml b/output/feeds/timsaucer.atom.xml
index dab474a..268635c 100644
--- a/output/feeds/timsaucer.atom.xml
+++ b/output/feeds/timsaucer.atom.xml
@@ -75,7 +75,7 @@ to register the view and then use it in another
place:</p>
<pre><code class="language-python">ctx.register_view("view1", df1)
</code></pre>
<p>And then in another portion of your code which has access to the same
session context
-you can retrive the DataFrame with:</p>
+you can retrieve the DataFrame with:</p>
<pre><code>df2 = ctx.table("view1")
</code></pre>
<h2 id="asynchronous-iteration-of-record-batches">Asynchronous Iteration
of Record Batches<a class="headerlink"
href="#asynchronous-iteration-of-record-batches" title="Permanent
link">¶</a></h2>
@@ -275,7 +275,7 @@ consistent method for exposing these data structures across
libraries.</p>
<p>In <a
href="https://github.com/apache/datafusion-python/pull/825">PR
#825</a>, we introduced support for both importing and exporting Arrow
data in
<code>datafusion-python</code>. With this improvement, you can now
use a single function call to import
a table from <strong>any</strong> Python library that implements
the <a
href="https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html">Arrow
PyCapsule Interface</a>.
-Many popular libaries, such as <a
href="https://pandas.pydata.org/">Pandas</a> and <a
href="https://pola.rs/">Polars</a>
+Many popular libraries, such as <a
href="https://pandas.pydata.org/">Pandas</a> and <a
href="https://pola.rs/">Polars</a>
already support these interfaces.</p>
<p>Suppose you have a Pandas and Polars DataFrames named
<code>df_pandas</code> or <code>df_polars</code>,
respectively:</p>
<pre><code class="language-python">ctx = SessionContext()
@@ -335,7 +335,7 @@ of the blog post describing how these enhancements can lead
to 20-200% performan
gains in some tests.</p>
<p>During our testing we identified some cases where we needed to adjust
workflows to
account for the fact that StringView is now the default type for string based
operations.
-First, when performing manipulations on string objects there is a perfomance
loss when
+First, when performing manipulations on string objects there is a performance
loss when
needing to cast from string to string view or vice versa. To reap the best
performance,
ideally all of your string type data will use StringView. For most users this
should be
transparent. However if you specify a schema for reading or creating data,
then you
@@ -472,7 +472,7 @@ than a join can be significantly faster. This is worth
profiling for your specif
<p>I have a DataFrame with many values that I want to aggregate. I have
already analyzed it and
determined there is a noise level below which I do not want to include in my
analysis. I want to
compute a sum of only values that are above my noise threshold.</p>
-<p>This can be done fairly easy without leaning on a User Defined
Aggegate Function (UDAF). You can
+<p>This can be done fairly easy without leaning on a User Defined
Aggregate Function (UDAF). You can
simply filter the DataFrame and then aggregate using the built-in
<code>sum</code> function. Here, we
demonstrate doing this as a UDF primarily as an example of how to write UDAFs.
We will use the
PyArrow compute approach.</p>
@@ -633,7 +633,7 @@ Python, is to primarily demonstrate how to make the Python
to Rust with Python w
transition. In the second implementation you can see how we can iterate
through all of the arrays
ourselves.</p>
<p>In this first example, we are hard coding the values of interest, but
in the following section
-we demonstrate passing these in during initalization.</p>
+we demonstrate passing these in during initialization.</p>
<pre><code class="language-rust">#[pyfunction]
pub fn tuple_filter_fn(
py: Python&lt;'_&gt;,
@@ -856,13 +856,13 @@ how much they have ordered total. We want to ignore small
orders, which we defin
import pyarrow as pa
import pyarrow.compute as pc
-IGNORE_THESHOLD = 5000.0
+IGNORE_THRESHOLD = 5000.0
class AboveThresholdAccum(Accumulator):
def __init__(self) -&gt; None:
self._sum = 0.0
def update(self, values: pa.Array) -&gt; None:
- over_threshold = pc.greater(values, pa.scalar(IGNORE_THESHOLD))
+ over_threshold = pc.greater(values, pa.scalar(IGNORE_THRESHOLD))
sum_above = pc.sum(values.filter(over_threshold)).as_py()
if sum_above is None:
sum_above = 0.0
@@ -996,7 +996,7 @@ to their Rust counterparts.</li>
<p>The most significant difference is that we have added wrapper
functions and classes for most of the
user facing interface. These wrappers, written in Python, contain both
documentation and type
annotations.</p>
-<p>This documenation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
+<p>This documentation is now available on the <a
href="https://datafusion.apache.org/python/autoapi/datafusion/index.html">DataFusion
in Python API</a> website. There you can browse
the available functions and classes to see the breadth of available
functionality.</p>
<p>Modern IDEs use language servers such as
<a
href="https://marketplace.visualstudio.com/items?itemName=ms-python.vscode-pylance">Pylance</a>
or
diff --git a/output/feeds/xiangpeng-hao-andrew-lamb.atom.xml
b/output/feeds/xiangpeng-hao-andrew-lamb.atom.xml
index 155166a..eba0839 100644
--- a/output/feeds/xiangpeng-hao-andrew-lamb.atom.xml
+++ b/output/feeds/xiangpeng-hao-andrew-lamb.atom.xml
@@ -193,8 +193,8 @@ Figure 1 illustrates the difference between the output of
both string representa
<h1 id="when-to-gc">When to GC?<a class="headerlink"
href="#when-to-gc" title="Permanent link">¶</a></h1>
<p>Zero-copy <code>take/filter</code> is great for
generating large arrays quickly, but it is suboptimal for highly selective
filters, where most of the strings are filtered out. When the cardinality
drops, StringViewArray buffers become sparse—only a small subset of the bytes
in the buffer’s memory are referred to by any <code>view</code>.
This leads to excessive memory usage, especially in a <a
href="https://github.com/apache/datafusion/issues/11628"> [...]
<p>To release unused memory, we implemented a <a
href="https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.gc">garbage
collection (GC)</a> routine to consolidate the data into a new buffer to
release the old sparse buffer(s). As the GC operation copies strings, similarly
to StringArray, we must be careful about when to call it. If we call GC too
early, we cause unnecessary copying, losing much of the benefit of
StringViewArray. If we call GC [...]
-<p><code>arrow-rs</code> implements the GC process, but it
is up to users to decide when to call it. We leverage the semantics of the
query engine and observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalseceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, whi [...]
-We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside
<code>CoalseceBatchesExec</code>[^5] with a heuristic that
estimates when the buffers are too sparse.</p>
+<p><code>arrow-rs</code> implements the GC process, but it
is up to users to decide when to call it. We leverage the semantics of the
query engine and observed that the <a
href="https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html"><code>CoalesceBatchesExec</code></a>
operator, which merge smaller batches to a larger batch, is often used after
the record cardinality is expected to shrink, whi [...]
+We, therefore,<a href="https://github.com/apache/datafusion/pull/11587">
implemented the GC procedure</a> inside
<code>CoalesceBatchesExec</code>[^5] with a heuristic that
estimates when the buffers are too sparse.</p>
<h2 id="the-art-of-function-inlining-not-too-much-not-too-little">The
art of function inlining: not too much, not too little<a class="headerlink"
href="#the-art-of-function-inlining-not-too-much-not-too-little"
title="Permanent link">¶</a></h2>
<p>Like string inlining, <em>function</em> inlining is the
process of embedding a short function into the caller to avoid the overhead of
function calls (caller/callee save).
Usually, the Rust compiler does a good job of deciding when to inline.
However, it is possible to override its default using the <a
href="https://doc.rust-lang.org/reference/attributes/codegen.html#the-inline-attribute"><code>#[inline(always)]</code>
directive</a>.
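For readers following the StringViewArray GC discussion in the hunks above: the "buffers are too sparse" decision can be illustrated with a small dependency-free sketch. The function name and the cutoff value below are hypothetical, chosen only to show the shape of the heuristic; the actual logic lives in arrow-rs (`GenericByteViewArray::gc`) and DataFusion's `CoalesceBatchesExec`:

```python
def should_gc(referenced_bytes: int, buffer_bytes: int,
              threshold: float = 0.5) -> bool:
    """Hypothetical sparsity check: trigger GC when the views reference
    less than `threshold` of the backing buffer's bytes.

    referenced_bytes: bytes in the buffer still pointed to by any view.
    buffer_bytes: total size of the backing buffer(s).
    """
    if buffer_bytes == 0:
        return False  # nothing to reclaim
    return referenced_bytes / buffer_bytes < threshold
```

After a highly selective filter, most buffer bytes are no longer referenced by any view, so the ratio drops below the cutoff and a GC pass would consolidate the survivors into a fresh, dense buffer; calling it earlier would just copy strings for no benefit.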
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]