This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e0f0b4c Commit build products
e0f0b4c is described below
commit e0f0b4c2ef77f6de159930b7f103c6a42c9921ec
Author: Build Pelican (action) <[email protected]>
AuthorDate: Thu Dec 18 12:03:12 2025 +0000
Commit build products
---
output/2025/12/15/avoid-consecutive-repartitions/index.html | 6 +++---
output/author/gene-bordegaray.html | 2 +-
output/category/blog.html | 2 +-
output/feed.xml | 2 +-
output/feeds/all-en.atom.xml | 4 ++--
output/feeds/blog.atom.xml | 4 ++--
output/feeds/gene-bordegaray.atom.xml | 4 ++--
output/feeds/gene-bordegaray.rss.xml | 2 +-
output/index.html | 2 +-
9 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/output/2025/12/15/avoid-consecutive-repartitions/index.html
b/output/2025/12/15/avoid-consecutive-repartitions/index.html
index db09d48..951b523 100644
--- a/output/2025/12/15/avoid-consecutive-repartitions/index.html
+++ b/output/2025/12/15/avoid-consecutive-repartitions/index.html
@@ -4,7 +4,7 @@
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>Optimizing Repartitions in DataFusion: How I Went From Database
Nood to Core Contribution - Apache DataFusion Blog</title>
+ <title>Optimizing Repartitions in DataFusion: How I Went From Database
Noob to Core Contribution - Apache DataFusion Blog</title>
<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
<link href="/blog/css/headerlink.css" rel="stylesheet">
@@ -40,7 +40,7 @@
<div class="row justify-content-center">
<div class="col-12 col-md-8 main-content">
<h1>
- Optimizing Repartitions in DataFusion: How I Went From Database Nood
to Core Contribution
+ Optimizing Repartitions in DataFusion: How I Went From Database Noob
to Core Contribution
</h1>
<p>Posted on: Mon 15 December 2025 by Gene Bordegaray</p>
@@ -154,7 +154,7 @@ Hash repartitioning is useful when working with grouped
data. Imagine you have a
<hr/>
<h2 id="why-dont-we-want-consecutive-repartitions"><strong>Why Don’t We
Want Consecutive Repartitions?</strong><a class="headerlink"
href="#why-dont-we-want-consecutive-repartitions" title="Permanent
link">¶</a></h2>
<p>Repartitions would appear back-to-back in query plans, specifically a
round-robin followed by a hash repartition.</p>
-<p>Why is this such a big deal? Well, repartitions do not process the data;
their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches, the [...]
+<p>Why is this such a big deal? Well, repartitions do not process the data;
their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches, the [...]
<div class="text-center">
<img alt="Consecutive Repartition Query Plan With Data" class="img-responsive"
src="/blog/images/avoid-consecutive-repartitions/in_depth_before_query_plan.png"
width="65%"/>
</div>
diff --git a/output/author/gene-bordegaray.html
b/output/author/gene-bordegaray.html
index 8a76aa6..b1ac65e 100644
--- a/output/author/gene-bordegaray.html
+++ b/output/author/gene-bordegaray.html
@@ -21,7 +21,7 @@
<ol id="post-list">
<li><article class="hentry">
- <header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions"
rel="bookmark" title="Permalink to Optimizing Repartitions in DataFusion: How
I Went From Database Nood to Core Contribution">Optimizing Repartitions in
DataFusion: How I Went From Database Nood to Core Contribution</a></h2>
</header>
+ <header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions"
rel="bookmark" title="Permalink to Optimizing Repartitions in DataFusion: How
I Went From Database Noob to Core Contribution">Optimizing Repartitions in
DataFusion: How I Went From Database Noob to Core Contribution</a></h2>
</header>
<footer class="post-info">
<time class="published"
datetime="2025-12-15T00:00:00+00:00"> Mon 15 December 2025 </time>
<address class="vcard author">By
diff --git a/output/category/blog.html b/output/category/blog.html
index 47de167..c027263 100644
--- a/output/category/blog.html
+++ b/output/category/blog.html
@@ -22,7 +22,7 @@
<ol id="post-list">
<li><article class="hentry">
- <header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions"
rel="bookmark" title="Permalink to Optimizing Repartitions in DataFusion: How
I Went From Database Nood to Core Contribution">Optimizing Repartitions in
DataFusion: How I Went From Database Nood to Core Contribution</a></h2>
</header>
+ <header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions"
rel="bookmark" title="Permalink to Optimizing Repartitions in DataFusion: How
I Went From Database Noob to Core Contribution">Optimizing Repartitions in
DataFusion: How I Went From Database Noob to Core Contribution</a></h2>
</header>
<footer class="post-info">
<time class="published"
datetime="2025-12-15T00:00:00+00:00"> Mon 15 December 2025 </time>
<address class="vcard author">By
diff --git a/output/feed.xml b/output/feed.xml
index b2797fe..d95ba53 100644
--- a/output/feed.xml
+++ b/output/feed.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
15 Dec 2025 00:00:00 +0000</lastBuildDate><item><title>Optimizing Repartitions
in DataFusion: How I Went From Database Nood to Core
Contribution</title><link>https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
15 Dec 2025 00:00:00 +0000</lastBuildDate><item><title>Optimizing Repartitions
in DataFusion: How I Went From Database Noob to Core
Contribution</title><link>https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index 7e1b898..1182ba7 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-12-15T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Optimizing
Repartitions in DataFusion: How I Went From Database Nood to Core
Contribution</title><link href="https://datafusion.ap [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-12-15T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Optimizing
Repartitions in DataFusion: How I Went From Database Noob to Core
Contribution</title><link href="https://datafusion.ap [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -129,7 +129,7 @@ Hash repartitioning is useful when working with grouped
data. Imagine you have a
<hr/>
<h2 id="why-dont-we-want-consecutive-repartitions"><strong>Why
Don&rsquo;t We Want Consecutive Repartitions?</strong><a
class="headerlink" href="#why-dont-we-want-consecutive-repartitions"
title="Permanent link">&para;</a></h2>
<p>Repartitions would appear back-to-back in query plans, specifically a
round-robin followed by a hash repartition.</p>
-<p>Why is this such a big deal? Well, repartitions do not process the
data; their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches [...]
+<p>Why is this such a big deal? Well, repartitions do not process the
data; their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches [...]
<div class="text-center">
<img alt="Consecutive Repartition Query Plan With Data"
class="img-responsive"
src="/blog/images/avoid-consecutive-repartitions/in_depth_before_query_plan.png"
width="65%"/>
</div>
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index fa53f1a..6154174 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-12-15T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Optimizing
Repartitions in DataFusion: How I Went From Database Nood to Core
Contribution</title><link href="https://datafusi [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-12-15T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Optimizing
Repartitions in DataFusion: How I Went From Database Noob to Core
Contribution</title><link href="https://datafusi [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -129,7 +129,7 @@ Hash repartitioning is useful when working with grouped
data. Imagine you have a
<hr/>
<h2 id="why-dont-we-want-consecutive-repartitions"><strong>Why
Don&rsquo;t We Want Consecutive Repartitions?</strong><a
class="headerlink" href="#why-dont-we-want-consecutive-repartitions"
title="Permanent link">&para;</a></h2>
<p>Repartitions would appear back-to-back in query plans, specifically a
round-robin followed by a hash repartition.</p>
-<p>Why is this such a big deal? Well, repartitions do not process the
data; their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches [...]
+<p>Why is this such a big deal? Well, repartitions do not process the
data; their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches [...]
<div class="text-center">
<img alt="Consecutive Repartition Query Plan With Data"
class="img-responsive"
src="/blog/images/avoid-consecutive-repartitions/in_depth_before_query_plan.png"
width="65%"/>
</div>
diff --git a/output/feeds/gene-bordegaray.atom.xml
b/output/feeds/gene-bordegaray.atom.xml
index 037fd47..b07c39c 100644
--- a/output/feeds/gene-bordegaray.atom.xml
+++ b/output/feeds/gene-bordegaray.atom.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - Gene
Bordegaray</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/gene-bordegaray.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-12-15T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Optimizing
Repartitions in DataFusion: How I Went From Database Nood to Core
Contribution</title><link [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - Gene
Bordegaray</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/gene-bordegaray.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-12-15T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Optimizing
Repartitions in DataFusion: How I Went From Database Noob to Core
Contribution</title><link [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -129,7 +129,7 @@ Hash repartitioning is useful when working with grouped
data. Imagine you have a
<hr/>
<h2 id="why-dont-we-want-consecutive-repartitions"><strong>Why
Don&rsquo;t We Want Consecutive Repartitions?</strong><a
class="headerlink" href="#why-dont-we-want-consecutive-repartitions"
title="Permanent link">&para;</a></h2>
<p>Repartitions would appear back-to-back in query plans, specifically a
round-robin followed by a hash repartition.</p>
-<p>Why is this such a big deal? Well, repartitions do not process the
data; their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches [...]
+<p>Why is this such a big deal? Well, repartitions do not process the
data; their purpose is to redistribute it in ways that enable more efficient
computation for other operators. Having consecutive repartitions is
counterintuitive because we are redistributing data, then immediately
redistributing it again, making the first repartition pointless. While this
didn't create extreme overhead for queries, since round-robin repartitioning
does not copy data, just the pointers to batches [...]
<div class="text-center">
<img alt="Consecutive Repartition Query Plan With Data"
class="img-responsive"
src="/blog/images/avoid-consecutive-repartitions/in_depth_before_query_plan.png"
width="65%"/>
</div>
diff --git a/output/feeds/gene-bordegaray.rss.xml
b/output/feeds/gene-bordegaray.rss.xml
index 6e80a8b..6a87e03 100644
--- a/output/feeds/gene-bordegaray.rss.xml
+++ b/output/feeds/gene-bordegaray.rss.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion Blog - Gene
Bordegaray</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
15 Dec 2025 00:00:00 +0000</lastBuildDate><item><title>Optimizing Repartitions
in DataFusion: How I Went From Database Nood to Core
Contribution</title><link>https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion Blog - Gene
Bordegaray</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
15 Dec 2025 00:00:00 +0000</lastBuildDate><item><title>Optimizing Repartitions
in DataFusion: How I Went From Database Noob to Core
Contribution</title><link>https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/index.html b/output/index.html
index 17fa6f1..19481fa 100644
--- a/output/index.html
+++ b/output/index.html
@@ -51,7 +51,7 @@
<article class="post">
<header>
<div class="title">
- <h1><a
href="/blog/2025/12/15/avoid-consecutive-repartitions">Optimizing Repartitions
in DataFusion: How I Went From Database Nood to Core Contribution</a></h1>
+ <h1><a
href="/blog/2025/12/15/avoid-consecutive-repartitions">Optimizing Repartitions
in DataFusion: How I Went From Database Noob to Core Contribution</a></h1>
<p>Posted on: Mon 15 December 2025 by Gene
Bordegaray</p>
<p><!--
{% comment %}