This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/main by this push:
new 12d5ee2 Site/gene.bordegaray/2025/12/consecutive repartitions blog post title (#129)
12d5ee2 is described below
commit 12d5ee2e39b8113643f07ddc48f8dedc4c12e2d8
Author: Gene Bordegaray <[email protected]>
AuthorDate: Thu Dec 18 04:02:48 2025 -0800
Site/gene.bordegaray/2025/12/consecutive repartitions blog post title (#129)
* initial blog post
* better images and formatting
* realigned some images
* added links for Nga and Andrew's github
* added links for Nga and Andrew's github
* fixed to DataFusion and some word selection
* reformatted some images for clarity and minor changes to punctuation
* Update file name to match publish date
* updated images
* fix title
---------
Co-authored-by: Andrew Lamb <[email protected]>
---
content/blog/2025-12-15-avoid-consecutive-repartitions.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/content/blog/2025-12-15-avoid-consecutive-repartitions.md b/content/blog/2025-12-15-avoid-consecutive-repartitions.md
index a236977..c3d3c4d 100644
--- a/content/blog/2025-12-15-avoid-consecutive-repartitions.md
+++ b/content/blog/2025-12-15-avoid-consecutive-repartitions.md
@@ -1,6 +1,6 @@
---
layout: post
-title: Optimizing Repartitions in DataFusion: How I Went From Database Nood to Core Contribution
+title: Optimizing Repartitions in DataFusion: How I Went From Database Noob to Core Contribution
date: 2025-12-15
author: Gene Bordegaray
categories: [tutorial]
@@ -198,7 +198,7 @@ SELECT a, SUM(b) FROM data.parquet GROUP BY a;
Repartitions would appear back-to-back in query plans, specifically a
round-robin followed by a hash repartition.
-Why is this such a big deal? Well, repartitions do not process the data; their
purpose is to redistribute it in ways that enable more efficient computation
for other operators. Having consecutive repartitions is counterintuitive
because we are redistributing data, then immediately redistributing it again,
making the first repartition pointless. While this didn't create extreme
overhead for queries, since round-robin repartitioning does not copy data, just
the pointers to batches, the beh [...]
+Why is this such a big deal? Well, repartitions do not process the data; their
purpose is to redistribute it in ways that enable more efficient computation
for other operators. Having consecutive repartitions is counterintuitive
because we are redistributing data, then immediately redistributing it again,
making the first repartition pointless. While this didn't create extreme
overhead for queries, since round-robin repartitioning does not copy data, just
the pointers to batches, the beh [...]
<div class="text-center">
<img