This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/main by this push:
new b7352a5 Replace inline styles with Bootstrap utility classes (#165)
b7352a5 is described below
commit b7352a5678d0028ad3dffbd274eedb174e6c46b8
Author: Kevin Liu <[email protected]>
AuthorDate: Wed Apr 1 09:21:36 2026 -0700
Replace inline styles with Bootstrap utility classes (#165)
---
content/blog/2024-01-19-datafusion-34.0.0.md | 8 ++++----
content/blog/2024-03-06-comet-donation.md | 3 +--
content/blog/2024-08-20-python-datafusion-40.0.0.md | 6 ++----
content/blog/2025-03-11-ordering-analysis.md | 12 ++++++------
content/blog/2025-03-30-datafusion-python-46.0.0.md | 3 +--
content/blog/2025-04-10-fastest-tpch-generator.md | 20 +++++---------------
content/blog/2025-06-30-cancellation.md | 2 +-
content/blog/2025-07-11-datafusion-47.0.0.md | 2 +-
.../blog/2025-07-14-user-defined-parquet-indexes.md | 14 +++++++-------
.../2025-12-15-avoid-consecutive-repartitions.md | 21 +++++++++------------
10 files changed, 37 insertions(+), 54 deletions(-)
diff --git a/content/blog/2024-01-19-datafusion-34.0.0.md b/content/blog/2024-01-19-datafusion-34.0.0.md
index 9b49d94..11e7a8e 100644
--- a/content/blog/2024-01-19-datafusion-34.0.0.md
+++ b/content/blog/2024-01-19-datafusion-34.0.0.md
@@ -112,8 +112,8 @@ more than 2x faster on [ClickBench] compared to version
`25.0.0`, as shown below
[ClickBench]: https://benchmark.clickhouse.com/
-<figure style="text-align: center;">
- <img src="/blog/images/datafusion-34.0.0/compare-new.png" width="100%" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
+<figure class="text-center">
+ <img src="/blog/images/datafusion-34.0.0/compare-new.png" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
<figcaption>
<b>Figure 1</b>: Performance improvement between <code>25.0.0</code> and
<code>34.0.0</code> on ClickBench.
Note that DataFusion <code>25.0.0</code>, could not run several queries
due to
@@ -121,8 +121,8 @@ more than 2x faster on [ClickBench] compared to version
`25.0.0`, as shown below
</figcaption>
</figure>
-<figure style="text-align: center;">
- <img src="/blog/images/datafusion-34.0.0/compare.png" width="100%" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
+<figure class="text-center">
+ <img src="/blog/images/datafusion-34.0.0/compare.png" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
<figcaption>
<b>Figure 2</b>: Total query runtime for DataFusion <code>34.0.0</code>
and DataFusion <code>25.0.0</code>.
</figcaption>
diff --git a/content/blog/2024-03-06-comet-donation.md b/content/blog/2024-03-06-comet-donation.md
index 44728f2..4e95762 100644
--- a/content/blog/2024-03-06-comet-donation.md
+++ b/content/blog/2024-03-06-comet-donation.md
@@ -35,10 +35,9 @@ accelerate Spark workloads. It is designed as a drop-in
replacement for Spark's JVM based SQL execution engine and offers significant
performance improvements for some workloads as shown below.
-<figure style="text-align: center;">
+<figure class="text-center">
<img
src="/blog/images/datafusion-comet/comet-architecture.png"
- width="100%"
class="img-fluid"
alt="Fig 1: Adaptive Arrow schema architecture overview."
>
diff --git a/content/blog/2024-08-20-python-datafusion-40.0.0.md b/content/blog/2024-08-20-python-datafusion-40.0.0.md
index be484ff..b8f1902 100644
--- a/content/blog/2024-08-20-python-datafusion-40.0.0.md
+++ b/content/blog/2024-08-20-python-datafusion-40.0.0.md
@@ -68,10 +68,9 @@ Modern IDEs use language servers such as
hints, and identify usage errors. These are major tools in the python user
community. With this
release, users can fully use these tools in their workflow.
-<figure style="text-align: center;">
+<figure class="text-center">
<img
src="/blog/images/python-datafusion-40.0.0/vscode_hover_tooltip.png"
- width="100%"
class="img-fluid"
alt="Fig 1: Enhanced tooltips in an IDE."
>
@@ -84,10 +83,9 @@ release, users can fully use these tools in their workflow.
By having the type annotations, these IDEs can also identify quickly when a
user has incorrectly
used a function's arguments as shown in Figure 2.
-<figure style="text-align: center;">
+<figure class="text-center">
<img
src="/blog/images/python-datafusion-40.0.0/pylance_error_checking.png"
- width="100%"
class="img-fluid"
alt="Fig 2: Error checking in static analysis"
>
diff --git a/content/blog/2025-03-11-ordering-analysis.md b/content/blog/2025-03-11-ordering-analysis.md
index 72a71bb..1fe46e0 100644
--- a/content/blog/2025-03-11-ordering-analysis.md
+++ b/content/blog/2025-03-11-ordering-analysis.md
@@ -31,7 +31,7 @@ limitations under the License.
## Introduction
In this blog post, we explain when an ordering requirement of an operator is
satisfied by its input data. This analysis is essential for order-based
optimizations and is often more complex than one might initially think.
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
<strong>Ordering Requirement</strong> for an operator describes how the
input data to that operator must be sorted for the operator to compute the
correct result. It is the job of the planner to make sure that these
requirements are satisfied during execution (See DataFusion <a
href="https://docs.rs/datafusion/latest/datafusion/physical_optimizer/enforce_sorting/struct.EnforceSorting.html"
target="_blank">EnforceSorting</a> for an implementation of such a rule).
</blockquote>
@@ -134,7 +134,7 @@ Let's start by creating an example table that we will refer
throughout the post.
<br>
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
<strong>How can a table have multiple orderings?</strong> At first glance it
may seem counterintuitive for a table to have more than one valid ordering.
However, during query execution such scenarios can arise.
For example consider the following query:
@@ -197,7 +197,7 @@ To solve the shortcomings above DataFusion needs to track
of following propertie
- Equivalent Expression Groups (will be explained shortly)
- Succinct Valid Orderings (will be explained shortly)
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
<strong>Note:</strong> These properties are implemented in the
<code>EquivalenceProperties</code> structure in <code>DataFusion</code>, please
see the <a
href="https://github.com/apache/datafusion/blob/f47ea73b87eec4af044f9b9923baf042682615b2/datafusion/physical-expr/src/equivalence/properties/mod.rs#L134"
target="_blank">source</a> for more details<br>
</blockquote>
@@ -210,7 +210,7 @@ For instance in the example table:
- Columns `hostname` and `currency` are constant because every row in the
table has the same value (`'app.example.com'` for `hostname`, and `'USD'` for
`currency`) for these columns.
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
<strong>Note:</strong> Constant expressions can arise during query
execution. For example, in following query:<br>
<code>SELECT hostname FROM logs</code><br><code>WHERE
hostname='app.example.com'</code> <br>
after filtering is done, for subsequent operators the
<code>hostname</code> column will be constant.
@@ -221,7 +221,7 @@ Equivalent expression groups are expressions that always
hold the same value acr
In the example table, the expressions `price` and `price_cloned` form one
equivalence group, and `time` and `time_cloned` form another equivalence group.
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
<strong>Note:</strong> Equivalent expression groups can arise during the
query execution. For example, in the following query:<br>
<code>SELECT time, time as time_cloned FROM logs</code> <br>
after the projection is done, for subsequent operators <code>time</code>
and <code>time_cloned</code> will form an equivalence group. As another
example, in the following query:<br>
@@ -293,7 +293,7 @@ Following third and fourth constraints for the simplified
table, the succinct va
`[time_bin ASC]`,
`[time ASC]`
-<blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;">
+<blockquote class="border-start border-primary border-4 ps-3 py-2 bg-light">
<p><strong>How can DataFusion find orderings?</strong></p>
DataFusion's <code>CREATE EXTERNAL TABLE</code> has a <code>WITH ORDER</code>
clause (see <a
href="https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table">docs</a>)
to specify the known orderings of the table during table creation. For example
the following query:<br>
<pre><code>
diff --git a/content/blog/2025-03-30-datafusion-python-46.0.0.md b/content/blog/2025-03-30-datafusion-python-46.0.0.md
index f854621..d5114b6 100644
--- a/content/blog/2025-03-30-datafusion-python-46.0.0.md
+++ b/content/blog/2025-03-30-datafusion-python-46.0.0.md
@@ -177,10 +177,9 @@ for the user to click on it to expand the data.
In the below view you can see an example of some of these features such as the
expandable text and scroll bars.
-<figure style="text-align: center;">
+<figure class="text-center">
<img
src="/blog/images/python-datafusion-46.0.0/html_rendering.png"
- width="100%"
class="img-fluid"
alt="Fig 1: Example html rendering in a jupyter notebook."
>
diff --git a/content/blog/2025-04-10-fastest-tpch-generator.md b/content/blog/2025-04-10-fastest-tpch-generator.md
index 1718221..3736c74 100644
--- a/content/blog/2025-04-10-fastest-tpch-generator.md
+++ b/content/blog/2025-04-10-fastest-tpch-generator.md
@@ -25,16 +25,6 @@ limitations under the License.
[TOC]
-<style>
-/* Table borders */
-table, th, td {
- border: 1px solid black;
- border-collapse: collapse;
-}
-th, td {
- padding: 3px;
-}
-</style>
**TLDR: TPC-H SF=100 in 1min using tpchgen-rs vs 30min+ with dbgen**.
3 members of the [Apache DataFusion] community used Rust and open source
@@ -135,7 +125,7 @@ bound on the Scale Factor.
**Figure 2**: Example TBL formatted output of `dbgen` for the `LINEITEM` table
-<table>
+<table class="table table-bordered">
<tr>
<td><strong>Scale Factor</strong>
</td>
@@ -308,7 +298,7 @@ compatible port, and knew about the performance
shortcomings and how to approach
them.
-<table>
+<table class="table table-bordered">
<tr>
<td><strong>Scale Factor</strong>
</td>
@@ -356,7 +346,7 @@ list of optimizations:
At the time of writing, single threaded performance is now 2.5x-2.7x faster
than the initial version, as shown in Table 3.
-<table>
+<table class="table table-bordered">
<tr>
<td><strong>Scale Factor</strong>
</td>
@@ -412,7 +402,7 @@ When writing to `/dev/null` tpchgen generates the entire
dataset in 25 seconds
(4 GB/s).
-<table>
+<table class="table table-bordered">
<tr>
<td><strong>Scale Factor</strong>
</td>
@@ -516,7 +506,7 @@ or CSV, tpchgen-cli creates the full SF=100 parquet format
dataset in less than
[a small 300 line PR]: https://github.com/clflushopt/tpchgen-rs/pull/61
[Rust Parquet writer]: https://crates.io/crates/parquet
-<table>
+<table class="table table-bordered">
<tr>
<td><strong>Scale Factor</strong>
</td>
diff --git a/content/blog/2025-06-30-cancellation.md b/content/blog/2025-06-30-cancellation.md
index 76ee19f..fb2d465 100644
--- a/content/blog/2025-06-30-cancellation.md
+++ b/content/blog/2025-06-30-cancellation.md
@@ -359,7 +359,7 @@ To illustrate what this process looks like, let's have a
look at the execution o
If we assume a task budget of 1 unit, each time Tokio schedules the task would
result in the following sequence of function calls.
<figure>
-<img src="/blog/images/task-cancellation/tokio_budget.png" style="width: 100%; max-width: 100%" class="img-fluid" alt="Sequence diagram showing how the tokio task budget is used and reset."
+<img src="/blog/images/task-cancellation/tokio_budget.png" class="img-fluid" alt="Sequence diagram showing how the tokio task budget is used and reset."
/>
<figcaption>Tokio task budget system, assuming the task budget is set to 1,
for the plan above.</figcaption>
</figure>
diff --git a/content/blog/2025-07-11-datafusion-47.0.0.md b/content/blog/2025-07-11-datafusion-47.0.0.md
index 2cf56ca..95b4730 100644
--- a/content/blog/2025-07-11-datafusion-47.0.0.md
+++ b/content/blog/2025-07-11-datafusion-47.0.0.md
@@ -193,7 +193,7 @@ logging, or metrics) across thread boundaries without
depending on any specific
You can use the [JoinSetTracer] API to instrument DataFusion plans with your
own tracing or logging libraries, or
use pre-integrated community crates such as the [datafusion-tracing] crate.
-<div style="text-align: center;">
+<div class="text-center">
<a href="https://github.com/datafusion-contrib/datafusion-tracing">
<img
src="/blog/images/datafusion-47.0.0/datafusion-telemetry.png"
diff --git a/content/blog/2025-07-14-user-defined-parquet-indexes.md b/content/blog/2025-07-14-user-defined-parquet-indexes.md
index 7f4fd08..cc809bb 100644
--- a/content/blog/2025-07-14-user-defined-parquet-indexes.md
+++ b/content/blog/2025-07-14-user-defined-parquet-indexes.md
@@ -173,24 +173,24 @@ A **distinct value index** stores the unique values of a
specific column. This t
For example, if the files contain a column named `Category` like this:
-<table style="border-collapse:collapse;">
+<table class="table table-bordered">
<tr>
- <td style="border:1px solid #888;padding:2px 6px;"><b><code>Category</code></b></td>
+ <td><b><code>Category</code></b></td>
</tr>
<tr>
- <td style="border:1px solid #888;padding:2px 6px;"><code>foo</code></td>
+ <td><code>foo</code></td>
</tr>
<tr>
- <td style="border:1px solid #888;padding:2px 6px;"><code>bar</code></td>
+ <td><code>bar</code></td>
</tr>
<tr>
- <td style="border:1px solid #888;padding:2px 6px;"><code>...</code></td>
+ <td><code>...</code></td>
</tr>
<tr>
- <td style="border:1px solid #888;padding:2px 6px;"><code>baz</code></td>
+ <td><code>baz</code></td>
</tr>
<tr>
- <td style="border:1px solid #888;padding:2px 6px;"><code>foo</code></td>
+ <td><code>foo</code></td>
</tr>
</table>
diff --git a/content/blog/2025-12-15-avoid-consecutive-repartitions.md b/content/blog/2025-12-15-avoid-consecutive-repartitions.md
index 7b1d8ad..d8a1cab 100644
--- a/content/blog/2025-12-15-avoid-consecutive-repartitions.md
+++ b/content/blog/2025-12-15-avoid-consecutive-repartitions.md
@@ -25,18 +25,17 @@ limitations under the License.
{% endcomment %}
-->
-<div style="display: flex; align-items: center; gap: 20px; margin-bottom: 20px;">
-<div style="flex: 1;">
+<div class="row align-items-center mb-3">
+<div class="col-md-7">
Databases are some of the most complex yet interesting pieces of software.
They are amazing pieces of abstraction: query engines optimize and execute
complex plans, storage engines provide sophisticated infrastructure as the
backbone of the system, while intricate file formats lay the groundwork for
particular workloads. All of this is exposed by a user-friendly interface and
query languages (typically a dialect of SQL).
<br><br>
Starting a journey learning about database internals can be daunting. With so
many topics that are whole PhD degrees themselves, finding a place to start is
difficult. In this blog post, I will share my early journey in the database
world and a quick lesson on one of the first topics I dove into. If you are new
to the space, this post will help you get your first foot into the database
world, and if you are already a veteran, you may still learn something new.
</div>
-<div style="flex: 0 0 40%; text-align: center;">
+<div class="col-md-5 text-center">
<img
src="/blog/images/avoid-consecutive-repartitions/database_system_diagram.png"
- width="100%"
class="img-fluid"
alt="Database System Components"
/>
@@ -122,18 +121,17 @@ Partitioning is a "divide-and-conquer" approach to
executing a query. Each parti
#### **Round-Robin Repartitioning**
-<div style="display: flex; align-items: top; gap: 20px; margin-bottom: 20px;">
-<div style="flex: 1;">
+<div class="row align-items-start mb-3">
+<div class="col-md-9">
Round-robin repartitioning is the simplest partitioning strategy. Incoming
data is processed in batches (chunks of rows), and these batches are
distributed across partitions cyclically or sequentially, with each new batch
assigned to the next available partition.
<br><br>
Round-robin repartitioning is useful when the data grouping isn't known or
when aiming for an even distribution across partitions. Because it simply
assigns batches in order without inspecting their contents, it is a
low-overhead way to increase parallelism for downstream operations.
</div>
-<div style="flex: 0 0 25%; text-align: center;">
+<div class="col-md-3 text-center">
<img
src="/blog/images/avoid-consecutive-repartitions/round_robin_repartitioning.png"
- width="100%"
class="img-fluid"
alt="Round-Robin Repartitioning"
/>
@@ -142,18 +140,17 @@ Round-robin repartitioning is useful when the data
grouping isn't known or when
#### **Hash Repartitioning**
-<div style="display: flex; align-items: top; gap: 20px; margin-bottom: 20px;">
-<div style="flex: 1;">
+<div class="row align-items-start mb-3">
+<div class="col-md-9">
Hash repartitioning distributes data based on a hash function applied to one
or more columns, called the partitioning key. Rows with the same hash value are
placed in the same partition.
<br><br>
Hash repartitioning is useful when working with grouped data. Imagine you have
a database containing information on company sales, and you are looking to find
the total revenue each store produced. Hash repartitioning would make this
query much more efficient. Rather than iterating over the data on a single
thread and keeping a running sum for each store, it would be better to hash
repartition on the store column and have multiple threads calculate individual
store sales.
</div>
-<div style="flex: 0 0 25%; text-align: center;">
+<div class="col-md-3 text-center">
<img
src="/blog/images/avoid-consecutive-repartitions/hash_repartitioning.png"
- width="100%"
class="img-fluid"
alt="Hash Repartitioning"
/>