This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/main by this push:
new 10ed01c s/img-responsive/img-fluid/g (#164)
10ed01c is described below
commit 10ed01c46781e81e230b49f20d69663ee7bd455f
Author: Kevin Liu <[email protected]>
AuthorDate: Tue Mar 31 10:46:34 2026 -0700
s/img-responsive/img-fluid/g (#164)
---
content/blog/2024-01-19-datafusion-34.0.0.md | 4 ++--
content/blog/2024-03-06-comet-donation.md | 2 +-
content/blog/2024-07-20-datafusion-comet-0.1.0.md | 4 ++--
.../blog/2024-08-20-python-datafusion-40.0.0.md | 4 ++--
content/blog/2024-08-28-datafusion-comet-0.2.0.md | 4 ++--
...9-13-string-view-german-style-strings-part-1.md | 12 +++++-----
...9-13-string-view-german-style-strings-part-2.md | 4 ++--
...usion-fastest-single-node-parquet-clickbench.md | 14 ++++++------
content/blog/2025-01-17-datafusion-comet-0.5.0.md | 4 ++--
.../blog/2025-02-02-datafusion-ballista-43.0.0.md | 8 +++----
content/blog/2025-02-20-datafusion-45.0.0.md | 2 +-
content/blog/2025-03-11-ordering-analysis.md | 2 +-
content/blog/2025-03-20-datafusion-comet-0.7.0.md | 2 +-
content/blog/2025-03-20-parquet-pruning.md | 4 ++--
content/blog/2025-03-21-parquet-pushdown.md | 10 ++++-----
content/blog/2025-03-24-datafusion-46.0.0.md | 2 +-
.../blog/2025-03-30-datafusion-python-46.0.0.md | 2 +-
content/blog/2025-04-10-fastest-tpch-generator.md | 6 ++---
...025-06-15-optimizing-sql-dataframes-part-one.md | 10 ++++-----
...025-06-15-optimizing-sql-dataframes-part-two.md | 16 ++++++-------
content/blog/2025-06-30-cancellation.md | 2 +-
content/blog/2025-07-01-datafusion-comet-0.9.0.md | 2 +-
content/blog/2025-07-11-datafusion-47.0.0.md | 2 +-
.../2025-07-14-user-defined-parquet-indexes.md | 4 ++--
content/blog/2025-07-28-datafusion-49.0.0.md | 4 ++--
.../blog/2025-08-15-external-parquet-indexes.md | 16 ++++++-------
content/blog/2025-09-10-dynamic-filters.md | 10 ++++-----
.../blog/2025-09-21-custom-types-using-metadata.md | 2 +-
content/blog/2025-09-29-datafusion-50.0.0.md | 2 +-
content/blog/2025-10-21-datafusion-comet-0.11.0.md | 4 ++--
content/blog/2025-11-25-datafusion-51.0.0.md | 4 ++--
.../2025-12-15-avoid-consecutive-repartitions.md | 26 +++++++++++-----------
content/blog/2026-01-12-extending-sql.md | 2 +-
content/blog/2026-02-02-datafusion_case.md | 16 ++++++-------
34 files changed, 106 insertions(+), 106 deletions(-)
diff --git a/content/blog/2024-01-19-datafusion-34.0.0.md b/content/blog/2024-01-19-datafusion-34.0.0.md
index 2f95ccc..9b49d94 100644
--- a/content/blog/2024-01-19-datafusion-34.0.0.md
+++ b/content/blog/2024-01-19-datafusion-34.0.0.md
@@ -113,7 +113,7 @@ more than 2x faster on [ClickBench] compared to version
`25.0.0`, as shown below
[ClickBench]: https://benchmark.clickhouse.com/
<figure style="text-align: center;">
- <img src="/blog/images/datafusion-34.0.0/compare-new.png" width="100%" class="img-responsive" alt="Fig 1: Adaptive Arrow schema architecture overview.">
+ <img src="/blog/images/datafusion-34.0.0/compare-new.png" width="100%" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
<figcaption>
<b>Figure 1</b>: Performance improvement between <code>25.0.0</code> and
<code>34.0.0</code> on ClickBench.
Note that DataFusion <code>25.0.0</code>, could not run several queries
due to
@@ -122,7 +122,7 @@ more than 2x faster on [ClickBench] compared to version
`25.0.0`, as shown below
</figure>
<figure style="text-align: center;">
- <img src="/blog/images/datafusion-34.0.0/compare.png" width="100%" class="img-responsive" alt="Fig 1: Adaptive Arrow schema architecture overview.">
+ <img src="/blog/images/datafusion-34.0.0/compare.png" width="100%" class="img-fluid" alt="Fig 1: Adaptive Arrow schema architecture overview.">
<figcaption>
<b>Figure 2</b>: Total query runtime for DataFusion <code>34.0.0</code>
and DataFusion <code>25.0.0</code>.
</figcaption>
diff --git a/content/blog/2024-03-06-comet-donation.md b/content/blog/2024-03-06-comet-donation.md
index 660a064..44728f2 100644
--- a/content/blog/2024-03-06-comet-donation.md
+++ b/content/blog/2024-03-06-comet-donation.md
@@ -39,7 +39,7 @@ performance improvements for some workloads as shown below.
<img
src="/blog/images/datafusion-comet/comet-architecture.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Fig 1: Adaptive Arrow schema architecture overview."
>
<figcaption>
diff --git a/content/blog/2024-07-20-datafusion-comet-0.1.0.md b/content/blog/2024-07-20-datafusion-comet-0.1.0.md
index 8d41d8f..6167225 100644
--- a/content/blog/2024-07-20-datafusion-comet-0.1.0.md
+++ b/content/blog/2024-07-20-datafusion-comet-0.1.0.md
@@ -88,7 +88,7 @@ for details of the environment used for these benchmarks.
<img
src="/blog/images/comet-0.1.0/tpch_allqueries.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Chart showing TPC-H benchmark results for Comet 0.1.0"
/>
@@ -105,7 +105,7 @@ The following chart shows how much Comet currently
accelerates each query from t
<img
src="/blog/images/comet-0.1.0/tpch_queries_speedup.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Chart showing TPC-H benchmark results for Comet 0.1.0"
/>
diff --git a/content/blog/2024-08-20-python-datafusion-40.0.0.md b/content/blog/2024-08-20-python-datafusion-40.0.0.md
index dd3b4e6..be484ff 100644
--- a/content/blog/2024-08-20-python-datafusion-40.0.0.md
+++ b/content/blog/2024-08-20-python-datafusion-40.0.0.md
@@ -72,7 +72,7 @@ release, users can fully use these tools in their workflow.
<img
src="/blog/images/python-datafusion-40.0.0/vscode_hover_tooltip.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Fig 1: Enhanced tooltips in an IDE."
>
<figcaption>
@@ -88,7 +88,7 @@ used a function's arguments as shown in Figure 2.
<img
src="/blog/images/python-datafusion-40.0.0/pylance_error_checking.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Fig 2: Error checking in static analysis"
>
<figcaption>
diff --git a/content/blog/2024-08-28-datafusion-comet-0.2.0.md b/content/blog/2024-08-28-datafusion-comet-0.2.0.md
index ff17da6..bc46485 100644
--- a/content/blog/2024-08-28-datafusion-comet-0.2.0.md
+++ b/content/blog/2024-08-28-datafusion-comet-0.2.0.md
@@ -86,7 +86,7 @@ Comet 0.2.0 provides a 62% speedup compared to Spark. This is
slightly better th
<img
src="/blog/images/comet-0.2.0/tpch_allqueries.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Chart showing TPC-H benchmark results for Comet 0.2.0"
/>
@@ -98,7 +98,7 @@ Comet 0.1.0, which did not provide any speedup for this
benchmark.
<img
src="/blog/images/comet-0.2.0/tpcds_allqueries.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Chart showing TPC-DS benchmark results for Comet 0.2.0"
/>
diff --git a/content/blog/2024-09-13-string-view-german-style-strings-part-1.md b/content/blog/2024-09-13-string-view-german-style-strings-part-1.md
index 6f8770b..f9f5571 100644
--- a/content/blog/2024-09-13-string-view-german-style-strings-part-1.md
+++ b/content/blog/2024-09-13-string-view-german-style-strings-part-1.md
@@ -47,7 +47,7 @@ StringView support was released as part of [arrow-rs
v52.2.0](https://crates.io/
<img
src="/blog/images/string-view-1/figure1-performance.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="End to end performance improvements for ClickBench queries"
/>
@@ -61,7 +61,7 @@ Figure 1: StringView improves string-intensive ClickBench
query performance by 2
<img
src="/blog/images/string-view-1/figure2-string-view.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Diagram of using StringArray and StringViewArray to represent the same
string content"
/>
@@ -121,7 +121,7 @@ On the other hand, reading Parquet data as a
StringViewArray can re-use the same
<img
src="/blog/images/string-view-1/figure4-copying.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Diagram showing how StringViewArray can avoid copying by reusing decoded
Parquet pages."
/>
@@ -147,7 +147,7 @@ Strings are stored as byte sequences. When reading data
from (potentially untrus
<img
src="/blog/images/string-view-1/figure5-loading-strings.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Figure showing time to load strings from Parquet and the effect of
optimized UTF-8 validation."
/>
@@ -162,7 +162,7 @@ UTF-8 validation in Rust is highly optimized and favors
longer strings (as shown
<img
src="/blog/images/string-view-1/figure6-utf8-validation.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Figure showing UTF-8 validation throughput vs string length."
/>
@@ -212,7 +212,7 @@ With StringViewArray we saw a 24% end-to-end performance
improvement, as shown i
<img
src="/blog/images/string-view-1/figure7-end-to-end.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Figure showing StringView improves end to end performance by 24 percent."
/>
diff --git a/content/blog/2024-09-13-string-view-german-style-strings-part-2.md b/content/blog/2024-09-13-string-view-german-style-strings-part-2.md
index 7fb64f5..34114b1 100644
--- a/content/blog/2024-09-13-string-view-german-style-strings-part-2.md
+++ b/content/blog/2024-09-13-string-view-german-style-strings-part-2.md
@@ -66,7 +66,7 @@ Figure 1 illustrates the difference between the output of
both string representa
<img
src="/blog/images/string-view-2/figure1-zero-copy-take.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Diagram showing Zero-copy `take`/`filter` for StringViewArray"
/>
@@ -121,7 +121,7 @@ To eliminate the impact of the faster Parquet reading using
StringViewArray (see
<img
src="/blog/images/string-view-2/figure2-filter-time.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Figure showing StringViewArray reduces the filter time by 32% on
ClickBench query 22."
/>
diff --git a/content/blog/2024-11-18-datafusion-fastest-single-node-parquet-clickbench.md b/content/blog/2024-11-18-datafusion-fastest-single-node-parquet-clickbench.md
index a71b8a0..82762e3 100644
--- a/content/blog/2024-11-18-datafusion-fastest-single-node-parquet-clickbench.md
+++ b/content/blog/2024-11-18-datafusion-fastest-single-node-parquet-clickbench.md
@@ -45,14 +45,14 @@ been held by traditional C/C++-based engines.
<img
src="/blog/images/2x_bgwhite_original.png"
width="80%"
-class="img-responsive"
+class="img-fluid"
alt="Apache DataFusion Logo"
/>
<img
src="/blog/images/clickbench-datafusion-43/perf.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="ClickBench performance for DataFusion 43.0.0"
/>
@@ -97,7 +97,7 @@ Figure 2.
<img
src="/blog/images/clickbench-datafusion-43/perf-over-time.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="ClickBench performance results over time for DataFusion"
/>
@@ -134,7 +134,7 @@ resulted in measurable performance improvements.
<img
src="/blog/images/clickbench-datafusion-43/string-view-take.png"
width="80%"
-class="img-responsive"
+class="img-fluid"
alt="Illustration of how take works with StringView"
/>
@@ -216,7 +216,7 @@ bypass the first phase when it is not working efficiently,
shown in Figure 4.
<img
src="/blog/images/clickbench-datafusion-43/skipping-partial-aggregation.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Two phase aggregation diagram from DataFusion API docs annotated to show
first phase not helping"
/>
@@ -253,7 +253,7 @@ length strings and binary data].
<img
src="/blog/images/clickbench-datafusion-43/row-based-storage.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Row based storage for multiple group columns"
/>
@@ -276,7 +276,7 @@ at the [one shipped in DataFusion `43.0.0`], shown in
Figure 6.
<img
src="/blog/images/clickbench-datafusion-43/column-based-storage.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Column based storage for multiple group columns"
/>
diff --git a/content/blog/2025-01-17-datafusion-comet-0.5.0.md b/content/blog/2025-01-17-datafusion-comet-0.5.0.md
index bf3a956..dc9e439 100644
--- a/content/blog/2025-01-17-datafusion-comet-0.5.0.md
+++ b/content/blog/2025-01-17-datafusion-comet-0.5.0.md
@@ -52,14 +52,14 @@ Comet 0.5.0 achieves a 1.9x speedup for single-node TPC-H @
100 GB, an improveme
<img
src="/blog/images/comet-0.5.0/tpch_allqueries.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Chart showing TPC-H benchmark results for Comet 0.5.0"
/>
<img
src="/blog/images/comet-0.5.0/tpch_queries_compare.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Chart showing TPC-H benchmark results for Comet 0.5.0"
/>
diff --git a/content/blog/2025-02-02-datafusion-ballista-43.0.0.md b/content/blog/2025-02-02-datafusion-ballista-43.0.0.md
index 7fac032..1a2d7cb 100644
--- a/content/blog/2025-02-02-datafusion-ballista-43.0.0.md
+++ b/content/blog/2025-02-02-datafusion-ballista-43.0.0.md
@@ -91,7 +91,7 @@ Per query comparison:
<img
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Per query comparison"
/>
@@ -100,7 +100,7 @@ Relative speedup:
<img
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Relative speedup graph"
/>
@@ -109,7 +109,7 @@ The overall speedup is 2.9x
<img
src="/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png"
width="50%"
-class="img-responsive"
+class="img-fluid"
alt="Overall speedup"
/>
@@ -120,7 +120,7 @@ Ballista now has a new logo, which is visually similar to
other DataFusion proje
<img
src="/blog/images/datafusion-ballista-43.0.0/ballista-logo.png"
width="50%"
-class="img-responsive"
+class="img-fluid"
alt="New logo"
/>
diff --git a/content/blog/2025-02-20-datafusion-45.0.0.md b/content/blog/2025-02-20-datafusion-45.0.0.md
index 9471f5b..d887374 100644
--- a/content/blog/2025-02-20-datafusion-45.0.0.md
+++ b/content/blog/2025-02-20-datafusion-45.0.0.md
@@ -152,7 +152,7 @@ more improvements].
<img
src="/blog/images/datafusion-45.0.0/performance_over_time.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="ClickBench performance results over time for DataFusion"
/>
diff --git a/content/blog/2025-03-11-ordering-analysis.md b/content/blog/2025-03-11-ordering-analysis.md
index 121adf4..72a71bb 100644
--- a/content/blog/2025-03-11-ordering-analysis.md
+++ b/content/blog/2025-03-11-ordering-analysis.md
@@ -332,7 +332,7 @@ using the orderings of the query intermediates.<br>
<img
src="/blog/images/ordering_analysis/query_window_plan.png"
width="80%"
-class="img-responsive"
+class="img-fluid"
alt="Window Query Datafusion Optimization"
/>
<figcaption><strong>Figure 1:</strong> DataFusion analyzes orderings of the
sources and query intermediates to generate efficient plans</figcaption>
diff --git a/content/blog/2025-03-20-datafusion-comet-0.7.0.md b/content/blog/2025-03-20-datafusion-comet-0.7.0.md
index 45cc12c..1e6b228 100644
--- a/content/blog/2025-03-20-datafusion-comet-0.7.0.md
+++ b/content/blog/2025-03-20-datafusion-comet-0.7.0.md
@@ -56,7 +56,7 @@ CPU and RAM. Even with **half the resources**, Comet still
provides a measurable
<img
src="/blog/images/comet-0.7.0/performance.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Chart showing TPC-H benchmark results for Comet 0.7.0"
/>
diff --git a/content/blog/2025-03-20-parquet-pruning.md b/content/blog/2025-03-20-parquet-pruning.md
index 5c268a9..0b78ea4 100644
--- a/content/blog/2025-03-20-parquet-pruning.md
+++ b/content/blog/2025-03-20-parquet-pruning.md
@@ -49,7 +49,7 @@ The diagram below illustrates the [Parquet reading pipeline]
in DataFusion, high
[Parquet reading pipeline]:
https://docs.rs/datafusion/46.0.0/datafusion/datasource/physical_plan/parquet/source/struct.ParquetSource.html
-<img src="/blog/images/parquet-pruning/read-parquet.jpg" alt="Parquet pruning pipeline in DataFusion" width="100%" class="img-responsive">
+<img src="/blog/images/parquet-pruning/read-parquet.jpg" alt="Parquet pruning pipeline in DataFusion" width="100%" class="img-fluid">
#### Background: Parquet file structure
@@ -106,7 +106,7 @@ So far we have discussed techniques that prune the Parquet
file using only the m
Filter pushdown, also known as predicate pushdown or late materialization, is
a technique that prunes data during scanning, with filters being generated and
applied in the Parquet reader.
-<img src="/blog/images/parquet-pruning/filter-pushdown.jpg" alt="Filter pushdown in DataFusion" width="100%" class="img-responsive">
+<img src="/blog/images/parquet-pruning/filter-pushdown.jpg" alt="Filter pushdown in DataFusion" width="100%" class="img-fluid">
Unlike metadata-based pruning which works at the row group or page level,
filter pushdown operates at the row level, allowing DataFusion to filter out
individual rows that don't match the query predicates during the decoding
process.
diff --git a/content/blog/2025-03-21-parquet-pushdown.md b/content/blog/2025-03-21-parquet-pushdown.md
index 395d59d..1da5f82 100644
--- a/content/blog/2025-03-21-parquet-pushdown.md
+++ b/content/blog/2025-03-21-parquet-pushdown.md
@@ -77,7 +77,7 @@ WHERE date_time > '2025-03-11' AND location = 'office';
```
<figure>
- <img src="/blog/images/parquet-pushdown/pushdown-vs-no-pushdown.jpg" alt="Parquet pruning skips irrelevant files/row_groups, while filter pushdown skips irrelevant rows. Without filter pushdown, all rows from location, val, and date_time columns are decoded before `location='office'` is evaluated. Filter pushdown is especially useful when the filter is selective, i.e., removes many rows." width="80%" class="img-responsive">
+ <img src="/blog/images/parquet-pushdown/pushdown-vs-no-pushdown.jpg" alt="Parquet pruning skips irrelevant files/row_groups, while filter pushdown skips irrelevant rows. Without filter pushdown, all rows from location, val, and date_time columns are decoded before `location='office'` is evaluated. Filter pushdown is especially useful when the filter is selective, i.e., removes many rows." width="80%" class="img-fluid">
<figcaption>
Parquet pruning skips irrelevant files/row_groups, while filter pushdown
skips irrelevant rows. Without filter pushdown, all rows from location, val,
and date_time columns are decoded before `location='office'` is evaluated.
Filter pushdown is especially useful when the filter is selective, i.e.,
removes many rows.
</figcaption>
@@ -102,7 +102,7 @@ At a high level, the Parquet reader first builds a filter
mask -- essentially a
Let's dig into details of [how filter pushdown is
implemented](https://github.com/apache/arrow-rs/blob/d5339f31a60a4bd8a4256e7120fe32603249d88e/parquet/src/arrow/async_reader/mod.rs#L618-L712)
in the current Rust Parquet reader implementation, illustrated in the
following figure.
<figure>
- <img src="/blog/images/parquet-pushdown/baseline-impl.jpg" alt="Implementation of filter pushdown in Rust Parquet readers" class="img-responsive" with="70%">
+ <img src="/blog/images/parquet-pushdown/baseline-impl.jpg" alt="Implementation of filter pushdown in Rust Parquet readers" class="img-fluid" with="70%">
<figcaption>
Implementation of filter pushdown in Rust Parquet readers -- the first
phase builds the filter mask, the second phase applies the filter mask to the
other columns
</figcaption>
@@ -170,7 +170,7 @@ This section describes my [<700 LOC PR (with lots of
comments and tests)](https:
<figure>
- <img src="/blog/images/parquet-pushdown/new-pipeline.jpg" alt="New decoding pipeline, building filter mask and output columns are interleaved in a single pass, allowing us to cache minimal pages for minimal amount of time" width="80%" class="img-responsive">
+ <img src="/blog/images/parquet-pushdown/new-pipeline.jpg" alt="New decoding pipeline, building filter mask and output columns are interleaved in a single pass, allowing us to cache minimal pages for minimal amount of time" width="80%" class="img-fluid">
<figcaption>
New decoding pipeline, building filter mask and output columns are
interleaved in a single pass, allowing us to cache minimal pages for minimal
amount of time
</figcaption>
@@ -213,7 +213,7 @@ Parquet by default encodes data using [dictionary
encoding](https://parquet.apac
You can see this in action using
[parquet-viewer](https://parquet-viewer.xiangpeng.systems):
<figure>
- <img src="/blog/images/parquet-pushdown/parquet-viewer.jpg" alt="Parquet viewer shows the page layout of a column chunk" width="80%" class="img-responsive">
+ <img src="/blog/images/parquet-pushdown/parquet-viewer.jpg" alt="Parquet viewer shows the page layout of a column chunk" width="80%" class="img-fluid">
<figcaption>
Parquet viewer shows the page layout of a column chunk
</figcaption>
@@ -225,7 +225,7 @@ This is why it caches 2 pages per column: one dictionary
page and one data page.
The data page slot will move forward as it reads the data; but the dictionary
page slot always references the first page.
<figure>
- <img src="/blog/images/parquet-pushdown/cached-pages.jpg" alt="Cached two pages, one for dictionary (pinned), one for data (moves as it reads the data)" width="80%" class="img-responsive">
+ <img src="/blog/images/parquet-pushdown/cached-pages.jpg" alt="Cached two pages, one for dictionary (pinned), one for data (moves as it reads the data)" width="80%" class="img-fluid">
<figcaption>
Cached two pages, one for dictionary (pinned), one for data (moves as it
reads the data)
</figcaption>
diff --git a/content/blog/2025-03-24-datafusion-46.0.0.md b/content/blog/2025-03-24-datafusion-46.0.0.md
index 71ef758..8cae112 100644
--- a/content/blog/2025-03-24-datafusion-46.0.0.md
+++ b/content/blog/2025-03-24-datafusion-46.0.0.md
@@ -59,7 +59,7 @@ DataFusion 46.0.0 introduces a new [**SQL Diagnostics
framework**](https://gith
For example, if you reference an unknown table or miss a column in `GROUP BY`
the error message will include the query snippet causing the error. These
diagnostics are meant for end-users of applications built on DataFusion,
providing clearer messages instead of generic errors. Here’s an example:
-<img src="/blog/images/datafusion-46.0.0/diagnostic-example.png" alt="diagnostic-example" width="80%" class="img-responsive">
+<img src="/blog/images/datafusion-46.0.0/diagnostic-example.png" alt="diagnostic-example" width="80%" class="img-fluid">
Currently, diagnostics cover unresolved table/column references, missing
`GROUP BY` columns, ambiguous references, wrong number of UNION columns, type
mismatches, and a few others. Future releases will extend this to more error
types. This feature should greatly ease debugging of complex SQL by pinpointing
errors directly in the query text. We thank
[@eliaperantoni](https://github.com/eliaperantoni) for his contributions in
this project.
diff --git a/content/blog/2025-03-30-datafusion-python-46.0.0.md b/content/blog/2025-03-30-datafusion-python-46.0.0.md
index 357aa8a..f854621 100644
--- a/content/blog/2025-03-30-datafusion-python-46.0.0.md
+++ b/content/blog/2025-03-30-datafusion-python-46.0.0.md
@@ -181,7 +181,7 @@ expandable text and scroll bars.
<img
src="/blog/images/python-datafusion-46.0.0/html_rendering.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Fig 1: Example html rendering in a jupyter notebook."
>
<figcaption>
diff --git a/content/blog/2025-04-10-fastest-tpch-generator.md b/content/blog/2025-04-10-fastest-tpch-generator.md
index 639d434..1718221 100644
--- a/content/blog/2025-04-10-fastest-tpch-generator.md
+++ b/content/blog/2025-04-10-fastest-tpch-generator.md
@@ -48,7 +48,7 @@ which takes 30 minutes<sup>1</sup> (0.05GB/sec). On the same
machine, it takes l
It is finally convenient and efficient to run TPC-H queries locally when
testing
analytical engines such as DataFusion.
-<img src="/blog/images/fastest-tpch-generator/parquet-performance.png" alt="Time to create TPC-H parquet dataset for Scale Factor 1, 10, 100 and 1000" width="80%" class="img-responsive">
+<img src="/blog/images/fastest-tpch-generator/parquet-performance.png" alt="Time to create TPC-H parquet dataset for Scale Factor 1, 10, 100 and 1000" width="80%" class="img-fluid">
**Figure 1**: Time to create TPC-H dataset for Scale Factor (see below) 1, 10,
100 and 1000 as 8 individual SNAPPY compressed parquet files using a 22 core
GCP
@@ -206,7 +206,7 @@ load the data, using `dbgen`, which is not ideal for
several reasons:
[here is how to do so]:
https://github.com/apache/datafusion/blob/507f6b6773deac69dd9d90dbe60831f5ea5abed1/datafusion/sqllogictest/test_files/tpch/create_tables.slt.part#L24-L124
-<img src="/blog/images/fastest-tpch-generator/tbl-performance.png" alt="Time to generate TPC-H data in TBL format" width="80%" class="img-responsive">
+<img src="/blog/images/fastest-tpch-generator/tbl-performance.png" alt="Time to generate TPC-H data in TBL format" width="80%" class="img-fluid">
**Figure 3**: Time to generate TPC-H data in TBL format. `tpchgen` is
shown in blue. `tpchgen` restricted to a single core is shown in red.
Unmodified
@@ -266,7 +266,7 @@ strings.
[unsafe]:
https://github.com/search?q=repo%3Aclflushopt%2Ftpchgen-rs%20unsafe&type=code
[skip]:
https://github.com/clflushopt/tpchgen-rs/blob/c651da1fc309f9cb3872cbdf71e4796904dc62c6/tpchgen/src/text.rs#L72
-<img src="/blog/images/fastest-tpch-generator/lamb-theory.png" alt="Lamb Theory on Evolution of Systems Languages" width="80%" class="img-responsive">
+<img src="/blog/images/fastest-tpch-generator/lamb-theory.png" alt="Lamb Theory on Evolution of Systems Languages" width="80%" class="img-fluid">
**Figure 4**: Lamb Theory of System Language Evolution from [Boston University
MiDAS Fall 2024 (Data Systems Seminar)] [slides(pdf)], [recording]. Special
diff --git a/content/blog/2025-06-15-optimizing-sql-dataframes-part-one.md b/content/blog/2025-06-15-optimizing-sql-dataframes-part-one.md
index a389fe9..7ac4aa8 100644
--- a/content/blog/2025-06-15-optimizing-sql-dataframes-part-one.md
+++ b/content/blog/2025-06-15-optimizing-sql-dataframes-part-one.md
@@ -79,7 +79,7 @@ language—it describes what answers are desired rather than an
*imperative*
language such as Python, where you describe how to do the computation as shown
in Figure 1.
-<img src="/blog/images/optimizing-sql-dataframes/query-execution.png" width="80%" class="img-responsive" alt="Fig 1: Query Execution."/>
+<img src="/blog/images/optimizing-sql-dataframes/query-execution.png" width="80%" class="img-fluid" alt="Fig 1: Query Execution."/>
**Figure 1**: Query Execution: Users describe the answer they want using either
SQL or a DataFrame. For SQL, a Query Planner translates the parsed query
@@ -112,7 +112,7 @@ modern APIs such as [Polars' lazy API], [Apache Spark's
DataFrame]. and
This section motivates the value of a Query Optimizer with an example. Let’s
say
you have some observations of animal behavior, as illustrated in Table 1.
-<img src="/blog/images/optimizing-sql-dataframes/table1.png" width="75%" class="img-responsive" alt="Table 1: Observational Data."/>
+<img src="/blog/images/optimizing-sql-dataframes/table1.png" width="75%" class="img-fluid" alt="Table 1: Observational Data."/>
**Table 1**: Example observational data.
@@ -148,7 +148,7 @@ Figure 2.
[LogicalPlan]:
https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html
[this DataFusion overview video]: https://youtu.be/EzZTLiSJnhY
-<img src="/blog/images/optimizing-sql-dataframes/initial-logical-plan.png" width="72%" class="img-responsive" alt="Fig 2: Initial Logical Plan."/>
+<img src="/blog/images/optimizing-sql-dataframes/initial-logical-plan.png" width="72%" class="img-fluid" alt="Fig 2: Initial Logical Plan."/>
**Figure 2**: Example initial `LogicalPlan` for SQL and DataFrame query. The
plan is read from bottom to top, computing the results in each step.
@@ -157,7 +157,7 @@ The optimizer's job is to take this query plan and rewrite
it into an alternate
plan that computes the same results but faster, such as the one shown in Figure
3.
-<img src="/blog/images/optimizing-sql-dataframes/optimized-logical-plan.png" width="80%" class="img-responsive" alt="Fig 3: Optimized Logical Plan."/>
+<img src="/blog/images/optimizing-sql-dataframes/optimized-logical-plan.png" width="80%" class="img-fluid" alt="Fig 3: Optimized Logical Plan."/>
**Figure 3**: An example optimized plan that computes the same result as the
plan in Figure 2 more efficiently. The diagram highlights where the optimizer
@@ -184,7 +184,7 @@ A multi-pass design is standard because it helps:
1. Understand, implement, and test each pass in isolation
2. Easily extend the optimizer by adding new passes
-<img src="/blog/images/optimizing-sql-dataframes/optimizer-passes.png" width="80%" class="img-responsive" alt="Fig 4: Query Optimizer Passes."/>
+<img src="/blog/images/optimizing-sql-dataframes/optimizer-passes.png" width="80%" class="img-fluid" alt="Fig 4: Query Optimizer Passes."/>
**Figure 4**: Query Optimizers are implemented as a series of rules that each
rewrite the query plan. Each rule’s algorithm is expressed as a transformation
diff --git a/content/blog/2025-06-15-optimizing-sql-dataframes-part-two.md b/content/blog/2025-06-15-optimizing-sql-dataframes-part-two.md
index 195d115..3dd27b2 100644
--- a/content/blog/2025-06-15-optimizing-sql-dataframes-part-two.md
+++ b/content/blog/2025-06-15-optimizing-sql-dataframes-part-two.md
@@ -95,7 +95,7 @@ Optimizers will evaluate the filter before the aggregation.
[evaluated after]: https://www.datacamp.com/tutorial/sql-order-of-execution
-<img src="/blog/images/optimizing-sql-dataframes/filter-pushdown.png" width="80%" class="img-responsive" alt="Fig 1: Filter Pushdown."/>
+<img src="/blog/images/optimizing-sql-dataframes/filter-pushdown.png" width="80%" class="img-fluid" alt="Fig 1: Filter Pushdown."/>
**Figure 1**: Filter Pushdown. In (**A**) without filter pushdown, the
operator
processes more rows, reducing efficiency. In (**B**) with filter pushdown, the
@@ -129,7 +129,7 @@ each column in each row must be parsed even if it is not
used in the plan.
[Apache Parquet]: https://parquet.apache.org/
[especially powerful in combination with filter pushdown]:
https://blog.xiangpeng.systems/posts/parquet-pushdown/
-<img src="/blog/images/optimizing-sql-dataframes/projection-pushdown.png" width="80%" class="img-responsive" alt="Fig 2: Projection Pushdown."/>
+<img src="/blog/images/optimizing-sql-dataframes/projection-pushdown.png" width="80%" class="img-fluid" alt="Fig 2: Projection Pushdown."/>
**Figure 2:** In (**A**) without projection pushdown, the operator receives
more
columns, reducing efficiency. In (**B**) with projection pushdown, the operator
@@ -156,7 +156,7 @@ opening additional files once the limit has been hit.
[TopK]:
https://docs.rs/datafusion/latest/datafusion/physical_plan/struct.TopK.html
-<img src="/blog/images/optimizing-sql-dataframes/limit-pushdown.png" width="80%" class="img-responsive" alt="Fig 3: Limit Pushdown."/>
+<img src="/blog/images/optimizing-sql-dataframes/limit-pushdown.png" width="80%" class="img-fluid" alt="Fig 3: Limit Pushdown."/>
**Figure 3**: In (**A**), without limit pushdown all data is sorted and
everything except the first few rows are discarded. In (**B**), with limit
@@ -217,7 +217,7 @@ customer, but fills in the fields with `null`. All such
rows will be filtered
out by `customer.last_name = 'Lamb'`, and thus an INNER JOIN produces the same
answer. This is illustrated in Figure 4.
-<img src="/blog/images/optimizing-sql-dataframes/join-rewrite.png" width="80%" class="img-responsive" alt="Fig 4: Join Rewrite."/>
+<img src="/blog/images/optimizing-sql-dataframes/join-rewrite.png" width="80%" class="img-fluid" alt="Fig 4: Join Rewrite."/>
**Figure 4**: Rewriting `OUTER JOIN` to `INNER JOIN`. In (A) the original query
contains an `OUTER JOIN` but also a filter on `customer.last_name`, which
@@ -326,7 +326,7 @@ ORDER BY time_chunk
```
-<img
src="/blog/images/optimizing-sql-dataframes/common-subexpression-elimination.png"
width="80%" class="img-responsive" alt="Fig 5: Common Subquery Elimination."/>
+<img
src="/blog/images/optimizing-sql-dataframes/common-subexpression-elimination.png"
width="80%" class="img-fluid" alt="Fig 5: Common Subquery Elimination."/>
**Figure 5:** Adding a Projection to evaluate common complex sub expression
decreases complexity for later stages.
@@ -349,7 +349,7 @@ group keys or a `MergeJoin`
[source]:
https://docs.rs/datafusion/latest/datafusion/physical_plan/struct.TopK.html
-<img src="/blog/images/optimizing-sql-dataframes/specialized-grouping.png"
width="80%" class="img-responsive" alt="Fig 6: Specialized Grouping."/>
+<img src="/blog/images/optimizing-sql-dataframes/specialized-grouping.png"
width="80%" class="img-fluid" alt="Fig 6: Specialized Grouping."/>
**Figure 6: **An example of specialized operation for grouping. In (**A**),
input data has no specified ordering and DataFusion uses a hashing-based
grouping operator
([source](https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/row_hash.rs))
to determine distinct groups. In (**B**), when the input data is ordered by
the group keys, DataFusion uses a specialized grouping operator
([source](https://github.com/apache/datafusion/tree/main/datafusion/physic [...]
@@ -371,7 +371,7 @@ and statistics are commonly stored in analytic file
formats. For example, the
[Metadata]: https://docs.rs/parquet/latest/parquet/file/metadata/index.html
-<img src="/blog/images/optimizing-sql-dataframes/using-statistics.png"
width="80%" class="img-responsive" alt="Fig 7: Using Statistics."/>
+<img src="/blog/images/optimizing-sql-dataframes/using-statistics.png"
width="80%" class="img-fluid" alt="Fig 7: Using Statistics."/>
**Figure 7: **When the aggregation result is already stored in the statistics,
the query can be evaluated using the values from statistics without looking at
@@ -392,7 +392,7 @@ potentially (very) different performance. The major options
in this category are
[Materialized View]: https://en.wikipedia.org/wiki/Materialized_view
-<img
src="/blog/images/optimizing-sql-dataframes/access-path-and-join-order.png"
width="80%" class="img-responsive" alt="Fig 8: Access Path and Join Order."/>
+<img
src="/blog/images/optimizing-sql-dataframes/access-path-and-join-order.png"
width="80%" class="img-fluid" alt="Fig 8: Access Path and Join Order."/>
**Figure 8:** Access Path and Join Order Selection in Query Optimizers.
Optimizers use heuristics to enumerate some subset of potential join orders
(shape) and access paths (color). The plan with the smallest estimated cost
according to some cost model is chosen. In this case, Plan 2 with a cost of
180,000 is chosen for execution as it has the lowest estimated cost.
diff --git a/content/blog/2025-06-30-cancellation.md
b/content/blog/2025-06-30-cancellation.md
index 244019b..76ee19f 100644
--- a/content/blog/2025-06-30-cancellation.md
+++ b/content/blog/2025-06-30-cancellation.md
@@ -359,7 +359,7 @@ To illustrate what this process looks like, let's have a
look at the execution o
If we assume a task budget of 1 unit, each time Tokio schedules the task would
result in the following sequence of function calls.
<figure>
-<img src="/blog/images/task-cancellation/tokio_budget.png" style="width: 100%;
max-width: 100%" class="img-responsive" alt="Sequence diagram showing how the
tokio task budget is used and reset."
+<img src="/blog/images/task-cancellation/tokio_budget.png" style="width: 100%;
max-width: 100%" class="img-fluid" alt="Sequence diagram showing how the tokio
task budget is used and reset."
/>
<figcaption>Tokio task budget system, assuming the task budget is set to 1,
for the plan above.</figcaption>
</figure>
diff --git a/content/blog/2025-07-01-datafusion-comet-0.9.0.md
b/content/blog/2025-07-01-datafusion-comet-0.9.0.md
index cd3f24b..4c73ebd 100644
--- a/content/blog/2025-07-01-datafusion-comet-0.9.0.md
+++ b/content/blog/2025-07-01-datafusion-comet-0.9.0.md
@@ -144,7 +144,7 @@ Comet now provides a tracing feature for analyzing
performance and off-heap vers
<img
src="/blog/images/comet-0.9.0/tracing.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Comet Tracing"
/>
diff --git a/content/blog/2025-07-11-datafusion-47.0.0.md
b/content/blog/2025-07-11-datafusion-47.0.0.md
index 27c3221..2cf56ca 100644
--- a/content/blog/2025-07-11-datafusion-47.0.0.md
+++ b/content/blog/2025-07-11-datafusion-47.0.0.md
@@ -198,7 +198,7 @@ use pre-integrated community crates such as the
[datafusion-tracing] crate.
<img
src="/blog/images/datafusion-47.0.0/datafusion-telemetry.png"
width="50%"
- class="img-responsive"
+ class="img-fluid"
alt="DataFusion telemetry project logo"
/>
</a>
diff --git a/content/blog/2025-07-14-user-defined-parquet-indexes.md
b/content/blog/2025-07-14-user-defined-parquet-indexes.md
index e2f1452..7f4fd08 100644
--- a/content/blog/2025-07-14-user-defined-parquet-indexes.md
+++ b/content/blog/2025-07-14-user-defined-parquet-indexes.md
@@ -88,7 +88,7 @@ The Parquet format includes three main
types<sup>[2](#footnote2)</sup> of option
<!-- Source:
https://docs.google.com/presentation/d/1aFjTLEDJyDqzFZHgcmRxecCvLKKXV2OvyEpTQFCNZPw
-->
-<img
src="/blog/images/user-defined-parquet-indexes/standard_index_structures.png"
width="80%" class="img-responsive" alt="Parquet File layout with standard index
structures."/>
+<img
src="/blog/images/user-defined-parquet-indexes/standard_index_structures.png"
width="80%" class="img-fluid" alt="Parquet File layout with standard index
structures."/>
**Figure 1**: Parquet file layout with standard index structures (as written
by arrow-rs).
@@ -116,7 +116,7 @@ Figure 2 shows the resulting file layout.
<!-- Source:
https://docs.google.com/presentation/d/1aFjTLEDJyDqzFZHgcmRxecCvLKKXV2OvyEpTQFCNZPw
-->
-<img
src="/blog/images/user-defined-parquet-indexes/custom_index_structures.png"
width="80%" class="img-responsive" alt="Parquet File layout with custom index
structures."/>
+<img
src="/blog/images/user-defined-parquet-indexes/custom_index_structures.png"
width="80%" class="img-fluid" alt="Parquet File layout with custom index
structures."/>
**Figure 2**: Parquet file layout with user-defined indexes.
diff --git a/content/blog/2025-07-28-datafusion-49.0.0.md
b/content/blog/2025-07-28-datafusion-49.0.0.md
index 9632230..a2148f7 100644
--- a/content/blog/2025-07-28-datafusion-49.0.0.md
+++ b/content/blog/2025-07-28-datafusion-49.0.0.md
@@ -46,7 +46,7 @@ DataFusion continues to focus on enhancing performance, as
shown in the ClickBen
<img
src="/blog/images/datafusion-49.0.0/performance_over_time_clickbench.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="ClickBench performance results over time for DataFusion"
/>
@@ -61,7 +61,7 @@ NOTE: Andrew is working on gathering these numbers
<img
src="/blog/images/datafusion-49.0.0/performance_over_time_planning.png"
width="80%"
-class="img-responsive"
+class="img-fluid"
alt="Planning benchmark performance results over time for DataFusion"
/>
diff --git a/content/blog/2025-08-15-external-parquet-indexes.md
b/content/blog/2025-08-15-external-parquet-indexes.md
index 53002cc..58566a3 100644
--- a/content/blog/2025-08-15-external-parquet-indexes.md
+++ b/content/blog/2025-08-15-external-parquet-indexes.md
@@ -80,7 +80,7 @@ needs<sup>[1](#footnote1)</sup>.
<img
src="/blog/images/external-parquet-indexes/external-index-overview.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Using External Indexes to Accelerate Queries"
/>
</div>
@@ -211,7 +211,7 @@ The standard approach is shown in Figure 2:
<img
src="/blog/images/external-parquet-indexes/processing-pipeline.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Standard Pruning Layers."
/>
</div>
@@ -245,7 +245,7 @@ shown below.
<img
src="/blog/images/external-parquet-indexes/parquet-layout.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Logical Parquet File layout: Row Groups and Column Chunks."
/>
</div>
@@ -262,7 +262,7 @@ stored at the end of the file (in the footer), as shown
below.
<img
src="/blog/images/external-parquet-indexes/parquet-metadata.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Physical Parquet File layout: Metadata and Footer."
/>
</div>
@@ -289,7 +289,7 @@ The high level mechanics of Parquet predicate pushdown is
shown below:
<img
src="/blog/images/external-parquet-indexes/parquet-filter-pushdown.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Parquet Filter Pushdown: use filter predicate to skip pages."
/>
</div>
@@ -326,7 +326,7 @@ most recent 7 days.
<img
src="/blog/images/external-parquet-indexes/prune-files.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Data Skipping: Pruning Files."
/>
</div>
@@ -471,7 +471,7 @@ indexes for filtering *WITHIN* Parquet files as shown below.
<img
src="/blog/images/external-parquet-indexes/prune-row-groups.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Data Skipping: Pruning Row Groups and DataPages"
/>
@@ -724,7 +724,7 @@ Come Join Us! 🎣
<img
src="/blog/images/logo_original4x.png"
width="20%"
- class="img-responsive"
+ class="img-fluid"
alt="https://datafusion.apache.org/"
/>
</a>
diff --git a/content/blog/2025-09-10-dynamic-filters.md
b/content/blog/2025-09-10-dynamic-filters.md
index 84293a9..ddf7789 100644
--- a/content/blog/2025-09-10-dynamic-filters.md
+++ b/content/blog/2025-09-10-dynamic-filters.md
@@ -70,7 +70,7 @@ SELECT * FROM hits WHERE "URL" LIKE '%google%' ORDER BY
"EventTime" LIMIT 10;
<img
src="/blog/images/dynamic-filters/execution-time.svg"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Q23 Performance Improvement with Dynamic Filters and Late
Materialization"
/>
</div>
@@ -105,7 +105,7 @@ A straightforward, though slow, plan to answer this query
is shown in Figure 2.
<img
src="/blog/images/dynamic-filters/query-plan-naive.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Naive Query Plan"
/>
</div>
@@ -132,7 +132,7 @@ DuckDB]. The plan for Q23 using this specialized operator
is shown in Figure 3.
<img
src="/blog/images/dynamic-filters/query-plan-topk.png"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="TopK Query Plan"
/>
</div>
@@ -161,7 +161,7 @@ of files. The plan for Q23 with dynamic filters is shown in
Figure 4.
<img
src="/blog/images/dynamic-filters/query-plan-topk-dynamic-filters.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="TopK Query Plan with Dynamic Filters"
/>
</div>
@@ -372,7 +372,7 @@ other optimizations as shown in Figure 7.
<img
src="/blog/images/dynamic-filters/join-performance.svg"
width="80%"
- class="img-responsive"
+ class="img-fluid"
alt="Join Performance Improvements with Dynamic Filters"
/>
</div>
diff --git a/content/blog/2025-09-21-custom-types-using-metadata.md
b/content/blog/2025-09-21-custom-types-using-metadata.md
index b835a5b..65142b0 100644
--- a/content/blog/2025-09-21-custom-types-using-metadata.md
+++ b/content/blog/2025-09-21-custom-types-using-metadata.md
@@ -86,7 +86,7 @@ implementation, during processing of all user defined
functions we pass the inpu
field information.
<figure>
- <img src="/blog/images/metadata-handling/arrow_record_batch.png"
alt="Relationship between a Record Batch, it's schema, and the underlying
arrays. There is a one to one relationship between each Field in the Schema and
Array entry in the Columns." width="100%" class="img-responsive">
+ <img src="/blog/images/metadata-handling/arrow_record_batch.png"
alt="Relationship between a Record Batch, it's schema, and the underlying
arrays. There is a one to one relationship between each Field in the Schema and
Array entry in the Columns." width="100%" class="img-fluid">
<figcaption>
<b>Figure 1:</b> Relationship between a Record Batch, it's schema, and the
underlying arrays. There is a one to one relationship between each Field in the
Schema and Array entry in the Columns.
</figcaption>
diff --git a/content/blog/2025-09-29-datafusion-50.0.0.md
b/content/blog/2025-09-29-datafusion-50.0.0.md
index e8548b7..5582f3c 100644
--- a/content/blog/2025-09-29-datafusion-50.0.0.md
+++ b/content/blog/2025-09-29-datafusion-50.0.0.md
@@ -48,7 +48,7 @@ DataFusion continues to focus on enhancing performance, as
shown in ClickBench
and other benchmark results.
<img src="/blog/images/datafusion-50.0.0/performance_over_time_clickbench.png"
- width="100%" class="img-responsive" alt="ClickBench performance results over
time for DataFusion" />
+ width="100%" class="img-fluid" alt="ClickBench performance results over time
for DataFusion" />
**Figure 1**: Average and median normalized query execution times for
ClickBench queries for each git revision.
Query times are normalized using the ClickBench definition. See the
diff --git a/content/blog/2025-10-21-datafusion-comet-0.11.0.md
b/content/blog/2025-10-21-datafusion-comet-0.11.0.md
index dd22a08..1991ee2 100644
--- a/content/blog/2025-10-21-datafusion-comet-0.11.0.md
+++ b/content/blog/2025-10-21-datafusion-comet-0.11.0.md
@@ -109,7 +109,7 @@ Comet 0.11.0 continues to deliver significant performance
improvements over Spar
<img
src="/blog/images/comet-0.11.0/tpch_allqueries.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="TPC-H Overall Performance"
/>
@@ -118,7 +118,7 @@ The performance gains are consistent across individual
queries, with most querie
<img
src="/blog/images/comet-0.11.0/tpch_queries_compare.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="TPC-H Query-by-Query Comparison"
/>
diff --git a/content/blog/2025-11-25-datafusion-51.0.0.md
b/content/blog/2025-11-25-datafusion-51.0.0.md
index 58a23aa..2cb8bcb 100644
--- a/content/blog/2025-11-25-datafusion-51.0.0.md
+++ b/content/blog/2025-11-25-datafusion-51.0.0.md
@@ -46,7 +46,7 @@ the core engine and in the Parquet reader.
<img
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"
-class="img-responsive"
+class="img-fluid"
alt="Performance over time"
/>
@@ -91,7 +91,7 @@ where startup time or low latency is important. You can read
more about the upst
<img
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Metadata Parsing Performance Improvements in Arrow/Parquet 57"
/>
diff --git a/content/blog/2025-12-15-avoid-consecutive-repartitions.md
b/content/blog/2025-12-15-avoid-consecutive-repartitions.md
index c3d3c4d..7b1d8ad 100644
--- a/content/blog/2025-12-15-avoid-consecutive-repartitions.md
+++ b/content/blog/2025-12-15-avoid-consecutive-repartitions.md
@@ -37,7 +37,7 @@ Starting a journey learning about database internals can be
daunting. With so ma
<img
src="/blog/images/avoid-consecutive-repartitions/database_system_diagram.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Database System Components"
/>
</div>
@@ -81,7 +81,7 @@ This will give you familiarity with the codebase and using
your tools, like your
<img
src="/blog/images/avoid-consecutive-repartitions/noot_noot_database_meme.png"
width="50%"
- class="img-responsive"
+ class="img-fluid"
alt="Noot Noot Database Meme"
/>
</div>
@@ -109,7 +109,7 @@ DataFusion implements a vectorized <a
href="https://dl.acm.org/doi/10.1145/93605
<img
src="/blog/images/avoid-consecutive-repartitions/volcano_model_diagram.png"
width="60%"
- class="img-responsive"
+ class="img-fluid"
alt="Vectorized Volcano Model Example"
/>
</div>
@@ -134,7 +134,7 @@ Round-robin repartitioning is useful when the data grouping
isn't known or when
<img
src="/blog/images/avoid-consecutive-repartitions/round_robin_repartitioning.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Round-Robin Repartitioning"
/>
</div>
@@ -154,7 +154,7 @@ Hash repartitioning is useful when working with grouped
data. Imagine you have a
<img
src="/blog/images/avoid-consecutive-repartitions/hash_repartitioning.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Hash Repartitioning"
/>
</div>
@@ -166,7 +166,7 @@ Note, the benefit of hash opposed to round-robin
partitioning in this scenario.
<img
src="/blog/images/avoid-consecutive-repartitions/hash_repartitioning_example.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Hash Repartitioning Example"
/>
</div>
@@ -187,7 +187,7 @@ SELECT a, SUM(b) FROM data.parquet GROUP BY a;
<img
src="/blog/images/avoid-consecutive-repartitions/basic_before_query_plan.png"
width="65%"
- class="img-responsive"
+ class="img-fluid"
alt="Consecutive Repartition Query Plan"
/>
</div>
@@ -204,7 +204,7 @@ Why is this such a big deal? Well, repartitions do not
process the data; their p
<img
src="/blog/images/avoid-consecutive-repartitions/in_depth_before_query_plan.png"
width="65%"
- class="img-responsive"
+ class="img-fluid"
alt="Consecutive Repartition Query Plan With Data"
/>
</div>
@@ -219,7 +219,7 @@ Optimally the plan should do one of two things:
<img
src="/blog/images/avoid-consecutive-repartitions/optimal_query_plans.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Optimal Query Plans"
/>
</div>
@@ -294,7 +294,7 @@ This logic takes place in the main loop of this rule. I
find it helpful to draw
<img
src="/blog/images/avoid-consecutive-repartitions/logic_tree_before.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Incorrect Logic Tree"
/>
</div>
@@ -321,7 +321,7 @@ The new logic tree looks like this:
<img
src="/blog/images/avoid-consecutive-repartitions/logic_tree_after.png"
width="100%"
- class="img-responsive"
+ class="img-fluid"
alt="Correct Logic Tree"
/>
</div>
@@ -388,7 +388,7 @@ For the benchmarking standard, TPCH, speedups were small
but consistent:
<img
src="/blog/images/avoid-consecutive-repartitions/tpch_benchmark.png"
width="60%"
- class="img-responsive"
+ class="img-fluid"
alt="TPCH Benchmark Results"
/>
</div>
@@ -400,7 +400,7 @@ For the benchmarking standard, TPCH, speedups were small
but consistent:
<img
src="/blog/images/avoid-consecutive-repartitions/tpch10_benchmark.png"
width="60%"
- class="img-responsive"
+ class="img-fluid"
alt="TPCH10 Benchmark Results"
/>
</div>
diff --git a/content/blog/2026-01-12-extending-sql.md
b/content/blog/2026-01-12-extending-sql.md
index d4d9e7a..374842e 100644
--- a/content/blog/2026-01-12-extending-sql.md
+++ b/content/blog/2026-01-12-extending-sql.md
@@ -76,7 +76,7 @@ DataFusion turns SQL into executable work in stages:
Each stage has extension points.
<figure>
- <img src="/blog/images/extending-sql/architecture.svg" alt="DataFusion SQL
processing pipeline: SQL String flows through Parser to AST, then SqlToRel
(with Extension Planners) to LogicalPlan, then PhysicalPlanner to
ExecutionPlan" width="100%" class="img-responsive">
+ <img src="/blog/images/extending-sql/architecture.svg" alt="DataFusion SQL
processing pipeline: SQL String flows through Parser to AST, then SqlToRel
(with Extension Planners) to LogicalPlan, then PhysicalPlanner to
ExecutionPlan" width="100%" class="img-fluid">
<figcaption>
<b>Figure 1:</b> SQL flows through three stages: parsing, logical planning
(via <code>SqlToRel</code>, where the Extension Planners hook in), and physical
planning. Each stage has extension points: wrap the parser, implement planner
traits, or add physical operators.
</figcaption>
diff --git a/content/blog/2026-02-02-datafusion_case.md
b/content/blog/2026-02-02-datafusion_case.md
index 2f133bd..f8dee65 100644
--- a/content/blog/2026-02-02-datafusion_case.md
+++ b/content/blog/2026-02-02-datafusion_case.md
@@ -164,7 +164,7 @@ END
Schematically, it will look as follows:
<figure>
-<img src="/blog/images/case/original_loop.svg" alt="Schematic representation
of data flow in the original CASE implementation" width="100%"
class="img-responsive">
+<img src="/blog/images/case/original_loop.svg" alt="Schematic representation
of data flow in the original CASE implementation" width="100%"
class="img-fluid">
<figcaption>One iteration of the `CASE` evaluation loop</figcaption>
</figure>
@@ -192,7 +192,7 @@ pub trait PhysicalExpr {
Going back to the same example as before, the data flow in
`evaluate_selection` looks like this:
<figure>
-<img src="/blog/images/case/evaluate_selection.svg" alt="Schematic
representation of `evaluate_selection` evaluation" width="100%"
class="img-responsive">
+<img src="/blog/images/case/evaluate_selection.svg" alt="Schematic
representation of `evaluate_selection` evaluation" width="100%"
class="img-fluid">
<figcaption>evaluate_selection data flow</figcaption>
</figure>
@@ -279,7 +279,7 @@ The second optimization fundamentally restructures how the
results of each loop
The diagram below illustrates the optimized data flow when evaluating the
`CASE WHEN col = 'b' THEN 100 ELSE 200 END` from before:
<figure>
-<img src="/blog/images/case/merging.svg" alt="Schematic representation of
optimized evaluation loop" width="100%" class="img-responsive">
+<img src="/blog/images/case/merging.svg" alt="Schematic representation of
optimized evaluation loop" width="100%" class="img-fluid">
<figcaption>optimized evaluation loop</figcaption>
</figure>
@@ -299,7 +299,7 @@ The diagram below illustrates how `merge_n` works for an
example where three `WH
The first branch produced the result `A` for row 2, the second produced `B`
for row 1, and the third produced `C` and `D` for rows 4 and 5.
<figure>
-<img src="/blog/images/case/merge_n.svg" alt="Schematic illustration of the
merge_n algorithm" width="100%" class="img-responsive">
+<img src="/blog/images/case/merge_n.svg" alt="Schematic illustration of the
merge_n algorithm" width="100%" class="img-fluid">
<figcaption>merge_n example</figcaption>
</figure>
@@ -329,7 +329,7 @@ FROM mailing_address
You can see that the `CASE` expression only references the columns `country`
and `state`, but because all columns are being queried, projection pushdown
cannot reduce the number of columns being fed in to the projection operator.
<figure>
-<img src="/blog/images/case/no_projection.svg" alt="Schematic illustration of
CASE evaluation without projection" width="100%" class="img-responsive">
+<img src="/blog/images/case/no_projection.svg" alt="Schematic illustration of
CASE evaluation without projection" width="100%" class="img-fluid">
<figcaption>CASE evaluation without projection</figcaption>
</figure>
@@ -339,7 +339,7 @@ As the diagram above shows, this filtering creates a
reduced copy of all columns
This unnecessary copying can be avoided by first narrowing the batch to only
include the columns that are actually needed.
<figure>
-<img src="/blog/images/case/projection.svg" alt="Schematic illustration of
CASE evaluation with projection" width="100%" class="img-responsive">
+<img src="/blog/images/case/projection.svg" alt="Schematic illustration of
CASE evaluation with projection" width="100%" class="img-fluid">
<figcaption>CASE evaluation with projection</figcaption>
</figure>
@@ -378,7 +378,7 @@ In contrast to `zip`, `merge` does not require both of its
value inputs to have
Instead it requires that the sum of the length of the value inputs matches the
length of the mask array.
<figure>
-<img src="/blog/images/case/merge.svg" alt="Schematic illustration of the
merge algorithm" width="100%" class="img-responsive">
+<img src="/blog/images/case/merge.svg" alt="Schematic illustration of the
merge algorithm" width="100%" class="img-fluid">
<figcaption>merge example</figcaption>
</figure>
@@ -435,7 +435,7 @@ The green series shows the time measurement for the `SELECT
* FROM orders` to gi
All measurements were made with a target partition count of `1`.
<figure>
-<img src="/blog/images/case/results.png" alt="Performance measurements chart"
width="100%" class="img-responsive">
+<img src="/blog/images/case/results.png" alt="Performance measurements chart"
width="100%" class="img-fluid">
<figcaption>Performance measurements</figcaption>
</figure>