This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a0a163f6bc6f Revert "docs(blog): JD Hudi blog (#17791)" (#17792)
a0a163f6bc6f is described below
commit a0a163f6bc6f6bd0400b576fc16aaa3f183c2430
Author: Shiyan Xu <[email protected]>
AuthorDate: Tue Jan 6 20:08:15 2026 -0600
Revert "docs(blog): JD Hudi blog (#17791)" (#17792)
This reverts commit c383dfcc0d7dc63cf2d46c69c714e2e93285a032.
---
.../2026-01-06-jd-hudi-architecture-evolution.mdx | 238 ---------------------
.../01.webp | Bin 59248 -> 0 bytes
.../02.webp | Bin 17168 -> 0 bytes
.../03.webp | Bin 30114 -> 0 bytes
.../04.webp | Bin 25490 -> 0 bytes
.../05.webp | Bin 60474 -> 0 bytes
.../06.webp | Bin 21932 -> 0 bytes
.../07.webp | Bin 8514 -> 0 bytes
.../08.webp | Bin 44626 -> 0 bytes
.../09.webp | Bin 86788 -> 0 bytes
.../10.webp | Bin 68654 -> 0 bytes
.../11.webp | Bin 30960 -> 0 bytes
.../12.webp | Bin 38512 -> 0 bytes
13 files changed, 238 deletions(-)
diff --git a/website/blog/2026-01-06-jd-hudi-architecture-evolution.mdx
b/website/blog/2026-01-06-jd-hudi-architecture-evolution.mdx
deleted file mode 100644
index 93c40de5087f..000000000000
--- a/website/blog/2026-01-06-jd-hudi-architecture-evolution.mdx
+++ /dev/null
@@ -1,238 +0,0 @@
----
-title: "Apache Hudi's Latest Architecture Evolution at JD.com"
-excerpt: ""
-author: "Team at JD.com"
-category: blog
-image: /assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/01.webp
-tags:
- - hudi
- - meetup
- - lakehouse
- - community
----
-
----
-
-_This post, translated from the [original blog in
Chinese](https://mp.weixin.qq.com/s/35-i_gSrCLz1kYugQpoqQA), is based on
content shared at the JD.com & Hudi Meetup Asia event. It covers JD.com's data
lake status, in-house technology features, business practices, and community
contributions and future plans, demonstrating how JD.com leverages data lake
technology to drive near-real-time data asset transformation and enhance data
asset value._
-
----
-
-**Table of Contents:**
-
-1. JD.com Data Lake Status
-2. In-House Technology Features
-3. Business Practices
-4. Community Contributions and Future Plans
-
-## 01 JD.com Data Lake Status
-
-The JD.com data lake team has been working to enable near-real-time data
processing across the entire organization, helping JD Retail, JD Technology, JD
Logistics, and other subsidiaries improve data timeliness for core data assets.
The data lake now exceeds 500PB in scale, supported by robust underlying
technology and platform product capabilities.
-
-
-
-JD.com's data lake is built on top of HDFS and other distributed storage
systems, leveraging Hudi's high-performance read/write capabilities and file
organization to support near-real-time processing pipelines. It ingests data
from message queues, Binlog, Hive, and Hudi sources, supporting
high-performance streaming, batch, and OLAP queries. On top of this
infrastructure, the JD Big Data Platform provides full lifecycle product
capabilities—from table creation, data integration, data dev [...]
-
-JD-Hudi is based on community version 0.13.1 and has been extended to meet
JD.com's specific business requirements. The team has implemented a series of
in-house optimizations across key modules of the Hudi core, including the
organization protocol layer and the I/O transport layer.
-
-### Organization Protocol Layer
-
-1. **Indexing**: The community provides Bucket Index, BloomFilter, and other
indexes. JD.com developed a partition-level Bucket Index that allows a
different bucket count per partition, effectively solving partition skew. The
team also devised a foreign key index and built streaming foreign-key join
capabilities on top of it.
-
-2. **Merge Engine**: In version 0.13.1, payload classes are the primary merge
mechanism; the community provides `OverwriteWithLatest`, `EventTimeBased`,
`PartialUpdate`, etc. JD.com developed Multiple Ts to support multi-field merge
logic and an enhanced Partial Update. As the community now promotes
RecordMerger, JD.com will align with that implementation going forward.
-
-3. **Table Format**: Both community CoW and MoR are in production at JD.com.
Community MoR uses a combination of base files and log files, with log files
stored in Avro row format. JD.com found performance bottlenecks in community
MoR and developed an LSM-Tree-based MoR with tiered management, where all log
files are stored in Parquet format, achieving 2-10x read/write performance
improvement.
-
-4. **Concurrency Control**: The community provides OCC and MVCC for
concurrency control. JD.com's LSM-Tree organization protocol achieves lock-free
concurrent updates.
-
-5. **File Layout**: JD.com developed a hybrid storage layout solution that
supports distributing a single table's data across multiple storage locations.
The buffer layer stores high-frequency writes like deltacommits on low-latency,
high-performance storage systems (HDFS or ChubaoFS). The persistent layer
receives data migrated from the buffer layer via table services. During reads,
the system combines buffer and persistent layers to generate a unified file
view.
-
-6. **Table Services**: JD.com implemented incremental table services on top of
the community version, effectively avoiding increased table service duration as
partition counts grow, significantly reducing write-side blocking time.
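The partition-level Bucket Index described in item 1 above can be sketched in a few lines: a per-partition bucket count lets a skewed partition spread its load across many buckets while a small partition keeps few. The function and field names below are hypothetical illustrations, not Hudi's API.

```python
import zlib

def bucket_for(record_key: str, partition: str,
               buckets_per_partition: dict[str, int],
               default_buckets: int = 4) -> int:
    """Route a record to a bucket using the partition's own bucket count.

    A large, skewed partition can declare many buckets while a small one
    keeps few, avoiding both hot buckets and excessive small files.
    """
    num_buckets = buckets_per_partition.get(partition, default_buckets)
    # crc32 gives a stable hash across processes (unlike Python's hash()).
    return zlib.crc32(record_key.encode()) % num_buckets
```

Routing is deterministic for a given key, so every writer agrees on the target bucket without coordination.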
-
-### I/O Transport Layer
-
-The core optimization strategy for the I/O transport layer is reducing
serialization/deserialization overhead. JD.com developed binary stream copy
Clustering, which bypasses the row-column conversion between compute engines
and Parquet files and copies binary streams directly to aggregate files.
Engine-Native optimization works with compute-engine data formats directly,
avoiding row-format conversions, and ZSTD compression reduces CPU overhead
during file compression.
-
-## 02 In-House Technology Features
-
-### 01. Hudi MoR LSM-Tree
-
-In JD.com's business practices, as data scale continues to grow and real-time
requirements intensify, Hudi MoR tables under high-concurrency, high-throughput
real-time workloads have revealed several performance and stability
bottlenecks:
-
-1. **Write Performance Bottleneck**: During high-traffic data writes, MoR
tables require merging incremental data with base files and writing back,
making it difficult to balance low latency with high throughput concurrent
writes—a key constraint for real-time data ingestion.
-
-2. **Index Inefficiency**: When using Bucket Index with many buckets, each log
file write requires frequent, time-consuming List operations, severely
impacting write performance.
-
-3. **Query Efficiency Issues**: The underlying Avro row storage format cannot
effectively push down queries. Even accessing only a few columns requires
reading entire rows, generating significant unnecessary I/O and degrading query
efficiency.
-
-4. **Resource Contention**: In streaming tasks, write operations and
Compaction share resources, with contention often causing job failures. This
typically requires deploying Compaction as a separate job, increasing
operational complexity and cost.
-
-5. **Lack of Concurrent Updates**: Data backfills or fault recovery require
pausing real-time tasks, affecting business continuity. Table-level concurrent
write limitations force multiple processing tasks on the same table to execute
serially, causing cumulative data output delays that fail to meet
high-timeliness requirements.
-
-To address JD.com's real-time data business needs—especially the higher
demands for performance, stability, and concurrent processing in scenarios like
real-time data warehousing and multi-stream fusion during traffic peaks—the
JD.com data lake team upgraded Hudi's architecture based on LSM-Tree
principles, aiming to break through existing bottlenecks and more efficiently
support growing real-time data business requirements.
-
-#### Basic Design Principles
-
-LSM-Tree is a storage structure that optimizes read/write performance through
sequential writes and tiered merging. In the underlying Hudi transformation:
-
-Using Bucket Index as the foundation, files within each FileGroup are
organized into a two-level LSM-Tree. New data first enters L0, with two types
of compaction mechanisms improving query efficiency and reducing read-time
merge pressure:
-
-**Minor Compaction**: Merges small L0 files into larger L0 files, efficiently
controlling small file counts with high frequency and fast execution.
-
-**Major Compaction**: Periodically merges all files into a single L1 file,
maintaining global data consistency while reducing compaction frequency.
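The minor/major compaction split can be modeled with a toy sketch. File sizes are in MB, and the threshold and data shapes are illustrative rather than Hudi's internals:

```python
def minor_compact(l0_sizes: list[int], threshold: int = 128) -> list[int]:
    """Minor compaction: fold small L0 files into one larger L0 file,
    leaving already-large files untouched. Runs frequently and cheaply."""
    small = [s for s in l0_sizes if s < threshold]
    large = [s for s in l0_sizes if s >= threshold]
    return large + ([sum(small)] if small else [])

def major_compact(l0_sizes: list[int], l1_size: int) -> int:
    """Major compaction: collapse all L0 files plus the current L1 file
    into a single new L1 file. Runs periodically."""
    return sum(l0_sizes) + l1_size
```

For example, `minor_compact([10, 20, 200])` leaves the 200 MB file alone and merges the two small files into one 30 MB file, which keeps the small-file count bounded between major compactions.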
-
-
-
-During data flow, JD-Hudi uses compute engine native row formats (like Spark
InternalRow, Flink RowData), avoiding the overhead of Avro serialization and
deserialization. Additionally, incremental updates create new Parquet files
instead of appending to log files, avoiding the heavy file listing operations
before writes while supporting multi-task concurrent updates.
-
-#### FileSystemView Construction Logic
-
-FileSystemView (FSView) is Hudi's metadata abstraction layer that provides a
unified query interface, hiding underlying file organization details so engines
can efficiently build query splits based on logical views—providing critical
support for snapshot isolation and data consistency. The transformed FSView
construction logic can be expressed as:
-
-```
-FSView at T = CommittedLogs - (Log.baseInstant > T) - (ReplacedLogs.baseInstant <= T)
-```
-
-
-
-Using the multi-task concurrent update scenario shown in the diagram as an
example: the committed files are Files 1-10 and File 15. At time T_i, the only
file committed after that time is File 15, and the files committed at or before
T_i that were replaced by Compaction operations are Files 1-5. Applying the
formula, the logical snapshot view at T_i therefore consists of Files 6-10.
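The formula can be checked against the diagram's example with a small sketch. Field names such as `base_instant` and `replaced_at` are illustrative, not Hudi identifiers:

```python
def fs_view(committed: set[int], base_instant: dict[int, int],
            replaced_at: dict[int, int], t: int) -> set[int]:
    """FSView at t: keep committed files whose base instant is <= t and
    that were not replaced (by compaction) at or before t."""
    return {f for f in committed
            if base_instant[f] <= t
            and replaced_at.get(f, float("inf")) > t}

# Diagram example: Files 1-10 committed at instant 1, File 15 at instant 6,
# Files 1-5 replaced by a compaction at instant 3; query at T_i = 5.
committed = set(range(1, 11)) | {15}
base_instant = {f: 1 for f in range(1, 11)} | {15: 6}
replaced_at = {f: 3 for f in range(1, 6)}
```

`fs_view(committed, base_instant, replaced_at, 5)` yields Files 6-10, matching the text.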
-
-#### Write Process Design
-
-Taking Flink writes as an example, the write process consists of four core
steps: **repartitioning, sorting, deduplication, and I/O**.
-
-
-
-- **Unified Data Representation**: The entire data flow uses Flink's native
RowData format, reducing serialization and format conversion overhead.
-- **Balanced Data Distribution**: To address uneven Bucket distribution, a
Remote Partitioner mechanism based on a global view was proposed, achieving
dynamic load balancing for writes.
-- **Asynchronous Processing Architecture**: A Disruptor ring buffer was
introduced in the Sink operator, decoupling data production and consumption,
significantly improving processing performance and effectively handling
scenarios where production rate exceeds consumption.
-- **Efficient Memory Management**: Integration of Flink's built-in
MemorySegmentPool and BinaryInMemorySortBuffer enables fine-grained memory
management and efficient sorting, substantially reducing GC pressure and
sorting overhead.
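Outside Flink, the repartition / sort / deduplicate steps can be sketched in plain Python. The Disruptor buffer and memory-pool details are omitted, and the record shape `(key, ts, value)` is an assumption:

```python
import zlib
from collections import defaultdict
from operator import itemgetter

def prepare_batch(records, num_buckets=4):
    """Repartition records by bucket, sort each bucket by (key, ts), and keep
    only the latest version of each key before handing off to the I/O step."""
    buckets = defaultdict(list)
    for key, ts, value in records:                    # repartition by hash
        buckets[zlib.crc32(key.encode()) % num_buckets].append((key, ts, value))
    out = {}
    for b, recs in buckets.items():
        recs.sort(key=itemgetter(0, 1))               # sort by key, then ts
        latest = {}
        for key, ts, value in recs:                   # dedupe: after sorting,
            latest[key] = (key, ts, value)            # the last record wins
        out[b] = list(latest.values())
    return out
```

Sorting before deduplication means each key's highest-timestamp record survives, which is the property the I/O step relies on.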
-
-#### Read Process Design
-
-In LSM-Tree structures, data records within underlying files are all sorted,
so efficient multi-way merge sort algorithms are key to query performance.
Min-heap and loser tree are the most commonly used data structures for merge
sorting. When the number of merge paths (K) is large, loser tree requires only
about `log₂K` comparisons per adjustment, while min-heap needs about
`2log₂K`—theoretically reducing comparisons by nearly half, with significant
advantages at larger data scales.
-
-To further improve merge efficiency, the JD.com data lake team adopted a
**state machine-based loser tree** implementation. Each node is given a clearly
defined state (such as "ready" or "selected out"), state transitions are driven
on every comparison, and leaf-node indexes are recorded dynamically for
identical primary keys. The benefits are:
-
-- **Avoid Repeated Adjustments**: When identical primary keys are detected,
index-based path reuse allows skipping tree structure re-traversal.
-- **Batch Merge Output**: In a single adjustment, all records with identical
primary keys in the current tree can be located and output, enabling merge
functions to complete aggregation in one pass.
-- **Zero-Copy Reference**: Direct index references to data positions avoid
object copying, substantially reducing memory and CPU overhead.
-
-This approach achieved approximately 15% read performance improvement in
actual testing, effectively supporting high-throughput, low-latency query
requirements under the LSM-Tree architecture.
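The merge step can be illustrated with Python's heap-based `heapq.merge` standing in for the loser tree (the state-machine loser tree is a performance refinement of the same idea). Runs are key-sorted lists of `(key, value)` pairs, and "later run wins" below is a simple stand-in for the real merge function:

```python
import heapq

def merge_runs(runs):
    """K-way merge of key-sorted runs, collapsing records that share a
    primary key in a single pass over the merged stream."""
    merged = heapq.merge(*runs, key=lambda rec: rec[0])
    out = []
    for key, value in merged:
        if out and out[-1][0] == key:
            # heapq.merge is stable, so for equal keys the later run's
            # record arrives last and overwrites the earlier one.
            out[-1] = (key, value)
        else:
            out.append((key, value))
    return out
```

Because the runs are already sorted, all records with the same primary key arrive adjacently and can be combined without re-traversing any tree structure.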
-
-
-
-#### Compaction Optimization
-
-Hudi Compaction is a core background service designed for MoR tables. Its main
function is merging row-oriented incremental log files with columnar base files
to generate new, more efficient base files. During Hudi-LSM Tree development,
two main optimizations were made to address existing performance and stability
bottlenecks in the scheduling phase:
-
-**Incremental Scheduling Optimization**
-
-The scheduling phase's main work is determining which partitions' file slices
will participate in compaction, then generating a plan (Compaction Plan) saved
to the Timeline.
-
-In practice, only partitions with new data need processing. However, the
original Compaction strategy scanned all partitions regardless of whether they
contained new data, causing unnecessary resource consumption and potential task
failures as partition counts increased. The LSM version introduced an
Incremental Compaction strategy with the following core workflow:
-
-
-
-First, the time period from the last completed Compaction to the current
moment is treated as an incremental window. Partitions committed via
deltacommit during this period have received incremental data. As shown in the
diagram, if scheduling occurs at T5, then (T1, T5) is the incremental window,
so Partition4 and Partition5, written at T2 and T3, must be included in this
compaction plan.
-
-Second, attention must be paid to partitions from the previous scheduling that
weren't fully processed due to I/O limits—this information is recorded in the
Compaction plan's MissingPartitions. As shown, when scheduling at T5,
Partition3 recorded in the T1 plan must be considered.
-
-Additionally, in concurrent scenarios, data generated before the scheduling
moment but not yet committed must be considered; such instants are saved in the
Compaction plan's MissingInstants and included during scheduling. For example,
Partition1, written at T0 but still uncommitted when the T1 plan was created,
must be picked up, and at T5 scheduling the uncommitted T4 instant is recorded
in the new plan.
-
-Finally, at T5 scheduling, all partitions participating in compaction include:
Partition1, Partition3, Partition4, and Partition5.
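Under these rules, partition selection at scheduling time can be sketched as follows. The parameter names mirror the plan fields described in the text, but the data shapes are hypothetical:

```python
def partitions_to_compact(deltacommits, missing_partitions,
                          missing_instants, window_start, now):
    """Select partitions for the next Compaction plan: those written inside
    the incremental window (window_start, now], plus partitions carried over
    via MissingPartitions, plus partitions of previously uncommitted
    instants recorded in MissingInstants.

    `deltacommits` maps instant time -> set of partitions written then."""
    selected = set(missing_partitions)
    for t, written in deltacommits.items():
        if window_start < t <= now or t in missing_instants:
            selected |= written
    return selected
```

Replaying the text's example, with Partition1 written at T0 (an instant carried in MissingInstants), Partition3 carried in MissingPartitions, and Partition4/Partition5 written at T2/T3 inside the (T1, T5) window, the function selects all four partitions.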
-
-**Flink Streaming Scheduling**
-
-The original scheduling mechanism was timed to execute after Commit operations
completed and before new Instant generation, with the entire process running on
the JobManager (JM) node. This pattern had two prominent drawbacks: first, it
blocked normal new Instant generation, causing data consumption pipeline
interruptions; second, if the scheduling phase encountered exceptions, the
entire task would fail.
-
-To address these pain points, the new streaming scheduling approach makes one
critical change: the core scheduling logic is extracted into an independent
operator deployed to TaskManager (TM) nodes. Moving scheduling off the
JobManager eliminates both pain points of the original pattern. The diagram
below shows the Flink streaming scheduling topology, where obtaining scheduling
partitions, listing files to merge, an [...]
-
-
-
-#### Benchmark
-
-Based on TPCDS and Nexmark standard datasets, benchmark evaluations were
conducted under unified test environments for three table formats: MoR-LSM
(JD-Hudi optimized version), MoR-Avro (JD-Hudi original version), and PK Table
(Paimon 1.0.1), using task execution time or data consumption time as
evaluation metrics. The results are as follows:
-
-
-
-## 03 Business Practices
-
-### 02. Partial Update Foreign-Key Join
-
-Building on the earlier near-real-time data asset transformation, upstream
product basic-detail pipelines are now near-real-time. The current goal is to
extend this transformation to downstream product wide tables. These wide tables
use SKU information as their core foundation and must be extended with related
dimension information through streaming processing, which involves
non-primary-key association update scenarios.
-
-
-
-**Existing approaches:**
-
-- **Real-time**: Using Flink streaming Join results in massive state storage
at scale, with high maintenance costs and operational complexity
-- **Offline**: Periodic Spark Join degrades data timeliness to scheduling
cycle + execution time
-
-These two approaches struggle to simultaneously guarantee pipeline stability
and data timeliness in massive data Join scenarios. Therefore, JD.com designed
and introduced a foreign key index solution.
-
-**Foreign Key Index Solution**
-
-In Hudi, Partial Update in streaming primary key association scenarios can
push large state down to the storage layer, ensuring real-time task stability.
With foreign key index capability, primary keys can be efficiently queried by
foreign key values to meet business requirements.
-
-JD.com designed a foreign key index with: the ability to quickly locate
primary key values by foreign key values, support for concurrent update
operations, efficient point query performance, and extensible, pluggable index
storage mechanisms. The overall pipeline flow is as follows:
-
-
-
-- SKU data stream maintains the foreign key index in real-time and performs
Partial Update to the Hudi product wide table
-- SPU data stream queries the foreign key index in real-time, retrieves all
related primary keys, expands them, and performs Partial Update to the Hudi
product wide table
-- Achieves minute-level timeliness for business requirements
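A toy in-memory version of this pipeline is sketched below. Real index storage is pluggable per the design above, and the class and function names here are hypothetical:

```python
from collections import defaultdict

class ForeignKeyIndex:
    """Maps a foreign key (e.g. an SPU id) to the set of primary keys
    (SKU ids) that reference it, supporting upserts and point lookups."""
    def __init__(self):
        self._fk_to_pks = defaultdict(set)

    def upsert(self, pk: str, fk: str) -> None:
        self._fk_to_pks[fk].add(pk)

    def lookup(self, fk: str) -> list[str]:
        return sorted(self._fk_to_pks[fk])

def on_sku_update(index, wide_table, sku_id, spu_id, sku_fields):
    """SKU stream: maintain the index and partially update the wide table."""
    index.upsert(sku_id, spu_id)
    wide_table.setdefault(sku_id, {}).update(sku_fields)

def on_spu_update(index, wide_table, spu_id, spu_fields):
    """SPU stream: look up all related SKUs and fan the change out to them
    via Partial Update."""
    for sku_id in index.lookup(spu_id):
        wide_table.setdefault(sku_id, {}).update(spu_fields)
```

The key property is that the SPU stream never joins against the full SKU stream; it only queries the index for the affected primary keys, keeping streaming state small.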
-
-### 03. Data Lake + AI Exploration: Hudi NativeIO SDK
-
-Current training-engine I/O layers lack native data lake adaptation
("NativeIO") capability, causing I/O amplification and high
serialization/deserialization overhead during data reads, which further limits
training efficiency.
-
-As business scenarios demand increasingly real-time and accurate models,
there's an urgent need to establish direct connectivity between sample training
engines and data lakes. By building a Hudi-compatible NativeIO SDK, training
engines can bypass intermediate sync steps to read samples directly from Hudi
lake tables, deeply leveraging data lake capabilities in incremental updates
and efficient filtering. Meanwhile, NativeIO-level optimizations—such as
columnar storage adaptation, batch [...]
-
-
-
-JD.com's internal Hudi NativeIO SDK adopts a layered, decoupled architecture
design, divided into four core modules: data invocation layer, cross-language
Transformation layer, Hudi view management layer, and high-performance query
layer. Each module has clear responsibilities and boundaries, working together
top-to-bottom to ensure user operation convenience while achieving efficient
lake table data reads and high-timeliness responses, ultimately supporting
sample training engines to di [...]
-
-Reading Parquet files via NativeIO SDK achieves approximately **2x performance
improvement** compared to Spark vectorized Parquet reads.
-
-This performance optimization effectively addresses key bottlenecks in AI
scenario sample engine data reads, significantly improving overall processing
timeliness.
-
-### 04. Traffic Data Warehouse ADM Data Lake Upgrade
-
-The traffic business currently maintains separate real-time and offline
pipelines, with high development and maintenance costs. Because each domain
independently processes click data in the offline pipelines, metrics are
inconsistent and data is stored redundantly. In addition, near-real-time
scenarios are not yet supported, and the traffic business faces severe data
skew, massive data scale, and concurrent data corrections. To address these
issu [...]
-
-
-
-**Challenge**: In the Bucket Index-based MoR tables of the traffic data
production pipelines, buckets are unevenly allocated to write tasks, leaving
some tasks heavily loaded while others sit idle; the resulting low parallel
resource utilization hurts write performance. Additionally, partition data skew
in the traffic business (partition sizes ranging from TBs down to MBs) causes
Fixed Bucket mode to generate excessive small files for small partitions.
-
-**Solution**:
-- Developed Remote Partitioner functionality based on TimelineService, using
centralized partition + bucket allocation strategy to solve bucket allocation
skew
-- Developed partition-level bucketing, customizing bucket counts per partition
based on business partition data characteristics, with in-place bucket scaling
-
-**Challenge**: The traffic exposure business sees hundreds of billions of
update records daily. MoR (Avro) compaction, read-time merge, and update
performance cannot meet requirements, creating significant SLA pressure. The
community's native MoR does not support concurrent updates, and backfill
scenarios that require stopping production cannot meet the traffic business's
near-real-time requirements.
-
-**Solution**:
-- Developed LSM-Tree data organization format, leveraging LSM sequential
read/write and tiered merge core advantages to improve MoR read, write, and
merge performance
-- Based on LSM-Tree file isolation mechanism, resolved metadata conflicts,
JM-TM communication conflicts, and Hive metadata sync conflicts, achieving
lightweight lock-free concurrent Upsert
-
-**Challenge**: The traffic pipeline associates SKU dimensions in real time.
Because SKU information changes in real time, different primary keys sharing
the same SKU within a daily partition can end up with inconsistent dimension
information, requiring T+1 offline corrections.
-
-**Solution**:
-- Developed Hudi foreign key index + Partial Update to achieve primary key
refresh when SKU dimension information changes, maintaining SKU information
data consistency within the same partition
-
-## 04 Community Contributions and Future Plans
-
-### Community Contributions
-
-The JD.com Hudi team has contributed **109 PRs** to the community, including
significant work such as:
-- RFC-83: Incremental Table Service
-- RFC-89: Partition Level Bucket Index
-- HUDI-6212: Hudi Spark 3.0.x integration
-
-This demonstrates the team's significant, sustained contributions to the Hudi
project. The team currently includes **1 Hudi PMC member** and **1 Hudi
Committer**, and **6 of Hudi's top 100 source code contributors** are from the
JD.com team, making it one of the most important forces in the Hudi community.
-
-### Future Plans
-
-- **Upgrade to latest community version**: Continue advancing JD.com's
internal Hudi version based on the latest community releases
-- **Multi-modal data lake capabilities**: Support unstructured data storage
and vector indexing capabilities for AI scenarios
-- **Rust+Arrow NativeIO advancement**: Further develop NativeIO capabilities
based on Rust and Arrow
-- **Lake-stream integration exploration**: Explore unified architectures
bridging streaming and lakehouse paradigms
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/01.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/01.webp
deleted file mode 100644
index 1b11f16b14d8..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/01.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/02.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/02.webp
deleted file mode 100644
index 8bb9c0f0d790..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/02.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/03.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/03.webp
deleted file mode 100644
index 5ebeb8d902a8..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/03.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/04.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/04.webp
deleted file mode 100644
index 6a81f600f053..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/04.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/05.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/05.webp
deleted file mode 100644
index 474d41d21acd..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/05.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/06.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/06.webp
deleted file mode 100644
index 2470a65742fa..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/06.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/07.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/07.webp
deleted file mode 100644
index 7efd801639ef..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/07.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/08.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/08.webp
deleted file mode 100644
index be2365247d1a..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/08.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/09.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/09.webp
deleted file mode 100644
index 0e0ea1f01e91..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/09.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/10.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/10.webp
deleted file mode 100644
index 8aa887b6403d..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/10.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/11.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/11.webp
deleted file mode 100644
index f48d9414144a..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/11.webp
and /dev/null differ
diff --git
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/12.webp
b/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/12.webp
deleted file mode 100644
index e78e349237b8..000000000000
Binary files
a/website/static/assets/images/blog/2026-01-06-jd-hudi-architecture-evolution/12.webp
and /dev/null differ