jinxxxoid commented on code in PR #285:
URL: https://github.com/apache/ignite-website/pull/285#discussion_r2534405256
##########
_src/_blog/schema-design-for-distributed-systems-ai3.pug:
##########

@@ -0,0 +1,384 @@
+---
+title: "Schema Design for Distributed Systems: Why Data Placement Matters"
+author: "Michael Aglietti"
+date: 2025-11-18
+tags:
+  - apache
+  - ignite
+---
+
+p Discover how Apache Ignite 3 keeps related data together with schema-driven colocation, cutting cross-node traffic and making distributed queries fast, local, and predictable.
+
+<!-- end -->
+
+h3 Schema Design for Distributed Systems: Why Data Placement Matters
+
+p You can scale out your database, add more nodes, and tune every index, but if your data isn’t in the right place, performance still hits a wall. Every distributed system eventually runs into this: joins that cross the network, caches that can’t keep up, and queries that feel slower the larger your cluster gets.
+
+p.
+  Most distributed SQL databases claim to solve scalability. They partition data evenly, replicate it across nodes, and promise linear performance. But #[em how] data is distributed and #[em which] records end up together matters more than most people realize.
+  If related data lands on different nodes, every query has to travel the network to fetch it, and each millisecond adds up.
+
+p.
+  That’s where #[strong data placement] becomes the real scaling strategy. Apache Ignite 3 takes a different path with #[strong schema-driven colocation]: a way to keep related data physically together. Instead of spreading rows randomly across nodes, Ignite uses your schema relationships to decide where data lives. The result: a 200 ms cross-node query becomes a 5 ms local read.
+
+hr
+
+h3 How Ignite 3 Differs from Other Distributed Databases
+
+p
+  strong Traditional Distributed SQL Databases:
+ul
+  li Hash-based partitioning ignores data relationships
+  li Related data is scattered across nodes by default
+  li Cross-node joins create network bottlenecks
+  li Millisecond latencies due to disk-first architecture
+
+p
+  strong Ignite 3 Schema-Driven Approach:
+ul
+  li Colocation is configured in schema definitions
+  li Related data is automatically placed together
+  li Local queries eliminate network overhead
+  li Microsecond latencies through memory-first storage
+
+hr
+
+h3 The Distributed Data Placement Problem
+
+p You’ve tuned indexes, optimized queries, and scaled your cluster, but latency still creeps in. The problem isn’t your SQL; it’s where your data lives.
+
+p Traditional hash-based partitioning distributes records randomly across nodes based on primary key values. While this ensures even data distribution, it scatters related records that applications frequently access together. It’s a clever approach, right up until you need to join data that doesn’t share the same key. Then every query turns into a distributed operation, and your network becomes the bottleneck.
+
+p Ignite 3 provides automatic colocation based on schema relationships. You define relationships directly in your schema, and Ignite automatically places related data on the same nodes using the specified colocation keys (a quick sketch follows the note below).
+
+p.
+  Using a #[a(href="https://github.com/lerocha/chinook-database/tree/master") music catalog example], we’ll demonstrate how schema-driven data placement reduces query latency from 200 ms to 5 ms.
+
+blockquote
+  p.
+    This post assumes you have a basic understanding of how to get an Ignite 3 cluster running and have worked with the Ignite 3 Java API.
+    If you’re new to Ignite 3, start with the #[a(href="https://ignite.apache.org/docs/ignite3/latest/quick-start/java-api") Java API quick start guide] to set up your development environment.
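+
+p.
+  As a rough preview of that idea, here is a minimal sketch (not the exact code from the walkthrough below): the #[code Album] table carries #[code ArtistId] in its primary key and declares it as the colocation key, so an artist’s albums land in the same partition as the artist row. The address and client calls assume a local single-node setup like the one from the quick start guide, and the exact API may differ slightly across Ignite 3 versions.
+
+pre
+  code.language-java.
+    import org.apache.ignite.client.IgniteClient;
+
+    public class ColocationSketch {
+        public static void main(String[] args) {
+            // Connect to a local Ignite 3 node on the default client port.
+            try (IgniteClient client = IgniteClient.builder()
+                    .addresses("127.0.0.1:10800")
+                    .build()) {
+
+                // Parent table: one row per artist, partitioned by ArtistId.
+                client.sql().execute(null,
+                    "CREATE TABLE IF NOT EXISTS Artist ("
+                    + "  ArtistId INT PRIMARY KEY,"
+                    + "  Name VARCHAR)");
+
+                // Child table: ArtistId is part of the primary key and is the
+                // colocation key, so albums are partitioned by ArtistId and
+                // stored alongside their artist rather than by AlbumId.
+                client.sql().execute(null,
+                    "CREATE TABLE IF NOT EXISTS Album ("
+                    + "  AlbumId INT,"
+                    + "  ArtistId INT,"
+                    + "  Title VARCHAR,"
+                    + "  PRIMARY KEY (AlbumId, ArtistId))"
+                    + " COLOCATE BY (ArtistId)");
+            }
+        }
+    }
+
+p The same tables can be created from any SQL tooling; the key requirement is that the colocation column is part of the child table’s primary key.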
+
+hr
+
+h3 How Ignite 3 Places Data Differently
+
+p Tables are distributed across multiple nodes using consistent hashing, but with a key difference: your schema definitions control data placement. Instead of accepting random distribution of related records, you declare relationships in your schema and let Ignite handle placement automatically.
+
+p
+  strong Partitioning Fundamentals:
+ul
+  li Each table is divided into partitions (typically 64–1024 per table)
+  li The primary key hash determines which partition a row goes into
+  li Partitions are distributed evenly across available nodes
+  li Each partition has configurable replicas for fault tolerance
+
+p
+  strong Data Placement Concepts:
+ul
+  li
+    strong Affinity
+    | – the algorithm that determines which nodes store which partitions
+  li
+    strong Colocation
+    | – ensuring related data from different tables is placed on the same nodes
+
+p.
+  The diagram below shows how colocation works in practice. The Artist and Album tables use different primary keys, but the colocation strategy ensures albums are partitioned by #[code ArtistId] rather than #[code AlbumId]:
+
+// prettier-ignore

Review Comment:
   and done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
