GitHub user chiradip added a comment to the discussion: Iggy Connector
Architecture Involving Data/Stream Processors like Flink
# Connector Library Placement: Where Should Common Code Live?
**Author**: Chiradip
**Date**: 2025-10-16
**Context**: While architecting the Flink connector for Iggy, I realized we'd
have substantial reusable code (~88%) for future processors like Spark. This
led to a fundamental question about where this common connector code should
live.
---
## The Question
While working on `option_x_1.md`, I initially proposed:
```
external-processors/
└── iggy-connector-flink/
    └── iggy-connector-library/          # Contains common + Flink code
        └── org/apache/iggy/connector/
            ├── config/                  # Framework-agnostic
            ├── serialization/           # Framework-agnostic
            ├── offset/                  # Framework-agnostic
            └── flink/                   # Flink-specific
```
But then I thought: **What if the common connector code
(`org.apache.iggy.connector.*`) actually belongs in the Iggy java-sdk itself?**
This isn't just a file organization question - it's a philosophical one about
what "SDK" means for Iggy.
---
## Why I'm Even Considering This
### 1. These Aren't Really "Flink Patterns" - They're "Iggy Patterns"
When I look at what's in the common code:
- **Offset tracking** - This is about Iggy's consumer offset model, not Flink's
- **Partition discovery** - This is about discovering Iggy partitions
- **Connection pooling** - Managing connections to Iggy servers
- **Serialization schemas** - Converting Iggy message bytes to/from objects
All of these are **Iggy integration patterns**, not stream processor patterns.
They're reusable precisely because they encode knowledge about Iggy's
semantics, not Flink's or Spark's.
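To make the first bullet concrete: an offset tracker can be written entirely in terms of Iggy's partition/offset model, with no processor types in sight. A minimal sketch (the class and method names here are hypothetical, not the actual connector API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Tracks the highest committed offset per Iggy partition. Nothing here
// references Flink or Spark: it encodes Iggy's consumer offset semantics only.
final class InMemoryOffsetTracker {
    private final Map<Integer, Long> committed = new ConcurrentHashMap<>();

    void commit(int partitionId, long offset) {
        // Keep only the highest offset seen for each partition,
        // so out-of-order commits cannot move the offset backwards.
        committed.merge(partitionId, offset, Math::max);
    }

    long committedOffset(int partitionId) {
        // -1 signals "nothing committed yet" for this partition.
        return committed.getOrDefault(partitionId, -1L);
    }
}
```

A Flink source would call `commit` from its checkpoint callback and a Spark source from its commit hook, but the tracker itself stays framework-neutral.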
### 2. The Kafka Precedent
I looked at how Kafka does this:
```
kafka-clients/
├── org/apache/kafka/clients/
│   ├── consumer/        # Consumer client
│   ├── producer/        # Producer client
│   └── admin/           # Admin operations
└── org/apache/kafka/common/
    ├── serialization/   # SerDes (Serializer/Deserializer)
    ├── config/          # Configuration abstractions
    └── errors/          # Common exceptions

kafka-streams/           # Higher-level stream processing API
flink-connector-kafka/   # Lives in Flink ecosystem, depends on kafka-clients
spark-connector-kafka/   # Lives in Spark ecosystem, depends on kafka-clients
```
Kafka includes serialization, configuration patterns, and common utilities **in
the core client library**. Flink and Spark connectors then depend on
`kafka-clients` and wrap it in their respective APIs.
This suggests there's precedent for "integration patterns" living in the SDK.
### 3. User Experience Simplification
**Current approach** (separate library):
```kotlin
dependencies {
    implementation("org.apache.iggy:iggy:0.5.0")                    // SDK
    implementation("org.apache.iggy:iggy-connector-library:0.5.0")  // Common connector code
    compileOnly("org.apache.flink:flink-streaming-java:1.18.0")     // Flink
}
```
**If in SDK**:
```kotlin
dependencies {
    implementation("org.apache.iggy:iggy:0.5.0")                 // SDK + common patterns
    compileOnly("org.apache.flink:flink-streaming-java:1.18.0")  // Flink
}
```
One less dependency to manage. One less version to coordinate.
---
## Why I'm Hesitant
### 1. SDK Bloat
The Iggy SDK is currently lean and focused: it's a client for talking to Iggy
servers. Adding ~1500 LOC of connector abstractions changes its character.
Not everyone who uses the SDK needs:
- Offset tracking strategies
- Partition discovery algorithms
- Sophisticated retry policies
- Multiple serialization formats (JSON, Avro, etc.)
A microservice that just publishes events to Iggy doesn't need this extra
weight.
### 2. Dependency Contamination
The connector abstractions would need dependencies:
```kotlin
dependencies {
    // Core Iggy SDK needs these
    implementation("org.apache.httpcomponents.client5:httpclient5:5.4.3")
    implementation("com.fasterxml.jackson.core:jackson-databind:2.18.0")

    // NEW: Connector patterns would add these
    implementation("org.apache.avro:avro:1.11.3")           // For Avro serialization
    implementation("io.micrometer:micrometer-core:1.12.0")  // For metrics collection
    // ... more as serialization formats grow
}
```
Should every SDK user pull in Avro dependencies even if they never serialize to
Avro?
**Counter-argument**: We could use `compileOnly` or make them optional
dependencies, but that adds complexity.
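One common way to make such a dependency truly optional, sketched below: probe the classpath at runtime before enabling a format, so Avro can stay `compileOnly` and only users who add it themselves get Avro support. (The Avro class name probed is real; the guard class itself is a hypothetical illustration.)

```java
// Guards an optional serialization format behind a classpath check, so the
// SDK never needs a hard Avro dependency.
final class SerializationFormats {
    static boolean avroAvailable() {
        try {
            // Present only if the user added org.apache.avro:avro themselves.
            Class.forName("org.apache.avro.Schema");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

The cost the counter-argument mentions is visible here: every optional format needs a probe, a fallback, and documentation explaining which dependency unlocks it.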
### 3. Release Coupling
If the connector patterns need a breaking change (e.g., we discover offset
tracking needs a different interface), it forces the entire SDK to bump major
version.
**Example scenario**:
- SDK is stable at v0.6.0
- We discover offset tracking needs API changes for Spark integration
- Must release SDK v0.7.0 even though client code didn't change
- All users must upgrade, even those not using connectors
With separate artifacts:
- SDK stays at v0.6.0
- `iggy-connector-common` bumps to v0.4.0
- Only connector users are affected
### 4. Conceptual Boundary
I think there's value in maintaining a clear line:
**SDK = "How to communicate with Iggy"**
- Connect to server
- Send messages
- Receive messages
- Manage streams/topics
- Handle consumer groups
**Connector Library = "How to integrate Iggy with data processing frameworks"**
- Offset management strategies
- Partition assignment logic
- Serialization format handling
- Retry and error handling patterns
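To make the last bullet concrete, a retry policy of this kind is framework-agnostic by construction. A rough sketch (hypothetical class, not the planned API): exponential backoff with a cap, usable unchanged from a Flink source or a Spark reader.

```java
// Exponential backoff with an upper bound: attempt 0 waits baseMillis,
// each further attempt doubles the delay, capped at maxMillis.
final class ExponentialBackoff {
    private final long baseMillis;
    private final long maxMillis;

    ExponentialBackoff(long baseMillis, long maxMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    long delayFor(int attempt) {
        // Clamp the shift so very high attempt counts cannot overflow.
        long delay = baseMillis << Math.min(attempt, 30);
        return Math.min(delay, maxMillis);
    }
}
```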
Different audiences:
- **SDK users**: Application developers building services
- **Connector users**: Data engineers building pipelines
Mixing these might confuse the story.
### 5. Portability Implications
If we eventually contribute the Flink connector to Apache Flink (or it moves to
a separate repo), having it depend on a lightweight `iggy-connector-common` is
cleaner than depending on the full Iggy SDK.
**Flink connector depending on Iggy SDK:**
- Pulls in HTTP client, full message models, admin APIs
- Heavier dependency footprint
**Flink connector depending on connector-common:**
- Just the abstractions needed for integration
- Lighter, more focused
---
## The Middle Ground I'm Considering
### Option 1: SDK with Modular Structure
Keep it in SDK but make it clearly optional:
```
java-sdk/
└── src/
    └── main/java/org/apache/iggy/
        ├── client/      # Core client (required)
        ├── messages/    # Message models (required)
        ├── streams/     # Stream operations (required)
        └── connector/   # Integration patterns (optional)
            ├── config/
            ├── serialization/
            ├── offset/
            └── partition/
```
Publish single artifact but document that `connector.*` packages are optional
utilities for integration scenarios.
**Pros**:
- One artifact, simpler for users
- Natural discoverability
- Connector patterns get SDK-level stability guarantees
**Cons**:
- Still has bloat/dependency issues
- Can't version independently
- Mixed concerns in one artifact
### Option 2: Separate Module, Co-Located in java-sdk Project
```
foreign/java/
├── java-sdk/                  # Core client
│   └── build.gradle.kts
├── java-connector-common/     # NEW: Connector abstractions
│   └── build.gradle.kts
│       dependencies {
│           api(project(":java-sdk"))   # Depends on SDK
│       }
└── external-processors/
    └── iggy-connector-flink/
        └── build.gradle.kts
            dependencies {
                api("org.apache.iggy:iggy-connector-common:0.5.0")
            }
```
**Published artifacts**:
- `org.apache.iggy:iggy` - Core SDK
- `org.apache.iggy:iggy-connector-common` - Connector abstractions
- Flink/Spark connectors depend on `iggy-connector-common`
**Pros**:
- Clean separation of concerns
- Independent versioning
- SDK stays lean
- Connector patterns live at Iggy level (not scattered in processor projects)
- Easy to find: still in `foreign/java/`
**Cons**:
- Two artifacts instead of one
- Users need both dependencies
- More complex dependency graph
### Option 3: Stay in iggy-connector-flink, Extract When Spark Arrives
The current plan in `option_x_1.md`:
- Start with `iggy-connector-library` inside `iggy-connector-flink/`
- Contains both common and Flink code
- When Spark is added, evaluate and extract common to `iggy-connector-common/`
if reuse >70%
**Pros**:
- Pragmatic: build first, extract later
- Allows patterns to stabilize before promoting
- YAGNI principle: don't create abstractions until second use case exists
- Flink-first approach keeps things simple initially
**Cons**:
- Requires refactoring when Spark arrives
- Flink connector temporarily "owns" common code
- Users might start depending on internal packages
---
## My Current Thinking
### Decision Framework
I'm leaning towards making this decision based on **pattern maturity**:
```
┌─────────────────────────────────────────────────────────────┐
│ Pattern Lifecycle │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. Experimental (in processor-specific package) │
│ └─> Patterns are unproven, rapidly changing │
│ │
│ 2. Stable (extract to java-connector-common) │
│ └─> Battle-tested by 2+ processors, API stable │
│ │
│ 3. Foundational (promote to java-sdk) │
│ └─> Universal patterns, used by majority of users │
│ │
└─────────────────────────────────────────────────────────────┘
```
### Phase 1: Start in iggy-connector-flink (Now)
Keep the current `option_x_1.md` structure:
```
external-processors/
└── iggy-connector-flink/
    └── iggy-connector-library/
        └── org/apache/iggy/connector/
            ├── config/           # Common abstractions
            ├── serialization/    # Common abstractions
            ├── offset/           # Common abstractions
            └── flink/            # Flink-specific
```
**Why**:
- Patterns are still **experimental** - I haven't actually built them yet!
- Allows rapid iteration without breaking SDK compatibility
- No commitment until we see real-world usage
- YAGNI: don't extract until proven valuable by second consumer (Spark)
**Design discipline**:
- Keep `org.apache.iggy.connector.*` (non-flink) packages framework-agnostic
- No Flink imports in common code
- Design as if they'll be extracted later
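The "no Flink imports in common code" rule can be enforced with an interface-based design like the following sketch (names hypothetical): the common package defines the contract purely in terms of Iggy bytes, and a Flink-specific adapter in `connector.flink` would later wrap it in Flink's own types.

```java
import java.nio.charset.StandardCharsets;

// Lives in org.apache.iggy.connector.serialization: no framework types,
// just a mapping from Iggy message bytes to a user object.
interface IggyDeserializationSchema<T> {
    T deserialize(byte[] messageBytes);
}

// A trivial implementation, also framework-free.
final class StringDeserializationSchema implements IggyDeserializationSchema<String> {
    @Override
    public String deserialize(byte[] messageBytes) {
        return new String(messageBytes, StandardCharsets.UTF_8);
    }
}
```

Extraction later then amounts to moving these files; only the Flink adapter, which imports Flink, stays behind.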
### Phase 2: Extract to java-connector-common (When Spark Arrives)
After implementing Spark connector and validating >70% reuse:
```
foreign/java/
├── java-sdk/                      # Lean core
├── java-connector-common/         # NEW: Proven patterns
│   └── org/apache/iggy/connector/
│       ├── config/
│       ├── serialization/
│       └── offset/
└── external-processors/
    ├── iggy-connector-flink/
    │   └── iggy-flink-connector/  # Flink-specific only
    └── iggy-connector-spark/
        └── iggy-spark-connector/  # Spark-specific only
```
**Why**:
- Patterns proven stable by 2 implementations
- Clear reuse case justifies extraction
- Clean separation achieved
- Independent versioning established
**Migration**:
1. Create `java-connector-common/` module
2. Move `org.apache.iggy.connector.*` packages from Flink library
3. Both Flink and Spark depend on `connector-common`
4. Publish `org.apache.iggy:iggy-connector-common`
### Phase 3: Consider SDK Promotion (After 1+ Year Stability)
After `connector-common` has been stable for 1+ year and multiple processors
(Flink, Spark, maybe Beam):
**Evaluate**:
- Are these patterns used by >50% of Iggy users (not just processor
integrations)?
- Have APIs been stable (no breaking changes in a year)?
- Are dependencies minimal (no heavy libs like Avro required in common)?
- Do non-processor use cases benefit (e.g., custom consumers using offset
tracking)?
**If YES → Merge into SDK**:
```
java-sdk/
└── org/apache/iggy/
    ├── client/      # Core client
    ├── connector/   # Promoted patterns
    └── ...
```
**If NO → Keep separate**:
Connector patterns remain at integration level, not SDK level.
---
## Kafka vs Iggy: Key Difference
One thing I realized: **Kafka's client already includes SerDes because that's
fundamental to Kafka's usage model.**
Every Kafka producer/consumer configures serializers:
```java
Properties props = new Properties();
props.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
```
Serialization is **core to Kafka's API**. So `kafka-clients` includes SerDes.
**Iggy is different**: The SDK sends/receives byte arrays. Serialization is
application-level concern:
```java
IggyClient client = IggyClient.create(config);
// Serialization happens in application code: the SDK only sees bytes.
byte[] messageBytes = objectMapper.writeValueAsBytes(user);
client.sendMessage(streamId, topicId, messageBytes);
```
So for Iggy, serialization schemas aren't SDK-level - they're
**integration-level**. This supports keeping them separate.
---
## My Decision (For Now)
### Stick with option_x_1.md Structure
I'm going to:
1. **Keep `iggy-connector-library` in `iggy-connector-flink/`** as designed in
option_x_1.md
2. **Use clear package separation**: `connector.*` (common) vs
`connector.flink.*` (Flink-specific)
3. **Design for extraction**: Write common code as if it will be pulled out
later
4. **Document the evolution path**: Clearly state in README that common code
may be extracted when Spark is added
### Rationale
- **Pragmatic**: Build and prove the patterns first
- **Flexible**: Can extract to `connector-common` or even SDK later based on
real data
- **Low risk**: If extraction never happens (e.g., Spark reuse is <40%), we
haven't pre-optimized
- **Reversible**: Clear package boundaries make future refactoring
straightforward
### Design Principles to Follow
To make future extraction easy:
1. ✅ **No Flink imports in `org.apache.iggy.connector.*`** (only in
`connector.flink.*`)
2. ✅ **Minimal dependencies** in common code (avoid heavy libs)
3. ✅ **Interface-based design** (easy to swap implementations)
4. ✅ **Document extraction intent** in package-info.java
5. ✅ **Treat as "internal"** initially (no stability guarantees to external
users)
### When to Revisit
Revisit this decision when:
- Spark connector is being implemented (evaluate actual reuse %)
- Common patterns have been stable for 6+ months
- External users ask to use connector patterns standalone
- We discover non-processor use cases for these utilities
---
## Questions I'm Still Pondering
### 1. Should Serialization Live in SDK?
Even if offset tracking and partition discovery stay in connectors, should
serialization schemas be SDK-level?
**Argument for SDK**:
- Every Iggy user needs to serialize/deserialize
- Basic JSON/Avro helpers would benefit all users
- Natural place for `IggyMessage<T>` type-safe wrappers
**Argument against**:
- SDK is currently byte-array focused (clean, simple)
- Serialization format is application choice
- Adding Jackson/Avro deps to SDK is heavy
**Leaning**: Keep in connector library for now, but this is a strong candidate
for eventual SDK inclusion.
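For the record, the `IggyMessage<T>` wrapper mentioned above could look roughly like this sketch (an entirely hypothetical API, not something the SDK ships):

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Type-safe wrapper over Iggy's byte[] payloads: the deserializer runs only
// when value() is called, so raw bytes can still pass through untouched.
final class IggyMessage<T> {
    private final byte[] raw;
    private final Function<byte[], T> deserializer;

    IggyMessage(byte[] raw, Function<byte[], T> deserializer) {
        this.raw = raw;
        this.deserializer = deserializer;
    }

    byte[] rawBytes() {
        return raw;
    }

    T value() {
        return deserializer.apply(raw);
    }
}
```

This is the kind of convenience that would pull serialization toward the SDK: it is useful to every user, yet it forces the SDK to take a position on deserialization.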
### 2. What About Client Pooling?
Connection pooling in `org.apache.iggy.connector.util.IggyClientPool` - is this
connector-level or SDK-level?
Many non-connector users would benefit from this. Maybe this should be in SDK
sooner?
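A pool along those lines is small enough to sketch here. The sketch below is generic over the client type; `SimplePool` and its methods are illustrative, not the actual `IggyClientPool` API, and a real pool would block or grow instead of failing fast.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// A fixed-size pool of pre-built clients. acquire() is non-blocking for
// brevity; callers must release() each client when done with it.
final class SimplePool<C> {
    private final BlockingQueue<C> idle;

    SimplePool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    C acquire() {
        C client = idle.poll();
        if (client == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return client;
    }

    void release(C client) {
        idle.offer(client);
    }
}
```

Nothing in it depends on connector abstractions, which is exactly why it might belong in the SDK sooner than the rest.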
### 3. Naming: "Connector" vs "Integration"?
`org.apache.iggy.connector.*` implies these are connector-specific patterns.
`org.apache.iggy.integration.*` might better signal "reusable integration
utilities."
But "connector" is more specific and matches industry terminology (Kafka uses
"connector").
**Decision**: Stick with "connector" for now.
---
## Conclusion
I'm documenting this deliberation because I think it reflects an important
architectural tension: **SDK minimalism vs. integration convenience**.
Kafka chose convenience (include SerDes in client). That works for them because
their usage model demands it.
Iggy's SDK is cleaner without these concerns. But as we build the integration
ecosystem (Flink, Spark, Beam), we need shared patterns.
The three-phase evolution (experimental → extracted → promoted) gives us time
to learn what belongs where without premature commitment.
**For option_x_1.md**: The current design stands. No changes needed. This
deliberation informs future decisions, not present ones.
---
**Status**: Decision made - stick with current structure, document evolution
path
**Next review**: When Spark connector implementation begins
**Owner**: Chiradip
**Stakeholders**: Iggy maintainers, future Flink/Spark connector contributors
GitHub link:
https://github.com/apache/iggy/discussions/2236#discussioncomment-14702253