chiradip opened a new issue, #2276:
URL: https://github.com/apache/iggy/issues/2276

   # Add Framework-Agnostic Connector Library for Stream Processors
   
   ## Summary
   
   Create a reusable, framework-agnostic connector library for Apache Iggy that 
provides core abstractions for integrating with stream processing engines 
(Apache Flink, Apache Spark, Apache Beam, etc.). This library will serve as the 
foundation for the upcoming Flink connector implementation while enabling 
future integrations with other stream processors.
   
   ## Motivation
   
   Currently, Iggy has connectors implemented as "sources" and "sinks" in 
specific language ecosystems. As we expand support to stream processing 
frameworks like Apache Flink and Apache Spark, we need:
   
   1. **Reusability**: Common abstractions that can be shared across multiple 
stream processor integrations (estimated 85-95% code reuse)
   2. **Consistency**: Unified configuration and error handling across all 
stream processor connectors
   3. **Maintainability**: Single source of truth for core connector logic, 
reducing duplication
   4. **Future-proofing**: Clean separation between framework-agnostic and 
framework-specific code
   
   This issue introduces "external processors" as a third category of 
connectors, distinct from sources and sinks, specifically for stream processing 
engine integrations.
   
   ## Proposed Solution
   
   ### Directory Structure
   
   ```
   foreign/java/external-processors/iggy-connector-flink/
   ├── iggy-connector-library/          # Core reusable library
   │   ├── src/main/java/org/apache/iggy/connector/
   │   │   ├── config/                  # Configuration classes
   │   │   ├── error/                   # Exception hierarchy
   │   │   ├── partition/               # Partition metadata
   │   │   └── serialization/           # SerDe interfaces
   │   └── src/test/java/               # Comprehensive unit tests
   └── iggy-flink-examples/             # Example Flink jobs (future work)
   ```
   
   ### Package Design
   
   **Framework-Agnostic Packages** (`org.apache.iggy.connector.*`):
   - No dependencies on Flink, Spark, or any specific framework
   - Pure Java abstractions with well-defined interfaces
   - Serializable for distributed execution
   - Documented for external reuse
   
   **Future Framework-Specific Packages** (`org.apache.iggy.connector.flink.*`):
   - Adapters wrapping common abstractions
   - Framework-specific implementations (Source/Sink APIs, checkpointing, etc.)
   - Minimal framework-specific logic
   
   ## Scope of Work
   
   ### Phase 1: Core Abstractions (This Issue)
   
   Implement the following framework-agnostic classes:
   
   #### Configuration (`org.apache.iggy.connector.config`)
   
   1. **IggyConnectionConfig** - Connection settings
      - Server address, credentials, timeouts
      - Retry configuration (maxRetries, retryBackoff)
      - TLS settings
      - Builder pattern with sensible defaults
   
   2. **OffsetConfig** - Offset management
      - Reset strategies: EARLIEST, LATEST, NONE
      - Auto-commit configuration
      - Commit interval and batch size
   
   #### Serialization (`org.apache.iggy.connector.serialization`)
   
   3. **DeserializationSchema<T>** - Framework-agnostic deserialization
      - Independent of Flink's SerializationSchema
      - Supports metadata access during deserialization
      - Type descriptor for framework mapping
   
   4. **SerializationSchema<T>** - Framework-agnostic serialization
      - Optional partition key extraction
      - Nullable support
   
   5. **RecordMetadata** - Message metadata
      - Stream, topic, partition, offset information
      - Immutable value object
   
   6. **TypeDescriptor<T>** - Type information abstraction
      - Maps to Flink's TypeInformation or Spark's Encoder
      - Supports generic types
   
   #### Error Handling (`org.apache.iggy.connector.error`)
   
   7. **ConnectorException** - Base exception hierarchy
      - Error codes: CONNECTION_FAILED, AUTHENTICATION_FAILED, 
RESOURCE_NOT_FOUND, SERIALIZATION_ERROR, OFFSET_ERROR, PARTITION_ERROR, 
CONFIGURATION_ERROR, TIMEOUT, RESOURCE_EXHAUSTED
      - Retryability flag for transient vs permanent failures
   
   #### Partition Management (`org.apache.iggy.connector.partition`)
   
   8. **PartitionInfo** - Partition metadata
      - Stream, topic, partition identifiers
      - Current and end offsets
      - Available messages calculation
   
   ### Phase 2: Flink Integration (Future Issue)
   
   - Implement `org.apache.iggy.connector.flink.*` packages
   - Flink Source API v2 implementation (SplitEnumerator, SourceReader)
   - Flink Sink API v2 implementation
   - Checkpointing and state management
   - Integration tests with Testcontainers
   
   ### Phase 3: Examples and Documentation (Future Issue)
   
   - Example Flink jobs demonstrating connector usage
   - Docker Compose setup for testing (Flink + Iggy)
   - Comprehensive README and usage documentation
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to