davidzollo opened a new issue, #10359:
URL: https://github.com/apache/seatunnel/issues/10359
## Background
IBM Informix is a robust relational database management system widely used
in financial institutions, retail chains, and legacy enterprise systems,
particularly in banking core systems and point-of-sale (POS) applications.
While SeaTunnel currently supports Informix through the JDBC connector with
dialect support, there is a need for a **dedicated Informix connector** that
provides Change Data Capture (CDC) capabilities and performance optimizations
tailored to Informix-specific features.
## Motivation
- **Legacy System Integration**: Many financial institutions and retail
chains still rely on Informix for mission-critical applications
- **CDC Requirements**: Real-time data replication from Informix to modern
data platforms requires CDC capabilities
- **Performance Limitations**: The generic JDBC connector cannot leverage
Informix-specific optimization techniques
- **Banking & Retail**: Core systems in banks and POS systems in retail
require reliable, high-performance data integration
## Current Status vs. Proposed Enhancement
### Current Status (JDBC Connector)
✅ Basic read/write operations
✅ SQL query support
❌ No CDC support
❌ No Informix-specific optimizations
❌ Limited support for Informix-specific data types
❌ No smart large object (SLOB) handling
### Proposed Enhancement (Dedicated Connector)
✅ All existing JDBC capabilities
✅ **Change Data Capture (CDC)** using Informix CDC API
✅ **Performance optimizations** (fragmentation-aware queries, parallel reads)
✅ **Advanced data type support** (SERIAL8, BIGSERIAL, LVARCHAR, etc.)
✅ **Smart large object handling** (BLOB, CLOB with streaming)
✅ **Connection pooling** optimized for Informix
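To make the "Advanced data type support" item more concrete, the sketch below maps Informix column type names to Java value classes. The class `InformixTypeMapper` and the specific targets chosen (for example `LVARCHAR` to `String`, `BLOB` to `byte[]`) are hypothetical choices for illustration, not a settled type system for the connector.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: Informix column type name -> Java value class.
final class InformixTypeMapper {
    private static final Map<String, Class<?>> MAPPING = new HashMap<>();

    static {
        MAPPING.put("SMALLINT", Short.class);
        MAPPING.put("INTEGER", Integer.class);
        MAPPING.put("SERIAL", Integer.class);
        // 64-bit integers; SERIAL8 and BIGSERIAL are the auto-increment variants.
        MAPPING.put("INT8", Long.class);
        MAPPING.put("BIGINT", Long.class);
        MAPPING.put("SERIAL8", Long.class);
        MAPPING.put("BIGSERIAL", Long.class);
        MAPPING.put("DECIMAL", BigDecimal.class);
        MAPPING.put("MONEY", BigDecimal.class);
        MAPPING.put("CHAR", String.class);
        MAPPING.put("VARCHAR", String.class);
        MAPPING.put("NVARCHAR", String.class);
        MAPPING.put("LVARCHAR", String.class);
        // Smart large objects: assumed to be read via streams, surfaced as bytes/text.
        MAPPING.put("BLOB", byte[].class);
        MAPPING.put("CLOB", String.class);
    }

    private InformixTypeMapper() {}

    static Class<?> javaTypeFor(String informixType) {
        // Drop length/precision suffixes such as "LVARCHAR(2048)" or "DECIMAL(10,2)".
        String base = informixType.trim().toUpperCase(Locale.ROOT);
        int paren = base.indexOf('(');
        if (paren >= 0) {
            base = base.substring(0, paren).trim();
        }
        Class<?> type = MAPPING.get(base);
        if (type == null) {
            throw new IllegalArgumentException("Unsupported Informix type: " + informixType);
        }
        return type;
    }
}
```

In a reader, the type name would likely come from the standard `ResultSetMetaData.getColumnTypeName`. The exact strings the Informix driver returns there would need to be checked before relying on this table.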
## Proposed Solution
Implement a dedicated Informix connector with the following capabilities:
### 1. Change Data Capture (CDC)
Use Informix's native CDC capabilities:
**Option A: CDC API (Enterprise Edition)**
- Use Informix CDC Java API
- Capture INSERT, UPDATE, DELETE operations
- Low-latency change streaming
- Transaction-level consistency
**Option B: Log-Based CDC (Logical Log)**
- Read from Informix logical logs
- Parse transaction log records
- Supports Standard and Enterprise editions
**Option C: Trigger-Based CDC (Fallback)**
- Create triggers on source tables
- Write changes to shadow tables
- Compatible with all Informix versions
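Here is a minimal sketch of Option C. It assumes each source table has a shadow table named `<table>_cdc` with columns `op CHAR(1)`, the key column, and `changed_at DATETIME YEAR TO FRACTION`. The generator class, the trigger names, and the shadow-table layout are all hypothetical. Only the `CREATE TRIGGER ... REFERENCING ... FOR EACH ROW (...)` form itself is standard Informix SQL.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical DDL generator for trigger-based CDC (Option C).
final class TriggerCdcDdl {
    private TriggerCdcDdl() {}

    /** CREATE TRIGGER statements that copy the key of each changed row into "<table>_cdc". */
    static List<String> triggersFor(String table, String keyColumn) {
        // Assumed shadow table: <table>_cdc(op CHAR(1), <keyColumn>, changed_at DATETIME YEAR TO FRACTION)
        String shadow = table + "_cdc";
        return Arrays.asList(
            trigger(table, shadow, keyColumn, "ins", "INSERT", "REFERENCING NEW AS post", "I", "post"),
            trigger(table, shadow, keyColumn, "upd", "UPDATE", "REFERENCING OLD AS pre NEW AS post", "U", "post"),
            trigger(table, shadow, keyColumn, "del", "DELETE", "REFERENCING OLD AS pre", "D", "pre"));
    }

    private static String trigger(String table, String shadow, String key, String suffix,
                                  String event, String referencing, String op, String alias) {
        // NOTE: identifiers are concatenated unvalidated; real code must whitelist them.
        return "CREATE TRIGGER " + table + "_cdc_" + suffix + " " + event + " ON " + table
            + " " + referencing
            + " FOR EACH ROW (INSERT INTO " + shadow + " (op, " + key + ", changed_at)"
            + " VALUES ('" + op + "', " + alias + "." + key + ", CURRENT))";
    }
}
```

Recording only the post-image key for updates keeps the shadow table small. However, an update that changes the key itself would need both the `pre` and `post` values to be replicated correctly.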
### 2. Performance Optimizations
- **Fragmentation-Aware Queries**: Leverage table fragmentation for parallel
reads
- **Smart Buffering**: Optimize fetch size based on data types and row size
- **Connection Pooling**: Informix-specific connection pool tuning
- **Query Optimization**: Apply Informix optimizer directives (hints) such as `USE_NL`
- **Parallel Extraction**: Multi-threaded reads aligned with fragments
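One way to approach "Smart Buffering" is to size each fetch so that a batch of rows stays near a memory budget. The heuristic below is a hypothetical sketch. How the per-row byte estimate is derived (for example, from declared column lengths) is left as an assumption.

```java
// Hypothetical fetch-size heuristic for the "Smart Buffering" item.
final class FetchSizeHeuristic {
    private FetchSizeHeuristic() {}

    /**
     * Chooses a row count so that one fetched batch is close to targetBufferBytes,
     * clamped to [minRows, maxRows].
     */
    static int fetchSize(long estimatedRowBytes, long targetBufferBytes, int minRows, int maxRows) {
        if (estimatedRowBytes <= 0 || targetBufferBytes <= 0 || minRows <= 0 || minRows > maxRows) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        long rows = targetBufferBytes / estimatedRowBytes;
        return (int) Math.max(minRows, Math.min(maxRows, rows));
    }
}
```

The result would be passed to the standard `java.sql.Statement.setFetchSize` method. With a 1,000,000-byte budget and 200-byte rows, it yields 5,000 rows, the same `fetch_size` used in the snapshot example. Wide rows fall back to the `minRows` floor.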
### Configuration Example - CDC Mode
```hocon
source {
  Informix {
    # Connection
    host = "informix-server.example.com"
    port = 9088
    database = "stores_demo"
    schema = "informix"

    # Authentication
    username = "informix"
    password = "******"

    # CDC configuration
    mode = "cdc" # or "snapshot", "incremental"
    cdc_method = "cdc_api" # or "logical_log", "trigger_based"

    # Table selection
    tables = ["customer", "orders", "order_items"]

    # CDC API specific (for cdc_api method)
    cdc_api {
      session_name = "seatunnel_cdc_session"
      capture_deletes = true
      capture_updates = true
      capture_inserts = true

      # Checkpoint configuration
      checkpoint_interval_ms = 5000
      start_position = "latest" # or "earliest", timestamp
    }

    # Performance tuning
    fetch_size = 1000
    max_retry_attempts = 3
    connection_timeout_ms = 30000

    # Data type handling
    enable_slob_streaming = true
    max_blob_size_mb = 10
  }
}
```
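The option combinations in the CDC example could be validated up front. This hypothetical sketch parses `mode`, `cdc_method`, and `fetch_size` from a plain string map. The class name, the defaults (`snapshot`, `cdc_api`, `1000`), and the rule that rejects `cdc_method` outside CDC mode are assumptions; they are not SeaTunnel's actual option API.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical up-front validation of the source options shown above.
final class InformixSourceOptions {
    enum Mode { CDC, SNAPSHOT, INCREMENTAL }

    enum CdcMethod { CDC_API, LOGICAL_LOG, TRIGGER_BASED }

    final Mode mode;
    final CdcMethod cdcMethod; // null unless mode == CDC
    final int fetchSize;

    private InformixSourceOptions(Mode mode, CdcMethod cdcMethod, int fetchSize) {
        this.mode = mode;
        this.cdcMethod = cdcMethod;
        this.fetchSize = fetchSize;
    }

    static InformixSourceOptions from(Map<String, String> raw) {
        // Enum.valueOf throws IllegalArgumentException for unknown values such as "cdc-api".
        Mode mode = Mode.valueOf(raw.getOrDefault("mode", "snapshot").toUpperCase(Locale.ROOT));
        CdcMethod method = null;
        if (mode == Mode.CDC) {
            method = CdcMethod.valueOf(raw.getOrDefault("cdc_method", "cdc_api").toUpperCase(Locale.ROOT));
        } else if (raw.containsKey("cdc_method")) {
            throw new IllegalArgumentException("cdc_method is only valid when mode = \"cdc\"");
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int fetchSize = Integer.parseInt(raw.getOrDefault("fetch_size", "1000"));
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetch_size must be positive, got " + fetchSize);
        }
        return new InformixSourceOptions(mode, method, fetchSize);
    }
}
```

Failing fast at job submission avoids discovering a mistyped `cdc_method` only after a CDC session has been opened on the server.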
### Configuration Example - Optimized Snapshot
```hocon
source {
  Informix {
    host = "informix-server.example.com"
    port = 9088
    database = "stores_demo"
    schema = "informix"
    username = "informix"
    password = "******"
    mode = "snapshot"
    table = "customer"

    # Performance optimization
    parallel_reads = true
    parallelism = 4

    # Fragmentation-aware splitting
    use_fragmentation_info = true
    split_strategy = "by_fragment" # or "by_rowid", "by_key"

    # Query optimization
    use_optimizer_hints = true
    optimizer_hints = "USE_NL(t1 t2)"

    # Buffer tuning
    fetch_size = 5000
    enable_prefetch = true
    socket_buffer_size_kb = 64

    # Data type handling
    handle_serial8_as_bigint = true
    handle_lvarchar_as_string = true
  }
}
```
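For `split_strategy = "by_key"`, one plausible approach is to divide the key's `[min, max]` range into `parallelism` contiguous sub-ranges, each becoming a `WHERE ... BETWEEN` predicate. The class below is a hypothetical sketch for numeric keys such as SERIAL or SERIAL8 columns. It assumes the bounds were already fetched, for example with `SELECT MIN(customer_num), MAX(customer_num) FROM customer`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter for split_strategy = "by_key" on a numeric key.
final class KeyRangeSplitter {
    static final class Split {
        final long low;  // inclusive
        final long high; // inclusive

        Split(long low, long high) {
            this.low = low;
            this.high = high;
        }

        String whereClause(String keyColumn) {
            return keyColumn + " BETWEEN " + low + " AND " + high;
        }
    }

    private KeyRangeSplitter() {}

    /** Divides [min, max] into at most `parallelism` contiguous, non-overlapping ranges. */
    static List<Split> split(long min, long max, int parallelism) {
        if (min > max || parallelism <= 0) {
            throw new IllegalArgumentException("need min <= max and parallelism > 0");
        }
        // Throws ArithmeticException if the range size does not fit in a long.
        long total = Math.addExact(Math.subtractExact(max, min), 1L);
        int n = (int) Math.min(parallelism, total);
        long base = total / n;
        long remainder = total % n;
        List<Split> splits = new ArrayList<>(n);
        long start = min;
        for (int i = 0; i < n; i++) {
            // The first `remainder` splits take one extra key, so sizes differ by at most one.
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            splits.add(new Split(start, end));
            start = end + 1;
        }
        return splits;
    }
}
```

Equal-width ranges only balance work when keys are roughly uniform. Gaps or skew in the key space are one reason the proposal also considers fragment-based and ROWID-based splitting.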
### Configuration Example - Incremental
```hocon
source {
  Informix {
    host = "informix-server.example.com"
    port = 9088
    database = "stores_demo"
    username = "informix"
    password = "******"
    mode = "incremental"
    table = "orders"

    # Incremental configuration
    incremental_column = "order_date"
    incremental_column_type = "datetime" # or "serial", "timestamp"
    start_value = "2024-01-01 00:00:00"

    # For SERIAL-based incremental
    # incremental_column = "order_num"
    # incremental_column_type = "serial"
    # start_value = "1001"
  }
}
```
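For the SERIAL-based variant, incremental reading reduces to polling with a watermark. In this hypothetical sketch, `start_value` is treated as inclusive on the first poll and exclusive afterwards. That interpretation, like the class itself, is an assumption rather than defined behaviour.

```java
// Hypothetical sketch: SERIAL-based incremental polling with a watermark.
final class SerialIncrementalQuery {
    private final String table;
    private final String column;
    private long watermark;
    private boolean firstPoll = true;

    SerialIncrementalQuery(String table, String column, long startValue) {
        this.table = table;
        this.column = column;
        this.watermark = startValue;
    }

    /** SQL for the next poll; bind watermark() as parameter 1 (e.g. PreparedStatement.setLong). */
    String nextQuery() {
        // Assumption: start_value is inclusive on the first poll, exclusive afterwards.
        String op = firstPoll ? ">=" : ">";
        return "SELECT * FROM " + table + " WHERE " + column + " " + op + " ? ORDER BY " + column;
    }

    long watermark() {
        return watermark;
    }

    /** Call after a non-empty poll has been emitted; maxSeen is the largest key it returned. */
    void commit(long maxSeen) {
        firstPoll = false;
        watermark = Math.max(watermark, maxSeen);
    }
}
```

`commit` should only follow a poll that returned rows, and ideally only once those rows are covered by a checkpoint. For `datetime` columns, a strict `>` comparison can miss rows committed later with an already-passed timestamp, which is one argument for CDC mode when completeness matters.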
### Sink Configuration Example
```hocon
sink {
  Informix {
    host = "informix-server.example.com"
    port = 9088
    database = "stores_demo"
    table = "customer_copy"
    username = "informix"
    password = "******"

    # Write mode
    write_mode = "insert" # or "upsert", "update"

    # For upsert mode
    primary_keys = ["customer_num"]

    # Batch configuration
    batch_size = 1000
    batch_interval_ms = 5000

    # Transaction handling
    enable_transaction = true
    transaction_isolation = "READ_COMMITTED"

    # Error handling
    max_retries = 3
    enable_dead_letter_queue = true
  }
}
```
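`batch_size` and `batch_interval_ms` suggest a buffer that flushes on whichever limit is reached first. The generic class below is a hypothetical sketch. The injected clock makes it testable, and the `flusher` callback stands in for executing a JDBC batch inside one transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical write buffer honouring batch_size and batch_interval_ms.
final class BatchBuffer<T> {
    private final int batchSize;
    private final long batchIntervalMs;
    private final LongSupplier clockMs;
    private final Consumer<List<T>> flusher; // e.g. addBatch/executeBatch + commit
    private final List<T> pending = new ArrayList<>();
    private long lastFlushMs;

    BatchBuffer(int batchSize, long batchIntervalMs, LongSupplier clockMs, Consumer<List<T>> flusher) {
        if (batchSize <= 0 || batchIntervalMs <= 0) {
            throw new IllegalArgumentException("batch_size and batch_interval_ms must be positive");
        }
        this.batchSize = batchSize;
        this.batchIntervalMs = batchIntervalMs;
        this.clockMs = clockMs;
        this.flusher = flusher;
        this.lastFlushMs = clockMs.getAsLong();
    }

    void add(T row) {
        pending.add(row);
        // Flush when either the size limit or the time limit is reached.
        if (pending.size() >= batchSize || clockMs.getAsLong() - lastFlushMs >= batchIntervalMs) {
            flush();
        }
    }

    void flush() {
        if (!pending.isEmpty()) {
            flusher.accept(new ArrayList<>(pending)); // hand over a copy, then reuse the buffer
            pending.clear();
        }
        lastFlushMs = clockMs.getAsLong();
    }
}
```

Because the interval is only checked when a row arrives, an idle stream would also need a periodic timer (or a checkpoint hook) that calls `flush`.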
## Expected Benefits
1. **Real-Time Data Replication**: CDC enables near-real-time
synchronization from Informix to modern data platforms
2. **Performance Improvement**: An anticipated 2-5x speedup over the generic
JDBC connector for large-scale data extraction (to be confirmed by benchmarks)
3. **Financial Compliance**: Reliable CDC for audit trails and regulatory
reporting in banking systems
4. **Legacy Modernization**: Enable gradual migration from Informix to
modern databases
5. **Operational Analytics**: Real-time analytics on POS and transaction data
## Technical Considerations
### Dependencies
- **Informix JDBC Driver**: IBM Informix JDBC 4.50+
- **CDC API**: Informix CDC Java API (for CDC mode)
- **Connection Pool**: HikariCP or custom pool optimized for Informix
### Informix Version Support
- **Informix 12.10+**: Full support including CDC API
- **Informix 11.70+**: Logical log-based CDC
- **Informix 11.50+**: Basic support with trigger-based CDC
### CDC Implementation
- **CDC API**: Best performance, requires Enterprise Edition
- **Logical Log Parsing**: Good performance, works with Standard Edition
- **Trigger-Based**: Fallback option, works with all versions
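Taken together, the version list and the edition notes above suggest a simple default for `cdc_method` when it is not set explicitly. This is a hypothetical sketch. The class name, the `enterpriseEdition` flag, and the assumption that the version string contains a two-digit `major.minor` prefix such as `14.10.FC9` are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical default for cdc_method, derived from the support matrix above.
final class CdcMethodSelector {
    enum Method { CDC_API, LOGICAL_LOG, TRIGGER_BASED }

    private static final Pattern MAJOR_MINOR = Pattern.compile("(\\d+)\\.(\\d+)");

    private CdcMethodSelector() {}

    static Method select(String serverVersion, boolean enterpriseEdition) {
        // Assumes the version string contains "major.minor", e.g. "14.10.FC9".
        Matcher m = MAJOR_MINOR.matcher(serverVersion);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised Informix version: " + serverVersion);
        }
        int version = Integer.parseInt(m.group(1)) * 100 + Integer.parseInt(m.group(2)); // 12.10 -> 1210
        if (version < 1150) {
            throw new IllegalArgumentException("Informix " + m.group() + " is below the proposed minimum 11.50");
        }
        if (version >= 1210 && enterpriseEdition) {
            return Method.CDC_API;     // 12.10+ with Enterprise Edition
        }
        if (version >= 1170) {
            return Method.LOGICAL_LOG; // 11.70+, Standard or Enterprise
        }
        return Method.TRIGGER_BASED;   // 11.50+ fallback
    }
}
```

An explicit `cdc_method` in the job config would still take precedence. This rule only picks the most capable method the server is expected to support.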
### Performance Considerations
- **Fragmentation**: Query fragment-specific data for parallel processing
- **Indexes**: Use index-based splits for better parallelism
- **ROWID**: Leverage ROWID for efficient data splitting (fragmented tables have no ROWID unless created `WITH ROWIDS`)
- **Smart Large Objects**: Stream BLOBs/CLOBs instead of loading into memory
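Streaming smart large objects can be as simple as copying through a fixed-size buffer while enforcing a limit like `max_blob_size_mb`. In a JDBC reader, the input stream would typically come from `ResultSet.getBinaryStream` or `Blob.getBinaryStream`. The helper class itself is a hypothetical sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical streaming copy for BLOB/CLOB data with a size cap.
final class LobStreamer {
    private LobStreamer() {}

    /** Copies in to out in bufferSize chunks; fails once more than maxBytes have been read. */
    static long copy(InputStream in, OutputStream out, long maxBytes, int bufferSize) throws IOException {
        if (maxBytes < 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("invalid limits");
        }
        byte[] buffer = new byte[bufferSize]; // only this buffer is held in memory
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("LOB exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return total;
    }
}
```

Bytes written before the limit is hit are not rolled back, so the caller should discard partial output when an `IOException` is raised.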
### Error Handling
- **Connection Failures**: Automatic reconnection with exponential backoff
- **Transaction Rollbacks**: Retry failed transactions
- **CDC Gaps**: Detect and handle missing log records
- **Data Type Errors**: Handle Informix-specific type conversion issues
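Here is a minimal sketch of reconnection with exponential backoff, assuming capped doubling delays (`base * 2^(attempt-1)`) and no jitter. The helper is hypothetical. A production version would retry only transient failures (for example `java.sql.SQLTransientConnectionException`) rather than every `Exception`.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper for "automatic reconnection with exponential backoff".
final class Backoff {
    private Backoff() {}

    /** Delay before the retry that follows failed attempt `attempt` (1-based), capped at maxDelayMs. */
    static long delayMs(int attempt, long baseDelayMs, long maxDelayMs) {
        if (attempt < 1 || baseDelayMs < 0 || maxDelayMs < 0) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        int shift = Math.min(attempt - 1, 30);
        // base * 2^shift exceeds the cap exactly when base > floor(max / 2^shift); checking first avoids overflow.
        if (baseDelayMs > (maxDelayMs >> shift)) {
            return maxDelayMs;
        }
        return baseDelayMs << shift;
    }

    static <T> T withRetries(Callable<T> action, int maxAttempts, long baseDelayMs, long maxDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) { // sketch only: real code should filter for transient errors
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs(attempt, baseDelayMs, maxDelayMs));
                }
            }
        }
        throw last;
    }
}
```

With `max_retry_attempts = 3` and a 1,000 ms base delay, the waits between attempts would be 1,000 ms and 2,000 ms before the last error is rethrown.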
### Testing
- **Informix Developer Edition**: Free download for development and testing
- **Docker Image**: Use the official IBM Informix Docker image
- **Integration Tests**: Test CDC, snapshot, and incremental modes
- **Performance Tests**: Validate fragmentation-aware parallelism
## Implementation Phases
### Phase 1: Enhanced Snapshot & Incremental (MVP)
- Fragmentation-aware parallel reads
- Optimized connection pooling
- Advanced data type support
- Performance tuning options
### Phase 2: CDC Support
- CDC API integration (Enterprise Edition)
- Logical log parsing (Standard Edition)
- Checkpoint and state management
- Exactly-once semantics
### Phase 3: Production Hardening
- Advanced error handling and retry logic
- Monitoring and metrics
- Performance profiling and optimization
- Comprehensive integration tests
### Phase 4: Advanced Features
- Multi-table CDC sessions
- Schema evolution handling
- Compression and encryption support
- Integration with Informix High-Availability Data Replication (HDR)
## References
- [Informix Change Data
Capture](https://www.ibm.com/docs/en/informix-servers/14.10?topic=capture-change-data)
- [Informix JDBC Driver
Documentation](https://www.ibm.com/docs/en/informix-servers/14.10?topic=drivers-jdbc-driver)
- [Informix Data
Types](https://www.ibm.com/docs/en/informix-servers/14.10?topic=types-built-in-data)
- [Informix Performance
Guide](https://www.ibm.com/docs/en/informix-servers/14.10?topic=guide-performance)
## Community Impact
This connector will:
- Enable financial institutions to modernize their data infrastructure while
maintaining Informix systems
- Provide real-time analytics capabilities for retail POS systems
- Support regulatory compliance and audit requirements in banking
- Position SeaTunnel as a viable solution for legacy system integration
---
**Priority**: Medium
**Estimated Effort**: Medium
**Target Release**: 2.3.15 or 3.0.0
**Note**: This builds upon existing JDBC dialect support