suxiaogang223 opened a new issue, #60505:
URL: https://github.com/apache/doris/issues/60505
## Background
Currently, Doris supports querying Paimon system tables (e.g.,
`table$snapshots`, `table$partitions`) through the Table-Valued Function (TVF)
path. The execution flow is:
```
SQL: SELECT * FROM catalog.db.table$snapshots
→ MetadataScanNode → TVF → JNI Scanner → Paimon SDK
```
This approach works well for **metadata-oriented system tables** (snapshots,
manifests, partitions, etc.) that return small result sets from metadata files.
However, for **data-oriented system tables** like `binlog`, `audit_log`, and
`ro` (read-optimized), the current JNI-based approach has significant
performance limitations:
1. **Data Source Difference**: Unlike metadata tables,
`binlog`/`audit_log`/`ro` read actual data files (ORC/Parquet), not metadata
files
2. **JNI Overhead**: Large data volumes suffer from JNI
serialization/deserialization overhead
3. **Missing Native Optimizations**: The JNI path cannot use Doris's native
vectorized ORC/Parquet readers or their predicate pushdown optimizations
## Paimon System Table Classification
| Category | System Tables | Data Source | Current Path | Proposed Path |
|----------|--------------|-------------|--------------|---------------|
| Metadata | snapshots, manifests, partitions, schemas, options, tags, branches, files, buckets, etc. | Metadata files | TVF + JNI | Keep as-is |
| Data | **binlog**, **audit_log**, **ro** | Actual data files (ORC/Parquet) | TVF + JNI | **Native Read** |
## Paimon binlog/audit_log Implementation Analysis
In Paimon's codebase, `BinlogTable` and `AuditLogTable` are special:
- They implement the `DataTable` interface (not just `ReadonlyTable`)
- They **wrap the underlying `FileStoreTable`** and read actual data files
- They reuse `DataSplit` from the source table
- The underlying storage format is ORC/Parquet
Key implementation files in Paimon:
- `paimon-core/.../table/system/BinlogTable.java` - extends AuditLogTable
- `paimon-core/.../table/system/AuditLogTable.java` - wraps FileStoreTable
This makes them ideal candidates for native reading in Doris.
## Proposal
Refactor the system table query path to support native reading for
data-oriented Paimon system tables.
### Phase 1: FE Refactoring
**Goal**: Route `binlog`/`audit_log`/`ro` queries through `PaimonScanNode`
instead of `MetadataScanNode`.
1. **Extend `SysTable` interface**
- Add `useNativeTablePath()` method to distinguish execution paths
- Add `getSchema()` method for native path schema retrieval
2. **Create `PaimonSysExternalTable`**
- New class extending `PaimonExternalTable`
- Wraps source table with system table type
- Returns Paimon `BinlogTable`/`AuditLogTable` instance from
`getPaimonTable()`
3. **Modify `BindRelation`**
- Check `useNativeTablePath()` before creating TVF relation
- Create `LogicalFileScan` for native-path system tables
4. **Adapt `PaimonScanNode`**
- Support `PaimonSysExternalTable` as scan source
- Generate splits from system table's `ReadBuilder`
- Pass system table type to BE via `TPaimonFileDesc`
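The routing decision in steps 1 and 3 can be sketched as follows. This is a minimal, self-contained illustration, not Doris code: only `useNativeTablePath()` comes from the proposal, while `getSysTableName()`, `ScanPath`, `PaimonSysTable`, and `choosePath()` are hypothetical names standing in for the real `SysTable` and `BindRelation` logic.

```java
import java.util.Set;

public class SysTableRoutingSketch {

    /** Hypothetical shape of the extended interface; only useNativeTablePath() is named in the proposal. */
    interface SysTable {
        String getSysTableName();

        // Defaulting to false keeps every existing metadata table on the TVF + JNI path.
        default boolean useNativeTablePath() {
            return false;
        }
    }

    /** Stand-in for the two plan shapes: MetadataScanNode via TVF, or a file scan via PaimonScanNode. */
    enum ScanPath { TVF_METADATA_SCAN, NATIVE_FILE_SCAN }

    /** Hypothetical Paimon system table descriptor. */
    static final class PaimonSysTable implements SysTable {
        // Data-oriented system tables listed in the classification table.
        private static final Set<String> DATA_ORIENTED = Set.of("binlog", "audit_log", "ro");

        private final String name;

        PaimonSysTable(String name) {
            this.name = name;
        }

        @Override
        public String getSysTableName() {
            return name;
        }

        @Override
        public boolean useNativeTablePath() {
            return DATA_ORIENTED.contains(name);
        }
    }

    /** Mirrors the check the proposal places in BindRelation before a TVF relation is created. */
    static ScanPath choosePath(SysTable table) {
        return table.useNativeTablePath() ? ScanPath.NATIVE_FILE_SCAN : ScanPath.TVF_METADATA_SCAN;
    }

    public static void main(String[] args) {
        System.out.println(choosePath(new PaimonSysTable("snapshots"))); // prints TVF_METADATA_SCAN
        System.out.println(choosePath(new PaimonSysTable("binlog")));    // prints NATIVE_FILE_SCAN
    }
}
```

Giving `useNativeTablePath()` a `false` default means system tables from other catalogs would not need to change, which matches the "Keep as-is" column for metadata tables.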
### Phase 2: BE Native Reader Implementation
**Goal**: Implement native readers for binlog/audit_log with row
transformation logic.
1. **Extend Thrift definitions**
- Add `sys_table_type` field to `TPaimonFileDesc`
- Add `force_keep_delete` and `is_streaming` flags
2. **Implement `PaimonAuditLogReader`**
- Wrap native ORC/Parquet reader
- Add `rowkind` column based on delete vectors
- Support `forceKeepDelete` semantics
3. **Implement `PaimonBinlogReader`**
- Extend `PaimonAuditLogReader`
- Convert columns to array types
- Pack UPDATE_BEFORE/UPDATE_AFTER pairs (streaming mode)
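The row transformation planned for `PaimonBinlogReader` can be illustrated with a small sketch. It is written in Java for readability even though the BE reader would be C++, and it works on rows rather than column batches. It assumes that each input row already carries a rowkind string (`+I`, `-U`, `+U`, `-D`), that an UPDATE_BEFORE row is immediately followed by its UPDATE_AFTER row, and that a packed pair is emitted with rowkind `+U`. `ChangeRow`, `BinlogRow`, and `pack()` are hypothetical names, not Doris or Paimon APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class BinlogPackingSketch {

    /** Hypothetical audit_log-style row: a rowkind plus the original column values. */
    static final class ChangeRow {
        final String rowKind;
        final Object[] values;

        ChangeRow(String rowKind, Object... values) {
            this.rowKind = rowKind;
            this.values = values;
        }
    }

    /** Hypothetical binlog-style row: every column becomes an array of one or two values. */
    static final class BinlogRow {
        final String rowKind;
        final List<List<Object>> columns;

        BinlogRow(String rowKind, List<List<Object>> columns) {
            this.rowKind = rowKind;
            this.columns = columns;
        }
    }

    /** Packs adjacent -U/+U pairs into one row; all other rows become single-element arrays. */
    static List<BinlogRow> pack(List<ChangeRow> rows) {
        List<BinlogRow> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            ChangeRow cur = rows.get(i);
            boolean pairedUpdate = "-U".equals(cur.rowKind)
                    && i + 1 < rows.size()
                    && "+U".equals(rows.get(i + 1).rowKind);
            if (pairedUpdate) {
                out.add(toBinlogRow("+U", cur, rows.get(i + 1)));
                i++; // the +U row was consumed together with its -U row
            } else {
                out.add(toBinlogRow(cur.rowKind, cur));
            }
        }
        return out;
    }

    /** Converts each column into an array holding that column's value from every input part. */
    private static BinlogRow toBinlogRow(String rowKind, ChangeRow... parts) {
        int width = parts[0].values.length;
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < width; c++) {
            List<Object> cell = new ArrayList<>();
            for (ChangeRow part : parts) {
                cell.add(part.values[c]);
            }
            columns.add(cell);
        }
        return new BinlogRow(rowKind, columns);
    }
}
```

For example, the sequence `+I (1, "a")`, `-U (1, "a")`, `+U (1, "b")` would yield two rows under these assumptions: `+I` with columns `[1]`, `["a"]`, and `+U` with columns `[1, 1]`, `["a", "b"]`.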
## Expected Architecture
```
SQL: SELECT * FROM catalog.db.table$binlog
↓
BindRelation (useNativeTablePath=true)
↓
PaimonSysExternalTable (wraps source table)
↓
PaimonScanNode (reuse existing logic)
- Serialize BinlogTable
- Get DataSplits from BinlogTable
- Set sys_table_type="binlog"
↓
BE: PaimonBinlogReader
- Native ORC/Parquet reading
- Add rowkind column
- Array conversion for binlog
- Changelog packing (streaming)
```
## Benefits
1. **Performance**: Leverage native vectorized readers, avoid JNI overhead
2. **Predicate Pushdown**: Native readers support efficient filtering
3. **Resource Efficiency**: Reduced memory copying between Java and C++
4. **Consistency**: Unified execution path with regular Paimon tables
5. **Scalability**: Better performance for large-scale CDC scenarios
## Tasks
- [ ] Phase 1.1: Extend `SysTable` interface with `useNativeTablePath()`
- [ ] Phase 1.2: Implement `PaimonSysExternalTable` class
- [ ] Phase 1.3: Modify `BindRelation` to support native path routing
- [ ] Phase 1.4: Adapt `PaimonScanNode` for system tables
- [ ] Phase 2.1: Extend Thrift definitions for system table params
- [ ] Phase 2.2: Implement `PaimonAuditLogReader`
- [ ] Phase 2.3: Implement `PaimonBinlogReader`
- [ ] Phase 2.4: Add regression tests
## Related
- Paimon BinlogTable:
https://github.com/apache/paimon/blob/master/paimon-core/src/main/java/org/apache/paimon/table/system/BinlogTable.java
- Paimon AuditLogTable:
https://github.com/apache/paimon/blob/master/paimon-core/src/main/java/org/apache/paimon/table/system/AuditLogTable.java
## Use Case
This feature is particularly valuable for:
- Real-time CDC pipelines reading Paimon binlog
- Data auditing scenarios with large audit_log tables
- Read-optimized queries on Paimon tables via the `ro` system table