platinumhamburg opened a new issue, #2398:

URL: https://github.com/apache/fluss/issues/2398
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.

### Motivation

Runtime code generation is essential for achieving high performance in data processing systems. Instead of relying on reflection or generic implementations, generated code can:

1. **Eliminate virtual method dispatch** - Direct field access and type-specific operations replace generic per-field dispatch.
2. **Enable JIT optimization** - Small, type-specialized methods give the JVM's JIT compiler more opportunities to inline and optimize.
3. **Reduce boxing/unboxing overhead** - Type-specific code reads primitives directly instead of boxing them.
4. **Support complex type comparisons** - Nested types (arrays, maps, rows) require specialized comparison logic.

The initial use case is `RecordEqualiser` for comparing `InternalRow` instances, which is needed for:

- Change data capture (CDC) deduplication
- Aggregation state management
- Primary key table updates

### Solution

### Core Framework

| Component | Description |
|-----------|-------------|
| `CodeGeneratorContext` | Manages reusable code fragments, member variables, and class-level declarations |
| `JavaCodeBuilder` | Type-safe builder for constructing Java source code with a fluent API |
| `CompileUtils` | Compiles generated source code using Janino, with LRU caching of compiled classes |
| `GeneratedClass<T>` | Wrapper holding the generated source code and the compiled class |
| `CodeGenException` | Exception type for code generation failures |

### Type-Safe API

- `Modifier` enum: PUBLIC, PRIVATE, PROTECTED, STATIC, FINAL, ABSTRACT, SYNCHRONIZED, VOLATILE, TRANSIENT
- `PrimitiveType` enum: BOOLEAN, BYTE, CHAR, SHORT, INT, LONG, FLOAT, DOUBLE, VOID
- `Param` class: type-safe method parameter representation
- Helper methods: `mods()`, `params()`, `typeOf()`, `arrayOf()`

### Code Generators

| Generator | Output Interface | Description |
|-----------|------------------|-------------|
| `EqualiserCodeGenerator` | `RecordEqualiser` | Generates code for comparing two `InternalRow` instances |

### Supported Data Types

The `EqualiserCodeGenerator` supports all Fluss data types:

- **Primitive types**: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE
- **String types**: CHAR, VARCHAR, STRING
- **Binary types**: BINARY, VARBINARY, BYTES
- **Temporal types**: DATE, TIME, TIMESTAMP, TIMESTAMP_LTZ
- **Numeric types**: DECIMAL (with precision/scale)
- **Complex types**: ARRAY, MAP, ROW (nested)

### Features

- Field projection support for partial row comparison
- Compiled class caching with configurable cache size
- Janino dependency shaded to `org.apache.fluss.shaded.org.codehaus.janino` to avoid classpath conflicts
- Comprehensive Javadoc and `package-info` documentation

### Anything else?

_No response_

### Willingness to contribute

- [x] I'm willing to submit a PR!
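The Type-Safe API listed under Solution could work roughly as in the following minimal sketch. The enum constants mirror the ones named in the issue (declared here in conventional Java modifier order), but `SignatureSketch`, `methodSignature`, and the exact helper signatures are hypothetical, not the actual Fluss API. Holding modifiers in an `EnumSet` and types in enums means the generator cannot emit a misspelled keyword, and modifiers always come out in a canonical order.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of a type-safe signature builder; not the Fluss API. */
public class SignatureSketch {

    // Declared in conventional Java modifier order, so iterating an EnumSet
    // emits keywords in that order regardless of the order they were passed in.
    enum Modifier {
        PUBLIC, PROTECTED, PRIVATE, ABSTRACT, STATIC, FINAL,
        SYNCHRONIZED, TRANSIENT, VOLATILE;

        String keyword() { return name().toLowerCase(Locale.ROOT); }
    }

    enum PrimitiveType {
        BOOLEAN, BYTE, CHAR, SHORT, INT, LONG, FLOAT, DOUBLE, VOID;

        String keyword() { return name().toLowerCase(Locale.ROOT); }
    }

    // A method parameter: a type name plus an identifier.
    static final class Param {
        final String type;
        final String name;

        Param(String type, String name) {
            this.type = type;
            this.name = name;
        }

        @Override
        public String toString() { return type + " " + name; }
    }

    static Set<Modifier> mods(Modifier... modifiers) {
        Set<Modifier> set = EnumSet.noneOf(Modifier.class);
        set.addAll(Arrays.asList(modifiers));
        return set;
    }

    static List<Param> params(Param... ps) { return Arrays.asList(ps); }

    // Uses the simple name; a real generator may need fully qualified names.
    static String typeOf(Class<?> clazz) { return clazz.getSimpleName(); }

    static String arrayOf(String componentType) { return componentType + "[]"; }

    // Renders "<modifiers> <returnType> <name>(<params>)".
    static String methodSignature(Set<Modifier> modifiers, String returnType,
                                  String name, List<Param> parameters) {
        String ms = modifiers.stream().map(Modifier::keyword).collect(Collectors.joining(" "));
        String args = parameters.stream().map(Param::toString).collect(Collectors.joining(", "));
        return (ms.isEmpty() ? "" : ms + " ") + returnType + " " + name + "(" + args + ")";
    }

    public static void main(String[] args) {
        String sig = methodSignature(
                mods(Modifier.FINAL, Modifier.PUBLIC),
                PrimitiveType.BOOLEAN.keyword(),
                "equals",
                params(new Param("InternalRow", "left"), new Param("InternalRow", "right")));
        System.out.println(sig); // public final boolean equals(InternalRow left, InternalRow right)
    }
}
```

Passing `FINAL` before `PUBLIC` still yields `public final`, because ordering is a property of the enum rather than of each call site.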
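To make the `EqualiserCodeGenerator` and field-projection descriptions more concrete, here is a hand-written Java class showing what generated output could plausibly look like for a hypothetical schema `(id INT, name STRING, ts BIGINT)` compared only on fields 0 and 1 (for example, ignoring an update timestamp during CDC deduplication). This is a sketch under stated assumptions: `Row`, `ArrayRow`, `RecordEqualiser`, and `GeneratedEqualiser` are simplified stand-ins, not Fluss's `InternalRow` or `RecordEqualiser`, and the real generator's null handling and naming may differ.

```java
/** Hand-written illustration of generated equaliser output; all types are stand-ins. */
public class GeneratedEqualiserSketch {

    // Minimal stand-in for a row abstraction (NOT Fluss's InternalRow).
    interface Row {
        boolean isNullAt(int pos);
        int getInt(int pos);
        long getLong(int pos);
        String getString(int pos);
    }

    // Simple Object[]-backed row, used only to drive the example.
    static final class ArrayRow implements Row {
        private final Object[] values;

        ArrayRow(Object... values) { this.values = values; }

        public boolean isNullAt(int pos) { return values[pos] == null; }
        public int getInt(int pos) { return (Integer) values[pos]; }
        public long getLong(int pos) { return (Long) values[pos]; }
        public String getString(int pos) { return (String) values[pos]; }
    }

    // Stand-in for the RecordEqualiser output interface.
    interface RecordEqualiser {
        boolean equals(Row left, Row right);
    }

    // What a generator might emit for (id INT, name STRING, ts BIGINT) projected
    // onto fields {0, 1}: one unrolled, type-specific block per projected field,
    // with no runtime switch over field types.
    static final class GeneratedEqualiser implements RecordEqualiser {
        @Override
        public boolean equals(Row left, Row right) {
            // field 0: INT, compared as a primitive
            boolean lNull0 = left.isNullAt(0);
            boolean rNull0 = right.isNullAt(0);
            if (lNull0 != rNull0) return false;
            if (!lNull0 && left.getInt(0) != right.getInt(0)) return false;
            // field 1: STRING, compared by value
            boolean lNull1 = left.isNullAt(1);
            boolean rNull1 = right.isNullAt(1);
            if (lNull1 != rNull1) return false;
            if (!lNull1 && !left.getString(1).equals(right.getString(1))) return false;
            // field 2 (BIGINT ts) is outside the projection, so no code is emitted for it
            return true;
        }
    }

    public static void main(String[] args) {
        RecordEqualiser eq = new GeneratedEqualiser();
        Row a = new ArrayRow(1, "alice", 100L);
        Row b = new ArrayRow(1, "alice", 200L); // differs only in ts
        Row c = new ArrayRow(1, null, 100L);    // null name
        System.out.println(eq.equals(a, b));    // true: ts is projected out
        System.out.println(eq.equals(a, c));    // false: non-null vs null name
    }
}
```

The design point is that the type dispatch happens once, at generation time: the emitted method contains only the accessor and comparison each projected field needs, and fields outside the projection cost nothing at comparison time.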
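The LRU caching described for `CompileUtils`, together with the configurable cache size listed under Features, can be sketched with the JDK alone. This is an illustration under assumptions: the real class compiles with Janino, whereas here the compiler is a caller-supplied `Function` stand-in, and keying the cache on the full generated source text is one plausible choice, not necessarily what Fluss does. `CompileCacheSketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache of compiled results, keyed by generated source text. */
public class CompileCacheSketch<V> {

    private final Map<String, V> cache;

    public CompileCacheSketch(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        // accessOrder = true makes LinkedHashMap order entries from least to
        // most recently accessed; every hit moves the entry to the tail.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the least recently used entry once the bound is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached result for this source, invoking the compiler only on a miss. */
    public synchronized V getOrCompile(String source, Function<String, V> compiler) {
        // Compilation runs under the lock: simple, but serializes concurrent misses.
        return cache.computeIfAbsent(source, compiler);
    }

    public synchronized int entryCount() { return cache.size(); }

    // containsKey does not update access order, so it is safe for inspection.
    public synchronized boolean contains(String source) { return cache.containsKey(source); }

    public static void main(String[] args) {
        int[] compilations = {0};
        // Stand-in for a real compiler such as Janino: it only counts invocations.
        Function<String, String> fakeCompiler = src -> {
            compilations[0]++;
            return "compiled:" + src;
        };
        CompileCacheSketch<String> cache = new CompileCacheSketch<>(2);
        cache.getOrCompile("class A {}", fakeCompiler);   // miss
        cache.getOrCompile("class B {}", fakeCompiler);   // miss
        cache.getOrCompile("class A {}", fakeCompiler);   // hit; A becomes most recent
        cache.getOrCompile("class C {}", fakeCompiler);   // miss; evicts B
        System.out.println(compilations[0]);              // 3
        System.out.println(cache.contains("class B {}")); // false
    }
}
```

Keying on source text means that generators emitting identical code share one compiled class instead of compiling and loading a fresh copy for every operator instance.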
