GitHub user zhanglistar edited a comment on the discussion: Flink support
**Some Ideas:** 1. **Data Structure Adaptation:** Convert RowData to Vector, supporting RowKind. 2. **Operator Support:** Extend Velox Operator to support state and rollback. 3. **State Management:** Integrate Flink's StateBackend into Velox. 4. **Distributed Execution:** Connect Flink's networking and scheduling logic to Velox. 5. **Execution Flow:** Replace Flink StreamTask with Velox Task. 1. **State Engine** Velox does not have an internal state. We can introduce RocksDB as the state, which is not too difficult. However, this could become a potential performance bottleneck, so careful consideration is necessary. Since RocksDB involves disk operations, it is essential to orchestrate IO as asynchronous batch reads. 2. **Stream retract Functionality** This is one of the fundamental differences between stream and batch processing. Streams are updated, while batches are not. For example, a table from MySQL naturally has updates and deletes. Flink introduces a stream retract mechanism to address this issue, which involves carrying metadata on the data to indicate whether it is an update, delete, or insert. Clearly, Velox does not have this capability. Several modifications are needed: 1. Operators need to support rollback, such as window, stream join, stream aggregation, etc. This means that almost all operators will need to add this interface, which is a significant amount of work. 2. The data stream needs to include rollback indicators in the vector to indicate whether the current data is rollback data. Once both the operators and the data stream are modified to support rollback, the overall execution flow should be fine. 3. **Stateless Operators** such as Map, Filter, FlatMap, KeyBy, Project/Calc, Union, Split/Select, and some Sources and Sinks do not depend on state and only transform the current input. They are suitable for simple data processing and can be reused directly. This is similar to the approach taken by Ant Group, which offloads expression calculations to Velox using JNI. However, due to potential row-column conversion and low coverage, there may be performance issues, and the overall speedup may not be significant. GitHub link: https://github.com/apache/incubator-gluten/discussions/8849#discussioncomment-12395428 ---- This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
