GitHub user anukalp2804 added a comment to the discussion: [GSoC 2026] Flink connector for IoTDB table mode – prospective contributor
Hello Haonan,

I am very excited to see "Flink connector for IoTDB 2.X Table Mode" on the official GSoC ideas list! As discussed previously, I recently built a basic PoC using TableSessionBuilder to understand the 2.0 API, and I would now like to outline my plan for a production-ready architecture. I have analyzed the existing iotdb-extras repository (flink-iotdb-connector and flink-sql-iotdb-connector) to identify its current limitations with table mode, and I have also been studying the official Flink and IoTDB documentation, in particular the Modeling Scheme Design and Syntax Overview pages. Here is my proposed approach:

1. Architecture: I plan to build a utility that maps IoTDB's Table API data types (INT32, INT64, FLOAT, DOUBLE, TEXT, BOOLEAN) directly to Flink's internal types. The mapper will explicitly handle IoTDB's schema structure, mapping the time column to Flink's TIMESTAMP and properly distinguishing tags from fields. To ensure compatibility with Flink 1.18+, I will avoid the legacy source functions and instead use the modern FLIP-27 architecture for the source and SinkV2 for the sink.

2. Implement IoTDB Table Source: For reading data, I will implement IoTDBTableSource. To optimize performance and avoid transferring unnecessary data over the network, I will implement SupportsFilterPushDown. This ensures that Flink passes WHERE clauses (such as time ranges and tag predicates) down to IoTDB, so filtering happens natively at the storage layer.

3. Implement IoTDB Table Sink: I will create IoTDBTableSink to support both BOUNDED (batch) and UNBOUNDED (streaming) data. For a better user experience, if a user tries to insert data into a non-existent table, the sink will intercept the "Table Not Found" error and automatically issue a CREATE TABLE statement derived from the Flink schema. I will also implement robust error handling: if a row violates a constraint (such as a TTL or a type mismatch), the sink will log a warning and drop that row instead of crashing the entire Flink job.

4. Testing & Documentation: I know the code is only half the project. I plan to write unit tests for the schema mapping. For integration testing, I will use Flink's MiniCluster alongside IoTDB's test clusters. I will write clear documentation with SQL query examples and a configuration table. Additionally, I plan to create a short video tutorial explaining how to use the connector, which should help both the community and new open-source contributors.

5. Community Contributions: I will submit the PRs to the upstream iotdb-extras repository. Along with the PRs, I will include a fully runnable example Flink job (e.g., a simple factory temperature aggregation demo) so developers can clone it, run the main() method, and see the Table API working immediately.

6. Advanced: To prove the efficiency of the parallel read/write optimizations mentioned above, my final goal is to develop benchmarks comparing the new table-mode connector against the legacy tree-mode connector, measuring throughput, latency, and resource usage under heavy IoT workloads.

My question: before I start drafting my formal proposal document, are there any specific aspects of IoTDB's dynamic table changes or schema inference that you would like me to cover explicitly in the design?

(P.S. I also created a small PR in the main IoTDB repo; I would really appreciate a review: https://github.com/apache/iotdb/pull/17129)

Thanks for your valuable time and guidance!

Anukalp Pandey

GitHub link: https://github.com/apache/iotdb/discussions/17128#discussioncomment-15950183

----
This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
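
To make item 1 of the plan concrete, here is a minimal sketch of the type-mapping utility I have in mind. It is a rough illustration under my own naming, not the final connector API: it maps IoTDB table-model type names to Flink SQL type strings, and pins the time column to a millisecond-precision timestamp.

```java
import java.util.Map;

// Hypothetical sketch of the type-mapping utility (item 1).
// Class and method names are placeholders, not a final API.
public class IoTDBTypeMapper {

    // IoTDB table-model data type -> Flink SQL type string.
    private static final Map<String, String> TYPE_MAP = Map.of(
            "BOOLEAN", "BOOLEAN",
            "INT32", "INT",
            "INT64", "BIGINT",
            "FLOAT", "FLOAT",
            "DOUBLE", "DOUBLE",
            "TEXT", "STRING");

    /** Returns the Flink SQL type for an IoTDB type name; rejects unknown types. */
    public static String toFlinkType(String iotdbType) {
        String mapped = TYPE_MAP.get(iotdbType.toUpperCase());
        if (mapped == null) {
            throw new IllegalArgumentException("Unsupported IoTDB type: " + iotdbType);
        }
        return mapped;
    }

    /** The IoTDB time column maps to a millisecond-precision Flink timestamp. */
    public static String timeColumnType() {
        return "TIMESTAMP(3)";
    }
}
```

In the real connector these strings would of course be Flink `DataType` instances; the string form here just keeps the sketch dependency-free.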
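
For item 2, the core of the pushdown is rendering the predicates Flink hands over into the query sent to IoTDB. A dependency-free sketch (in the connector the predicates would arrive via Flink's SupportsFilterPushDown as ResolvedExpression objects; here they are pre-rendered strings, and the class name is my own):

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch (item 2): pushing accepted filters into the WHERE
// clause of the IoTDB scan query so filtering happens at the storage layer.
public class PushDownQueryBuilder {

    /** Builds the scan query, ANDing any pushed-down predicates into WHERE. */
    public static String buildScanQuery(String table, List<String> predicates) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!predicates.isEmpty()) {
            StringJoiner where = new StringJoiner(" AND ");
            predicates.forEach(where::add);
            sql.append(" WHERE ").append(where);
        }
        return sql.toString();
    }
}
```

Predicates the source cannot translate would be reported back to Flink as remaining filters, so Flink re-applies them after the scan and correctness is preserved.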
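
For the auto-create behavior in item 3, the sink needs to derive a DDL statement from the Flink schema. A sketch of that derivation, with the caveat that the TIME/TAG/FIELD column categories follow my reading of the IoTDB 2.0 table-model DDL and would be double-checked against the Syntax Overview docs:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch (item 3): building the CREATE TABLE statement the sink
// issues after intercepting a "Table Not Found" error. Column categories
// (TIME/TAG/FIELD) are my assumption about the IoTDB 2.0 table-model DDL.
public class AutoCreateTable {

    /** columns: name -> "TYPE CATEGORY" (e.g. "STRING TAG"), in declaration order. */
    public static String createTableDdl(String table, LinkedHashMap<String, String> columns) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cols.add(e.getKey() + " " + e.getValue());
        }
        return "CREATE TABLE IF NOT EXISTS " + table + " " + cols;
    }
}
```

Using IF NOT EXISTS makes the call safe against the race where two parallel sink subtasks both see the missing table.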
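
Finally, the "log a warning and drop the bad row" policy from item 3 can be isolated into a small wrapper. The writer is abstracted as a plain Consumer here so the sketch stays self-contained; in the connector it would be the IoTDB session write call, and the dropped rows would feed a metric rather than a returned list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical sketch (item 3): a row that violates a constraint (TTL, type
// mismatch) is logged and skipped instead of failing the whole Flink job.
public class TolerantWriter {
    private static final Logger LOG = Logger.getLogger("TolerantWriter");

    /** Writes each row with the given writer; returns the rows that were dropped. */
    public static <T> List<T> writeAll(List<T> rows, Consumer<T> writer) {
        List<T> dropped = new ArrayList<>();
        for (T row : rows) {
            try {
                writer.accept(row);
            } catch (RuntimeException e) {
                LOG.warning("Dropping bad row " + row + ": " + e.getMessage());
                dropped.add(row);
            }
        }
        return dropped;
    }
}
```

A production version would likely make this configurable (fail-fast vs. drop-and-log), since silently dropping rows is not acceptable for every pipeline.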
