GitHub user anukalp2804 added a comment to the discussion: [GSoC 2026] Flink 
connector for IoTDB table mode – prospective contributor

Hello Haonan,

I am very excited to see the "Flink connector for IoTDB 2.X Table Mode" on the 
official GSoC ideas list! As discussed previously, I recently built a basic PoC 
using TableSessionBuilder to understand the 2.0 API, and now I'd like to 
outline my plan for the production-ready architecture.

I have analyzed the existing iotdb-extras repository (flink-iotdb-connector and 
flink-sql-iotdb-connector) to identify the current limitations with table mode. 
I've also been studying the Flink and IoTDB official documentation regarding 
Modeling Scheme Design and Syntax Overview. Here is my proposed approach:

1. Architecture: I plan to build a utility that maps IoTDB's Table API data 
types (like INT32, INT64, FLOAT, DOUBLE, TEXT, BOOLEAN) directly to Flink's 
internal types. The mapper will explicitly handle IoTDB's schema structure, 
mapping the time column to Flink's TIMESTAMP, and properly distinguishing tags 
and fields. To ensure compatibility with Flink 1.18+, I will avoid legacy 
source functions and instead use the modern FLIP-27 architecture for the 
source, and SinkV2 for the sink.
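
The mapping described above could be sketched as a small utility. The class name, method shape, and the exact Flink target types (e.g. `TIMESTAMP(3)` for the time column) are my assumptions for illustration, not a final design:

```java
// Hypothetical sketch of the IoTDB -> Flink SQL type mapping described above.
// Class/method names and the exact target types are assumptions.
public class IoTDBTypeMapper {

    /** Maps an IoTDB Table API data type name to a Flink SQL type name. */
    public static String toFlinkType(String iotdbType) {
        switch (iotdbType) {
            case "INT32":     return "INT";
            case "INT64":     return "BIGINT";
            case "FLOAT":     return "FLOAT";
            case "DOUBLE":    return "DOUBLE";
            case "BOOLEAN":   return "BOOLEAN";
            case "TEXT":      return "STRING";
            // IoTDB's time column would map to a Flink timestamp type.
            case "TIMESTAMP": return "TIMESTAMP(3)";
            default:
                throw new IllegalArgumentException("Unsupported IoTDB type: " + iotdbType);
        }
    }
}
```

The real implementation would return Flink `DataType` objects rather than SQL type strings, but the lookup logic would be the same.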

2. Implement IoTDB Table Source: For reading data, I will implement the 
IoTDBTableSource. To optimize performance and avoid downloading unnecessary 
data over the network, I'll implement SupportsFilterPushDown. This ensures that 
Flink passes WHERE clauses (like time ranges and tags) down to IoTDB so 
filtering happens natively at the storage layer.
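
Conceptually, the pushed-down predicates would be rendered into the query sent to IoTDB. The real code would override `SupportsFilterPushDown#applyFilters` and work on Flink `ResolvedExpression` objects; this simplified string-based form is only an assumption to show the idea:

```java
import java.util.List;
import java.util.stream.Collectors;

// Simplified sketch: rendering filters that Flink pushes down into the
// WHERE clause of the query sent to IoTDB, so filtering happens server-side.
// The string-based filter representation is an assumption for illustration.
public class FilterPushDownSketch {

    public static String buildQuery(String table, List<String> pushedFilters) {
        if (pushedFilters.isEmpty()) {
            return "SELECT * FROM " + table;
        }
        // AND-join the accepted predicates into a single WHERE clause.
        return "SELECT * FROM " + table + " WHERE "
                + pushedFilters.stream().collect(Collectors.joining(" AND "));
    }
}
```

For example, `buildQuery("sensor_data", List.of("time > 0", "device_id = 'd1'"))` would yield a query that lets IoTDB filter by time range and tag natively.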

3. Implement IoTDB Table Sink: I will create the IoTDBTableSink to support both 
BOUNDED (batch) and UNBOUNDED (streaming) data. For a better user experience, 
if a user tries to insert data into a non-existent table, the sink will 
intercept the "Table Not Found" error and automatically send a CREATE TABLE 
command based on the Flink schema. I will also implement robust error handling: 
if data violates constraints (such as a TTL violation or a type mismatch), the 
sink will log a warning and drop the bad row without crashing the entire Flink 
job.
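
The auto-create path could derive a DDL statement from the Flink schema when the "Table Not Found" error is intercepted. The column-category keywords (TAG/FIELD) and the exact DDL syntax below are assumptions based on my reading of the IoTDB table-model documentation:

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the sink's auto-create behavior: derive a CREATE TABLE statement
// from the Flink schema after intercepting a "Table Not Found" error.
// Column categories and DDL syntax are assumptions, not verified IoTDB SQL.
public class AutoCreateSketch {

    /** columns: name -> "TYPE CATEGORY", e.g. "STRING TAG" or "DOUBLE FIELD". */
    public static String createTableDdl(String table, Map<String, String> columns) {
        String cols = columns.entrySet().stream()
                .map(e -> e.getKey() + " " + e.getValue())
                .collect(Collectors.joining(", "));
        return "CREATE TABLE IF NOT EXISTS " + table + " (" + cols + ")";
    }
}
```

In the actual sink this would run once per missing table, after which the failed batch is retried.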

4. Testing & Documentation: I know the code is only half the project. I plan to 
write unit tests for the schema mapping. For integration testing, I will use 
Flink's MiniCluster alongside IoTDB's test clusters. I'll write clear 
documentation with SQL query examples and a configuration table. 
Additionally, I plan to create a short video tutorial explaining how to use the 
connector, which should help both the community and new open-source contributors.

5. Community Contributions: I will submit the PRs to the upstream iotdb-extras 
repository. Along with the PRs, I will include a fully runnable example Flink 
job (e.g., a simple factory temperature aggregation demo) so developers can 
clone it, run the main() method, and see the Table API working immediately.

6. Advanced: To prove the efficiency of the parallel read/write optimizations 
mentioned above, my final goal is to develop benchmarks comparing the new 
table-mode connector against the legacy tree-mode connector. I will focus on 
measuring throughput, latency, and resource usage under heavy IoT scenarios.

My question is:
Before I start drafting my formal proposal document, are there any specific 
aspects of IoTDB's dynamic table changes or schema inference that you would 
like me to explicitly cover in the design?

(P.S. I also opened a small PR in the main IoTDB repo; I would really 
appreciate a review: 
https://github.com/apache/iotdb/pull/17129)

Thanks for your valuable time and guidance!
Anukalp Pandey

GitHub link: 
https://github.com/apache/iotdb/discussions/17128#discussioncomment-15950183
