GitHub user A0R0P0I7T edited a discussion: [GSoC 2026 Aspiring Contributor] Introducing Myself – Flink Connector for Apache IoTDB 2.X Table Mode
Hi Apache IoTDB Community,

My name is Arpit Saha, and I am an undergraduate Information and Communication Technology student applying for Google Summer of Code 2026. I am writing to introduce myself, share the technical research I have conducted so far, and present my current progress toward the Flink Connector for Apache IoTDB 2.X Table Mode project. I have been in contact with mentor Mr. Haonan Hou, who has been kind enough to guide me toward the relevant resources and codebase. While I have not yet made direct contributions to the IoTDB repository, I have invested significant effort in understanding the codebase, the existing connectors' limitations, and the architectural requirements of this project.

---

**Background**

I have prior open source experience with Apache Gravitino, which gave me familiarity with Java-based codebases and the Apache contribution workflow. This has allowed me to navigate the IoTDB codebase comfortably from the start.

---

**Codebase Analysis**

I studied the existing Flink-IoTDB connectors in `iotdb-extras` along with the `flink-tsfile-connector` and identified the following gaps:

- The `flink-iotdb-connector` (tree mode) uses the deprecated `SourceFunction` API with a hardcoded SQL string, no split-enumerator architecture, no TAG/FIELD awareness, and no fault tolerance, making it insufficient for table mode.
- The `flink-sql-iotdb-connector` delegates execution to Flink's SQL planner via a factory/provider pattern; studying it deepened my understanding of how Flink's SQL/Table API layer integrates with IoTDB as a registered table source.
- I also went through the `flink-tsfile-connector` to understand how a more complete connector is structured: how the base classes, test infrastructure, execution environments, and input formats are organized. This gave me a much clearer picture of what a well-structured connector looks like before I began designing my own.
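To make the split-enumerator gap concrete, here is a minimal, self-contained sketch of the kind of time-range partitioning a FLIP-27 enumerator performs and the legacy `SourceFunction` connector lacks entirely. The class and method names are my own illustrative choices, not code from any existing connector:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: carve a closed-open time range [start, end) into
// bounded windows, one per split. In a FLIP-27 source, the SplitEnumerator
// would run logic like this and hand each range to a SourceReader; the
// legacy SourceFunction connector has no equivalent step, so it cannot
// parallelize or checkpoint per time window.
public class TimeRangeSplitter {

    /** Immutable (startTime, endTime) pair describing one split's window. */
    public record TimeRange(long startTime, long endTime) {}

    public static List<TimeRange> split(long start, long end, long windowMillis) {
        List<TimeRange> ranges = new ArrayList<>();
        for (long t = start; t < end; t += windowMillis) {
            // Clamp the last window so it never extends past the overall range.
            ranges.add(new TimeRange(t, Math.min(t + windowMillis, end)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Three windows covering [0, 25s); readers could process them in parallel.
        for (TimeRange r : split(0L, 25_000L, 10_000L)) {
            System.out.println(r);
        }
    }
}
```

Each window becomes an independently checkpointable unit of work, which is exactly what enables the parallelism and recovery described below.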
The existing `flink-iotdb-connector` was built entirely around tree-mode path semantics and the deprecated `SourceFunction` API: it has no awareness of IoTDB 2.X's table model, TAG/FIELD structure, or modern streaming source architecture, which is precisely the gap this project addresses.

---

**Understanding FLIP-27 and Why It Matters**

Going through the FLIP-27 documentation was the most valuable part of my research. The core insight is the clean separation between the `SplitEnumerator`, which dynamically generates time-range splits as new IoTDB data arrives, and the `SourceReader`, which independently executes bounded queries per split and emits records downstream. Splitting by timestamp boundaries maps naturally onto IoTDB's time-series model and enables true parallelism: multiple readers processing different time windows simultaneously. Fault tolerance follows directly from this design: since each split tracks its own progress independently, the framework checkpoints every reader separately and resumes from exactly where it left off after a crash, something the older `SourceFunction` approach simply cannot do reliably.

---

**Current Progress**

I first built a preliminary prototype using the older `SourceFunction` API, not as the final approach, but to validate my understanding of IoTDB session interactions, table queries, and basic execution flow in a controlled setting: https://github.com/A0R0P0I7T/Flink-IoTDB-Table-Connector

I have now begun the FLIP-27-based POC, starting with `IoTDBSplit` implementing `SourceSplit` with a unique `splitId` for identification, `startTime`/`endTime` boundaries as the unit of parallelism, and a `currentOffset` to track reader progress within each split for precise fault-tolerant recovery. The immediate next steps are building the `SourceReader` with proper `RowData` emission and the dynamic `SplitEnumerator` that continuously generates splits as new time ranges become available.
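A minimal sketch of the `IoTDBSplit` design described above. So the snippet compiles standalone, Flink's `SourceSplit` is stubbed inline; the real class would implement `org.apache.flink.api.connector.source.SourceSplit`, and everything beyond the fields named above (`splitId`, `startTime`, `endTime`, `currentOffset`) is illustrative:

```java
// Stand-in for org.apache.flink.api.connector.source.SourceSplit so this
// sketch compiles without Flink on the classpath; the actual Flink
// interface likewise declares splitId().
interface SourceSplit {
    String splitId();
}

// Sketch of the split: a bounded time window plus a mutable offset so a
// restored reader can resume mid-split after a checkpoint.
public class IoTDBSplit implements SourceSplit {
    private final String splitId;
    private final long startTime;  // inclusive lower bound of the time window
    private final long endTime;    // exclusive upper bound of the time window
    private long currentOffset;    // highest timestamp the reader has emitted

    public IoTDBSplit(String splitId, long startTime, long endTime) {
        this.splitId = splitId;
        this.startTime = startTime;
        this.endTime = endTime;
        this.currentOffset = startTime; // nothing consumed yet
    }

    @Override
    public String splitId() { return splitId; }

    public long getStartTime() { return startTime; }
    public long getEndTime() { return endTime; }
    public long getCurrentOffset() { return currentOffset; }

    /** Called by the reader as it emits rows; this value is checkpointed. */
    public void advanceTo(long timestamp) {
        if (timestamp < currentOffset || timestamp > endTime) {
            throw new IllegalArgumentException("offset outside split range");
        }
        currentOffset = timestamp;
    }

    /** True once the reader has consumed the whole window. */
    public boolean isFinished() {
        return currentOffset >= endTime;
    }
}
```

On recovery, the enumerator reassigns unfinished splits and each reader re-issues a bounded query from `currentOffset` rather than `startTime`, which is what makes resumption precise.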
After that, the focus shifts to TAG-based filter and projection pushdown, proper schema mapping, and checkpointing validation. I will have a working end-to-end POC within the next 24 hours.

---

I am excited about both the database internals and distributed systems aspects of this project and welcome any feedback or suggestions from the community.

Thank you for your time.

Arpit Saha

P.S. I am really enjoying diving deep into the core fundamentals of connector architecture through this research. Understanding how each piece, from split design to fault tolerance, fits together is proving to be one of the more rewarding learning experiences I have had, and it is making me even more motivated to build this the right way.

GitHub link: https://github.com/apache/iotdb/discussions/17248
