GitHub user A0R0P0I7T added a comment to the discussion: [GSoC 2026 Aspiring Contributor] Introducing Myself – Flink Connector for Apache IoTDB 2.X Table Mode
Hi everyone,

Final progress update: the FLIP-27 based POC is now complete, tested, and pushed to the repository with full setup instructions. The finished repository is here: https://github.com/A0R0P0I7T/Flink-IoTDB-Table-Connector

Since the last update I completed `IoTDBSourceReader`, `IoTDBSource`, `VoidSerializer`, `FlinkJob`, a JUnit test class, and a `DataSetup` utility that automatically creates the required database, table, and sample sensor records, so the connector can run without any manual `IoTDB` setup. The `README` has also been updated with architecture documentation, component descriptions, quick-start instructions, and annotated output screenshots.

A few non-obvious issues came up during implementation that are worth documenting:

- `pollNext()` must return `InputStatus` rather than void: `MORE_AVAILABLE` while rows remain in the current dataset, `NOTHING_AVAILABLE` when waiting for the next split, and `END_OF_INPUT` once `notifyNoMoreSplits()` has been received and the queue is empty. This three-state signaling is what lets Flink schedule readers efficiently instead of polling blindly.
- `isAvailable()` must return `CompletableFuture.completedFuture(null)` rather than null: returning null causes a `NullPointerException` inside Flink's internal future chaining, which is not obvious from the interface signature alone.
- `closeOperationHandle()` must be called after each split's dataset is exhausted: without it the server-side cursor stays open on `IoTDB`, and connection-pool exhaustion accumulates silently across parallel readers over time.
- `pollNext()` emits one row per call rather than looping through the entire dataset in one invocation: blocking the task thread for a full query would prevent Flink from checkpointing, handling watermarks, or responding to backpressure during that period.
- `USE factory_db` is executed once in `start()` rather than per query: a small but meaningful efficiency gain when a reader processes many splits.
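To make the three-state contract concrete, here is a minimal, self-contained sketch of the reader loop described above. The Flink types are replaced by local stand-ins (`InputStatus` and the `SketchReader` class are modeled here for illustration, not taken from the repository); the real connector implements the FLIP-27 `SourceReader` interface against `IoTDB` sessions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified stand-in for Flink's org.apache.flink.core.io.InputStatus.
enum InputStatus { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

class SketchReader {
    private final Deque<List<String>> splits = new ArrayDeque<>(); // each split = its query result rows
    private Iterator<String> currentRows;       // rows of the split currently being drained
    private boolean noMoreSplits = false;       // set by notifyNoMoreSplits()

    // start() is where the real reader opens the session and executes
    // `USE factory_db` once, instead of re-issuing it per split query.
    void start() { /* session setup omitted in this sketch */ }

    void addSplit(List<String> rows) { splits.add(rows); }
    void notifyNoMoreSplits() { noMoreSplits = true; }

    // Never return null here: Flink chains this future internally, and a
    // null future triggers a NullPointerException inside the runtime.
    CompletableFuture<Void> isAvailable() {
        return CompletableFuture.completedFuture(null);
    }

    // Emits at most one row per call so the task thread stays responsive
    // to checkpoints, watermarks, and backpressure between records.
    InputStatus pollNext(List<String> output) {
        if (currentRows == null || !currentRows.hasNext()) {
            // In the real reader this is where closeOperationHandle()
            // releases the server-side cursor of the finished split.
            List<String> next = splits.poll();
            if (next == null) {
                return noMoreSplits ? InputStatus.END_OF_INPUT
                                    : InputStatus.NOTHING_AVAILABLE;
            }
            currentRows = next.iterator();
        }
        output.add(currentRows.next());
        // Report MORE_AVAILABLE after emitting; the next call decides
        // between NOTHING_AVAILABLE and END_OF_INPUT once drained.
        return InputStatus.MORE_AVAILABLE;
    }
}
```

A caller sees exactly the sequence described above: `MORE_AVAILABLE` per emitted row, `NOTHING_AVAILABLE` while waiting between splits, and `END_OF_INPUT` only after `notifyNoMoreSplits()` with an empty queue.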
The current splitting logic divides the global time range by parallelism, producing exactly as many splits as readers. This is intentional for a POC; the planned production improvement is fixed-size time splits independent of parallelism, so that splits far outnumber readers, faster readers automatically pick up additional work, and load balances naturally across uneven data distributions.

This update was originally planned for Saturday, but a Monday examination required significant preparation time over the weekend and pushed completion back by a couple of days. In the meantime I will be looking at open issues in the repository while continuing to refine the connector toward production quality. Looking forward to any feedback from the community.

GitHub link: https://github.com/apache/iotdb/discussions/17248#discussioncomment-16074598

----
This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
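Returning to the splitting logic discussed above, the planned fixed-size strategy can be sketched in a few lines. This is a hypothetical illustration, not code from the repository: `TimeRangeSplit` and `SplitPlanner` are made-up names, and the real enumerator would also carry table and query information per split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical split type: a half-open time range [startMs, endMs).
record TimeRangeSplit(long startMs, long endMs) {}

class SplitPlanner {
    // Fixed-size splits independent of parallelism: choosing stepMs so the
    // split count far exceeds the reader count lets fast readers keep
    // requesting work, which balances load over skewed data distributions.
    static List<TimeRangeSplit> plan(long startMs, long endMs, long stepMs) {
        List<TimeRangeSplit> splits = new ArrayList<>();
        for (long t = startMs; t < endMs; t += stepMs) {
            splits.add(new TimeRangeSplit(t, Math.min(t + stepMs, endMs)));
        }
        return splits;
    }
}
```

Compared with dividing the range by parallelism, the number of splits here depends only on the data's time span and the configured step, so the same plan works whether the job runs with two readers or twenty.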
