GitHub user A0R0P0I7T added a comment to the discussion: [GSoC 2026 Aspiring 
Contributor] Introducing Myself – Flink Connector for Apache IoTDB 2.X Table 
Mode

Hi everyone,
Final progress update — the FLIP-27 based POC is now complete, tested, and 
pushed to the repository with full setup instructions.

The finished repository is here: 
https://github.com/A0R0P0I7T/Flink-IoTDB-Table-Connector

Since the last update I completed `IoTDBSourceReader`, `IoTDBSource`, 
`VoidSerializer`, `FlinkJob`, a JUnit test class, and a `DataSetup` utility 
that automatically creates the required database, table, and sample sensor 
records so the connector can be run without any manual `IoTDB` setup. The 
`README` has also been updated with architecture documentation, component 
descriptions, quick start instructions, and annotated output screenshots.
A few non-obvious issues came up during implementation worth documenting:

- `pollNext()` must return `InputStatus` rather than void: `MORE_AVAILABLE` 
while rows remain in the current dataset, `NOTHING_AVAILABLE` when waiting for 
the next split, and `END_OF_INPUT` once `notifyNoMoreSplits()` has been 
received and the queue is empty. This three-state signaling is what allows 
Flink to schedule readers efficiently rather than polling blindly.
- `isAvailable()` must return `CompletableFuture.completedFuture(null)` rather 
than null — returning null causes a `NullPointerException` inside Flink's 
internal future chaining, which is not obvious from the interface signature 
alone.
- `closeOperationHandle()` must be called after exhausting each split's dataset; 
without it the server-side cursor stays open on `IoTDB`, and across parallel 
readers these leaked handles silently exhaust the connection pool over time.
- `pollNext()` emits one row per call rather than looping through the entire 
dataset in one invocation — blocking the task thread for a full query prevents 
Flink from checkpointing, handling watermarks, or responding to backpressure 
during that period.
- `USE factory_db` is executed once in `start()` rather than per query: a 
small but meaningful efficiency gain when many splits are processed per 
reader.
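To make the three-state `pollNext()` contract concrete, here is a minimal 
standalone sketch. The simplified `InputStatus` enum and the `ReaderModel` 
class are illustrative stand-ins so the example runs without a Flink 
dependency; they are not the connector's actual classes or Flink's API, 
though the enum constants mirror Flink's names.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified stand-in for Flink's InputStatus, so this runs without Flink.
enum InputStatus { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

// Minimal model of the reader's three-state signaling; the row queue and the
// noMoreSplits flag stand in for the real split/dataset plumbing.
class ReaderModel {
    private final Queue<Integer> rows = new ArrayDeque<>();
    private boolean noMoreSplits = false;

    void addRow(int row) { rows.add(row); }            // a split delivered a row
    void notifyNoMoreSplits() { noMoreSplits = true; } // enumerator is done

    InputStatus pollNext() {
        if (!rows.isEmpty()) {
            rows.poll(); // emit exactly one row per call (real code hands it to ReaderOutput)
            if (!rows.isEmpty()) {
                return InputStatus.MORE_AVAILABLE;   // more rows in the current dataset
            }
        }
        return noMoreSplits
                ? InputStatus.END_OF_INPUT           // no splits left and queue drained
                : InputStatus.NOTHING_AVAILABLE;     // idle until the next split arrives
    }
}
```

Emitting one row per call keeps the task thread responsive, and the 
`NOTHING_AVAILABLE` branch is where the real reader's non-null 
`isAvailable()` future comes into play.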

The current splitting logic divides the global time range by parallelism, 
producing exactly as many splits as readers. This is intentional for a POC, 
but the planned production improvement is fixed-size time splits independent 
of parallelism, ensuring splits far outnumber readers so that faster readers 
automatically pick up additional work and the load balances naturally across 
uneven data distributions.
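The planned fixed-size splitting could be sketched as follows. 
`TimeRangeSplitter` and `fixedSizeSplits` are hypothetical names for 
illustration, not code from the repository:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parallelism-independent splitting: cut the global
// time range into fixed-size slices so that splits far outnumber readers.
class TimeRangeSplitter {
    // Returns half-open [start, end) slices of at most sliceMillis each.
    static List<long[]> fixedSizeSplits(long start, long end, long sliceMillis) {
        List<long[]> splits = new ArrayList<>();
        for (long t = start; t < end; t += sliceMillis) {
            splits.add(new long[] { t, Math.min(t + sliceMillis, end) });
        }
        return splits;
    }
}
```

With, say, one-hour slices over a month of data, each reader finishes a 
slice and requests the next from the enumerator, so a skewed time range no 
longer pins a single reader.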

This update was originally planned for Saturday, but a Monday examination 
required significant preparation time over the weekend, which pushed 
completion back by a couple of days. In the meantime I will be looking at 
open issues in the repository while continuing to refine the connector toward 
production quality.

Looking forward to any feedback from the community.

GitHub link: 
https://github.com/apache/iotdb/discussions/17248#discussioncomment-16074598

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
