Re: [GSoC 2026] Interest in GSOC-304: ThingsBoard Integration with IoTDB 2.x Table Model

Wang Critas Thu, 19 Mar 2026 01:23:55 -0700

Hi Zihan,

Thank you for your detailed and insightful email. Your understanding of the 
current ThingsBoard IoTDB integration and the challenges of migrating to the 
2.x Table Model is spot on. The work plan you outlined is clear and feasible, 
and your contribution experience across multiple Apache projects gives me 
confidence in your ability to succeed.

Here are my answers to your three specific questions.

1. Single generic table vs. per-device-profile tables
Recommendation: Start with a single generic table

* Preserves the flexibility of adding arbitrary telemetry keys at runtime
without DDL.

* Keeps the mapping simple and consistent with the existing
Cassandra/PostgreSQL integrations, so it is easy for users to understand.

* IoTDB's table model handles high‑cardinality tags (like key) efficiently,
and partitioning can further optimize performance.

* If bottlenecks arise later, we can consider a dedicated latest‑value
table or per‑entity tables as optimizations.
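
As a rough illustration of the single-generic-table option, the sketch below
renders a CREATE TABLE statement from a column list. The table name, column
names, typed value columns, and TTL value are all assumptions made for the
example, not an agreed schema, and the DDL shape should be checked against the
IoTDB 2.x table-model documentation before use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that renders DDL for one generic telemetry table. */
public class GenericTelemetryTable {

    // Illustrative table name (assumption).
    static final String TABLE = "telemetry";

    /** Column name -> "TYPE CATEGORY", kept in declaration order. */
    static Map<String, String> columns() {
        Map<String, String> cols = new LinkedHashMap<>();
        // TAG columns identify a series: tenant / entity / telemetry key.
        cols.put("tenant_id", "STRING TAG");
        cols.put("entity_type", "STRING TAG");
        cols.put("entity_id", "STRING TAG");
        cols.put("key", "STRING TAG"); // may need quoting or renaming if reserved
        // Sparse typed FIELD columns: one is populated per row (assumed layout).
        cols.put("bool_v", "BOOLEAN FIELD");
        cols.put("long_v", "INT64 FIELD");
        cols.put("dbl_v", "DOUBLE FIELD");
        cols.put("str_v", "TEXT FIELD");
        cols.put("json_v", "TEXT FIELD"); // JSON serialized as text
        return cols;
    }

    /** Builds the DDL string; ttlMillis is the table-level retention in ms. */
    static String createTableSql(long ttlMillis) {
        StringJoiner body = new StringJoiner(", ", "(", ")");
        columns().forEach((name, type) -> body.add(name + " " + type));
        return "CREATE TABLE IF NOT EXISTS " + TABLE + " " + body
                + " WITH (TTL=" + ttlMillis + ")";
    }

    public static void main(String[] args) {
        // 30 days of retention, chosen arbitrarily for the example.
        System.out.println(createTableSql(30L * 24 * 60 * 60 * 1000));
    }
}
```

Because new telemetry keys only add new values of the key tag, no DDL is needed
at runtime under this layout; the cost is that most typed columns are null in
any given row.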

2. Aggregation queries: direct SQL translation vs. adapter
Recommendation: Prioritize direct use of IoTDB's built‑in SQL functions

* ThingsBoard's aggregation semantics (MIN/MAX/AVG/SUM/COUNT) align with
SQL standards and can be mapped directly.

* Use time-window functions such as date_bin for interval bucketing, so
there is less custom code to maintain.

* During testing, focus on interval boundaries and null handling; add a
lightweight adapter only if discrepancies arise.
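
For the boundary and null-handling tests mentioned above, a small in-memory
reference can be compared against whatever the SQL path returns. The sketch
below is one possible reference, under stated assumptions: buckets are
half-open, aligned to the Unix epoch, and null values are skipped. Whether
date_bin and ThingsBoard's own aggregation use the same alignment is exactly
what such a test would need to confirm. The class and method names are
hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory reference for checking bucketed AVG results. */
public class BucketReference {

    /**
     * Start of the epoch-aligned bucket [start, start + intervalMs) containing ts.
     * floorMod keeps the result correct for negative timestamps as well.
     */
    static long bucketStart(long ts, long intervalMs) {
        return ts - Math.floorMod(ts, intervalMs);
    }

    /**
     * AVG per bucket. Null values are ignored; a bucket containing only nulls
     * produces no entry at all (an assumption about the desired behavior).
     */
    static Map<Long, Double> avgPerBucket(long[] ts, Double[] values, long intervalMs) {
        Map<Long, double[]> acc = new TreeMap<>(); // bucket start -> {sum, count}
        for (int i = 0; i < ts.length; i++) {
            if (values[i] == null) {
                continue;
            }
            double[] sumCount =
                    acc.computeIfAbsent(bucketStart(ts[i], intervalMs), k -> new double[2]);
            sumCount[0] += values[i];
            sumCount[1] += 1;
        }
        Map<Long, Double> out = new TreeMap<>();
        acc.forEach((bucket, sumCount) -> out.put(bucket, sumCount[0] / sumCount[1]));
        return out;
    }

    public static void main(String[] args) {
        long[] ts = {0, 999, 1000, 1500, 2500};
        Double[] values = {1.0, 3.0, null, 5.0, null};
        System.out.println(avgPerBucket(ts, values, 1000));
    }
}
```

A timestamp that falls exactly on a boundary (1000 in the example) belongs to
the bucket it starts, not the one it ends, which is the kind of edge case worth
asserting against the database result.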

3. Retention and latest‑value reads
Recommendation: Leverage IoTDB's native features first

* Table TTL directly meets retention requirements, avoiding
re‑implementation.

* Use ORDER BY time DESC LIMIT 1 for latest‑value queries – simple and
efficient.

* Implement natively and benchmark performance; if findAllLatest becomes a
bottleneck, consider caching or a dedicated latest‑value table later.
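
A minimal sketch of the latest-value query described above, assuming a
hypothetical table with entity_id and key tag columns. It only builds the SQL
text; the table and column names are illustrative, and real code should use
whatever parameter handling the session API offers rather than string
concatenation.

```java
/** Hypothetical builder for a single-key latest-value query. */
public class LatestQuerySketch {

    /** Wraps a value in single quotes, doubling embedded quotes. */
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    /** Newest row for one entity and one telemetry key. */
    static String latestSql(String table, String entityId, String key) {
        return "SELECT * FROM " + table
                + " WHERE entity_id = " + quote(entityId)
                + " AND key = " + quote(key)
                + " ORDER BY time DESC LIMIT 1";
    }

    public static void main(String[] args) {
        System.out.println(latestSql("telemetry", "dev-1", "temperature"));
    }
}
```

A findAllLatest equivalent would need the newest row per key rather than one
row overall, which is where benchmarking matters most.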

If you have any further questions, feel free to continue the discussion on the
mailing list or chat channel. Looking forward to seeing your formal proposal!

Best regards,

Xuan Wang

From: Zh D <[email protected]>
Date: Thursday, 19 March 2026 15:25
To: [email protected] <[email protected]>
Subject: [GSoC 2026] Interest in GSOC-304: ThingsBoard Integration with IoTDB 2.x
Table Model

Hi all,

My name is Zihan Dai, a CS student at the University of Melbourne. I'm
writing about GSOC-304.

I've been reading the public ThingsBoard persistence layer around
TimeseriesDao, TimeseriesLatestDao, TimeseriesService, and
TsKvEntry/BasicTsKvEntry, along with the IoTDB ThingsBoard docs for
the current adapted build (DATABASE_TS_TYPE=iotdb,
DATABASE_TS_LATEST_TYPE=iotdb, IoTDB_DATABASE=root.thingsboard). That
setup is clearly tree-oriented: arbitrary telemetry keys fit naturally
as path segments under root.thingsboard, while ThingsBoard expects
storage and retrieval in terms of timestamped TsKvEntry values, latest
lookups (findLatest, findAllLatest, saveLatest), range reads
(findAllAsync over ReadTsKvQuery), and dashboard aggregations
(NONE/MIN/MAX/AVG/SUM/COUNT). The wider model still includes devices,
attributes, relations, and alarms, but the main storage pressure point
is telemetry/latest.

The 2.x migration looks interesting because it is not just replacing
insertRecord(deviceId, time, measurements, values) with another write
call. Table mode uses ITableSession / ITableSessionPool
(TableSessionBuilder, TableSessionPoolBuilder), CREATE TABLE ... TAG /
ATTRIBUTE / FIELD, and session.insert(tablet) backed by
insertRelationalTablet(), with SQL queries over built-in time and TAG
columns. The main design problem is mapping ThingsBoard's dynamic
telemetry keys and mixed value types (BOOLEAN, STRING, LONG, DOUBLE,
JSON, with JSON likely serialized as STRING/TEXT) onto fixed table
schemas. A single generic table keyed by tenant_id, entity_type,
entity_id, and key keeps TsKvEntry mapping simple and avoids DDL
churn, but it creates sparse typed columns and high-cardinality key
tags. Per-device-profile or per-key tables improve locality and query
shape, but they work against ThingsBoard's runtime key flexibility.
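
To make the "sparse typed columns" trade-off concrete, here is a minimal sketch
of how one telemetry value might be routed to a typed column in a generic
table. The DataType enum is a local stand-in for the value kinds listed above,
and the column names and their order are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical mapping from a telemetry value kind to one typed column. */
public class ValueColumnMapping {

    /** Local stand-in for the value kinds named in this email. */
    enum DataType { BOOLEAN, STRING, LONG, DOUBLE, JSON }

    // Assumed value columns of the generic table, in a fixed order.
    static final List<String> COLUMNS =
            Arrays.asList("bool_v", "long_v", "dbl_v", "str_v", "json_v");

    /** Picks the column that stores a value of the given kind. */
    static String columnFor(DataType type) {
        switch (type) {
            case BOOLEAN: return "bool_v";
            case LONG:    return "long_v";
            case DOUBLE:  return "dbl_v";
            case STRING:  return "str_v";
            case JSON:    return "json_v"; // JSON kept as serialized text
            default: throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    /** Builds one row of value columns: a single slot is set, the rest stay null. */
    static Object[] toRow(DataType type, Object value) {
        Object[] row = new Object[COLUMNS.size()];
        row[COLUMNS.indexOf(columnFor(type))] = value;
        return row;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toRow(DataType.DOUBLE, 21.5)));
    }
}
```

Every row leaves four of the five value columns empty, which is the sparsity
cost of the generic layout; per-profile tables would avoid it at the price of
DDL changes whenever a new key appears.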

Here's how I'd phase the work over 12 weeks:

Weeks 1-3, Design & Prototype:
Study the existing ThingsBoard IoTDB adapter code and the
TimeseriesDao/TimeseriesLatestDao interfaces in depth. Build a minimal
prototype connecting ITableSession to ThingsBoard's saveTsKvEntity()
and findLatest() paths. Settle the schema design question (single
generic table vs per-profile tables) with mentor input. Deliver a
design doc and a working write+read PoC.

Weeks 4-7, Core DAO Implementation:
Implement the full TimeseriesDao interface against Table Model --
save(), saveLatest(), findAllAsync() over ReadTsKvQuery,
findLatest()/findAllLatest(), and deleteTs(). Handle type mapping
(BOOLEAN/STRING/LONG/DOUBLE/JSON -> Table Model column types). Add
unit tests against an embedded or dockerized IoTDB 2.x instance.

Weeks 8-10, Aggregation & Retention:
Implement dashboard aggregation queries (MIN/MAX/AVG/SUM/COUNT) using
Table Model SQL. Implement retention management (TTL or explicit
cleanup()/savePartition() equivalents). Add integration tests covering
ThingsBoard's telemetry subscription and dashboard rendering paths.

Weeks 11-12, Polish & Migration:
Write a migration guide for users upgrading from Tree Mode. Benchmark
write throughput and query latency against the existing Tree Mode
adapter. Finish with a final code review, documentation, and a blog
post.

On my side, I have two open IoTDB PRs (#17180 and #17212) on logging
and resource management. Across the broader Apache ecosystem and
beyond, I have 5 merged PRs in Apache Beam (resource leak fixes in
KafkaIO, serialization improvements, API changes), 2 merged in Apache
ShardingSphere (resource leak and configuration fixes, both merged
same-day by PMC), and 1 merged in OpenCV (#28502, documentation fix).

A few concrete questions:

- For telemetry, do you prefer a single generic table keyed by
tenant/entity/key, or separate tables per device profile / schema
domain?

- For ReadTsKvQuery aggregations used by dashboards, would you expect
direct SQL translation using date_bin / date_bin_gapfill, or an
adapter that first preserves current ThingsBoard aggregation
semantics?

- For retention and latest-value reads, should the table-model backend
rely on IoTDB table TTL and SQL latest queries, or keep explicit
equivalents of ThingsBoard's savePartition() / cleanup() and a
dedicated latest-value structure?

Thanks,
Zihan Dai
GitHub: https://github.com/PDGGK
