Hi all,

I'm Zihan Dai, a CS student at the University of Melbourne, and I'm
writing about GSOC-304.

I've been reading ThingsBoard's public persistence layer around
TimeseriesDao, TimeseriesLatestDao, TimeseriesService, and
TsKvEntry/BasicTsKvEntry, along with the IoTDB ThingsBoard docs for
the current adapted build (DATABASE_TS_TYPE=iotdb,
DATABASE_TS_LATEST_TYPE=iotdb, IoTDB_DATABASE=root.thingsboard). That
setup is clearly tree-oriented: arbitrary telemetry keys fit naturally
as path segments under root.thingsboard. On the ThingsBoard side, the
storage contract is expressed in terms of timestamped TsKvEntry values,
latest lookups (findLatest, findAllLatest, saveLatest), range reads
(findAllAsync over ReadTsKvQuery), and dashboard aggregations
(NONE/MIN/MAX/AVG/SUM/COUNT). The wider model also includes devices,
attributes, relations, and alarms, but the main storage pressure point
is telemetry and latest values.
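To make the tree-oriented fit concrete, here is a minimal sketch of how a single telemetry value could be addressed as a path under root.thingsboard. The layout (one device node per entity id, one measurement per telemetry key) is my assumption for illustration and not necessarily what the adapted build does; TreePathSketch and its methods are hypothetical names.

```java
import java.util.UUID;

// Hypothetical tree-mode addressing: one device node per entity, one measurement per telemetry key.
final class TreePathSketch {
    // Database root, matching IoTDB_DATABASE=root.thingsboard from the adapter docs.
    static final String DATABASE = "root.thingsboard";

    // Backquote a node name so UUID hyphens or dots inside telemetry keys are not read as path syntax
    // (assumed escaping rule: an embedded backquote is written twice).
    static String quoteNode(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // Device path for one ThingsBoard entity; the layout itself is an assumption.
    static String devicePath(UUID entityId) {
        return DATABASE + "." + quoteNode(entityId.toString());
    }

    // A new telemetry key just becomes one more leaf under the device path.
    static String seriesPath(UUID entityId, String key) {
        return devicePath(entityId) + "." + quoteNode(key);
    }
}
```

With schema auto-creation enabled (again an assumption about the deployment), a never-seen key such as "humidity" needs no DDL at all, which is exactly the runtime flexibility the table model has to reproduce.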

The 2.x migration looks interesting because it is not just replacing
insertRecord(deviceId, time, measurements, values) with another write
call. The Table Model uses ITableSession / ITableSessionPool
(TableSessionBuilder, TableSessionPoolBuilder), CREATE TABLE ... TAG /
ATTRIBUTE / FIELD, and session.insert(tablet) backed by
insertRelationalTablet(), with SQL queries over built-in time and TAG
columns. The main design problem is mapping ThingsBoard's dynamic
telemetry keys and mixed value types (BOOLEAN, STRING, LONG, DOUBLE,
JSON, with JSON likely serialized as STRING/TEXT) onto fixed table
schemas. A single generic table keyed by tenant_id, entity_type,
entity_id, and key keeps TsKvEntry mapping simple and avoids DDL
churn, but it creates sparse typed columns and high-cardinality key
tags. Per-device-profile or per-key tables improve locality and query
shape, but they work against ThingsBoard's runtime key flexibility.
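For the "single generic table" option, the DDL could look roughly like the sketch below. The table and column names are hypothetical: tenant_id, entity_type, entity_id, and key_name as TAG columns, plus one nullable FIELD per value type, borrowing the bool_v/str_v/long_v/dbl_v/json_v naming from ThingsBoard's SQL ts_kv table. The time column is left implicit on the assumption that the table model adds it automatically.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical DDL for the single generic table: one row per (tenant, entity, key, time).
final class GenericTableDdl {
    // Identity columns become TAGs; TAG columns are STRING in the table model.
    static final String[] TAGS = {"tenant_id", "entity_type", "entity_id", "key_name"};

    // One nullable FIELD per ThingsBoard value type; any given row fills only one of them (sparse).
    static final Map<String, String> FIELDS = new LinkedHashMap<>();
    static {
        FIELDS.put("bool_v", "BOOLEAN");
        FIELDS.put("str_v", "STRING");
        FIELDS.put("long_v", "INT64");
        FIELDS.put("dbl_v", "DOUBLE");
        FIELDS.put("json_v", "STRING"); // JSON kept as serialized text
    }

    // Builds CREATE TABLE with TAG columns first, then the typed FIELD columns.
    static String createTable(String table) {
        StringJoiner cols = new StringJoiner(",\n  ", "CREATE TABLE IF NOT EXISTS " + table + " (\n  ", "\n)");
        for (String tag : TAGS) {
            cols.add(tag + " STRING TAG");
        }
        for (Map.Entry<String, String> f : FIELDS.entrySet()) {
            cols.add(f.getKey() + " " + f.getValue() + " FIELD");
        }
        return cols.toString();
    }
}
```

This schema never changes when a device starts reporting a new key, since the key is just another key_name value. A per-profile table would instead carry one FIELD per known key, so a new key would presumably need an ALTER TABLE ... ADD COLUMN at runtime; that is the DDL churn trade-off described above.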

Here's how I'd phase the work over 12 weeks:

Weeks 1-3, Design & Prototype:
Study the existing ThingsBoard IoTDB adapter code and the
TimeseriesDao/TimeseriesLatestDao interfaces in depth. Build a minimal
prototype connecting ITableSession to ThingsBoard's saveTsKvEntity()
and findLatest() paths. Settle the schema design question (single
generic table vs per-profile tables) with mentor input. Deliver a
design doc and a working write+read PoC.
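For the PoC read path, findLatest() against a generic table could start as a plain "newest row first" query. This sketch assumes a hypothetical table with TAG columns tenant_id, entity_type, entity_id, key_name and FIELD columns bool_v/str_v/long_v/dbl_v/json_v; the literal-quoting helper is a simplification, and a real DAO would need a vetted escaping or binding strategy.

```java
// Hypothetical SQL for the PoC findLatest() path against a generic telemetry table.
final class LatestQuerySketch {
    // Single-quote a string literal, doubling any embedded single quote.
    static String lit(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // Latest value of one key for one entity: filter on all identity tags, newest row first, keep one.
    static String findLatestSql(String table, String tenantId, String entityType,
                                String entityId, String key) {
        return "SELECT time, bool_v, str_v, long_v, dbl_v, json_v FROM " + table
            + " WHERE tenant_id = " + lit(tenantId)
            + " AND entity_type = " + lit(entityType)
            + " AND entity_id = " + lit(entityId)
            + " AND key_name = " + lit(key)
            + " ORDER BY time DESC LIMIT 1";
    }
}
```

Whether this scan-based approach is fast enough, or whether latest values need their own structure, is one of the open questions at the end of this mail.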

Weeks 4-7, Core DAO Implementation:
Implement the TimeseriesDao and TimeseriesLatestDao interfaces against
the Table Model: save(), saveLatest(), findAllAsync() over
ReadTsKvQuery, findLatest()/findAllLatest(), and deleteTs(). Handle type mapping
(BOOLEAN/STRING/LONG/DOUBLE/JSON -> Table Model column types). Add
unit tests against an embedded or dockerized IoTDB 2.x instance.
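The type-mapping step could be as small as the sketch below, assuming one sparse typed column per value type (hypothetical column names bool_v/long_v/dbl_v/str_v/json_v). TbType is a local stand-in for ThingsBoard's value types so the example compiles without ThingsBoard on the classpath; the chosen IoTDB types (INT64, DOUBLE, STRING for JSON) are my proposal, not a settled design.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping of ThingsBoard value types onto sparse typed FIELD columns.
final class TypeMappingSketch {
    // Local stand-in for ThingsBoard's value types, each paired with a target column and IoTDB type.
    enum TbType {
        BOOLEAN("bool_v", "BOOLEAN"),
        LONG("long_v", "INT64"),
        DOUBLE("dbl_v", "DOUBLE"),
        STRING("str_v", "STRING"),
        JSON("json_v", "STRING"); // JSON stored as already-serialized text

        final String column;
        final String iotdbType;

        TbType(String column, String iotdbType) {
            this.column = column;
            this.iotdbType = iotdbType;
        }
    }

    // True if the Java value matches the declared ThingsBoard type.
    static boolean acceptable(TbType type, Object value) {
        switch (type) {
            case BOOLEAN: return value instanceof Boolean;
            case LONG: return value instanceof Long;
            case DOUBLE: return value instanceof Double;
            default: return value instanceof String; // STRING and JSON
        }
    }

    // FIELD part of one row: every typed column present, exactly one non-null.
    static Map<String, Object> fieldsFor(TbType type, Object value) {
        if (!acceptable(type, value)) {
            throw new IllegalArgumentException(type + " cannot hold " + value);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (TbType t : TbType.values()) {
            row.put(t.column, null);
        }
        row.put(type.column, value);
        return row;
    }
}
```

For example, fieldsFor(TbType.DOUBLE, 21.5) fills only dbl_v and leaves the other four columns null, which is the sparsity cost of the generic-table option.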

Weeks 8-10, Aggregation & Retention:
Implement dashboard aggregation queries (MIN/MAX/AVG/SUM/COUNT) using
Table Model SQL. Implement retention management (TTL or explicit
cleanup()/savePartition() equivalents). Add integration tests covering
ThingsBoard's telemetry subscription and dashboard rendering paths.
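A direct SQL translation of a dashboard aggregation might look like this sketch. Its parameters loosely mirror ReadTsKvQuery (start/end timestamps, interval, limit, aggregation); Agg is a local stand-in for ThingsBoard's aggregation enum, and the table/column names are hypothetical. It assumes a half-open [startTs, endTs) window, date_bin with a millisecond duration literal, and GROUP BY on the bucket ordinal; all of these need checking against the IoTDB 2.x docs and ThingsBoard's exact semantics.

```java
import java.util.Locale;

// Hypothetical translation of a ReadTsKvQuery-like request into table-model SQL.
final class AggregationSqlSketch {
    // Local stand-in for ThingsBoard's aggregation choices.
    enum Agg { NONE, MIN, MAX, AVG, SUM, COUNT }

    static String lit(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    static String toSql(String table, String entityId, String key, String valueColumn,
                        long startTs, long endTs, long intervalMs, Agg agg, int limit) {
        // Half-open time window [startTs, endTs) is an assumption about ThingsBoard semantics.
        String where = " WHERE entity_id = " + lit(entityId) + " AND key_name = " + lit(key)
            + " AND time >= " + startTs + " AND time < " + endTs;
        if (agg == Agg.NONE) {
            // Raw points: no bucketing, newest first, capped by the query limit.
            return "SELECT time, " + valueColumn + " FROM " + table + where
                + " ORDER BY time DESC LIMIT " + limit;
        }
        // One bucket per intervalMs; the bucket start becomes the timestamp of the aggregated point.
        return "SELECT date_bin(" + intervalMs + "ms, time) AS ts, "
            + agg.name().toLowerCase(Locale.ROOT) + "(" + valueColumn + ") AS v FROM " + table + where
            + " GROUP BY 1 ORDER BY 1";
    }
}
```

A one-hour window with a 60,000 ms interval yields at most 3,600,000 / 60,000 = 60 buckets. As I understand it, date_bin simply omits empty buckets while date_bin_gapfill emits them; that difference is one reason to ask below whether dashboards need ThingsBoard's current semantics preserved exactly.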

Weeks 11-12, Polish & Migration:
Write a migration guide for users upgrading from Tree Model storage.
Benchmark performance (write throughput, query latency) against the
existing Tree Model adapter. Final code review, documentation, and blog
post.

On my side, I have two open IoTDB PRs (#17180 and #17212) on logging
and resource management. Across the broader Apache ecosystem and
beyond, I have 5 merged PRs in Apache Beam (resource leak fixes in
KafkaIO, serialization improvements, API changes), 2 merged in Apache
ShardingSphere (resource leak and configuration fixes, both merged the
same day by PMC members), and 1 merged in OpenCV (#28502, a documentation fix).

A few concrete questions:

- For telemetry, do you prefer a single generic table keyed by
tenant/entity/key, or separate tables per device profile / schema
domain?

- For ReadTsKvQuery aggregations used by dashboards, would you expect
direct SQL translation using date_bin / date_bin_gapfill, or an
adapter that first preserves current ThingsBoard aggregation
semantics?

- For retention and latest-value reads, should the table-model backend
rely on IoTDB table TTL and SQL latest queries, or keep explicit
equivalents of ThingsBoard's savePartition() / cleanup() and a
dedicated latest-value structure?
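To make the retention question concrete, an explicit cleanup() equivalent could compute a cutoff from a per-save TTL and issue a scoped delete. This is a hypothetical sketch: it assumes ThingsBoard's ttl value is in seconds, that the table has a tenant_id TAG column, and that the table model's DELETE accepts a time bound combined with a TAG equality.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical explicit-cleanup path, as an alternative to relying only on a table-level TTL.
final class RetentionSketch {
    // Rows with time strictly below this value (ms) are past retention.
    static long cutoffMs(long nowMs, long ttlSeconds) {
        return nowMs - TimeUnit.SECONDS.toMillis(ttlSeconds);
    }

    // Deletes expired rows for one tenant only; a table TTL would apply one window to the whole table.
    static String cleanupSql(String table, String tenantId, long nowMs, long ttlSeconds) {
        return "DELETE FROM " + table + " WHERE tenant_id = '" + tenantId.replace("'", "''") + "'"
            + " AND time < " + cutoffMs(nowMs, ttlSeconds);
    }
}
```

For a 30-day TTL, ttlSeconds is 30 × 86,400 = 2,592,000, so the cutoff sits 2,592,000,000 ms before "now". The design choice is whether per-tenant or per-entry retention like this is worth the extra delete traffic compared with one table-wide TTL.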

Thanks,
Zihan Dai
GitHub: https://github.com/PDGGK
