Hi all,
I'd like to propose that we finalize the scope of the next open-source
release 1.3.8 and start the release process as soon as possible. The main
motivation is a critical query bug that affects both v1.3.6 and v1.3.7.
The fix is already merged on dev/1.3, so 1.3.8 should at least include it.
== Bug description and severity ==
Under a specific but fairly common data pattern, a query on an aligned
device enters a livelock: it never returns and never errors out, while the
query driver thread spins at ~100% CPU, repeatedly burning its full time
slice, until the query finally hits the timeout. I would rate this as
critical:
* The affected query always fails (by timeout), with no error message
that points to the cause, which makes it very hard for users to
diagnose.
* Each stuck query pins query driver threads at full CPU for the entire
timeout window. A handful of such queries can saturate the query
thread pool and CPU, degrading all other queries on the node.
* The trigger pattern is common in practice: an aligned device where the
queried measurement is sparse (contains nulls), combined with a time
range filter. Aggregations (e.g. count) and raw queries are both
affected, in both ASC and DESC order.
We hit this in production on v1.3.6: EXPLAIN ANALYZE snapshots showed the
scan operators' CPU time growing linearly (~60s per 15s wall time across
driver threads) while output rows and all I/O statistics stayed completely
frozen, and CPU flame graphs showed ~90% of samples inside
SeriesScanUtil.initFirstChunkMetadata (with ~1/3 of that in the
System.nanoTime() calls of the time-slice guard loop, i.e. a pure busy
wait).
== How to reproduce (verified on 1.3.6 / 1.3.7) ==
CREATE DATABASE root.sg1;
INSERT INTO root.sg1.d1(timestamp, s1, s2) ALIGNED VALUES (1, 1, 1);
INSERT INTO root.sg1.d1(timestamp, s1, s2) ALIGNED VALUES (2, null, 2);
INSERT INTO root.sg1.d1(timestamp, s1, s2) ALIGNED VALUES (3, null, 3);
FLUSH;
SELECT s1 FROM root.sg1.d1 WHERE time >= 3 AND time <= 4 ORDER BY time DESC;
Expected: an empty result set. Actual: the query hangs until timeout.
An ascending variant triggers the same livelock, e.g.
"SELECT count(s1) FROM ... WHERE time <= X" when s1's non-null values all
lie after X (this is the shape we hit in production).
== Root cause ==
Two statistics sources got out of sync:
* File-level pruning (TimeFilter#canSkip) used the *time-column*
statistics of the aligned timeseries metadata.
* SeriesScanUtil's overlap checks use ITimeSeriesMetadata#getStatistics(),
which for a single-measurement aligned scan returns the *value-column*
statistics (the non-null range, a subset of the time-column range).
Since v1.3.6, the memtable scan optimization (commit dbc0133a on dev/1.3)
additionally clamps the overlap-check endpoint by the global time filter
range. As a result, a file whose time-column range overlaps the filter but
whose queried measurement has no non-null value inside the filter range
passes canSkip() and gets loaded, yet the clamped endpoint can never
overlap the metadata's own statistics. initFirstChunkMetadata() then
neither unpacks nor discards firstTimeSeriesMetadata, hasNextChunk() keeps
returning Optional.empty(), and the operator's time-slice loop spins
forever. v1.3.5 and earlier are not affected because the overlap endpoint
was the metadata's own endTime, which always overlaps itself.
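To make the mismatch concrete, here is a minimal, self-contained Java model
of the DESC repro above. It is only a sketch under stated assumptions: Range,
canSkip, clampedDescEndpoint and endpointOverlaps are hypothetical stand-ins,
not the actual tsfile/IoTDB code, and the clamping rule is an assumption
derived from the description above.

```java
public class AlignedStatsMismatchSketch {
    // Closed time range [start, end]; stands in for a statistics object or a time filter.
    public static final class Range {
        final long start;
        final long end;

        public Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        boolean overlaps(Range other) {
            return start <= other.end && other.start <= end;
        }
    }

    // Hypothetical file-level pruning: skip the file when its statistics miss the filter.
    public static boolean canSkip(Range stats, Range filter) {
        return !stats.overlaps(filter);
    }

    // Hypothetical DESC-order overlap endpoint, clamped by the filter's lower bound
    // (assumed shape of the 1.3.6+ behaviour).
    public static long clampedDescEndpoint(Range stats, Range filter) {
        return Math.max(stats.start, filter.start);
    }

    // Hypothetical DESC-order check: the endpoint overlaps the metadata
    // only if it does not lie after the metadata's end time.
    public static boolean endpointOverlaps(long endpoint, Range stats) {
        return endpoint <= stats.end;
    }

    public static void main(String[] args) {
        Range timeColumn = new Range(1, 3);  // d1 has rows at t = 1, 2, 3
        Range valueColumn = new Range(1, 1); // s1 is non-null only at t = 1
        Range filter = new Range(3, 4);      // WHERE time >= 3 AND time <= 4

        // Pruning on time-column statistics: the file is not skipped, so it gets loaded.
        System.out.println("canSkip (time-column stats):  " + canSkip(timeColumn, filter));

        // The scan sees value-column statistics; the clamped endpoint (3) lies after
        // their end time (1), so the metadata is neither unpacked nor discarded.
        long endpoint = clampedDescEndpoint(valueColumn, filter);
        System.out.println("endpoint overlaps metadata:   " + endpointOverlaps(endpoint, valueColumn));

        // Pruning on the same statistics the scan uses: the file is skipped up front.
        System.out.println("canSkip (value-column stats): " + canSkip(valueColumn, filter));
    }
}
```

In this model the first two checks both come out false, which is the stuck
state: the file is loaded, but the scan can never make progress on it. The
third check comes out true, i.e. once pruning and the scan agree on the
statistics, the file is skipped before the scan ever sees it.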
== The fix ==
Already on dev/1.3:
* apache/tsfile#716 — TimeFilter.canSkip()/allSatisfy() now use
getStatistics(), consistent with the scan-side overlap checks.
(develop-branch equivalent: apache/tsfile#715)
* apache/iotdb#17120 — bumps dev/1.3 to a tsfile version containing the
fix and adds a regression IT
(testQueryWithGlobalTimeFilterOrderByTimeDesc).
Note that dev/1.3 currently depends on tsfile 1.1.4-SNAPSHOT, so an
official tsfile 1.1.4 release is a prerequisite for releasing IoTDB 1.3.8.
== Proposal ==
1. Release tsfile 1.1.4 (dev/1.1) first.
2. Shortly after, cut rc/1.3.8 from dev/1.3, which already contains the
fix above as well as several other correctness fixes in the same area
(e.g. #16993, #16970).
3. If you have other fixes or changes that should go into 1.3.8, please
reply in this thread so we can settle the scope quickly.
Any feedback is welcome.
Best regards,
----------------
Yuan Tian