Hi all,

  I'd like to propose that we finalize the scope of the next open-source
  release 1.3.8 and start the release process as soon as possible. The main
  motivation is a critical query bug that affects both v1.3.6 and v1.3.7.
  The fix is already merged on dev/1.3, so 1.3.8 should at least include it.

  == Bug description and severity ==

  Under a specific but fairly common data pattern, a query on an aligned
  device enters a livelock: it returns no rows and raises no error, while the
  query driver thread spins at ~100% CPU, repeatedly burning its full time
  slice, until the query finally hits the timeout. I would rate this as
  critical:

    * The affected query always fails (by timeout), with no error message
      that points to the cause, which makes it very hard for users to
      diagnose.
    * Each stuck query pins query driver threads at full CPU for the entire
      timeout window. A handful of such queries can saturate the query
      thread pool and CPU, degrading all other queries on the node.
    * The trigger pattern is common in practice: an aligned device where the
      queried measurement is sparse (contains nulls), combined with a time
      range filter. Aggregations (e.g. count) and raw queries are both
      affected, in both ASC and DESC order.

  We hit this in production on v1.3.6. EXPLAIN ANALYZE snapshots showed the
  scan operators' CPU time growing linearly (~60s of CPU time per 15s of
  wall time across driver threads) while output rows and all I/O statistics
  stayed completely frozen. CPU flame graphs showed ~90% of samples inside
  SeriesScanUtil.initFirstChunkMetadata, with ~1/3 of that in the
  System.nanoTime() calls of the time-slice guard loop, i.e. a pure busy
  wait.

  == How to reproduce (verified on 1.3.6 / 1.3.7) ==

    CREATE DATABASE root.sg1;
    INSERT INTO root.sg1.d1(timestamp, s1, s2) ALIGNED VALUES (1, 1, 1);
    INSERT INTO root.sg1.d1(timestamp, s1, s2) ALIGNED VALUES (2, null, 2);
    INSERT INTO root.sg1.d1(timestamp, s1, s2) ALIGNED VALUES (3, null, 3);
    FLUSH;
    SELECT s1 FROM root.sg1.d1 WHERE time >= 3 AND time <= 4 ORDER BY time DESC;

  Expected: an empty result set. Actual: the query hangs until timeout.
  An ascending variant triggers the same livelock, e.g.
  "SELECT count(s1) FROM ... WHERE time <= X" when s1's non-null values all
  lie after X (this is the shape we hit in production).

  == Root cause ==

  Two statistics sources got out of sync:

    * File-level pruning (TimeFilter#canSkip) used the *time-column*
      statistics of the aligned timeseries metadata.
    * SeriesScanUtil's overlap checks use
      ITimeSeriesMetadata#getStatistics(),
      which for a single-measurement aligned scan returns the *value-column*
      statistics (the non-null range, a subset of the time-column range).

  Since v1.3.6, the memtable scan optimization (commit dbc0133a on dev/1.3)
  additionally clamps the overlap-check endpoint by the global time filter
  range. As a result, a file whose time-column range overlaps the filter but
  whose queried measurement has no non-null value inside the filter range
  passes canSkip() and gets loaded, yet the clamped endpoint can never
  overlap the metadata's own statistics. initFirstChunkMetadata() then
  neither unpacks nor discards firstTimeSeriesMetadata, hasNextChunk() keeps
  returning Optional.empty(), and the operator's time-slice loop spins
  forever. v1.3.5 and earlier are not affected because the overlap endpoint
  was the metadata's own endTime, which always overlaps itself.
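
  To make the mismatch concrete, here is a minimal, self-contained model of
  the ascending case. It is not the IoTDB code: the Range class, the method
  names, and the exact form of the clamp and of the overlap comparison are
  assumptions made for illustration, and the real time-slice loop is
  replaced by a bounded loop so that the sketch terminates.

```java
public class AlignedScanLivelockSketch {

    /** Closed time range [start, end]; models either statistics or a time filter. */
    public static final class Range {
        public final long start;
        public final long end;

        public Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        public boolean overlaps(Range other) {
            return start <= other.end && other.start <= end;
        }
    }

    /** File-level pruning before the fix: decided on the time-column statistics. */
    public static boolean canSkip(Range timeColumnStats, Range filter) {
        return !timeColumnStats.overlaps(filter);
    }

    /** Pre-1.3.6 endpoint for an ASC scan: the metadata's own end time. */
    public static long ownEndpointAsc(Range valueColumnStats) {
        return valueColumnStats.end;
    }

    /** 1.3.6/1.3.7 endpoint for an ASC scan (assumed form): also clamped by the filter end. */
    public static long clampedEndpointAsc(Range valueColumnStats, Range filter) {
        return Math.min(valueColumnStats.end, filter.end);
    }

    /** Assumed ASC overlap check: unpack only if the metadata starts at or before the endpoint. */
    public static boolean overlapsEndpointAsc(long endpoint, Range valueColumnStats) {
        return valueColumnStats.start <= endpoint;
    }

    /** Bounded stand-in for the time-slice loop; returns the number of polls without progress. */
    public static int wastedPolls(Range timeCol, Range valueCol, Range filter, int maxPolls) {
        if (canSkip(timeCol, filter)) {
            return 0; // file pruned up front, the scan never sees it
        }
        int wasted = 0;
        for (int i = 0; i < maxPolls; i++) {
            long endpoint = clampedEndpointAsc(valueCol, filter);
            if (overlapsEndpointAsc(endpoint, valueCol)) {
                break; // metadata would be unpacked and the scan would advance
            }
            wasted++; // neither unpacked nor discarded: the same state is polled again
        }
        return wasted;
    }

    public static void main(String[] args) {
        Range timeCol = new Range(1, 10);  // every row has a timestamp
        Range valueCol = new Range(8, 10); // s1 is non-null only at the end
        Range filter = new Range(0, 5);    // models WHERE time <= 5

        System.out.println("canSkip = " + canSkip(timeCol, filter));
        System.out.println("own endpoint overlaps = "
            + overlapsEndpointAsc(ownEndpointAsc(valueCol), valueCol));
        System.out.println("wasted polls = " + wastedPolls(timeCol, valueCol, filter, 1000));
    }
}
```

  With these inputs the file is not skipped, the unclamped endpoint overlaps
  its own statistics, and the clamped endpoint (5) lies before the first
  non-null value (8), so every poll in the bounded loop is wasted.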

  == The fix ==

  Already on dev/1.3:

    * apache/tsfile#716 — TimeFilter.canSkip()/allSatisfy() now use
      getStatistics(), consistent with the scan-side overlap checks.
      (develop-branch equivalent: apache/tsfile#715)
    * apache/iotdb#17120 — bumps dev/1.3 to a tsfile version containing the
      fix and adds a regression IT
      (testQueryWithGlobalTimeFilterOrderByTimeDesc).
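
  As a rough sanity check of why using the same statistics on both sides
  removes the stuck state, the sketch below enumerates small integer ranges
  and counts the cases where a file is not skipped and yet the clamped
  ascending endpoint fails to overlap the statistics. It is a simplified
  model, not the tsfile or IoTDB code; all names and the form of the clamp
  are assumptions for illustration.

```java
public class ConsistentPruningSketch {

    /** True if the closed ranges [aStart, aEnd] and [bStart, bEnd] intersect. */
    public static boolean overlaps(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    /** Pruning decided on the same statistics that the scan-side check uses. */
    public static boolean canSkip(long statsStart, long statsEnd, long filterStart, long filterEnd) {
        return !overlaps(statsStart, statsEnd, filterStart, filterEnd);
    }

    /** Assumed ASC scan-side check: endpoint clamped by the filter end must reach the stats start. */
    public static boolean scanCanProgressAsc(long statsStart, long statsEnd, long filterEnd) {
        long endpoint = Math.min(statsEnd, filterEnd);
        return statsStart <= endpoint;
    }

    /** Counts (stats, filter) pairs within [0, bound) that are loaded but cannot progress. */
    public static int countStuckCases(int bound) {
        int stuck = 0;
        for (long ss = 0; ss < bound; ss++) {
            for (long se = ss; se < bound; se++) {
                for (long fs = 0; fs < bound; fs++) {
                    for (long fe = fs; fe < bound; fe++) {
                        boolean loaded = !canSkip(ss, se, fs, fe);
                        if (loaded && !scanCanProgressAsc(ss, se, fe)) {
                            stuck++;
                        }
                    }
                }
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        // Value-column statistics [8, 10] against a filter modeling time <= 5.
        System.out.println("canSkip = " + canSkip(8, 10, 0, 5));
        System.out.println("stuck cases = " + countStuckCases(8));
    }
}
```

  The count is zero because an overlap implies that both the statistics' end
  and the filter's end are at or after the statistics' start, so their
  minimum is too.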

  Note that dev/1.3 currently depends on tsfile 1.1.4-SNAPSHOT, so an
  official tsfile 1.1.4 release is a prerequisite for releasing IoTDB 1.3.8.

  == Proposal ==

    1. Release tsfile 1.1.4 (dev/1.1) first.
    2. Cut rc/1.3.8 from dev/1.3 shortly after, which already contains the
       fix above as well as several other correctness fixes in the same area
       (e.g. #16993, #16970).
    3. If you have other fixes or changes that should go into 1.3.8, please
       reply in this thread so we can settle the scope quickly.

  Any feedback is welcome.

  Best regards,
  ----------------
  Yuan Tian
