Hi all, I’d like to introduce a new feature in the TsFile Python API:

PR: https://github.com/apache/tsfile/pull/816

Overview

This PR extends TsFileDataFrame to support tree-model TsFiles in 
addition to the existing table-model files, while keeping the current 
dataset API unchanged.

In the same change set, it also refactors the internal dataset index to reduce 
memory overhead, especially for sparse and wide schemas.

Key changes

1. Tree-model support

Automatically detects table vs tree model when opening a file

Adapts tree structure into a virtual table view (device path → columns)

Ensures only actually written (device, field) pairs are registered (no 
phantom series)

Tree reads are executed via query_table_on_tree with client-side adaptation

Prevents mixing table-model and tree-model inputs in one dataset
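
As a rough illustration of the device path → columns adaptation and the mixed-model guard described above, here is a minimal sketch in plain Python. It is not the PR's code: the helper names build_virtual_tables and check_single_model, and the sample device paths, are hypothetical and exist only for this example.

```python
from collections import defaultdict


def build_virtual_tables(written_series):
    """Group written (device_path, field) pairs into per-device column lists.

    Hypothetical helper: only pairs that actually appear in the input are
    registered, so a device never gets a column it did not write.
    """
    columns_by_device = defaultdict(list)
    for device_path, field in written_series:
        if field not in columns_by_device[device_path]:
            columns_by_device[device_path].append(field)
    return dict(columns_by_device)


def check_single_model(models):
    """Reject a dataset whose input files report different models."""
    distinct = set(models)
    if len(distinct) > 1:
        raise ValueError(f"cannot mix models in one dataset: {sorted(distinct)}")
    return distinct.pop() if distinct else None


# Sample data (made up): d2 never wrote "pressure".
written = [
    ("root.plant.d1", "temperature"),
    ("root.plant.d1", "pressure"),
    ("root.plant.d2", "temperature"),
]
view = build_virtual_tables(written)
```

In this sketch, root.plant.d2 ends up with a single column, so no (d2, pressure) series is registered even though another device wrote that field.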

2. Dataset index optimization

Replaces the per-series dict with a compact NamedTuple representation

Removes _DerivedCache and computes derived views lazily

Aggregates time bounds per device so queries can look them up in O(1)

Removes unused placeholder series entries in the table model

General cleanup of internal catalog structures
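
To make the index changes above a little more concrete, here is a small sketch of the two ideas: a NamedTuple entry in place of a per-series dict, and per-device time bounds maintained at registration time so that a lookup is a single dict access. The names SeriesEntry and DeviceBounds and their fields are assumptions for illustration, not the PR's actual structures.

```python
import sys
from typing import NamedTuple


class SeriesEntry(NamedTuple):
    """Hypothetical compact per-series record."""
    device: str
    field: str
    start_time: int
    end_time: int


class DeviceBounds:
    """Keeps one (min start, max end) pair per device, updated on insert."""

    def __init__(self):
        self._bounds = {}

    def register(self, entry):
        lo, hi = self._bounds.get(
            entry.device, (entry.start_time, entry.end_time)
        )
        self._bounds[entry.device] = (
            min(lo, entry.start_time),
            max(hi, entry.end_time),
        )

    def time_range(self, device):
        # Single dict lookup; no scan over the device's series.
        return self._bounds[device]


entries = [
    SeriesEntry("root.plant.d1", "temperature", 0, 100),
    SeriesEntry("root.plant.d1", "pressure", 50, 500),
    SeriesEntry("root.plant.d2", "temperature", 10, 20),
]
bounds = DeviceBounds()
for e in entries:
    bounds.register(e)

# Shallow container sizes only; the field values are shared either way.
tuple_size = sys.getsizeof(entries[0])
dict_size = sys.getsizeof(entries[0]._asdict())
```

The size comparison here only measures the container object itself, so it should be read as an intuition for why tuples are cheaper per entry, not as a reproduction of the memory figures reported below.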

3. New public API

TsFileDataFrame.model → indicates "table" or "tree"

list_timeseries_metadata() → unified metadata view for both models
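
A caller could branch on the two new members roughly as follows. This sketch relies only on the .model attribute and the list_timeseries_metadata() method named above; the summarize helper, the stub class, and the shape of the returned metadata are assumptions, since the return type is not specified in this message.

```python
def summarize(df):
    """Return (model, metadata) for any object exposing the two new members.

    Hypothetical helper: `df` is expected to provide `.model` and
    `.list_timeseries_metadata()` as described in the announcement.
    """
    model = df.model
    if model not in ("table", "tree"):
        raise ValueError(f"unexpected model: {model!r}")
    return model, df.list_timeseries_metadata()


class _StubFrame:
    """Stand-in used only so the sketch runs without a TsFile on disk."""

    model = "tree"

    def list_timeseries_metadata(self):
        # Assumed shape; the real return type may differ.
        return [{"device": "root.plant.d1", "field": "temperature"}]


model, metadata = summarize(_StubFrame())
```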

Compatibility

No change to existing dataset APIs (__getitem__, .loc, __len__, etc.)

No change to on-disk format or C++/Java layers

Existing table-model workflows remain fully compatible

Tree-model support is additive only

Memory impact

Up to ~33% memory reduction in dense workloads

Up to ~38% memory reduction in sparse/wide schemas, due to the removal of 
phantom series tracking

Testing

All existing tests pass (41/41)

New tests added for tree-model functionality, alignment, metadata, and 
mixed-model safety

Feedback

Feedback is welcome, especially on:

Tree-model → table abstraction design

Query routing strategy in tree mode

Any edge cases in mixed schema handling

Thanks!

Best regards,
Le Yang




