Hi Xinyu,

You're right, I'll reply to him with an explanation. The main reason is that we use the new API (TsFileDataframe) in 2.3.0, which is more efficient and is a completely different implementation from his PR.

On Thu, Jun 4, 2026 at 5:32 PM Xinyu Tan <[email protected]> wrote:

> Hi,
>
> Congratulations, this is a big step forward for the TsFile ecosystem. The
> next step might be further integration with pandas.
>
> Additionally, we might want to pay attention to a potential open-source
> contributor: https://github.com/huggingface/datasets/pull/7933. It would
> be better for the health of the community to respond and explain why we did
> not provide any feedback but still moved forward with this work.
>
> Best
> Xinyu Tan
>
> Hongzhi Gao <[email protected]> wrote on Thu, Jun 4, 2026 at 12:32:
>
> > Congratulations!
> >
> > This is a fantastic milestone for TsFile. Native support in Hugging Face
> > Datasets will significantly increase TsFile's visibility and usability in
> > the AI/ML community.
> >
> > Thanks for sharing the news!
> >
> > Best regards,
> > hongzhigao
> >
> > On 2026/06/04 04:20:11 Yuan Tian wrote:
> > > Hi all,
> > >
> > > I'm happy to share some news that connects TsFile with the broader AI/ML
> > > world: Hugging Face's `datasets` library now has native, built-in support
> > > for the TsFile format. The pull request was merged on June 1st:
> > >
> > > https://github.com/huggingface/datasets/pull/8160
> > >
> > > For those less familiar with that ecosystem, a quick introduction.
> > >
> > > About Hugging Face
> > > ------------------
> > > Hugging Face is the most widely used open hub for the AI/ML community. Its
> > > Hub hosts hundreds of thousands of openly shared models and datasets, and
> > > the companion `datasets` Python library is the standard tool practitioners
> > > use to load training data. A single load_dataset(...) call handles
> > > downloading, caching, streaming of larger-than-memory data, and conversion
> > > to PyTorch / TensorFlow / JAX / NumPy / Arrow. Anything published to the Hub
> > > gets free hosting, versioning, and an automatic online preview. In short:
> > > once data is on the Hub in a format `datasets` understands, the whole
> > > community can load it in one line -- and TsFile is now one of those formats.
> > >
> > > What the integration does
> > > -------------------------
> > > .tsfile files are auto-detected, so loading is simply:
> > >
> > >     from datasets import load_dataset
> > >     ds = load_dataset("tsfile", data_files="my_data.tsfile")
> > >
> > > The loader is time-series-aware and follows the TsFile table model rather
> > > than a generic tabular layout:
> > >
> > > - It emits one row per device. TAG columns become scalar strings, while
> > >   the time column and each FIELD become Arrow list<...> columns holding
> > >   that device's full series.
> > > - start_time / end_time filters are pushed down to TsFile's internal time
> > >   index, so only the matching blocks are read from disk.
> > > - It also handles schema evolution across files (column union + IoTDB
> > >   numeric widening), table/column selection, timestamp unit & timezone,
> > >   and configurable batching for memory control.
> > >
> > > It relies on the `tsfile` PyPI package (pip install "tsfile>=2.3.0"), lazily
> > > imported so users who don't touch TsFile data pay no cost.
> > >
> > > Documentation
> > > -------------
> > > The change is already merged into main. The official docs at
> > > huggingface.co/docs/datasets are expected to reflect it in about 30 days,
> > > but the rendered guide is viewable right now:
> > >
> > > https://moon-ci-docs.huggingface.co/docs/datasets/pr_8160/en/tsfile_load
> > >
> > > A call to action
> > > ----------------
> > > There is already a working example dataset on the Hub -- tsfile/lotsa_data
> > > -- which you can load directly with load_dataset("tsfile/lotsa_data").
> > >
> > > If you have time-series datasets, please consider publishing them to the
> > > Hugging Face Hub as .tsfile files. No conversion is required: they are
> > > auto-detected and become usable by anyone with a single load_dataset call.
> > > This is a great opportunity to make TsFile a first-class format for
> > > time-series data in the AI community and to grow the public collection of
> > > openly available time-series datasets.
> > >
> > > Happy to answer any questions.
> > >
> > > Best regards,
> > > ----------------
> > > Yuan Tian
