Hi,

Congratulations, this is a big step forward for the TSFile ecosystem. The next 
step might be further integration with pandas.
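
As a rough illustration of what that could involve (a sketch only, not part
of the merged loader): the announcement below says the loader emits one row
per device, with the time column and each FIELD stored as lists. Tools that
expect a long, one-row-per-timestamp table need those lists flattened first.
The column names `device_id`, `time`, and `temperature` and the helper
`explode_device_rows` are hypothetical, chosen only for this example.

```python
from typing import Any, Dict, List


def explode_device_rows(
    rows: List[Dict[str, Any]], list_columns: List[str]
) -> List[Dict[str, Any]]:
    """Flatten one-row-per-device records into one record per timestamp.

    `rows` mimics the layout described in the announcement: scalar TAG
    values plus list-valued time/FIELD columns (names are assumptions).
    """
    records: List[Dict[str, Any]] = []
    for row in rows:
        # All list columns of a device must describe the same timestamps.
        lengths = {len(row[col]) for col in list_columns}
        if len(lengths) > 1:
            raise ValueError("list columns must have equal length within a row")
        n = lengths.pop() if lengths else 0
        # Scalar (TAG-like) values are repeated on every flattened record.
        scalars = {k: v for k, v in row.items() if k not in list_columns}
        for i in range(n):
            record = dict(scalars)
            for col in list_columns:
                record[col] = row[col][i]
            records.append(record)
    return records


# Hypothetical data standing in for rows produced by the loader.
device_rows = [
    {"device_id": "d1", "time": [1, 2, 3], "temperature": [20.5, 20.7, 21.0]},
    {"device_id": "d2", "time": [1, 2], "temperature": [18.0, 18.2]},
]
long_records = explode_device_rows(device_rows, ["time", "temperature"])
```

The resulting list of dicts can be handed to pandas.DataFrame(long_records)
to obtain a long-format frame. Whether this flattening belongs in the loader
or in user code is an open design question.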

Additionally, we might want to pay attention to a potential open-source 
contributor: https://github.com/huggingface/datasets/pull/7933. It would be 
better for the health of the community to respond and explain why we did not 
provide any feedback but still moved forward with this work.

Best
Xinyu Tan


Hongzhi Gao <[email protected]> wrote on Thu, Jun 4, 2026 at 12:32:
Congratulations!

This is a fantastic milestone for TsFile. Native support in Hugging Face 
Datasets will significantly increase TsFile's visibility and usability in the 
AI/ML community.

Thanks for sharing the news!

Best regards,
hongzhigao

On 2026/06/04 04:20:11 Yuan Tian wrote:
> Hi all,
> 
> I'm happy to share some news that connects TsFile with the broader AI/ML
> world: Hugging Face's `datasets` library now has native, built-in support
> for the TsFile format. The pull request was merged on June 1st:
> 
>   https://github.com/huggingface/datasets/pull/8160
> 
> For those less familiar with that ecosystem, a quick introduction.
> 
> About Hugging Face
> ------------------
> Hugging Face is the most widely used open hub for the AI/ML community. Its
> Hub hosts hundreds of thousands of openly shared models and datasets, and
> the companion `datasets` Python library is the standard tool practitioners
> use to load training data. A single load_dataset(...) call handles
> downloading, caching, streaming of larger-than-memory data, and conversion
> to PyTorch / TensorFlow / JAX / NumPy / Arrow. Anything published to the Hub
> gets free hosting, versioning, and an automatic online preview. In short:
> once data is on the Hub in a format `datasets` understands, the whole
> community can load it in one line -- and TsFile is now one of those formats.
> 
> What the integration does
> -------------------------
> .tsfile files are auto-detected, so loading is simply:
> 
>   from datasets import load_dataset
>   ds = load_dataset("tsfile", data_files="my_data.tsfile")
> 
> The loader is time-series-aware and follows the TsFile table model rather
> than a generic tabular layout:
> 
>   - It emits one row per device. TAG columns become scalar strings, while
>     the time column and each FIELD become Arrow list<...> columns holding
>     that device's full series.
>   - start_time / end_time filters are pushed down to TsFile's internal time
>     index, so only the matching blocks are read from disk.
>   - It also handles schema evolution across files (column union + IoTDB
>     numeric widening), table/column selection, timestamp unit & timezone,
>     and configurable batching for memory control.
> 
> It relies on the `tsfile` PyPI package (pip install "tsfile>=2.3.0"), lazily
> imported so users who don't touch TsFile data pay no cost.
> 
> Documentation
> -------------
> The change is already merged into main. The official docs at
> huggingface.co/docs/datasets are expected to reflect it in about 30 days,
> but the rendered guide is viewable right now:
> 
>   https://moon-ci-docs.huggingface.co/docs/datasets/pr_8160/en/tsfile_load
> 
> A call to action
> ----------------
> There is already a working example dataset on the Hub -- tsfile/lotsa_data
> -- which you can load directly with load_dataset("tsfile/lotsa_data").
> 
> If you have time-series datasets, please consider publishing them to the
> Hugging Face Hub as .tsfile files. No conversion is required: they are
> auto-detected and become usable by anyone with a single load_dataset call.
> This is a great opportunity to make TsFile a first-class format for
> time-series data in the AI community and to grow the public collection of
> openly available time-series datasets.
> 
> Happy to answer any questions.
> 
> Best regards,
> ----------------
> Yuan Tian
> 
