Re: Hugging Face datasets now natively supports the TsFile format

Yuan Tian Thu, 04 Jun 2026 06:53:50 -0700

Hi xinyu,

You're right, I'll reply to him with my explanation. The main reason is
that we use the new API (TsFileDataframe) in 2.3.0, which is more efficient
and is a completely different implementation from his PR.


On Thu, Jun 4, 2026 at 5:32 PM Xinyu Tan <[email protected]> wrote:

> Hi,
>
> Congratulations, this is a big step forward for the TsFile ecosystem. The
> next step might be further integration with pandas.
>
> Additionally, we might want to pay attention to a potential open-source
> contributor: https://github.com/huggingface/datasets/pull/7933. It would
> be better for the health of the community to respond and explain why we did
> not provide any feedback but still moved forward with this work.
>
> Best
> Xinyu Tan
>
>
> Hongzhi Gao <[email protected]> wrote on Thu, Jun 4, 2026 at 12:32:
> Congratulations!
>
> This is a fantastic milestone for TsFile. Native support in Hugging Face
> Datasets will significantly increase TsFile's visibility and usability in
> the AI/ML community.
>
> Thanks for sharing the news!
>
> Best regards,
> hongzhigao
>
> On 2026/06/04 04:20:11 Yuan Tian wrote:
> > Hi all,
> >
> > I'm happy to share some news that connects TsFile with the broader AI/ML
> > world: Hugging Face's `datasets` library now has native, built-in support
> > for the TsFile format. The pull request was merged on June 1st:
> >
> >   https://github.com/huggingface/datasets/pull/8160
> >
> > For those less familiar with that ecosystem, a quick introduction.
> >
> > About Hugging Face
> > ------------------
> > Hugging Face is the most widely used open hub for the AI/ML community. Its
> > Hub hosts hundreds of thousands of openly shared models and datasets, and
> > the companion `datasets` Python library is the standard tool practitioners
> > use to load training data. A single load_dataset(...) call handles
> > downloading, caching, streaming of larger-than-memory data, and conversion
> > to PyTorch / TensorFlow / JAX / NumPy / Arrow. Anything published to the Hub
> > gets free hosting, versioning, and an automatic online preview. In short:
> > once data is on the Hub in a format `datasets` understands, the whole
> > community can load it in one line -- and TsFile is now one of those formats.
> >
> > What the integration does
> > -------------------------
> > .tsfile files are auto-detected, so loading is simply:
> >
> >   from datasets import load_dataset
> >   ds = load_dataset("tsfile", data_files="my_data.tsfile")
> >
> > The loader is time-series-aware and follows the TsFile table model rather
> > than a generic tabular layout:
> >
> >   - It emits one row per device. TAG columns become scalar strings, while
> >     the time column and each FIELD become Arrow list<...> columns holding
> >     that device's full series.
> >   - start_time / end_time filters are pushed down to TsFile's internal time
> >     index, so only the matching blocks are read from disk.
> >   - It also handles schema evolution across files (column union + IoTDB
> >     numeric widening), table/column selection, timestamp unit & timezone,
> >     and configurable batching for memory control.
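To make the one-row-per-device layout described in the list above concrete, here is a minimal pure-Python sketch. It does not use the `tsfile` or `datasets` packages; the sample readings, the `rows_per_device` helper, and the `temperature` field name are all hypothetical, and the inclusive time bounds are an assumption. It also filters in memory, whereas the real loader is described as pushing the filter down to TsFile's time index.

```python
from collections import defaultdict

# Hypothetical flat readings: (device TAG, timestamp, temperature FIELD).
readings = [
    ("dev_a", 1, 20.5),
    ("dev_a", 2, 20.7),
    ("dev_b", 1, 18.0),
    ("dev_a", 3, 21.0),
    ("dev_b", 2, 18.4),
]


def rows_per_device(readings, start_time=None, end_time=None):
    """Group flat readings into one row per device.

    The TAG stays a scalar string; the time column and the FIELD become
    lists holding that device's series. Bounds are treated as inclusive
    (an assumption made for this illustration).
    """
    grouped = defaultdict(lambda: {"time": [], "temperature": []})
    # Sort by device, then by timestamp, so each series is in time order.
    for device, ts, temp in sorted(readings, key=lambda r: (r[0], r[1])):
        if start_time is not None and ts < start_time:
            continue
        if end_time is not None and ts > end_time:
            continue
        grouped[device]["time"].append(ts)
        grouped[device]["temperature"].append(temp)
    return [{"device": d, **series} for d, series in grouped.items()]


rows = rows_per_device(readings, start_time=2)
for row in rows:
    print(row)
```

Each resulting dictionary corresponds to one device, which mirrors the shape of a row with a scalar TAG column and list-valued time and FIELD columns.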
> >
> > It relies on the `tsfile` PyPI package (pip install "tsfile>=2.3.0"), lazily
> > imported so users who don't touch TsFile data pay no cost.
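The lazy-import idea mentioned above is a general Python pattern and can be sketched as follows. This is not the code from the merged pull request; `require_optional` is a hypothetical helper, and only the standard-library `importlib.import_module` call is relied on.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency only when it is first needed.

    Nothing is imported at module load time, so users who never reach the
    code path that calls this helper pay no import cost.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: {install_hint}"
        ) from err


# Demonstration with a standard-library module that is always available.
json_module = require_optional("json", "n/a (standard library)")
print(json_module.dumps({"ok": True}))
```

A reader for TsFile data could call `require_optional("tsfile", 'pip install "tsfile>=2.3.0"')` at the top of its read function, so the dependency is resolved only when TsFile data is actually loaded.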
> >
> > Documentation
> > -------------
> > The change is already merged into main. The official docs at
> > huggingface.co/docs/datasets are expected to reflect it in about 30 days,
> > but the rendered guide is viewable right now:
> >
> >   https://moon-ci-docs.huggingface.co/docs/datasets/pr_8160/en/tsfile_load
> >
> > A call to action
> > ----------------
> > There is already a working example dataset on the Hub -- tsfile/lotsa_data
> > -- which you can load directly with load_dataset("tsfile/lotsa_data").
> >
> > If you have time-series datasets, please consider publishing them to the
> > Hugging Face Hub as .tsfile files. No conversion is required: they are
> > auto-detected and become usable by anyone with a single load_dataset call.
> > This is a great opportunity to make TsFile a first-class format for
> > time-series data in the AI community and to grow the public collection of
> > openly available time-series datasets.
> >
> > Happy to answer any questions.
> >
> > Best regards,
> > ----------------
> > Yuan Tian
> >
>
